Re: [google-appengine] Re: app-engine-data-pipelines session video

Jaroslav Záruba Mon, 07 Jun 2010 14:49:49 -0700

Thanks a lot!

On Mon, Jun 7, 2010 at 11:44 PM, Brett Slatkin
<[email protected]> wrote:


> I was using an integer hash to reduce the key size. You don't need to hash
> the whole thing. Bigtable will split tablets based on a string prefix, so
> all that matters is the data distribution beyond that prefix. So
> "foo-<hash>" is just as effective as "<hash of foo + number>", or even
> better since it's shorter.
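Brett's point, that only the distribution after the shared prefix matters, can be checked with a small simulation. This is a sketch and not code from the talk. `knuth_hash` is assumed to be the common Knuth multiplicative hash (multiply by 2654435761 and keep the low 32 bits). `insert_positions` is a hypothetical helper that records where each new key falls in the sorted key range, as a rough stand-in for which tablet would receive the write.

```python
import bisect


def knuth_hash(number):
    # Assumed form: Knuth's multiplicative hash, reduced to 32 bits.
    return number * 2654435761 % 2**32


def insert_positions(make_key, count):
    """Hypothetical helper: for each new key, return its relative insert
    position in the sorted key range (0.0 = start, 1.0 = end)."""
    keys = []
    positions = []
    for index in range(count):
        key = make_key(index)
        pos = bisect.bisect(keys, key)
        positions.append(pos / max(len(keys), 1))
        bisect.insort(keys, key)
    return positions


# Same 'votes-' prefix in both cases; only the part after the prefix differs.
sequential = insert_positions(lambda i: 'votes-%010d' % i, 1000)
hashed = insert_positions(lambda i: 'votes-%d' % knuth_hash(i), 1000)

# Sequential suffixes always append at the very end of the range (one hot spot).
print(min(sequential[1:]), max(sequential[1:]))  # 1.0 1.0
# Hashed suffixes land all over the range.
print(min(hashed[1:]), max(hashed[1:]))
```

Zero-padding the sequential suffix makes lexicographic order match numeric order, which is exactly the append-only pattern the hash is meant to avoid.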
>
> 2010/6/7 Jaroslav Záruba <[email protected]>
>
>> Thank you, Brett.
>>
>> Would it be wrong to hash the whole work_index instead of hashing only its
>> second half, i.e. the index in (sum_name, knuth_hash(index))?
>> By MD5-hashing only the sequence number, I get a work_index of 'mySumName'
>> plus 32 bytes. If I hashed mySumName together with the sequence number, the
>> key would be only 32 bytes (still quite large, though).
>> Given how frequent vote entities are, I would like to keep the keys as
>> short as possible.
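To put numbers on the key-size question, here is a sketch that builds the three key shapes discussed in this thread for one hypothetical sum name and index. `knuth_hash` is an assumed 32-bit multiplicative hash (so at most 10 decimal digits); MD5 through Python's `hashlib` yields a 32-character hex digest.

```python
import hashlib


def knuth_hash(number):
    # Assumed 32-bit multiplicative hash; the result has at most 10 decimal digits.
    return number * 2654435761 % 2**32


def md5_hex(text):
    return hashlib.md5(text.encode('utf-8')).hexdigest()


sum_name, index = 'mySumName', 12345  # hypothetical values

keys = {
    # Prefix kept, integer hash of the index (the '%s-%d' pattern from the slides).
    'prefix + knuth_hash': '%s-%d' % (sum_name, knuth_hash(index)),
    # Prefix kept, MD5 of the index: 'mySumName-' plus 32 hex characters.
    'prefix + md5(index)': '%s-%s' % (sum_name, md5_hex(str(index))),
    # Whole key hashed: always 32 characters, but the shared prefix is lost.
    'md5(whole key)': md5_hex('%s-%d' % (sum_name, index)),
}

for label, key in keys.items():
    print('%-20s %2d  %s' % (label, len(key), key))
```

With a short prefix, the integer hash produces the shortest key of the three, which fits Brett's reply about keeping keys short.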
>>
>> Regards
>>   J. Záruba
>>
>> On Mon, Jun 7, 2010 at 11:21 PM, Brett Slatkin <
>> [email protected]> wrote:
>>
>>> Hey all,
>>>
>>> The int(time.time()/30) part of the task name is there to prevent queue
>>> stalls. When the counter is evicted from memcache, the work index is reset
>>> to zero. That means new fork-join work items may insert tasks with the same
>>> names as tasks that were already inserted. By including a time window of
>>> ~30 seconds in the task name, we ensure that this problem can last only
>>> for about thirty seconds. This is also why you should raise an exception
>>> when you get a TombstonedTaskError.
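The effect of the time window can be simulated without App Engine. This sketch assumes a task name of the form `'%s-%d-%d' % (sum_name, int(now / 30), index)`; the exact format used in the talk may differ. `tombstoned` is a hypothetical in-memory stand-in for the task queue remembering names it has already seen, and `try_insert` returning `False` stands in for `TombstonedTaskError`.

```python
def task_name(sum_name, index, now):
    # Assumed name shape: the 30-second window number plus the work index.
    return '%s-%d-%d' % (sum_name, int(now / 30), index)


tombstoned = set()  # names the simulated queue has already seen


def try_insert(name):
    """Simulated queue insert. Returns False where the real API would raise
    TombstonedTaskError; per the thread, real code should raise instead so
    the request is retried later."""
    if name in tombstoned:
        return False
    tombstoned.add(name)
    return True


# t=1000: the counter hands out index 1, and the name is used.
first = try_insert(task_name('votes', 1, now=1000))
# t=1010: the counter was evicted from memcache and restarts, so index 1
# is handed out again. Same window (int(1010 / 30) == 33), so the name collides.
second = try_insert(task_name('votes', 1, now=1010))
# t=1030: a new window (int(1030 / 30) == 34) yields a fresh name.
third = try_insert(task_name('votes', 1, now=1030))

print(first, second, third)  # True False True
```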
>>>
>>> The worst-case scenario, if the clocks are wonky, is that two tasks run
>>> the fan-in work instead of just one. That is an acceptable trade-off in
>>> many cases, and a fundamental possibility when using the task queue API.
>>> It can be mitigated using pigeon-hole acknowledgment entities, as I do in
>>> my materialized view example.
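One plausible reading of the acknowledgment-entity idea is an idempotency marker: before applying a unit of fan-in work, check for a marker keyed by that work item, and record the marker together with the result. The sketch below is not Brett's materialized-view code. `AckStore` and `apply_once` are hypothetical, an in-memory stand-in for datastore entities; on App Engine the check and both writes would need to happen in one transaction.

```python
class AckStore:
    """Hypothetical in-memory stand-in for acknowledgment entities."""

    def __init__(self):
        self.acked = set()
        self.total = 0  # the materialized result being maintained

    def apply_once(self, work_key, delta):
        # In the datastore, this check-then-write must be transactional.
        if work_key in self.acked:
            return False  # work already applied by another task; skip it
        self.total += delta
        self.acked.add(work_key)
        return True


store = AckStore()
# Two tasks run the same fan-in work item (e.g. because of clock skew).
results = [store.apply_once('work-item-1', 5), store.apply_once('work-item-1', 5)]
print(results, store.total)  # [True, False] 5
```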
>>>
>>> Hope that helps,
>>>
>>> -Brett
>>>
>>>
>>>
>>> On Mon, Jun 7, 2010 at 2:14 PM, Tristan <[email protected]> wrote:
>>>
>>>> I'm not a Python guy, but the purpose of int(now / 30) is to come up
>>>> with the same name for a span of time (30 seconds, assuming now comes
>>>> from time.time(), which returns seconds).
>>>>
>>>> Notice that int(1/30) = 0, int(3/30) = 0, int(29/30) = 0, and
>>>> int(32/30) = 1. This is a way to come up with a task name that is
>>>> unique to each window.
>>>>
>>>> Although now I'm confused: doesn't he say later on that time is a bad
>>>> thing to use for synchronization, and that sequence numbers should be
>>>> used instead?
>>>>
>>>> On Jun 7, 2:40 am, Jaroslav Záruba <[email protected]> wrote:
>>>> > Also, if someone knows the purpose of "now / 30" in the task name,
>>>> > please explain: http://www.youtube.com/watch?v=zSDC_TU7rtc#t=41m35
>>>> >
>>>> > Regards
>>>> >   J. Záruba
>>>> >
>>>> > 2010/6/7 Jaroslav Záruba <[email protected]>
>>>> >
>>>> >
>>>> >
>>>> > > Hello
>>>> >
>>>> > > I'm reading through the PDF that Brett Slatkin has published for the
>>>> > > session in the subject line.
>>>> > > http://tinyurl.com/3523mej
>>>> >
>>>> > > In the video (the fan-in part), Brett says that the work_index has to
>>>> > > be a hash so that 'you distribute the load across the BigTable':
>>>> > > http://www.youtube.com/watch?v=zSDC_TU7rtc#t=48m44
>>>> >
>>>> > > And this is how work_index is created:
>>>> > > work_index = '%s-%d' % (sum_name, knuth_hash(index))
>>>> > > ...which I guess creates something like 'votesMovieXYZ-54657651321987'.
>>>> >
>>>> > > My question is: why is only one half of work_index hashed? Is that
>>>> > > important? Would it be bad to do md5('%s-%d' % (sum_name, index)) so
>>>> > > that the hash would be something like '6gw8....hq6'?
>>>> >
>>>> > > Regards
>>>> > >  J. Záruba
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Google App Engine" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>
>>>>
>>>
>>
>>
>
>

