Thanks a lot!

On Mon, Jun 7, 2010 at 11:44 PM, Brett Slatkin <[email protected]> wrote:
> I was using an integer hash to reduce the key size. You don't need to hash
> the whole thing. Bigtable will split tablets based on a string prefix, so
> all that matters is the data distribution beyond that prefix. So
> "foo-<hash>" is just as effective as "<hash of foo + number>", or even
> better since it's shorter.
>
> 2010/6/7 Jaroslav Záruba <[email protected]>
>
>> Thank you, Brett.
>>
>> Would it be wrong to hash whole work_index instead of only hashing its
>> second half? sum_name, knuth_hash(index)
>> By md5-ing only the sequence number I get work_index of 'mySumName' + 32B.
>> If I hashed mySumName together with the seq.number the key would be only
>> 32B. (Still quite huge though.)
>> Given how frequent a vote entity is I would like to have the keys as short
>> as possible.
>>
>> Regards
>> J. Záruba
>>
>> On Mon, Jun 7, 2010 at 11:21 PM, Brett Slatkin <
>> [email protected]> wrote:
>>
>>> Hey all,
>>>
>>> The int(time.time()/30) part of the task name is to prevent queue stalls.
>>> When memcache gets evicted the work index counter will be reset to zero.
>>> That means new fork-join work items may insert tasks that are named the same
>>> as tasks that were already inserted. By including a time window of ~30
>>> seconds in the task name, we ensure that this problem can only last for
>>> about thirty seconds. This is also why you should raise an exception when
>>> you see a TombstonedTaskError exception.
>>>
>>> Worst-case scenario if the clocks are wonky is that two tasks are run to
>>> do the fan-in work instead of just one, which is an acceptable trade-off in
>>> many cases and a fundamental possibility when using the task queue API. This
>>> can be mitigated using pigeon-hole acknowledgment entities, like I use in my
>>> materialized view example.
>>>
>>> Hope that helps,
>>>
>>> -Brett
>>>
>>> On Mon, Jun 7, 2010 at 2:14 PM, Tristan <[email protected]> wrote:
>>>
>>>> not a python guy but, the purpose of int (now / 30) will be to come up
>>>> with the same name for a span of time (30 milliseconds?).
>>>>
>>>> notice that int(1/30) = 0 int (3/30) = 0 int (29/30) = 0 and
>>>> int(32/30) = 1. this is a way to come up with that task name
>>>> uniquely.
>>>>
>>>> although now i'm confused because doesn't he say later on that time is
>>>> a bad thing to use for synchronization and sequence numbers should be
>>>> used instead?
>>>>
>>>> On Jun 7, 2:40 am, Jaroslav Záruba <[email protected]> wrote:
>>>> > Also if someone knew what is the purpose of "now / 30" in the task name,
>>>> > please: http://www.youtube.com/watch?v=zSDC_TU7rtc#t=41m35
>>>> >
>>>> > Regards
>>>> > J. Záruba
>>>> >
>>>> > 2010/6/7 Jaroslav Záruba <[email protected]>
>>>> >
>>>> > > Hello
>>>> > >
>>>> > > I'm reading through the PDF that Brett Slatkin has published for %subj%.
>>>> > > http://tinyurl.com/3523mej
>>>> > >
>>>> > > In the video (the Fan-in part) Brett says that the work_index has to
>>>> > > be a hash, so that 'you distribute the load across the BigTable'
>>>> > > http://www.youtube.com/watch?v=zSDC_TU7rtc#t=48m44
>>>> > >
>>>> > > And this is how work_index is created:
>>>> > > work_index = '%s-%d' % (sum_name, knuth_hash(index))
>>>> > > ...which I guess creates something like 'votesMovieXYZ-54657651321987'
>>>> > >
>>>> > > My question is why only one half of work_index is hashed? Is it
>>>> > > important?
>>>> > > Would it be bad to do md5('%s-%d' % (sum_name, index)) so that the
>>>> > > hash would be like '6gw8....hq6'?
>>>> > >
>>>> > > Regards
>>>> > > J. Záruba
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Google App Engine" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/google-appengine?hl=en.
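
For anyone reading this thread later, here is a rough sketch of the two ideas discussed above: a work_index that keeps the readable sum_name prefix and hashes only the sequence number, and a task name that includes a ~30 second time window. It is a sketch, not the code from Brett's slides. The knuth_hash definition (Knuth's multiplicative constant, modulo 2**32) is my assumption about what that function does, and make_work_index and make_task_name are hypothetical helper names, as is the exact task name layout.

```python
import time

# Assumed constant: Knuth's multiplicative hashing constant for 32 bits.
KNUTH_MULTIPLIER = 2654435761


def knuth_hash(number):
    """Scatter sequential integers across the 32-bit range (assumed definition)."""
    return (number * KNUTH_MULTIPLIER) % 2**32


def make_work_index(sum_name, index):
    """Readable prefix plus hashed sequence number, as in the thread.

    Only the part after the prefix is hashed; per the discussion above,
    the distribution beyond the shared prefix is what matters.
    """
    return '%s-%d' % (sum_name, knuth_hash(index))


def make_task_name(sum_name, index, now=None):
    """Hypothetical task name that stays the same within a ~30 second window.

    int(now / 30) changes only every 30 seconds, so a counter that was
    reset to zero can collide with old task names for about that long.
    """
    if now is None:
        now = time.time()
    return '%s-%d-%d' % (sum_name, int(now / 30), index)
```

Passing now explicitly makes the windowing easy to check: any two timestamps in the same 30 second bucket give the same task name for the same index, and the next bucket gives a new one.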
