Brett, any plans to turn this talk into an article? This feels like such a key strategy for getting stuff done on the datastore that it should be part of the SDK.

On Jun 7, 11:44 pm, Brett Slatkin <[email protected]> wrote:
> I was using an integer hash to reduce the key size. You don't need to hash
> the whole thing. Bigtable will split tablets based on a string prefix, so
> all that matters is the data distribution beyond that prefix. So
> "foo-<hash>" is just as effective as "<hash of foo + number>", or even
> better since it's shorter.
>
> 2010/6/7 Jaroslav Záruba <[email protected]>
>
> > Thank you, Brett.
> >
> > Would it be wrong to hash whole work_index instead of only hashing its
> > second half? sum_name, knuth_hash(index)
> > By md5-ing only the sequence number I get work_index of 'mySumName' + 32B.
> > If I hashed mySumName together with the seq.number the key would be only
> > 32B. (Still quite huge though.)
> > Given how frequent a vote entity is I would like to have the keys as short
> > as possible.
> >
> > Regards
> > J. Záruba
> >
> > On Mon, Jun 7, 2010 at 11:21 PM, Brett Slatkin <[email protected]> wrote:
> >
> >> Hey all,
> >>
> >> The int(time.time()/30) part of the task name is to prevent queue stalls.
> >> When memcache gets evicted the work index counter will be reset to zero.
> >> That means new fork-join work items may insert tasks that are named the
> >> same as tasks that were already inserted. By including a time window of
> >> ~30 seconds in the task name, we ensure that this problem can only last
> >> for about thirty seconds. This is also why you should raise an exception
> >> when you see a TombstonedTaskError exception.
> >>
> >> Worst-case scenario if the clocks are wonky is that two tasks are run to
> >> do the fan-in work instead of just one, which is an acceptable trade-off
> >> in many cases and a fundamental possibility when using the task queue
> >> API. This can be mitigated using pigeon-hole acknowledgment entities,
> >> like I use in my materialized view example.
> >>
> >> Hope that helps,
> >>
> >> -Brett
> >>
> >> On Mon, Jun 7, 2010 at 2:14 PM, Tristan <[email protected]> wrote:
> >>
> >>> not a python guy but, the purpose of int (now / 30) will be to come up
> >>> with the same name for a span of time (30 milliseconds?).
> >>>
> >>> notice that int(1/30) = 0 int (3/30) = 0 int (29/30) = 0 and
> >>> int(32/30) = 1. this is a way to come up with that task name
> >>> uniquely.
> >>>
> >>> although now i'm confused because doesn't he say later on that time is
> >>> a bad thing to use for synchronization and sequence numbers should be
> >>> used instead?
> >>>
> >>> On Jun 7, 2:40 am, Jaroslav Záruba <[email protected]> wrote:
> >>> > Also if someone knew what is the purpose of "now / 30" in the task
> >>> > name, please: http://www.youtube.com/watch?v=zSDC_TU7rtc#t=41m35
> >>> >
> >>> > Regards
> >>> > J. Záruba
> >>> >
> >>> > 2010/6/7 Jaroslav Záruba <[email protected]>
> >>> >
> >>> > > Hello
> >>> > >
> >>> > > I'm reading through the PDF that Brett Slatkin has published for
> >>> > > %subj%.
> >>> > > http://tinyurl.com/3523mej
> >>> > >
> >>> > > In the video (the Fan-in part) Brett says that the work_index has to
> >>> > > be a hash, so that 'you distribute the load across the BigTable'
> >>> > > http://www.youtube.com/watch?v=zSDC_TU7rtc#t=48m44
> >>> > >
> >>> > > And this is how work_index is created:
> >>> > > work_index = '%s-%d' % (sum_name, knuth_hash(index))
> >>> > > ...which I guess creates something like
> >>> > > 'votesMovieXYZ-54657651321987'
> >>> > >
> >>> > > My question is why only one half of work_index is hashed? Is it
> >>> > > important?
> >>> > > Would it be bad to do md5('%s-%d' % (sum_name, index)) so that the
> >>> > > hash would be like '6gw8....hq6'?
> >>> > >
> >>> > > Regards
> >>> > > J. Záruba

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
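
For anyone following along, here is a minimal sketch of the two key layouts discussed in this thread: the prefix plus integer hash form ('%s-%d' % (sum_name, knuth_hash(index))) and the md5-of-everything form Jaroslav asked about. The body of knuth_hash is not shown in the thread, so the constant below is an assumption (a commonly used form of Knuth's multiplicative hash), and md5_work_index is a hypothetical helper that exists only for the comparison.

```python
import hashlib

# Assumption: a common multiplicative-hash constant (roughly 2**32 divided
# by the golden ratio). The thread does not show the real knuth_hash body.
KNUTH_MULTIPLIER = 2654435761


def knuth_hash(number):
    """Scatter consecutive integers across the 32-bit range."""
    return (number * KNUTH_MULTIPLIER) % 2**32


def make_work_index(sum_name, index):
    """Readable prefix plus hashed sequence number, as quoted in the thread."""
    return '%s-%d' % (sum_name, knuth_hash(index))


def md5_work_index(sum_name, index):
    """Hypothetical alternative: md5 over the whole string (32 hex chars)."""
    raw = '%s-%d' % (sum_name, index)
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


if __name__ == '__main__':
    for i in range(3):
        print(make_work_index('votesMovieXYZ', i))
    print(md5_work_index('votesMovieXYZ', 1))
```

With this definition the integer-hash key is at most len(sum_name) + 11 characters (a hyphen plus up to ten digits), while the md5 form is always 32. Per Brett's reply, the shared prefix should not hurt, since what matters is how the data is distributed beyond that prefix.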

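
On the int(time.time()/30) question, a small sketch of how the time window ends up in the task name may help. Note that time.time() returns seconds, so the window is about 30 seconds, not 30 milliseconds. The function name fan_in_task_name and the exact name format are assumptions for illustration; only the int(now / 30) bucketing comes from the thread.

```python
import time

WINDOW_SECONDS = 30  # window size mentioned in the thread


def fan_in_task_name(sum_name, index, now=None):
    """Build a task name that stays the same within one ~30 second window.

    Hypothetical helper: the name layout is an assumption, not the talk's
    exact code. `index` stands for the work index counter kept in memcache.
    """
    if now is None:
        now = time.time()  # seconds since the epoch, as a float
    window = int(now / WINDOW_SECONDS)  # same value for 30 seconds at a time
    return '%s-%d-%d' % (sum_name, window, index)


if __name__ == '__main__':
    # Timestamps 1 and 29 fall in window 0; 32 falls in window 1.
    print(fan_in_task_name('votes', 0, now=1))
    print(fan_in_task_name('votes', 0, now=29))
    print(fan_in_task_name('votes', 0, now=32))
```

If memcache is evicted and the counter restarts at zero, a new work item can only collide with an already-used task name while the window value is unchanged, which is why Brett says the stall lasts about thirty seconds at most.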