Hi Jason,

It's possible to do that right now. You simply need to define your own InputReader - see the existing ones for examples of how to do it: http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/src/mapreduce/input_readers.py
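To make the idea of a custom partitioning function a bit more concrete, here is a minimal sketch of one possible approach: choose shard boundaries from a sorted sample of keys, so that each shard covers a similar number of entities rather than an equal slice of the key space. It is plain Python and does not call the mapreduce library. The names `balanced_split_points` and `shard_for_key` are hypothetical, and how you would obtain the key sample and hand the resulting ranges to a reader is left as an assumption.

```python
import bisect


def balanced_split_points(sample_keys, shard_count):
    """Pick up to shard_count - 1 boundary keys from a sample of keys.

    Hypothetical helper: boundaries are taken at evenly spaced positions
    in the sorted sample, so each resulting key range holds a similar
    number of sampled keys.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    keys = sorted(sample_keys)
    if not keys or shard_count == 1:
        return []
    # Key found at each i/shard_count position of the sorted sample.
    points = [keys[(i * len(keys)) // shard_count]
              for i in range(1, shard_count)]
    # Collapse repeated boundaries (happens when there are few distinct keys).
    unique = []
    for point in points:
        if not unique or point != unique[-1]:
            unique.append(point)
    return unique


def shard_for_key(key, split_points):
    """Return the index of the range [start, end) that contains key."""
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    # Keys that all share a prefix, i.e. clustered in one part of the key space.
    keys = ["user%04d" % i for i in range(1000)]
    points = balanced_split_points(keys, 8)
    counts = [0] * 8
    for key in keys:
        counts[shard_for_key(key, points)] += 1
    print(counts)
```

Because the boundaries come from the data rather than from the key space, clustered keys such as the shared-prefix ones above still spread across all shards. The quality of the split depends on how representative the sample is.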
The best option is probably to extend the existing DatastoreInputReader and override the split_input method.

-Nick Johnson

On Wed, Jun 16, 2010 at 4:17 PM, Jason C <[email protected]> wrote:
> Nick,
>
> It would be a great feature to allow us (developers) to define a
> custom partitioning function for the shards - unless you guys have
> some better magic in mind.
>
> It is very normal for us to see very poor distribution across our
> shards in practise - in fact, we've only seen poor distributions.
>
> j
>
> On Jun 15, 8:04 am, "Nick Johnson (Google)" <[email protected]> wrote:
> > Hi Jason,
> >
> > The current implementation of the datastore mapper uses lexicographical
> > sharding over keys to assign datastore shards. Unfortunately, this can
> > lead to very inconsistent shard sizes, as you observe.
> >
> > -Nick Johnson
> >
> > On Fri, Jun 11, 2010 at 4:17 PM, Jason C <[email protected]> wrote:
> > > We've been using MapReduce for App Engine for a couple of different
> > > jobs.
> > >
> > > Typically, we use 8 shards (the default), but it seems that only 3,
> > > sometimes 4, of the shards have any items in them? E.g., we're
> > > currently running one job and three of the shards have >218,000 items
> > > processed, but the other 5 shards appear to have zero.
> > >
> > > I can understand that a particular key distribution would have
> > > different amounts in each shard, but with so many at zero, I suspect
> > > there is something else happening?
> > >
> > > BTW, we have applied the mapreduce-recommended __key__ DESC index, but
> > > we still see this strange shard distribution.
> > >
> > > Is anyone else seeing this?
> > >
> > > j

--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
