Stephen,

My thinking on this is:

Say you have a Month column, and it has the default lexically sorted asc and desc indexes on it. My assumption is that when you insert a new row with a Month value defined, the entire index will have to be updated (no matter how many shards or tablets it's broken up into). This is supported by this quote from the Index Building document you linked to: "When an entity is created or deleted, all of the index rows for that entity must be updated."

So when you roll around to the June 2010 time period and start inserting rows for that, if you have a generic Month column, all the indexes for every month for every year will have to be updated. Now, if you had them partitioned out in the way I described, presumably you would just need to rebuild the indexes for entries with a June column. Granted, this isn't optimal, so why not go all out and use "June2009", "June2010" columns (and still keep the "y2009", "y2010" columns too for quickly grabbing yearly data)? That way, once the month of the year is past, those indexes would never need to be rebuilt again. And since you would be using the Expando model, you wouldn't need separately defined Models for each MonthYear combo.

Mainly, I see this method as a way of helping BigTable understand how to partition my data. Does this make sense?

I think your ListProperty idea sounds efficient to implement, but I think it would run into that index updating issue once you got into the 100s of millions and billions of rows: every new insert would require the indexes for the dimensions and date columns to be rebuilt.

Now, these are my assumptions, which are fraught with peril, so I'm posting them here to see if anyone else out there is of a mind to think it through with me.

Thanks for any input.

On Nov 4, 9:10 am, Stephen <[email protected]> wrote:
> On Nov 3, 11:35 pm, Eli <[email protected]> wrote:
>
> > (This is just the first usage example that comes to mind.
> > This row naming method could be used for all sorts of set intersection stuff,
> > and would cut down on insert times due to the fact that it should
> > partition out the indexes when dealing with humongous datasets).
>
> I don't think what you're proposing is a physical optimisation because
> indexes are not discrete objects as they are in a traditional
> relational database:
>
> http://code.google.com/appengine/articles/index_building.html#Index%2...
>
> Forgetting about physical optimisation and thinking only about
> querying slices of the data, how about something like this:
>
>   import datetime
>   from google.appengine.ext import db
>
>   class Stats(db.Model):
>       number = db.IntegerProperty()
>       date = db.DateTimeProperty()
>       dimensions = db.StringListProperty()
>
>   e = Stats(number=42,
>             date=datetime.datetime(1605, 11, 5),
>             dimensions=['1600', 'November', 'Q4', 'd5', 'Saturday'])
>
> Then you could query by date, as in your example, by simply querying
> against the date property. But you could also query for all numbers
> for any Saturday in the 4th quarter of any year:
>
>   q4Saturdays = Stats.all().filter('dimensions =', 'Q4').filter(
>       'dimensions =', 'Saturday')
>
> > Anyway, this was just a side thought I had while wondering what the
> > point of Expando was.. since it's so unstructured.. I couldn't imagine
> > why someone would want such an undependable datasource..
>
> http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html
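
To make the column-naming scheme from my reply concrete, here is a rough sketch in plain Python of how the per-period property names ("y2010", "June2010") might be generated and attached to an entity. The helper names and the FakeExpando stand-in are made up for illustration so the snippet runs without the App Engine SDK; with a real Expando entity the same setattr calls should create dynamic properties. Whether this actually reduces index work is still just my assumption.

```python
import datetime

MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]


def partition_property_names(when):
    """Hypothetical helper: build the per-period property names for a date."""
    month = MONTH_NAMES[when.month - 1]
    return [
        "y%d" % when.year,            # yearly column, e.g. "y2010"
        "%s%d" % (month, when.year),  # month-of-year column, e.g. "June2010"
    ]


def tag_entity(entity, when, value=True):
    """Set one dynamic attribute on the entity per partition name."""
    for name in partition_property_names(when):
        setattr(entity, name, value)
    return entity


class FakeExpando(object):
    """Stand-in for an Expando entity (assumption: not the real db.Expando)."""


row = tag_entity(FakeExpando(), datetime.date(2010, 6, 15))
print(sorted(vars(row)))
```

A query for one month of one year would then filter on just the "June2010" property rather than on a shared Month column.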
