[google-appengine] Re: Expando and Index partitioning

Eli Wed, 04 Nov 2009 10:48:49 -0800

Stephen,

My thinking on this is..

Say you have a Month column, and it has the default lexically sorted
ascending and descending indexes on it.

My assumption is that when you insert a new row with a Month value
defined, the entire index will have to be updated, no matter how many
shards or tablets it's broken up into.

This is supported by this quote from the Index Building document you
linked to:

"When an entity is created or deleted, all of the index rows for that
entity must be updated."

So when the June 2010 time period rolls around and you start inserting
rows for it, if you have a generic Month column, all the indexes for
every month of every year will have to be updated.

Now, if you had them partitioned out in the way I described, you would
presumably only need to rebuild the indexes for entries with a June
column.

Granted, this still isn't optimal. So why not go all out and use
"June2009" and "June2010" columns (and still keep the "y2009" and "y2010"
columns for quickly grabbing yearly data)? That way, once a month of the
year has passed, its indexes would never need to be rebuilt again. Yet,
since you are using the Expando model, you wouldn't need separately
defined Models for each month-year combination.
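A minimal sketch of the naming scheme above, in plain Python so it runs without the App Engine SDK. The helper `partition_properties` is hypothetical; it only computes the dynamic property names (such as "June2009" and "y2009") that one row would carry, assuming each dynamic property simply stores the row's value, which the post does not specify. On an Expando entity the result could then be applied with `setattr(entity, name, value)`.

```python
import datetime

# Explicit English month names, so the output does not depend on the locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def partition_properties(when, value):
    """Return hypothetical dynamic property names for one row.

    Builds a month-year name such as "June2009" and a year name such as
    "y2009", each mapped to the row's value (an assumption: the thread
    does not say what these properties store).
    """
    month_name = MONTHS[when.month - 1]
    return {
        "%s%d" % (month_name, when.year): value,  # per month-year property
        "y%d" % when.year: value,                 # per year property
    }


props = partition_properties(datetime.date(2010, 6, 15), 42)
print(sorted(props))  # ['June2010', 'y2010']
```

Because Expando accepts arbitrary attribute names, no new Model class is needed for each month-year combination; only the names change.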

Mainly, I see this method as a way of helping BigTable understand how
to partition out my data...

Does this make sense?

I think your ListProperty idea sounds efficient to implement, but I
suspect it would run into the same index-updating issue once you got
into the hundreds of millions or billions of rows: every new insert
would require the indexes on the dimensions and date columns to be
rebuilt. These are my assumptions, and they are fraught with peril, so
I'm posting here to see if anyone else out there is of a mind to think
it through with me.
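To put rough numbers on the per-insert cost being discussed, here is a toy count of the index rows a single Stats-style entity would add. It rests on a simplifying assumption not stated in this thread: every property is indexed, and the built-in single-property indexes hold one ascending and one descending row per property value, so a list property contributes one pair per element. The function `builtin_index_rows` is hypothetical and ignores composite indexes and tablet splits.

```python
import datetime


def builtin_index_rows(entity):
    """Count ascending + descending index rows for one entity (toy model).

    Assumes every property is indexed and that a list-valued property
    gets one row per element in each direction.
    """
    rows = 0
    for value in entity.values():
        n_values = len(value) if isinstance(value, list) else 1
        rows += 2 * n_values  # one ascending row and one descending row
    return rows


stats_row = {
    "number": 42,
    "date": datetime.datetime(1605, 11, 5),
    "dimensions": ["1600", "November", "Q4", "d5", "Saturday"],
}
print(builtin_index_rows(stats_row))  # 2 + 2 + 10 = 14
```

This counts the rows written for one insert, not the cost of writing them into a very large table, which is the part of the question left open here.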

Thanks for any input.

On Nov 4, 9:10 am, Stephen <[email protected]> wrote:
> On Nov 3, 11:35 pm, Eli <[email protected]> wrote:
>
> > (This is just the first usage example that comes to mind.  This row
> > naming method could be used for all sorts of set intersection stuff,
> > and would cut down on insert times due to the fact that it should
> > partition out the indexes when dealing with humongous datasets).
>
> I don't think what you're proposing is a physical optimisation, because
> indexes are not discrete objects as they are in a traditional
> relational database:
>
> http://code.google.com/appengine/articles/index_building.html#Index%2...
>
> Forgetting about physical optimisation and thinking only about
> querying slices of the data, how about something like this:
>
> import datetime
>
> from google.appengine.ext import db
>
> class Stats(db.Model):
>     number = db.IntegerProperty()
>     date = db.DateTimeProperty()
>     dimensions = db.StringListProperty()  # free-form tags for slicing
>
> e = Stats(number=42,
>     date=datetime.datetime(1605, 11, 5),
>     dimensions=['1600', 'November', 'Q4', 'd5', 'Saturday'])
>
> Then you could query by date, as in your example, by simply querying
> against the date property. But you could also query for all numbers
> for any Saturday in the 4th quarter of any year:
>
> q4Saturdays = Stats.all().filter('dimensions =', 'Q4').filter('dimensions =', 'Saturday')
>
> > Anyway, this was just a side thought I had while wondering what the
> > point of Expando was.. since it's so unstructured.. I couldn't imagine
> > why someone would want such an undependable datasource..
>
> http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html
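Stephen's dimensions idea can be tried out without the SDK. The sketch below is plain Python and assumes, as a simplification of list-property filtering, that an equality filter on a list property matches when any element equals the value, so chaining two such filters keeps only entities that contain both tags. `date_dimensions` and `filter_dimensions` are hypothetical helpers; the month, quarter, day and weekday tags follow the quoted example, and the '1600' tag is left out because the thread does not say what it encodes.

```python
import datetime

# Explicit names keep the tags locale-independent.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday")


def date_dimensions(when):
    """Build tags in the style of the quoted example (hypothetical helper)."""
    quarter = (when.month - 1) // 3 + 1
    return [
        MONTHS[when.month - 1],    # e.g. "November"
        "Q%d" % quarter,           # e.g. "Q4"
        "d%d" % when.day,          # day of month, e.g. "d5"
        WEEKDAYS[when.weekday()],  # weekday() is 0 for Monday
    ]


def filter_dimensions(rows, *tags):
    """Mimic chained 'dimensions =' filters: keep rows containing every tag."""
    return [row for row in rows if all(tag in row["dimensions"] for tag in tags)]


rows = [
    {"number": 42, "dimensions": date_dimensions(datetime.date(1605, 11, 5))},
    {"number": 7, "dimensions": date_dimensions(datetime.date(2009, 11, 4))},
]
q4_saturdays = filter_dimensions(rows, "Q4", "Saturday")
print([row["number"] for row in q4_saturdays])  # [42]
```

Python's `date.weekday()` uses the proleptic Gregorian calendar, under which 5 November 1605 falls on a Saturday, matching the quoted tags.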
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en
