[google-appengine] Re: Optimizing BigTable queries for a timestamped class

Kevin Pierce Tue, 06 Oct 2009 10:08:43 -0700

Brett Slatkin gave a fantastic talk on exactly this type of problem and its solution.

http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html


He suggests using a ListProperty and exploding the date into it; a
range can then be selected by testing set membership against that list.
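
A rough sketch of one reading of the exploded-date idea, in Java since the original question uses the Java runtime. Each entity stores a list such as ["2000", "2000-12", "2000-12-01"], and an equality filter on a list property matches when any element of the list equals the requested value. The `explode` and `matching` helpers and the `Event` record are hypothetical names, and the filter is simulated in memory rather than issued against the datastore.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class ExplodedDateSketch {

    // Hypothetical helper: explode a date into coarse-to-fine prefixes
    // ["yyyy", "yyyy-MM", "yyyy-MM-dd"] that would be stored in a list property.
    public static List<String> explode(LocalDate date) {
        String year = String.format("%04d", date.getYear());
        String month = year + "-" + String.format("%02d", date.getMonthValue());
        String day = month + "-" + String.format("%02d", date.getDayOfMonth());
        return List.of(year, month, day);
    }

    // Hypothetical stand-in for an entity carrying the exploded list.
    record Event(String id, List<String> dateTokens) {}

    // Simulates an equality filter on a list property: an entity matches
    // if ANY element of its list equals the token (set membership).
    public static List<String> matching(List<Event> events, String token) {
        List<String> ids = new ArrayList<>();
        for (Event e : events) {
            if (e.dateTokens().contains(token)) {
                ids.add(e.id());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("1", explode(LocalDate.of(2000, 12, 1))),
            new Event("2", explode(LocalDate.of(2001, 12, 1))),
            new Event("3", explode(LocalDate.of(2000, 12, 24))));
        System.out.println(matching(events, "2000"));       // [1, 3]
        System.out.println(matching(events, "2000-12-01")); // [1]
    }
}
```

This covers calendar-aligned windows (a whole year, month, or day) with a single equality test; an arbitrary window such as Dec 1 to Dec 24 would still need either a range filter or several day tokens.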

The video is worth every minute.


On Tue, Oct 6, 2009 at 10:56 AM, Erem <[email protected]> wrote:

>
> Anyone? Is manipulating the key structure so that entities are ordered
> properly in BigTable a good idea? How should I structure my queries to
> do prefix/range scans on those keys?
> Thanks!
>
> Erem
>
> On Oct 4, 12:57 pm, Erem <[email protected]> wrote:
> > 3 questions about optimizing BigTable datastore queries on a
> > timestamped class. I'm using JDO on the Java runtime.
> >
> > SOME CONTEXT
> > My class, Event, requires speedy scans on time windows. An Event's
> > timestamp is immutable.
> >
> > Given that (1) BigTable physically stores rows in lexicographic order
> > by rowkey, and (2) what I specify as the PrimaryKey maps directly to
> > the BigTable rowkey (after being appended to the app ID, etc.), I will
> > set up my PrimaryKeys so that they sort lexicographically by time,
> > followed by an entity-unique ID.
> >
> > Some example keys, where each entity is identified by its UTC date +
> > time (with millis) + unique ID:
> > UTCDATE   ::TIME        ::UNIQUE ID
> > 2000-12-01::13:15:00.000::1   // Dec 01, 2000, at 1:15 PM
> > 2001-12-01::13:15:00.000::2   // Dec 01, 2001, same time
> > 2002-12-01::13:15:00.000::3   // Dec 01, 2002, same time
> >
> > This key structure allows me to do useful, dense prefix and range
> > scans directly on the entities table. For example (not valid GQL; see
> > Question 2):
> > WHERE key MATCHES '2000*'        // all events from the year 2000
> > WHERE key MATCHES '2000-12*'     // all from December 2000
> > WHERE key MATCHES '2000-12-01*'  // all from Dec 01, 2000
> > WHERE key > '2000-12-01'         // from Dec 1 up to, but not
> >       && key < '2000-12-25'      // including, Christmas 2000
> >
> > 3 QUESTIONS, FROM MOST PRAGMATIC TO MOST OBSCURE
> > (1) Am I actually optimizing anything by doing this, or am I wasting
> > time? Should I expect this to be super-fast because it's performing a
> > dense read on 1 or 2 BigTable Tablets?
> >
> > (2) I need to use range and prefix scans against the key in order to
> > take advantage of these optimizations. How do I use them in the Java
> > API? Is a range scan as simple as "WHERE key > :bottomRange and key
> > < :topRange"? What about a prefix scan?
> >
> > (3) BigTable guarantees that entities are stored in order by rowkey.
> > Its specification also says that SSTables, once written, are
> > immutable. So what happens in the following sequence?
> >     (i) an SSTable is written that contains rowkeys 1, 2, 3, 5;
> >     (ii) I commit an entity with rowkey 4.
> > Does the SSTable get deleted and recreated? Or is guarantee (1) broken
> > and my entity (key=4) written to a different SSTable? The answer
> > affects how optimized this approach really is.
> >
> > Thanks in advance for the advice, you datastore ninjas, you.
> >
>
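
For the key layout in the quoted question, here is a minimal Java sketch that builds keys in that UTC date::time::ID format and models the dense prefix and range scans, with a java.util.TreeMap standing in for a lexicographically sorted tablet. It does not touch the datastore API; the helper names (makeKey, prefixScan, rangeScan) and the 10-digit zero-padded ID are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TimeOrderedKeys {

    // Same layout as the example keys: UTC date :: time with millis :: unique ID.
    // Fixed-width fields keep lexicographic order equal to time order (4-digit years).
    private static final DateTimeFormatter KEY_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'::'HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    // Hypothetical helper. Assumption: the ID is zero-padded so events that
    // share a millisecond still sort numerically (2 before 10).
    public static String makeKey(Instant timestamp, long uniqueId) {
        return KEY_TIME.format(timestamp) + "::" + String.format("%010d", uniqueId);
    }

    // Prefix scan: seek once to the first key >= prefix, then read
    // contiguously until a key no longer starts with the prefix.
    public static List<String> prefixScan(NavigableMap<String, String> table, String prefix) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    // Range scan over the half-open interval [from, to); requires from <= to.
    public static List<String> rangeScan(NavigableMap<String, String> table, String from, String to) {
        return new ArrayList<>(table.subMap(from, true, to, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>(); // sorted by key
        table.put(makeKey(Instant.parse("2000-12-01T13:15:00Z"), 1), "event 1");
        table.put(makeKey(Instant.parse("2001-12-01T13:15:00Z"), 2), "event 2");
        table.put(makeKey(Instant.parse("2002-12-01T13:15:00Z"), 3), "event 3");
        table.put(makeKey(Instant.parse("2000-12-25T09:00:00Z"), 4), "event 4");

        System.out.println(prefixScan(table, "2000"));                    // [event 1, event 4]
        System.out.println(prefixScan(table, "2000-12-01"));              // [event 1]
        System.out.println(rangeScan(table, "2000-12-01", "2000-12-25")); // [event 1]
    }
}
```

The last scan excludes event 4 because "2000-12-25::..." sorts after the bare bound "2000-12-25", which is the same "up to, but not including, Christmas" behaviour as the quoted pseudo-query.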

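On Question 2: to my knowledge the datastore has no MATCHES/LIKE-style prefix operator, so a prefix scan would have to be rewritten as a range scan with an inclusive lower bound (the prefix itself) and an exclusive upper bound. The sketch below computes that upper bound by incrementing the last character of the prefix, carrying left past Character.MAX_VALUE. `prefixUpperBound` and `inPrefixRange` are hypothetical helpers, and the sketch assumes key names compare like Java's String.compareTo; how a JDO query expresses a filter on the primary key is not shown here.

```java
import java.util.Optional;

public class PrefixBounds {

    // Smallest string greater than every string that starts with `prefix`.
    // Increment the last char; if it is already Character.MAX_VALUE, drop it
    // and carry into the previous char. Empty result = no finite upper bound.
    public static Optional<String> prefixUpperBound(String prefix) {
        StringBuilder sb = new StringBuilder(prefix);
        while (sb.length() > 0) {
            int last = sb.length() - 1;
            char c = sb.charAt(last);
            if (c != Character.MAX_VALUE) {
                sb.setCharAt(last, (char) (c + 1));
                return Optional.of(sb.toString());
            }
            sb.setLength(last);
        }
        return Optional.empty();
    }

    // A key starts with `prefix` exactly when prefix <= key < upperBound.
    public static boolean inPrefixRange(String key, String prefix) {
        Optional<String> upper = prefixUpperBound(prefix);
        return key.compareTo(prefix) >= 0
            && upper.map(u -> key.compareTo(u) < 0).orElse(true);
    }

    public static void main(String[] args) {
        System.out.println(prefixUpperBound("2000-12")); // Optional[2000-13]
        System.out.println(inPrefixRange("2000-12-01::13:15:00.000::1", "2000-12")); // true
        System.out.println(inPrefixRange("2001-12-01::13:15:00.000::2", "2000-12")); // false
    }
}
```

With the bounds in hand, the prefix query reduces to the two-sided comparison Erem already guessed at, for example lower "2000-12" and upper "2000-13" for all of December 2000.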

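On Question 3: the published Bigtable paper (OSDI 2006) describes recent writes going to a commit log and an in-memory sorted buffer (the memtable), with reads served from a merged view of the memtable and the immutable SSTables; compactions later write new SSTables instead of editing old ones. The toy model below illustrates only that idea. It is not the datastore's implementation, it ignores deletions and the commit log, and it merges through a TreeMap for brevity rather than the streaming merge a real system would use. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MergedReadSketch {

    // Immutable sorted runs ("SSTables"); never edited in place once added.
    private final List<NavigableMap<String, String>> sstables = new ArrayList<>();
    // Mutable sorted buffer ("memtable") that receives new writes.
    private NavigableMap<String, String> memtable = new TreeMap<>();

    public void put(String key, String value) {
        memtable.put(key, value);
    }

    // Freeze the memtable into a new read-only run and start a fresh one.
    public void flush() {
        sstables.add(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();
    }

    // Reads see a merged, sorted view: older runs first, then the memtable,
    // so a newer value for the same key overwrites an older one.
    public List<String> scanKeys() {
        NavigableMap<String, String> merged = new TreeMap<>();
        for (NavigableMap<String, String> run : sstables) {
            merged.putAll(run);
        }
        merged.putAll(memtable);
        return new ArrayList<>(merged.keySet());
    }

    public int runCount() {
        return sstables.size();
    }

    public static void main(String[] args) {
        MergedReadSketch tablet = new MergedReadSketch();
        for (String k : List.of("1", "2", "3", "5")) {
            tablet.put(k, "row " + k);
        }
        tablet.flush();                        // (i) a run holding 1, 2, 3, 5
        tablet.put("4", "row 4");              // (ii) key 4 lands in the memtable
        System.out.println(tablet.scanKeys()); // [1, 2, 3, 4, 5]
        System.out.println(tablet.runCount()); // 1: the first run was not rewritten
    }
}
```

In this model neither alternative in the question happens: the old SSTable is left alone, and readers still see rows in key order because ordering is enforced at read time across all runs.
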
-- 
Kevin Pierce
Software Architect
VendAsta Technologies Inc.
[email protected]
(306)955.5512 ext 103
www.vendasta.com

You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.