[google-appengine] Re: LIFO key name strategy?

Joseph Letness Wed, 16 Feb 2011 08:03:22 -0800

Hi Nick, my query uses "zig-zag merge-join" with an arbitrary length
of equality filters on a StringListProperty to get result sets, so I
can't use a composite index (unless the exploding index problem has
been solved or I've fundamentally misunderstood exploding index).


Also Nick, maybe you could let me know if this is correct:

It occurred to me that using lexically descending keys would be an
optimization for certain use cases like mine.  My users are primarily
only interested in recently added items.  Since I can predict that the
majority of my queries are going to be for recently added items, it
makes sense to position them, lexicographically, so that table scans
will find matches at the beginning of the scan, especially with a zig-
zag merge-join.  Is this a correct assumption?

Thanks again to everyone who has replied.  I've used the method from
the mapreduce _get_descending_keys handler that Julian pointed me to
and have modified it for my use case.  It's not an "ideal" inverse-
timestampe algorithm, but it is definitely practical.

--Joe



The users of my app are mostly going to be interacting with the most
recently added items.

On Feb 15, 5:54 pm, "Nick Johnson (Google)" <[email protected]>
wrote:
> Why not just use regular timestamps, and sort descending?
>
> -Nick
>
> On Wed, Feb 16, 2011 at 1:42 AM, Joseph Letness <[email protected]>wrote:
>
>
>
>
>
> > Julian, that is _exactlty_ what I was looking for.  Counting up to a
> > future time to create descending values...  It seems so obvious now!
> > It's a much better solution than the de-accumulator handler that I
> > wrote.
>
> > Thanks!
>
> > On Feb 14, 11:57 pm, Julian Namaro <[email protected]> wrote:
> > > I am not sure about the mathematics of it, but intuitively there is no
> > > perfect algorithm for constructing timestamps in a reverse
> > > lexicographical ordering, because adding a character to a string will
> > > always make it lexicographically superior.
>
> > > But I noticed the mapreduce library just pick a "ridiculously future
> > > time" and go in reverse order from there:
> >http://code.google.com/p/appengine-mapreduce/source/browse/trunk/pyth...
>
> > > The library also add a random string to reduce the chance of
> > > duplicates, maybe that can be replaced by an UUID if you're really
> > > concerned by uniqueness.
>
> > > On Feb 14, 5:57 am, Joseph Letness <[email protected]> wrote:
>
> > > > Hi Calvin and Robert, thanks for your replies.  I should have been
> > > > more clear about what I am doing, here is some more info:
>
> > > > Calvin, thanks for the link to Ikai's blog post, I haven't seen that
> > > > one and it was very interesting.
>
> > > > Robert, here are specific answers to your questions:
>
> > > > >>Why do you say: " I can't use a composite index since it would
> > explode with my use case"?
>
> > > > I'm using Brett Slatkin's "Relation Index" method of building and
> > > > querying set memberships (Google I/O 2009 - Building Scalable, Complex
> > > > Apps on App Engine).  According to Brett, using a composite index on
> > > > this kind would cause explosion, so any ordering of results will need
> > > > to be done in-memory during the request. If the sort order is
> > > > immutable, sorted key names can be used to order results based on the
> > > > their lexicographical position.
>
> > > > Since a creation timestamp is "immutable" data, I figured that using
> > > > lexicographic key names would be the way to go.
>
> > > > >>What would be fine if you could handle your entire result set in one
> > request?
>
> > > > Ordering the result set in-memory.
>
> > > > >>What are you trying to do?
>
> > > > The app is a digital-asset manager.  Users need to be able to query a
> > > > set (using the relation index method) and have the results return the
> > > > most recent additions first.  The result set could easily be a few
> > > > thousand, so I want to use cursor-pagination to display the results
> > > > which would preclude any in-memory ordering.
>
> > > > (I'm actually refactoring my existing app that I use to manage/deliver
> > > > graphic assets to my clients so that they can add their own data.)
>
> > > > >>Is there a single global LIFO stack, or are there multiple stacks?
>
> > > > The entities are all of the same kind, however, LIFO behavior is
> > > > localized to individual user groups.
>
> > > > >>How are new items added to the stack(s)?,  What is the addition rate?
>
> > > > Just one item per user request.  User groups would be just a few
> > > > individual users probably less than twenty. The rate per group would
> > > > be so low that chances of contention on any sort of accumulator would
> > > > be almost nonexistent.
>
> > > > >>Is there a requirement that the items are precisely ordered or are
> > some (or small) mis-orderings acceptable?
>
> > > > Precision is NOT critical.  Close approximation of chronology is just
> > > > fine.
>
> > > > --The auto-generated ids are not strictly increasing
>
> > > > I did not know that.  Thanks!
>
> > > > --Using the current time may also be problematic since the machines
> > > > will have slight variations, and in some cases significant variations.
>
> > > > I was aware of that, but since absolute precision is not necessary I
> > > > could still use the timestamp as an accumulator if there is some thing
> > > > as an "inverse-timestamp algorithm"!?!?
>
> > > > So...
>
> > > > After spending some more time thinking about this, here is what I plan
> > > > to do:
>
> > > > Create a counter model kind that is created with an IntegerProperty
> > > > starting value of ten billion (I'd like to see somebody reach the
> > > > bottom of that!). Give each user group it's own counter and de-count
> > > > the values in a transaction (or not, it might be simpler to dismiss
> > > > contention and write a handler that ensures uniqueness of the key name
> > > > but maintains approximate lexicographic position).  When the counter
> > > > value is read, convert the value to a padded string and concatenate it
> > > > with the user group name and a leading lowercase letter (k9999999836/
> > > > usergroupname) and use that as the key name for the new asset.
>
> > > > Furthermore, it occurred to me that as a result set is reduced to a
> > > > manageable in-memory size, I could test for the length of results and
> > > > offer the user the ability to custom order their results (asset name
> > > > alphanumeric or asset kind, for example).  Just a thought.
>
> > > > Thanks again for the replies, If anyone thinks there is a better
> > > > approach please let me know, I kind of make this stuff up as I go
> > > > along..
>
> > > > --Joe
>
> > > > On Feb 12, 10:52 pm, Robert Kluin <[email protected]> wrote:
>
> > > > > Hi Joe,
> > > > >   What are you actually trying to do?  Is there a single global LIFO
> > > > > stack, or are there multiple stacks?  How are new items added to the
> > > > > stack(s)?  In batches to one stack at a time, batches across stacks?
> > > > > What is the addition rate?  How are items removed / processed from
> > the
> > > > > stack(s)?  Is there a requirement that the items are precisely
> > ordered
> > > > > or are some (or small) mis-orderings acceptable?
>
> > > > >   Why do you say: " I can't use a composite index since it would
> > > > > explode with my use case"?
>
> > > > >   The auto-generated ids are not strictly increasing.  What would be
> > > > > fine if you could handle your entire result set in one request?
>
> > > > >   Using the current time may also be problematic since the machines
> > > > > will have slight variations, and in some cases significant
> > variations.
>
> > > > > Robert
>
> > > > > On Sat, Feb 12, 2011 at 14:38, Joseph Letness <[email protected]>
> > wrote:
> > > > > > Hi everybody, I was wondering if anybody has any good ideas for
> > > > > > generating LIFO (Last In FIrst Out) key names.  I can't use a
> > > > > > composite index since it would explode with my use case.
>
> > > > > > Currently, I can think of two methods:
>
> > > > > > Use the auto generated id (which, I believe is accumulative), query
> > > > > > for keys only and reverse the list in memory.  This would be fine
> > if I
> > > > > > can guarantee that my entire result set can be handled within a
> > single
> > > > > > request.
>
> > > > > > OR
>
> > > > > > Create a de-accumulator Entity in the datastore and have it count
> > down
> > > > > > from some reasonably high integer and create my key name with that
> > (a
> > > > > > composite of the de-accumulation and the entity nam).  The draw
> > back
> > > > > > for this method is that I'm incurring an additional read-write
> > every
> > > > > > time a new LIFO entity is created and possible contention on the
> > de-
> > > > > > accumulator if I run it in a transaction (I haven't decided if
> > > > > > consistency of the de-accumulation is imperative for my use case
> > yet).
>
> > > > > > I'm using Python.  If anybody has any better ideas it would be much
> > > > > > appreciated!
>
> > > > > > Thanks,
>
> > > > > > --Joe
>
> > > > > > --
> > > > > > You received this message because you are subscribed to the Google
> > Groups "Google App Engine" group.
> > > > > > To post to this group, send email to
> > [email protected].
> > > > > > To unsubscribe from this group, send email to
> > [email protected].
> > > > > > For more options, visit this group athttp://
> > groups.google.com/group/google-appengine?hl=en.
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at
> >http://groups.google.com/group/google-appengine?hl=en.
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine
> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> 368047

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

[google-appengine] Re: LIFO key name strategy?

Reply via email to