[google-appengine] Re: Indexes on lists of 'tags' & queries

Jai Tue, 28 Jul 2009 05:52:24 -0700

Thanks nick that made it crystal clear.

Now I get it, it was the 'order by' clause that was preventing it from
being a merge-join and the zig-zagging to be in effect, as you stated
in your reply to Kyle. I also understand why is that so - because that
would require an additional sort over the query result on gae's part
hence it maintains a separate index(exploding one though) to avoid
sorting.


So my next question is how to implement this query after all?

Foo.all().filter('tags =', foo).filter('tags =', bar).order('bar')

Isn't there an efficient way of doing this in gae using relation index
(I don't think so as it just helps the fanout) or many to many
relations using referenceList property?

Obviously, this has got implications for full text search as well,
where tags are keywords and ordering is used for ranking purposes.

Is the reason that, there is indeed no efficient way of doing it
without pre-computing views for different ranking levels, the
google.appengine.ext.fulltext search does not support ordering or
ranking?

Thanks for your time.

On Jul 27, 3:42 pm, "Nick Johnson (Google)" <[email protected]>
wrote:
> On Mon, Jul 27, 2009 at 7:39 PM, Jai <[email protected]> wrote:
>
> > Hi Nick,
>
> > Thanks for replying, that made the things a bit more clear. But I
> > still don't understand the reasoning behind it.
>
> > You said the following example
>
> > Foo.all().filter('tags =', foo).filter('tags =', bar).order('bar')
>
> > will result in an exploding index.
>
> > But why so?
>
> Because as soon as you apply an inequality filter or sort order to your
> query, it requires a composite index. In this case, the composite index has
> to include two list properties (actually the same list property twice),
> which leads to an exploding index.
>
> Naturally, if your number of tags is limited, this may not be a big problem.
>
>
>
> > I believe above statement is equivalent to:
>
> > select * from Foo where tags='foo' AND tags = 'bar' order by 'bar1'
>
> > I changed 'bar' to 'bar1' just to avoid confusion.
>
> > Now since the order by is applied to 'bar1' and not to 'tags'(can that
> > be even done??) why will two separate indexes be required.
>
> Not two separate indexes - a single composite index that includes all the
> fields being queried on (the number of times they're used).
>
> > I
> > understand they can be required if order by is applied on tags
> > somehow, but for this case an index with default ordering for
> > listproperty class could have been used.
>
> With the exception of merge join queries (those with only equality filters
> and no sort orders), queries only ever use a single index.
>
>
>
> > The non exploding index that can be used here is:
>
> > - kind: Foo
> >      property:
> >        - tags
> >        - bar1
> >            direction:desc
>
> > Or will this not work for the query in question?
>
> It won't, because the tags property has to appear as many times as it's used
> in a query.
>
> -Nick Johnson
>
>
>
> > Thanks for bearing with me.
>
> > On Jul 27, 12:56 pm, "Nick Johnson (Google)" <[email protected]>
> > wrote:
> > > Hi Jai,
>
> > > Both your example and my original example are cases of exploding indexes.
> > In
> > > my example, indexing on the same listproperty twice is required for
> > certain
> > > queries (for example, the Foo.all().filter('tags =', foo).filter('tags
> > =',
> > > bar).order('bar') example), but likewise results in an exploding index.
>
> > > -Nick Johnson
>
> > > On Mon, Jul 27, 2009 at 4:30 PM, Jai <[email protected]> wrote:
>
> > > > Hi Kyle,
>
> > > > I think what Nick meant was this:
>
> > > > - kind: Foo
> > > >  property: tags1
> > > >  property: tags2
>
> > > > that is having two different properties of type ListProperty because
> > > > then the index will have rows for each combination of values for each
> > > > of the property and number of rows will be multiplication of number of
> > > > distinct values in the indexes.
>
> > > > For your case it is ok because, there will be a single index 'tags'
> > > > with an entity being referenced by the DIFFERENT tag values in the
> > > > SAME index. So the references might be multiple, number of rows will
> > > > still be linear.
>
> > > > Hope it helps.
>
> > > > Regards,
> > > > Jai
> > > > On Jul 27, 11:16 am, Kyle Jensen <[email protected]> wrote:
> > > > > Hi Nick,
>
> > > > > I noticed that the dev server will automatically add indexes that
> > > > > appear to be exploding indexes when I run queries like the example
> > > > > above.  Ie, if I do Foo.get_by_tags(['tag1', 'tag2']) the dev server
> > > > > will create an index in index.yaml that is identical to the one you
> > > > > posted above:
>
> > > > > - kind: Foo
> > > > >   property: tags
> > > > >   property: tags
>
> > > > > ** Is that index not required for the snippet
> > Foo.get_by_tags(['tag1',
> > > > > 'tag2'])?
> > > > > ** What about Foo.get_by_tags(['tag1', 'tag2']).order('bar') where
> > bar
> > > > > is an IntegerProperty
>
> > > > > Thanks for your help and sorry for my ignorance!!
> > > > > Kyle
>
> > > > > On Jul 27, 8:07 am, "Nick Johnson (Google)" <[email protected]
>
> > > > > wrote:
>
> > > > > > Hi Kyle,
>
> > > > > > Exploding indexes only come into play when you're _defining_
> > indexes -
> > > > for
> > > > > > example, an index like this would be an exploding one:
>
> > > > > > - kind: Foo
> > > > > >   property: tags
> > > > > >   property: tags
>
> > > > > > As long as you can satisfy your queries using the built in merge
> > join
> > > > > > support (which is the case in your example, where you don't specify
> > > > > > inequality filters or sort orders), you'll be fine.
>
> > > > > > -Nick Johnson
>
> > > > > > On Mon, Jul 27, 2009 at 3:50 PM, Kyle Jensen <[email protected]>
> > > > wrote:
>
> > > > > > > Hi, I'd like to know if multiple equality filters on a
> > ListProperty
> > > > > > > leads to exploding indexes.
>
> > > > > > > I have a model something like the following (python):
>
> > > > > > > class Foo(db.model):
> > > > > > >    tags = db.ListProperty(db.Category)
>
> > > > > > >   �...@classmethod
> > > > > > >    def get_by_tags(cls, tags):
> > > > > > >        query = cls.all()
> > > > > > >        for tag in tags:
> > > > > > >            query.filter('tags =', tag)
> > > > > > >        return query
>
> > > > > > > I'd like to know if I can filter by an arbitrary number of tags
> > > > > > > without encountering the exploding index problem (its not clear
> > to me
> > > > > > > from the docs).
>
> > > > > > > E.g., can I do
> > > > > > > query = Foo.get_by_tags(['tag0', 'tag1', 'tag2', 'tag3' ....
> > 'tagN'])
>
> > > > > > > etc etc.
>
> > > > > > > Thanks! Kyle
>
> > > > > > --
> > > > > > Nick Johnson, Developer Programs Engineer, App Engine
>
> > > --
> > > Nick Johnson, Developer Programs Engineer, App Engine
>
>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

[google-appengine] Re: Indexes on lists of 'tags' & queries

Reply via email to