Hi Jai,

Your only options for satisfying that query are:
- Tolerate an exploding index - which is not really a problem if you have
fewer than about 100 tags.
- Write your own query planner that performs a scan for each tag and takes
the intersection of the results.
- Look into using Brett's index relations to facilitate larger indexes.

-Nick Johnson

On Tue, Jul 28, 2009 at 1:52 PM, Jai <[email protected]> wrote:

>
> Thanks nick that made it crystal clear.
>
> Now I get it, it was the 'order by' clause that was preventing it from
> being a merge-join and the zig-zagging to be in effect, as you stated
> in your reply to Kyle. I also understand why is that so - because that
> would require an additional sort over the query result on gae's part
> hence it maintains a separate index(exploding one though) to avoid
> sorting.
>
> So my next question is how to implement this query after all?
>
> Foo.all().filter('tags =', foo).filter('tags =', bar).order('bar')
>
> Isn't there an efficient way of doing this in gae using relation index
> (I don't think so as it just helps the fanout) or many to many
> relations using referenceList property?
>
> Obviously, this has got implications for full text search as well,
> where tags are keywords and ordering is used for ranking purposes.
>
> Is the reason that, there is indeed no efficient way of doing it
> without pre-computing views for different ranking levels, the
> google.appengine.ext.fulltext search does not support ordering or
> ranking?
>
> Thanks for your time.
>
> On Jul 27, 3:42 pm, "Nick Johnson (Google)" <[email protected]>
> wrote:
> > On Mon, Jul 27, 2009 at 7:39 PM, Jai <[email protected]> wrote:
> >
> > > Hi Nick,
> >
> > > Thanks for replying, that made the things a bit more clear. But I
> > > still don't understand the reasoning behind it.
> >
> > > You said the following example
> >
> > > Foo.all().filter('tags =', foo).filter('tags =', bar).order('bar')
> >
> > > will result in an exploding index.
> >
> > > But why so?
> >
> > Because as soon as you apply an inequality filter or sort order to your
> > query, it requires a composite index. In this case, the composite index
> has
> > to include two list properties (actually the same list property twice),
> > which leads to an exploding index.
> >
> > Naturally, if your number of tags is limited, this may not be a big
> problem.
> >
> >
> >
> > > I believe above statement is equivalent to:
> >
> > > select * from Foo where tags='foo' AND tags = 'bar' order by 'bar1'
> >
> > > I changed 'bar' to 'bar1' just to avoid confusion.
> >
> > > Now since the order by is applied to 'bar1' and not to 'tags'(can that
> > > be even done??) why will two separate indexes be required.
> >
> > Not two separate indexes - a single composite index that includes all the
> > fields being queried on (the number of times they're used).
> >
> > > I
> > > understand they can be required if order by is applied on tags
> > > somehow, but for this case an index with default ordering for
> > > listproperty class could have been used.
> >
> > With the exception of merge join queries (those with only equality
> filters
> > and no sort orders), queries only ever use a single index.
> >
> >
> >
> > > The non exploding index that can be used here is:
> >
> > > - kind: Foo
> > >      property:
> > >        - tags
> > >        - bar1
> > >            direction:desc
> >
> > > Or will this not work for the query in question?
> >
> > It won't, because the tags property has to appear as many times as it's
> used
> > in a query.
> >
> > -Nick Johnson
> >
> >
> >
> > > Thanks for bearing with me.
> >
> > > On Jul 27, 12:56 pm, "Nick Johnson (Google)" <[email protected]>
> > > wrote:
> > > > Hi Jai,
> >
> > > > Both your example and my original example are cases of exploding
> indexes.
> > > In
> > > > my example, indexing on the same listproperty twice is required for
> > > certain
> > > > queries (for example, the Foo.all().filter('tags =',
> foo).filter('tags
> > > =',
> > > > bar).order('bar') example), but likewise results in an exploding
> index.
> >
> > > > -Nick Johnson
> >
> > > > On Mon, Jul 27, 2009 at 4:30 PM, Jai <[email protected]> wrote:
> >
> > > > > Hi Kyle,
> >
> > > > > I think what Nick meant was this:
> >
> > > > > - kind: Foo
> > > > >  property: tags1
> > > > >  property: tags2
> >
> > > > > that is having two different properties of type ListProperty
> because
> > > > > then the index will have rows for each combination of values for
> each
> > > > > of the property and number of rows will be multiplication of number
> of
> > > > > distinct values in the indexes.
> >
> > > > > For your case it is ok because, there will be a single index 'tags'
> > > > > with an entity being referenced by the DIFFERENT tag values in the
> > > > > SAME index. So the references might be multiple, number of rows
> will
> > > > > still be linear.
> >
> > > > > Hope it helps.
> >
> > > > > Regards,
> > > > > Jai
> > > > > On Jul 27, 11:16 am, Kyle Jensen <[email protected]> wrote:
> > > > > > Hi Nick,
> >
> > > > > > I noticed that the dev server will automatically add indexes that
> > > > > > appear to be exploding indexes when I run queries like the
> example
> > > > > > above.  Ie, if I do Foo.get_by_tags(['tag1', 'tag2']) the dev
> server
> > > > > > will create an index in index.yaml that is identical to the one
> you
> > > > > > posted above:
> >
> > > > > > - kind: Foo
> > > > > >   property: tags
> > > > > >   property: tags
> >
> > > > > > ** Is that index not required for the snippet
> > > Foo.get_by_tags(['tag1',
> > > > > > 'tag2'])?
> > > > > > ** What about Foo.get_by_tags(['tag1', 'tag2']).order('bar')
> where
> > > bar
> > > > > > is an IntegerProperty
> >
> > > > > > Thanks for your help and sorry for my ignorance!!
> > > > > > Kyle
> >
> > > > > > On Jul 27, 8:07 am, "Nick Johnson (Google)" <
> [email protected]
> >
> > > > > > wrote:
> >
> > > > > > > Hi Kyle,
> >
> > > > > > > Exploding indexes only come into play when you're _defining_
> > > indexes -
> > > > > for
> > > > > > > example, an index like this would be an exploding one:
> >
> > > > > > > - kind: Foo
> > > > > > >   property: tags
> > > > > > >   property: tags
> >
> > > > > > > As long as you can satisfy your queries using the built in
> merge
> > > join
> > > > > > > support (which is the case in your example, where you don't
> specify
> > > > > > > inequality filters or sort orders), you'll be fine.
> >
> > > > > > > -Nick Johnson
> >
> > > > > > > On Mon, Jul 27, 2009 at 3:50 PM, Kyle Jensen <
> [email protected]>
> > > > > wrote:
> >
> > > > > > > > Hi, I'd like to know if multiple equality filters on a
> > > ListProperty
> > > > > > > > leads to exploding indexes.
> >
> > > > > > > > I have a model something like the following (python):
> >
> > > > > > > > class Foo(db.model):
> > > > > > > >    tags = db.ListProperty(db.Category)
> >
> > > > > > > >    @classmethod
> > > > > > > >    def get_by_tags(cls, tags):
> > > > > > > >        query = cls.all()
> > > > > > > >        for tag in tags:
> > > > > > > >            query.filter('tags =', tag)
> > > > > > > >        return query
> >
> > > > > > > > I'd like to know if I can filter by an arbitrary number of
> tags
> > > > > > > > without encountering the exploding index problem (its not
> clear
> > > to me
> > > > > > > > from the docs).
> >
> > > > > > > > E.g., can I do
> > > > > > > > query = Foo.get_by_tags(['tag0', 'tag1', 'tag2', 'tag3' ....
> > > 'tagN'])
> >
> > > > > > > > etc etc.
> >
> > > > > > > > Thanks! Kyle
> >
> > > > > > > --
> > > > > > > Nick Johnson, Developer Programs Engineer, App Engine
> >
> > > > --
> > > > Nick Johnson, Developer Programs Engineer, App Engine
> >
> >
> >
>


-- 
Nick Johnson, Developer Programs Engineer, App Engine

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to