Hi Jai, Your only options for satisfying that query are: - Tolerate an exploding index - which is not really a problem if you have fewer than about 100 tags. - Write your own query planner that performs a scan for each tag and takes the intersection of the results. - Look into using Brett's index relations to facilitate larger indexes.
-Nick Johnson On Tue, Jul 28, 2009 at 1:52 PM, Jai <[email protected]> wrote: > > Thanks nick that made it crystal clear. > > Now I get it, it was the 'order by' clause that was preventing it from > being a merge-join and the zig-zagging to be in effect, as you stated > in your reply to Kyle. I also understand why is that so - because that > would require an additional sort over the query result on gae's part > hence it maintains a separate index(exploding one though) to avoid > sorting. > > So my next question is how to implement this query after all? > > Foo.all().filter('tags =', foo).filter('tags =', bar).order('bar') > > Isn't there an efficient way of doing this in gae using relation index > (I don't think so as it just helps the fanout) or many to many > relations using referenceList property? > > Obviously, this has got implications for full text search as well, > where tags are keywords and ordering is used for ranking purposes. > > Is the reason that, there is indeed no efficient way of doing it > without pre-computing views for different ranking levels, the > google.appengine.ext.fulltext search does not support ordering or > ranking? > > Thanks for your time. > > On Jul 27, 3:42 pm, "Nick Johnson (Google)" <[email protected]> > wrote: > > On Mon, Jul 27, 2009 at 7:39 PM, Jai <[email protected]> wrote: > > > > > Hi Nick, > > > > > Thanks for replying, that made the things a bit more clear. But I > > > still don't understand the reasoning behind it. > > > > > You said the following example > > > > > Foo.all().filter('tags =', foo).filter('tags =', bar).order('bar') > > > > > will result in an exploding index. > > > > > But why so? > > > > Because as soon as you apply an inequality filter or sort order to your > > query, it requires a composite index. In this case, the composite index > has > > to include two list properties (actually the same list property twice), > > which leads to an exploding index. > > > > Naturally, if your number of tags is limited, this may not be a big > problem. > > > > > > > > > I believe above statement is equivalent to: > > > > > select * from Foo where tags='foo' AND tags = 'bar' order by 'bar1' > > > > > I changed 'bar' to 'bar1' just to avoid confusion. > > > > > Now since the order by is applied to 'bar1' and not to 'tags'(can that > > > be even done??) why will two separate indexes be required. > > > > Not two separate indexes - a single composite index that includes all the > > fields being queried on (the number of times they're used). > > > > > I > > > understand they can be required if order by is applied on tags > > > somehow, but for this case an index with default ordering for > > > listproperty class could have been used. > > > > With the exception of merge join queries (those with only equality > filters > > and no sort orders), queries only ever use a single index. > > > > > > > > > The non exploding index that can be used here is: > > > > > - kind: Foo > > > property: > > > - tags > > > - bar1 > > > direction:desc > > > > > Or will this not work for the query in question? > > > > It won't, because the tags property has to appear as many times as it's > used > > in a query. > > > > -Nick Johnson > > > > > > > > > Thanks for bearing with me. > > > > > On Jul 27, 12:56 pm, "Nick Johnson (Google)" <[email protected]> > > > wrote: > > > > Hi Jai, > > > > > > Both your example and my original example are cases of exploding > indexes. > > > In > > > > my example, indexing on the same listproperty twice is required for > > > certain > > > > queries (for example, the Foo.all().filter('tags =', > foo).filter('tags > > > =', > > > > bar).order('bar') example), but likewise results in an exploding > index. > > > > > > -Nick Johnson > > > > > > On Mon, Jul 27, 2009 at 4:30 PM, Jai <[email protected]> wrote: > > > > > > > Hi Kyle, > > > > > > > I think what Nick meant was this: > > > > > > > - kind: Foo > > > > > property: tags1 > > > > > property: tags2 > > > > > > > that is having two different properties of type ListProperty > because > > > > > then the index will have rows for each combination of values for > each > > > > > of the property and number of rows will be multiplication of number > of > > > > > distinct values in the indexes. > > > > > > > For your case it is ok because, there will be a single index 'tags' > > > > > with an entity being referenced by the DIFFERENT tag values in the > > > > > SAME index. So the references might be multiple, number of rows > will > > > > > still be linear. > > > > > > > Hope it helps. > > > > > > > Regards, > > > > > Jai > > > > > On Jul 27, 11:16 am, Kyle Jensen <[email protected]> wrote: > > > > > > Hi Nick, > > > > > > > > I noticed that the dev server will automatically add indexes that > > > > > > appear to be exploding indexes when I run queries like the > example > > > > > > above. Ie, if I do Foo.get_by_tags(['tag1', 'tag2']) the dev > server > > > > > > will create an index in index.yaml that is identical to the one > you > > > > > > posted above: > > > > > > > > - kind: Foo > > > > > > property: tags > > > > > > property: tags > > > > > > > > ** Is that index not required for the snippet > > > Foo.get_by_tags(['tag1', > > > > > > 'tag2'])? > > > > > > ** What about Foo.get_by_tags(['tag1', 'tag2']).order('bar') > where > > > bar > > > > > > is an IntegerProperty > > > > > > > > Thanks for your help and sorry for my ignorance!! > > > > > > Kyle > > > > > > > > On Jul 27, 8:07 am, "Nick Johnson (Google)" < > [email protected] > > > > > > > > wrote: > > > > > > > > > Hi Kyle, > > > > > > > > > Exploding indexes only come into play when you're _defining_ > > > indexes - > > > > > for > > > > > > > example, an index like this would be an exploding one: > > > > > > > > > - kind: Foo > > > > > > > property: tags > > > > > > > property: tags > > > > > > > > > As long as you can satisfy your queries using the built in > merge > > > join > > > > > > > support (which is the case in your example, where you don't > specify > > > > > > > inequality filters or sort orders), you'll be fine. > > > > > > > > > -Nick Johnson > > > > > > > > > On Mon, Jul 27, 2009 at 3:50 PM, Kyle Jensen < > [email protected]> > > > > > wrote: > > > > > > > > > > Hi, I'd like to know if multiple equality filters on a > > > ListProperty > > > > > > > > leads to exploding indexes. > > > > > > > > > > I have a model something like the following (python): > > > > > > > > > > class Foo(db.model): > > > > > > > > tags = db.ListProperty(db.Category) > > > > > > > > > > @classmethod > > > > > > > > def get_by_tags(cls, tags): > > > > > > > > query = cls.all() > > > > > > > > for tag in tags: > > > > > > > > query.filter('tags =', tag) > > > > > > > > return query > > > > > > > > > > I'd like to know if I can filter by an arbitrary number of > tags > > > > > > > > without encountering the exploding index problem (its not > clear > > > to me > > > > > > > > from the docs). > > > > > > > > > > E.g., can I do > > > > > > > > query = Foo.get_by_tags(['tag0', 'tag1', 'tag2', 'tag3' .... > > > 'tagN']) > > > > > > > > > > etc etc. > > > > > > > > > > Thanks! Kyle > > > > > > > > > -- > > > > > > > Nick Johnson, Developer Programs Engineer, App Engine > > > > > > -- > > > > Nick Johnson, Developer Programs Engineer, App Engine > > > > > > > -- Nick Johnson, Developer Programs Engineer, App Engine --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~----------~----~----~----~------~----~------~--~---
