Hi Neves,

Yes, that form of query should work fine, and it removes the need to include every substring of a word in your list: you only need to include all of its suffixes.
-Nick Johnson

On Sat, Jun 13, 2009 at 3:02 PM, Neves <[email protected]> wrote:
>
> Sad to hear that. But I think this kind of query is not possible
> with a list property:
> prop >= :1 AND prop < :2", "abc", u"abc" + u"\ufffd"
>
> is it?
>
> On 13 jun, 00:20, Barry Hunter <[email protected]> wrote:
> > Yeah, that should work, and reasonably well. It's very similar to how the
> > SearchableModel demo works for whole-word full-text search: split the
> > text into words and put them in a string list.
> >
> > The main issue arises when you want to be able to search on multiple
> > words: an index will be created that has the 'like' property twice. This
> > quickly leads to exploding indexes, as an entry in the index needs to
> > be created for every word paired with every other word. So for your
> > example, a list of 210 words would result in 44,100 index entries! That's
> > clearly not sustainable, especially if the result set is big and/or
> > you need to search on three words!
> >
> > Not saying don't do it, just be wary that it will still quickly break
> > down...
> >
> > But there is a further way to optimise. See [1] for a tip on prefix
> > matching, which I think should still work on string lists, so you can
> > actually just store
> >
> > word.like = ["open", "pen", "en", "n"] #(for 'open')
> >
> > as 'op', for example, would prefix match on "open"
> >
> > (but only works if you need your inequality filter on a different
> > property... )
> >
> > [1] http://code.google.com/appengine/docs/python/datastore/queriesandinde...
> >
> > On 12/06/2009, Neves <[email protected]> wrote:
> > >
> > > I have an idea to do LIKE search with small words in GAE.
> > >
> > > The solution is to create a Word model with the following fields:
> > >
> > > class Word(db.Model):
> > >     word = db.StringProperty()
> > >     like = db.StringListProperty()
> > >
> > > # usage
> > > word = Word()
> > >
> > > word.word = "open"
> > > # with the assignment above, the like property would automatically be
> > > # filled with the following string list:
> > > word.like = ["o", "op", "ope", "open", "p", "pe", "pen", "e", "en",
> > >              "n"]
> > > word.put()
> > >
> > > # a search would be like this:
> > > part = "pen"
> > > results = db.GqlQuery("SELECT * FROM Word WHERE like = :1", part)
> > > # results would contain the word "open" because it has the
> > > # substring "pen" in the like list.
> > >
> > > As an optimization, repeated parts would not be saved in the list. For
> > > example, the word "popo" would become:
> > > word.like = ["p", "po", "pop", "popo", "o", "op", "opo"]
> > >
> > > To work out how many entries the list can hold, just compute:
> > > length * (1 + length) / 2
> > > So a word with 20 letters would have a maximum of 210 subwords, minus
> > > the repeated ones.
> > > That's why it only works for small words; in my case, domain names
> > > or email addresses.
> > >
> > > This idea came from
> > > http://code.google.com/events/io/sessions/BuildingScalableComplexApps...
> >
> > --
> > Barry
> >
> > - www.nearby.org.uk - www.geograph.org.uk -
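
For anyone following the thread, here is a rough sketch in plain Python (no datastore involved) of the two list-building schemes discussed above: the full substring list from the original post and the suffix-only list Barry suggests. The helper names (substrings, suffixes, max_substrings, prefix_match) are made up for illustration and are not part of the App Engine API. prefix_match only imitates the range filter "prop >= :1 AND prop < :2" on a list property, on the assumption that an entity matches when a single list value falls inside the range, as Nick's reply indicates.

```python
def substrings(word):
    """All distinct substrings of word, in first-seen order."""
    seen = []
    for start in range(len(word)):
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in seen:  # skip repeated parts, e.g. in "popo"
                seen.append(part)
    return seen


def suffixes(word):
    """All suffixes of word, longest first."""
    return [word[i:] for i in range(len(word))]


def max_substrings(length):
    """Upper bound on the substring count: length * (1 + length) / 2."""
    return length * (1 + length) // 2


def prefix_match(values, prefix):
    """Imitates: prop >= prefix AND prop < prefix + u"\\ufffd".

    Assumption: one single value in the list must satisfy both bounds.
    """
    upper = prefix + u"\ufffd"
    return any(prefix <= value < upper for value in values)


if __name__ == "__main__":
    print(substrings("open"))
    print(suffixes("open"))
    print(max_substrings(20))
    # "pe" is a prefix of the suffix "pen", so "open" is found.
    print(prefix_match(suffixes("open"), "pe"))
```

With the suffix list, a 20-letter word needs at most 20 entries instead of up to 210, and a substring search becomes a prefix search over those suffixes.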
