Yeh that should work, and reasonably well. Its very similar to how the SearchableModel demo works for whole word full text search, split the text into words and put in a string list.
The main issue becomes when you want to be able to search on mulitple words, an index will be created that has 'like' property twice. This quickly leads to exploding indexes, as an entry in the index needs to be created for every word, with every other word, so for your example requiring a 210 words, would result in 44,100 index entries! Thats clearly not sustainable - espically if the result set is big and/or need to search three words! Not saying dont do it, just be weary that it will still quickly break down... But there is a further way to optimise. See [1] for a tip on prefix matching. Which I think should still work on stringlists, so can actully just store word.like = ["open", "pen", "en", "n"] #(for 'open') as 'op' for example would prefix match on "open" (but only works if you need your inequality filter on a different property... ) [1] http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html On 12/06/2009, Neves <[email protected]> wrote: > > I have an idea to do LIKE search with small words in GAE. > > The solution is create a Word model with the follow fields: > > class Word(db.Model): > word = db.StringProperty() > like = db.StringListProperty() > > # usage > word = Word() > > word.word = "open" > # with the assignment above, the like property would automatically be > filled with the follow string list: > word.like = ["o", "op", "ope", "open", "p", "pe", "pen", "e", "en", > "n"] > word.put() > > # a search would be like this: > part = "pen" > results = db.GqlQuery("SELECT * FROM Word WHERE like = :1", part) > # results would contain the word "open" cause it contains the > substring "pen" on the like list. > > For optimization, repeated parts would not be saved on the list, for > example, the word "popo" would became: > word.like = ["p", "po", "pop", popo", "o", "op", "opo"] > > The math to know how much words would exist on the list, is just do: > length * (1 + length) / 2 > So a word with 20 letters would have a maximum of 210 subwords, minus > the repeated ones. > Thats why it just works for small words, in my case, for domain names, > or email address. > > This idea came from > http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html > > > -- Barry - www.nearby.org.uk - www.geograph.org.uk - --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~----------~----~----~----~------~----~------~--~---
