[google-appengine] Re: Like Search

Barry Hunter Fri, 12 Jun 2009 20:20:30 -0700

Yes, that should work, and reasonably well. It's very similar to how the
SearchableModel demo works for whole-word full-text search: split the
text into words and put them in a string list.


The main issue arises when you want to be able to search on multiple
words: an index will be created that has the 'like' property twice. This
quickly leads to exploding indexes, as an entry in the index needs to
be created for every word paired with every other word. So your example,
requiring 210 words, would result in 44,100 index entries (210 x 210)!
That's clearly not sustainable, especially if the result set is big and/or
you need to search on three words!
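
To make the arithmetic concrete, here is a rough sketch in plain Python (no
datastore involved). The function names are made up for illustration, and
the two-word figure simply assumes one index entry per pair of list values,
as described above:

```python
def all_substrings(word):
    """Return the distinct substrings of word, in first-seen order."""
    seen = []
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            part = word[i:j]
            if part not in seen:
                seen.append(part)
    return seen


def max_substrings(length):
    """Upper bound on list size: length * (1 + length) / 2."""
    return length * (1 + length) // 2


def two_word_index_entries(length):
    """Assumed cost of a two-word search: every value paired with every value."""
    n = max_substrings(length)
    return n * n


print(all_substrings("open"))
print(max_substrings(20), two_word_index_entries(20))
```

A third search word would multiply by the list size once more, which is why
it breaks down so fast.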

Not saying don't do it, just be wary that it will still quickly break down...


But there is a further way to optimise. See [1] for a tip on prefix
matching, which I think should still work on string lists, so you can
actually just store

word.like = ["open", "pen", "en", "n"]   #(for 'open')

as 'op' for example would prefix match on "open"

(but this only works if you don't need an inequality filter on a different
property, since the prefix match itself uses inequality filters... )
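
A rough sketch of the idea in plain Python, not a real datastore query. The
helper names are made up, and it assumes the range filter from [1]
(value >= part AND value < part + u"\ufffd") is tested against each list
value separately, which is the bit I have not verified on string lists:

```python
def suffixes(word):
    """Return every suffix of word, longest first."""
    return [word[i:] for i in range(len(word))]


def prefix_matches(values, part):
    """Emulate the prefix range filter against each value in the list."""
    upper = part + u"\ufffd"  # sorts after any string starting with part
    return any(part <= value < upper for value in values)


like = suffixes("open")
print(like)
print(prefix_matches(like, "op"), prefix_matches(like, "pe"))
print(prefix_matches(like, "on"))
```

Any substring of the word is a prefix of one of its suffixes, so a 20-letter
word needs at most 20 list values instead of 210.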

[1] http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html


On 12/06/2009, Neves <[email protected]> wrote:
>
>  I have an idea for doing LIKE searches on short words in GAE.
>
>  The solution is to create a Word model with the following fields:
>
>  from google.appengine.ext import db
>
>  class Word(db.Model):
>      word = db.StringProperty()
>      like = db.StringListProperty()
>
>  # usage
>  word = Word()
>
>  word.word = "open"
>  # with the assignment above, the like property would automatically be
>  # filled with the following string list:
>  word.like = ["o", "op", "ope", "open", "p", "pe", "pen", "e", "en",
>               "n"]
>  word.put()
>
>  # a search would look like this:
>  part = "pen"
>  results = db.GqlQuery("SELECT * FROM Word WHERE like = :1", part)
>  # results would contain the word "open" because its like list
>  # contains the substring "pen".
>
>  As an optimization, repeated parts would not be saved in the list. For
>  example, the word "popo" would become:
>  word.like = ["p", "po", "pop", "popo", "o", "op", "opo"]
>
>  The maximum number of substrings in the list is given by:
>  length * (1 + length) / 2
>  So a word with 20 letters would have a maximum of 210 substrings, minus
>  the repeated ones.
>  That's why it only works for short words; in my case, domain names
>  or email addresses.
>
>  This idea came from
>  http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html
>


-- 
Barry

- www.nearby.org.uk - www.geograph.org.uk -

