[google-appengine] Re: Like Search

Nick Johnson (Google) Mon, 15 Jun 2009 05:50:36 -0700

Hi Neves,

Yes, that form of query should work fine - and it eliminates the requirement
for you to include all substrings of a word in your list - you only need to
include all suffixes.
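
A rough sketch of why suffixes are enough, in plain Python rather than
actual datastore calls. The helper names (suffixes, prefix_range, matches)
are made up for illustration, and matches() is only an in-memory stand-in
for the "prop >= :1 AND prop < :2" filter quoted below; it assumes an
entity matches when at least one list value falls inside the range.

```python
def suffixes(word):
    """Return every suffix of word, longest first."""
    return [word[i:] for i in range(len(word))]


def prefix_range(part):
    """Bounds for the 'prop >= lo AND prop < hi' prefix query."""
    return part, part + u"\ufffd"


def matches(like_list, part):
    """In-memory stand-in: True if any stored value starts within the range."""
    lo, hi = prefix_range(part)
    return any(lo <= value < hi for value in like_list)


like = suffixes("open")        # ["open", "pen", "en", "n"]
found_pe = matches(like, "pe")  # "pe" is a prefix of the suffix "pen"
found_po = matches(like, "po")  # "po" does not occur in "open"
```

Any substring of a word is a prefix of one of its suffixes, so a word of
length n needs n list entries instead of up to n * (n + 1) / 2.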


-Nick Johnson

On Sat, Jun 13, 2009 at 3:02 PM, Neves <[email protected]> wrote:

>
> Sad to hear that. But I think this kind of query is not possible
> with a list property:
> "prop >= :1 AND prop < :2", "abc", u"abc" + u"\ufffd"
>
> Is it?
>
> On 13 jun, 00:20, Barry Hunter <[email protected]> wrote:
> > Yeah, that should work, and reasonably well. It's very similar to how the
> > SearchableModel demo works for whole-word full-text search: split the
> > text into words and put them in a string list.
> >
> > The main issue arises when you want to be able to search on multiple
> > words: an index will be created that has the 'like' property twice. This
> > quickly leads to exploding indexes, as an entry in the index needs to
> > be created for every word, with every other word, so your example
> > requiring 210 words would result in 44,100 (210 x 210) index entries!
> > That's clearly not sustainable - especially if the result set is big
> > and/or you need to search three words!
> >
> > Not saying don't do it, just be wary that it will still quickly break
> > down...
> >
> > But there is a further way to optimise. See [1] for a tip on prefix
> > matching, which I think should still work on string lists, so you can
> > actually just store
> >
> > word.like = ["open", "pen", "en", "n"]   #(for 'open')
> >
> > as 'op', for example, would prefix match on "open"
> >
> > (but this only works if you don't need an inequality filter on a
> > different property... )
> >
> > [1] http://code.google.com/appengine/docs/python/datastore/queriesandinde...
> >
> > On 12/06/2009, Neves <[email protected]> wrote:
> >
> > >  I have an idea for doing LIKE searches on short words in GAE.
> >
> > >  The solution is to create a Word model with the following fields:
> >
> > >  from google.appengine.ext import db
> >
> > >  class Word(db.Model):
> > >      word = db.StringProperty()
> > >      like = db.StringListProperty()
> >
> > >  # usage
> > >  word = Word()
> >
> > >  word.word = "open"
> > >  # with the assignment above, the like property would automatically be
> > >  # filled with the following string list:
> > >  word.like = ["o", "op", "ope", "open", "p", "pe", "pen", "e", "en",
> > >               "n"]
> > >  word.put()
> >
> > >  # a search would look like this:
> > >  part = "pen"
> > >  results = db.GqlQuery("SELECT * FROM Word WHERE like = :1", part)
> > >  # results would contain the word "open" because it has the
> > >  # substring "pen" in its like list.
> >
> > >  As an optimization, repeated parts would not be saved in the list; for
> > >  example, the word "popo" would become:
> > >  word.like = ["p", "po", "pop", "popo", "o", "op", "opo"]
> >
> > >  The maximum number of entries in the list is given by:
> > >  length * (1 + length) / 2
> > >  So a word with 20 letters would have a maximum of 210 subwords, minus
> > >  the repeated ones.
> > >  That's why it only works for short words; in my case, for domain names
> > >  or email addresses.
> >
> > >  This idea came from http://code.google.com/events/io/sessions/BuildingScalableComplexApps...
> >
> > --
> > Barry
> >
> > www.nearby.org.uk - www.geograph.org.uk
> >
>
