[google-appengine] Re: Ideas for a bit more advanced search

Hannes Thu, 26 Mar 2009 02:37:09 -0700

Hi,

"I think that you have problems scaling this as the inequality filter
on query for Keywords will actually result in a set of queries." -
Actually, after a bit of testing, I don't think this is much of a
problem. In Steffen's version, the inequality filter is only used for
searches containing "-" (e.g. "linux -debian"), and even then it would
allow plenty of excluded words before the datastore's maximum number
of sub-queries is reached. If you're worried about hitting that limit,
you could simply drop the "-" functionality; I don't think it's
generally that important. Also, he uses = on the list rather than IN,
which, as far as I know, doesn't create sub-queries at all. Using =
rather than IN makes every search word required for an entry to
match, but this is usually what you want.
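To make the = versus IN point concrete, here is a minimal in-memory model of how filters on a list property decide whether an entity matches. It makes no datastore calls; the `docs` dict and the helper names are made up for illustration. Each chained = filter must be satisfied by some value in the list, so together they require every word, while IN accepts an entity that has any one of them. (In the real datastore IN also runs one sub-query per value, which this model does not show.)

```python
# In-memory model of list-property filter semantics (no datastore involved).
# Each "entity" is just a set of keywords; names here are hypothetical.

def matches_all_eq(keywords, terms):
    """Chained 'keywords =' filters: every term must appear in the list."""
    return all(term in keywords for term in terms)

def matches_in(keywords, terms):
    """'keywords IN [...]': the entity matches if any term appears."""
    return any(term in keywords for term in terms)

docs = {
    "a": {"linux", "debian", "lenny"},
    "b": {"linux", "ubuntu"},
    "c": {"debian", "etch"},
}

query = ["linux", "debian"]
eq_hits = sorted(k for k, kw in docs.items() if matches_all_eq(kw, query))
in_hits = sorted(k for k, kw in docs.items() if matches_in(kw, query))
print(eq_hits)  # ['a']
print(in_hits)  # ['a', 'b', 'c']
```

With =, only the entry containing both words comes back, which is the "all search words required" behaviour described above.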


I'll post my version of this shortly.

On 25 Mar, 08:34, Robert <[email protected]> wrote:
> Hi,
>
> Interesting issue. I think that you have problems scaling this as the
> inequality filter on query for Keywords will actually result in a set
> of queries. I'm trying an alternative approach, though it is meant to
> meet what may be slightly different requirements.
> I create a db record "IndexTerm" for each keyword and use that keyword
> as the key_name for that record. Each IndexTerm has a field
> "referents"; a db.Text field that contains a space separated list of
> key names to the objects (in your case documents) that have this
> keyword.
> Indexing a document is a bit of a job, as for each term we need to walk
> the list_as_string to check whether the doc's key name is already there.
> But we can write all relevant IndexTerms in one db.put() operation.
> Searching is fast, though, as we simply fetch all IndexTerms with the
> requested keywords; this is one db.get() operation with db.Keys
> constructed from the keywords. Then we need to walk the lists of
> referents and do some ranking. I used db.Text to avoid additional
> indexing by the GAE database.
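Robert's IndexTerm scheme can be sketched in plain Python roughly like this. It is a minimal sketch, not Robert's code: a dict stands in for the datastore (key_name → space-separated referents string, playing the role of the db.Text field), and ranking by the number of matched terms is an assumption, since he only says "do some ranking". In the real app the batch write and fetch would be a single db.put() and db.get().

```python
# Dict standing in for the datastore: IndexTerm key_name -> referents string.
index_terms = {}

def index_document(doc_key, keywords):
    """Add doc_key to each keyword's referents; return the terms that changed."""
    changed = {}
    for term in set(keywords):
        referents = index_terms.get(term, "")
        # Walk the space-separated list so a doc is never listed twice.
        if doc_key not in referents.split():
            changed[term] = (referents + " " + doc_key).strip()
    # All changed IndexTerms are written together (one db.put() in the real app).
    index_terms.update(changed)
    return sorted(changed)

def search(terms):
    """Fetch the IndexTerms by key name and rank docs by matched-term count."""
    counts = {}
    for term in terms:
        for doc_key in index_terms.get(term, "").split():
            counts[doc_key] = counts.get(doc_key, 0) + 1
    # Hypothetical ranking: most matched terms first, then key name.
    return sorted(counts, key=lambda d: (-counts[d], d))

index_document("doc1", ["debian", "lenny"])
index_document("doc2", ["debian", "etch"])
print(search(["debian", "lenny"]))  # ['doc1', 'doc2']
```

The linear `split()` membership check is the "bit of a job" at indexing time; it grows with the length of each referents list, which is exactly the scaling concern raised next.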
> This approach works well for my purposes, but it also has some issues.
> The main problem is that the list of referents will get long,
> especially for "popular" terms. I'm trying to partition that list,
> e.g. by type of object (which works in my domain, but probably not in
> others), so that "email docs" and "blog posts" could have their own
> lists or their own IndexTerms class.
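One way to realise the per-type partitioning Robert mentions is to fold the object type into the key_name, so each (type, keyword) pair gets its own, shorter referents list. The "type:term" naming and the helper names below are hypothetical, and the scheme assumes type names never contain a colon.

```python
def partitioned_key_name(doc_type, term):
    # Hypothetical naming scheme: one IndexTerm per (object type, keyword).
    return "%s:%s" % (doc_type, term)

def keys_for_search(terms, doc_types):
    """All key names to batch-fetch when a search spans several object types."""
    return [partitioned_key_name(doc_type, term)
            for doc_type in doc_types
            for term in terms]

print(keys_for_search(["debian"], ["email", "blog"]))
# ['email:debian', 'blog:debian']
```

A search restricted to one type then only touches that type's lists; a search across all types is still a single batch fetch, just over types × terms keys.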
>
> On Feb 23, 6:41 pm, Steffen 'stefreak' Neubauer <[email protected]>
> wrote:
>
> > I hacked something together, with Google-like syntax, but it's not
> > really satisfying my needs because of index.yaml problems. But here
> > it is; maybe it's helpful to someone else.
> > Possible search terms are:
> >  debian lenny
> >  "debian lenny"
> >  debian -lenny (this is not really working as expected at the moment)
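Steffen's actual code is only available at the paste links, so the following is a hypothetical parser for the three query forms listed above, not his implementation. It splits a query into required words, quoted phrases and "-" excluded words; lowercasing the input is an assumption, and how phrases and exclusions are then matched against stored entries is left open.

```python
import re

# A quoted phrase, or a bare word with an optional leading '-'.
TOKEN = re.compile(r'"([^"]*)"|(-?)(\S+)')

def parse_query(text):
    """Split a query into (required words, quoted phrases, excluded words)."""
    required, phrases, excluded = [], [], []
    for m in TOKEN.finditer(text.lower()):
        phrase, minus, word = m.groups()
        if phrase is not None:
            # Quoted alternative matched; skip empty quotes like "".
            if phrase.strip():
                phrases.append(phrase.strip())
        elif minus:
            excluded.append(word)
        else:
            required.append(word)
    return required, phrases, excluded

for q in ['debian lenny', '"debian lenny"', 'debian -lenny']:
    print(q, parse_query(q))
```

The required words map directly onto chained = filters on the keyword list.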
>
> > models.py: http://nopaste.biz/67482
> > views.py: http://nopaste.biz/67481
>
> > Example: http://stefreakstest.appspot.com/
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---
