Hi guys,
Since there is no full-text search available in GAE/J, and I really
need it for a new app I am writing, I have made a prototype
implementation of an inverted index using the GAE datastore.
Each term is stored as a key, with the term itself as the key name
(only the key is needed).
Below each term I've added document references as child keys, like
this: Term("term")/DocRef("10"), where 10 is the internal document
number.
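A minimal sketch of this key layout, using plain Java strings as a stand-in for datastore key paths (it does not touch the App Engine API; `keyPath`, `padDocId` and `DOC_ID_WIDTH` are hypothetical names). One thing worth checking with this design: if the document number is stored as a key *name* such as "10", the datastore orders key names as strings, so "10" sorts before "2". Zero-padding the name (or using numeric key IDs instead of names) keeps a range filter over document numbers numerically correct.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeyPathSketch {
    // Assumed pad width; it must cover the largest expected document number.
    static final int DOC_ID_WIDTH = 10;

    // Hypothetical helper: zero-pads a document number so that
    // string order matches numeric order.
    static String padDocId(long docId) {
        return String.format("%0" + DOC_ID_WIDTH + "d", docId);
    }

    // Hypothetical helper: renders a parent/child path such as
    // Term("term")/DocRef("0000000010").
    static String keyPath(String term, long docId) {
        return "Term(\"" + term + "\")/DocRef(\"" + padDocId(docId) + "\")";
    }

    public static void main(String[] args) {
        List<String> unpadded = new ArrayList<>(List.of("10", "2", "1"));
        Collections.sort(unpadded);
        System.out.println(unpadded); // string order: "10" lands before "2"

        List<String> padded = new ArrayList<>(
            List.of(padDocId(10), padDocId(2), padDocId(1)));
        Collections.sort(padded);
        System.out.println(padded);   // numeric order is preserved

        System.out.println(keyPath("term", 10));
    }
}
```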
An example:
Term("stuff")
  DocRef("1")
  DocRef("2")
Term("more")
  DocRef("1")
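The example hierarchy above can be reproduced with an in-memory stand-in: a map from each term to the sorted set of document numbers under it. The document texts ("more stuff" for document 1, "stuff" for document 2) and the naive tokenizer are assumptions chosen only to yield the same index; the `TreeMap` prints terms alphabetically rather than in the order shown above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Term -> sorted doc numbers; mirrors Term("x") parents with
    // DocRef("n") children. In-memory only, not the datastore.
    final Map<String, TreeSet<Long>> postings = new TreeMap<>();

    void addDocument(long docId, String text) {
        // Naive tokenizer (an assumption): lowercase, split on
        // anything that is not a letter or digit.
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Number of DocRef children under a term (the count the post caches).
    int docFrequency(String term) {
        TreeSet<Long> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    void print() {
        for (Map.Entry<String, TreeSet<Long>> e : postings.entrySet()) {
            System.out.println("Term(\"" + e.getKey() + "\")");
            for (long id : e.getValue()) {
                System.out.println("  DocRef(\"" + id + "\")");
            }
        }
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "more stuff"); // hypothetical document 1
        index.addDocument(2, "stuff");      // hypothetical document 2
        index.print();
    }
}
```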
When searching for, e.g., "more stuff" (an implicit boolean AND), I do
this:
1. Query the DocRefs under the term with the fewest doc refs (its
   children; this count is cached) and load their keys into a sorted
   set.
2. Query the DocRefs under the second term, filtering on doc IDs
   between the minimum and maximum doc ID in the sorted set (so we only
   get possible matches among the docs we already know contain the
   first term).
3. Merge (intersect) the two sets.
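The query steps above can be sketched in memory as follows. This is an illustration, not datastore code: `subSet(min, true, max, true)` plays the role of the key-range filter, and extending the same step to more than two terms (processed in order of increasing doc count) is an assumption beyond the two-term case described. On the datastore side, the range filter is only correct if doc numbers compare numerically there (numeric key IDs or zero-padded key names).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

public class AndQuerySketch {
    // postings: term -> sorted doc numbers (stand-in for DocRef child keys).
    static TreeSet<Long> andQuery(Map<String, TreeSet<Long>> postings,
                                  List<String> terms) {
        List<TreeSet<Long>> lists = new ArrayList<>();
        for (String term : terms) {
            TreeSet<Long> docs = postings.get(term);
            if (docs == null || docs.isEmpty()) {
                return new TreeSet<>(); // a missing term means no AND matches
            }
            lists.add(docs);
        }
        // Step 1: start from the term with the fewest doc refs.
        lists.sort((a, b) -> Integer.compare(a.size(), b.size()));
        TreeSet<Long> result = new TreeSet<>(lists.get(0));
        for (int i = 1; i < lists.size() && !result.isEmpty(); i++) {
            // Step 2: only look at doc refs between the current min and
            // max doc number (inclusive), like a key-range query.
            NavigableSet<Long> candidates =
                lists.get(i).subSet(result.first(), true, result.last(), true);
            // Step 3: for a boolean AND, merging keeps the common docs.
            result.retainAll(candidates);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Long>> postings = Map.of(
            "stuff", new TreeSet<>(List.of(1L, 2L)),
            "more", new TreeSet<>(List.of(1L)));
        System.out.println(andQuery(postings, List.of("more", "stuff"))); // [1]
    }
}
```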
What do you think? Is this a fair way to implement it (I'm working on
scoring using tf-idf), and do you think it's possible to get it to
perform well?
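For the tf-idf part, a rough sketch of how the cached doc-ref count could feed the idf term: idf = ln(N / df), where N is the total number of documents and df is the number of DocRef children under the term. The key-only layout does not record how often a term occurs in a document, so the per-document term frequencies here are an assumption (they would have to be stored somewhere, e.g. on the DocRef entity). The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {
    // idf = ln(N / df): rarer terms weigh more.
    static double idf(long totalDocs, long docFrequency) {
        return Math.log((double) totalDocs / docFrequency);
    }

    // Sums tf * idf over the query terms for one document.
    // termFreqs: term -> occurrences in this document (assumed to be stored).
    // docFreqs: term -> number of DocRef children (the cached count).
    static double score(Map<String, Integer> termFreqs, Map<String, Long> docFreqs,
                        long totalDocs, List<String> queryTerms) {
        double total = 0.0;
        for (String term : queryTerms) {
            int tf = termFreqs.getOrDefault(term, 0);
            Long df = docFreqs.get(term);
            if (tf == 0 || df == null || df == 0) continue;
            total += tf * idf(totalDocs, df);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> docFreqs = new HashMap<>();
        docFreqs.put("stuff", 2L);
        docFreqs.put("more", 1L);
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("more", 1);
        doc1.put("stuff", 1);
        // With N = 2, "stuff" contributes ln(2/2) = 0 and "more" ln(2/1) ≈ 0.693.
        System.out.println(score(doc1, docFreqs, 2, List.of("more", "stuff")));
    }
}
```

With only two documents, a term that appears in every document ("stuff") contributes nothing, which is the expected idf behaviour.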
/Lars Borup
--
You received this message because you are subscribed to the Google Groups
"Google App Engine for Java" group.