I'm interested in performing an acronym search: finding phrases whose words
match a set of initials. For example, I'd like to search for 'POS' and
retrieve documents containing "point of sale" as well as "pot of stew". My
understanding is that the current PhraseQuery doesn't support prefix
wildcards (it makes sense but doesn't help me).
I tried implementing a new query class in a way similar to the current
PrefixQuery. For each prefix, I created a list of matching terms from the
index. I then created a PhraseQuery for each combination of terms. So the
'POS' search might produce a query like this:
"point of stew" OR "point of sale" OR "point orphan stew" OR
"point orphan sale" OR "pot of stew" OR "pot of sale" OR so on.
Since the number of phrases is the product of the number of terms matching
each prefix, the result was huge queries that ran out of memory.
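To show the scale of the problem, here is a stripped-down sketch of the expansion step. It is not my actual query class and uses no Lucene classes at all: the names AcronymExpansion, expand and countCombinations are made up for illustration, and the term lists are hard-coded where the real code would read them from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AcronymExpansion {

    /** Builds every phrase that takes one term from each slot, in slot order. */
    static List<String> expand(List<List<String>> termsPerSlot) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");                       // start with a single empty phrase
        for (List<String> slot : termsPerSlot) {
            List<String> next = new ArrayList<>();
            for (String soFar : phrases) {
                for (String term : slot) {
                    next.add(soFar.isEmpty() ? term : soFar + " " + term);
                }
            }
            phrases = next;                    // grows by a factor of slot.size()
        }
        return phrases;
    }

    /** Number of phrases expand() would build, without building them. */
    static long countCombinations(List<List<String>> termsPerSlot) {
        long count = 1;
        for (List<String> slot : termsPerSlot) {
            count *= slot.size();
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical terms matching the prefixes p, o and s.
        List<List<String>> slots = Arrays.asList(
                Arrays.asList("point", "pot"),
                Arrays.asList("of", "orphan"),
                Arrays.asList("sale", "stew"));
        System.out.println(expand(slots));
        System.out.println(countCombinations(slots)); // 2 * 2 * 2 = 8
    }
}
```

With only two terms per initial that is 8 phrases. If each initial matched, say, 1,000 terms (an invented figure, not a measurement from my index), the same three-letter acronym would need a billion phrase clauses.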
It occurred to me that another solution might be to implement the prefix
wildcard way down in the IndexReader subclasses (like SegmentReader). I was
hoping to create an alternative to the termPositions(Term t) method. This
method gets a TermInfo object for the Term, and uses it to return a new
TermPositions object. Perhaps I could make a prefixPositions(Term prefix)
method. The idea would be to produce several TermInfo objects, one for each
term matching the prefix. Then return one TermPositions object representing
the locations of all of those terms.
This is where I'm stuck. Regardless of whether the approach is ill-advised,
does anyone know if it's possible to take several TermInfo objects and
create one TermPositions object for them?
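To make the question concrete, here is a toy model of the combined object I have in mind. It uses plain Java collections rather than Lucene's TermInfo and TermPositions, so everything in it is an assumption: MergedPositions and merge are made-up names, and each term's postings are modelled as a map from document number to a sorted array of positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class MergedPositions {

    /**
     * Combines the postings of several terms (docId -> sorted positions)
     * into one posting list, as if they all belonged to a single term.
     */
    static SortedMap<Integer, int[]> merge(List<Map<Integer, int[]>> postingsPerTerm) {
        // Collect every position of every term, grouped by document.
        SortedMap<Integer, List<Integer>> collected = new TreeMap<>();
        for (Map<Integer, int[]> postings : postingsPerTerm) {
            for (Map.Entry<Integer, int[]> entry : postings.entrySet()) {
                List<Integer> positions =
                        collected.computeIfAbsent(entry.getKey(), k -> new ArrayList<>());
                for (int position : entry.getValue()) {
                    positions.add(position);
                }
            }
        }
        // Sort positions within each document. A real reader would stream
        // the inputs and merge them lazily; sorting keeps the sketch short.
        SortedMap<Integer, int[]> merged = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : collected.entrySet()) {
            int[] sorted = entry.getValue().stream()
                    .mapToInt(Integer::intValue)
                    .sorted()
                    .toArray();
            merged.put(entry.getKey(), sorted);
        }
        return merged;
    }
}
```

The result iterates in document order with positions ascending inside each document, which is the ordering I would expect a phrase match to need from a single positions stream.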
If this approach is in fact ill-advised, does anyone have a better
suggestion for finding acronym phrases?
Thanks for your time!
-Lex
_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lucene-users