[Lucene-users] Acronym Search

Lex Lawrence Mon, 24 Sep 2001 07:17:02 -0700
I'm interested in performing an acronym search... looking for phrases 
matching a set of initials.  For example I'd like to search for 'POS' and 
retrieve documents containing "point of sale" as well as "pot of stew".  My 
understanding is that the current PhraseQuery doesn't support prefix 
wildcards (it makes sense but doesn't help me).

I tried implementing a new query class in a way similar to the current 
PrefixQuery.  For each prefix, I created a list of matching terms from the 
index.  I then created a PhraseQuery for each combination of terms.  So the 
'POS' search might produce a query like this:
"point of stew" OR "point of sale" OR "point orphan stew" OR
"point orphan sale" OR "pot of stew" OR "pot of sale" OR ...

The result was huge queries that ran out of memory, since the number of 
phrases is the product of the number of terms matching each prefix.
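
To make the blow-up concrete, here is a minimal sketch of that expansion in 
plain Java. It does not use any Lucene classes: the sorted set stands in for 
the index's term dictionary, and AcronymExpansion, termsWithPrefix and expand 
are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class AcronymExpansion {

    // Collect every term in the sorted dictionary that starts with the prefix.
    static List<String> termsWithPrefix(SortedSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    // Build one phrase per combination of matching terms (the cross product).
    static List<String> expand(SortedSet<String> dictionary, String acronym) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (char letter : acronym.toLowerCase().toCharArray()) {
            List<String> next = new ArrayList<>();
            for (String partial : phrases) {
                for (String term : termsWithPrefix(dictionary, String.valueOf(letter))) {
                    next.add(partial.isEmpty() ? term : partial + " " + term);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Toy dictionary (an assumption): two terms per initial letter.
        SortedSet<String> dictionary = new TreeSet<>(
            Arrays.asList("of", "orphan", "point", "pot", "sale", "stew"));
        List<String> phrases = expand(dictionary, "POS");
        System.out.println(phrases.size() + " phrase queries");
        for (String phrase : phrases) {
            System.out.println("\"" + phrase + "\"");
        }
    }
}
```

With only two terms per letter this already yields 2 x 2 x 2 = 8 phrases. In 
a real index, where each letter may match hundreds or thousands of terms, the 
product grows far too large to hold as individual phrase queries.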

It occurred to me that another solution might be to implement the prefix 
wildcard way down in the IndexReader subclasses (like SegmentReader).  I was 
hoping to create an alternative to the termPositions(Term t) method.  This 
method gets a TermInfo object for the Term, and uses it to return a new 
TermPositions object.  Perhaps I could make a prefixPositions(Term prefix) 
method.  The idea would be to produce several TermInfo objects, one for each 
term matching the prefix.  Then return one TermPositions object representing 
the locations of all of those terms.
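
As a rough sketch of the merging step I have in mind, here it is in plain 
Java rather than against Lucene's own classes. MergedPositions, Cursor and 
merge are hypothetical names; each int[] stands in for the sorted positions 
of one matching term within a single document, which a real implementation 
would have to read from that term's TermPositions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class MergedPositions {

    // Cursor over one term's sorted position list within a single document.
    private static final class Cursor {
        final int[] positions;
        int index = 0;

        Cursor(int[] positions) {
            this.positions = positions;
        }

        int current() {
            return positions[index];
        }
    }

    // K-way merge: combine several sorted position lists into one sorted list.
    static int[] merge(List<int[]> perTermPositions) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        int total = 0;
        for (int[] positions : perTermPositions) {
            if (positions.length > 0) {
                queue.add(new Cursor(positions));
            }
            total += positions.length;
        }

        int[] merged = new int[total];
        int out = 0;
        while (!queue.isEmpty()) {
            // Take the cursor with the smallest current position, emit it,
            // then re-queue the cursor if it has positions left.
            Cursor smallest = queue.poll();
            merged[out++] = smallest.current();
            smallest.index++;
            if (smallest.index < smallest.positions.length) {
                queue.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] point = {3, 40}; // positions of "point" in one document (made up)
        int[] pot = {17};      // positions of "pot" in the same document (made up)
        System.out.println(Arrays.toString(merge(Arrays.asList(point, pot))));
    }
}
```

The same idea would have to be applied one level up as well, so that the 
combined object also steps through the union of the documents containing any 
of the matching terms.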

This is where I'm stuck.  Regardless of whether the approach is ill-advised, 
does anyone know if it's possible to take several TermInfo objects and 
create one TermPositions object for them?

If this approach is in fact ill-advised, does anyone have a better 
suggestion for finding acronym phrases?

Thanks for your time!
-Lex
