Another way to do this would be to create a separate field for the acronym
data and implement an AcronymFilter that stores only the first letter of each
word in that field. A PhraseQuery for "P O S" against the acronym field would
then return all documents containing a word beginning with "P", followed by a
word beginning with "O", followed by a word beginning with "S".
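
To make the idea concrete, here is a minimal sketch in plain Java of what
such a filter would put into the acronym field and what a phrase match on
that field amounts to. None of this is Lucene API: the class AcronymSketch
and the methods initials() and matches() are names made up for illustration,
and lowercasing the letters is an assumption. A real AcronymFilter would be
written as a token filter inside the analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class AcronymSketch {

    // Emits one single-letter token per word, which is what the
    // hypothetical AcronymFilter would store in the acronym field.
    public static List<String> initials(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(word.substring(0, 1).toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Stand-in for a phrase match on the acronym field: true when the
    // letters of the acronym occur as consecutive tokens.
    public static boolean matches(List<String> fieldTokens, String acronym) {
        List<String> wanted = new ArrayList<>();
        for (char c : acronym.toCharArray()) {
            wanted.add(String.valueOf(Character.toLowerCase(c)));
        }
        return !wanted.isEmpty()
                && Collections.indexOfSubList(fieldTokens, wanted) >= 0;
    }
}
```

With this, matches(initials("a big pot of stew"), "POS") is true, while
matches(initials("sale of points"), "POS") is false because the letters
occur in the wrong order.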


Regards

Anders Nielsen, CEO
____________________

Visator ApS
Kroghsgade 1
2100 Copenhagen
phone: +45-3555 4702
mobile: +45-2671 3663
____________________




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Lex Lawrence
Sent: 24 September 2001 16:34
To: [EMAIL PROTECTED]
Subject: [Lucene-users] Acronym Search


I'm interested in performing an acronym search... looking for phrases
matching a set of initials.  For example I'd like to search for 'POS' and
retrieve documents containing "point of sale" as well as "pot of stew".  My
understanding is that the current PhraseQuery doesn't support prefix
wildcards (it makes sense but doesn't help me).

I tried implementing a new query class in a way similar to the current
PrefixQuery.  For each prefix, I created a list of matching terms from the
index.  I then created a PhraseQuery for each combination of terms.  So the
'POS' search might produce a query like this:
"point of stew" OR "point of sale" OR "point orphan stew" OR "point orphan
sale" OR "pot of stew" OR "pot of sale" OR blah, blah...

The result was huge queries that ran out of memory.
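
The growth is multiplicative: the number of phrase combinations is the
product of the number of index terms matching each prefix. A small sketch of
that arithmetic follows; the class and method names are made up for
illustration, and any term counts passed in are hypothetical rather than
measured from a real index.

```java
public class PhraseExplosionSketch {

    // Number of distinct phrases produced when every term matching one
    // prefix is combined with every term matching the other prefixes.
    // Throws ArithmeticException if the product overflows a long.
    public static long combinations(int[] termsPerPrefix) {
        long total = 1;
        for (int n : termsPerPrefix) {
            total = Math.multiplyExact(total, (long) n);
        }
        return total;
    }
}
```

For the example above (point/pot, of/orphan, stew/sale) this gives
2 * 2 * 2 = 8 phrases. With hypothetical counts of a few hundred terms per
letter, the product already runs into the tens of millions.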

It occurred to me that another solution might be to implement the prefix
wildcard way down in the IndexReader subclasses (like SegmentReader).  I was
hoping to create an alternative to the termPositions(Term t) method.  This
method gets a TermInfo object for the Term, and uses it to return a new
TermPositions object.  Perhaps I could make a prefixPositions(Term prefix)
method.  The idea would be to produce several TermInfo objects, one for each
term matching the prefix.  Then return one TermPositions object representing
the locations of all of those terms.

This is where I'm stuck.  Regardless of whether the approach is ill-advised,
does anyone know if it's possible to take several TermInfo objects and
create one TermPositions object for them?
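
Conceptually, the combined object would have to merge the sorted position
lists of all matching terms into one sorted stream. The sketch below shows
only that merge step for a single document, in plain Java. It does not use
Lucene's TermInfo or TermPositions classes, and the class and method names
are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergedPositionsSketch {

    // Merges several ascending position lists (one per matching term)
    // into a single ascending list, using a heap keyed on position.
    public static List<Integer> merge(List<int[]> perTerm) {
        // Heap entries are {position, index of source list, offset in that list}.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < perTerm.size(); i++) {
            if (perTerm.get(i).length > 0) {
                heap.add(new int[] {perTerm.get(i)[0], i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(top[0]);
            int[] source = perTerm.get(top[1]);
            int next = top[2] + 1;
            if (next < source.length) {
                heap.add(new int[] {source[next], top[1], next});
            }
        }
        return merged;
    }
}
```

A real implementation would also have to step through documents and keep
the per-term streams aligned on the same document before merging positions.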

If this approach is in fact ill-advised, does anyone have a better
suggestion for finding acronym phrases?

Thanks for your time!
-Lex



_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lucene-users

