RE: [Lucene-users] Acronym Search

Doug Cutting Tue, 25 Sep 2001 08:24:55 -0700

> From: Lex Lawrence [mailto:[EMAIL PROTECTED]]
> 
> I tried adding the initial letter of each word to a new
> document field (per Anders Nielsen's suggestion), and it
> worked well; however, I need to be able to know which
> phrases produced the match.  It seems to me (although I
> may be overlooking something) that now I only have access
> to the initials, and can't retrieve the original words.
> If I search for 'POS' I can determine which documents
> contain matches, but I can't tell which document contains
> "point of sale" and which contains "part of speech".

You could re-tokenize the original text of each matching document to find
the phrase whose initials match the query.
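
As a rough sketch of that re-tokenizing step, assuming the original text of
each hit is still available (stored in the index or fetched from its source),
something like the following could recover the matching phrases. The class
name AcronymPhraseFinder, the method findPhrases, and the tokenization rule
(words are runs of letters or digits) are all hypothetical; real code should
split words the same way the analyzer that built the acronym field did.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class AcronymPhraseFinder {

    /** Returns every run of consecutive words whose initials spell the acronym. */
    static List<String> findPhrases(String text, String acronym) {
        // Assumption: a word is a maximal run of letters or digits.
        List<String> words = new ArrayList<>();
        for (String w : text.split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }

        String target = acronym.toLowerCase(Locale.ROOT);
        int n = target.length();
        List<String> matches = new ArrayList<>();
        if (n == 0) {
            return matches;
        }

        // Slide a window of n words over the text and compare initials.
        for (int start = 0; start + n <= words.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < n; i++) {
                char initial = Character.toLowerCase(words.get(start + i).charAt(0));
                if (initial != target.charAt(i)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                matches.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return matches;
    }
}
```

Calling findPhrases on the text of each document returned for 'POS' would
then tell "point of sale" apart from "part of speech".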

To make acronym searches faster you could add acronym bigrams or trigrams to
the acronym field.  For example, the initials of the preceding sentence are
"tmasfycaabotttaf", so you might add the trigrams "tma", "mas", "asf", "sfy",
etc.  Then to search for the acronym "masf" you'd search for the phrase
"mas asf".  Does that make sense?
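
A minimal sketch of the trigram idea, in plain Java with no Lucene calls.
The class AcronymNgrams and its methods initials, ngrams, and phraseQuery
are hypothetical names, and the word-splitting rule is an assumption.  The
n-grams would be indexed as consecutive tokens in the acronym field so that
a phrase search can match adjacent ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class AcronymNgrams {

    /** Concatenates the lowercased first character of every word in the text. */
    static String initials(String text) {
        StringBuilder sb = new StringBuilder();
        // Assumption: a word is a maximal run of letters or digits.
        for (String w : text.split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                sb.append(Character.toLowerCase(w.charAt(0)));
            }
        }
        return sb.toString();
    }

    /** Returns all overlapping n-grams of the letters, in order. */
    static List<String> ngrams(String letters, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= letters.length(); i++) {
            grams.add(letters.substring(i, i + n));
        }
        return grams;
    }

    /** Builds the phrase to search for: the acronym's n-grams joined by spaces. */
    static String phraseQuery(String acronym, int n) {
        return String.join(" ", ngrams(acronym.toLowerCase(Locale.ROOT), n));
    }
}
```

Here ngrams(initials(sentence), 3) gives the tokens to index, and
phraseQuery("masf", 3) gives "mas asf".  An acronym shorter than n yields
no n-grams, so such queries would still need the bigrams or the plain
initials.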

Doug

_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lucene-users
