Thanks for the suggestions-
Dave Kor's TermQuery extension idea is an interesting one. It's similar to
what I'd tried, but more straightforward. Unfortunately, while trying to
implement it I've run into the same question I had earlier. The new
class needs to override the TermQuery.scorer(IndexReader reader) method.
That method calls reader.termDocs(term) to get a TermDocs object
representing the occurrences of one term. The new method would need to
somehow obtain a TermDocs object representing several terms (all terms
starting with a particular letter).
So now, instead of TermPositions objects, we're trying to combine TermDocs
objects. If there is a way of doing that I'm golden. Any thoughts?
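To make the question concrete, here is a minimal sketch of the kind of merge
I mean. It deliberately does not use Lucene's classes: each term's documents
are modelled as a plain sorted int array, and the names MergedDocs, Cursor and
union are made up for illustration. A real version would have to wrap
whatever reader.termDocs(term) returns for each matching term, which is the
part I am unsure about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergedDocs {
    /** Cursor over one term's sorted document numbers (hypothetical stand-in for a per-term stream). */
    private static final class Cursor {
        final int[] docs;
        int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int current() { return docs[pos]; }
        boolean advance() { return ++pos < docs.length; }
    }

    /** Returns the sorted union of the per-term document lists, with duplicates removed. */
    public static List<Integer> union(List<int[]> perTermDocs) {
        // The queue always yields the cursor with the smallest current document number.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] docs : perTermDocs) {
            if (docs.length > 0) {
                queue.add(new Cursor(docs));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            int doc = top.current();
            // Skip a document already emitted by another term's cursor.
            if (merged.isEmpty() || merged.get(merged.size() - 1) != doc) {
                merged.add(doc);
            }
            if (top.advance()) {
                queue.add(top);
            }
        }
        return merged;
    }
}
```

The priority queue keeps the cost per step proportional to the log of the
number of matching terms, so the merged stream never has to be expanded into
separate queries.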
As for the other ideas...
Otis Gospodnetic's suggestion of using a thesaurus to replace acronyms with
complete words is a very reasonable solution; but I need to be able to
produce hits for unanticipated/new acronyms. I need to start with an
arbitrary series of letters and retrieve documents containing any matching
phrase, not just the ones in a finite (and dated) list. Hence the "hack".
I tried adding the initial letter of each word to a new document field (per
Anders Nielsen's suggestion), and it worked well; however, I need to be able
to know which phrases produced the match. It seems to me (although I may be
overlooking something) that now I only have access to the initials, and
can't retrieve the original words. If I search for 'POS' I can determine
which documents contain matches, but I can't tell which document contains
"point of sale" and which contains "part of speech".
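One possible workaround, sketched below, is to re-scan the text of each hit
after the search and pull out the runs of words whose initials spell the
acronym. This assumes the original text of a matching document is still
available (for example from a stored field or from the source file), which
may not hold for every index. The class and method names are hypothetical,
and the tokenizing is a plain whitespace split, so punctuation stays attached
to words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AcronymPhrases {
    /** Returns every run of consecutive words whose initials spell the acronym, ignoring case. */
    public static List<String> find(String text, String acronym) {
        List<String> phrases = new ArrayList<>();
        String target = acronym.toLowerCase();
        if (target.isEmpty()) {
            return phrases;
        }
        // Simplification: split on whitespace only; a real analyzer would also strip punctuation.
        String[] words = text.trim().split("\\s+");
        for (int start = 0; start + target.length() <= words.length; start++) {
            boolean match = true;
            for (int i = 0; i < target.length(); i++) {
                String word = words[start + i].toLowerCase();
                if (word.isEmpty() || word.charAt(0) != target.charAt(i)) {
                    match = false;
                    break;
                }
            }
            if (match) {
                String[] run = Arrays.copyOfRange(words, start, start + target.length());
                phrases.add(String.join(" ", run));
            }
        }
        return phrases;
    }
}
```

Calling find on the text of each hit for 'POS' would then separate the
"point of sale" documents from the "part of speech" ones, at the cost of a
second pass over every matching document.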
Regards- Lex
>From: Dave Kor <[EMAIL PROTECTED]>
>To: Lex Lawrence <[EMAIL PROTECTED]>
>Subject: Re: [Lucene-users] Acronym Search
>Date: Mon, 24 Sep 2001 18:19:53 -0700 (PDT)
>
>I have a simple solution, but I haven't tried it out
>so it may not really work :)
>
>Have you tried to extend TermQuery to score terms
>matching the first alphabet then manually construct a
>PhraseQuery that uses this new class instead of
>TermQuery objects?
>
>
>--- Lex Lawrence <[EMAIL PROTECTED]> wrote:
> > I'm interested in performing an acronym search...
> > looking for phrases
> > matching a set of initials. For example I'd like to
> > search for 'POS' and
> > retrieve documents containing "point of sale" as
> > well as "pot of stew". My
> > understanding is that the current PhraseQuery
> > doesn't support prefix
> > wildcards (it makes sense but doesn't help me).
> >
> > I tried implementing a new query class in a way
> > similar to the current
> > PrefixQuery. For each prefix, I created a list of
> > matching terms from the
> > index. I then created a PhraseQuery for each
> > combination of terms. So the
> > 'POS' search might produce a query like this:
> > "point of stew" OR "point of sale" OR "point orphan
> > stew" OR "point orphan
> > sale" OR "pot of stew" OR "pot of sale" OR blah,
> > blah...
> >
> > The result was huge queries that ran out of memory.
> >
> > It occurred to me that another solution might be to
> > implement the prefix
> > wildcard way down in the IndexReader subclasses
> > (like SegmentReader). I was
> > hoping to create an alternative to the
> > termPositions(Term t) method. This
> > method gets a TermInfo object for the Term, and uses
> > it to return a new
> > TermPositions object. Perhaps I could make a
> > prefixPositions(Term prefix)
> > method. The idea would be to produce several
> > TermInfo objects, one for each
> > term matching the prefix. Then return one
> > TermPositions object representing
> > the locations of all of those terms.
> >
> > This is where I'm stuck. Regardless of whether the
> > approach is ill-advised,
> > does anyone know if it's possible to take several
> > TermInfo objects and
> > create one TermPositions object for them?
> >
> > If this approach is in fact ill-advised, does anyone
> > have a better
> > suggestion for finding acronym phrases?
> >
> > Thanks for your time!
> > -Lex
> >
> >
> > _______________________________________________
> > Lucene-users mailing list
> > [EMAIL PROTECTED]
> >
> > https://lists.sourceforge.net/lists/listinfo/lucene-users
> >
> >
> >
>
>