Thanks for the suggestions-

Dave Kor's TermQuery extension idea is an interesting one.  It's similar to 
what I'd tried, but more straightforward.  Unfortunately, while trying to 
implement it I've run into the same question I had earlier.  The new 
class needs to override the TermQuery.scorer(IndexReader reader) method.  
That method calls reader.termDocs(term) to get a TermDocs object that 
enumerates the documents containing one term.  The new method would need to 
somehow obtain a single TermDocs representing several terms (all terms 
starting with a particular letter).
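One way to get a single TermDocs-like view over several terms is a k-way merge of their per-term streams, since each stream already returns documents in increasing order. The sketch below is not Lucene code: it assumes a simplified, hypothetical `DocStream` interface standing in for TermDocs (only `next()`, `doc()` and `freq()`; the real interface has more methods, which are omitted here), backed by in-memory arrays so it runs without an index. The union emits each document once and sums the frequencies of all the terms found in it.

```java
import java.util.PriorityQueue;

public class UnionTermDocsSketch {

    /** Hypothetical stand-in for TermDocs: documents come back in increasing order. */
    interface DocStream {
        boolean next();   // advance; false when exhausted
        int doc();        // current document number
        int freq();       // occurrences of the term in the current document
    }

    /** In-memory DocStream backed by parallel arrays (demonstration only). */
    static final class ArrayDocStream implements DocStream {
        private final int[] docs, freqs;
        private int i = -1;
        ArrayDocStream(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }
        public boolean next() { return ++i < docs.length; }
        public int doc() { return docs[i]; }
        public int freq() { return freqs[i]; }
    }

    /** Union of several DocStreams: each document once, frequencies summed. */
    static final class UnionDocStream implements DocStream {
        // Heap ordered by each stream's current document.
        private final PriorityQueue<DocStream> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        private int doc = -1, freq = 0;

        UnionDocStream(DocStream... streams) {
            for (DocStream s : streams) {
                if (s.next()) queue.add(s);  // position each stream on its first doc
            }
        }

        public boolean next() {
            if (queue.isEmpty()) return false;
            doc = queue.peek().doc();
            freq = 0;
            // Pop every stream sitting on this doc; a stream is only advanced
            // after it leaves the heap, then re-added if it has more docs.
            while (!queue.isEmpty() && queue.peek().doc() == doc) {
                DocStream s = queue.poll();
                freq += s.freq();
                if (s.next()) queue.add(s);
            }
            return true;
        }
        public int doc() { return doc; }
        public int freq() { return freq; }
    }

    public static void main(String[] args) {
        // Hypothetical postings: "pot" in docs 1,4; "point" in docs 2,4; "part" in doc 7.
        DocStream union = new UnionDocStream(
            new ArrayDocStream(new int[]{1, 4}, new int[]{1, 2}),
            new ArrayDocStream(new int[]{2, 4}, new int[]{3, 1}),
            new ArrayDocStream(new int[]{7}, new int[]{1}));
        while (union.next()) {
            System.out.println(union.doc() + " freq=" + union.freq());
        }
        // prints: 1 freq=1, 2 freq=3, 4 freq=3, 7 freq=1 (one per line)
    }
}
```

A scorer built on such a union would see document 4 once, with the occurrences of "pot" and "point" added together, which is the behaviour a "terms starting with P" TermDocs would need.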

So now, instead of TermPositions objects, we're trying to combine TermDocs 
objects.  If there is a way of doing that, I'm golden.  Any thoughts?
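If the merged streams also carried positions (the TermPositions variant), the phrase test for one document reduces to this: for each letter of the acronym, collect the positions of every term in the document that starts with that letter, then look for a start position p where letter i has a hit at p + i. A minimal sketch, assuming those per-letter position sets have already been gathered; `phraseStarts` and its input layout are hypothetical, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixPhraseMatch {

    /**
     * slotPositions.get(i) holds the positions, within one document, of every
     * term starting with the i-th letter of the acronym (a merged position set).
     * Returns each start position p such that slot i has a term at p + i for all i.
     */
    static List<Integer> phraseStarts(List<TreeSet<Integer>> slotPositions) {
        List<Integer> starts = new ArrayList<>();
        if (slotPositions.isEmpty()) return starts;
        // Every phrase must begin at a position from the first slot.
        for (int p : slotPositions.get(0)) {
            boolean match = true;
            for (int i = 1; i < slotPositions.size() && match; i++) {
                match = slotPositions.get(i).contains(p + i);
            }
            if (match) starts.add(p);
        }
        return starts;
    }

    public static void main(String[] args) {
        // "we paid at the point of sale": positions 0..6
        // P-terms: paid(1), point(4); O-terms: of(5); S-terms: sale(6)
        List<TreeSet<Integer>> slots = List.of(
            new TreeSet<>(List.of(1, 4)),
            new TreeSet<>(List.of(5)),
            new TreeSet<>(List.of(6)));
        System.out.println(phraseStarts(slots)); // [4]
    }
}
```

Note that "paid" at position 1 is rejected because no O-term follows it, so only the run starting at "point" counts as a match.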

As for the other ideas...
Otis Gospodnetic's suggestion of using a thesaurus to replace acronyms with 
complete words is a very reasonable solution, but I need to be able to 
produce hits for unanticipated or new acronyms.  I need to start with an 
arbitrary series of letters and retrieve documents containing any matching 
phrase, not just the ones in a finite (and dated) list.  Hence the "hack".

I tried adding the initial letter of each word to a new document field (per 
Anders Nielsen's suggestion), and it worked well; however, I need to know 
which phrase produced each match.  It seems to me (although I may be 
overlooking something) that now I only have access to the initials and 
can't retrieve the original words.  If I search for 'POS' I can determine 
which documents contain matches, but I can't tell which document contains 
"point of sale" and which contains "part of speech".

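If the original text is stored (or can be re-read) next to the initials field, one possible workaround is to use the initials field only to find candidate documents, then re-scan each hit's text for the runs of words whose initials spell the query. A rough sketch, assuming a crude lowercase split on non-alphanumeric characters rather than whatever analyzer is used at index time; `matchingPhrases` is a hypothetical helper, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AcronymPhraseFinder {

    /** Returns every run of consecutive words whose initials spell the acronym. */
    static List<String> matchingPhrases(String text, String acronym) {
        List<String> phrases = new ArrayList<>();
        String target = acronym.toLowerCase(Locale.ROOT);
        int n = target.length();
        if (n == 0) return phrases;
        // Crude tokenization (an assumption): lowercase, split on runs of non-letters/digits.
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (int start = 0; start + n <= words.length; start++) {
            boolean match = true;
            for (int i = 0; i < n && match; i++) {
                String w = words[start + i];
                // A leading empty token (text starting with punctuation) never matches.
                match = !w.isEmpty() && w.charAt(0) == target.charAt(i);
            }
            if (match) {
                phrases.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        System.out.println(matchingPhrases("Scan it at the point of sale.", "POS"));   // [point of sale]
        System.out.println(matchingPhrases("Tag each part of speech first.", "pos")); // [part of speech]
    }
}
```

This costs a pass over each hit's text, so it only makes sense after the initials field has already narrowed the result set.
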
Regards- Lex


>From: Dave Kor <[EMAIL PROTECTED]>
>To: Lex Lawrence <[EMAIL PROTECTED]>
>Subject: Re: [Lucene-users] Acronym Search
>Date: Mon, 24 Sep 2001 18:19:53 -0700 (PDT)
>
>I have a simple solution, but I haven't tried it out
>so it may not really work :)
>
>Have you tried extending TermQuery to score terms
>matching the first letter, then manually constructing a
>PhraseQuery that uses this new class instead of
>TermQuery objects?
>
>
>--- Lex Lawrence <[EMAIL PROTECTED]> wrote:
> > I'm interested in performing an acronym search... looking for phrases
> > matching a set of initials.  For example I'd like to search for 'POS' and
> > retrieve documents containing "point of sale" as well as "pot of stew".  My
> > understanding is that the current PhraseQuery doesn't support prefix
> > wildcards (it makes sense but doesn't help me).
> >
> > I tried implementing a new query class in a way similar to the current
> > PrefixQuery.  For each prefix, I created a list of matching terms from the
> > index.  I then created a PhraseQuery for each combination of terms.  So the
> > 'POS' search might produce a query like this:
> > "point of stew" OR "point of sale" OR "point orphan stew" OR "point orphan
> > sale" OR "pot of stew" OR "pot of sale" OR blah, blah...
> >
> > The result was huge queries that ran out of memory.
> >
> > It occurred to me that another solution might be to implement the prefix
> > wildcard way down in the IndexReader subclasses (like SegmentReader).  I was
> > hoping to create an alternative to the termPositions(Term t) method.  This
> > method gets a TermInfo object for the Term, and uses it to return a new
> > TermPositions object.  Perhaps I could make a prefixPositions(Term prefix)
> > method.  The idea would be to produce several TermInfo objects, one for each
> > term matching the prefix.  Then return one TermPositions object representing
> > the locations of all of those terms.
> >
> > This is where I'm stuck.  Regardless of whether the approach is ill-advised,
> > does anyone know if it's possible to take several TermInfo objects and
> > create one TermPositions object for them?
> >
> > If this approach is in fact ill-advised, does anyone have a better
> > suggestion for finding acronym phrases?
> >
> > Thanks for your time!
> > -Lex
> >
> >
> >
> > _______________________________________________
> > Lucene-users mailing list
> > [EMAIL PROTECTED]
> > https://lists.sourceforge.net/lists/listinfo/lucene-users
> >
> >
> >
>
>




