Our application indexes and retreieves sentences from a large database. Our terms are 
overlapping characters (n-grams). In order to calculate our custom score we need to 
know the (relative) position of each n-gram in the matched sentences. I'm currently 
using a boolen query (each n-ngram in a big 'OR' statement). I will investigate 
customizing the query as you suggest. 

Basically we are using Lucene as a Translation Memeory tool! Pretty cool. Lucene is 
wonderful and I think we can use it in many of our linguistic projects (Terminlogy, 
concordance, TM etc.).

Jim

>>> [EMAIL PROTECTED] 06/30/03 10:56 AM >>>
Jim Hargrave wrote:
> I've defined my own collector (I want the raw score before it is normalized between 
> 1.0 and 0.0). For each document I need to know the the matching term positions in 
> the document.  I've seen the methods in IndexReader, but how can I access them 
> inside my collect method? Are there other methods I am missing? 

No, this information is not available to the hit collector.

Why do you need this?  If it is only for summaries, then you're probably 
better off re-tokenizing those few documents that you wish to summarize. 
  If it is for query evaluation, then you're probably better off writing 
a new class of query (which is non-trivial).

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




------------------------------------------------------------------------------
This message may contain confidential information, and is intended only for the use of 
the individual(s) to whom it is addressed.


==============================================================================


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to