Our application indexes and retrieves sentences from a large database. Our terms are overlapping character n-grams. To calculate our custom score, we need to know the (relative) position of each n-gram in the matched sentences. I'm currently using a Boolean query (each n-gram in one big 'OR' statement). I will investigate customizing the query as you suggest.
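
To make "position of each n-gram" concrete, here is a minimal sketch in plain Java of the kind of table such a score would need for one sentence. It does not use any Lucene API. The class name NGramPositions, the method name positions, and the use of 0-based character offsets are assumptions made for illustration only, not what our application actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene.
class NGramPositions {

    // Maps each overlapping character n-gram of the sentence to the list of
    // 0-based character offsets at which it starts, in order of occurrence.
    static Map<String, List<Integer>> positions(String sentence, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // LinkedHashMap keeps n-grams in the order they first appear.
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        // Slide a window of width n one character at a time.
        for (int i = 0; i + n <= sentence.length(); i++) {
            String gram = sentence.substring(i, i + n);
            result.computeIfAbsent(gram, k -> new ArrayList<>()).add(i);
        }
        return result;
    }
}
```

For example, NGramPositions.positions("banana", 3) maps "ban" to [0], "ana" to [1, 3] and "nan" to [2]. Running the same function over a matched sentence after the search is one way to recover the positions that the hit collector does not expose.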
Basically we are using Lucene as a Translation Memory tool! Pretty cool. Lucene is wonderful and I think we can use it in many of our linguistic projects (terminology, concordance, TM, etc.).

Jim

>>> [EMAIL PROTECTED] 06/30/03 10:56 AM >>>
Jim Hargrave wrote:
> I've defined my own collector (I want the raw score before it is normalized between
> 1.0 and 0.0). For each document I need to know the matching term positions in
> the document. I've seen the methods in IndexReader, but how can I access them
> inside my collect method? Are there other methods I am missing?

No, this information is not available to the hit collector. Why do you need this? If it is only for summaries, then you're probably better off re-tokenizing those few documents that you wish to summarize. If it is for query evaluation, then you're probably better off writing a new class of query (which is non-trivial).

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
