Re: Proposal: extracting term-level stats from query process

markharw00d Wed, 17 Mar 2004 00:26:41 -0800

Doug,
To save any duplicated effort on your part: I've started work on the RAMDirectory 
alternative you suggested last week:
>> It would be interesting to write an in-memory version of IndexReader and
>> IndexWriter that don't serialize anything to bytes.
My current implementation benchmarks twice as fast as RAMDirectory at indexing
but is slower at querying - I'm working on this. Fortunately, querying takes far
less time than indexing, so the combined indexing-plus-querying time is still
lower than with a RAMDirectory when performing one-time analysis of search
results.
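
As a rough illustration of the idea only (this is not the implementation being benchmarked above), an index that keeps its postings as plain Java objects rather than serialized bytes might look like the sketch below. The class name InMemoryIndex, its methods, and the whitespace-style tokenizing are all assumptions made for the example; a real version would use Lucene's analyzers and expose IndexReader-style access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: postings held as ordinary objects, nothing written to bytes.
// Not thread-safe and not mergeable, in keeping with a throw-away index.
class InMemoryIndex {
    // term text -> (document id -> occurrences of the term in that document)
    private final TreeMap<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();
    private int docCount = 0;

    // Adds a document and returns its id. Tokenizing on non-word characters
    // is a stand-in for a real analyzer.
    int addDocument(String text) {
        int docId = docCount++;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if text starts with a separator
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    // Number of documents containing the term.
    int docFreq(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Number of times the term occurs in the given document.
    int termFreq(String term, int docId) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return 0;
        }
        return docs.getOrDefault(docId, 0);
    }

    // Ids of the documents containing the term, in ascending order.
    List<Integer> docs(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs.keySet());
    }
}
```

Because nothing is encoded or decoded, adding a document is just map updates, which is where an indexing speed-up over a byte-oriented directory would plausibly come from.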


It would be useful if the Lucene Term class could be made to implement the
"Comparable" interface - I think this could be added without breaking anything.
For now I've had to write my own "ComparableTerm" class simply to use terms as
keys in TreeMaps.
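
A minimal sketch of such a wrapper, for illustration only: the class below is a self-contained stand-in holding a (field, text) pair, not Lucene's Term and not the actual ComparableTerm class mentioned above. Ordering by field name and then by term text is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a term: a (field, text) pair with a natural ordering.
final class ComparableTerm implements Comparable<ComparableTerm> {
    final String field;
    final String text;

    ComparableTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Assumed ordering: by field name first, then by term text.
    @Override
    public int compareTo(ComparableTerm other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }

    // equals/hashCode kept consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComparableTerm)) {
            return false;
        }
        ComparableTerm t = (ComparableTerm) o;
        return field.equals(t.field) && text.equals(t.text);
    }

    @Override
    public int hashCode() {
        return 31 * field.hashCode() + text.hashCode();
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

class ComparableTermDemo {
    // Builds a small map of term -> count; TreeMap keeps the keys sorted.
    static TreeMap<ComparableTerm, Integer> sample() {
        TreeMap<ComparableTerm, Integer> counts = new TreeMap<>();
        counts.put(new ComparableTerm("title", "lucene"), 1);
        counts.put(new ComparableTerm("body", "search"), 4);
        counts.put(new ComparableTerm("body", "apache"), 2);
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<ComparableTerm, Integer> e : sample().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With Comparable implemented on the term type itself, no separate Comparator or wrapper class is needed to use terms as TreeMap keys.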

The current design deliberately leaves out thread safety, the ability to merge
with other indexes, etc. It is a pure throw-away index, typically used once in a
single query thread to analyse search results. To support this scenario it also
offers some new methods that are useful for refining searches, e.g.:
  String getMostCommonUnstemmedForm(Term t)
  float getRelativeSignificance(Term t, IndexReader corpusReader)

Cheers
Mark
