Re: Flex & Docs/AndPositionsEnum

Renaud Delbru Wed, 10 Feb 2010 04:46:13 -0800

Hi Michael,

On 09/02/10 20:47, Michael McCandless wrote:

> But, then, it's very convenient when you need it and don't care about
> performance.  EG in Renaud's usage, a test case that is trying to
> assert that all indexed docs look right, why should you be forced to
> operate per segment?  He shouldn't have to bother with the details of
> which field/term/doc was indexed into which segment.
>
> Or, I guess we could argue that this test really should create a
> TermQuery and walk the matching docs... instead of using the low level
> flex enum APIs.  Because searching impl already knows how to step
> through the segments.

In fact, I do care about performance, but I was using the IndexReader.termPositionsEnum to mimic the implementation of the different query scorers (e.g., TermScorer). I have already reimplemented many of the original Lucene scorers to use my particular index structure. From what I have seen, the main low-level scorers (e.g., TermScorer, PhraseScorer) use the DocsEnum interface, and not a segment-level enum. From what I understand, these scorers are not aware of whether they are using a segment-level enum or a Multi*Enum. So, is there a loss of performance in this case? Or am I missing something?

I'll try to clarify my usage of the Flex API; maybe it will highlight certain aspects for you.

In the ideal world, what I would like to do is the following:
1) write my own codec,

2) register my codec in the IndexWriter, and tell it to use this codec for one or more fields (similar to the PerFieldCodecWrapper),

3) write query operators that are compatible with my codec,

4) at search time, use these query operators with the fields that use my codec.

If, by mistake, I use query operators that are not compatible with a field (and its related codec), an exception is thrown telling me that I cannot use these query operators with this field.
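To make the four steps a bit more concrete, here is a rough sketch of the behaviour I have in mind. Everything in it is hypothetical: FieldCodecRegistry, CodecAwareTermQuery and the codec names are made up for illustration and are not Lucene classes, and codecs are reduced to plain names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-field codec configuration (step 2).
// Not a Lucene class; codecs are represented only by their names.
class FieldCodecRegistry {
    private final Map<String, String> codecByField = new HashMap<String, String>();
    private final String defaultCodec;

    FieldCodecRegistry(String defaultCodec) {
        this.defaultCodec = defaultCodec;
    }

    // Bind a field to a codec name; unregistered fields use the default codec.
    void register(String field, String codecName) {
        codecByField.put(field, codecName);
    }

    String codecFor(String field) {
        String codec = codecByField.get(field);
        return codec != null ? codec : defaultCodec;
    }
}

// Hypothetical query operator that only understands one codec (step 3).
class CodecAwareTermQuery {
    private final String field;
    private final String requiredCodec;

    CodecAwareTermQuery(String field, String requiredCodec) {
        this.field = field;
        this.requiredCodec = requiredCodec;
    }

    // Step 4: fail fast if the field was indexed with a different codec.
    void checkCompatible(FieldCodecRegistry registry) {
        String actual = registry.codecFor(field);
        if (!requiredCodec.equals(actual)) {
            throw new UnsupportedOperationException(
                "Field '" + field + "' uses codec '" + actual
                + "', but this query operator requires '" + requiredCodec + "'");
        }
    }
}

class CodecCompatibilitySketch {
    public static void main(String[] args) {
        FieldCodecRegistry registry = new FieldCodecRegistry("Standard");
        registry.register("triples", "MyCodec");

        // Compatible: the field was registered with the codec the operator needs.
        new CodecAwareTermQuery("triples", "MyCodec").checkCompatible(registry);

        // Incompatible: "title" falls back to the default codec.
        try {
            new CodecAwareTermQuery("title", "MyCodec").checkCompatible(registry);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is only that the check happens per field, without the caller knowing anything about segments.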

So, in my current use case, I don't think it is necessary to be aware of the fact that I am manipulating multiple segments or only one segment. I think this should be hidden.

But what you were suggesting is to create my own "MultiReader" that is optimised for my codec. Is that right? A MultiReader that just iterates over the subreaders, checks if they are using my codec (and therefore the associated fields), and uses them to iterate over my own postings?
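If I understand the suggestion correctly, it would look roughly like the sketch below. This is only my reading of it, under simplifying assumptions: SegmentView, FixedSegment and CodecFilteringMultiReader are hypothetical names and not Lucene classes, postings are plain arrays of segment-local doc IDs, and each subreader's IDs are shifted by the number of documents in the subreaders before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical view of one segment (subreader); not a Lucene interface.
interface SegmentView {
    int maxDoc();                                // number of docs in this segment
    String codecFor(String field);               // codec name used for the field
    int[] postings(String field, String term);   // segment-local doc IDs, ascending
}

// Toy segment holding a single codec name and a single postings list.
class FixedSegment implements SegmentView {
    private final int maxDoc;
    private final String codec;
    private final int[] docs;

    FixedSegment(int maxDoc, String codec, int... docs) {
        this.maxDoc = maxDoc;
        this.codec = codec;
        this.docs = docs;
    }

    public int maxDoc() { return maxDoc; }
    public String codecFor(String field) { return codec; }
    public int[] postings(String field, String term) { return docs; }
}

// Iterates over the subreaders, keeps only those written with the expected
// codec, and remaps their local doc IDs into the top-level doc ID space.
class CodecFilteringMultiReader {
    private final List<SegmentView> segments;
    private final String codecName;

    CodecFilteringMultiReader(List<SegmentView> segments, String codecName) {
        this.segments = segments;
        this.codecName = codecName;
    }

    List<Integer> docs(String field, String term) {
        List<Integer> out = new ArrayList<Integer>();
        int docBase = 0;
        for (SegmentView segment : segments) {
            if (codecName.equals(segment.codecFor(field))) {
                for (int localDoc : segment.postings(field, term)) {
                    out.add(docBase + localDoc);
                }
            }
            // Always advance the base, even for skipped segments.
            docBase += segment.maxDoc();
        }
        return out;
    }
}

class MultiReaderSketch {
    public static void main(String[] args) {
        List<SegmentView> segments = Arrays.<SegmentView>asList(
            new FixedSegment(10, "MyCodec", 1, 4),
            new FixedSegment(5, "Standard", 0),
            new FixedSegment(3, "MyCodec", 2));
        CodecFilteringMultiReader reader =
            new CodecFilteringMultiReader(segments, "MyCodec");
        System.out.println(reader.docs("triples", "foo")); // prints [1, 4, 17]
    }
}
```

Here the second segment is skipped because it uses another codec, but its documents still count towards the doc base of the third one.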

--
Renaud Delbru

