Hi Michael,

On 09/02/10 20:47, Michael McCandless wrote:
But, then, it's very convenient when you need it and don't care about
performance.  EG in Renaud's usage, a test case that is trying to
assert that all indexed docs look right, why should you be forced to
operate per segment?  He shouldn't have to bother with the details of
which field/term/doc was indexed into which segment.

Or, I guess we could argue that this test really should create a
TermQuery and walk the matching docs... instead of using the low level
flex enum APIs.  Because searching impl already knows how to step
through the segments.
In fact, I care about performance, but I was using the IndexReader.termPositionsEnum to mimic the implementation of the different query scorers (e.g., TermScorer). I have already reimplemented many of the original Lucene Scorers to use my particular index structure. From what I have seen, the main low level scorers (e.g., TermScorer, PhraseScorer) are using the DocsEnum interface, and not a segment-level enum. From what I understand, these scorers are not aware if they are using a segment-level enum or a Multi*Enum. So, there is a loss of performance in this case ? Or do I miss something ?

I'll try to clarify my usage of the Flex API, maybe it can highlight you certain aspects.
In the ideal world, what I would like to do is the following:
1) write my own codec,
2) register my codec in the IndexWriter, and tell him to use this codec for one or more fields (similar to the PerFieldCodecWrapper),
3) write query operators that are compatible with my codec,
4) at search time, use these query operators with the fields that use my codec.

If by error, I am using the query operators which are not compatible with a field (and its related codec), an exception is thrown telling me that I am not able to use these query operators with this field.

So, in my current use case, I don't think it is necessary to be aware of that fact that I am manipulating multiple segments or only one segment. I think this should be hidden.

But what you were suggesting is to create my own "MultiReader" that is optimised for my codec. Is that right ? A MultiReader that just iterates over the subreaders, checks if they are using my codec (and therefore associated fields), and uses them to iterate over my own postings ?
--
Renaud Delbru

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to