On 10/02/10 09:47, Uwe Schindler wrote:
Positions as attributes would be good. For positions we need a new Attribute 
(not PositionIncrement), but e.g. for offsets and payloads we can use the 
standard attributes from the analysis, which is really cool. This would also 
make it possible to add all custom attributes from the analysis phase to the 
posting list and make them visible in the TermDocs enum. In my opinion, there 
should be no DocsEnum, DocsAndPositionsEnum and so on enums, just one class, 
which only differs in the provided attributes. So if you want the payloads, ask 
for a standard DocsEnum and pass the requested attribute classes as parameters:
        IndexReader.termDocsEnum(Bits skipDocs, String field, BytesRef term,
                Class<? extends Attribute>... atts)

If somebody wants offsets and payloads:
        reader.termDocsEnum(skipDocs, "field", term, OffsetAttribute.class,
                PayloadAttribute.class);
I kind of like this idea. This interface for iterating over the postings looks more flexible, and IMHO it would be easy to use with any "home-brewed" codec. Read optimisations based on what the user needs, such as the current split between termDocsEnum and termPositionsEnum (the first reads only the freq file, the second also reads the prox file), would be done under the hood by the respective PostingReader. Given the set of Attribute classes it receives, the PostingReader knows what it needs to read and what it can skip. This also simplifies the interface for the user, who no longer has to choose the right enum.
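To make the "under the hood" part concrete, here is a minimal sketch of how a reader could pick the files to open from the requested attribute classes. Everything in it is hypothetical: the marker interfaces only reuse the names from this thread, PostingReader.filesToRead is an invented helper, and keeping positions, offsets and payloads together in the prox file is an assumption made for the example, not a statement about the actual index format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AttributeDrivenPostings {
    // Hypothetical marker types; they only mirror the names used in this thread.
    interface Attribute {}
    interface PositionAttribute extends Attribute {}
    interface OffsetAttribute extends Attribute {}
    interface PayloadAttribute extends Attribute {}

    /** Hypothetical reader that derives the files to open from the requested attributes. */
    static final class PostingReader {
        @SafeVarargs
        static List<String> filesToRead(Class<? extends Attribute>... atts) {
            Set<Class<? extends Attribute>> requested = new HashSet<>(Arrays.asList(atts));
            List<String> files = new ArrayList<>();
            // Doc ids and frequencies are needed by every enum.
            files.add("freq");
            // Assumption: positions, offsets and payloads all live in the prox file.
            if (requested.contains(PositionAttribute.class)
                    || requested.contains(OffsetAttribute.class)
                    || requested.contains(PayloadAttribute.class)) {
                files.add("prox");
            }
            return files;
        }
    }

    public static void main(String[] args) {
        // No attributes requested: behaves like a plain docs enum.
        System.out.println(PostingReader.filesToRead());
        // Offsets and payloads requested: the prox file is read as well.
        System.out.println(PostingReader.filesToRead(OffsetAttribute.class, PayloadAttribute.class));
    }
}
```

The caller only states which attributes it wants; the decision about which files to touch stays inside the reader, so a custom codec could make a different choice without changing the calling code.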
I am not sure if this is very good in Lucene as it would break lots of apps. 
E.g. simple autocompletes use PrefixTerm(s)Enums, but must use the top-level 
reader or they have to emulate merging of all TermsEnums themselves. A second 
problem (currently) is rewrites (e.g. Fuzzy) to BooleanQuery for MTQs. They 
operate on the top level reader.

So I propose "simple" and not so performant Enums for MultiReaders. In my 
opinion, it would also be possible without ProxyAttributes, if we simply copy them 
around. It’s a performance problem, but if somebody needs speed, segment-level enums 
should be used (and search does this by the way).
Could you provide pointers to search code that uses segment-level enums? As I explained in my last answer to Michael, the TermScorer uses the DocsEnum interface and therefore does not know whether it is manipulating a segment-level enum or a Multi*Enum. Which searches (or query operators) in Lucene use segment-level enums?
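To illustrate what I mean, here is a small self-contained sketch. It is not Lucene code: DocsEnum, SegmentDocsEnum and MultiDocsEnum below are simplified, hypothetical stand-ins, and collect plays the role of a scorer. The consumer is written against the interface only, so it cannot tell whether it was handed a segment-level enum or one that merges several segments by adding a doc base to each segment-local id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiEnumSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical minimal postings iterator; not Lucene's actual DocsEnum. */
    interface DocsEnum {
        int nextDoc(); // returns NO_MORE_DOCS once exhausted
    }

    /** Segment-level enum over a sorted array of segment-local doc ids. */
    static final class SegmentDocsEnum implements DocsEnum {
        private final int[] docs;
        private int upto = 0;

        SegmentDocsEnum(int... docs) { this.docs = docs; }

        public int nextDoc() { return upto < docs.length ? docs[upto++] : NO_MORE_DOCS; }
    }

    /** Top-level enum: walks the segments in order and adds each segment's doc base. */
    static final class MultiDocsEnum implements DocsEnum {
        private final List<DocsEnum> subs;
        private final int[] docBases;
        private int current = 0;

        MultiDocsEnum(List<DocsEnum> subs, int[] docBases) {
            this.subs = subs;
            this.docBases = docBases;
        }

        public int nextDoc() {
            while (current < subs.size()) {
                int doc = subs.get(current).nextDoc();
                if (doc != NO_MORE_DOCS) {
                    return docBases[current] + doc; // remap to a top-level doc id
                }
                current++; // this segment is exhausted, move to the next one
            }
            return NO_MORE_DOCS;
        }
    }

    /** Stand-in for a scorer: it only ever sees the DocsEnum interface. */
    static List<Integer> collect(DocsEnum docsEnum) {
        List<Integer> out = new ArrayList<>();
        for (int doc = docsEnum.nextDoc(); doc != NO_MORE_DOCS; doc = docsEnum.nextDoc()) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new SegmentDocsEnum(0, 2))); // prints [0, 2]
        DocsEnum multi = new MultiDocsEnum(
                Arrays.asList(new SegmentDocsEnum(0, 2), new SegmentDocsEnum(1)),
                new int[] {0, 3});
        System.out.println(collect(multi)); // prints [0, 2, 4]
    }
}
```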

Cheers
--
Renaud Delbru
