Re: Flex & Docs/AndPositionsEnum

Renaud Delbru Wed, 10 Feb 2010 04:59:04 -0800

On 10/02/10 09:47, Uwe Schindler wrote:

Positions as attributes would be good. For positions we need a new Attribute 
(not PositionIncrement), but e.g. for offsets and payloads we can use the 
standard attributes from the analysis, which is really cool. This would also 
make it possible to add all custom attributes from the analysis phase to the 
posting list and make them visible in the TermDocs enum. In my opinion, there 
should be no DocsEnum, DocsAndPositionsEnum and so on enums, just one class, 
which only differs in the provided attributes. So if you want the payloads, ask 
for a standard DocsEnum and pass the requested attribute classes as parameters:
        IndexReader.termDocsEnum(Bits skipDocs, String field, BytesRef term, Class<? extends Attribute>... atts)


If somebody wants offsets and payloads:
        reader.termDocsEnum(skipDocs, "field", term, OffsetAttribute.class, PayloadAttribute.class);

I kind of like this idea. This interface for iterating over the postings looks more flexible, and imho it will be easy to use with any "home-brewed" codec.

Read optimisations based on the user's needs, such as the current split between termDocsEnum and termPositionsEnum (where the first reads only the freq file and the second also reads the prox file), would be done under the hood by the respective PostingReader. Given the set of Attribute classes it receives, the PostingReader knows what it needs to read and what it does not. So the interface also becomes simpler for the user, who no longer has to take care of choosing the right enum.
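To make the point about the PostingReader concrete, here is a minimal sketch in plain Java. It is not Lucene code: the Attribute marker interfaces, the PostingsFile enum, SketchPostingsReader and filesToRead are all hypothetical names, and the sketch assumes that every per-position attribute (positions, offsets, payloads) lives in the prox file. It only shows how the set of requested attribute classes could drive which files get opened.

```java
import java.util.EnumSet;
import java.util.Set;

// Demo entry point. Everything in this file is hypothetical; none of it is Lucene API.
final class AttributeEnumSketch {
    public static void main(String[] args) {
        // Docs only: no per-position attribute requested.
        System.out.println(SketchPostingsReader.filesToRead());
        // Offsets and payloads requested, as in the example call above.
        System.out.println(SketchPostingsReader.filesToRead(
                OffsetAttribute.class, PayloadAttribute.class));
    }
}

// Stand-ins for the analysis attributes (assumed marker interfaces).
interface Attribute {}
interface PositionAttribute extends Attribute {} // the proposed new attribute for positions
interface OffsetAttribute extends Attribute {}
interface PayloadAttribute extends Attribute {}

// The two postings files mentioned in the discussion.
enum PostingsFile { FREQ, PROX }

final class SketchPostingsReader {
    // Assumption: all of these are stored per position, hence in the prox file.
    private static final Set<Class<? extends Attribute>> PER_POSITION =
            Set.of(PositionAttribute.class, OffsetAttribute.class, PayloadAttribute.class);

    // Decide which files must be read for the requested attributes.
    @SafeVarargs
    static Set<PostingsFile> filesToRead(Class<? extends Attribute>... atts) {
        // The freq file is always needed to iterate over documents.
        Set<PostingsFile> files = EnumSet.of(PostingsFile.FREQ);
        for (Class<? extends Attribute> att : atts) {
            if (PER_POSITION.contains(att)) {
                files.add(PostingsFile.PROX);
            }
        }
        return files;
    }
}
```

With no attributes the sketch reads only the freq file; as soon as a per-position attribute is requested it adds the prox file, which is the choice the user makes today by picking one enum or the other.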

I am not sure if this is very good in Lucene as it would break lots of apps. 
E.g. simple autocompletes use a PrefixTerm(s)Enums, but must use the top-level 
reader or they have to emulate merging of all TermsEnums themselves. A second 
problem (currently) is rewrites (e.g. Fuzzy) to BooleanQuery for MTQs. They 
operate on the top level reader.

So I propose "simple" and not so performant Enums for MultiReaders. In my 
opinion, it would also be possible without ProxyAttributes, if we simply copy them 
around. It’s a performance problem, but if somebody needs speed, segment-level enums 
should be used (and search does this by the way).

Could you provide pointers to search code that uses the segment-level enums? As I explained in my last answer to Michael, the TermScorer uses the DocsEnum interface and therefore does not know whether it is manipulating a segment-level enum or a Multi*Enum. Which searches (or query operators) in Lucene use segment-level enums?


Cheers
--
Renaud Delbru

