Re: Flex & Docs/AndPositionsEnum

Marvin Humphrey Wed, 10 Feb 2010 11:42:58 -0800

On Wed, Feb 10, 2010 at 12:33:27PM -0500, Michael McCandless wrote:

> In Lucene, skipping is done through the aggregator,


I had a look at MultiDocsEnum in the flex branch. It doesn't know when a
sub-enum is reading skip data.

> > I suppose another possibility would have been to have the aggregator
> > keep its own Posting and copy all data over from the
> > SegPostingList's Posting on each iteration then add its offset.
> 
> I think this is what Lucene does (?).  EG the aggregator holds its own
> "int doc" which it must copy to (adding the offset) from the
> underlying sub enum.
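For the primitive case, a minimal sketch of that pattern follows. It assumes a hypothetical `SimpleDocsEnum` interface (not Lucene's actual DocsEnum/MultiDocsEnum API): the aggregator keeps its own `int doc`, copies each sub-enum's segment-local doc id, and adds that segment's doc base. The sub-enum never sees the offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal docs enum; not Lucene's actual DocsEnum API.
interface SimpleDocsEnum {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
}

// Segment-level enum over a fixed array of segment-local doc ids.
class ArrayDocsEnum implements SimpleDocsEnum {
    private final int[] docs;
    private int upto = -1;

    ArrayDocsEnum(int... docs) { this.docs = docs; }

    @Override
    public int nextDoc() {
        upto++;
        return upto < docs.length ? docs[upto] : NO_MORE_DOCS;
    }
}

// Aggregator that holds its own "int doc" and adds each sub-enum's doc base.
class OffsetAggregator implements SimpleDocsEnum {
    private final List<SimpleDocsEnum> subs = new ArrayList<>();
    private final List<Integer> bases = new ArrayList<>();
    private int current = 0;
    private int doc = -1;

    void add(SimpleDocsEnum sub, int docBase) {
        subs.add(sub);
        bases.add(docBase);
    }

    @Override
    public int nextDoc() {
        while (current < subs.size()) {
            int subDoc = subs.get(current).nextDoc();
            if (subDoc != NO_MORE_DOCS) {
                // Copy the primitive and shift the copy; the sub-enum's state is untouched.
                doc = bases.get(current) + subDoc;
                return doc;
            }
            current++; // this sub-enum is exhausted; move on to the next segment
        }
        doc = NO_MORE_DOCS;
        return doc;
    }

    int docID() { return doc; }
}
```

With segments `{0, 3}` at base 0 and `{1, 2}` at base 10, the aggregate sequence is 0, 3, 11, 12.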

That's fine for a *primitive* type.  Modifying an int returned by a sub-enum
doesn't affect the sub-enum.  :)

The problem arises when there's an opaque *object* conveying data to the
consumer.  The aggregator knows everything there is to know about an int, but
it doesn't know what it needs to do to prepare an opaque object owned by the
sub-enum for consumption at the aggregate level.
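To illustrate the difference, here is a minimal sketch assuming a hypothetical `Posting` object that a segment-level enum owns, returns by reference, and reuses as its own delta-decoding base. Copying the int out and offsetting the copy is safe. Offsetting the shared object in place silently corrupts the sub-enum, because its next delta is applied on top of a value that already includes the aggregate offset. None of these names are real Lucene or Lucy classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Posting object owned and reused by a segment-level enum.
class Posting {
    int doc = 0;
}

// Segment-level enum that delta-decodes doc ids, using its own Posting.doc
// as the running base. The same Posting instance is returned every time.
class DeltaPostingEnum {
    private final int[] deltas;
    private int upto = 0;
    private final Posting posting = new Posting();

    DeltaPostingEnum(int... deltas) { this.deltas = deltas; }

    // Returns the shared Posting, or null when exhausted.
    Posting next() {
        if (upto >= deltas.length) return null;
        posting.doc += deltas[upto++]; // depends on the previous value of posting.doc
        return posting;
    }
}

class OffsetDemo {
    // Safe: read the int out of the sub-enum's Posting and offset the copy.
    static int[] copyThenOffset(DeltaPostingEnum sub, int base) {
        List<Integer> out = new ArrayList<>();
        for (Posting p = sub.next(); p != null; p = sub.next()) {
            out.add(p.doc + base);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // Unsafe: offset the shared object in place, corrupting the sub-enum's delta base.
    static int[] offsetInPlace(DeltaPostingEnum sub, int base) {
        List<Integer> out = new ArrayList<>();
        for (Posting p = sub.next(); p != null; p = sub.next()) {
            p.doc += base;
            out.add(p.doc);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

With deltas `2, 3` and a base of 100, the copying version yields 102, 105, while the in-place version yields 102, 205.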

> > However, that would have been a lot less efficient, and it still
> > wouldn't have worked for the "flat positions space" example because
> > the generic aggregator would not have known about the needs of the
> > specific codec.
> 
> But aggregator could also add the positions offset on each
> nextPosition() call, in Lucene.  Like that use case could be made to
> work, if Lucene had used a flat position space.

A generic aggregator wouldn't know that it needed to do that.  The postings
codec developer would be forced to write aggregation code in addition to
segment-level code.
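To make that burden concrete, here is a minimal sketch assuming a hypothetical flat position space in which each segment numbers positions from zero and the aggregate view needs a per-segment `positionBase`. The generic aggregator only knows about doc offsets, so it passes positions through unchanged; getting correct aggregate positions requires a codec-specific subclass. All class names here are hypothetical, not Lucene's flex API.

```java
// Hypothetical segment-level enum over a single doc, with segment-relative positions.
class SegEnum {
    private final int doc;
    private final int[] positions;
    private int upto = 0;

    SegEnum(int doc, int... positions) {
        this.doc = doc;
        this.positions = positions;
    }

    int doc() { return doc; }

    int nextPosition() { return positions[upto++]; }
}

// Generic aggregator: adds the doc base but knows nothing about position semantics.
class GenericAggregator {
    protected final SegEnum sub;
    protected final int docBase;

    GenericAggregator(SegEnum sub, int docBase) {
        this.sub = sub;
        this.docBase = docBase;
    }

    int doc() { return docBase + sub.doc(); }

    int nextPosition() { return sub.nextPosition(); } // passed through unchanged
}

// Aggregator the codec author would have to write for a flat position space.
class FlatPositionsAggregator extends GenericAggregator {
    private final int positionBase;

    FlatPositionsAggregator(SegEnum sub, int docBase, int positionBase) {
        super(sub, docBase);
        this.positionBase = positionBase;
    }

    @Override
    int nextPosition() { return positionBase + sub.nextPosition(); }
}
```

Both aggregators report the same doc; only the codec-specific one shifts positions into the flat space.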

Marvin Humphrey

