On Fri, Aug 24, 2012 at 6:36 PM, Marvin Humphrey <[email protected]> wrote:
> On Fri, Aug 24, 2012 at 11:33 AM, Peter Karman <[email protected]> wrote:
>> Just ran across this:
>>
>> http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html
>>
>> comments on whether there is anything to be learned here for Lucy?

I just read through the JIRA issue for this, and had trouble
distinguishing their chosen algorithm from the Java- and
Lucene-specific details.  As regards Lucy, I think the main applicable
conclusion was that a straight "FOR" (Frame of Reference) approach was
slightly more efficient than "PFOR" (Patched Frame of Reference), and
only slightly worse in index size.  But I'm not sure how much of that
is due to the specific decoder they are using versus anything inherent
to the patching.  On the other hand, since the simple FOR (which I
think is the same as what they are calling PackedInt) is conceptually
simpler and the index sizes are not significantly larger, this may be
moot.

> As far as what we might have learned from the Lucene implementation, I think
> there are two things:
>
> 1.  Block posting formats are more suitable for modern pipelining processors
>     than the bytewise compression we use now.

I'd agree.  I was interested in an observation in the commentary
guessing that they were running up against memory bandwidth (rather
than CPU) as a limiting factor.  I think this will definitely hold for
Lucy: at a certain point improvements in decoding efficiency don't
matter if the CPU is starved for data.  Raw ints are definitely memory
bound, variable length ints are CPU bound, and the various FOR and
PFOR implementations straddle the line.  The goal is not just to
decode quickly, or even to selectively decode, but to never even fetch
the blocks you don't need.  Operating in C, I think/hope we may have
an advantage here as we can hint to the processor on prefetching.

> 2.  Pluggable posting formats are the wrong level of abstraction.
>
> If we transition to a block posting format like PFORdelta, we will save on
> decoding posting lists, but gains will be limited because we still have to
> access the decoded integers one at a time inside TermMatchers wrapped in
> expensive ORMatchers, etc.  (If you look at profiling data for our
> scoring/matching methods, the bottlenecks are not in InStream, but in higher
> level Matchers like OrMatcher etc.)
>
> Java has an advantage here because hotspot compilers enable inlining of method
> calls in a way we can't duplicate from C -- so those higher level calls may
> get streamlined away to an extent.  However, we can change our architecture so
> that it becomes possible to inline procedures manually.
>
> If we really want to tear it up, I believe we need to implement the
> "MatchEngine" level of abstraction, as proposed on this list a while back.

That's an interesting point about the advantage of a hotspot compiler
-- once you have it in place, it does give you a lot for free.  I'm
not sure that we actually need to inline our calls, though, so much as
limit the number of layers to something reasonable.  I think we can
match their performance just by being sensible, and pull ahead by
integrating more closely with the metal at the very bottom.

I wouldn't give up on pluggable posting formats yet.

I still think that there is a fine solution where we define an
interchange data format that each plugin decodes to, and each scorer
knows how to read from.  Paying particular attention to cache lines,
the plugin fetches a compressed format from memory, and decodes it to
a static buffer with as few mispredicted branches as possible.  The
scorer then operates on this decoded block directly, no layers
involved.  So long as this static buffer lives in L1, the matching
runs at CPU speed rather than RAM speed, and by having a fixed format,
the scorers can be optimized in advance.  If we can sufficiently index
into our format so as to avoid unnecessary reads/decodes, I think we
can fly, no hotspot or MatchEngine required.

--nate
