On Sun, Jan 13, 2013 at 7:15 PM, Marvin Humphrey <[email protected]> wrote:
> The problem with our current system is that even if we migrate to a block
> encoding for our inverted lists, we can only use a fraction of the block on
> each call to Matcher_Next().  Not only would we have to buffer the decoded
> integers in memory between matches, but we would still continue to make a
> large number of method calls for each match.

I think I understand you, but I'm not sure how much of an issue the
buffering is.   Maybe this has to do with what you mean by "in memory"
--- I think it would be very rare that the decoded integers actually
get written to RAM.  The buffer would be at most a few K.  It would
depend on how much data we access while scoring the match, but for an i7
unless we flush through the 8MB L3 cache, the decoded data will stay
on the CPU.  I even think it will usually stay within our 64K L1.
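
To make the size concrete, here's a minimal sketch (in C, since that's
what we're writing) of the kind of block decode I have in mind.
BLOCK_SIZE, decode_block(), and the flat delta layout are all
hypothetical, just for illustration:

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 128  /* hypothetical block size */

/* Decode one block of delta-encoded doc IDs into a caller-supplied
 * buffer.  At 128 x 4 bytes = 512 bytes, the buffer fits comfortably
 * in a 64K L1 data cache, so between calls the decoded integers
 * should rarely be written back to RAM at all. */
static void
decode_block(const uint32_t *deltas, uint32_t base, uint32_t *doc_ids) {
    uint32_t doc = base;
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        doc += deltas[i];     /* undo the delta encoding */
        doc_ids[i] = doc;
    }
}
```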

For the second part, are you distinguishing between method calls and
function overhead here?  While we should avoid overhead when we can, I
think the cost of the scoring is going to be dominated by the memory
access to the position data.  While waiting for each random access to
memory (and unless we are crafty, each doc-term scored will incur at
least one) we could execute almost a thousand (pipelined) instructions.
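
The back-of-envelope arithmetic behind "almost a thousand", with all
three inputs being round-number assumptions rather than measurements:

```c
/* Rough estimate of how many instructions an i7-class core could
 * retire while one random DRAM access is outstanding.  The latency,
 * clock, and IPC figures are illustrative assumptions only. */
static double
instructions_per_miss(double dram_latency_ns, double clock_ghz, double ipc) {
    /* ns * (cycles/ns) * (instructions/cycle) = instructions */
    return dram_latency_ns * clock_ghz * ipc;
}
/* e.g. ~80 ns latency, 3 GHz clock, peak issue width of 4:
 * instructions_per_miss(80.0, 3.0, 4.0) -> 960 */
```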

> A couple months back, Nate proposed an "interchange format" which posting list
> decoders could write into at high speed:

I'm still a fan of this idea -- thanks for recalling the suggestion!

> This is helpful to minimize the cost of decoding posting lists, but it does
> not address the cost of doc-at-a-time Matcher method-call overhead.

A potential gain here would be to parallelize the initial memory
accesses to the term position data.   Perhaps we're doing it already,
but as soon as we know we have a match for which we will need position
data, we should access it (or issue a prefetch) rather than waiting
for each subscorer to run sequentially.
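
Something along these lines, using the GCC/Clang prefetch builtin.
The SubScorer struct and prefetch_positions() are hypothetical names,
just to show the shape of it:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: once we know a doc matches and we'll need
 * position data from several subscorers, issue prefetches for all of
 * them up front so the random memory accesses overlap in flight
 * instead of serializing, one cache miss per subscorer. */
typedef struct {
    const uint8_t *positions;  /* pointer into the positions data */
} SubScorer;

static void
prefetch_positions(const SubScorer *subs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* read access (0), low temporal locality (1) */
        __builtin_prefetch(subs[i].positions, 0, 1);
    }
    /* ... then run each subscorer sequentially; its first position
     * access should already be on its way to the cache. */
}
```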

> The layout of BucketTable could take many forms.  In Lucene, it's a fixed size
> object with set fields -- but we might choose to make it a block of opaque
> memory with private space allocated for every subscorer... or in some
> configurations, it could even be a bit vector[2]. :)

Could you explain further, perhaps with a more detailed example?    Is
the gain that each subscorer can score multiple subqueries in a loop,
rather than being called by a loop?
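
For reference, my current mental model of the bucket-table variant is
something like the following (all names hypothetical, and quite
possibly not what you mean): each subscorer sweeps its own postings in
a tight loop, accumulating into a shared table, and a final pass reads
the table out.

```c
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE 1024  /* one window of doc IDs; hypothetical */

/* Shared interchange table: one bucket per doc ID in the current
 * window.  Subscorers write into it at full speed rather than being
 * driven one doc at a time through Matcher_Next() calls. */
typedef struct {
    float    score;     /* accumulated partial score           */
    uint32_t matchers;  /* bitmask of subscorers that matched  */
} Bucket;

static void
collect(Bucket *table, uint32_t sub_id, const uint32_t *docs,
        const float *scores, size_t n, uint32_t window_base) {
    for (size_t i = 0; i < n; i++) {
        Bucket *b = &table[docs[i] - window_base];
        b->score    += scores[i];
        b->matchers |= 1u << sub_id;
    }
}
```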

--nate
