[
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636799#action_12636799
]
Michael McCandless commented on LUCENE-1410:
--------------------------------------------
bq. As for decompression speed, please remember that the patching code (that
decodes the exceptions into the output) has not yet been optimized at all.
But this is what I find so weird: for prx, if I fix the encoding at 6 bits,
which generates a zillion exceptions, we are ~31% faster than decoding vInts,
even though the pfor file (847 MB) is much bigger than the vInt one (621 MB).
But if instead I use the bit size returned by getNumFrameBits(), which yields
far fewer exceptions, we are 9.0% slower and the file is only a bit smaller
than vInt's. So either exception processing is very fast, or it's the
non-unrolled path (ForDecompress.decodeAnyFrame) that's slow, which could very
well be.
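To make that trade-off concrete, here is a rough sketch; countExceptions and chooseNumFrameBits are illustrative helpers I made up, not the code from the patch:

```java
// Illustrative sketch of the PFOR bit-width trade-off discussed above.
// These helpers are hypothetical, not the code from the patch.
public class FrameBitsSketch {

    // Count how many values in a frame would overflow numBits and
    // therefore have to be stored as patched exceptions.
    static int countExceptions(int[] frame, int numBits) {
        int max = (1 << numBits) - 1; // largest value that fits in numBits
        int exceptions = 0;
        for (int v : frame) {
            if (v > max) exceptions++;
        }
        return exceptions;
    }

    // Smallest bit width whose exception rate stays at or below a
    // threshold -- the kind of choice getNumFrameBits() has to make:
    // fewer bits means a smaller file but more exceptions to patch.
    static int chooseNumFrameBits(int[] frame, double maxExceptionRate) {
        for (int bits = 1; bits <= 31; bits++) {
            if (countExceptions(frame, bits) <= frame.length * maxExceptionRate) {
                return bits;
            }
        }
        return 32;
    }
}
```

Fixing 6 bits for prx is the extreme left end of this trade-off: tiny frames, but almost every large position becomes an exception.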
bq. The lucene .frq file contains the docids as deltas and the frequencies but
with a special encoding in case the frequency is one. I'd rather try
compressing the real delta docids and the real frequencies than the encoded
versions.
I'll try that. I bet if we had two separate streams (one for the docIDs and
one for the corresponding freqs) we'd get even better pFor compression. If we
did that "for real" in Lucene, it would also make it fast to use a
normally-indexed field for pure boolean searching (you wouldn't have to index 2
different fields, as you do today, to gain the performance at search time). In
fact, for AND queries on a normal Lucene index this should also yield faster
searching, since while intersecting for the common docIDs you don't initially
need the freq data.
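To make the two layouts concrete, here is a toy sketch (illustrative only, not the actual .frq writer) of the current-style interleaved stream vs the split streams proposed above:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the two postings layouts discussed above; this is
// illustrative, not Lucene's actual .frq writer.
public class PostingsLayoutSketch {

    // Interleaved stream in the current style: the docID delta is
    // shifted left by one, with the low bit set when freq == 1;
    // otherwise the freq follows as its own value.
    static List<Integer> interleave(int[] docIds, int[] freqs) {
        List<Integer> out = new ArrayList<>();
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            int delta = docIds[i] - prev;
            prev = docIds[i];
            if (freqs[i] == 1) {
                out.add((delta << 1) | 1);
            } else {
                out.add(delta << 1);
                out.add(freqs[i]);
            }
        }
        return out;
    }

    // Proposed split: plain docID deltas in one stream and plain freqs
    // in another, so each stream can be pFor-compressed on its own and
    // the freq stream can be skipped entirely for boolean/AND queries.
    static int[][] split(int[] docIds, int[] freqs) {
        int[] deltas = new int[docIds.length];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return new int[][] { deltas, freqs.clone() };
    }
}
```

The split version also keeps each stream's value distribution more uniform (small deltas in one, small freqs in the other), which is exactly what pFor likes.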
Marvin has been doing some performance testing recently and seems to have
concluded that keeping prx and frq as separate files (as Lucene does today
but KinoSearch does not) gives better performance. Pushing that same trend
further, I think it may very well make sense to separate docID and frq as
well.
bq. A: there is also the influence of the reduction of data to be fetched (via
various caches) from the index. As reported in the articles, PFor strikes a
nice optimum between decompression speed and fetching speed from (fast) disks.
True, but we are seeing only a small compression gain vs Lucene's current
encoding. I think splitting frq out from docID may show better compression.
{quote}
>: I was thinking local CPU's native asm.
A: I'd try a C version first. IIRC gcc has a decent optimizer for bit ops, but
it's been a while since I used C.
{quote}
Well... that would just make me depressed (if the CPU-level benefits of
pFor don't really "survive" from Javaland, but do from C-land) ;) But
yes, I agree.
bq. For the record, to show the variation in decompression speeds for different
numbers of frame bits without exceptions, here is my current output from
TestPFor:
I have similar results (up to 7 bits -- can you post your new TestPFor.java?).
The whole-byte sizes (16, 24) have very sizable leads over the others. The
9-bit size is fantastically slow; it's surprising that unrolling it isn't
helping. Seems like we will need to peek at the asm to understand what's going
on at the "bare metal" level...
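For reference, a generic (non-unrolled) n-bit unpacker looks something like the sketch below. This is illustrative, not the patch's decoder, but it shows one plausible reason for the gap: whole-byte widths never straddle a 32-bit word, while a width like 9 takes the straddle branch constantly:

```java
// Illustrative generic n-bit unpacker (not the patch's decoder). For
// widths like 8, 16 or 24 the offset math stays byte-aligned and a
// value never straddles a 32-bit word; for a width like 9, the
// straddle branch below fires over and over, which may be part of why
// the 9-bit case measures so slow.
public class UnpackSketch {
    static int[] unpack(int[] packed, int numBits, int count) {
        int[] out = new int[count];
        long mask = (1L << numBits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * numBits;
            int word = (int) (bitPos >>> 5);   // index of the 32-bit word
            int offset = (int) (bitPos & 31);  // bit offset within it
            long val = (packed[word] & 0xFFFFFFFFL) >>> offset;
            if (offset + numBits > 32) {       // value straddles two words
                val |= (packed[word + 1] & 0xFFFFFFFFL) << (32 - offset);
            }
            out[i] = (int) (val & mask);
        }
        return out;
    }
}
```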
> PFOR implementation
> -------------------
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Other
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: LUCENE-1410b.patch, TestPFor2.java, TestPFor2.java
>
> Original Estimate: 21840h
> Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]