[jira] Issue Comment Edited: (LUCENE-1410) PFOR implementation

Renaud Delbru (JIRA) Wed, 20 Jan 2010 06:42:19 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801813#action_12801813
 ]


Renaud Delbru edited comment on LUCENE-1410 at 1/20/10 2:41 PM:
----------------------------------------------------------------

Hi,

I have run some benchmarks using the PFOR index I/O interface in order to 
check whether the index reader and block reader add too much overhead (I was 
afraid that the block reading interface would add enough overhead to lose the 
decompression speed benefits of block-based compression).

In this benchmark, I have just tested FOR (and not PFOR) using various block 
sizes. The benchmark is set up as follows:
- I generate an integer array of size 33554432, containing uniformly 
distributed integers (0 <= x < 65535)
- I compress the data using PFORDeltaIndexOutput (in fact, a similar class 
that I modified for this benchmark in order to support various block sizes)
- I measure the time to decompress the full list of integers using 
PFORDeltaIndexInput#next()
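
As an illustration only, the three steps above can be sketched with a simplified, self-contained Frame of Reference codec. This is not the code from the patch: the class `ForSketch`, its methods, the `[min, bitWidth, packed words...]` block layout, and the smaller array size are all assumptions made for this sketch. It only shows the shape of the generate / compress / time-the-decompression loop.

```java
import java.util.Random;

/** Simplified Frame of Reference (FOR) codec; a hypothetical sketch, not the patch code. */
final class ForSketch {

    /** Packs one non-empty block of non-negative ints as [min, bitWidth, packed words...]. */
    static long[] encodeBlock(int[] values, int offset, int length) {
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int i = offset; i < offset + length; i++) {
            min = Math.min(min, values[i]);
            max = Math.max(max, values[i]);
        }
        // Bits needed for the largest distance from the reference value (at least 1).
        int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(max - min));
        long[] out = new long[2 + (length * bits + 63) / 64];
        out[0] = min;
        out[1] = bits;
        long bitPos = 0;
        for (int i = offset; i < offset + length; i++) {
            long delta = values[i] - min;
            int word = 2 + (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            out[word] |= delta << shift;
            if (shift + bits > 64) {
                out[word + 1] |= delta >>> (64 - shift); // value straddles two words
            }
            bitPos += bits;
        }
        return out;
    }

    /** Unpacks one block produced by encodeBlock into dest[offset .. offset + length). */
    static void decodeBlock(long[] block, int[] dest, int offset, int length) {
        int min = (int) block[0];
        int bits = (int) block[1];
        long mask = (1L << bits) - 1;
        long bitPos = 0;
        for (int i = 0; i < length; i++) {
            int word = 2 + (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long delta = block[word] >>> shift;
            if (shift + bits > 64) {
                delta |= block[word + 1] << (64 - shift);
            }
            dest[offset + i] = min + (int) (delta & mask);
            bitPos += bits;
        }
    }

    /** Encodes all values; assumes values.length is a multiple of blockSize. */
    static long[][] encodeAll(int[] values, int blockSize) {
        long[][] blocks = new long[values.length / blockSize][];
        for (int b = 0; b < blocks.length; b++) {
            blocks[b] = encodeBlock(values, b * blockSize, blockSize);
        }
        return blocks;
    }

    /** Decodes every block back into dest, one block after another. */
    static void decodeAll(long[][] blocks, int[] dest, int blockSize) {
        for (int b = 0; b < blocks.length; b++) {
            decodeBlock(blocks[b], dest, b * blockSize, blockSize);
        }
    }

    public static void main(String[] args) {
        int n = 1 << 22; // assumption: smaller than the 33554432 ints of the original run
        Random random = new Random(42);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextInt(65535); // uniform, 0 <= x < 65535
        }
        int[] decoded = new int[n];
        for (int blockSize = 32; blockSize <= 32768; blockSize *= 2) {
            long[][] blocks = encodeAll(values, blockSize);
            long start = System.nanoTime();
            decodeAll(blocks, decoded, blockSize);
            long nanos = Math.max(1, System.nanoTime() - start);
            System.out.printf("BlockSize = %d: %.0f kints/msec%n", blockSize, n * 1000.0 / nanos);
        }
    }
}
```

The sketch decodes straight from in-memory arrays, so it leaves out the index input and block reader layers whose overhead the benchmark is meant to measure; its numbers are not comparable with the results reported here.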

I have performed a similar test using a basic IndexInput/Output with VInt 
encoding. The performance was 39 kints/msec.
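
For readers unfamiliar with VInt, here is a minimal sketch of the variable-length format: seven payload bits per byte, low-order group first, with the high bit set when more bytes follow. The class `VIntSketch` and its array-based methods are hypothetical and stand in for the IndexInput/IndexOutput calls used in the test; they are not Lucene's API.

```java
import java.util.Arrays;

/** Stand-alone VInt encoder/decoder; a hypothetical sketch, not Lucene's classes. */
final class VIntSketch {

    /** Encodes each int as 1 to 5 bytes, 7 bits per byte, low-order bits first. */
    static byte[] encode(int[] values) {
        byte[] buf = new byte[values.length * 5]; // worst case: 5 bytes per int
        int pos = 0;
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                buf[pos++] = (byte) ((v & 0x7F) | 0x80); // high bit: more bytes follow
                v >>>= 7;
            }
            buf[pos++] = (byte) v;
        }
        return Arrays.copyOf(buf, pos);
    }

    /** Decodes count ints; note the data-dependent branch taken for every byte read. */
    static int[] decode(byte[] bytes, int count) {
        int[] out = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            byte b = bytes[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
            }
            out[i] = v;
        }
        return out;
    }
}
```

The per-byte continuation check in `decode` is one plausible reason why this encoding decodes more slowly than a fixed bit width per block, although the benchmark here does not isolate that effect.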

For FrameOfRef, the best block size seems to be 4096 (256 - 270 kints/msec); 
larger block sizes do not provide a significant performance improvement.

We can observe that FOR provides a ~7 times read performance increase over 
VInt (256 - 270 vs. 39 kints/msec). To conclude, it looks like the reader and 
block reader interface do not add too much overhead ;o).

P.S.: It is possible that these results are not totally correct. I'll try to 
double check the code, and upload it here. 

Results:
{code}
BlockSize = 32
FrameOfRef 0 decompressed 33554432 in 368 msecs, 91 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 314 msecs, 106 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 294 msecs, 114 kints/msec, (1 iters).
BlockSize = 64
FrameOfRef 0 decompressed 33554432 in 242 msecs, 138 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 239 msecs, 140 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 237 msecs, 141 kints/msec, (1 iters).
BlockSize = 128
FrameOfRef 0 decompressed 33554432 in 223 msecs, 150 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 228 msecs, 147 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 224 msecs, 149 kints/msec, (1 iters).
BlockSize = 256
FrameOfRef 0 decompressed 33554432 in 219 msecs, 153 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 218 msecs, 153 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 219 msecs, 153 kints/msec, (1 iters).
BlockSize = 512
FrameOfRef 0 decompressed 33554432 in 170 msecs, 197 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 176 msecs, 190 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 173 msecs, 193 kints/msec, (1 iters).
BlockSize = 1024
FrameOfRef 0 decompressed 33554432 in 136 msecs, 246 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 139 msecs, 241 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 147 msecs, 228 kints/msec, (1 iters).
BlockSize = 2048
FrameOfRef 0 decompressed 33554432 in 133 msecs, 252 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 135 msecs, 248 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 139 msecs, 241 kints/msec, (1 iters).
BlockSize = 4096
FrameOfRef 0 decompressed 33554432 in 124 msecs, 270 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 131 msecs, 256 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 131 msecs, 256 kints/msec, (1 iters).
BlockSize = 8192
FrameOfRef 0 decompressed 33554432 in 126 msecs, 266 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 128 msecs, 262 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 127 msecs, 264 kints/msec, (1 iters).
BlockSize = 16384
FrameOfRef 0 decompressed 33554432 in 127 msecs, 264 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 125 msecs, 268 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 129 msecs, 260 kints/msec, (1 iters).
BlockSize = 32768
FrameOfRef 0 decompressed 33554432 in 123 msecs, 272 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 132 msecs, 254 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 135 msecs, 248 kints/msec, (1 iters).
{code}

----
EDIT: Here are new results comparing FOR and a block-based VInt using various 
block sizes. The decompression loop is repeated until it has run for more than 
300 ms (for the JIT effect, see post below). The loop recreates a new block 
reader each time (which causes some overhead, as you can see with the FOR 
performance results, compared to the ones below).
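
A minimal sketch of such a repeat-until-elapsed measurement loop is given here, purely as an illustration. The class `RepeatSketch`, its method names, and the use of `Runnable` for the decompression pass are assumptions of this sketch, not the benchmark code.

```java
/** Repeats a task until a minimum wall-clock time has passed; hypothetical helper. */
final class RepeatSketch {

    /** Runs task at least once and until minMillis have elapsed; returns {iterations, elapsedNanos}. */
    static long[] repeatFor(Runnable task, long minMillis) {
        long minNanos = minMillis * 1_000_000L;
        long start = System.nanoTime();
        long iters = 0;
        long elapsed;
        do {
            task.run(); // e.g. create a block reader and decompress the whole list
            iters++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        return new long[] {iters, elapsed};
    }

    /** Throughput in thousands of ints per millisecond over all iterations. */
    static double kintsPerMsec(long intsPerIter, long iters, long elapsedNanos) {
        return (double) intsPerIter * iters * 1000.0 / elapsedNanos;
    }
}
```

For example, `RepeatSketch.repeatFor(task, 300)` keeps calling `task` until at least 300 ms have passed, so that early, not yet JIT-compiled iterations weigh less in the reported average.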

||Block size||32||64||128||256||512||1024||2048||4096||8192||16384||32768||
|VInt (kints/msec)|28|30|30|31|48|65|84|94|100|104|104|
|FOR (kints/msec)|104|126|131|132|164|195|202|214|217|220|223| 
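
To read the table as a speedup, the FOR row can be divided by the VInt row for each block size. The short program below does only that arithmetic on the numbers from the table; the class and field names are made up for this illustration.

```java
/** Computes FOR / VInt throughput ratios from the table above; names are hypothetical. */
final class SpeedupSketch {
    static final int[] BLOCK_SIZES = {32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768};
    static final int[] VINT_KINTS = {28, 30, 30, 31, 48, 65, 84, 94, 100, 104, 104};
    static final int[] FOR_KINTS = {104, 126, 131, 132, 164, 195, 202, 214, 217, 220, 223};

    /** Returns the FOR / VInt ratio for each block size, in table order. */
    static double[] speedups() {
        double[] ratios = new double[BLOCK_SIZES.length];
        for (int i = 0; i < ratios.length; i++) {
            ratios[i] = (double) FOR_KINTS[i] / VINT_KINTS[i];
        }
        return ratios;
    }

    public static void main(String[] args) {
        double[] ratios = speedups();
        for (int i = 0; i < ratios.length; i++) {
            System.out.printf("BlockSize = %d: FOR is %.2fx VInt%n", BLOCK_SIZES[i], ratios[i]);
        }
    }
}
```

With these numbers the ratio lies between about 2.1 and 4.4, and it is smaller at the larger block sizes, where the block-based VInt also benefits from block reading.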

> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, LUCENE-1410-codecs.tar.bz2, 
> LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, 
> LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, 
> TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

