[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801813#action_12801813 ]
Renaud Delbru edited comment on LUCENE-1410 at 1/20/10 2:41 PM:
----------------------------------------------------------------

Hi,

I have run some benchmarks using the PFOR index I/O interface to check whether the index reader and block reader add too much overhead. (I was afraid that the block-reading interface would add so much overhead that it would cancel out the decompression-speed benefits of block-based compression.) In this benchmark, I have only tested FOR (not PFOR), using various block sizes.

The benchmark is set up as follows:
- I generate an array of 33554432 (= 2^25) integers, uniformly distributed in the range 0 <= x < 65535.
- I compress the data using PFORDeltaIndexOutput (in fact, a similar class that I modified for this benchmark to support various block sizes).
- I measure the time to decompress the full list of integers using PFORDeltaIndexInput#next().

I have performed a similar test using a basic IndexInput/Output with VInt encoding. Its performance was 39 kints/msec.

For FrameOfRef, the best block size seems to be 4096 (256-270 kints/msec); larger block sizes do not provide a significant further improvement. FOR therefore gives a ~7x increase in read performance over VInt (about 270 vs. 39 kints/msec).

To conclude, it looks like the reader and block reader interfaces do not add too much overhead ;o).

P.S.: These results may not be entirely correct. I'll double-check the code and upload it here.

Results:
{code}
BlockSize = 32
FrameOfRef 0 decompressed 33554432 in 368 msecs, 91 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 314 msecs, 106 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 294 msecs, 114 kints/msec, (1 iters).
BlockSize = 64
FrameOfRef 0 decompressed 33554432 in 242 msecs, 138 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 239 msecs, 140 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 237 msecs, 141 kints/msec, (1 iters).
BlockSize = 128
FrameOfRef 0 decompressed 33554432 in 223 msecs, 150 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 228 msecs, 147 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 224 msecs, 149 kints/msec, (1 iters).
BlockSize = 256
FrameOfRef 0 decompressed 33554432 in 219 msecs, 153 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 218 msecs, 153 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 219 msecs, 153 kints/msec, (1 iters).
BlockSize = 512
FrameOfRef 0 decompressed 33554432 in 170 msecs, 197 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 176 msecs, 190 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 173 msecs, 193 kints/msec, (1 iters).
BlockSize = 1024
FrameOfRef 0 decompressed 33554432 in 136 msecs, 246 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 139 msecs, 241 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 147 msecs, 228 kints/msec, (1 iters).
BlockSize = 2048
FrameOfRef 0 decompressed 33554432 in 133 msecs, 252 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 135 msecs, 248 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 139 msecs, 241 kints/msec, (1 iters).
BlockSize = 4096
FrameOfRef 0 decompressed 33554432 in 124 msecs, 270 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 131 msecs, 256 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 131 msecs, 256 kints/msec, (1 iters).
BlockSize = 8192
FrameOfRef 0 decompressed 33554432 in 126 msecs, 266 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 128 msecs, 262 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 127 msecs, 264 kints/msec, (1 iters).
BlockSize = 16384
FrameOfRef 0 decompressed 33554432 in 127 msecs, 264 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 125 msecs, 268 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 129 msecs, 260 kints/msec, (1 iters).
BlockSize = 32768
FrameOfRef 0 decompressed 33554432 in 123 msecs, 272 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 132 msecs, 254 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 135 msecs, 248 kints/msec, (1 iters).
{code}

----
EDIT: Here are new results comparing FOR and a block-based VInt using various block sizes. The decompression loop is repeated until it has run for more than 300 ms, to account for JIT effects (see the post below). The loop creates a new block reader on each iteration, which adds some overhead, as you can see by comparing the FOR results with the ones below.

||Block size||32||64||128||256||512||1024||2048||4096||8192||16384||32768||
|VInt (kints/msec)|28|30|30|31|48|65|84|94|100|104|104|
|FOR (kints/msec)|104|126|131|132|164|195|202|214|217|220|223|
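For reference, a frame-of-reference block codec can be sketched roughly as follows. This is a hypothetical illustration (the class `ForBlockCodec` and its `encode`/`decode` methods are made up here), not the PFORDeltaIndexOutput/PFORDeltaIndexInput code used in the benchmark: each block stores its minimum as the reference value and every int as an offset from it, packed with the smallest bit width that fits the largest offset. It assumes no delta coding and no PFOR exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical FOR block codec (no deltas, no PFOR exceptions); illustration only. */
public final class ForBlockCodec {

    /** One encoded block: reference value (block minimum), bit width, packed offsets. */
    public static final class Block {
        public final int base;
        public final int bits;
        public final int size;
        public final long[] packed;

        Block(int base, int bits, int size, long[] packed) {
            this.base = base;
            this.bits = bits;
            this.size = size;
            this.packed = packed;
        }
    }

    /** Encodes values[from .. from+size) as fixed-width offsets from the block minimum. */
    public static Block encode(int[] values, int from, int size) {
        if (size <= 0) throw new IllegalArgumentException("empty block");
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int i = from; i < from + size; i++) {
            min = Math.min(min, values[i]);
            max = Math.max(max, values[i]);
        }
        // Bit width of the largest offset; 0 when every value in the block is equal.
        int bits = 64 - Long.numberOfLeadingZeros((long) max - min);
        long[] packed = new long[(int) (((long) size * bits + 63) / 64)];
        long bitPos = 0;
        for (int i = 0; bits > 0 && i < size; i++) {
            long offset = (long) values[from + i] - min;  // 0 <= offset < 2^bits
            int word = (int) (bitPos >>> 6), shift = (int) (bitPos & 63);
            packed[word] |= offset << shift;
            if (shift + bits > 64) packed[word + 1] |= offset >>> (64 - shift);  // straddles two words
            bitPos += bits;
        }
        return new Block(min, bits, size, packed);
    }

    /** Decodes a block into out[outFrom .. outFrom+size). */
    public static void decode(Block b, int[] out, int outFrom) {
        long mask = (1L << b.bits) - 1;  // bits <= 32, so no overflow; 0 when bits == 0
        long bitPos = 0;
        for (int i = 0; i < b.size; i++) {
            long offset = 0;
            if (b.bits > 0) {
                int word = (int) (bitPos >>> 6), shift = (int) (bitPos & 63);
                offset = b.packed[word] >>> shift;
                if (shift + b.bits > 64) offset |= b.packed[word + 1] << (64 - shift);
                offset &= mask;
            }
            out[outFrom + i] = (int) (b.base + offset);
            bitPos += b.bits;
        }
    }

    public static void main(String[] args) {
        int n = 1 << 20, blockSize = 4096;  // smaller than the 2^25 ints used in the benchmark
        int[] values = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) values[i] = rnd.nextInt(65535);  // 0 <= x < 65535
        List<Block> blocks = new ArrayList<>();
        for (int from = 0; from < n; from += blockSize) {
            blocks.add(encode(values, from, Math.min(blockSize, n - from)));
        }
        int[] decoded = new int[n];
        long start = System.nanoTime();
        int pos = 0;
        for (Block b : blocks) {
            decode(b, decoded, pos);
            pos += b.size;
        }
        double ms = (System.nanoTime() - start) / 1e6;
        System.out.printf("round trip ok=%b, %.0f kints/msec (single cold run)%n",
                Arrays.equals(values, decoded), n / ms / 1000);
    }
}
```

With values uniformly distributed in 0 <= x < 65535, practically every block needs 16 bits per value regardless of block size, so the differences between block sizes in the logs are presumably due to per-block overhead rather than better compression.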
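The VInt baseline can be sketched in the same spirit. The sketch assumes the usual Lucene VInt byte layout (7 payload bits per byte, least-significant group first, high bit set when more bytes follow); `VIntBlock`, `writeVInt` and `readBlock` are hypothetical names, and `readBlock` only approximates what a block-based VInt reader might do by decoding `count` values per call.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical VInt baseline: 7 payload bits per byte, high bit = "more bytes follow". */
public final class VIntBlock {

    /** Writes value (treated as unsigned) at buf[pos..]; returns the position after it. */
    public static int writeVInt(byte[] buf, int pos, int value) {
        while ((value & ~0x7F) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }

    /** Decodes count VInts starting at buf[pos] into out[0..count); returns the new position. */
    public static int readBlock(byte[] buf, int pos, int[] out, int count) {
        for (int i = 0; i < count; i++) {
            byte b = buf[pos++];
            int value = b & 0x7F;
            for (int shift = 7; b < 0; shift += 7) {  // b < 0 means the high bit is set
                b = buf[pos++];
                value |= (b & 0x7F) << shift;
            }
            out[i] = value;
        }
        return pos;
    }

    public static void main(String[] args) {
        int n = 1 << 20, blockSize = 4096;
        int[] values = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) values[i] = rnd.nextInt(65535);
        byte[] buf = new byte[n * 5];  // a 32-bit int needs at most 5 VInt bytes
        int end = 0;
        for (int v : values) end = writeVInt(buf, end, v);
        int[] decoded = new int[n];
        int[] block = new int[blockSize];
        int pos = 0;
        for (int from = 0; from < n; from += blockSize) {
            int count = Math.min(blockSize, n - from);
            pos = readBlock(buf, pos, block, count);
            System.arraycopy(block, 0, decoded, from, count);
        }
        System.out.println("bytes=" + end + ", round trip ok=" + Arrays.equals(values, decoded));
    }
}
```

For the benchmark's value range each int takes 1 to 3 bytes (values >= 16384 need 3), and decoding involves a data-dependent branch per byte, which is one plausible reason VInt trails FOR here.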
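The measurement method described in the EDIT (repeat the full decode until more than 300 ms have elapsed, creating a new reader on every run) could look like this in a minimal harness. `DecodeBench`, `kintsPerMsec`, `toKintsPerMsec` and `ArrayReader` are hypothetical; `ArrayReader` reads uncompressed ints and only stands in for a FOR or VInt block reader. Throughput uses the same unit as the logs: ints decoded / milliseconds / 1000 = kints/msec.

```java
import java.util.Random;
import java.util.function.IntSupplier;

/** Hypothetical decode-benchmark harness; names and structure are illustrative only. */
public final class DecodeBench {

    /** Keeps decode results reachable so the JIT cannot eliminate the decode loop. */
    static volatile int sink;

    /** Converts a number of ints decoded in the given time to kints/msec. */
    public static double toKintsPerMsec(long ints, double millis) {
        return ints / millis / 1000.0;
    }

    /**
     * Calls decodeOnce (which decodes intsPerRun ints and returns a checksum) repeatedly
     * until at least minMillis have elapsed, then returns the average throughput.
     * Early iterations, before the JIT has compiled the loop, are included in the average.
     */
    public static double kintsPerMsec(int intsPerRun, long minMillis, IntSupplier decodeOnce) {
        long minNanos = minMillis * 1_000_000L;
        long start = System.nanoTime();
        long elapsed;
        int iterations = 0;
        int checksum = 0;
        do {
            checksum ^= decodeOnce.getAsInt();
            iterations++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        sink = checksum;
        return toKintsPerMsec((long) intsPerRun * iterations, elapsed / 1e6);
    }

    /** Hypothetical reader over uncompressed ints; a stand-in for a FOR or VInt block reader. */
    static final class ArrayReader {
        private final int[] data;
        private int pos;

        ArrayReader(int[] data) {
            this.data = data;
        }

        int next() {
            return data[pos++];
        }
    }

    public static void main(String[] args) {
        int n = 1 << 20;
        int[] data = new Random(42).ints(n, 0, 65535).toArray();
        double rate = kintsPerMsec(n, 300, () -> {
            ArrayReader reader = new ArrayReader(data);  // a fresh reader on every run
            int checksum = 0;
            for (int i = 0; i < n; i++) checksum += reader.next();
            return checksum;
        });
        System.out.printf("%.0f kints/msec%n", rate);
        // Cross-check against the BlockSize = 4096 log line: 33554432 ints in 124 msecs.
        System.out.printf("%.1f kints/msec%n", toKintsPerMsec(33554432L, 124.0));
    }
}
```

Because the timed region includes the early, not-yet-compiled iterations, this yields a rough average rather than a steady-state figure; separating warm-up runs from measured runs would give more stable numbers.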
> PFOR implementation
> -------------------
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Other
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: autogen.tgz, LUCENE-1410-codecs.tar.bz2, LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java
>
> Original Estimate: 21840h
> Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

--
This message is automatically generated by JIRA.