[ https://issues.apache.org/jira/browse/LUCENE-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973328#comment-16973328 ]
Adrien Grand commented on LUCENE-9027:
--------------------------------------

I guess it depends on what kind of analytics. I expect counts to get the best speedup indeed, since they are only about decoding postings and counting matches. Other analytics use cases like facets still need to read values from a doc-values field, which likely drowns out the speedup a bit, as sorting does. We have some benchmarks for Elasticsearch that run nightly at https://benchmarks.elastic.co, but most analytics queries like histograms or terms facets run on a match_all query, so I'm not expecting to see any speedup there.

I've done some more research regarding the prefix sum. Unfortunately, I didn't manage to get C2 to generate SIMD instructions for the prefix sum (even with gaps of 2 or 4 instead of 1). The best option I have for now works by hooking into the decoding logic and summing up longs while they still represent 2 packed integers, which effectively computes the prefix sums of [0:64) and [64:128) in parallel, and then adding the value at index 63 to all values at indices [64:128). This last step is vectorized by the JVM. See the benchmark results at the bottom of https://github.com/jpountz/decode-128-ints-benchmark if you are interested. I wish we could do better, but it's still nice that, on average, we can both decompress and compute the prefix sum faster than we can just decompress with the current codec. Having the sum pre-computed also helps simplify some conditions in the PostingsEnum implementations of nextDoc/advance.

Here is the result of a run on wikibigall with the current pull request.
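The two-lane prefix sum described above can be sketched as follows. This is an illustrative sketch, not the code from the pull request: `PrefixSum128` is a hypothetical name, and it assumes each long packs a value from [0:64) in its upper 32 bits and the corresponding value from [64:128) in its lower 32 bits, with per-lane sums small enough that no carry ever crosses the lane boundary.

```java
// Illustrative sketch: prefix sum of 128 deltas stored as 64 packed longs.
// Long i holds delta[i] in its upper 32 bits and delta[64 + i] in its
// lower 32 bits (hypothetical layout for this example).
public class PrefixSum128 {
  public static void prefixSum(long[] packed, long[] out) {
    // Summing the packed longs computes both lanes' prefix sums at once:
    // the upper lane accumulates [0:64), the lower lane [64:128).
    for (int i = 1; i < 64; i++) {
      packed[i] += packed[i - 1];
    }
    // Unpack the two lanes.
    for (int i = 0; i < 64; i++) {
      out[i] = packed[i] >>> 32;             // prefix sums of [0:64)
      out[64 + i] = packed[i] & 0xFFFFFFFFL; // partial sums of [64:128)
    }
    // Add the running total at index 63 to all values in [64:128); this
    // simple loop shape is the kind that C2 can auto-vectorize.
    long base = out[63];
    for (int i = 64; i < 128; i++) {
      out[i] += base;
    }
  }
}
```

The packed summation does 64 additions for 128 values; only the final fix-up loop touches the second half again.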
I included some sorting tasks as well to see how much they benefit from this change:

{noformat}
                 Task  QPS baseline  StdDev  QPS patch  StdDev  Pct diff
               Fuzzy2         99.40 (10.4%)      95.46  (6.9%)     -4.0% ( -19% -  14%)
               IntNRQ        966.00  (2.4%)     977.27  (2.1%)      1.2% (  -3% -   5%)
               Fuzzy1         97.25  (7.8%)      99.66  (9.4%)      2.5% ( -13% -  21%)
           OrHighHigh         84.95  (3.7%)      87.40  (4.2%)      2.9% (  -4% -  11%)
                 Term       1316.61  (3.1%)    1357.55  (4.8%)      3.1% (  -4% -  11%)
HighTermDayOfYearSort         35.89  (6.3%)      37.66  (4.2%)      4.9% (  -5% -  16%)
               Phrase         60.22  (2.1%)      63.45  (4.3%)      5.4% (  -1% -  12%)
    HighTermMonthSort         63.83  (9.1%)      67.40 (10.7%)      5.6% ( -12% -  27%)
          AndHighHigh         27.27  (3.4%)      28.80  (4.0%)      5.6% (  -1% -  13%)
         SloppyPhrase          1.35  (7.4%)       1.43  (9.3%)      6.2% (  -9% -  24%)
      AndHighOrMedMed         26.38  (1.7%)      28.17  (2.6%)      6.8% (   2% -  11%)
     IntervalsOrdered         21.99  (2.8%)      23.57  (1.9%)      7.2% (   2% -  12%)
            OrHighMed         39.27  (2.8%)      42.32  (3.0%)      7.8% (   1% -  13%)
             SpanNear          9.74  (3.2%)      10.71  (2.0%)     10.0% (   4% -  15%)
           AndHighMed         59.40  (3.1%)      65.44  (3.6%)     10.2% (   3% -  17%)
             Wildcard        127.70  (7.7%)     140.93  (3.3%)     10.4% (   0% -  23%)
     AndMedOrHighHigh         30.65  (1.4%)      34.15  (2.1%)     11.4% (   7% -  15%)
              Prefix3         46.12  (9.9%)      53.01 (10.1%)     14.9% (  -4% -  38%)
{noformat}

I now plan to focus on getting this into a mergeable state.

> SIMD-based decoding of postings lists
> -------------------------------------
>
>                 Key: LUCENE-9027
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9027
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [~rcmuir] has been mentioning the idea for quite some time that we might be
> able to write the decoding logic in such a way that Java would use SIMD
> instructions.
> More recently [~paul.masurel] wrote a
> [blog post|https://fulmicoton.com/posts/bitpacking/] that raises the point
> that Lucene could still decode multiple ints at once in a single instruction
> by packing two ints in a long, and we've had some discussions about what we
> could try in Lucene to speed up the decoding of postings. This made me want
> to look a bit deeper at what we could do.
>
> Our current decoding logic reads data into a byte[] and decodes packed
> integers from it. Unfortunately it doesn't make use of SIMD instructions and
> looks like
> [this|https://github.com/jpountz/decode-128-ints-benchmark/blob/master/src/main/java/jpountz/NaiveByteDecoder.java].
> I confirmed by looking at the generated assembly that if I take an array of
> integers and shift them all by the same number of bits, then Java will use
> SIMD instructions to shift multiple integers at once. This led me to write
> this
> [implementation|https://github.com/jpountz/decode-128-ints-benchmark/blob/master/src/main/java/jpountz/SimpleSIMDDecoder.java]
> that tries as much as possible to shift long sequences of ints by the same
> number of bits to speed up decoding. It is indeed faster than the current
> logic we have, up to about 2x faster for some numbers of bits per value.
>
> Currently the best
> [implementation|https://github.com/jpountz/decode-128-ints-benchmark/blob/master/src/main/java/jpountz/SIMDDecoder.java]
> I've been able to come up with combines the above idea with the idea that
> Paul mentioned in his blog that consists of emulating SIMD from Java by
> packing multiple integers into a long: 2 ints, 4 shorts or 8 bytes. It is a
> bit harder to read but gives another speedup on top of the above
> implementation.
>
> I have a
> [JMH benchmark|https://github.com/jpountz/decode-128-ints-benchmark/]
> available in case someone would like to play with this and maybe even come
> up with an even faster implementation.
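To make the packing idea concrete, here is a minimal sketch for the bitsPerValue=4 case. `Emulated4BitDecoder` is a hypothetical name, not the SIMDDecoder from the linked benchmark (which handles all bit widths), and it assumes a layout where each byte stores the earlier of its two values in the high nibble and bytes are ordered low-to-high within the long. Two shift-and-mask operations split all 16 nibbles of a long at once, instead of 16 individual shift/mask pairs.

```java
// Illustrative sketch of emulating SIMD by packing values into a long:
// each long holds sixteen 4-bit values (two per byte, assumed layout).
public class Emulated4BitDecoder {
  private static final long MASK4 = 0x0F0F0F0F0F0F0F0FL;

  // Decode each input long into sixteen ints appended to out.
  public static void decode(long[] in, int[] out) {
    int o = 0;
    for (long word : in) {
      // One shift and two masks extract all 16 nibbles: this is the
      // "8 bytes in a long" flavor of the emulated-SIMD trick.
      long hi = (word >>> 4) & MASK4; // high nibble of each byte
      long lo = word & MASK4;         // low nibble of each byte
      for (int b = 0; b < 8; b++) {
        int shift = 8 * b;
        out[o++] = (int) ((hi >>> shift) & 0xF);
        out[o++] = (int) ((lo >>> shift) & 0xF);
      }
    }
  }
}
```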
> It is 2-2.5x faster than our current implementation for most numbers of bits
> per value. I'm copying results here:
>
> {noformat}
> * `readLongs` just reads 2*bitsPerValue longs from the ByteBuffer, it serves
>   as a baseline.
> * `decodeNaiveFromBytes` reads a byte[] and decodes from it. This is what the
>   current Lucene codec does.
> * `decodeNaiveFromLongs` decodes from longs on the fly.
> * `decodeSimpleSIMD` is a simple implementation that relies on how Java
>   recognizes some patterns and uses SIMD instructions.
> * `decodeSIMD` is a more complex implementation that both relies on the C2
>   compiler to generate SIMD instructions and encodes 8 bytes, 4 shorts or
>   2 ints in a long in order to decompress multiple values at once.
>
> Benchmark                                       (bitsPerValue)  (byteOrder)  Mode  Cnt   Score   Error  Units
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               1           LE  thrpt   5  12.912 ± 0.393  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               1           BE  thrpt   5  12.862 ± 0.395  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               2           LE  thrpt   5  13.040 ± 1.162  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               2           BE  thrpt   5  13.027 ± 0.270  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               3           LE  thrpt   5  12.409 ± 0.637  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               3           BE  thrpt   5  12.268 ± 0.947  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               4           LE  thrpt   5  14.177 ± 2.263  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               4           BE  thrpt   5  11.457 ± 0.150  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               5           LE  thrpt   5  10.988 ± 1.179  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               5           BE  thrpt   5  11.226 ± 0.088  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               6           LE  thrpt   5   9.791 ± 0.305  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               6           BE  thrpt   5   9.403 ± 3.598  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               7           LE  thrpt   5  10.256 ± 0.211  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               7           BE  thrpt   5  10.314 ± 0.382  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               8           LE  thrpt   5  16.516 ± 0.380  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               8           BE  thrpt   5  16.375 ± 0.427  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               9           LE  thrpt   5   9.067 ± 0.066  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               9           BE  thrpt   5   9.078 ± 0.178  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              10           LE  thrpt   5   8.913 ± 0.074  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              10           BE  thrpt   5   8.893 ± 0.101  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              11           LE  thrpt   5   7.908 ± 0.118  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              11           BE  thrpt   5   7.864 ± 0.097  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              12           LE  thrpt   5   9.220 ± 0.103  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              12           BE  thrpt   5   9.186 ± 0.241  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              13           LE  thrpt   5   7.119 ± 0.071  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              13           BE  thrpt   5   7.066 ± 0.059  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              14           LE  thrpt   5  12.483 ± 0.171  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              14           BE  thrpt   5  12.473 ± 0.117  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              15           LE  thrpt   5   6.202 ± 0.192  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              15           BE  thrpt   5   6.187 ± 0.399  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              16           LE  thrpt   5  12.798 ± 0.249  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes              16           BE  thrpt   5  12.987 ± 0.208  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               1           LE  thrpt   5   7.248 ± 0.096  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               1           BE  thrpt   5   7.292 ± 0.114  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               2           LE  thrpt   5   8.923 ± 0.099  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               2           BE  thrpt   5   8.899 ± 0.028  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               3           LE  thrpt   5   9.192 ± 0.082  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               3           BE  thrpt   5   9.090 ± 0.066  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               4           LE  thrpt   5   7.947 ± 0.039  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               4           BE  thrpt   5   7.809 ± 0.298  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               5           LE  thrpt   5   8.342 ± 0.568  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               5           BE  thrpt   5   8.259 ± 0.572  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               6           LE  thrpt   5  15.594 ± 0.149  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               6           BE  thrpt   5  14.012 ± 0.160  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               7           LE  thrpt   5  12.686 ± 0.271  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               7           BE  thrpt   5  12.806 ± 0.160  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               8           LE  thrpt   5  13.571 ± 0.135  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               8           BE  thrpt   5  13.312 ± 0.110  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               9           LE  thrpt   5  11.812 ± 0.108  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs               9           BE  thrpt   5  12.874 ± 0.168  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              10           LE  thrpt   5  12.882 ± 0.114  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              10           BE  thrpt   5  12.142 ± 0.091  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              11           LE  thrpt   5  12.302 ± 0.111  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              11           BE  thrpt   5  10.762 ± 0.250  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              12           LE  thrpt   5  12.505 ± 0.070  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              12           BE  thrpt   5  12.149 ± 0.083  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              13           LE  thrpt   5  11.159 ± 0.341  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              13           BE  thrpt   5  10.395 ± 0.222  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              14           LE  thrpt   5  11.004 ± 0.058  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              14           BE  thrpt   5  10.312 ± 0.369  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              15           LE  thrpt   5  11.236 ± 0.117  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              15           BE  thrpt   5   9.792 ± 0.202  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              16           LE  thrpt   5  10.607 ± 0.105  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromLongs              16           BE  thrpt   5  10.340 ± 0.070  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         1           LE  thrpt   5  20.925 ± 0.368  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         1           BE  thrpt   5  13.396 ± 0.485  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         2           LE  thrpt   5  20.628 ± 0.494  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         2           BE  thrpt   5  13.584 ± 0.194  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         3           LE  thrpt   5  19.932 ± 1.609  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         3           BE  thrpt   5  13.296 ± 0.095  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         4           LE  thrpt   5  21.065 ± 0.767  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         4           BE  thrpt   5  13.557 ± 0.051  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         5           LE  thrpt   5  19.630 ± 0.067  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         5           BE  thrpt   5  12.916 ± 0.186  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         6           LE  thrpt   5  20.253 ± 0.701  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         6           BE  thrpt   5  12.820 ± 0.048  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         7           LE  thrpt   5  18.944 ± 0.160  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         7           BE  thrpt   5  12.562 ± 0.128  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         8           LE  thrpt   5  22.778 ± 2.023  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         8           BE  thrpt   5  13.658 ± 0.158  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         9           LE  thrpt   5  18.527 ± 0.169  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                         9           BE  thrpt   5  12.045 ± 0.111  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        10           LE  thrpt   5  16.610 ± 0.997  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        10           BE  thrpt   5  11.208 ± 0.087  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        11           LE  thrpt   5  17.961 ± 0.080  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        11           BE  thrpt   5  11.594 ± 0.084  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        12           LE  thrpt   5  16.980 ± 2.372  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        12           BE  thrpt   5  11.135 ± 0.050  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        13           LE  thrpt   5  17.592 ± 0.269  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        13           BE  thrpt   5  11.132 ± 0.227  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        14           LE  thrpt   5  16.964 ± 0.423  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        14           BE  thrpt   5  10.953 ± 0.326  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        15           LE  thrpt   5  17.972 ± 0.572  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        15           BE  thrpt   5  10.872 ± 0.150  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        16           LE  thrpt   5  24.152 ± 0.213  ops/us
> PackedIntsDecodeBenchmark.decodeSIMD                        16           BE  thrpt   5  12.984 ± 0.348  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   1           LE  thrpt   5  14.567 ± 0.714  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   1           BE  thrpt   5  10.541 ± 0.079  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   2           LE  thrpt   5  15.395 ± 0.687  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   2           BE  thrpt   5  11.142 ± 0.052  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   3           LE  thrpt   5  15.802 ± 0.623  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   3           BE  thrpt   5  10.656 ± 0.278  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   4           LE  thrpt   5  17.732 ± 0.276  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   4           BE  thrpt   5  11.289 ± 0.209  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   5           LE  thrpt   5  16.230 ± 0.389  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   5           BE  thrpt   5  10.216 ± 0.184  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   6           LE  thrpt   5  16.478 ± 0.682  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   6           BE  thrpt   5  10.379 ± 0.157  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   8           LE  thrpt   5  18.222 ± 0.388  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                   8           BE  thrpt   5  11.153 ± 0.619  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                  10           LE  thrpt   5  15.138 ± 0.321  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                  10           BE  thrpt   5   9.384 ± 0.671  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                  16           LE  thrpt   5  20.776 ± 0.397  ops/us
> PackedIntsDecodeBenchmark.decodeSimpleSIMD                  16           BE  thrpt   5  10.199 ± 0.146  ops/us
> PackedIntsDecodeBenchmark.readLongs                          1           LE  thrpt   5  30.220 ± 0.652  ops/us
> PackedIntsDecodeBenchmark.readLongs                          1           BE  thrpt   5  16.324 ± 0.226  ops/us
> PackedIntsDecodeBenchmark.readLongs                          2           LE  thrpt   5  30.952 ± 0.329  ops/us
> PackedIntsDecodeBenchmark.readLongs                          2           BE  thrpt   5  16.492 ± 0.397  ops/us
> PackedIntsDecodeBenchmark.readLongs                          3           LE  thrpt   5  30.156 ± 0.979  ops/us
> PackedIntsDecodeBenchmark.readLongs                          3           BE  thrpt   5  16.273 ± 0.441  ops/us
> PackedIntsDecodeBenchmark.readLongs                          4           LE  thrpt   5  29.925 ± 0.718  ops/us
> PackedIntsDecodeBenchmark.readLongs                          4           BE  thrpt   5  15.930 ± 0.350  ops/us
> PackedIntsDecodeBenchmark.readLongs                          5           LE  thrpt   5  29.773 ± 0.979  ops/us
> PackedIntsDecodeBenchmark.readLongs                          5           BE  thrpt   5  15.775 ± 0.257  ops/us
> PackedIntsDecodeBenchmark.readLongs                          6           LE  thrpt   5  29.591 ± 1.285  ops/us
> PackedIntsDecodeBenchmark.readLongs                          6           BE  thrpt   5  15.732 ± 0.226  ops/us
> PackedIntsDecodeBenchmark.readLongs                          7           LE  thrpt   5  29.708 ± 0.909  ops/us
> PackedIntsDecodeBenchmark.readLongs                          7           BE  thrpt   5  15.433 ± 0.562  ops/us
> PackedIntsDecodeBenchmark.readLongs                          8           LE  thrpt   5  29.828 ± 0.689  ops/us
> PackedIntsDecodeBenchmark.readLongs                          8           BE  thrpt   5  15.390 ± 0.188  ops/us
> PackedIntsDecodeBenchmark.readLongs                          9           LE  thrpt   5  29.127 ± 0.309  ops/us
> PackedIntsDecodeBenchmark.readLongs                          9           BE  thrpt   5  15.180 ± 0.199  ops/us
> PackedIntsDecodeBenchmark.readLongs                         10           LE  thrpt   5  29.085 ± 0.538  ops/us
> PackedIntsDecodeBenchmark.readLongs                         10           BE  thrpt   5  14.887 ± 1.687  ops/us
> PackedIntsDecodeBenchmark.readLongs                         11           LE  thrpt   5  28.904 ± 0.329  ops/us
> PackedIntsDecodeBenchmark.readLongs                         11           BE  thrpt   5  14.936 ± 0.119  ops/us
> PackedIntsDecodeBenchmark.readLongs                         12           LE  thrpt   5  29.025 ± 0.299  ops/us
> PackedIntsDecodeBenchmark.readLongs                         12           BE  thrpt   5  14.685 ± 0.154  ops/us
> PackedIntsDecodeBenchmark.readLongs                         13           LE  thrpt   5  28.963 ± 0.244  ops/us
> PackedIntsDecodeBenchmark.readLongs                         13           BE  thrpt   5  14.569 ± 0.100  ops/us
> PackedIntsDecodeBenchmark.readLongs                         14           LE  thrpt   5  28.584 ± 1.409  ops/us
> PackedIntsDecodeBenchmark.readLongs                         14           BE  thrpt   5  14.340 ± 0.594  ops/us
> PackedIntsDecodeBenchmark.readLongs                         15           LE  thrpt   5  28.744 ± 0.314  ops/us
> PackedIntsDecodeBenchmark.readLongs                         15           BE  thrpt   5  14.222 ± 0.105  ops/us
> PackedIntsDecodeBenchmark.readLongs                         16           LE  thrpt   5  26.638 ± 0.452  ops/us
> PackedIntsDecodeBenchmark.readLongs                         16           BE  thrpt   5  13.906 ± 0.604  ops/us
> {noformat}
>
> The thing that is a bit frustrating is that the best throughputs are obtained
> on a ByteBuffer that is configured to use the little-endian byte order (which
> is the native byte order of my machine) while Java/Lucene default to big
> endian. So if we want that kind of throughput we'll probably need to add ways
> to read data in the native byte order in the IndexInput API.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org