[ https://issues.apache.org/jira/browse/LUCENE-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347599#comment-16347599 ]
Uwe Schindler commented on LUCENE-7966:
---------------------------------------

Hi,
I ran some comparison benchmarks using Mike's benchmark tool (luceneutil). In general the performance difference between Java 8 and Java 9 is negligible (more about this in my talk next week in London), if you use the usual Parallel or CMS GC. Some queries tend to be slower on Java 9.

I also benchmarked this patch. Java 9 without the patch versus with the patch:

{noformat}
Task                  QPS orig_j9   StdDev  QPS patch_j9   StdDev  Pct diff
IntNRQ                   5.97   (8.8%)     5.76   (8.4%)   -3.7%  (-19% - 14%)
Prefix3                 59.77   (7.3%)    58.59   (7.5%)   -2.0%  (-15% - 13%)
Wildcard                18.62   (5.3%)    18.38   (5.9%)   -1.3%  (-11% - 10%)
HighSpanNear            13.04   (4.4%)    12.93   (4.9%)   -0.9%  ( -9% -  8%)
MedSpanNear             11.36   (3.9%)    11.27   (4.3%)   -0.8%  ( -8% -  7%)
Respell                 51.17   (2.1%)    50.79   (1.6%)   -0.7%  ( -4% -  3%)
PKLookup               256.20   (5.4%)   255.83   (5.9%)   -0.1%  (-10% - 11%)
Fuzzy1                  24.12   (2.8%)    24.09   (2.3%)   -0.1%  ( -5% -  5%)
LowSpanNear             10.38   (2.1%)    10.37   (2.2%)   -0.1%  ( -4% -  4%)
MedPhrase               27.76   (1.9%)    27.74   (1.9%)   -0.1%  ( -3% -  3%)
Fuzzy2                  70.57   (1.8%)    70.59   (1.6%)    0.0%  ( -3% -  3%)
HighPhrase              14.21   (2.2%)    14.22   (2.4%)    0.1%  ( -4% -  4%)
AndHighHigh             34.11   (1.2%)    34.15   (0.7%)    0.1%  ( -1% -  2%)
LowPhrase               15.98   (1.7%)    16.01   (1.6%)    0.2%  ( -3% -  3%)
OrNotHighLow           531.86   (3.5%)   534.36   (3.3%)    0.5%  ( -6% -  7%)
AndHighMed             170.44   (1.2%)   171.46   (1.2%)    0.6%  ( -1% -  3%)
OrNotHighMed           206.78   (1.8%)   208.06   (2.2%)    0.6%  ( -3% -  4%)
OrHighMed               20.61   (5.2%)    20.76   (4.3%)    0.7%  ( -8% - 10%)
OrHighHigh              11.07   (5.6%)    11.17   (4.6%)    0.9%  ( -8% - 11%)
OrHighNotHigh           24.57   (4.0%)    24.80   (4.7%)    0.9%  ( -7% -  9%)
OrHighNotMed            50.41   (4.0%)    50.88   (5.1%)    0.9%  ( -7% - 10%)
LowTerm                202.23   (2.4%)   204.33   (3.4%)    1.0%  ( -4% -  7%)
AndHighLow             745.13   (3.1%)   753.23   (2.9%)    1.1%  ( -4% -  7%)
OrNotHighHigh           12.48   (4.1%)    12.63   (5.2%)    1.2%  ( -7% - 10%)
LowSloppyPhrase          3.79   (5.3%)     3.85   (5.5%)    1.5%  ( -8% - 13%)
HighSloppyPhrase        10.58   (4.0%)    10.74   (4.3%)    1.5%  ( -6% - 10%)
OrHighNotLow            18.46   (4.4%)    18.75   (5.3%)    1.6%  ( -7% - 11%)
MedSloppyPhrase         28.88   (4.4%)    29.35   (4.8%)    1.6%  ( -7% - 11%)
OrHighLow               15.26   (3.0%)    15.54   (3.0%)    1.9%  ( -4% -  8%)
HighTermDayOfYearSort   19.83   (6.6%)    20.25   (8.1%)    2.1%  (-11% - 17%)
MedTerm                 64.23   (5.0%)    65.64   (7.0%)    2.2%  ( -9% - 14%)
HighTerm                40.05   (5.4%)    41.02   (7.6%)    2.4%  (-10% - 16%)
HighTermMonthSort       87.80  (12.5%)    91.28  (12.3%)    4.0%  (-18% - 32%)
{noformat}

So it does not hurt performance, although it adds additional checks that ensure index consistency! Thanks Robert for exploring the parts of the code where bounds checks were missing! As you can see, especially the "sorting" tasks got a slight reproducible improvement (although stddev is still large!). This might be related to the optimized bounds checking code when reading docvalues and bytebuffers.

I also compared on Java 8 to be safe:

{noformat}
Task                  QPS orig_j8   StdDev  QPS patch_j8   StdDev  Pct diff
HighTermDayOfYearSort   23.65   (9.1%)    22.98   (6.6%)   -2.8%  (-17% - 14%)
OrHighMed                9.87   (4.3%)     9.76   (2.9%)   -1.0%  ( -7% -  6%)
OrHighHigh              11.85   (4.1%)    11.73   (2.8%)   -1.0%  ( -7% -  6%)
MedSpanNear            149.54   (4.2%)   148.97   (3.7%)   -0.4%  ( -7% -  7%)
HighSloppyPhrase         0.43   (5.4%)     0.43   (6.1%)    0.0%  (-10% - 12%)
LowSpanNear             22.66   (3.3%)    22.70   (2.7%)    0.2%  ( -5% -  6%)
OrHighLow               27.55   (2.5%)    27.62   (1.8%)    0.2%  ( -4% -  4%)
LowTerm                285.27   (0.6%)   286.20   (0.7%)    0.3%  (  0% -  1%)
AndHighMed             134.26   (2.1%)   134.73   (1.9%)    0.4%  ( -3% -  4%)
IntNRQ                   5.11   (8.8%)     5.13   (9.0%)    0.4%  (-16% - 20%)
Fuzzy2                  33.40   (1.7%)    33.53   (1.4%)    0.4%  ( -2% -  3%)
AndHighLow             521.74   (3.1%)   524.04   (2.3%)    0.4%  ( -4% -  5%)
AndHighHigh             37.68   (0.9%)    37.85   (1.0%)    0.4%  ( -1% -  2%)
HighPhrase               7.25   (1.3%)     7.29   (1.0%)    0.5%  ( -1% -  2%)
HighTerm                62.16   (1.2%)    62.56   (1.9%)    0.7%  ( -2% -  3%)
OrNotHighLow           368.52   (3.5%)   371.06   (2.8%)    0.7%  ( -5% -  7%)
MedTerm                 77.17   (1.2%)    77.76   (1.7%)    0.8%  ( -2% -  3%)
HighSpanNear             3.94   (4.1%)     3.97   (3.6%)    0.9%  ( -6% -  8%)
Fuzzy1                 105.41   (1.1%)   106.40   (1.1%)    0.9%  ( -1% -  3%)
Respell                 43.38   (1.3%)    43.84   (1.6%)    1.1%  ( -1% -  4%)
MedPhrase               11.16   (0.8%)    11.28   (1.3%)    1.1%  (  0% -  3%)
MedSloppyPhrase         25.87   (3.2%)    26.15   (2.8%)    1.1%  ( -4% -  7%)
LowPhrase               20.97   (0.7%)    21.23   (1.0%)    1.2%  (  0% -  2%)
LowSloppyPhrase         16.09   (2.7%)    16.33   (2.6%)    1.5%  ( -3% -  6%)
Wildcard                24.83   (3.9%)    25.19   (5.4%)    1.5%  ( -7% - 11%)
PKLookup               250.76   (5.1%)   254.84   (5.2%)    1.6%  ( -8% - 12%)
Prefix3                 37.18   (5.3%)    37.79   (6.6%)    1.6%  ( -9% - 14%)
OrNotHighMed            56.71   (2.1%)    58.04   (4.5%)    2.3%  ( -4% -  9%)
HighTermMonthSort       77.91   (9.7%)    80.12  (11.2%)    2.8%  (-16% - 26%)
OrHighNotLow            35.27   (2.8%)    36.37   (5.1%)    3.1%  ( -4% - 11%)
OrHighNotMed            17.69   (3.4%)    18.33   (6.0%)    3.6%  ( -5% - 13%)
OrHighNotHigh            6.01   (3.7%)     6.26   (6.9%)    4.2%  ( -6% - 15%)
OrNotHighHigh           20.56   (3.1%)    21.41   (6.7%)    4.2%  ( -5% - 14%)
{noformat}

Results are similar. And finally this is the difference between Java 8 unpatched and Java 9 patched:

{noformat}
Task                  QPS orig_j8   StdDev  QPS patch_j9   StdDev  Pct diff
HighSloppyPhrase        14.48   (2.8%)    13.94   (3.3%)   -3.8%  ( -9% -  2%)
MedSloppyPhrase         18.65   (1.6%)    18.12   (3.8%)   -2.9%  ( -8% -  2%)
IntNRQ                   5.78   (8.9%)     5.62   (8.9%)   -2.7%  (-18% - 16%)
LowSloppyPhrase         69.13   (2.0%)    67.55   (3.0%)   -2.3%  ( -7% -  2%)
HighTermMonthSort       34.38   (9.9%)    33.87  (12.4%)   -1.5%  (-21% - 23%)
HighSpanNear             6.24   (3.1%)     6.15   (4.7%)   -1.5%  ( -9% -  6%)
Wildcard                14.16   (8.3%)    13.99   (7.7%)   -1.2%  (-15% - 16%)
LowSpanNear             65.04   (4.0%)    64.36   (6.0%)   -1.0%  (-10% -  9%)
HighTerm                44.52   (5.5%)    44.09   (8.0%)   -1.0%  (-13% - 13%)
MedTerm                 65.87   (4.9%)    65.50   (7.2%)   -0.6%  (-12% - 12%)
OrHighNotHigh           24.59   (4.3%)    24.48   (4.5%)   -0.5%  ( -8% -  8%)
OrNotHighHigh           14.63   (4.2%)    14.57   (4.3%)   -0.4%  ( -8% -  8%)
OrHighNotMed            37.97   (4.6%)    37.81   (5.1%)   -0.4%  ( -9% -  9%)
LowPhrase               30.58   (1.9%)    30.48   (2.2%)   -0.3%  ( -4% -  3%)
HighPhrase               2.46   (4.3%)     2.45   (4.3%)   -0.3%  ( -8% -  8%)
MedPhrase               17.43   (2.0%)    17.38   (2.2%)   -0.3%  ( -4% -  4%)
MedSpanNear             26.07   (2.4%)    26.00   (3.8%)   -0.2%  ( -6% -  6%)
Prefix3                 42.61   (7.3%)    42.51   (6.6%)   -0.2%  (-13% - 14%)
OrHighNotLow            39.09   (4.7%)    39.06   (4.9%)   -0.1%  ( -9% -  9%)
LowTerm                319.58   (2.4%)   319.58   (3.4%)    0.0%  ( -5% -  5%)
OrHighLow               16.87   (3.6%)    16.89   (4.3%)    0.1%  ( -7% -  8%)
OrHighMed               11.55   (3.6%)    11.67   (4.6%)    1.0%  ( -6% -  9%)
OrHighHigh              10.72   (3.7%)    10.83   (4.8%)    1.0%  ( -7% -  9%)
OrNotHighMed            48.54   (2.5%)    49.13   (2.2%)    1.2%  ( -3% -  6%)
HighTermDayOfYearSort   37.23   (5.6%)    37.68   (6.9%)    1.2%  (-10% - 14%)
AndHighHigh             21.26   (1.1%)    21.59   (1.2%)    1.5%  (  0% -  3%)
PKLookup               250.34   (5.1%)   255.08   (4.9%)    1.9%  ( -7% - 12%)
Fuzzy2                  52.75   (2.2%)    53.84   (2.6%)    2.1%  ( -2% -  7%)
Fuzzy1                  76.52   (1.7%)    78.38   (2.2%)    2.4%  ( -1% -  6%)
AndHighMed              51.15   (1.1%)    52.44   (0.9%)    2.5%  (  0% -  4%)
OrNotHighLow           339.27   (1.7%)   348.34   (1.8%)    2.7%  (  0% -  6%)
Respell                 44.60   (1.8%)    45.84   (2.9%)    2.8%  ( -1% -  7%)
AndHighLow             575.14   (1.6%)   593.90   (1.4%)    3.3%  (  0% -  6%)
{noformat}

All tests were run with tiered compilation disabled, with -Xbatch, and with Parallel GC. So you can also compare the QPS values between the runs. If you use other garbage collectors the results are dramatically different (more than 10% slowdown with G1GC!).

So the additional checks have no effect on query performance. I did not benchmark indexing, but according to Adrien's earlier benchmarking, the biggest impact is on indexing very large stored fields with highly repetitive (compressible) content. E.g., the JSON source field in Elasticsearch should show a significant speedup, especially when you have a lot of content!

I was not able to reproduce a slowdown in BytesRefHash with ParallelGC. In case Mike used the default GC of Java 9 (G1GC), he might have been affected by the G1GC slowdown bug mentioned before => [~dweiss].

I'd like to use this as a start to migrate to Java 9 at some point, so we can already use the Java 9 features now. After applying this patch, we should check our source code for typical "bounds checks" and replace them with {{(Future)Objects.checkIndex}} variants everywhere. So please don't add code like {{if (i >= x && i < y) ...}}; instead use the checker methods, which are intrinsics in Java 9! What do others think?
[~jpountz], [~rcmuir], [~mikemccand], [~dweiss]

> build mr-jar and use some java 9 methods if available
> -----------------------------------------------------
>
>                 Key: LUCENE-7966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7966
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other, general/build
>            Reporter: Robert Muir
>            Priority: Major
>              Labels: Java9
>         Attachments: LUCENE-7966.patch, LUCENE-7966.patch, LUCENE-7966.patch, LUCENE-7966.patch, LUCENE-7966.patch
>
>
> See background: http://openjdk.java.net/jeps/238
> It would be nice to use some of the newer array methods and range checking
> methods in java 9, for example, without waiting for lucene 10 or something. If
> we build an MR-jar, we can start migrating our code to use java 9 methods
> right now: it will use optimized methods from java 9 when that's available,
> otherwise fall back to java 8 code.
> This patch adds:
> {code}
> Objects.checkIndex(int,int)
> Objects.checkFromToIndex(int,int,int)
> Objects.checkFromIndexSize(int,int,int)
> Arrays.mismatch(byte[],int,int,byte[],int,int)
> Arrays.compareUnsigned(byte[],int,int,byte[],int,int)
> Arrays.equals(byte[],int,int,byte[],int,int)
> // did not add char/int/long/short/etc but of course it's possible if needed
> {code}
> It sets these up in {{org.apache.lucene.future}} as 1-1 mappings to java
> methods. This way, we can simply directly replace call sites with java 9
> methods when java 9 is a minimum. Simple 1-1 mappings also mean that we only
> have to worry about testing that our java 8 fallback methods work.
> I found that many of the current byte array methods today are willy-nilly and
> very lenient, for example passing invalid offsets at times and relying on
> compare methods not throwing exceptions, etc. I fixed all the instances in
> core/codecs but have not looked at the problems with AnalyzingSuggester. Also,
> SimpleText still uses a silly method in ArrayUtil in a similarly crazy way; I have
> not removed that one yet.
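To make the bounds-check advice and the method list above concrete, here is a minimal sketch that calls the JDK 9+ methods directly. The class {{BoundsCheckSketch}} and all of its method names are hypothetical and exist only for this illustration; {{checkIndexFallback}} is an assumption about what a hand-written Java 8 equivalent could look like, not the code in {{org.apache.lucene.future}} or in the attached patch.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration class; not part of Lucene or of the attached patch.
final class BoundsCheckSketch {

  // Assumed shape of a Java 8 fallback: same contract as Objects.checkIndex,
  // i.e. return the index if 0 <= index < length, otherwise throw.
  static int checkIndexFallback(int index, int length) {
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException(
          "Index " + index + " out of bounds for length " + length);
    }
    return index;
  }

  // Preferred on Java 9+: let the JDK method do the range test
  // instead of a hand-written "if".
  static byte readChecked(byte[] buf, int i) {
    return buf[Objects.checkIndex(i, buf.length)];
  }

  // Range-based mismatch: relative index of the first differing byte
  // within the two prefixes, or -1 if the prefixes are equal.
  static int firstMismatch(byte[] a, int aLen, byte[] b, int bLen) {
    return Arrays.mismatch(a, 0, aLen, b, 0, bLen);
  }

  // Range-based comparison that treats bytes as unsigned values (0..255).
  static int compareUnsignedPrefix(byte[] a, int aLen, byte[] b, int bLen) {
    return Arrays.compareUnsigned(a, 0, aLen, b, 0, bLen);
  }

  public static void main(String[] args) {
    byte[] buf = {10, 20, 30};
    System.out.println(readChecked(buf, 1));
    System.out.println(firstMismatch(new byte[] {1, 2, 3}, 3, new byte[] {1, 2, 4}, 3));
    try {
      readChecked(buf, 3); // one past the end: rejected by Objects.checkIndex
    } catch (IndexOutOfBoundsException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The range-based {{Arrays}} methods validate their offsets and throw on invalid ranges, so lenient callers that pass out-of-range offsets fail fast instead of silently comparing the wrong bytes.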