[jira] [Comment Edited] (LUCENE-8780) Improve ByteBufferGuard in Java 11

Uwe Schindler (JIRA) Sun, 28 Apr 2019 08:33:10 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828039#comment-16828039
 ]


Uwe Schindler edited comment on LUCENE-8780 at 4/28/19 3:32 PM:
----------------------------------------------------------------

That's the result after 20 runs of wikimediumall with 6 searcher threads (with 
ParallelGC) on Mike's lucenebench:

{noformat}
use java command /home/jenkins/tools/java/64bit/jdk-11.0.2/bin/java -server -Xms2g -Xmx2g -XX:+UseParallelGC -Xbatch

JAVA:
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

OS:
Linux serv1.sd-datasolutions.de 4.18.0-17-generic #18~18.04.1-Ubuntu SMP Fri Mar 15 15:27:12 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[...]

Report after iter 19:
                    Task    QPS orig      StdDev   QPS patch      StdDev                Pct diff
                  IntNRQ       30.88      (0.6%)       26.33      (0.8%)  -14.7% ( -16% -  -13%)
                PKLookup      107.70      (2.7%)       94.31      (2.9%)  -12.4% ( -17% -   -7%)
             AndHighHigh       10.76     (11.5%)       10.17      (3.3%)   -5.4% ( -18% -   10%)
                  Fuzzy2       45.10      (7.7%)       43.21      (9.0%)   -4.2% ( -19% -   13%)
         LowSloppyPhrase        7.28     (16.8%)        6.98      (6.3%)   -4.2% ( -23% -   22%)
            OrHighNotLow      783.24      (7.1%)      751.37      (2.5%)   -4.1% ( -12% -    5%)
           OrHighNotHigh      934.39      (6.5%)      896.38      (2.1%)   -4.1% ( -11% -    4%)
                 Respell       45.36     (10.6%)       43.65      (7.0%)   -3.8% ( -19% -   15%)
           OrNotHighHigh      779.95      (3.8%)      752.28      (1.8%)   -3.5% (  -8% -    2%)
        HighSloppyPhrase       10.37     (12.8%)       10.03      (3.5%)   -3.3% ( -17% -   14%)
               LowPhrase       11.60      (8.9%)       11.23      (1.7%)   -3.2% ( -12% -    8%)
                 LowTerm     1694.00      (8.9%)     1642.34      (5.5%)   -3.0% ( -16% -   12%)
                 MedTerm     1292.82      (9.3%)     1253.69      (8.2%)   -3.0% ( -18% -   15%)
              AndHighMed       71.41      (9.9%)       69.77      (7.5%)   -2.3% ( -17% -   16%)
            OrNotHighMed      634.32      (7.2%)      620.67      (7.5%)   -2.2% ( -15% -   13%)
                 Prefix3      110.65     (14.9%)      108.55      (8.7%)   -1.9% ( -22% -   25%)
               OrHighLow      347.02      (4.3%)      340.51      (9.9%)   -1.9% ( -15% -   12%)
            OrNotHighLow      591.61      (5.5%)      580.60      (9.0%)   -1.9% ( -15% -   13%)
            OrHighNotMed     1258.21      (1.8%)     1237.28      (5.0%)   -1.7% (  -8% -    5%)
                  Fuzzy1       91.79      (4.3%)       90.77     (11.1%)   -1.1% ( -15% -   14%)
               OrHighMed       10.29      (7.9%)       10.25     (11.8%)   -0.4% ( -18% -   20%)
                Wildcard       52.28      (6.3%)       52.21      (6.8%)   -0.1% ( -12% -   13%)
              OrHighHigh        8.16      (6.9%)        8.22      (9.3%)    0.8% ( -14% -   18%)
              AndHighLow      563.89      (9.1%)      569.31     (15.3%)    1.0% ( -21% -   27%)
              HighPhrase       15.88      (9.3%)       16.04     (13.0%)    1.0% ( -19% -   25%)
               MedPhrase       14.84      (9.0%)       15.15     (12.8%)    2.1% ( -18% -   26%)
            HighSpanNear        2.16      (9.8%)        2.21     (10.1%)    2.3% ( -16% -   24%)
         MedSloppyPhrase       18.48     (15.4%)       18.96     (18.9%)    2.6% ( -27% -   43%)
             MedSpanNear       17.75      (3.8%)       18.31     (10.0%)    3.1% ( -10% -   17%)
                HighTerm     1031.00      (9.9%)     1068.12     (17.1%)    3.6% ( -21% -   33%)
             LowSpanNear        8.22      (5.5%)        8.53     (13.3%)    3.7% ( -14% -   23%)
   HighTermDayOfYearSort        9.78     (11.0%)       10.25     (18.2%)    4.8% ( -21% -   38%)
       HighTermMonthSort       23.40     (26.5%)       27.11     (32.1%)   15.9% ( -33% -  101%)
{noformat}

The total runtime of each run did not change: it was always approx. 280s per 
run, both patched and unpatched. I am not sure how to interpret this.
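
As a reading aid for the table: the "Pct diff" column appears to be the relative change in mean QPS from orig to patch. The sketch below recomputes it for three rows, taking the QPS values straight from the report. The class and method names are made up for illustration, and the range in parentheses is not reproduced here, because its exact formula is not stated in this report.

```java
import java.util.Locale;

// Hypothetical helper, not part of luceneutil: recomputes "Pct diff"
// from the two mean QPS columns of the report.
final class PctDiffSketch {

  // Relative change of the patched QPS against the original QPS, in percent.
  static double pctDiff(double qpsOrig, double qpsPatch) {
    return 100.0 * (qpsPatch - qpsOrig) / qpsOrig;
  }

  public static void main(String[] args) {
    // QPS values copied from the report above.
    System.out.printf(Locale.ROOT, "IntNRQ            %6.1f%%%n", pctDiff(30.88, 26.33));
    System.out.printf(Locale.ROOT, "PKLookup          %6.1f%%%n", pctDiff(107.70, 94.31));
    System.out.printf(Locale.ROOT, "HighTermMonthSort %6.1f%%%n", pctDiff(23.40, 27.11));
  }
}
```

Only IntNRQ and PKLookup have a reported range that lies entirely below zero. For every other task the range spans zero, so the difference there is within the run-to-run noise shown in the StdDev columns.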


> Improve ByteBufferGuard in Java 11
> ----------------------------------
>
>                 Key: LUCENE-8780
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8780
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>    Affects Versions: master (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>              Labels: Java11
>         Attachments: LUCENE-8780.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In LUCENE-7409 we added {{ByteBufferGuard}} to protect MMapDirectory from 
> crashing the JVM with SIGSEGV when you close and unmap the mmapped buffers of 
> an IndexInput while another thread is accessing it.
> The idea was to do a volatile write access to flush the caches (to trigger a 
> full fence) and set a non-volatile boolean to true. All accesses would check 
> the boolean and stop the caller from accessing the underlying ByteBuffer. 
> This worked most of the time, until the JVM optimized away the plain read 
> access to the boolean (you can easily see this after some runtime of our 
> by-default ignored testcase).
> With master on Java 11, we can improve the whole thing. Using VarHandles you 
> can use any access type when reading or writing the boolean. After reading 
> Doug Lea's explanation <http://gee.cs.oswego.edu/dl/html/j9mm.html> and some 
> testing, I was no longer able to crash my JDK (even after running for minutes 
> unmapping bytebuffers).
> The approach is the same: we do a full-fenced write (standard volatile write) 
> when we unmap, then we yield the thread (to finish in-flight reads in other 
> threads) and then unmap all byte buffers.
> On the test side (read access), instead of using a plain read, we use the new 
> "opaque read". Opaque reads are the same as plain reads; only the ordering 
> requirements differ. Actually the main difference is explained by 
> Doug like this: "For example in constructions in which the only modification 
> of some variable x is for one thread to write in Opaque (or stronger) mode, 
> X.setOpaque(this, 1), any other thread spinning in 
> while(X.getOpaque(this)!=1){} will eventually terminate. Note that this 
> guarantee does NOT hold in Plain mode, in which spin loops may (and usually 
> do) infinitely loop -- they are not required to notice that a write ever 
> occurred in another thread if it was not seen on first encounter." - And 
> that's what we want to have: We don't want to do volatile reads, but we want 
> to prevent the compiler from optimizing away our read of the boolean. So we 
> want it to "eventually" see the change. Because of the much stronger volatile 
> write, the cache effects should be visible even faster (as in our Java 8 
> approach; we have only improved the read side).
> The new code is much slimmer (theoretically we could also use an AtomicBoolean 
> for that and use the new method {{getOpaque()}}, but I wanted to prevent 
> extra method calls, so I used a VarHandle directly).
> It's set up like this:
> - The underlying boolean field is a private member (annotated with 
> {{@SuppressWarnings("unused")}}, as the Java compiler considers it unused), 
> marked as volatile (that's the recommendation, but in reality it does not 
> matter at all).
> - We create a VarHandle to access this boolean; we never access the field 
> directly (this is why the volatile marking does not affect us).
> - We use VarHandle.setVolatile() to change our "invalidated" boolean to 
> "true", thereby enforcing a full fence.
> - On the read side we use VarHandle.getOpaque() instead of the plain 
> VarHandle.get() (a plain read is what our old code for Java 8 did).
> I had to tune our test a bit, as the VarHandles make it take longer until it 
> actually crashes (as optimizations jump in later). I also used a random for 
> the reads to prevent the optimizer from removing all the bytebuffer reads. 
> When we commit this, we can disable the test again (it takes approx 50 secs 
> on my machine).
> I'd still like to see the differences between the plain read and the opaque 
> read in production, so maybe [~mikemccand] or [~rcmuir] can do a comparison 
> with the nightly benchmarker?
> Have fun, maybe [~dweiss] has some ideas, too.
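
The set-up listed in the issue description above can be sketched roughly as follows. This is a minimal illustration and not the attached patch: the class name {{GuardSketch}}, the method names, and the use of {{IllegalStateException}} are assumptions made for the example, and the actual unmapping is left out.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch of the described guard; all names are illustrative.
final class GuardSketch {

  private static final VarHandle INVALIDATED;

  static {
    try {
      INVALIDATED = MethodHandles.lookup()
          .findVarHandle(GuardSketch.class, "invalidated", boolean.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // Only ever accessed through the VarHandle, so the compiler reports it as unused.
  @SuppressWarnings("unused")
  private volatile boolean invalidated = false;

  // Write side: full-fenced (volatile) write, then yield so that in-flight
  // reads in other threads can finish. Real code would unmap the buffers afterwards.
  void invalidate() {
    INVALIDATED.setVolatile(this, true);
    Thread.yield();
  }

  // Read side: an opaque read, which cannot be optimized away like a plain
  // read and is guaranteed to eventually observe the write.
  void ensureValid() {
    if ((boolean) INVALIDATED.getOpaque(this)) {
      throw new IllegalStateException("already closed");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    GuardSketch guard = new GuardSketch();
    Thread reader = new Thread(() -> {
      try {
        while (true) {
          guard.ensureValid(); // stands in for a guarded ByteBuffer read
        }
      } catch (IllegalStateException e) {
        System.out.println("reader stopped: " + e.getMessage());
      }
    });
    reader.start();
    Thread.sleep(100);
    guard.invalidate();
    reader.join();
  }
}
```

The reader loop in {{main}} corresponds to the spin loop in Doug Lea's quote: with an opaque read it terminates once the flag is set, which a plain read does not guarantee.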



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
