[
https://issues.apache.org/jira/browse/LUCENE-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829428#comment-16829428
]
Uwe Schindler edited comment on LUCENE-8780 at 4/29/19 4:34 PM:
----------------------------------------------------------------
Hi,
I did some tests over the last night. The fluctuating results can be improved
by removing the null check on the "cleaner". Normally that check should be
optimized away, but that seems to be too hard for Hotspot. Without it you still
have some variance in the results, but it's better.
In general the new code is a bit slower, which is more visible with shorter
runtimes, so it looks like optimizing away the VarHandle access simply takes
longer, and this degrades average performance. I don't know how long the
benchmark's warmup runs.
bq. Every time I read about those memory ordering models I seem to get it and
two weeks later I'm again confused...
That's the same for me. But our use case seems to be explained pretty well in
the above quotes by Doug Lea. They clearly say that a volatile write is stronger
than an opaque read; the volatile write is only guaranteed to become visible
eventually to an opaque read (but never to a plain read, if that read was
optimized away).
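To make that concrete, here is a minimal sketch of the write/read pairing
(names and class layout are illustrative only, not the actual ByteBufferGuard
code):
{code:java}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class GuardSketch {
  @SuppressWarnings("unused") // only ever accessed through the VarHandle
  private volatile boolean invalidated = false;

  private static final VarHandle INVALIDATED;
  static {
    try {
      INVALIDATED = MethodHandles.lookup()
          .findVarHandle(GuardSketch.class, "invalidated", boolean.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  void invalidate() {
    // volatile-mode write: full fence, eventually visible to opaque readers
    INVALIDATED.setVolatile(this, true);
  }

  boolean isInvalidated() {
    // opaque-mode read: as cheap as a plain read, but the optimizer may not
    // eliminate it, so it eventually observes the volatile write above
    return (boolean) INVALIDATED.getOpaque(this);
  }
}
{code}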
I was already talking to the Hotspot guys: in short, there is no way to make
ByteBufferGuard "safe" unless you use a volatile read or acquire/release
semantics (which can also be implemented with VarHandles). When using those, I
was unable to crash my VM, but the slowdown is definitely there. Manually
putting in fences does not help either.
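For comparison, the acquire/release variant mentioned above would look roughly
like this (again only a sketch, reusing the INVALIDATED VarHandle from the
sketch above):
{code:java}
// Acquire/release alternative: a release-mode write pairs with acquire-mode
// reads. This is the kind of access the Hotspot folks consider safe here,
// but the acquire read costs something on every single access.
void invalidateRelease() {
  INVALIDATED.setRelease(this, true);
}

boolean isInvalidatedAcquire() {
  return (boolean) INVALIDATED.getAcquire(this);
}
{code}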
The main difference between a plain read and an opaque read (as used here) is
not that the produced CPU instructions are different; the difference is only in
how the optimizer may transform the code. With plain reads, a simple spin loop
will never end, as the Hotspot compiler clearly sees that the value can't
change, and because the field is not volatile, opaque, or similar, it removes
the read for this thread. This does not only happen in spin loops; it also
happens in our code if it's executed often enough. This is not a bug, it's
intended behavior.
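A self-contained demo of that effect (assuming default JIT settings; the
plain-read loop typically never terminates once the loop body is compiled):
{code:java}
public class SpinLoopDemo {
  static boolean plainFlag = false; // plain field, no volatile/opaque access

  public static void main(String[] args) throws Exception {
    Thread spinner = new Thread(() -> {
      while (!plainFlag) {
        // Hotspot may prove that plainFlag cannot change in this thread and
        // hoist the read out of the loop -> infinite loop after JIT compile.
      }
      System.out.println("spinner saw the write");
    });
    spinner.start();
    Thread.sleep(2000); // give the JIT time to compile the hot loop
    plainFlag = true;   // plain write: the spinner may never observe it
    spinner.join();     // with a getOpaque() read this would terminate
  }
}
{code}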
I tuned the testcase to crash reliably: just insert a loop of 10,000 reads
BEFORE you unmap, then it always fails (in Java 8). With opaque reads it also
crashed, but not reproducibly, so there is an improvement. IMHO, the reason why
we see a slowdown for some queries could come from exactly that: Hotspot removes
the "if (invalidated)" check, as the value cannot change in the current thread.
With opaque reads it can't do that, so the code is slower in the long term. I
will try to get some assembly output later, I just have to install hsdis.dll.
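The recipe looks roughly like this against the Directory API (only a sketch;
the file name and the single-threaded shape are made up for illustration, the
real testcase reads from other threads):
{code:java}
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public class UnmapCrashSketch {
  // Sketch of the crash recipe: hammer the guarded read path long enough
  // that the JIT compiles it (and, with plain reads, drops the
  // "if (invalidated)" check), then unmap and read again.
  static void warmThenUnmap(Directory dir) throws Exception {
    IndexInput in = dir.openInput("test.bin", IOContext.READONCE);
    IndexInput clone = in.clone();     // clones share the mmapped buffers
    for (int i = 0; i < 10_000; i++) { // the 10,000 reads BEFORE unmapping
      clone.seek(0);
      clone.readByte();
    }
    in.close();       // unmaps the buffers; the clone still points at them
    clone.readByte(); // Java 8 + plain reads: SIGSEGV here instead of an
                      // AlreadyClosedException
  }
}
{code}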
To make it bulletproof, we have to wait for official support in the ByteBuffer
API for unmapping (https://bugs.openjdk.java.net/browse/JDK-4724038); with our
checks alone we cannot make it safe. Another approach would be to expand the
SIGSEGV signal handler checks in the JVM to throw an InternalError (they do that
now if the operating system truncates the file and so the mapped area changes).
I don't know why they don't generally check for SIGSEGV and, if it happens
inside a DirectBuffer access, just throw an exception (possibly delayed, as in
the truncated-file case).
So we have to decide what we want to do as a workaround:
- We can't be bulletproof.
- With our current code we can catch most cases where you open an index, start
a query, and then close the reader. This only works shortly after the program
has started, while nothing is optimized yet. As soon as multiple queries have
already run and you close the reader later, it SIGSEGVs almost always. Not even
"opaque" semantics help here. With "plain" code (the current code in
branch_8x), by the time the optimizer decides to optimize, the checks are gone.
- With the current "opaque" code we can go a bit further at the cost of some
slowdown, but it's still not safe.
> Improve ByteBufferGuard in Java 11
> ----------------------------------
>
> Key: LUCENE-8780
> URL: https://issues.apache.org/jira/browse/LUCENE-8780
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/store
> Affects Versions: master (9.0)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Major
> Labels: Java11
> Attachments: LUCENE-8780.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In LUCENE-7409 we added {{ByteBufferGuard}} to protect MMapDirectory from
> crashing the JVM with SIGSEGV when you close and unmap the mmapped buffers of
> an IndexInput while another thread is accessing it.
> The idea was to do a volatile write access to flush the caches (to trigger a
> full fence) and set a non-volatile boolean to true. All accesses would check
> the boolean and stop the caller from accessing the underlying ByteBuffer.
> This worked most of the time, until the JVM optimized away the plain read
> access to the boolean (you can easily see this after some runtime of our
> by-default ignored testcase).
> With master on Java 11, we can improve the whole thing. Using VarHandles you
> can use any access type when reading or writing the boolean. After reading
> Doug Lea's explanation <http://gee.cs.oswego.edu/dl/html/j9mm.html> and some
> testing, I was no longer able to crash my JDK (even after running for minutes
> unmapping bytebuffers).
> The approach is the same: we do a full-fenced write (standard volatile write)
> when we unmap, then we yield the thread (to finish in-flight reads in other
> threads), and then unmap all byte buffers.
> On the test side (read access), instead of using a plain read, we use the new
> "opaque read". Opaque reads are the same as plain reads; there are only
> different ordering requirements. Actually the main difference is explained by
> Doug like this: "For example in constructions in which the only modification
> of some variable x is for one thread to write in Opaque (or stronger) mode,
> X.setOpaque(this, 1), any other thread spinning in
> while(X.getOpaque(this)!=1){} will eventually terminate. Note that this
> guarantee does NOT hold in Plain mode, in which spin loops may (and usually
> do) infinitely loop -- they are not required to notice that a write ever
> occurred in another thread if it was not seen on first encounter." - And
> that's what we want to have: we don't want to do volatile reads, but we want
> to prevent the compiler from optimizing away our read of the boolean. So we
> want it to "eventually" see the change. Because of the much stronger volatile
> write, the cache effects should be visible even faster (like in our Java 8
> approach, just now we improved our read side).
> The new code is much slimmer (theoretically we could also use an
> AtomicBoolean for that and use its new method {{getOpaque()}}, but I wanted
> to avoid extra method calls, so I used a VarHandle directly).
> It's set up like this:
> - The underlying boolean field is a private member (with a SuppressWarnings
> annotation, as it looks unused to the Java compiler), marked as volatile
> (that's the recommendation, but in reality it does not matter at all).
> - We create a VarHandle to access this boolean; we never access it directly
> (this is why the volatile marking does not affect us).
> - We use VarHandle.setVolatile() to change our "invalidated" boolean to
> "true", enforcing a full fence.
> - On the read side we use VarHandle.getOpaque() instead of VarHandle.get()
> (the latter would behave like the plain read in our old Java 8 code).
> I had to tune our test a bit, as the VarHandles make it take longer until it
> actually crashes (as optimizations kick in later). I also used random values
> for the reads to prevent the optimizer from removing all the bytebuffer reads.
> When we commit this, we can disable the test again (it takes approx 50 secs
> on my machine).
> I'd still like to see the differences between the plain read and the opaque
> read in production, so maybe [~mikemccand] or [~rcmuir] can do a comparison
> with the nightly benchmarker?
> Have fun, maybe [~dweiss] has some ideas, too.