Hi,
this error indeed cannot happen, as all our segments are shared. It could
still be a bug in the Java 19 version; did you try Java 21 or Java 20?
It may also be a Corretto problem, so maybe contact their team in case they
have applied some changes. ScopedMemoryAccess internally uses an extension to
the original Java memory model (I think they changed something in the spec),
so quite a lot changed internally. Maybe Corretto has some HotSpot patches
that make those memory model changes hit us?
I don't think the bug is in Lucene's code, because if a segment is
shared, it is shared. One other possible cause: have you accidentally
closed the IndexInput too early? Normally this should cause an
IllegalStateException (we have a test for this), but I am not fully
sure what happens if the shared scope was already closed. I remember
there were some bugs in 19, but it was too long ago for me to recall the
details. So please try with plain OpenJDK Java 21 (or 20).
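To make the shared-vs-confined argument concrete, here is a toy model in plain Java of the two checks a session performs before each access: ownership and liveness. The classes are hypothetical and are NOT the JDK's MemorySessionImpl. In this model a shared session never raises the wrong-thread error; the only failure after close is an IllegalStateException. The real JDK additionally coordinates closing a shared session with accesses still in flight on other threads (the ScopedMemoryAccess machinery), which this sketch deliberately leaves out.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model (hypothetical; NOT the JDK's MemorySessionImpl) of the ownership
// and liveness checks a memory session makes before each segment access.
final class ToySession implements AutoCloseable {

    // Stand-in for java.lang.WrongThreadException in this model.
    static final class ToyWrongThreadException extends RuntimeException {
        ToyWrongThreadException(String message) {
            super(message);
        }
    }

    private final Thread owner; // null means "shared": any thread may access
    private final AtomicBoolean alive = new AtomicBoolean(true);

    private ToySession(Thread owner) {
        this.owner = owner;
    }

    static ToySession openConfined() {
        return new ToySession(Thread.currentThread());
    }

    static ToySession openShared() {
        return new ToySession(null);
    }

    private void checkOwner() {
        if (owner != null && owner != Thread.currentThread()) {
            throw new ToyWrongThreadException("Attempted access outside owning thread");
        }
    }

    // Called before every read in this model.
    void checkValidState() {
        checkOwner();
        if (!alive.get()) {
            throw new IllegalStateException("Already closed");
        }
    }

    @Override
    public void close() {
        checkOwner();
        if (!alive.compareAndSet(true, false)) {
            throw new IllegalStateException("Already closed");
        }
    }

    // Runs one access on a fresh thread; returns the exception it threw, or null.
    static Throwable accessFromOtherThread(ToySession session) {
        Throwable[] seen = new Throwable[1];
        Thread t = new Thread(() -> {
            try {
                session.checkValidState();
            } catch (RuntimeException e) {
                seen[0] = e;
            }
        });
        t.start();
        try {
            t.join(); // join() makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen[0];
    }
}
```

For example, ToySession.accessFromOtherThread(ToySession.openShared()) returns null, while the same call on a session from openConfined() returns a ToyWrongThreadException. After close(), the shared session fails with IllegalStateException instead.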
I would like to know more about the speed improvements! In our
benchmarking they were not so visible (only a slight change), so I'd be
happy to see more.
Uwe
On 17.08.2023 at 12:43, Michael McCandless wrote:
Hi Team,
We hit an interesting and exciting intermittent exception in our
customer-facing product search instance (all Lucene!) at Amazon:
java.lang.WrongThreadException: Attempted access outside owning thread
    at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:460)
    at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113)
    at java.base/jdk.internal.misc.ScopedMemoryAccess.getByte(ScopedMemoryAccess.java:518)
    at java.base/java.lang.invoke.VarHandleSegmentAsBytes.get(VarHandleSegmentAsBytes.java:109)
    at java.base/java.lang.foreign.MemorySegment.get(MemorySegment.java:1103)
    at org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.readByte(MemorySegmentIndexInput.java:485)
    at org.apache.lucene.util.fst.ReverseRandomAccessReader.readByte(ReverseRandomAccessReader.java:33)
    at org.apache.lucene.util.fst.FST.findTargetArc(FST.java:1444)
    at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:511)
    at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:111)
    at org.apache.lucene.index.TermStates.build(TermStates.java:96)
We are using Corretto Java full version:
openjdk full version "19.0.2+9"
Looking at how Uwe's magic MR-JAR code works, it doesn't look like we
ever make a thread-private MemorySegment? If so, I don't see how this
exception could be occurring :) We seem to do this:
    final MemorySession session = MemorySession.openShared();
Or maybe we do sometimes make thread-private memory segments, and
maybe we (Amazon's sources) have a silly thread over-sharing bug, but
so far I think that's unlikely -- we are calling TermStates.build from
a single thread, which under the hood clones/slices the MMap
IndexInputs to seek the terms dictionary on each segment and only that
one thread ever interacts with those. It's all just one thread under
TermStates.build.
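One way to test the single-thread assumption directly, rather than reasoning about it, would be to audit each cloned/sliced input in a debug run. The sketch below is a hypothetical helper, not part of Lucene or any existing tool; the class name and the idea of calling check() from a wrapper around the IndexInput read methods are assumptions.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical debugging helper (not part of Lucene): remembers the first thread
// that touches an object and records any later use from a different thread.
final class ThreadAffinityChecker {
    private final String name;
    private final AtomicReference<Thread> firstUser = new AtomicReference<>();
    private final AtomicReference<String> firstViolation = new AtomicReference<>();

    ThreadAffinityChecker(String name) {
        this.name = name;
    }

    // Call at the top of each read path being audited.
    void check() {
        Thread current = Thread.currentThread();
        firstUser.compareAndSet(null, current); // first caller becomes the expected thread
        Thread expected = firstUser.get();
        if (expected != current) {
            // Record instead of throwing, so the audited code path behaves as before;
            // keep only the first violation to keep the report short.
            firstViolation.compareAndSet(null, name + " first used by " + expected.getName()
                    + " but also by " + current.getName());
        }
    }

    // Returns the first cross-thread use observed, or null if all calls came from one thread.
    String violation() {
        return firstViolation.get();
    }
}
```

A legitimate hand-off of an input between threads would also be flagged, which is intended here, since the claim being tested is that no hand-off happens under TermStates.build.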
This only happened on a few hosts and only for a short period of time,
making me suspect some sort of intermittent JVM bug (e.g. a HotSpot
miscompilation). It is clearly very rare, so we are still using
the new MMap (which, by the way, seems to be a big performance gain for our
service; we are still trying to fully understand why, more on that
later!).
Has anyone else seen such errant exceptions with the new Panama based
MMap? Are there any known Java issues that smell like this? (A quick
search on bugs.openjdk.org
(https://bugs.openjdk.org/browse/JDK-8287809?jql=issuetype%20%3D%20Bug%20AND%20text%20~%20WrongThreadException)
did not seem to turn up any obvious candidates).
Thanks,
Mike McCandless
http://blog.mikemccandless.com
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de