Hi,

This error indeed should not be possible, as all our segments are shared. It could still be a bug in Java 19. Did you try Java 20 or Java 21?

It may also be a Corretto problem. Maybe contact their team to ask whether they have applied some changes. ScopedMemoryAccess uses an extension to the original Java memory model internally (I think they changed something in the specs), so quite a lot changed internally. Maybe Corretto has some patches for HotSpot that make the memory model changes hit us?

I don't think the bug is in Lucene's code, because if a segment is shared, it is shared. Another possible problem: have you maybe accidentally closed the IndexInput too early? Normally this should cause an IllegalStateException (we have a test for this), but I am not fully sure what happens if the shared scope was already closed. I remember there were some bugs in 19, but that was a long time ago. So please try with plain OpenJDK Java 21 (or 20).
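
To make the shared-versus-confined distinction concrete, here is a toy model in Java. It is only a sketch and not the JDK's actual implementation: ToySessionDemo, ToySession, openConfined, checkAccess and readOnOtherThread are hypothetical names invented for illustration, and the model throws IllegalStateException in both failure cases, whereas the real JDK reports the thread check as WrongThreadException (as in the stack trace quoted below).

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model only: NOT the JDK's MemorySessionImpl. All names are hypothetical. */
public class ToySessionDemo {

    /** "Confined" means the session has an owner thread; "shared" means owner is null. */
    static final class ToySession {
        private final Thread owner;       // null means shared
        private volatile boolean closed;

        private ToySession(Thread owner) { this.owner = owner; }

        static ToySession openConfined() { return new ToySession(Thread.currentThread()); }
        static ToySession openShared()   { return new ToySession(null); }

        void close() { closed = true; }

        /** The check this model performs before every read. */
        void checkAccess() {
            if (owner != null && owner != Thread.currentThread()) {
                throw new IllegalStateException("Attempted access outside owning thread");
            }
            if (closed) {
                throw new IllegalStateException("Already closed");
            }
        }
    }

    /** Reads one byte on a freshly started thread and reports the outcome as a string. */
    static String readOnOtherThread(ToySession session, byte[] data) {
        AtomicReference<String> result = new AtomicReference<>();
        Thread t = new Thread(() -> {
            try {
                session.checkAccess();
                result.set("read " + data[0]);
            } catch (IllegalStateException e) {
                result.set("failed: " + e.getMessage());
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        byte[] data = {42};
        // Confined session opened on main, read on another thread: thread check fails.
        System.out.println(readOnOtherThread(ToySession.openConfined(), data));
        // Shared session: no owner, so the thread check cannot fail.
        System.out.println(readOnOtherThread(ToySession.openShared(), data));
        // Shared session closed too early: the state check fails instead.
        ToySession shared = ToySession.openShared();
        shared.close();
        System.out.println(readOnOtherThread(shared, data));
    }
}
```

In this model a session without an owner can never fail the thread check, which is why the reported exception is so surprising for segments whose session was opened as shared. Closing too early fails a different check with a different message.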

I would like to know more about the speed improvements! In our benchmarking they were not so visible (only a slight change), so happy to see more.

Uwe

On 17.08.2023 at 12:43, Michael McCandless wrote:
Hi Team,

We hit an interesting and exciting intermittent exception in our customer-facing product search instance (all Lucene!) at Amazon:

java.lang.WrongThreadException: Attempted access outside owning thread
    at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:460)
    at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113)
    at java.base/jdk.internal.misc.ScopedMemoryAccess.getByte(ScopedMemoryAccess.java:518)
    at java.base/java.lang.invoke.VarHandleSegmentAsBytes.get(VarHandleSegmentAsBytes.java:109)
    at java.base/java.lang.foreign.MemorySegment.get(MemorySegment.java:1103)
    at org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.readByte(MemorySegmentIndexInput.java:485)
    at org.apache.lucene.util.fst.ReverseRandomAccessReader.readByte(ReverseRandomAccessReader.java:33)
    at org.apache.lucene.util.fst.FST.findTargetArc(FST.java:1444)
    at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:511)
    at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:111)
    at org.apache.lucene.index.TermStates.build(TermStates.java:96)


We are using Corretto Java full version:


openjdk full version "19.0.2+9"


Looking at how Uwe's magic mrjar code works, it doesn't look like we ever make a thread private MemorySegment?  If so, I don't see how this exception could be occurring :)  We seem to do this:

    final MemorySession session = MemorySession.openShared();

Or, maybe we do sometimes make thread private memory segments, and maybe we (Amazon's sources) have a silly thread over-sharing bug, but so far I think that's unlikely -- we are calling TermStates.build from a single thread, which under the hood clones/slices the MMap IndexInputs to seek the terms dictionary on each segment and only that one thread ever interacts with those.  It's all just one thread under TermStates.build.


This only happened on a few hosts and only for a short period of time, making me suspect some sort of intermittent JVM bug (e.g. a HotSpot miscompilation).  It is clearly very rare, so we are still using the new MMap (which, by the way, seems to be a big performance gain for our service, which we are still trying to fully understand; more on that later!).


Has anyone else seen such errant exceptions with the new Panama based MMap?  Are there any known Java issues that smell like this?  (A quick search on bugs.openjdk.org (https://bugs.openjdk.org/browse/JDK-8287809?jql=issuetype%20%3D%20Bug%20AND%20text%20~%20WrongThreadException) did not seem to turn up any obvious candidates.)


Thanks,


Mike McCandless

http://blog.mikemccandless.com

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de
