Copilot commented on code in PR #243: URL: https://github.com/apache/uima-uimaj/pull/243#discussion_r3298270124
########## uimaj-core/src/test/java/org/apache/uima/jcas/impl/LoadingBuiltinAnnotationBeforeCasTest.java: ########## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.uima.jcas.impl; + +import static java.lang.System.getProperty; +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.fail; + +import java.io.ByteArrayOutputStream; +import java.io.File; +import java.nio.charset.StandardCharsets; +import java.util.concurrent.TimeUnit; + +import org.apache.uima.jcas.tcas.Annotation; +import org.apache.uima.resource.metadata.impl.TypeSystemDescription_impl; +import org.apache.uima.util.CasCreationUtils; +import org.junit.jupiter.api.Test; + +/** + * Reproduces <a href="https://github.com/apache/uima-uimaj/issues/234">issue #234</a>: when a + * built-in Annotation cover class is touched before any CAS exists, the type system records a wrong + * (zero) jcasType for it and Annotation's feature callsites are never updated. Subsequent + * {@code getCasType} / {@code setBegin} calls then fail. + * + * <p> + * The bug only manifests in a fresh JVM where nothing has yet initialized {@link Annotation}. Class + * initialization is once-per-JVM, so running this in the shared Surefire JVM (where other tests + * have already created CASes and thereby initialized Annotation normally) would not reproduce it. + * The test therefore forks a dedicated JVM running {@link #main(String[])} and asserts on its exit + * code. + */ +public class LoadingBuiltinAnnotationBeforeCasTest { + + /** + * Reproducer body. Must run in a fresh JVM (see the {@code @Test} below). Throws on failure so + * the JVM exits non-zero. + */ + public static void main(String[] args) throws Exception { + // Force Annotation's <clinit> to start now, before any TypeSystemImpl / CAS exists. + // Annotation's super (AnnotationBase) references TypeSystemImpl in its own <clinit>, which + // kicks off a recursive init storm that reads Annotation.typeIndexID while it's still 0 -- + // caching a broken JCasClassInfo and skipping Annotation's _FC_begin / _FC_end callsite update. + Class.forName(Annotation.class.getName()); + + var tsd = new TypeSystemDescription_impl(); + var jcas = CasCreationUtils.createCas(tsd, null, null).getJCas(); + + // (1) Type system lookup by JCas type id. With the bug, slot Annotation.type of + // jcasRegisteredTypes is empty (the JCCI got registered at slot 0 instead). + jcas.getCasType(Annotation.type); + + // (2) setDocumentText eventually calls Annotation.setBegin, which dispatches through the + // _FC_begin callsite. With the bug, that callsite still returns -1 and the offset lookup + // throws ArrayIndexOutOfBoundsException. + jcas.setDocumentText("hello"); + } + + @Test + void thatLoadingAnnotationBeforeCasDoesNotBreakTypeSystemManagement() throws Exception { + var windows = getProperty("os.name").toLowerCase().contains("win"); + var java = getProperty("java.home") + File.separator + "bin" + File.separator + + (windows ? "java.exe" : "java"); + + var pb = new ProcessBuilder(java, "-cp", getProperty("java.class.path"), getClass().getName()); + pb.redirectErrorStream(true); + var p = pb.start(); + + var buf = new ByteArrayOutputStream(); + p.getInputStream().transferTo(buf); + + // Bound the wait so a hypothetical class-init deadlock in the subprocess can't hang CI. + if (!p.waitFor(60, TimeUnit.SECONDS)) { Review Comment: `p.getInputStream().transferTo(buf)` is executed before the timed `waitFor`. Since `transferTo` blocks until EOF, a deadlocked/hung subprocess will block here indefinitely and the 60s timeout will never be reached. Consider consuming the subprocess output asynchronously (e.g., a separate thread) or waiting with a timeout first and only then reading the completed output (or redirecting output to a file) so the test cannot hang CI. ########## uimaj-core/src/test/java/org/apache/uima/jcas/impl/LoadingBuiltinAnnotationBeforeCasTest.java: ########## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.uima.jcas.impl; + +import static java.lang.System.getProperty; +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.fail; + +import java.io.ByteArrayOutputStream; +import java.io.File; +import java.nio.charset.StandardCharsets; +import java.util.concurrent.TimeUnit; + +import org.apache.uima.jcas.tcas.Annotation; +import org.apache.uima.resource.metadata.impl.TypeSystemDescription_impl; +import org.apache.uima.util.CasCreationUtils; +import org.junit.jupiter.api.Test; + +/** + * Reproduces <a href="https://github.com/apache/uima-uimaj/issues/234">issue #234</a>: when a + * built-in Annotation cover class is touched before any CAS exists, the type system records a wrong + * (zero) jcasType for it and Annotation's feature callsites are never updated. Subsequent + * {@code getCasType} / {@code setBegin} calls then fail. + * + * <p> + * The bug only manifests in a fresh JVM where nothing has yet initialized {@link Annotation}. Class + * initialization is once-per-JVM, so running this in the shared Surefire JVM (where other tests + * have already created CASes and thereby initialized Annotation normally) would not reproduce it. + * The test therefore forks a dedicated JVM running {@link #main(String[])} and asserts on its exit + * code. + */ +public class LoadingBuiltinAnnotationBeforeCasTest { + + /** + * Reproducer body. Must run in a fresh JVM (see the {@code @Test} below). Throws on failure so + * the JVM exits non-zero. + */ + public static void main(String[] args) throws Exception { + // Force Annotation's <clinit> to start now, before any TypeSystemImpl / CAS exists. + // Annotation's super (AnnotationBase) references TypeSystemImpl in its own <clinit>, which + // kicks off a recursive init storm that reads Annotation.typeIndexID while it's still 0 -- + // caching a broken JCasClassInfo and skipping Annotation's _FC_begin / _FC_end callsite update. + Class.forName(Annotation.class.getName()); + + var tsd = new TypeSystemDescription_impl(); + var jcas = CasCreationUtils.createCas(tsd, null, null).getJCas(); + + // (1) Type system lookup by JCas type id. With the bug, slot Annotation.type of + // jcasRegisteredTypes is empty (the JCCI got registered at slot 0 instead). + jcas.getCasType(Annotation.type); + + // (2) setDocumentText eventually calls Annotation.setBegin, which dispatches through the + // _FC_begin callsite. With the bug, that callsite still returns -1 and the offset lookup + // throws ArrayIndexOutOfBoundsException. + jcas.setDocumentText("hello"); + } + + @Test + void thatLoadingAnnotationBeforeCasDoesNotBreakTypeSystemManagement() throws Exception { + var windows = getProperty("os.name").toLowerCase().contains("win"); Review Comment: Windows detection via `os.name.toLowerCase().contains("win")` can mis-detect non-Windows OS names containing "win" (e.g. "Darwin"). Use a stricter check such as `startsWith("windows")` (or similar) to avoid building a non-existent `java.exe` path. ########## uimaj-core/src/main/java/org/apache/uima/cas/impl/TypeSystemImpl.java: ########## @@ -2867,11 +2867,40 @@ public List<TypeImpl> getAllTypes() { } public static final TypeSystemImpl staticTsi = new TypeSystemImpl(); - static { - TypeSystemImpl tsi = staticTsi.commit(); // needed to assign adjusted offsets to the builtins - if (tsi != staticTsi) { - Misc.internalError(); + + // staticTsi was previously committed inside this class's <clinit>. That triggered a recursive + // load-cover-classes storm: a built-in JCas cover class's <clinit> could be the very thing that + // caused TypeSystemImpl to load (e.g. via createCallSiteForBuiltIn). The commit then asked the + // JVM to fully initialize that same cover class while it was still mid-init on the current + // thread, producing a zero/default typeIndexID and broken callsites (issue #234). Commit is now + // deferred until first real demand via committedStaticTsi(). + // + // Note: under deferred commit, another TypeSystemImpl with the same builtin-only structure may + // be committed before staticTsi is. In that case staticTsi.commit() consolidates to that other + // instance via committedTypeSystems (and staticTsi itself stays unfinalized). We therefore cache + // and return whatever commit() returns -- that is the instance with the correctly computed + // offsets. + private static volatile TypeSystemImpl committedStaticTsi; + + /** + * Returns a committed builtin-only {@link TypeSystemImpl} -- either {@link #staticTsi} itself if + * it was the first one to commit, or whatever structurally-equivalent committed instance it was + * consolidated to. Commit is performed lazily and exactly once across all threads. Code that + * needs commit-dependent state (e.g. {@link TypeImpl#getAdjOffset(String)}) must go through this + * accessor rather than reading {@link #staticTsi} directly. + */ + public static TypeSystemImpl committedStaticTsi() { + TypeSystemImpl result = committedStaticTsi; + if (result == null) { + synchronized (TypeSystemImpl.class) { + result = committedStaticTsi; + if (result == null) { + result = staticTsi.commit(); + committedStaticTsi = result; Review Comment: The new lazy-commit design explicitly allows `staticTsi.commit()` to consolidate to a different committed instance, leaving `staticTsi` itself unfinalized. However, other code (notably `createCallSiteForBuiltIn(...)`) still uses `staticTsi` directly to read `jcasRegisteredTypes` and compute `TypeImpl#getAdjOffset`, which can yield `null`/`-1` and leave builtin call sites permanently incorrect when the commit already happened on a different instance. Consider updating builtin callsite creation to use the already-committed instance (`committedStaticTsi` if non-null) for offset lookups, and avoid forcing a commit during cover-class initialization; also, the existing `if (staticTsi == null)` guard in `createCallSiteForBuiltIn` is ineffective since `staticTsi` is `static final`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
