kgyrtkirk commented on issue #17429:
URL: https://github.com/apache/druid/issues/17429#issuecomment-2456639199
this issue has been blocking me for a few days now - so I was forced to take a
look ;)
so far my observations are:
* the issue also happened earlier, with different GHA ubuntu images
* it's nondeterministic; I've seen it happen with 2-3 tests; the most common
victim is `AndFilterTest`, but it could be `SelectorFilterTest` as well - so
it's not related to just one test class
* I've blind-tuned the `jit` to be more aggressive...right now I'm not sure
if that even mattered
* I was able to repro it locally after I switched to `OpenJDK Runtime
Environment Zulu21.38+21-CA (build 21.0.5+11-LTS)`
* I was running `21.0.4` earlier, which was unaffected across several
attempts
* switching to `21.0.5` showed the issue right away; so it must have been
introduced in `21.0.5`
* the issue consistently happens with
`org.apache.druid.query.filter.InDimFilter::optimizeLookup`
* my attempts to get a disassembly have been unsuccessful so far; my setup
is not prepared for it, as `hsdis-amd64.so` is missing
* all tests run in a single jvm -> jvm retains method invocation counts from
earlier tests
* surefire test execution order is unspecified - which makes surefire run
them in `filesystem` order (which is essentially random in CI runs)
* right now the shortest running command which is still able to repro the
issue is:
```
time mvn install -pl processing/ -Dtest=AndFilterTest,SuperSorterTest*,InFilterTests,*FilterTest -Pskip-static-checks
```
* even though the test execution order is the same, I've seen crashes in 3
different tests (`NotFilterTest` ; `AndFilterTest` ; `SelectorFilterTest`)
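Since the crash is nondeterministic, a single run says little; counting failures over repeated runs gives a rough hit rate. A minimal sketch of that, where `count_failures` is a hypothetical helper (not something in the Druid repo) that runs any command N times and prints how many runs exited non-zero:

```shell
# count_failures N CMD [ARGS...]
# Hypothetical helper: run CMD N times, print the number of non-zero exits.
# Output of CMD is discarded; only the exit status is inspected.
count_failures() {
  n="$1"
  shift
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example (not executed here): repeat the repro command from above 10 times.
# count_failures 10 mvn install -pl processing/ \
#   '-Dtest=AndFilterTest,SuperSorterTest*,InFilterTests,*FilterTest' \
#   -Pskip-static-checks
```

To rule out ordering as a factor, surefire's `runOrder` parameter (user property `surefire.runOrder`, e.g. `-Dsurefire.runOrder=alphabetical`) could be added to the command; whether the Druid pom already pins or overrides it is not something I have checked.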
I think right now we have the following options:
* debug it further
* even if we uncover the underlying issue - a new jdk release will be
needed to get rid of the problem, which might take some time...so we will need
an alternate fix until that happens
* disable the affected tests: not really an option, since it is the C2
optimization of `InDimFilter::optimizeLookup` that may trigger the issue
* from the call stack it seems like it's connected to the C2 compiler - so
disabling C2 optimization for tests on 21 might also be a possible way out
* possibly force `21.0.4` to be used
* try to not reuse the jvm for subsequent test runs (`reuseForks`)
* other options?
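For the C2-related options above, a rough sketch of what the JVM flags could look like. `-XX:CompileCommand=exclude,...` and `-XX:TieredStopAtLevel=1` are standard HotSpot options; the helper `jvm_workaround_flag` and its mode names are made up for illustration, and passing the flag through `JAVA_TOOL_OPTIONS` is an assumption chosen because it does not depend on how the Druid pom builds the surefire `argLine`:

```shell
# Hypothetical helper (not part of Druid): print the JVM flag for a workaround.
jvm_workaround_flag() {
  case "$1" in
    exclude-method)
      # Keep only the suspect method out of the JIT; it then runs interpreted.
      echo "-XX:CompileCommand=exclude,org.apache.druid.query.filter.InDimFilter::optimizeLookup"
      ;;
    no-c2)
      # Stop tiered compilation at C1, so C2 never compiles anything.
      echo "-XX:TieredStopAtLevel=1"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Example (not executed here): the JVM reads JAVA_TOOL_OPTIONS at startup,
# so the flag reaches the Maven JVM and the forked test JVMs alike.
# JAVA_TOOL_OPTIONS="$(jvm_workaround_flag exclude-method)" \
#   mvn install -pl processing/ -Pskip-static-checks
```

The `exclude-method` variant is the narrower of the two, so it would cost less test runtime than turning C2 off entirely. For the fork option, surefire also accepts `-DreuseForks=false` on the command line, though that only takes effect if the pom does not hardcode the value.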