kgyrtkirk commented on issue #17429:
URL: https://github.com/apache/druid/issues/17429#issuecomment-2456639199

   this issue has been blocking me for a few days now - so I was forced to 
take a look ;)
   
   so far my observations are:
   
   * the issue also happened earlier with different GHA ubuntu images
   * it's nondeterministic; I've seen it happen with 2-3 tests; the most common 
victim is `AndFilterTest` ; but it could be `SelectorFilterTest` as well - so 
it's not related to just one test class
   * I've blind-tuned the `jit` to be more aggressive...right now I'm not sure 
if that even mattered
   * I was able to repro it locally after I switched to `OpenJDK Runtime 
Environment Zulu21.38+21-CA (build 21.0.5+11-LTS)`
      * I was running `21.0.4` earlier, which was unaffected across several 
attempts
      * switching to `21.0.5` showed the issue right away; so it must have 
been introduced in `21.0.5`
   * the issue consistently happens with 
`org.apache.druid.query.filter.InDimFilter::optimizeLookup`
   * my attempts to get a disassembly have been unsuccessful so far; my setup 
is not prepared for it, as `hsdis-amd64.so` is missing
   * all tests run in a single jvm -> jvm retains method invocation counts from 
earlier tests
   * surefire test execution order is unspecified - which makes surefire run 
them in `filesystem` order (which is essentially random in CI runs)
   * right now the shortest running command which is still able to repro the 
issue is:
   ```
   time mvn install -pl processing/ -Dtest=AndFilterTest,SuperSorterTest*,InFilterTests,*FilterTest -Pskip-static-checks
   ```
   * even though the test execution order is the same, I've seen crashes in 3 
different tests (`NotFilterTest` ; `AndFilterTest` ; `SelectorFilterTest`)
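
   To take the `filesystem` ordering out of the picture in CI, the run order 
could be pinned. A minimal sketch, assuming the `processing/` module does not 
override Surefire's `runOrder` in its pom (I have not checked that); 
`repro_cmd` and `TESTS` are hypothetical names used only for illustration:

   ```shell
   # Same test selection as the repro command above.
   TESTS='AndFilterTest,SuperSorterTest*,InFilterTests,*FilterTest'

   # Hypothetical helper: prints the repro command with a pinned Surefire
   # run order instead of executing it, so it can be inspected first.
   # surefire.runOrder is a standard Surefire user property; the default
   # value used here (alphabetical) is just one possible choice.
   repro_cmd() {
     printf "mvn install -pl processing/ -Dtest='%s' -Pskip-static-checks -Dsurefire.runOrder=%s\n" \
       "$TESTS" "${1:-alphabetical}"
   }

   repro_cmd
   # To actually run it: eval "$(repro_cmd)"
   ```

   This only fixes the order between test classes; it does not make the crash 
itself deterministic.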
   
   I think right now we have the following options:
   * debug it further
      * even if we uncover the underlying issue - a new jdk release will be 
needed to get rid of the problem ; which might take some time...so we will need 
an alternate fix until that happens
   * disable the affected tests: not really an option since the C2 optimization 
of `InDimFilter::optimizeLookup` may trigger the issue 
   * from the call stack it seems like it's connected to the C2 compiler - so 
disabling C2 optimization for tests on 21 might also be a possible way out
   * possibly force `21.0.4` to be used
   * try to not reuse the jvm for subsequent test runs (`reuseForks`)
   * other options?
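
   For the C2-related options, a rough sketch of what the flags could look 
like. None of this is confirmed to avoid the crash. The helper function names 
are hypothetical; the flags themselves are standard HotSpot / Surefire ones. 
Passing them via `JAVA_TOOL_OPTIONS` is an assumption made here so that the 
forked test JVM picks them up without touching the pom (it also affects the 
Maven JVM itself):

   ```shell
   # Option A: stop tiered compilation at C1, so C2 never compiles anything.
   no_c2_opts() {
     printf '%s\n' '-XX:TieredStopAtLevel=1'
   }

   # Option B: leave C2 on, but exclude only the method seen in the crashes
   # from JIT compilation (it then runs in the interpreter).
   exclude_method_opts() {
     printf '%s\n' '-XX:CompileCommand=exclude,org.apache.druid.query.filter.InDimFilter::optimizeLookup'
   }

   # Option C: Surefire user property asking for a fresh JVM per test class.
   # Only takes effect if the pom does not hardcode reuseForks (not verified).
   no_reuse_opts() {
     printf '%s\n' '-DreuseForks=false'
   }

   # Example usage (not executed here):
   # JAVA_TOOL_OPTIONS="$(no_c2_opts)" mvn install -pl processing/ -Pskip-static-checks
   # mvn install -pl processing/ -Pskip-static-checks "$(no_reuse_opts)"
   ```

   Option A will likely slow the tests down; option B is narrower but only 
helps if `optimizeLookup` is really the only method affected.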
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
