aho135 opened a new issue, #18156:
URL: https://github.com/apache/druid/issues/18156
We are ingesting a complex JSON column and occasionally hit a bug during
ingestion where the task fails with:
`java.lang.RuntimeException: java.lang.IllegalArgumentException: Comparison method violates its general contract!`
This occurs prior to the intermediate persist, with the following stack trace:
```
Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.base/java.util.TimSort.mergeLo(TimSort.java:781) ~[?:?]
    at java.base/java.util.TimSort.mergeAt(TimSort.java:518) ~[?:?]
    at java.base/java.util.TimSort.mergeCollapse(TimSort.java:448) ~[?:?]
    at java.base/java.util.TimSort.sort(TimSort.java:245) ~[?:?]
    at java.base/java.util.Arrays.sort(Arrays.java:1307) ~[?:?]
    at java.base/java.util.ArrayList.sort(ArrayList.java:1721) ~[?:?]
    at java.base/java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:392) ~[?:?]
    at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258) ~[?:?]
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210) ~[?:?]
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161) ~[?:?]
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298) ~[?:?]
    at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) ~[?:?]
    at org.apache.druid.segment.incremental.IncrementalIndexAdapter.processRows(IncrementalIndexAdapter.java:85) ~[druid-processing-32.0.1.jar:32.0.1]
    at org.apache.druid.segment.incremental.IncrementalIndexAdapter.<init>(IncrementalIndexAdapter.java:65) ~[druid-processing-32.0.1.jar:32.0.1]
    at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:1086) ~[druid-processing-32.0.1.jar:32.0.1]
    at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:237) ~[druid-processing-32.0.1.jar:32.0.1]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.persistHydrant(StreamAppenderator.java:1657) ~[druid-server-32.0.1.jar:32.0.1]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderator$2.call(StreamAppenderator.java:694) ~[druid-server-32.0.1.jar:32.0.1]
```
This issue is more reliably reproduced with the following query:
```
SELECT
"complex_json_column",
COUNT(*) AS "Count"
FROM "TABLE"
GROUP BY 1
ORDER BY 2 DESC
```
It fails with the following stack trace:
```
Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.base/java.util.TimSort.mergeLo(TimSort.java:781) ~[?:?]
    at java.base/java.util.TimSort.mergeAt(TimSort.java:518) ~[?:?]
    at java.base/java.util.TimSort.mergeCollapse(TimSort.java:448) ~[?:?]
    at java.base/java.util.TimSort.sort(TimSort.java:245) ~[?:?]
    at java.base/java.util.Arrays.sort(Arrays.java:1233) ~[?:?]
    at java.base/java.util.List.sort(List.java:510) ~[?:?]
    at java.base/java.util.Collections.sort(Collections.java:179) ~[?:?]
    at org.apache.druid.query.groupby.epinephelinae.BufferHashGrouper.iterator(BufferHashGrouper.java:204) ~[druid-processing-32.0.1.jar:32.0.1]
    at org.apache.druid.query.groupby.epinephelinae.SpillingGrouper.iterator(SpillingGrouper.java:274) ~[druid-processing-32.0.1.jar:32.0.1]
    at org.apache.druid.query.groupby.epinephelinae.ConcurrentGrouper$1.call(ConcurrentGrouper.java:426) ~[druid-processing-32.0.1.jar:32.0.1]
    at org.apache.druid.query.groupby.epinephelinae.ConcurrentGrouper$1.call(ConcurrentGrouper.java:422) ~[druid-processing-32.0.1.jar:32.0.1]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:259) ~[druid-processing-32.0.1.jar:32.0.1]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
```
The exception does not occur every time. Sometimes the query fails with the
`IllegalArgumentException`; other times it completes successfully, but the
results vary slightly from run to run. One observation: with
`"groupByIsSingleThreaded": true` set in the query context, the query always
completes successfully and returns consistent results.
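
For anyone trying to reproduce this, the flag can be passed in the `context` object of a Druid SQL API request (`POST /druid/v2/sql`). The following is a minimal sketch, not a tested reproduction: the class and helper names are hypothetical, and the base URL `http://localhost:8888` is an assumption that should be replaced with your own Router or Broker address.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: builds a Druid SQL request with groupByIsSingleThreaded enabled.
class SingleThreadedGroupByRequest {
    // Minimal JSON string escaping, sufficient for the SQL text used here.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Request body: the SQL query plus the context flag from this report.
    static String buildBody(String sql) {
        return "{\"query\":\"" + jsonEscape(sql) + "\","
            + "\"context\":{\"groupByIsSingleThreaded\":true}}";
    }

    static HttpRequest buildRequest(String baseUrl, String sql) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/druid/v2/sql"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(buildBody(sql)))
            .build();
    }

    public static void main(String[] args) throws Exception {
        String sql = "SELECT \"complex_json_column\", COUNT(*) AS \"Count\" "
            + "FROM \"TABLE\" GROUP BY 1 ORDER BY 2 DESC";
        // Assumed address; pass your own as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8888";
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(buildRequest(baseUrl, sql), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Removing the `context` entry from `buildBody` gives the default multi-threaded behaviour for comparison.
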
It is worth mentioning that this column contains a mix of plain double values
(e.g. `16245.0`) and JSON objects in which every field is a double (e.g.
`{"k1": 1.0, "k2": 2.0, "k3": 3.0}`).
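
TimSort throws this exception when it detects that the comparator's answers are mutually inconsistent. One way to narrow the problem down might be to run a candidate comparator over a sample of the mixed values and look for a contract violation directly. The sketch below is only an illustration of that idea: `ComparatorContractCheck`, `firstViolation`, and the deliberately broken `MIXED` comparator are hypothetical and are not Druid's code, and nothing here claims to be the actual root cause.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical diagnostic: brute-force check of the Comparator contract on a sample.
class ComparatorContractCheck {
    // Returns a description of the first violation found, or null if none is found.
    static <T> String firstViolation(List<T> sample, Comparator<? super T> cmp) {
        for (T a : sample) {
            if (cmp.compare(a, a) != 0) {
                return "not reflexive: " + a;
            }
            for (T b : sample) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                if (ab != -ba) {
                    return "not antisymmetric: " + a + ", " + b;
                }
                for (T c : sample) {
                    int bc = Integer.signum(cmp.compare(b, c));
                    int ac = Integer.signum(cmp.compare(a, c));
                    // a > b and b > c must imply a > c.
                    if (ab > 0 && bc > 0 && ac <= 0) {
                        return "not transitive: " + a + ", " + b + ", " + c;
                    }
                    // a == b must imply a and b order identically against c.
                    if (ab == 0 && ac != bc) {
                        return "inconsistent ties: " + a + ", " + b + ", " + c;
                    }
                }
            }
        }
        return null;
    }

    // Deliberately broken example: numeric order for two doubles,
    // lexicographic order of the string forms for everything else.
    static final Comparator<Object> MIXED = (x, y) -> {
        if (x instanceof Double && y instanceof Double) {
            return Double.compare((Double) x, (Double) y);
        }
        return String.valueOf(x).compareTo(String.valueOf(y));
    };

    public static void main(String[] args) {
        List<Object> sample = Arrays.<Object>asList(9.0, 10.0, "5");
        System.out.println(firstViolation(sample, MIXED));
    }
}
```

The check is cubic in the sample size, so it is only practical for a few hundred values. It also cannot detect a comparator whose answers change between calls, for example because of shared mutable state across threads.
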
### Affected Version
32.0.1