dulu98Kurz opened a new issue, #15091:
URL: https://github.com/apache/druid/issues/15091
### Affected Version
26.0.0, 27.0.0, master
### Description
- Cluster size
15 * i3.4xlarge, 130 CPUs for Druid
500K segments loaded in the cluster
- Configurations in use
General configurations
- Steps to reproduce the problem
When there are more than 32,767 segments in a single time chunk, ingestion
starts to fail.
- The error message or stack traces encountered
```
2023-09-29T02:45:16,308 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
java.lang.IllegalArgumentException: fromKey > toKey
    at java.util.TreeMap$NavigableSubMap.<init>(TreeMap.java:1368) ~[?:1.8.0_302]
    at java.util.TreeMap$AscendingSubMap.<init>(TreeMap.java:1855) ~[?:1.8.0_302]
    at java.util.TreeMap.subMap(TreeMap.java:913) ~[?:1.8.0_302]
    at org.apache.druid.timeline.partition.OvershadowableManager.entryIteratorGreaterThan(OvershadowableManager.java:423) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.OvershadowableManager.findOvershadowedBy(OvershadowableManager.java:299) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.OvershadowableManager.findOvershadowedBy(OvershadowableManager.java:275) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.OvershadowableManager.moveNewStandbyToVisibleIfNecessary(OvershadowableManager.java:456) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.OvershadowableManager.determineVisibleGroupAfterAdd(OvershadowableManager.java:432) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.OvershadowableManager.addAtomicUpdateGroupWithState(OvershadowableManager.java:629) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.OvershadowableManager.addChunk(OvershadowableManager.java:699) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.PartitionHolder.add(PartitionHolder.java:70) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.partition.PartitionHolder.<init>(PartitionHolder.java:52) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.VersionedIntervalTimeline.addAll(VersionedIntervalTimeline.java:201) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.timeline.VersionedIntervalTimeline.add(VersionedIntervalTimeline.java:180) ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.getOrCreateSink(StreamAppenderator.java:486) ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.add(StreamAppenderator.java:267) ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:411) ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.add(StreamAppenderatorDriver.java:191) ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:654) ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:266) ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:151) ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:169) ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_302]
```
- Any debugging that you have already done
Temporarily unloading the segments for the affected time period and deleting
them from deep storage remediated the problem.
### Background
This issue only happens on our Kafka-ingestion datasources, where too many
small segments were created due to compaction failures, backfilling, and late
messages. The datasource is configured with DAY segment granularity; when the
number of segments for a single day exceeded `Short.MAX_VALUE` (32,767), the
ingestion task failed with the error above.
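For context, the `fromKey > toKey` failure in `TreeMap.subMap` is consistent with a partition id being narrowed to `short` somewhere in the timeline code: once the id passes 32,767 it wraps to a negative value, so the "upper" key of the sub-map range compares below the lower one. The following is a minimal, hypothetical sketch of that mechanism, not actual Druid code:

```java
import java.util.TreeMap;

public class ShortOverflowDemo {
    public static void main(String[] args) {
        // A partition id narrowed to short wraps negative past 32767
        short overflowedId = (short) (Short.MAX_VALUE + 1);
        System.out.println(overflowedId);  // prints -32768

        TreeMap<Short, String> partitions = new TreeMap<>();
        partitions.put((short) 0, "chunk-0");

        // subMap(fromKey, toKey) requires fromKey <= toKey; using the
        // wrapped id as toKey violates that and throws the same exception
        try {
            partitions.subMap((short) 100, overflowedId);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // prints "fromKey > toKey"
        }
    }
}
```

If that is indeed the cause, any fix that keeps ids in `short` only delays the failure, which is why widening the type is attractive.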
It seemed strange to me when I realized that Druid assumes/limits its ability
to hold no more than 32,767 segments in a single time period. I would really
appreciate it if someone could share the context behind this
assumption/limitation so I can better understand how to fix the issue.
I have a PR ready for review that temporarily remediates this issue, at least
keeping ingestion happy. I believe a more complete fix would be to widen the
partition-id range from `Short` to `Integer`, but without context on where the
assumption/limitation comes from I did not proceed down that route. Please
advise.