dulu98Kurz opened a new issue, #15091:
URL: https://github.com/apache/druid/issues/15091

   ### Affected Version
   
   26.0.0,27.0.0,master
   
   ### Description
   
   - Cluster size
   15 * i3.4xlarge nodes, 130 CPUs for Druid
   500K segments loaded in the cluster
   
   - Configurations in use
   General configurations
   
   - Steps to reproduce the problem
   When there are more than 32767 segments in a single time chunk, ingestion starts to fail.
    
   - The error message or stack traces encountered. Providing more context, 
such as nearby log messages or even entire logs, can be helpful.
   ```
   2023-09-29T02:45:16,308 ERROR [task-runner-0-priority-0] 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - 
Encountered exception while running task.
   java.lang.IllegalArgumentException: fromKey > toKey
        at java.util.TreeMap$NavigableSubMap.<init>(TreeMap.java:1368) 
~[?:1.8.0_302]
        at java.util.TreeMap$AscendingSubMap.<init>(TreeMap.java:1855) 
~[?:1.8.0_302]
        at java.util.TreeMap.subMap(TreeMap.java:913) ~[?:1.8.0_302]
        at 
org.apache.druid.timeline.partition.OvershadowableManager.entryIteratorGreaterThan(OvershadowableManager.java:423)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.OvershadowableManager.findOvershadowedBy(OvershadowableManager.java:299)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.OvershadowableManager.findOvershadowedBy(OvershadowableManager.java:275)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.OvershadowableManager.moveNewStandbyToVisibleIfNecessary(OvershadowableManager.java:456)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.OvershadowableManager.determineVisibleGroupAfterAdd(OvershadowableManager.java:432)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.OvershadowableManager.addAtomicUpdateGroupWithState(OvershadowableManager.java:629)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.OvershadowableManager.addChunk(OvershadowableManager.java:699)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.PartitionHolder.add(PartitionHolder.java:70)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.partition.PartitionHolder.<init>(PartitionHolder.java:52)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.VersionedIntervalTimeline.addAll(VersionedIntervalTimeline.java:201)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.timeline.VersionedIntervalTimeline.add(VersionedIntervalTimeline.java:180)
 ~[druid-processing-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.segment.realtime.appenderator.StreamAppenderator.getOrCreateSink(StreamAppenderator.java:486)
 ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.segment.realtime.appenderator.StreamAppenderator.add(StreamAppenderator.java:267)
 ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:411)
 ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.add(StreamAppenderatorDriver.java:191)
 ~[druid-server-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:654)
 ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:266)
 ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:151)
 ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:169) 
~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477)
 ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
        at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449)
 ~[druid-indexing-service-2023.03.1-iap.jar:2023.03.1-iap]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_302]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_302]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_302]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_302]
   ```
   - Any debugging that you have already done
   Unloading the segments and deleting them from deep storage for the affected time period temporarily remediated the problem.
   
   ### Background
   This issue only affects our Kafka-ingesting datasources, where too many small segments were created due to compaction failures, backfilling, and late messages. The datasource is configured with DAY segment granularity; when the number of segments for a single day grew too large (exceeding Short.MAX_VALUE, 32767), the ingestion task failed with the error above.
   
   It seemed strange to me when I realized Druid assumes/limits its ability to hold more than 32767 segments in a single time chunk. I would really appreciate it if someone could share some context on why this assumption/limitation exists, so I can better understand how to fix the issue.
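   For illustration only (this is my assumption about the mechanism, not confirmed from the Druid source): if a partition ID is narrowed to `short` somewhere in the timeline bookkeeping, the ID wraps negative once it passes 32767, and a `TreeMap.subMap` range built from the wrapped value can end up with `fromKey > toKey`, which is exactly the exception in the stack trace above. A minimal standalone sketch with hypothetical keys:

   ```java
   import java.util.TreeMap;

   public class ShortOverflowDemo {
       public static void main(String[] args) {
           // A partition ID stored as short wraps past Short.MAX_VALUE (32767)
           short wrapped = (short) (Short.MAX_VALUE + 1);
           System.out.println(wrapped); // -32768

           // TreeMap.subMap requires fromKey <= toKey; a wrapped-around upper
           // bound below the lower bound triggers the same exception as above
           TreeMap<Short, String> groups = new TreeMap<>();
           groups.put((short) 0, "group0");
           try {
               groups.subMap((short) 100, true, wrapped, false);
           } catch (IllegalArgumentException e) {
               System.out.println(e.getMessage()); // fromKey > toKey
           }
       }
   }
   ```

   If that is indeed the mechanism, widening the ID type (e.g. to `int`) would remove the wrap-around, at the cost of changing the key layout wherever the `short` assumption is baked in.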
   
   I have a PR ready for review that remediates this issue temporarily, at least keeping ingestion happy. I believe a more complete fix would be to expand the range to `Integer` instead of `Short`, but without context on where the assumption/limitation comes from I did not proceed with that route. Please advise.

