dulu98Kurz commented on code in PR #15090:
URL: https://github.com/apache/druid/pull/15090#discussion_r1350578049
##########
processing/src/main/java/org/apache/druid/timeline/partition/OvershadowableManager.java:
##########
@@ -418,9 +420,11 @@ private Iterator<Entry<RootPartitionRange, Short2ObjectSortedMap<AtomicUpdateGro
       TreeMap<RootPartitionRange, Short2ObjectSortedMap<AtomicUpdateGroup<T>>> stateMap
   )
   {
-    final RootPartitionRange lowFench = new RootPartitionRange(partitionId, partitionId);
+    // remediate submap `fromKey > toKey` issue when partitionId overflows
+    final short partitionIdLowFence = partitionId < 0 ? Short.MAX_VALUE : partitionId;
Review Comment:
Hi @abhishekagarwal87 , thanks for checking on this!
You are right: our investigation pointed to both late messages from upstream
and compaction falling behind. Specifically, we found random late messages
mixed into the Kafka topics; they kept adding tiny segments to a finalized
time chunk until the partition ids went beyond the `short` range and broke
live ingestion tasks for new data. Setting a rejection period was not ideal
because it means losing data, and because compaction was falling behind we
couldn't afford to wait for it to catch up. I ended up hard-deleting the
problematic time chunk, and then realized that relying solely on compaction
seems inadequate.
Admittedly, handling random late messages is not an ideal use case for Druid,
but it was a really difficult choice when the user had to pick between letting
ingestion break and deleting the problematic time chunk.
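
To make the failure mode above concrete, here is a minimal standalone sketch
(not Druid code; `lowFence` is a hypothetical helper mirroring the clamp in
the diff) showing how a partition id stored as `short` wraps negative once a
time chunk accumulates more than 32767 segments, which would make a range
lookup's `fromKey` exceed its `toKey`:

```java
public class ShortOverflowSketch
{
  // Hypothetical clamp mirroring the patched line: a negative (overflowed)
  // id is pinned to Short.MAX_VALUE so the low fence never exceeds the high.
  static short lowFence(short partitionId)
  {
    return partitionId < 0 ? Short.MAX_VALUE : partitionId;
  }

  public static void main(String[] args)
  {
    short partitionId = Short.MAX_VALUE; // 32767, the last representable id
    partitionId++;                       // wraps around to -32768
    System.out.println(partitionId);             // prints -32768
    System.out.println(lowFence(partitionId));   // prints 32767
  }
}
```

Without the clamp, the wrapped negative id sorts below every valid id, which
is what broke the submap range query on the overflowed time chunk.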
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]