clintropolis commented on code in PR #14204:
URL: https://github.com/apache/druid/pull/14204#discussion_r1190646105
##########
processing/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java:
##########
@@ -1310,9 +1327,22 @@ public long getSlowestPartitionInitializedTime()
private final int inputSequences;
+ public enum State
+ {
+ SCHEDULED,
+ STARTED,
+ PARTITON_MERGE_SCHEDULED,
+ FINAL_MERGE_SCHEDULED
+ }
Review Comment:
could you explain the states a bit better? Like, the tasks from both layers
of the merge are all started at basically the same time, and keep scheduling
additional actions if there is more stuff to process until their inputs are
completely drained. So I don't really understand how the state transitions are
supposed to work and what they are supposed to tell us. It seems like unless it
gets blocked completely at SCHEDULED, chances are it will burn through the
other two scheduled states like immediately (since its just adding tasks to a
queue, not blocked on them actually running afaict). Im not really sure there
are meaningful states between haven't started, running, and done, and done
isn't very interesting. More interesting seems like the information about the
number of tasks each partition has run and/or number of rows they have
processed, vs amount of stuff processed by the 2nd layer "final merge" tasks
(like is it being held up by a slow partition, etc), but I'm not really sur
e how to easily wrap that stuff up in state.
##########
processing/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java:
##########
@@ -191,6 +194,11 @@ public boolean hasNext()
{
final long thisTimeoutNanos = timeoutAtNanos -
System.nanoTime();
if (hasTimeout && thisTimeoutNanos < 0) {
+ LOG.info("Query timed out while merging results from data
nodes. "
+ + "Current merge state is [%s] and time spent in
current state is [%,d]ms.",
+ mergeCombineMetricsAccumulator.getState(),
+
mergeCombineMetricsAccumulator.timeSpentInCurrentStateNanos()
+ );
Review Comment:
same though on info logs
##########
processing/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java:
##########
@@ -205,14 +213,20 @@ public boolean hasNext()
currentBatch = queue.take();
}
}
- if (currentBatch == null) {
- throw new QueryTimeoutException();
- }
- if (cancellationGizmo.isCancelled()) {
+ if (cancellationGizmo.isCancelled()) { // preffering to
throw any cancellation exception, if any
throw cancellationGizmo.getRuntimeException();
}
+ if (currentBatch == null) {
+ LOG.info("Query timed out while merging results from data
nodes. "
+ + "Current merge state is [%s] and time spent in
current state is [%,d]ms.",
+ mergeCombineMetricsAccumulator.getState(),
+
mergeCombineMetricsAccumulator.timeSpentInCurrentStateNanos()
+ );
Review Comment:
info doesn't really seem right for this, should this information just
decorate the timeout exception instead since that is also going to be logged?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]