clintropolis commented on code in PR #14204:
URL: https://github.com/apache/druid/pull/14204#discussion_r1190646105


##########
processing/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java:
##########
@@ -1310,9 +1327,22 @@ public long getSlowestPartitionInitializedTime()
 
     private final int inputSequences;
 
+    public enum State
+    {
+      SCHEDULED,
+      STARTED,
+      PARTITON_MERGE_SCHEDULED,
+      FINAL_MERGE_SCHEDULED
+    }

Review Comment:
   could you explain the states a bit better? Like, the tasks from both layers 
of the merge are all started at basically the same time, and keep scheduling 
additional actions if there is more stuff to process until their inputs are 
completely drained. So I don't really understand how the state transitions are 
supposed to work and what they are supposed to tell us. It seems like unless it 
gets blocked completely at SCHEDULED, chances are it will burn through the 
other two scheduled states like immediately (since its just adding tasks to a 
queue, not blocked on them actually running afaict). Im not really sure there 
are meaningful states between haven't started, running, and done, and done 
isn't very interesting. More interesting seems like the information about the 
number of tasks each partition has run and/or number of rows they have 
processed, vs amount of stuff processed by the 2nd layer "final merge" tasks 
(like is it being held up by a slow partition, etc), but I'm not really sur
 e how to easily wrap that stuff up in state.



##########
processing/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java:
##########
@@ -191,6 +194,11 @@ public boolean hasNext()
               {
                 final long thisTimeoutNanos = timeoutAtNanos - 
System.nanoTime();
                 if (hasTimeout && thisTimeoutNanos < 0) {
+                  LOG.info("Query timed out while merging results from data 
nodes. "
+                           + "Current merge state is [%s] and time spent in 
current state is [%,d]ms.",
+                           mergeCombineMetricsAccumulator.getState(),
+                           
mergeCombineMetricsAccumulator.timeSpentInCurrentStateNanos()
+                  );

Review Comment:
   same though on info logs



##########
processing/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java:
##########
@@ -205,14 +213,20 @@ public boolean hasNext()
                       currentBatch = queue.take();
                     }
                   }
-                  if (currentBatch == null) {
-                    throw new QueryTimeoutException();
-                  }
 
-                  if (cancellationGizmo.isCancelled()) {
+                  if (cancellationGizmo.isCancelled()) { // preffering to 
throw any cancellation exception, if any
                     throw cancellationGizmo.getRuntimeException();
                   }
 
+                  if (currentBatch == null) {
+                    LOG.info("Query timed out while merging results from data 
nodes. "
+                             + "Current merge state is [%s] and time spent in 
current state is [%,d]ms.",
+                             mergeCombineMetricsAccumulator.getState(),
+                             
mergeCombineMetricsAccumulator.timeSpentInCurrentStateNanos()
+                    );

Review Comment:
   info doesn't really seem right for this, should this information just 
decorate the timeout exception instead since that is also going to be logged?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to