scwhittle commented on code in PR #26085:
URL: https://github.com/apache/beam/pull/26085#discussion_r1213150951


##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServer.java:
##########
@@ -969,6 +987,76 @@ protected void startThrottleTimer() {
       getWorkThrottleTimer.start();
     }
 
+    private class GetWorkTimingInfosTracker {
+      private final Map<State, Duration> getWorkStreamLatencies;
+
+      public GetWorkTimingInfosTracker() {
+        this.getWorkStreamLatencies = new EnumMap<>(State.class);
+      }
+
+      public void addTimingInfo(Collection<GetWorkStreamTimingInfo> infos) {
+        // We want to record the duration of each stage and also reflect the total work item
+        // processing time. This can be tricky because the timings of different
+        // StreamingGetWorkResponseChunks can be interleaved. The current strategy is to record
+        // the maximum duration of each stage across chunks, which lets us identify the slow
+        // stage; note, however, that the sum of the slowest stages may be larger than the
+        // duration from first chunk creation to last chunk reception by the user worker.

Review Comment:
   I was thinking we could scale the transmit-to-user-worker times so that those portions of the latency attribution match the time it takes from generating the work to assembling it on the user worker.
   
   So the transmit time elapsed would be something like [GET_WORK_CREATION_END time, last chunk arrived in user worker].
   Then we can scale the components of latency for transmitting (i.e. GET_WORK_IN_TRANSIT_TO_USER_WORKER and GET_WORK_IN_TRANSIT_TO_DISPATCHER) so that their sum equals the transmit time.
   
   I think this is worthwhile as we look into low-latency processing, because it isn't necessarily guaranteed that user-worker processing will take longer, and it's confusing if the total of all the latencies is larger than the actual total time to process.
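
   A minimal sketch of the proposed scaling (names like `LatencyScaler` and `scaleToElapsed` are illustrative, not the PR's API; the two transit states are the ones named above). Each recorded transit component is scaled proportionally so the components sum to the observed wall-clock transmit time:

   ```java
   import java.time.Duration;
   import java.util.EnumMap;
   import java.util.Map;

   // Hypothetical sketch: scale the recorded transit-latency components so that
   // their sum equals the measured transmit elapsed time
   // (GET_WORK_CREATION_END -> last chunk arrived at the user worker).
   public class LatencyScaler {
     enum State {
       GET_WORK_IN_TRANSIT_TO_DISPATCHER,
       GET_WORK_IN_TRANSIT_TO_USER_WORKER
     }

     static Map<State, Duration> scaleToElapsed(
         Map<State, Duration> recorded, Duration transmitElapsed) {
       long sumMillis =
           recorded.values().stream().mapToLong(Duration::toMillis).sum();
       Map<State, Duration> scaled = new EnumMap<>(State.class);
       if (sumMillis <= 0) {
         return scaled;
       }
       for (Map.Entry<State, Duration> e : recorded.entrySet()) {
         // Scale each component proportionally; the scaled components now sum
         // (up to rounding) to the observed transmit time instead of to the
         // possibly larger sum of per-chunk maxima.
         long scaledMillis =
             e.getValue().toMillis() * transmitElapsed.toMillis() / sumMillis;
         scaled.put(e.getKey(), Duration.ofMillis(scaledMillis));
       }
       return scaled;
     }

     public static void main(String[] args) {
       Map<State, Duration> recorded = new EnumMap<>(State.class);
       recorded.put(State.GET_WORK_IN_TRANSIT_TO_DISPATCHER, Duration.ofMillis(300));
       recorded.put(State.GET_WORK_IN_TRANSIT_TO_USER_WORKER, Duration.ofMillis(100));
       // Recorded components sum to 400 ms, but the observed transmit elapsed
       // was only 200 ms, so each component is halved.
       Map<State, Duration> scaled = scaleToElapsed(recorded, Duration.ofMillis(200));
       System.out.println(scaled.get(State.GET_WORK_IN_TRANSIT_TO_DISPATCHER).toMillis()); // 150
       System.out.println(scaled.get(State.GET_WORK_IN_TRANSIT_TO_USER_WORKER).toMillis()); // 50
     }
   }
   ```

   With this, the attributed transit latency can never exceed the actual end-to-end transmit time, at the cost of distorting the absolute per-stage values when chunks overlap.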



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
