[ 
https://issues.apache.org/jira/browse/BEAM-10303?focusedWorklogId=466432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-466432
 ]

ASF GitHub Bot logged work on BEAM-10303:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Aug/20 20:16
            Start Date: 04/Aug/20 20:16
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on a change in pull request #12430:
URL: https://github.com/apache/beam/pull/12430#discussion_r465304871



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
##########
@@ -1029,7 +1040,27 @@ public double getProgress() {
   private Progress getProgress() {
     synchronized (splitLock) {
       if (currentTracker instanceof RestrictionTracker.HasProgress) {
-        return ((HasProgress) currentTracker).getProgress();
+        Progress progress = ((HasProgress) currentTracker).getProgress();
+        double totalWork = progress.getWorkCompleted() + progress.getWorkRemaining();
+        double completed =
+            totalWork * currentWindowIterator.previousIndex() + progress.getWorkCompleted();
+        double remaining =
+            totalWork * (currentElement.getWindows().size() - currentWindowIterator.nextIndex())
+                + progress.getWorkRemaining();
+        return Progress.from(completed, remaining);
+      }
+    }
+    return null;
+  }
+
+  private Progress getProgressFromWindowObservingTruncate(double elementCompleted) {
+    synchronized (splitLock) {
+      if (currentWindowIterator != null) {

Review comment:
       Thinking about this more, I do believe that progress needs to be 
reported as a metric so that a runner can choose a split fraction and also 
compute the amount of remaining work and/or a completion time estimate. It 
looks like we need to do one of the following:
   1) Make work completed/remaining take downstream processing into account
   OR
   2) Add a metric that represents work in progress so that a runner can 
compute the amount of work being done (without this we can't figure out how 
much work remains downstream relative to an upstream node).
   
   I'm not a big fan of 1) since it means that this metric is intrinsically 
tied to the state of other transforms, while in 2) we are adding something new.
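
   For reference, the window-scaled arithmetic in the getProgress() hunk above 
can be sketched in isolation. This is only an illustration under stated 
assumptions: WindowProgressSketch, scale and fractionCompleted are hypothetical 
names (not Beam APIs), every window is assumed to carry the same total work as 
the current one, and currentWindowIndex is the zero-based position of the 
window being processed (what previousIndex() returns once next() has been 
called on the list iterator).

```java
/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class WindowProgressSketch {
  private WindowProgressSketch() {}

  /**
   * Scales per-window progress across all windows of an element.
   * Assumes each window has the same total work as the current window.
   * Returns {completed, remaining}.
   */
  static double[] scale(
      double completedInWindow,
      double remainingInWindow,
      int currentWindowIndex,
      int windowCount) {
    if (windowCount <= 0 || currentWindowIndex < 0 || currentWindowIndex >= windowCount) {
      throw new IllegalArgumentException("window index out of range");
    }
    double totalPerWindow = completedInWindow + remainingInWindow;
    // Windows before the current one are counted as fully completed.
    double completed = totalPerWindow * currentWindowIndex + completedInWindow;
    // Windows after the current one are counted as fully remaining.
    double remaining =
        totalPerWindow * (windowCount - currentWindowIndex - 1) + remainingInWindow;
    return new double[] {completed, remaining};
  }

  /** Fraction of the scaled work that is done; treats zero total work as finished. */
  static double fractionCompleted(double[] progress) {
    double total = progress[0] + progress[1];
    return total == 0.0 ? 1.0 : progress[0] / total;
  }

  public static void main(String[] args) {
    // Second of three windows, 30 units done and 70 left in the current window.
    double[] p = scale(30.0, 70.0, 1, 3);
    System.out.println("completed=" + p[0] + " remaining=" + p[1]);
    System.out.println("fraction=" + fractionCompleted(p));
  }
}
```

   With 30 units completed and 70 remaining in the second of three windows, 
the sketch reports 130 completed and 170 remaining. Note that it only covers 
this transform's own work; it says nothing about work in progress downstream, 
which is the gap discussed in options 1) and 2).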





----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 466432)
    Time Spent: 4h  (was: 3h 50m)

> FnApiDoFnRunner window observing optimization
> ---------------------------------------------
>
>                 Key: BEAM-10303
>                 URL: https://issues.apache.org/jira/browse/BEAM-10303
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-harness
>            Reporter: Luke Cwik
>            Assignee: Luke Cwik
>            Priority: P2
>              Labels: portability
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently the FnApiDoFnRunner processes each element within its own window. 
> There is an easy optimization where we process the element once if and only 
> if the function doesn't observe the window (either directly or indirectly via 
> side inputs/state/...).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
