lukecwik commented on code in PR #23882:
URL: https://github.com/apache/beam/pull/23882#discussion_r1012371539


##########
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java:
##########
@@ -999,14 +1022,7 @@ public void onClaimFailed(PositionT position) {}
         }
       }
     } finally {

Review Comment:
   I think you missed pushing out your latest version of this code.
   
   In general your right that a DoFn could throw an exception which is caught 
by another DoFn further up the stack. This wouldn't apply to the SplittablDoFn 
since it is at the root of the stage for all runners so we are ok. There is 
also the problem were the SplittableDoFn itself throws an exception during 
processElement and it hasn't yet escaped the DoFn (whether that exception was 
caused during usage of the RestrictionTracker or something else). In this 
scenario we need to assume that the 
RestrictionTracker/WatermarkEstimatorState/DoFn state that getSize relies on 
isn't corrupted and a proper split can be returned. The only way I think we 
could resolve this completely is if we move the splitting logic to actually 
happen on the bundle processing thread by inserting points where we check if 
there is an outstanding split and process it but there is no time bound on how 
long we may have to wait before handling a split.
   
   We can solve the DoFn exception propagation problem by ensuring that 
uncaught exceptions that escape a DoFn fail the bundle in the 
PCollectionConsumerRegistry and also plug this logic into places when the DoFn 
invokes system code that causes a terminal bundle failure (e.g. state gets 
corrupted). The issue comes down to whether we trust that the user is 
specifically handling the failure and choosing to continue or are they just 
blindly catching everything.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to