chamikaramj commented on a change in pull request #16550:
URL: https://github.com/apache/beam/pull/16550#discussion_r788186901
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java
##########
@@ -979,6 +982,8 @@ public static FileNaming relativeFileNaming(
abstract Builder<DestinationT, UserT> setSharding(
PTransform<PCollection<UserT>, PCollectionView<Integer>> sharding);
+ abstract Builder<DestinationT, UserT> setMoveOption(StandardMoveOptions
moveOption);
Review comment:
What is the high level reason for making this customizable.
I think MoveOptions are internal implementation detail for FileIO that
determines that it behaves correctly when workitems are retired by the runner.
Any changes to retry behavior could result in hard to track errors for the
user when any failures are retried.
##########
File path:
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
##########
@@ -799,11 +801,7 @@ final void moveToOutputFiles(
}
// During a failure case, files may have been deleted in an earlier
step. Thus
// we ignore missing files here.
- FileSystems.rename(
- srcFiles,
- dstFiles,
- StandardMoveOptions.IGNORE_MISSING_FILES,
- StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS);
Review comment:
This was set to "SKIP_IF_DESTINATION_EXISTS" to make sure that we do not
fail when partially completed workitems are retried by the runner. I don't
think making this customizable is a good option since passing different move
options could result in hard to track (and inconsistent) retry behavior changes
for users.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]