[ 
https://issues.apache.org/jira/browse/BEAM-11494?focusedWorklogId=553429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553429
 ]

ASF GitHub Bot logged work on BEAM-11494:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Feb/21 07:02
            Start Date: 17/Feb/21 07:02
    Worklog Time Spent: 10m 
      Work Description: pabloem commented on a change in pull request #13558:
URL: https://github.com/apache/beam/pull/13558#discussion_r577368164



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
##########
@@ -401,16 +412,40 @@ public ResourceId apply(@Nonnull Metadata input) {
     List<ResourceId> srcToHandle = new ArrayList<>();
     List<ResourceId> destToHandle = new ArrayList<>();
 
-    List<MatchResult> matchResults = matchResources(srcResourceIds);
-    for (int i = 0; i < matchResults.size(); ++i) {
-      if (!matchResults.get(i).status().equals(Status.NOT_FOUND)) {
-        srcToHandle.add(srcResourceIds.get(i));
-        destToHandle.add(destResourceIds.get(i));
+    List<MatchResult> matchSrcResults = matchResources(srcResourceIds);
+    List<MatchResult> matchDestResults = new ArrayList<>();
+    if (skipExistingDest) {
+      matchDestResults = matchResources(destResourceIds);
+    }
+
+    for (int i = 0; i < matchSrcResults.size(); ++i) {
+      if (matchSrcResults.get(i).status().equals(Status.NOT_FOUND) && ignoreMissingSrc) {
+        // If the source is not found, and we are ignoring found source files, then we skip it.
+        continue;
       }
+      if (skipExistingDest
+          && matchDestResults.get(i).status().equals(Status.OK)
+          && filesMatch(
+              matchDestResults.get(i).metadata().get(0),
+              matchSrcResults.get(i).metadata().get(0))) {
+        // If the destination exists, and we are skipping when destinations exist, then we skip.
+        continue;
+      }
+      srcToHandle.add(srcResourceIds.get(i));
+      destToHandle.add(destResourceIds.get(i));
     }
     return KV.of(srcToHandle, destToHandle);
   }
 
+  private static boolean filesMatch(MatchResult.Metadata first, MatchResult.Metadata second) {
+    if (!first.checksum().isPresent() && !second.checksum().isPresent()) {

Review comment:
       Changed this to null. Only if both checksums are null should we rely on 
the file size; otherwise we should always rely on the checksum. (If only one 
file reports a checksum and the other doesn't, then they are not equal, which 
is what happens in the next section.)
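
The rule described in this comment can be sketched in isolation as follows. This is a minimal illustration, not the code from the pull request: `FileInfo` is a hypothetical stand-in for `MatchResult.Metadata`, and the field names `sizeBytes` and `checksum` are assumptions, with `checksum` being null when the file system reports none.

```java
import java.util.Objects;

class FilesMatchSketch {
  /** Hypothetical stand-in for MatchResult.Metadata (not the Beam class). */
  static final class FileInfo {
    final long sizeBytes;
    final String checksum; // assumed to be null when no checksum is reported

    FileInfo(long sizeBytes, String checksum) {
      this.sizeBytes = sizeBytes;
      this.checksum = checksum;
    }
  }

  static boolean filesMatch(FileInfo first, FileInfo second) {
    if (first.checksum == null && second.checksum == null) {
      // Neither file reports a checksum: fall back to comparing sizes.
      return first.sizeBytes == second.sizeBytes;
    }
    // At least one checksum is present, so sizes are ignored.
    // Objects.equals returns false when exactly one side is null,
    // so a file with a checksum never matches a file without one.
    return Objects.equals(first.checksum, second.checksum);
  }

  public static void main(String[] args) {
    FileInfo noSumA = new FileInfo(10L, null);
    FileInfo noSumB = new FileInfo(10L, null);
    FileInfo withSum = new FileInfo(10L, "abc");
    System.out.println(filesMatch(noSumA, noSumB));
    System.out.println(filesMatch(noSumA, withSum));
  }
}
```

With this shape, the size comparison only ever acts as a fallback, so two files of the same size but different checksums are treated as different and the copy is not skipped.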




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 553429)
    Time Spent: 1h 40m  (was: 1.5h)

> FileIO.Write overwrites destination files on retries
> ----------------------------------------------------
>
>                 Key: BEAM-11494
>                 URL: https://issues.apache.org/jira/browse/BEAM-11494
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-files
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: P2
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Users have reported cases of FileIO.Write becoming stuck or failing due to 
> overwriting destination files.
> The failure or hang occurs because some file system buckets have strict 
> retention policies that do not allow files to be deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
