Steve Niemitz created BEAM-10395:
------------------------------------

             Summary: Dataflow runner should deduplicate files to stage by destination
                 Key: BEAM-10395
                 URL: https://issues.apache.org/jira/browse/BEAM-10395
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
            Reporter: Steve Niemitz
            Assignee: Steve Niemitz


If a pipeline contains multiple files with the same destination path, the
Dataflow runner tries to stage all of them in parallel. The upload then
usually fails because the concurrent uploads to that path conflict.

The runner should upload only one file per destination and, ideally, also
check that all sources mapped to that destination are identical.
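A minimal sketch of the proposed behavior follows. It assumes the files to stage are given as (local source path, destination path) pairs and that "the same source" means identical file content, compared here by SHA-256 digest. The names `file_digest` and `dedupe_by_destination` are hypothetical and are not part of Beam's API or the Dataflow runner's actual code.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a local file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large jars are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_by_destination(
    files: Iterable[Tuple[str, str]],
    digest: Callable[[str], str] = file_digest,
) -> List[Tuple[str, str]]:
    """Keep one (source, destination) pair per destination (hypothetical helper).

    Raises ValueError if two sources with different content map to the same
    destination, instead of letting their uploads race each other.
    """
    # destination -> (first source seen, digest of that source)
    chosen: Dict[str, Tuple[str, str]] = {}
    for source, destination in files:
        d = digest(source)
        if destination in chosen:
            prev_source, prev_digest = chosen[destination]
            if prev_digest != d:
                raise ValueError(
                    f"conflicting sources for {destination}: "
                    f"{prev_source} and {source} differ"
                )
            continue  # identical content: skip the duplicate upload
        chosen[destination] = (source, d)
    # Dicts preserve insertion order, so the first occurrence of each destination wins.
    return [(src, dest) for dest, (src, _) in chosen.items()]
```

Comparing by content rather than by local path means two identical jars at different local paths still collapse into a single upload, while a genuine conflict fails fast before any upload starts. The `digest` parameter is injectable only so the logic can be exercised without touching the filesystem.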



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
