Steve Niemitz created BEAM-10395:
------------------------------------
Summary: Dataflow runner should deduplicate files to stage by destination
Key: BEAM-10395
URL: https://issues.apache.org/jira/browse/BEAM-10395
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Reporter: Steve Niemitz
Assignee: Steve Niemitz
If a pipeline contains multiple files with the same destination path, the
Dataflow runner tries to stage all of them in parallel, and the upload usually
fails because the concurrent uploads to that path conflict.
The runner should upload only one file per destination, and ideally also verify
that all sources mapped to that destination are identical.
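
The proposed behaviour could be sketched roughly as follows. This is only an
illustration in Python, not the Dataflow runner's actual staging code: the
StagedFile record, the dedupe_by_destination helper, and the choice to compare
sources by SHA-256 digest and raise on a mismatch are all assumptions made for
the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedFile:
    """Hypothetical record: a local source file and its staging destination."""
    source: str
    destination: str


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the file's contents."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large artifacts are not loaded whole.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def dedupe_by_destination(files):
    """Keep the first file seen per destination.

    Assumption: two sources for one destination are acceptable only if
    their contents are byte-identical; otherwise a ValueError is raised.
    """
    kept = {}  # destination -> (StagedFile, digest), in first-seen order
    for candidate in files:
        digest = file_digest(candidate.source)
        existing = kept.get(candidate.destination)
        if existing is None:
            kept[candidate.destination] = (candidate, digest)
        elif existing[1] != digest:
            raise ValueError(
                "Conflicting sources for destination "
                f"{candidate.destination}: {existing[0].source} "
                f"and {candidate.source}"
            )
    return [entry[0] for entry in kept.values()]
```

Running the deduplication before any upload starts means each destination is
written by at most one upload, so the parallel staging step no longer races
against itself.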
--
This message was sent by Atlassian Jira
(v8.3.4#803005)