iemejia edited a comment on pull request #14417:
URL: https://github.com/apache/beam/pull/14417#issuecomment-812755129
Regarding the question about the exact cause: the method used by the
Structured Streaming runner differed from the Classic/Portable implementation
in two respects:
(1) It did not first check that the files to stage exist, so it could end up
trying to stage nonexistent files, and (2) if the user had not set a
tempLocation it would fail, instead of falling back to a default tmp
directory as the Classic/Portable implementation does:
```java
if (!isLocalSparkMaster(options)) {
  List<String> filesToStage =
      options.getFilesToStage().stream()
          .map(File::new)
          .filter(File::exists)
          .map(
              file -> {
                return file.getAbsolutePath();
              })
          .collect(Collectors.toList());
  options.setFilesToStage(
      PipelineResources.prepareFilesForStaging(
          filesToStage,
          MoreObjects.firstNonNull(
              options.getTempLocation(),
              System.getProperty("java.io.tmpdir"))));
}
```
vs. the Structured Streaming runner version:
```java
if (!PipelineTranslator.isLocalSparkMaster(options)) {
  options.setFilesToStage(
      PipelineResources.prepareFilesForStaging(
          options.getFilesToStage(), options.getTempLocation()));
}
```
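To make the two differences concrete without pulling in Beam or Spark, here is
a minimal plain-Java sketch of the two safeguards. The class `StagingDefaults`
and its methods `existingFiles` and `stagingDir` are hypothetical names used
only for illustration; they are not part of Beam. The null check stands in
for Guava's `MoreObjects.firstNonNull` and is assumed to be equivalent for
this purpose:

```java
import java.io.File;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: not Beam code. Mirrors the two safeguards discussed above.
public class StagingDefaults {

  // Hypothetical helper: keep only the paths that exist on disk,
  // returned as absolute paths.
  static List<String> existingFiles(List<String> candidates) {
    return candidates.stream()
        .map(File::new)
        .filter(File::exists)
        .map(File::getAbsolutePath)
        .collect(Collectors.toList());
  }

  // Hypothetical helper: use the configured temp location when it is set,
  // otherwise fall back to the JVM's default temporary directory.
  static String stagingDir(String tempLocation) {
    return tempLocation != null
        ? tempLocation
        : System.getProperty("java.io.tmpdir");
  }

  public static void main(String[] args) throws Exception {
    File real = File.createTempFile("stage-me", ".jar");
    real.deleteOnExit();

    // The second path is assumed not to exist, so it is filtered out.
    List<String> toStage =
        existingFiles(List.of(real.getPath(), real.getPath() + ".missing"));
    System.out.println("Files to stage: " + toStage);

    // No tempLocation configured: falls back to java.io.tmpdir.
    System.out.println("Staging dir: " + stagingDir(null));
  }
}
```

Without the filter, the missing path would be handed to the staging step as
is; without the fallback, a null temp location would be passed straight
through, which is the failure described in (2).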
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]