iemejia edited a comment on pull request #14417:
URL: https://github.com/apache/beam/pull/14417#issuecomment-812755129


   About the question about the exact cause. The method used on the Structured 
Streaming runner differed from the Classic/Portable implementation in two 
aspects:
   (1) It did not validate the existence of the files to stage first so it 
could end up trying to stage non existent files and (2) if the user had not set 
up a tempLocation it would fail, instead of failing back to a default tmp 
directory as this implementation does:
   ```java
       if (!isLocalSparkMaster(options)) {
         List<String> filesToStage =
             options.getFilesToStage().stream()
                 .map(File::new)
                 .filter(File::exists)
                 .map(
                     file -> {
                       return file.getAbsolutePath();
                     })
                 .collect(Collectors.toList());
         options.setFilesToStage(
             PipelineResources.prepareFilesForStaging(
                 filesToStage,
                 MoreObjects.firstNonNull(
                     options.getTempLocation(), 
System.getProperty("java.io.tmpdir"))));
       }
   ```
   vs
   ```java
       if (!PipelineTranslator.isLocalSparkMaster(options)) {
         options.setFilesToStage(
             PipelineResources.prepareFilesForStaging(
                 options.getFilesToStage(), options.getTempLocation()));
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to