[
https://issues.apache.org/jira/browse/BEAM-12088?focusedWorklogId=576356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-576356
]
ASF GitHub Bot logged work on BEAM-12088:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 02/Apr/21 23:37
Start Date: 02/Apr/21 23:37
Worklog Time Spent: 10m
Work Description: iemejia commented on pull request #14417:
URL: https://github.com/apache/beam/pull/14417#issuecomment-812755129
Regarding the question about the exact cause: the method used by the Structured
Streaming runner differed from the Classic/Portable implementation in two
respects:
(1) It did not first check that the files to stage exist, so it could end up
trying to stage nonexistent files, and (2) if the user had not set a
tempLocation it would fail, instead of falling back to a default tmp
directory as this implementation does:
```java
if (!isLocalSparkMaster(options)) {
  List<String> filesToStage =
      options.getFilesToStage().stream()
          .map(File::new)
          .filter(File::exists)
          .map(
              file -> {
                return file.getAbsolutePath();
              })
          .collect(Collectors.toList());
  options.setFilesToStage(
      PipelineResources.prepareFilesForStaging(
          filesToStage,
          MoreObjects.firstNonNull(
              options.getTempLocation(),
              System.getProperty("java.io.tmpdir"))));
}
```
vs
```java
if (!PipelineTranslator.isLocalSparkMaster(options)) {
  options.setFilesToStage(
      PipelineResources.prepareFilesForStaging(
          options.getFilesToStage(), options.getTempLocation()));
}
```
I suppose falling back to a local temp directory could be improved, but so far
this is the only way I have found to run this on YARN, which does not seem to
follow Beam's staging pattern exactly.
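To make the two differences concrete, here is a minimal, JDK-only sketch. It is not the Beam code: the class `StagingSketch` and the helpers `existingAbsolutePaths` and `resolveStagingDir` are hypothetical names used for illustration, and the sketch does not call `PipelineResources.prepareFilesForStaging` at all. It only shows the pre-filtering of missing files and the fallback to `java.io.tmpdir` when no temp location is set.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Apache Beam.
public class StagingSketch {

  // Difference (1): drop candidates that do not exist on disk and
  // normalize the survivors to absolute paths.
  public static List<String> existingAbsolutePaths(List<String> candidates) {
    return candidates.stream()
        .map(File::new)
        .filter(File::exists)
        .map(File::getAbsolutePath)
        .collect(Collectors.toList());
  }

  // Difference (2): use the configured temp location when one is set,
  // otherwise fall back to the JVM's default temp directory.
  public static String resolveStagingDir(String tempLocation) {
    return tempLocation != null ? tempLocation : System.getProperty("java.io.tmpdir");
  }

  public static void main(String[] args) throws Exception {
    File real = File.createTempFile("staging-sketch", ".jar");
    real.deleteOnExit();
    // The second path is assumed not to exist.
    List<String> candidates = Arrays.asList(real.getPath(), real.getPath() + ".missing");

    // Only the existing file survives the filter.
    System.out.println(existingAbsolutePaths(candidates));
    // With no temp location configured, the JVM temp directory is used.
    System.out.println(resolveStagingDir(null));
  }
}
```

Without the filter, a missing entry reaches the staging step, and without the fallback a null temp location is passed straight through, which matches the two failure modes described above.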
Issue Time Tracking
-------------------
Worklog Id: (was: 576356)
Time Spent: 1h 10m (was: 1h)
> Make file staging uniform among Spark Runners
> ---------------------------------------------
>
> Key: BEAM-12088
> URL: https://issues.apache.org/jira/browse/BEAM-12088
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: P2
> Fix For: 2.30.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Both the Spark Classic and Portable runners share the file staging logic, but
> the Structured Streaming runner uses different logic, even though the
> process should in principle be the same. This shows up as file staging
> exceptions when deploying pipelines via Hadoop YARN.