[
https://issues.apache.org/jira/browse/FLINK-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068496#comment-17068496
]
Angel Barragán commented on FLINK-16544:
----------------------------------------
Hi, I tried several times to set "web.upload.dir" and "web.upload.dir" to a
different directory on YARN, but it doesn't work: the directory always ends up
under /tmp.
Is that location hardcoded when running in a YARN environment?
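
For illustration, here is a minimal Java sketch of why an "hdfs://" value would
not behave as intended if the option is handled as a plain local path, as the
issue description below reports. It is not taken from the Flink code base; the
class name, the isRemote helper, and the example path are hypothetical.

```java
import java.io.File;
import java.net.URI;

class UploadDirCheck {

    // Returns true if the configured value carries a scheme other than "file",
    // i.e. it names a non-local file system such as hdfs:// or s3://.
    static boolean isRemote(String configured) {
        String scheme = URI.create(configured).getScheme();
        return scheme != null && !scheme.equals("file");
    }

    public static void main(String[] args) {
        // Hypothetical value someone might put in the Flink configuration.
        String configured = "hdfs:///flink/web-upload";

        // java.io.File knows nothing about URI schemes: it treats the whole
        // string as a local path, so the "hdfs:" part becomes a path segment.
        File asLocal = new File(configured);

        System.out.println("remote scheme detected: " + isRemote(configured));
        System.out.println("File sees absolute path: " + asLocal.isAbsolute());
        System.out.println("File path as interpreted: " + asLocal.getPath());
    }
}
```

A check like isRemote could be used to reject or warn about such values early
instead of silently resolving them on the local disk.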
> Flink FileSystem for web.uploadDir
> ----------------------------------
>
> Key: FLINK-16544
> URL: https://issues.apache.org/jira/browse/FLINK-16544
> Project: Flink
> Issue Type: Improvement
> Components: API / Core
> Affects Versions: 1.10.0
> Reporter: Angel Barragán
> Priority: Minor
>
> Currently the configuration properties "web.upload.dir" and "web.upload.dir"
> only support paths on the local filesystem. When we deploy Flink under
> another cluster environment like YARN, it would be more useful to be able to
> configure those directories on HDFS. Sizing and maintenance would then be
> easier than having to find out on which node YARN has launched the
> JobManager task and managing the upload directory there.
> In my concrete case, I ran into this management problem when creating an
> AWS EMR cluster with Flink, where the default configuration creates this
> directory under /tmp on the local filesystem of the CORE node where the
> JobManager is deployed by YARN. We found that the EMR cluster is also
> configured to fully empty /tmp on a monthly basis, which removes the upload
> directory for Flink and makes Flink fail when you try to submit a new job.
> We had to recreate the directory manually.
> The first solution I tried was to change the above configuration properties
> to use HDFS, as we did with the configuration property
> "state.checkpoints.dir", and we found that it doesn't work in a YARN
> environment. So I checked the Flink code to see how this configuration is
> used and found that it is always resolved against the local file system.
> I think that supporting such paths would make Flink easier to manage when
> it runs in another cluster environment where shared storage like HDFS or S3
> is available.
>