[ https://issues.apache.org/jira/browse/FLINK-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-16544:
-----------------------------------
    Labels: stale-minor  (was: )

> Flink FileSystem for web.uploadDir
> ----------------------------------
>
>                 Key: FLINK-16544
>                 URL: https://issues.apache.org/jira/browse/FLINK-16544
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Core
>    Affects Versions: 1.10.0
>            Reporter: Angel Barragán
>            Priority: Minor
>              Labels: stale-minor
>
> Currently the configuration properties "web.upload.dir" and "web.upload.dir"
> only support paths on the local filesystem. When Flink is deployed in
> another cluster environment such as YARN, it is more useful to be able to
> configure those directories on HDFS, so that sizing and maintenance are
> easier than finding out on which node YARN has launched the JobManager
> task and managing the upload directory there.
> In my concrete case, I ran into this management disadvantage when creating
> an AWS EMR cluster with Flink, where the default configuration creates this
> directory under /tmp on the local filesystem of the CORE node where the
> JobManager is deployed by YARN. We found that the EMR cluster is also
> configured to fully empty /tmp on a monthly basis, which removes Flink's
> upload directory and makes Flink fail when you try to submit a new job.
> We had to recreate the directory manually.
> The first solution I tried was to change the above configuration properties
> to use HDFS, as we did with the configuration property
> "state.checkpoints.dir", and we found it doesn't work in a YARN environment.
> So I checked the Flink code to see how this configuration is used and found
> that it uses the local file system.
> I think this change would improve the management of Flink when running in
> another cluster environment where shared storage such as HDFS or S3 is
> available.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)