Angel Barragán created FLINK-16544:
--------------------------------------
Summary: Flink FileSystem for web.uploadDir
Key: FLINK-16544
URL: https://issues.apache.org/jira/browse/FLINK-16544
Project: Flink
Issue Type: Improvement
Components: API / Core
Affects Versions: 1.10.0
Reporter: Angel Barragán
Currently the configuration property "web.upload.dir" only supports paths on
the local filesystem. When Flink is deployed in another cluster environment,
such as YARN, it would be more useful to be able to place this directory on
HDFS: its size and maintenance are much easier to handle there than having to
find out on which node YARN has launched the JobManager and then manage the
upload directory on that node.
In my specific case, I ran into this management burden (let's say,
disadvantage) when creating an AWS EMR cluster with Flink. The default
configuration creates this directory under /tmp on the local filesystem of the
CORE node where YARN deploys the JobManager. The EMR cluster is also configured
to empty /tmp completely on a monthly basis, which removes Flink's upload
directory and makes Flink fail when you try to submit a new job. We had to
recreate the directory manually.
The first solution I tried was to change the above configuration to use HDFS,
as we did with the configuration property "state.checkpoints.dir", but we found
that it does not work in a YARN environment. So I checked the Flink code to see
how this configuration is used and found that it always refers to the local
filesystem.
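To illustrate the limitation, here is a minimal Java sketch. It is not Flink's
actual code: the class UploadDirCheck and its methods isLocalPath and
ensureLocalUploadDir are hypothetical, and the sketch assumes POSIX-style paths
and that the value of "web.upload.dir" is read as a plain string. It shows how
a value such as "hdfs:///flink/web-upload" falls outside what a local-only
implementation can accept, and how the manual workaround (recreating a deleted
directory) could be automated for the local case.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; this is not Flink's implementation.
class UploadDirCheck {

    // A value counts as local if it has no URI scheme (e.g. "/tmp/x")
    // or uses the "file" scheme (e.g. "file:///tmp/x").
    // Assumes POSIX-style paths; Windows paths are not handled.
    static boolean isLocalPath(String configured) {
        String scheme = URI.create(configured).getScheme();
        return scheme == null || "file".equalsIgnoreCase(scheme);
    }

    // Rejects non-local values (hdfs://, s3://, ...) to mirror the
    // local-only behavior, and (re)creates the local directory if it is
    // missing, e.g. after /tmp has been emptied.
    static Path ensureLocalUploadDir(String configured) throws IOException {
        if (!isLocalPath(configured)) {
            throw new IllegalArgumentException(
                "Only local upload directories are supported: " + configured);
        }
        URI uri = URI.create(configured);
        Path dir = (uri.getScheme() == null) ? Paths.get(configured) : Paths.get(uri);
        // No-op if the directory already exists; creates parents as needed.
        Files.createDirectories(dir);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default, matching the /tmp location described above.
        String configured = args.length > 0 ? args[0] : "/tmp/flink-web-upload";
        System.out.println(configured + " is local: " + isLocalPath(configured));
        System.out.println("hdfs:///flink/web-upload is local: "
            + isLocalPath("hdfs:///flink/web-upload"));
        System.out.println("Using upload dir: " + ensureLocalUploadDir(configured));
    }
}
```

Supporting HDFS or S3 would mean resolving the value through a filesystem
abstraction keyed on the URI scheme instead of rejecting non-local schemes as
this sketch does.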
I think this would improve the manageability of Flink when it runs on other
cluster environments where shared storage such as HDFS or S3 is available.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)