Dylan Meissner created FLINK-35833:
--------------------------------------
Summary: ArtifactFetchManager always requires writable filesystem
Key: FLINK-35833
URL: https://issues.apache.org/jira/browse/FLINK-35833
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.19.1, 1.19.0
Reporter: Dylan Meissner
FLINK-28915 added support for fetching remote job jars (HTTPS, S3, etc.) but
broke the default behavior of local jar fetching when the application runs on
a non-writable filesystem.
Running an application on a non-writable filesystem is common in environments
where the jar is published inside the Docker container image. In this case,
the jar URI is usually specified as a value like
local://opt/flink/usrlib/my-app.jar.
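For illustration only, here is a minimal sketch of how such a URI can be
recognized as local by its scheme, using just java.net.URI. The class and
method names (JarUriSchemeSketch, isLocal) are hypothetical and not taken from
Flink. The main method also prints how java.net.URI splits the two-slash form
used above compared with a three-slash form, since the two parse differently.

```java
import java.net.URI;

/** Hypothetical helper; not part of Flink. */
final class JarUriSchemeSketch {

    /** Returns true when the URI uses the "local" scheme, i.e. the jar already sits in the image. */
    public static boolean isLocal(String jarUri) {
        return "local".equals(URI.create(jarUri).getScheme());
    }

    public static void main(String[] args) {
        // Two slashes, as written in this report: "opt" is parsed as the authority.
        URI twoSlashes = URI.create("local://opt/flink/usrlib/my-app.jar");
        // Three slashes: no authority, and the path keeps the leading /opt.
        URI threeSlashes = URI.create("local:///opt/flink/usrlib/my-app.jar");

        for (URI uri : new URI[] {twoSlashes, threeSlashes}) {
            System.out.println(
                    "scheme=" + uri.getScheme()
                            + " authority=" + uri.getAuthority()
                            + " path=" + uri.getPath());
        }
    }
}
```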
A local jar does not need to be "fetched", so there is no need to create an
intermediate directory to copy the fetched artifact into. However,
ArtifactFetchManager always attempts to create that directory before fetching,
regardless of which fetcher will do the work. On a non-writable filesystem,
the result is a runtime exception:
{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{ at
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85)
[flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create
parent(s) for given base dir:
/opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{ at
org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory
'/opt/flink/artifacts/app07772'.}}
{{ at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156)
~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}
A workaround is to set the base directory to a location where the process can
create directories, e.g., user.artifacts.base-dir: /tmp/foo.
A proposed solution is to let each fetcher decide whether to create the
intermediate directory or fail.
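One way this could look is sketched below. It is not Flink's actual code: the
Fetcher interface, LocalFetcher, CopyingFetcher, and fetchArtifact are all
hypothetical names, and the "remote" fetcher only writes an empty placeholder
file instead of downloading anything. The point is only that directory
creation moves out of the manager and into the fetchers that need it, so a
local jar never touches the base dir.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch; none of these types are Flink's actual classes. */
final class LazyDirFetchSketch {

    /** Hypothetical contract: the fetcher gets the base dir and decides whether to create it. */
    interface Fetcher {
        Path fetch(URI uri, Path baseDir) throws IOException;
    }

    /** A local artifact is already on disk, so nothing is copied and no directory is created. */
    static final class LocalFetcher implements Fetcher {
        @Override
        public Path fetch(URI uri, Path baseDir) {
            return Paths.get(uri.getPath());
        }
    }

    /** Stand-in for a remote fetcher: it needs a writable target dir and fails without one. */
    static final class CopyingFetcher implements Fetcher {
        @Override
        public Path fetch(URI uri, Path baseDir) throws IOException {
            // Throws an IOException if the directory cannot be created.
            Files.createDirectories(baseDir);
            Path target = baseDir.resolve(Paths.get(uri.getPath()).getFileName().toString());
            // A real fetcher would download here; the sketch writes an empty placeholder.
            Files.write(target, new byte[0]);
            return target;
        }
    }

    /** Picks a fetcher by URI scheme (assumed dispatch rule for this sketch). */
    static Fetcher fetcherFor(URI uri) {
        return "local".equals(uri.getScheme()) ? new LocalFetcher() : new CopyingFetcher();
    }

    /** The manager no longer creates the base dir up front; it only delegates. */
    public static Path fetchArtifact(String uri, Path baseDir) throws IOException {
        URI parsed = URI.create(uri);
        return fetcherFor(parsed).fetch(parsed, baseDir);
    }

    public static void main(String[] args) throws IOException {
        // The base dir here is never created, because the local fetcher does not need it.
        Path baseDir = Paths.get("/opt/flink/artifacts/some-namespace/some-job");
        System.out.println(fetchArtifact("local:///opt/flink/usrlib/my-app.jar", baseDir));
    }
}
```

With this shape, a read-only image only fails when an artifact genuinely has
to be copied, and the existing user.artifacts.base-dir setting remains the way
to point such copies at a writable location.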