[
https://issues.apache.org/jira/browse/FLINK-26030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Biao Geng updated FLINK-26030:
------------------------------
Description:
Currently, we use
{{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
to locate {{usrlib}} on both the Flink client side and the cluster side.
This method relies on the value of the environment variable {{FLINK_LIB_DIR}} to
find {{usrlib}}.
This makes sense on the client side, since {{bin/config.sh}} sets
{{FLINK_LIB_DIR}} by default (i.e. to {{FLINK_HOME/lib}}) if it is not already
set. But in a YARN cluster's containers, where we want to reuse this method to
find {{usrlib}}, YARN usually starts the process using commands like
{quote}/bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824
-Xms1073741824
-XX:MaxMetaspaceSize=268435456 org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
-D jobmanager.memory.off-heap.size=134217728b -D
jobmanager.memory.jvm-overhead.min=201326592b -D
jobmanager.memory.jvm-metaspace.size=268435456b -D
jobmanager.memory.heap.size=1073741824b -D
jobmanager.memory.jvm-overhead.max=201326592b ...
{quote}
{{FLINK_LIB_DIR}} is not guaranteed to be set in such a case. The current code
then uses the current working dir to locate {{usrlib}}, which is correct in
most cases. But bad things can happen if the machine on which the YARN
container resides has already set {{FLINK_LIB_DIR}} to a different folder. In
that case, the code will try to find {{usrlib}} in an undesired place.
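The failure mode above can be illustrated with a simplified model of the lookup. This is only a sketch, not Flink's actual implementation: the class {{UsrLibLookupSketch}} and the method {{resolveUsrLib}} are hypothetical names, and the rule that {{usrlib}} is looked up next to the directory named by {{FLINK_LIB_DIR}} (i.e. in its parent) is an assumption based on the {{FLINK_HOME/lib}} layout.

```java
import java.io.File;
import java.util.Map;

// Hypothetical, simplified model of the lookup described in this issue.
// It is NOT the code of ClusterEntrypointUtils#tryFindUserLibDirectory.
class UsrLibLookupSketch {

    /**
     * Assumption: usrlib sits next to the lib dir, so it is resolved against
     * the parent of FLINK_LIB_DIR. If the variable is unset, the working dir
     * is used instead.
     */
    static File resolveUsrLib(Map<String, String> env, File workingDir) {
        String libDir = env.get("FLINK_LIB_DIR");
        File home = (libDir == null) ? workingDir : new File(libDir).getParentFile();
        return new File(home, "usrlib");
    }

    public static void main(String[] args) {
        File workingDir = new File("/yarn/container_01"); // illustrative path

        // Variable unset: usrlib is resolved under the container's working dir.
        System.out.println(resolveUsrLib(Map.of(), workingDir));

        // Variable already set on the host: usrlib is resolved somewhere else.
        System.out.println(
                resolveUsrLib(Map.of("FLINK_LIB_DIR", "/opt/other-flink/lib"), workingDir));
    }
}
```

With the variable inherited from the host, the second call resolves {{usrlib}} outside the container's working dir, which is the undesired behavior reported here.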
One possible solution would be to override {{FLINK_LIB_DIR}} in the YARN
container env so that it points to the {{lib}} dir under YARN's working dir.
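A minimal sketch of that override is given here, assuming the container environment is assembled as a plain {{Map<String, String>}} before launch (in the YARN API such a map would typically be handed to {{ContainerLaunchContext#setEnvironment}}). The class {{ContainerEnvSketch}}, the method {{withLibDirOverride}} and the example paths are hypothetical and not taken from the Flink code base.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; names and paths are illustrative only.
class ContainerEnvSketch {

    /**
     * Returns a copy of the inherited environment in which FLINK_LIB_DIR
     * always points to the "lib" dir under the container's working dir,
     * regardless of any value already set on the host.
     */
    static Map<String, String> withLibDirOverride(
            Map<String, String> inherited, String containerWorkingDir) {
        Map<String, String> env = new HashMap<>(inherited);
        env.put("FLINK_LIB_DIR", containerWorkingDir + "/lib");
        return env;
    }

    public static void main(String[] args) {
        // A host that already exports FLINK_LIB_DIR to a different folder.
        Map<String, String> host = Map.of("FLINK_LIB_DIR", "/opt/other-flink/lib");
        Map<String, String> env = withLibDirOverride(host, "/yarn/container_01");
        System.out.println(env.get("FLINK_LIB_DIR"));
    }
}
```

Because the override is unconditional, a value inherited from the host can no longer redirect the {{usrlib}} lookup.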
was:
Currently, we utilize
{{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
to locate usrlib in both flink client and cluster side.
This method relies on the value of environment variable {{FLINK_LIB_DIR}} to
find the {{usrlib}}.
It makes sense in client side since in {{bin/config.sh}}, {{FLINK_LIB_DIR}}
will be set by default(i.e. {{FLINK_HOME/lib}} if not exists. But in YARN
cluster's containers, when we want to reuse this method to find {{usrlib}}, as
the YARN usually starts the process using commands like
bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 -Xms1073741824
-XX:MaxMetaspaceSize=268435456org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
-D jobmanager.memory.off-heap.size=134217728b -D
jobmanager.memory.jvm-overhead.min=201326592b -D
jobmanager.memory.jvm-metaspace.size=268435456b -D
jobmanager.memory.heap.size=1073741824b -D
jobmanager.memory.jvm-overhead.max=201326592b ...
{{FLINK_LIB_DIR}} is not guaranteed to be set in such case. Current codes will
use current working dir to locate the {{usrlib}} which is correct in most
cases. But bad things can happen if the machine which the YARN container
resides in has already set {{FLINK_LIB_DIR}} to a different folder. In that
case, codes will try to find {{usrlib}} in a undesired place.
One possible solution would be overriding the {{FLINK_LIB_DIR}} in YARN
container env to the {{lib}} dir under YARN's workding dir.
> Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers
> ---------------------------------------------------------------
>
> Key: FLINK-26030
> URL: https://issues.apache.org/jira/browse/FLINK-26030
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Reporter: Biao Geng
> Priority: Minor
>
> Currently, we use
> {{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
> to locate {{usrlib}} on both the Flink client side and the cluster side.
> This method relies on the value of the environment variable {{FLINK_LIB_DIR}} to
> find {{usrlib}}.
> This makes sense on the client side, since {{bin/config.sh}} sets
> {{FLINK_LIB_DIR}} by default (i.e. to {{FLINK_HOME/lib}}) if it is not already
> set. But in a YARN cluster's containers, where we want to reuse this method to
> find {{usrlib}}, YARN usually starts the process using commands like
> {quote}/bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824
> -Xms1073741824
> -XX:MaxMetaspaceSize=268435456 org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
> -D jobmanager.memory.off-heap.size=134217728b -D
> jobmanager.memory.jvm-overhead.min=201326592b -D
> jobmanager.memory.jvm-metaspace.size=268435456b -D
> jobmanager.memory.heap.size=1073741824b -D
> jobmanager.memory.jvm-overhead.max=201326592b ...
> {quote}
> {{FLINK_LIB_DIR}} is not guaranteed to be set in such a case. The current code
> then uses the current working dir to locate {{usrlib}}, which is correct in
> most cases. But bad things can happen if the machine on which the YARN
> container resides has already set {{FLINK_LIB_DIR}} to a different folder. In
> that case, the code will try to find {{usrlib}} in an undesired place.
> One possible solution would be to override {{FLINK_LIB_DIR}} in the YARN
> container env so that it points to the {{lib}} dir under YARN's working dir.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)