[
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445631#comment-17445631
]
Biao Geng commented on FLINK-24897:
-----------------------------------
Hi [~trohrmann] and [~wangyang0918], thank you very much for your replies.
I agree with Till's suggestion about reusing the existing logic to include
{{usrlib}} in user classloader.
Yang's questions are also helpful and critical.
*A summary of my answers about {{usrlib}}:*
0. We should ship {{usrlib}} by default, as we already do for the {{lib}} dir.
1. If users also specify {{usrlib}} in the {{yarn.ship-files}} option, we
should neither upload it a second time nor add its classes to the system
classpath.
2. It should work for per-job mode.
3. Only when UserJarInclusion is DISABLED will {{usrlib}} take effect in
per-job mode. But we should reconsider the default value of the
{{UserJarInclusion}} option.
*Details:*
Q1:
Currently, I think we should ship {{usrlib}} by default if it exists because,
AFAIK, {{usrlib}} is the default userClassPath defined by Flink. Asking users
to specify it explicitly would undermine that contract between Flink and its
users.
When users specify a shipped directory named "usrlib", I think there are 3
options:
Option1: skip it
Option2: report an error
Option3: do nothing special, just upload it and add the files in {{usrlib}} to
the system classpath
Option1 seems to be the easiest, and it matches what we already do for
{{flink_dist.jar}} when users specify {{lib}} in the ship files.
Option3 is worth discussing: if users specify {{usrlib}} in the ship files,
the files in {{usrlib}} will be added to the system classpath. With the
child-first resolve order, these files will still be loaded by the
UserClassLoader, because they are on the userClassPath as well. With the
parent-first resolve order, however, they will be loaded by the
AppClassLoader, which breaks the design.
So, in summary, I think skipping it is the better option.
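To make Option1 concrete, here is a minimal sketch of how a directory named "usrlib" could be dropped from the user-supplied ship files before they reach the system classpath. The class {{ShipFileFilter}} and the method {{skipUsrlib}} are hypothetical names used only for illustration. They are not existing Flink code, and the real change in {{YarnClusterDescriptor}} may look quite different.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Flink) that illustrates Option1 only. */
final class ShipFileFilter {

    // Directory name discussed in this issue.
    static final String USRLIB_DIR_NAME = "usrlib";

    /** Returns the ship files without any entry whose last path element is "usrlib". */
    static List<Path> skipUsrlib(List<Path> shipFiles) {
        List<Path> result = new ArrayList<>();
        for (Path p : shipFiles) {
            Path name = p.getFileName();
            if (name != null && USRLIB_DIR_NAME.equals(name.toString())) {
                // Assumed to be shipped separately by default, so it is
                // skipped here and never added to the system classpath.
                continue;
            }
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> shipFiles = Arrays.asList(
                Paths.get("/opt/flink/lib"),
                Paths.get("/opt/flink/usrlib"),
                Paths.get("/home/me/extra-conf"));
        // Prints the remaining ship files; the usrlib entry is gone.
        System.out.println(skipUsrlib(shipFiles));
    }
}
```

The filter compares only the last path element, so a file or directory that merely contains "usrlib" in its name (for example {{my-usrlib-backup}}) would still be shipped as usual.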
Q2:
After checking the code of {{FileJobGraphRetriever}} and
{{YarnJobClusterEntrypoint}}, I think we are already prepared to use {{usrlib}}
once we upload it to the cluster.
Q3:
I agree that {{usrlib}} will take effect in per-job mode only when
UserJarInclusion is DISABLED. However, the current default value of
UserJarInclusion is {{ORDER}}, and it applies to all 3 modes (per-job, session,
application). If we agree that {{usrlib}} should be shipped automatically, we
may need to reconsider the default value of this option if we want the
UserClassLoader to load the jars in {{usrlib}}.
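The interaction described in Q3 can be sketched as a small decision function. This is only a model of the behaviour discussed in this comment, under the assumption that any value other than DISABLED puts user jars on the system classpath. The class {{UsrlibPlacement}} and the enum {{Target}} are hypothetical, and the nested {{UserJarInclusion}} enum merely mirrors the option values instead of importing Flink's own type.

```java
/** Hypothetical model (not Flink code) of where jars in usrlib would end up. */
final class UsrlibPlacement {

    /** Mirrors the values of the user-jar inclusion option for illustration. */
    enum UserJarInclusion { DISABLED, FIRST, LAST, ORDER }

    /** Where the jars are placed, which determines the class loader that sees them. */
    enum Target { USER_CLASSPATH, SYSTEM_CLASSPATH }

    /**
     * Assumption from this discussion: only DISABLED keeps user jars off the
     * system classpath, so only then can the user class loader own them.
     */
    static Target targetFor(UserJarInclusion inclusion) {
        return inclusion == UserJarInclusion.DISABLED
                ? Target.USER_CLASSPATH
                : Target.SYSTEM_CLASSPATH;
    }

    public static void main(String[] args) {
        for (UserJarInclusion inclusion : UserJarInclusion.values()) {
            System.out.println(inclusion + " -> " + targetFor(inclusion));
        }
    }
}
```

Under this model, the current default value sends the jars to the system classpath, which is why shipping {{usrlib}} automatically would not be enough on its own.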
> Enable application mode on YARN to use usrlib
> ---------------------------------------------
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Reporter: Biao Geng
> Priority: Major
>
> Hi there,
> I am working to utilize application mode to submit flink jobs to YARN cluster
> but I find that currently there is no easy way to ship my user-defined
> jars(e.g. some custom connectors or udf jars that would be shared by some
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars.
> I checked some relevant jiras, like FLINK-21289. In k8s mode, there is a
> solution that users can use `usrlib` directory to store their user-defined
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job
> is executed on JM/TM.
> But in YARN mode, `usrlib` does not work that way:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my
> local machine) to remote cluster, I must not set UserJarInclusion to
> DISABLED due to the checkArgument(). However, if I do not set that option to
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As
> a result, classes in those user jars will be loaded by AppClassLoader.
> But if I do not ship these jars, there is no convenient way to utilize these
> jars in my flink run command. Currently, all I can do seems to be to use the
> `-C` option, which means I have to upload my jars to some shared store first
> and then use these remote paths. This is not ideal, as we have already made it
> possible to ship jars or files directly and we also introduce `usrlib` in
> application mode on YARN. It would be more user-friendly if we can allow
> shipping `usrlib` from local to remote cluster while using
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>