[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445631#comment-17445631
 ] 

Biao Geng commented on FLINK-24897:
-----------------------------------

Hi [~trohrmann] and [~wangyang0918], thank you very much for your replies.
I agree with Till's suggestion about reusing the existing logic to include 
{{usrlib}} in user classloader.
Yang's questions are also helpful and critical: 
*A summary of my answers about {{usrlib}}:*
0. We should ship {{usrlib}} by default, as we already do for the {{lib}} dir.
1. If users specify {{usrlib}} again in the {{yarn.ship-files}} option, we 
should avoid uploading it a second time and should not add its classes to the 
system classpath.
2. It should work for per-job mode.
3. {{usrlib}} takes effect in per-job mode only when UserJarInclusion is 
DISABLED, so we should also think about the default value of the 
{{UserJarInclusion}} option.

*Details:*

Q1:
Currently, I think we should ship {{usrlib}} by default if it exists because, 
AFAIK, {{usrlib}} is the default userClassPath defined by Flink. Asking users 
to specify it explicitly would somewhat undermine Flink's contract with 
users. 
When users specify a shipped directory named "usrlib", I think there are 3 
options:
Option1: skip it
Option2: report an error
Option3: do nothing special, just upload it and add the files in {{usrlib}} to 
the system classpath

Option1 seems to be the easiest; it is what we already do for 
{{flink_dist.jar}} when users specify {{lib}} in the ship files.
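
To make Option1 concrete, here is a minimal sketch of the skipping step. It is 
not Flink's actual code: the class {{ShipFilesFilter}} and the method 
{{withoutUsrlib}} are hypothetical names used only for illustration, and the 
sketch assumes that a shipped entry is recognised purely by its last path 
element being "usrlib".

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Flink) illustrating "Option1: skip it". */
final class ShipFilesFilter {

    /** Assumed directory name; taken from the discussion above. */
    static final String USRLIB_DIR_NAME = "usrlib";

    /** Returns the ship files with any entry named "usrlib" left out. */
    static List<Path> withoutUsrlib(List<Path> shipFiles) {
        List<Path> result = new ArrayList<>();
        for (Path candidate : shipFiles) {
            Path name = candidate.getFileName(); // null for a root path
            if (name != null && USRLIB_DIR_NAME.equals(name.toString())) {
                // Skip: usrlib would be shipped by default, so uploading it
                // again (and putting it on the system classpath) is avoided.
                continue;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> shipFiles = Arrays.asList(
                Paths.get("/opt/flink/usrlib"),
                Paths.get("/opt/flink/extra-conf"));
        // Only the non-usrlib entry remains.
        System.out.println(withoutUsrlib(shipFiles));
    }
}
```

Matching on the directory name alone is the simplest rule; a real 
implementation might instead compare against the resolved default {{usrlib}} 
location.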
Option3 is worth mentioning: if users specify {{usrlib}} in the ship files, 
the files in {{usrlib}} will be added to the system classpath. If users use the 
child-first resolve order, those files will still be loaded by the 
UserClassLoader, since they are in the userClassPath as well. But if users 
choose the parent-first resolve order, the files in {{usrlib}} will be loaded 
by the AppClassLoader, which breaks the design. 
So, in summary, I think skipping it is the better choice.
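
The Option3 problem can be shown with a toy model of the two resolve orders. 
This is only a sketch under simplifying assumptions: {{ResolveOrderSketch}} 
and {{definingLoader}} are hypothetical names, classpaths are modelled as 
plain sets of class names, and Flink's always-parent-first patterns are 
ignored.

```java
import java.util.Collections;
import java.util.Set;

/** Toy model (not Flink code) of which loader ends up defining a class. */
final class ResolveOrderSketch {

    enum ResolveOrder { CHILD_FIRST, PARENT_FIRST }

    /**
     * systemClasspath models what the AppClassLoader can see,
     * userClasspath models what the UserClassLoader can see.
     */
    static String definingLoader(
            String className,
            Set<String> systemClasspath,
            Set<String> userClasspath,
            ResolveOrder order) {
        boolean inParent = systemClasspath.contains(className);
        boolean inChild = userClasspath.contains(className);
        if (order == ResolveOrder.CHILD_FIRST) {
            if (inChild) return "UserClassLoader";
            if (inParent) return "AppClassLoader";
        } else {
            if (inParent) return "AppClassLoader";
            if (inChild) return "UserClassLoader";
        }
        return "ClassNotFoundException";
    }

    public static void main(String[] args) {
        // Option3: the same usrlib class is on both classpaths.
        Set<String> both = Collections.singleton("com.example.MyUdf");
        System.out.println(definingLoader(
                "com.example.MyUdf", both, both, ResolveOrder.CHILD_FIRST));
        System.out.println(definingLoader(
                "com.example.MyUdf", both, both, ResolveOrder.PARENT_FIRST));
    }
}
```

With the class present on both classpaths, the model picks the 
UserClassLoader under child-first and the AppClassLoader under parent-first, 
which is the inconsistency that skipping {{usrlib}} in the ship files avoids.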

Q2:
After checking the code of {{FileJobGraphRetriever}} and 
{{YarnJobClusterEntrypoint}}, I think we are already prepared to use 
{{usrlib}} once we upload it to the cluster.

Q3:
I agree that {{usrlib}} takes effect in per-job mode only when 
UserJarInclusion is DISABLED. But currently the default value of 
UserJarInclusion is {{ORDER}}, and it applies to all 3 modes (per-job, session, 
application). If we agree that {{usrlib}} should be shipped automatically, we 
may need to reconsider the default value of this option if we want the 
UserClassLoader to load the jars in {{usrlib}}.

> Enable application mode on YARN to use usrlib
> ---------------------------------------------
>
>                 Key: FLINK-24897
>                 URL: https://issues.apache.org/jira/browse/FLINK-24897
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: Biao Geng
>            Priority: Major
>
> Hi there, 
> I am trying to use application mode to submit flink jobs to a YARN cluster, 
> but I find that currently there is no easy way to ship my user-defined 
> jars (e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars will be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But in YARN mode, `usrlib` does not work that way:
> In the method org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles, if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added to systemClassPaths. As 
> a result, classes in those user jars will be loaded by AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to use them 
> in my flink run command. Currently, all I can do seems to be to use the `-C` 
> option, which means I have to upload my jars to some shared store first and 
> then use these remote paths. This is not ideal, as we have already made it 
> possible to ship jars or files directly and we also introduced `usrlib` in 
> application mode on YARN. It would be more user-friendly if we allowed 
> shipping `usrlib` from local to the remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
