[ 
https://issues.apache.org/jira/browse/FLINK-35358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rasmus Thygesen updated FLINK-35358:
------------------------------------
    Description: 
We have been using the following snippet in our Dockerfiles to run a Flink job 
in application mode:

 
{code:java}
FROM flink:1.18.1-scala_2.12-java17

COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/artifacts/my-job.jar

USER flink {code}
 

This has worked since at least Flink 1.14, but the 1.19 update has broken our 
Dockerfiles. The fix is to move the jar file one directory level up, directly 
into /opt/flink/usrlib, so the snippet becomes:

 
{code:java}
FROM flink:1.18.1-scala_2.12-java17

COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/my-job.jar

USER flink  {code}
 

We have not spent much time investigating the cause, but we get the following 
stack trace:

 
{code:java}
myjob-jobmanager-1   | org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
myjob-jobmanager-1   |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:230) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:149) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.lambda$main$0(StandaloneApplicationClusterEntryPoint.java:90) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:89) [flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   | Caused by: org.apache.flink.client.program.ProgramInvocationException: The program's entry point class 'my.company.job.MyJob' was not found in the jar file.
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:481) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     ... 4 more
myjob-jobmanager-1   | Caused by: java.lang.ClassNotFoundException: my.company.job.MyJob
myjob-jobmanager-1   |     at java.net.URLClassLoader.findClass(Unknown Source) ~[?:?]
myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
myjob-jobmanager-1   |     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
myjob-jobmanager-1   |     at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:197) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at java.lang.Class.forName0(Native Method) ~[?:?]
myjob-jobmanager-1   |     at java.lang.Class.forName(Unknown Source) ~[?:?]
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:479) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228) ~[flink-dist-1.19.0.jar:1.19.0]
myjob-jobmanager-1   |     ... 4 more{code}
 

I have changed some text in the stack trace to keep it anonymous, so it may 
contain a typo, but that is not the issue. As you can see, the stack trace 
leads to PackagedProgram and DefaultPackagedProgramRetriever. The only commits 
to those classes after Flink 1.18 are the 
[PackagedProgram commit|https://github.com/apache/flink/commit/d0ce5349fdf1a611518eba20a169c475ee0b46c5] 
and the 
[DefaultPackagedProgramRetriever commit|https://github.com/apache/flink/commit/e63aa12252843d0098a56f3091b28d48aff5b5af]. 
We suspect the culprit is the latter, specifically 
[this line|https://github.com/apache/flink/commit/e63aa12252843d0098a56f3091b28d48aff5b5af#diff-11b5162d6745014c68e96303d26c71bdb88bac068c27834dbdbb7c9089ffbe9fL227], 
which we think has made the artifact check non-recursive. We assume artifacts 
are intended to sit directly in /opt/flink/usrlib, without the artifacts 
subdirectory, so we plan to change our Dockerfiles accordingly anyway. It is 
still a breaking change, though, so we wanted to open an issue for it first.
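
To illustrate what we mean by a non-recursive check, here is a minimal sketch. It is not Flink's actual code: the class JarDiscoverySketch and both method names are hypothetical, and it only contrasts a flat directory listing with a recursive walk, which is the difference we assume explains the behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; this is NOT how Flink is implemented.
public class JarDiscoverySketch {

    // Non-recursive: only jar files that sit directly inside dir are found.
    public static List<Path> findJarsFlat(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(JarDiscoverySketch::isJar)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Recursive: jar files in dir and in any nested subdirectory are found.
    public static List<Path> findJarsRecursive(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(JarDiscoverySketch::isJar)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static boolean isJar(Path p) {
        return Files.isRegularFile(p) && p.getFileName().toString().endsWith(".jar");
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from our Dockerfile: usrlib/artifacts/my-job.jar
        Path usrlib = Files.createTempDirectory("usrlib");
        Path artifacts = Files.createDirectories(usrlib.resolve("artifacts"));
        Files.createFile(artifacts.resolve("my-job.jar"));

        System.out.println("flat lookup found:      " + findJarsFlat(usrlib).size());
        System.out.println("recursive lookup found: " + findJarsRecursive(usrlib).size());
    }
}
```

With the jar under usrlib/artifacts, the flat lookup finds no jars while the recursive one finds the job jar. That would match the ClassNotFoundException above if the 1.19 lookup is indeed flat.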


> Breaking change when loading artifacts
> --------------------------------------
>
>                 Key: FLINK-35358
>                 URL: https://issues.apache.org/jira/browse/FLINK-35358
>             Project: Flink
>          Issue Type: New Feature
>          Components: Client / Job Submission, flink-docker
>    Affects Versions: 1.19.0
>            Reporter: Rasmus Thygesen
>            Priority: Not a Priority



