[
https://issues.apache.org/jira/browse/FLINK-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238646#comment-17238646
]
Huang Xingbo commented on FLINK-20284:
--------------------------------------
[~zhuzh] [~dian.fu] Thanks a lot for reporting this issue.
This error will not affect the correctness of the result. This is caused by not
releasing the EventLoopGroup resource of the netty server correctly.
This problem has been reported by users since Beam release 2.8.0
https://issues.apache.org/jira/browse/BEAM-5397, but it has not been resolved.
After investigation, I found that the reason for the problem is that Grpc
releases the related resources of Netty Server by schedule tasks started by 1s.
https://github.com/grpc/grpc-java/blob/master/core/src/main/java/io/grpc/internal/SharedResourceHolder.java#L150.
After 1s, when the schedule Task starts to reclaim resources, the related
classes have been unloaded by the UserClassLoader, so ClassNotFoundError is
thrown.
> Error happens in TaskExecutor when closing JobMaster connection if there was
> a python UDF
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-20284
> URL: https://issues.apache.org/jira/browse/FLINK-20284
> Project: Flink
> Issue Type: Bug
> Components: API / Python, Runtime / Coordination
> Affects Versions: 1.12.0
> Reporter: Zhu Zhu
> Priority: Major
> Fix For: 1.12.0
>
>
> When a TaskExecutor successfully finished running a python UDF task and
> disconnecting from JobMaster, errors below will happen. This error, however,
> seems not affect job execution at the moment.
> {code:java}
> 2020-11-20 17:05:21,932 INFO
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService [] - 1 Beam Fn
> Logging clients still connected during shutdown.
> 2020-11-20 17:05:21,938 WARN
> org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer [] - Hanged up
> for unknown endpoint.
> 2020-11-20 17:05:22,126 INFO org.apache.flink.runtime.taskmanager.Task
> [] - Source: Custom Source -> select: (f0) -> select:
> (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0
> (b0c2104dd8f87bb1caf0c83586c22a51) switched from RUNNING to FINISHED.
> 2020-11-20 17:05:22,126 INFO org.apache.flink.runtime.taskmanager.Task
> [] - Freeing task resources for Source: Custom Source -> select:
> (f0) -> select: (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select
> table sink (1/1)#0 (b0c2104dd8f87bb1caf0c83586c22a51).
> 2020-11-20 17:05:22,128 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor [] -
> Un-registering task and sending final execution state FINISHED to JobManager
> for task Source: Custom Source -> select: (f0) -> select: (add_one(f0) AS a)
> -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0
> b0c2104dd8f87bb1caf0c83586c22a51.
> 2020-11-20 17:05:22,156 INFO
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot
> TaskSlot(index:0, state:ACTIVE, resource profile:
> ResourceProfile{cpuCores=1.0000000000000000, taskHeapMemory=384.000mb
> (402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb
> (536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId:
> b67c3307dcf93757adfb4f0f9f7b8c7b, jobId: d05f32162f38ec3ec813c4621bc106d9).
> 2020-11-20 17:05:22,157 INFO
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job
> d05f32162f38ec3ec813c4621bc106d9 from job leader monitoring.
> 2020-11-20 17:05:22,157 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Close
> JobManager connection for job d05f32162f38ec3ec813c4621bc106d9.
> 2020-11-20 17:05:23,064 ERROR
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.rejectedExecution
> [] - Failed to submit a listener notification task. Event loop shut down?
> java.lang.NoClassDefFoundError:
> org/apache/beam/vendor/grpc/v1p26p0/io/netty/util/concurrent/GlobalEventExecutor$2
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.startThread(GlobalEventExecutor.java:227)
>
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.execute(GlobalEventExecutor.java:215)
>
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1089)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
> Caused by: java.lang.ClassNotFoundException:
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor$2
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> ~[?:1.8.0_261]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> ~[?:1.8.0_261]
> at
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:63)
> ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:72)
> ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:49)
> ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ~[?:1.8.0_261]
> ... 11 more
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)