[
https://issues.apache.org/jira/browse/FLINK-38500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Biao Geng updated FLINK-38500:
------------------------------
Description:
When we submit the same PyFlink job in thread mode consecutively to the same
session cluster running Flink 1.20.3 on macOS, the following exception occurs:
{code:java}
Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: Native Library /opt/homebrew/Cellar/python@3.11/3.11.13/Frameworks/Python.framework/Versions/3.11/Python already loaded in another classloader
    at pemja.utils.CommonUtils.loadLibrary(CommonUtils.java:103)
    at pemja.utils.CommonUtils.loadPython(CommonUtils.java:45)
    at pemja.core.PythonInterpreter$MainInterpreter.initialize(PythonInterpreter.java:365)
    at pemja.core.PythonInterpreter.initialize(PythonInterpreter.java:144)
    at pemja.core.PythonInterpreter.<init>(PythonInterpreter.java:45)
    at org.apache.flink.streaming.api.operators.python.embedded.AbstractEmbeddedPythonFunctionOperator.open(AbstractEmbeddedPythonFunctionOperator.java:72)
    at org.apache.flink.streaming.api.operators.python.embedded.AbstractEmbeddedDataStreamPythonFunctionOperator.open(AbstractEmbeddedDataStreamPythonFunctionOperator.java:88)
    at org.apache.flink.streaming.api.operators.python.embedded.AbstractOneInputEmbeddedPythonFunctionOperator.open(AbstractOneInputEmbeddedPythonFunctionOperator.java:68)
    at org.apache.flink.streaming.api.operators.python.embedded.EmbeddedPythonProcessOperator.open(EmbeddedPythonProcessOperator.java:67)
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateAndGates(StreamTask.java:858)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$restoreInternal$5(StreamTask.java:812)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:812)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:771)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:970)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:939)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:763)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.UnsatisfiedLinkError: Native Library /opt/homebrew/Cellar/python@3.11/3.11.13/Frameworks/Python.framework/Versions/3.11/Python already loaded in another classloader
    at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2476)
    at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2705)
    at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2635)
    at java.base/java.lang.Runtime.load0(Runtime.java:768)
    at java.base/java.lang.System.load(System.java:1854)
    at pemja.utils.CommonUtils.loadLibrary(CommonUtils.java:101)
    ... 19 more
{code}
Example code to reproduce:
{code:python}
from pyflink.common import Configuration
from pyflink.datastream import RuntimeExecutionMode, StreamExecutionEnvironment


def main():
    config = Configuration()
    # Thread mode runs the Python functions inside the JVM through pemja.
    config.set_string("python.execution-mode", "thread")
    env = StreamExecutionEnvironment.get_execution_environment(config)
    env.set_runtime_mode(RuntimeExecutionMode.STREAMING)
    env.from_collection([1, 2, 3, 4, 5]).map(lambda x: x + 1).print()
    env.execute()


if __name__ == "__main__":
    main()
{code}
This exception makes it inconvenient to verify PyFlink jobs on a session
cluster. After some debugging, I found that the root cause is that pemja
(version 0.5.5) does not handle classloading correctly on macOS in
{{pemja.utils.CommonUtils#loadPython}} when the Python interpreter's native
library (.so / Python) is loaded more than once by different child
classloaders in the same JVM (the TaskManager instance). I will fix it in the
pemja repo and deploy new pemja dependencies.
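To illustrate the failure mode, here is a toy Python model of the JVM rule behind the error: within one JVM process, a native library can be loaded by only one classloader at a time. This is only a sketch of that rule, not pemja or JVM code; {{ToyJvm}}, the classloader names and the library path are all hypothetical, and the model ignores that the JVM unloads a library once its classloader is garbage collected.

```python
class UnsatisfiedLinkError(Exception):
    """Stand-in for java.lang.UnsatisfiedLinkError in this toy model."""


class ToyJvm:
    """Models one JVM process (e.g. a TaskManager) and its native libraries."""

    def __init__(self):
        # Maps native library path -> name of the classloader that loaded it.
        self._loaded = {}

    def load(self, classloader, path):
        owner = self._loaded.get(path)
        if owner is None:
            # First load in this process: the classloader becomes the owner.
            self._loaded[path] = classloader
            return "loaded"
        if owner == classloader:
            # Loading again from the owning classloader is a no-op.
            return "already loaded"
        # Any other classloader is rejected while the owner is still alive.
        raise UnsatisfiedLinkError(
            f"Native Library {path} already loaded in another classloader"
        )


LIB = "/path/to/Python"  # hypothetical library path

jvm = ToyJvm()
# First job: its child classloader loads the interpreter library.
print(jvm.load("job-1-classloader", LIB))
# Second job on the same session cluster uses a different child classloader.
try:
    jvm.load("job-2-classloader", LIB)
except UnsatisfiedLinkError as e:
    print(e)
```

In this model, the second call succeeds only if both jobs pass the same classloader name, which corresponds to loading the library from a classloader shared by all jobs.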
For those affected by this problem, one workaround is to move the
flink-python jar from opt/ to lib/. With this trick, the Python interpreter
dependency is loaded by the AppClassLoader instead of by a different child
classloader for each job.
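The workaround can be scripted. Below is a minimal sketch, assuming the jar in opt/ matches the pattern {{flink-python*.jar}} and that both opt/ and lib/ exist under the Flink home directory; the helper name {{move_flink_python_jar}} is hypothetical and not part of Flink.

```python
import shutil
from pathlib import Path


def move_flink_python_jar(flink_home):
    """Move flink-python*.jar from <flink_home>/opt to <flink_home>/lib.

    Returns the list of new jar paths; the list is empty if nothing matched.
    """
    home = Path(flink_home)
    lib_dir = home / "lib"
    moved = []
    # The jar name pattern is an assumption about the distribution layout.
    for jar in sorted((home / "opt").glob("flink-python*.jar")):
        target = lib_dir / jar.name
        shutil.move(str(jar), str(target))
        moved.append(target)
    return moved
```

Call it with the path of the Flink distribution, then restart the session cluster, because jars in lib/ are only picked up when the JVM starts.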
> PyFlink jobs using the thread mode cannot run consecutively on a session
> cluster on MacOS
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-38500
> URL: https://issues.apache.org/jira/browse/FLINK-38500
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Affects Versions: 1.20.3
> Reporter: Biao Geng
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)