bydeath commented on issue #306:
URL: https://github.com/apache/flink-agents/issues/306#issuecomment-3501761953

   
   > I think this is a known bug of pemja: 
https://issues.apache.org/jira/browse/FLINK-38585, and has been fixed in pemja 
recently [alibaba/pemja#87](https://github.com/alibaba/pemja/pull/87).
   > 
   > But because Flink-Agents indirectly depends on pemja through pyflink, 
Flink-Agents must wait until Flink releases a version containing the pemja fix 
before this issue can be resolved.
   
   Hi @wenjin272,
   
   Thank you for your response and for pointing out the related JIRA issue, 
[FLINK-38585](https://issues.apache.org/jira/browse/FLINK-38585).
   
   Based on my analysis of the stack trace, my issue with `flink-agents` 
appears to be distinct from `FLINK-38585`, which specifically addresses 
problems in PyFlink's Pemja-based thread mode execution. Furthermore, I have 
not found `flink-agents` explicitly configuring Pemja to use thread mode 
(e.g., by setting `python.execution-mode` to `thread`), which suggests that 
`flink-agents` uses Pemja differently from the case targeted by 
`FLINK-38585`.
   
   The failure I am encountering occurs when the `flink-agents` framework 
attempts to initialize its own embedded Python environment. The stack trace 
clearly indicates that the failure happens directly within the `flink-agents` 
operator loading the Python interpreter via Pemja:
   
   ```java
   // ... (omitted)
        at pemja.core.PythonInterpreter.<init>(PythonInterpreter.java:45) ~[flink-python-1.20.3.jar:1.20.3]
        at org.apache.flink.agents.runtime.env.EmbeddedPythonEnvironment.getInterpreter(EmbeddedPythonEnvironment.java:45) ~[flink-agents-dist-0.1.0.jar:0.1.0]
        at org.apache.flink.agents.runtime.python.utils.PythonActionExecutor.open(PythonActionExecutor.java:80) ~[flink-agents-dist-0.1.0.jar:0.1.0]
        at org.apache.flink.agents.runtime.operator.ActionExecutionOperator.initPythonActionExecutor(ActionExecutionOperator.java:504) ~[flink-agents-dist-0.1.0.jar:0.1.0]
   // ... (omitted)
   Caused by: java.io.IOException: Failed to execute the command: ... /venv.tar.gz/bin/python -c from find_libpython import find_libpython;print(find_libpython())
   Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
   ModuleNotFoundError: No module named 'encodings'
   ```
   
   As you can see, the call sequence confirms that `flink-agents` is directly 
using Pemja's `PythonInterpreter` for environment initialization:
   
   1. The `EmbeddedPythonEnvironment.getInterpreter()` method (source: 
[EmbeddedPythonEnvironment.java#L45](https://github.com/apache/flink-agents/blob/fcaabe7dbe6b04f00da9c5a3563e9599710088ce/runtime/src/main/java/org/apache/flink/agents/runtime/env/EmbeddedPythonEnvironment.java#L45))
 returns the `PythonInterpreter` instance.
   
   2. This interpreter instance is created via the logic in 
`PythonEnvironmentManager.createEnvironment()` (source: 
[PythonEnvironmentManager.java#L45-L83](https://github.com/apache/flink-agents/blob/fcaabe7dbe6b04f00da9c5a3563e9599710088ce/runtime/src/main/java/org/apache/flink/agents/runtime/env/PythonEnvironmentManager.java#L45-L83)).
   
   3. The failure (`ModuleNotFoundError: No module named 'encodings'`) happens 
inside Pemja's constructor (`PythonInterpreter.<init>`) during this direct 
initialization call.
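
To separate a Pemja problem from an interpreter problem, the probe command shown in the `Caused by` line can be re-run outside the JVM. The following is a minimal Python sketch, not part of flink-agents or Pemja: `probe_interpreter` is a hypothetical helper name, and the interpreter path has to be supplied by hand because the trace elides its prefix. If the same `No module named 'encodings'` error shows up here, that would suggest the interpreter cannot start on its own, independent of how Pemja is invoked.

```python
import subprocess
import sys

# Probe code copied from the "Failed to execute the command" line in the trace.
PROBE = "from find_libpython import find_libpython;print(find_libpython())"


def probe_interpreter(python_exec, code=PROBE, env=None):
    """Run `python_exec -c code` and return (returncode, stdout, stderr).

    Hypothetical helper for diagnosis only. `env=None` inherits the current
    process environment; pass a dict to test a specific set of variables.
    """
    result = subprocess.run(
        [python_exec, "-c", code],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    # Assumption: the interpreter to test is passed as the first argument;
    # without one, the interpreter running this script is probed instead.
    exe = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    rc, out, err = probe_interpreter(exe)
    print(f"exit code: {rc}")
    print(f"stdout: {out}")
    print(f"stderr: {err}")
```

Run it with a working Python on the YARN node, passing the extracted environment's `bin/python` as the first argument.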
   
   The core issue remains that Pemja fails to initialize the self-contained 
Conda environment when launched by flink-agents in YARN mode. I suspect that 
even after a Pemja bug fix is incorporated into a new Flink release, this 
specific issue with flink-agents may not be resolved, because the problem 
appears tied to path resolution logic within flink-agents' direct usage of 
Pemja, and not just the execution model addressed in FLINK-38585.
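
`No module named 'encodings'` during `init_fs_encoding` generally means CPython could not locate its standard library, which usually points at the interpreter's prefix or at `PYTHONHOME`/`PYTHONPATH`. As a hedged way to check the path-resolution suspicion above, the sketch below inspects an extracted environment directory. `find_encodings_packages` and `describe_env` are hypothetical helpers, and the `lib/python*/encodings` layout assumes a POSIX CPython environment.

```python
import os
from pathlib import Path


def find_encodings_packages(env_root):
    """Return every lib/python*/encodings/__init__.py found under env_root."""
    return sorted(Path(env_root).glob("lib/python*/encodings/__init__.py"))


def describe_env(env_root, environ=None):
    """Collect the facts most relevant to a missing `encodings` module.

    Hypothetical diagnostic helper; `environ` defaults to os.environ so the
    variables seen by the current process are reported.
    """
    environ = os.environ if environ is None else environ
    return {
        "env_root_exists": Path(env_root).is_dir(),
        "encodings_found": [str(p) for p in find_encodings_packages(env_root)],
        "PYTHONHOME": environ.get("PYTHONHOME"),
        "PYTHONPATH": environ.get("PYTHONPATH"),
    }
```

An empty `encodings_found` list for the directory that contains `bin/python` would indicate that the standard library is not where the interpreter expects it, for example because the archive was not extracted at that path.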
   
   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
