bgeng777 commented on issue #306:
URL: https://github.com/apache/flink-agents/issues/306#issuecomment-3510150224

   Hi folks,
   Some information after debugging this issue: the reproduction command 
sets PYTHONHOME via `-Dcontainerized.taskmanager.env.PYTHONHOME`. This is not 
recommended, because PyFlink tries to use the shipped Python environment (i.e. 
venv.tar.gz) as its execution environment. The shipped tar.gz/zip is 
decompressed, and on YARN the unpacked directory usually looks like this: 
`/xxx/nm-local-dir/usercache/xxx/appcache/application_xx3921_0020/python-dist-8d697226-383a-486c-ae01-df4b096e8a70/python-archives/venv.tar.gz/`
 (note that `python-archives/venv.tar.gz/` is a directory, which contains the 
specified Python interpreter, e.g. 
`python-dist-8d697226-383a-486c-ae01-df4b096e8a70/python-archives/venv.tar.gz/bin/python`).
 
   
   So, when users set `containerized.taskmanager.env.PYTHONHOME`, it makes 
pemja use the wrong PYTHONHOME to find packages, which leads to errors like 
`ModuleNotFoundError: No module named 'encodings'`. I think we should either 
improve pemja's logic to handle this case and output a warning, or at least 
tell users that this setting can cause unexpected issues. cc @dianfu 
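
   To illustrate the failure mode, here is a minimal diagnostic sketch. It is 
not part of pemja or PyFlink; the helper names `find_encodings_dirs` and 
`check_python_home` are hypothetical, and it assumes the usual POSIX layout in 
which the standard library lives under `<PYTHONHOME>/lib/pythonX.Y/`. It only 
checks whether the directory named by PYTHONHOME contains an `encodings` 
package, which is the module the error above reports as missing.

```python
import os


def find_encodings_dirs(python_home):
    """Return every <python_home>/lib/python*/encodings directory that exists.

    Assumes the POSIX standard-library layout (lib/pythonX.Y/).
    """
    lib_dir = os.path.join(python_home, "lib")
    if not os.path.isdir(lib_dir):
        return []
    hits = []
    for name in sorted(os.listdir(lib_dir)):
        candidate = os.path.join(lib_dir, name, "encodings")
        if name.startswith("python") and os.path.isdir(candidate):
            hits.append(candidate)
    return hits


def check_python_home(env=None):
    """Describe whether PYTHONHOME in `env` points at a usable Python home."""
    env = os.environ if env is None else env
    home = env.get("PYTHONHOME")
    if not home:
        return "PYTHONHOME is not set"
    if find_encodings_dirs(home):
        return "PYTHONHOME looks usable"
    # An interpreter started with this PYTHONHOME would likely fail to
    # import 'encodings' at startup.
    return "PYTHONHOME has no standard library"


if __name__ == "__main__":
    print(check_python_home())
```

   Running this inside the TaskManager container (or with the same environment 
variables) may help confirm whether the configured PYTHONHOME is the cause.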
   
   As a result, to run the example on YARN for now, users should use the pemja 
wheel package from https://github.com/alibaba/pemja/pull/8 and NOT set 
`containerized.taskmanager.env.PYTHONHOME`. Maybe @GreatEugenius can offer more 
detailed instructions.
   
   Thanks
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
