mattfysh commented on PR #21254:
URL: https://github.com/apache/flink/pull/21254#issuecomment-1317799085

   Hi @MartijnVisser - my apologies, I am short on time and hoped someone with 
PyFlink familiarity would pick this up and immediately identify why this change 
is required.
   
   The current instructions do not work in Docker, which means they will not 
work on anyone's machine, regardless of host setup.
   
   To reproduce, stand up a local cluster in session mode using the following 
steps:
   
   1. Create a new folder
   2. Create a Dockerfile inside the folder with the contents of "Using Flink 
Python on Docker": 
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker
   3. Create a docker-compose.yaml file with the contents of "Session Mode": 
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/resource-providers/standalone/docker/#session-cluster-yml
   4. Change both instances of `image: flink:1.16.0-scala_2.12` to be `build: .`
   5. Add a volumes entry to the `jobmanager` as:
   
           volumes:
               - .:/input
   
   6. Create a word_count.py file with the contents of the "The complete code 
so far" section: 
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/python/table_api_tutorial/
   7. Run `docker-compose up`
   8. Enter the jobmanager via `docker exec -it [job_manager_container_id] bash`
   9. Run `./bin/flink run --python /input/word_count.py`
   
   This does not work, and throws the following error:
   
   ```
   Caused by: java.lang.RuntimeException: Failed to create stage bundle 
factory! Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/fastavro/read.py", line 2, in 
<module>
       from . import _read
     File "fastavro/_read.pyx", line 11, in init fastavro._read
     File "/usr/local/lib/python3.7/lzma.py", line 27, in <module>
       from _lzma import *
   ModuleNotFoundError: No module named '_lzma'
   ```
   
   This occurs because, when Python is built from source as instructed in the 
Flink documentation I have updated, certain "optional modules" are not built 
if their host dependencies cannot be found. This includes readline, sqlite, 
etc., but also lzma. More information, particularly on the "optional modules 
were not found" message, can be found at 
https://devguide.python.org/getting-started/setup-building/#unix
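
A quick way to check whether a given Python build is affected is to try 
importing the C extension modules directly. The following is only a minimal 
sketch, not part of this PR: the helper name `missing_optional_modules` and 
the list of module names are assumptions chosen for illustration.

```python
import importlib

# Illustrative list (an assumption, not exhaustive): C extension modules that
# CPython skips at build time when the matching system library is missing.
OPTIONAL_MODULES = ["_lzma", "_bz2", "_sqlite3", "_ssl", "_ctypes", "readline"]


def missing_optional_modules(names=OPTIONAL_MODULES):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_optional_modules()
    if missing:
        print("Missing optional modules:", ", ".join(missing))
    else:
        print("All checked optional modules are available.")
```

If this is run with the Python interpreter built inside the image and `_lzma` 
appears in the output, the fastavro import should fail in the same way as in 
the traceback above.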
   
   From what I gather, the latest version of PyFlink uses a version of fastavro 
that requires lzma to be present in the Python build. I assume the instructions 
I have updated used to work with older versions of PyFlink.

