mattfysh commented on PR #21254: URL: https://github.com/apache/flink/pull/21254#issuecomment-1317799085
Hi @MartijnVisser - my apologies, I am short on time and hoped someone with PyFlink familiarity would pick this up and immediately identify why this change is required.

The current instructions do not work in Docker, which means they won't work on anyone's machine regardless of host setup.

To reproduce, stand up a local cluster in session mode using the following steps:

1. Create a new folder
2. Create a Dockerfile inside the folder with the contents of "Using Flink Python on Docker": https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker
3. Create a docker-compose.yaml file with the contents of "Session Mode": https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/resource-providers/standalone/docker/#session-cluster-yml
4. Change both instances of `image: flink:1.16.0-scala_2.12` to be `build: .`
5. Add a volumes entry to the `jobmanager` as:

        volumes:
          - .:/input

6. Create a word_count.py file with the contents of "The complete code so far" section https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/python/table_api_tutorial/
7. Run `docker-compose up`
8. Enter the jobmanager via `docker exec -it [job_manager_container_id] bash`
9. Run `./bin/flink run --python /input/word_count.py`

This does not work, and throws the following error:

```
Caused by: java.lang.RuntimeException: Failed to create stage bundle factory! Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/fastavro/read.py", line 2, in <module>
    from . import _read
  File "fastavro/_read.pyx", line 11, in init fastavro._read
  File "/usr/local/lib/python3.7/lzma.py", line 27, in <module>
    from _lzma import *
ModuleNotFoundError: No module named '_lzma'
```

This is occurring because when building Python from source, as instructed in the Flink documentation I have updated, certain "optional modules" in Python are not built if host dependencies could not be found.
This includes modules such as readline and sqlite, but also lzma. More information can be found at https://devguide.python.org/getting-started/setup-building/#unix, particularly the "optional modules were not found" message.

From what I gather, the latest version of PyFlink uses a version of fastavro that requires lzma to be present in the Python build. I assume the instructions I have updated used to work with an older version of PyFlink.
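A quick way to confirm whether a source-built interpreter is affected is to try importing the optional modules directly. The following is only a minimal sketch: the helper name `find_missing_modules` and the list of module names are illustrative assumptions, not part of PyFlink or the Flink documentation.

```python
import importlib

# Modules that a source build of CPython may skip when the matching system
# headers are absent at compile time. This list is an illustrative
# assumption; extend it as needed.
OPTIONAL_MODULES = ["_lzma", "readline", "_sqlite3"]


def find_missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(OPTIONAL_MODULES)
    if missing:
        print("Missing optional modules: " + ", ".join(missing))
    else:
        print("All checked optional modules are available.")
```

Running this inside the jobmanager container, with the same Python interpreter the image built, should report `_lzma` as missing if the build hit the problem described above.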