[https://issues.apache.org/jira/browse/BEAM-11959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422377#comment-17422377]
Valentyn Tymofieiev commented on BEAM-11959:
--------------------------------------------
I don't know off the top of my head. Frankly, I'd expect the opposite behavior:
in some cases installing a source distribution would fail, but installing a .whl
would pass. Of course, the .whl in question needs to be compatible with the target
platform, but I think pip should be checking that.
Possible leads to understand what's going on:
- Reproduce the buggy behavior in a smaller scope: one could start a Docker
container manually and run the same command that Docker runs, or compile a
simple boot file that runs that command, and see whether that makes a difference.
- Understand what --worker_harness_container_image=none did and how it helped
avoid the issue.
- Try running git bisect, running the SDK from head at each step, and see if there
is a culprit commit that changes the behavior.
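The first and third leads can be combined into a small harness. Below is a minimal sketch of a hypothetical helper script (the name `check_hang.py` and the helper functions are illustrative, not part of Beam). It runs one install command with a timeout and turns the outcome into an exit code that `git bisect run` understands, treating a hang as "bad". The command you pass in and the 600-second limit (chosen to sit well above the under-3-minute manual install reported in the issue) are assumptions. This does not reproduce exactly what the Beam boot binary executes.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`:
# 0 marks the commit good, 125 tells bisect to skip it,
# and any other code from 1 to 127 marks it bad.
GOOD, BAD, SKIP = 0, 1, 125


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was still running after timeout_s."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired.
        return None
    return result.returncode


def bisect_verdict(returncode):
    """Map a run result onto a git bisect verdict; a hang is the bug being hunted."""
    if returncode is None:
        return BAD   # hung past the timeout: reproduces the issue
    if returncode != 0:
        return SKIP  # failed for some other reason: not informative for this bug
    return GOOD      # finished within the timeout


def main(argv, timeout_s=600.0):
    """Entry point, e.g. `git bisect run python check_hang.py pip install <package>`."""
    if not argv:
        return SKIP  # nothing to run
    return bisect_verdict(run_with_timeout(argv, timeout_s))

# When saved as a script, end the file with:  sys.exit(main(sys.argv[1:]))
```

Ordinary failures are mapped to "skip" rather than "bad" so that an unrelated install error at some intermediate commit is not mistaken for the hang. If such failures turn out to be relevant, mapping them to `BAD` instead is a one-line change.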
> Python Beam SDK Harness hangs when installing pip packages
> ----------------------------------------------------------
>
> Key: BEAM-11959
> URL: https://issues.apache.org/jira/browse/BEAM-11959
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, sdk-py-harness
> Affects Versions: 2.27.0, 2.28.0, 2.31.0, 2.32.0
> Environment: Kubernetes v1.20.1
> Reporter: Jens Wiren
> Priority: P1
> Attachments: jobmanager-configmap.yaml, jobmanager-deploy.yaml,
> jobmanager-svc.yaml, taskmanager-deploy.yaml
>
>
> When running a Beam pipeline using Flink as the backend, the Python SDK harness
> hangs when trying to install pip packages. Tested using Flink 1.10.3.
> Images used:
> apache/beam_python3.7_sdk:2.28.0
> apache/flink:1.10.3
> Beam args used are:
> "--runner=FlinkRunner",
> "--flink_version=1.10", // same with 1.13
> "--flink_master=http://flink-jobmanager.default:8081",
> f"--artifacts_dir=/mnt/flink",
> "--environment_type=EXTERNAL",
> "--environment_config=localhost:50000",
>
> Specifically this was tested by running a TFX pipeline which gets submitted
> and registered as it should, but the SDK Harness hangs when installing:
> 2021/03/10 12:16:20 Initializing python harness: /opt/apache/beam/boot
> --id=1-1 --logging_endpoint=localhost:39795
> --artifact_endpoint=localhost:34095 --provision_endpoint=localhost:42999
> --control_endpoint=localhost:38129
> 2021/03/10 12:16:20 Found artifact: tfx_ephemeral-0.27.0.tar.gz
> 2021/03/10 12:16:20 Found artifact: extra_packages.txt
> 2021/03/10 12:16:20 Installing setup packages ...
> 2021/03/10 12:16:20 Installing extra package: tfx_ephemeral-0.27.0.tar.gz
> and nothing else is shown, regardless of how long it is left. I can manually
> install the TFX package by exec-ing into the container in under 3 minutes.
> The Flink task-manager then waits idling and periodically logs:
> 2021-03-10 11:29:26,287 INFO
> org.apache.beam.runners.fnexecution.environment.ExternalEnvironmentFactory -
> Still waiting for startup of environment from localhost:50000 for worker id
> 1-1
> Helm charts attached below.