[ https://issues.apache.org/jira/browse/BEAM-11959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325960#comment-17325960 ]
Jens Wiren commented on BEAM-11959:
-----------------------------------
[~ibzib] [~tvalentyn]
MeTaNoV found a workaround in this issue:
[https://github.com/tensorflow/tfx/issues/3435]
With --worker_harness_container_image=none passed as an argument, the SDK worker
no longer hangs. This worked for me as well.
This should either be documented, or it needs more investigation into why it
is needed.
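
A minimal sketch of how the workaround flag combines with the Beam args quoted in the issue description. The helper name `build_beam_args` is hypothetical and only for illustration. The resulting list would be handed to the pipeline in the usual way (for a TFX pipeline, presumably through its Beam pipeline args). Why the flag prevents the hang is not established here.

```python
# Beam args as listed in the issue description.
BASE_ARGS = [
    "--runner=FlinkRunner",
    "--flink_version=1.10",
    "--flink_master=http://flink-jobmanager.default:8081",
    "--artifacts_dir=/mnt/flink",
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",
]

# Workaround reported in tensorflow/tfx issue 3435.
WORKAROUND_ARG = "--worker_harness_container_image=none"


def build_beam_args(apply_workaround=True):
    """Return the Beam args, optionally with the workaround flag appended.

    Hypothetical helper: it only assembles the argument list and does not
    import or call Beam itself.
    """
    args = list(BASE_ARGS)  # copy so BASE_ARGS is never mutated
    if apply_workaround:
        args.append(WORKAROUND_ARG)
    return args


if __name__ == "__main__":
    for arg in build_beam_args():
        print(arg)
```

Keeping the flag behind a switch makes it easy to rerun the pipeline without it when checking whether a later Beam version still needs the workaround.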
> Python Beam SDK Harness hangs when installing pip packages
> ----------------------------------------------------------
>
> Key: BEAM-11959
> URL: https://issues.apache.org/jira/browse/BEAM-11959
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, sdk-py-harness
> Affects Versions: 2.27.0, 2.28.0
> Environment: Kubernetes v1.19.6
> Reporter: Jens Wiren
> Priority: P1
> Attachments: jobmanager-configmap.yaml, jobmanager-deploy.yaml,
> jobmanager-svc.yaml, taskmanager-deploy.yaml
>
>
> When running a Beam pipeline using Flink as backend, the python sdk harness
> hangs when trying to install pip packages. Tested using Flink 1.10.3.
> Images used:
> apache/beam_python3.7_sdk:2.28.0
> apache/flink:1.10.3
> Beam args used are:
> "--runner=FlinkRunner",
> "--flink_version=1.10",
> "--flink_master=http://flink-jobmanager.default:8081",
> f"--artifacts_dir=/mnt/flink",
> "--environment_type=EXTERNAL",
> "--environment_config=localhost:50000",
>
> Specifically this was tested by running a TFX pipeline which gets submitted
> and registered as it should, but the SDK Harness hangs when installing:
> 2021/03/10 12:16:20 Initializing python harness: /opt/apache/beam/boot
> --id=1-1 --logging_endpoint=localhost:39795
> --artifact_endpoint=localhost:34095 --provision_endpoint=localhost:42999
> --control_endpoint=localhost:38129
> 2021/03/10 12:16:20 Found artifact: tfx_ephemeral-0.27.0.tar.gz
> 2021/03/10 12:16:20 Found artifact: extra_packages.txt
> 2021/03/10 12:16:20 Installing setup packages ...
> 2021/03/10 12:16:20 Installing extra package: tfx_ephemeral-0.27.0.tar.gz
> and nothing else is shown regardless of how long it is left. I can manually
> install the TFX package by exec'ing into the container in < 3 min.
> The Flink task-manager then sits idle and periodically logs:
> 2021-03-10 11:29:26,287 INFO
> org.apache.beam.runners.fnexecution.environment.ExternalEnvironmentFactory -
> Still waiting for startup of environment from localhost:50000 for worker id
> 1-1
> Helm charts attached below.