[ 
https://issues.apache.org/jira/browse/BEAM-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487070#comment-17487070
 ] 

Abhinav Sarkari commented on BEAM-13787:
----------------------------------------

I suspect sdk_worker_main is not even getting run. I customized the module to 
have a print statement inside '__main__' method & don't see the statement 
getting logged in console (though it's possible that entire stdout has been 
directed elsewhere).

/opt/apache/boom/boot seems to be unable to initiate the python run for 
'apache_beam.runners.worker.sdk_worker_main, though it was able to do the same 
for 'apache_beam.runners.worker.worker_pool_main'.

> Job in Flink + K8S Cluster exits without executing the pipeline
> ---------------------------------------------------------------
>
>                 Key: BEAM-13787
>                 URL: https://issues.apache.org/jira/browse/BEAM-13787
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>    Affects Versions: 2.34.0
>         Environment: Python version: 3.7.9
> Flink version: 1.13.5
> Beam version: 2.34.0
> Beam job server: apache/beam_flink1.13_job_server
> Kubernetes (GCP/GKE) version: 1.20.11-gke.1300
>            Reporter: Abhinav Sarkari
>            Priority: P1
>
>  
> I am running a beam pipeline on Flink deployed on Standalone K8S with Session 
> Mode.
> Beam pipeline options are
> options = [
> "--runner=PortableRunner",
> "--job_endpoint=beam-job-server:8099",
> "--artifact_endpoint=beam-job-server:8098",
> "--environment_type=EXTERNAL",
> "--environment_config=localhost:50000"
> ]
> Worker pools are deployed as containers in taskmanager pod. A separate job 
> server runs as k8S deployment with image apache/beam_flink1.13_job_server
>  
> The worker pool uses a custom image built on top of 
> apache/beam_python3.7_sdk:2.34.0 and contains user code.
>  
> The client is able to successfully submit the job and it gets dispatched to 
> worker-pool container. But the python subprocess appears to fatally crash 
> without executing the pipeline. The pipeline status on the client side shows 
> as DONE
> 2022-02-01 00:52:11,910 None wait_until_finish_read (ProcessId : 1) INFO 
> portable_runner.py:(576) Job state changed to DONE
>  
>  
> Log from worker pool container 
>  
> beam-worker-pool
> 2022/02/01 00:52:05 Initializing python harness: /opt/apache/beam/boot 
> --id=1-1 --logging_endpoint=localhost:44717 
> --artifact_endpoint=localhost:42449 --provision_endpoint=localhost:35353 
> --control_endpoint=localhost:34919
>  
> beam-worker-pool
> 2022/02/01 00:52:05 Installing setup packages ...
>  
> beam-worker-pool
> 2022/02/01 00:52:05 Executing: python -m 
> apache_beam.runners.worker.sdk_worker_main
>  
> beam-worker-pool
> 2022/02/01 00:52:08 Python exited: <nil>
>  
>  
> There is no other log statement. The last statement comes from fatal 
> statement in boot code line 209.
> https://github.com/apache/beam/blob/v2.34.0/sdks/python/container/boot.go
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to