LibofRelax commented on issue #26485:
URL: https://github.com/apache/beam/issues/26485#issuecomment-1700788584

   @jeremyje  It's been a while since I gave up on using Beam in production, so 
this sample might not be fully correct.
   
   Here's a sample Docker Compose config that mostly worked for me. It runs smoothly until UDF execution, where there seems to be a version mismatch between the job server and the worker implementation even though they carry the same version tag: the SDK worker expects a logging endpoint from the job running on the Spark worker, which the Spark worker does not appear to expose. You can try it out with `--environment_type EXTERNAL --environment_config beam-python-workers:50000`.
   
   I suggest you try the Docker-in-Docker environment option; the default SDK container image may be consistent with the job server implementation. The compose file below already sets `privileged: true` on the Spark worker for that. If you use `--environment_type DOCKER`, the `beam-python-workers` service won't be needed.
   
   ```yml
   version: '3'
   
   volumes:
     tmp:
   
   services:
   
     spark:
       image: docker.io/bitnami/spark:3.1.2
       environment:
         - SPARK_MODE=master
       ports:
         - "8080:8080"
   
     spark-worker:
       image: docker.io/bitnami/spark:3.1.2
       privileged: true # To run docker SDK harness
       environment:
         - SPARK_MODE=worker
         - SPARK_MASTER_URL=spark://spark:7077
         - SPARK_WORKER_MEMORY=4g
         - SPARK_WORKER_CORES=1
         - BEAM_WORKER_POOL_IN_DOCKER_VM=1
         - DOCKER_MAC_CONTAINER=1
       ports:
         - "8081:8081"
         - "8100-8200:8100-8200"
       volumes:
         - tmp:/tmp
         - ./work/spark:/opt/bitnami/spark/work
   
     beam-python-workers:
       image: apache/beam_python3.10_sdk:2.49.0
       command: [ "--worker_pool" ]
       environment:
         - RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT=1
       volumes:
         - tmp:/tmp
   
     beam-job-server:
       image: apache/beam_spark3_job_server:2.49.0
       command: [ "--spark-master-url=spark://spark:7077" ]
       ports:
         - "4040:4040" # Spark job UI on the driver
         - "8099:8099" # Job endpoint
         - "8098:8098" # Artifact endpoint
       volumes:
         - tmp:/tmp
       depends_on:
         - spark
         - spark-worker
   ```
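
   With the stack above running, a pipeline could be submitted from the host roughly like this. This is an untested sketch against this exact setup: `wordcount` is just a stand-in for your own pipeline, the input/output paths are placeholders, and the endpoints come from the ports published in the compose file.

   ```shell
   # Submit to the job server's endpoint (port 8099, published above), using
   # the external worker pool started by the beam-python-workers service.
   python -m apache_beam.examples.wordcount \
       --input=/tmp/input.txt \
       --output=/tmp/counts \
       --runner=PortableRunner \
       --job_endpoint=localhost:8099 \
       --environment_type=EXTERNAL \
       --environment_config=beam-python-workers:50000

   # Docker-in-Docker variant: drop the beam-python-workers service and let
   # the Spark worker launch the default SDK container itself:
   #   --environment_type=DOCKER
   ```

   Note that `/tmp` is shared between the containers via the `tmp` volume, so artifact staging paths resolve the same way on the job server and the workers.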


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.