LibofRelax commented on issue #26485:
URL: https://github.com/apache/beam/issues/26485#issuecomment-1700788584
@jeremyje It's been a while since I gave up on using Beam in production, so
this sample might not be fully correct.
Here's a sample docker compose config that mostly worked for me. It runs
smoothly until UDF execution. There seems to be a version mismatch between the
job server and the worker implementation even though they have the same version
tag: the SDK worker expects a logging endpoint from the job running on the
Spark worker, which the Spark worker does not seem to expose. You can try it
out with `--environment_type EXTERNAL --environment_config beam-python-workers:50000`.
I suggest you try the Docker-in-Docker environment option; the default Docker
image may be consistent with the job server implementation. I already set the
`privileged` flag to `true` in the compose file for that. If you use
`--environment_type DOCKER`, the `beam-python-workers` service won't be needed.
```yml
version: '3'
volumes:
  tmp:
services:
  spark:
    image: docker.io/bitnami/spark:3.1.2
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"
  spark-worker:
    image: docker.io/bitnami/spark:3.1.2
    privileged: true # To run the Docker SDK harness
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=4g
      - SPARK_WORKER_CORES=1
      - BEAM_WORKER_POOL_IN_DOCKER_VM=1
      - DOCKER_MAC_CONTAINER=1
    ports:
      - "8081:8081"
      - "8100-8200:8100-8200"
    volumes:
      - tmp:/tmp
      - ./work/spark:/opt/bitnami/spark/work
  beam-python-workers:
    image: apache/beam_python3.10_sdk:2.49.0
    command: [ "--worker_pool" ]
    environment:
      - RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT=1
    volumes:
      - tmp:/tmp
  beam-job-server:
    image: apache/beam_spark3_job_server:2.49.0
    command: [ "--spark-master-url=spark://spark:7077" ]
    ports:
      - "4040:4040" # Spark job UI on the driver
      - "8099:8099" # Job endpoint
      - "8098:8098" # Artifact endpoint
    volumes:
      - tmp:/tmp
    depends_on:
      - spark
      - spark-worker
```
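For reference, here's a minimal sketch of how a pipeline could be submitted against this stack using the external environment. The endpoints assume the compose file above is running on the same host with the default port mappings, and the doubling UDF is just a placeholder:

```python
# Minimal sketch of submitting a pipeline to the compose stack above.
# Assumes apache-beam is installed on the host, the job endpoint
# localhost:8099 is reachable, and the UDF step is a placeholder.

def portable_args(job_endpoint="localhost:8099",
                  env_endpoint="beam-python-workers:50000"):
    """Build PortableRunner options matching the compose services above."""
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        "--environment_type=EXTERNAL",
        f"--environment_config={env_endpoint}",
    ]

if __name__ == "__main__":
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(portable_args())) as p:
        (p
         | beam.Create([1, 2, 3])
         | beam.Map(lambda x: x * 2)  # the UDF step where the mismatch showed up
         | beam.Map(print))
```

Switching to `--environment_type DOCKER` would mean dropping the `--environment_config` flag and letting the Spark worker spawn the SDK harness container itself (hence the `privileged: true` on `spark-worker`).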