damccorm opened a new issue, #20568: URL: https://github.com/apache/beam/issues/20568
I have been trying to run the Python word-count example on an [AWS EMR](https://aws.amazon.com/emr/) cluster, and it does not work. Things I have tried:

* Running with
```
python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner
```
This implicitly runs with `--spark-master-url local[4]`, which defeats the purpose of running it on a cluster.
* Running with
```
python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner --spark-master-url=yarn
```
This still uses the local master.
* I could not use the method described in [https://beam.apache.org/documentation/runners/spark/](https://beam.apache.org/documentation/runners/spark/) under "Running on a pre-deployed Spark cluster", because under YARN the master is not exposed at a URL like localhost:7077.
* Running with
```
python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner --output_executable_path=jars/beam_word_count.jar
```
as described in https://issues.apache.org/jira/browse/BEAM-8970. This creates a jar file, but when I submit the jar with spark-submit I get a Docker "permission denied" exception. Possibly related to https://issues.apache.org/jira/browse/BEAM-6020.

*So is there no way to run Python Beam code on a YARN Spark cluster?* That would also mean there is no way to run TFX code (which uses Beam) on a YARN cluster.

Imported from Jira [BEAM-11378](https://issues.apache.org/jira/browse/BEAM-11378). Original Jira may contain additional context.
Reported by: ratulray.
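For anyone trying to reproduce this, the three invocations listed in the report can be assembled by a small helper like the one below. This is only a sketch: the script path, output name, and flags are copied from the report, while the helper names (`build_command`, `render`, `ATTEMPTS`) are made up for illustration and are not part of Beam.

```python
import shlex

# Script path as given in the report; adjust to your own checkout.
SCRIPT = "py_codes/word_count_beam.py"

# Extra flags for each attempt described in the report (labels are hypothetical).
ATTEMPTS = {
    "default": [],
    "yarn_master": ["--spark-master-url=yarn"],
    "jar": ["--output_executable_path=jars/beam_word_count.jar"],
}


def build_command(extra_flags):
    """Return the argv list for one word-count attempt."""
    base = [
        "python3",
        SCRIPT,
        "--output",
        "word_count_output",
        "--runner=SparkRunner",
    ]
    return base + list(extra_flags)


def render(name):
    """Return the attempt as a single shell-quoted command line."""
    return shlex.join(build_command(ATTEMPTS[name]))


if __name__ == "__main__":
    for name in ATTEMPTS:
        print(f"{name}: {render(name)}")
```

The argv list returned by `build_command` can be passed to `subprocess.run(..., check=True)` on the EMR master node to rerun each attempt and capture which Spark master is actually used.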
