damccorm opened a new issue, #20568:
URL: https://github.com/apache/beam/issues/20568

   I have been trying to run the Python word-count example on an [AWS EMR](https://aws.amazon.com/emr/) cluster, but it does not work.
   
   Things I have tried:
    * Tried
   ```
   python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner
   ```
   
   This implicitly runs with `--spark-master-url local[4]`, which defeats the purpose of running it on a cluster.
   
    * Tried
   ```
   python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner --spark-master-url=yarn
   ```
   
   This still uses the local master.
   
    * Could not use the method described in 
[https://beam.apache.org/documentation/runners/spark/](https://beam.apache.org/documentation/runners/spark/)
 under "Running on a pre-deployed Spark cluster", because with YARN the master is not 
exposed at a URL like localhost:7077.
   
    * Tried
   ```
   python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner --output_executable_path=jars/beam_word_count.jar
   ```
   
   as described in https://issues.apache.org/jira/browse/BEAM-8970
    This creates a jar file, but when I submit the jar with spark-submit I get a 
Docker "permission denied" exception, possibly related to 
https://issues.apache.org/jira/browse/BEAM-6020
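
One possible explanation for the second attempt above (`--spark-master-url=yarn` still using the local master) is the flag spelling. This is a guess and is not confirmed anywhere in this report. The sketch below uses only the standard-library `argparse` as a stand-in for the SDK's option parsing. It assumes the option is registered with underscores (`--spark_master_url`, default `local[4]`) and that unrecognized arguments are ignored rather than rejected. The parser and variable names are illustrative and are not Beam code.

```python
import argparse

# Stand-in parser (an assumption, not Beam's actual implementation):
# the option is registered with underscores and defaults to local[4].
parser = argparse.ArgumentParser()
parser.add_argument("--runner", default="DirectRunner")
parser.add_argument("--spark_master_url", default="local[4]")

# Hyphenated spelling, as used in the command above.
# parse_known_args() does not fail on unknown flags; it returns them separately.
hyphen_opts, hyphen_unknown = parser.parse_known_args(
    ["--runner=SparkRunner", "--spark-master-url=yarn"]
)

# Underscore spelling of the same flag.
underscore_opts, underscore_unknown = parser.parse_known_args(
    ["--runner=SparkRunner", "--spark_master_url=yarn"]
)

print(hyphen_opts.spark_master_url, hyphen_unknown)
print(underscore_opts.spark_master_url, underscore_unknown)
```

If the SDK behaves like this stand-in, the hyphenated flag would be dropped silently and the default would apply. Even with the underscore spelling, whether `yarn` is accepted as a master URL is a separate question that this sketch does not answer.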
   
   *So is there no way to run Python Beam code on a YARN Spark cluster?*
    This would also mean there is no way to run TFX code (which uses Beam) on a YARN cluster.
   
   Imported from Jira 
[BEAM-11378](https://issues.apache.org/jira/browse/BEAM-11378). Original Jira 
may contain additional context.
   Reported by: ratulray.

