[D] Deployment Guide for Spark on EMR [incubator-gluten]

via GitHub Wed, 10 Dec 2025 16:47:25 -0800


GitHub user ammarchalifah created a discussion: Deployment Guide for Spark on 
EMR


Hi, sorry for the amateur question, but I'm facing difficulty when testing the 
deployment of Gluten+Velox on AWS EMR.

I'm using EMR 7.8 with Spark 3.5.4 and PySpark, on Intel x86 instances running 
Amazon Linux. I want to test a job that reads vanilla Parquet, does a simple 
`Project` & `Exchange`, and writes to an Iceberg table (`copy-on-write`). Both 
input and output are in S3.

I downloaded `gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar` and 
uploaded it to S3 as-is. Then, on the EMR cluster, I'm trying to run 
`spark-submit`, passing the JAR and setting the configurations.
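
For concreteness, here is a rough sketch of the kind of submit described above, 
not the exact command used. It assumes the settings the Gluten documentation 
lists for the Velox backend (`spark.plugins`, off-heap memory, and the columnar 
shuffle manager). The bucket, the script path, and the `build_spark_submit` 
helper are hypothetical placeholders, and whether passing the bundle via 
`--jars` from S3 is sufficient on EMR is exactly the open question.

```python
import shlex


def build_spark_submit(jar_uri, app_uri, offheap_size="4g"):
    """Assemble a spark-submit argument list that enables Gluten.

    jar_uri and app_uri are placeholders; offheap_size is an arbitrary
    illustrative value, not a tuned recommendation.
    """
    conf = {
        # Gluten is loaded as a Spark plugin.
        "spark.plugins": "org.apache.gluten.GlutenPlugin",
        # The native backend allocates from Spark's off-heap memory pool.
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,
        # Columnar shuffle manager shipped in the Gluten bundle.
        "spark.shuffle.manager":
            "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    }
    args = ["spark-submit", "--jars", jar_uri]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_uri)
    return args


# Hypothetical S3 locations, for illustration only.
cmd = build_spark_submit(
    "s3://example-bucket/jars/"
    "gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar",
    "s3://example-bucket/jobs/parquet_to_iceberg.py",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell on the primary node or used as the 
arguments of an EMR step. Iceberg catalog settings are left out of the sketch.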

Questions:
- Does this approach work in general, or should I build from source in this 
case?
- Is there any community guide regarding building/deploying Gluten-Velox on EMR 
that is generic & easy to follow?

GitHub link: https://github.com/apache/incubator-gluten/discussions/11279

