GitHub user ammarchalifah created a discussion: Deployment Guide for Spark on EMR
Hi, sorry for the amateur question, but I'm having difficulty testing a deployment of Gluten+Velox on AWS EMR.

I'm using EMR 7.8 and Spark 3.5.4 with PySpark, on Intel x86 instances running Amazon Linux. I want to test a job that reads from vanilla Parquet, does a simple `Project` & `Exchange`, and writes to an Iceberg table (`copy-on-write`). Both input and output are in S3.

I downloaded `gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar` and put it into S3 directly. Then, on the EMR cluster, I'm trying to run a Spark submit that passes the JAR and sets the configurations.

Questions:
- Does this approach work in general, or should I build from source in this case?
- Is there a community guide for building/deploying Gluten-Velox on EMR that is generic and easy to follow?

GitHub link: https://github.com/apache/incubator-gluten/discussions/11279
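
For concreteness, here is a minimal sketch of the kind of submission described above. It is not the exact command from the original post, and whether passing the bundle through `--jars` is sufficient on EMR is part of the open question. The configuration keys are the ones commonly listed in the Gluten Velox getting-started material. The S3 paths, the off-heap size, and the helper name `build_spark_submit` are hypothetical placeholders.

```python
import shlex

# Hypothetical S3 location of the downloaded bundle; replace with your own bucket/key.
GLUTEN_JAR = (
    "s3://my-bucket/jars/"
    "gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar"
)


def build_spark_submit(app, jar=GLUTEN_JAR, offheap_size="4g"):
    """Return a spark-submit argv list that enables the Gluten plugin.

    Assumption: these four settings are the usual minimum for Gluten+Velox;
    the off-heap size is an arbitrary example value, not a recommendation.
    """
    conf = {
        "spark.plugins": "org.apache.gluten.GlutenPlugin",
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,
        "spark.shuffle.manager": (
            "org.apache.spark.shuffle.sort.ColumnarShuffleManager"
        ),
    }
    cmd = ["spark-submit", "--jars", jar]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)  # the PySpark script goes last
    return cmd


if __name__ == "__main__":
    # Hypothetical job script path; prints a shell-ready command line.
    print(shlex.join(build_spark_submit("s3://my-bucket/jobs/parquet_to_iceberg.py")))
```

Building the command as a list keeps each `--conf` pair explicit and makes it easy to diff against what the cluster actually receives. If the plugin class is not found at startup, one thing to try (an assumption, not something confirmed for EMR) is copying the JAR to a local path on each node and setting `spark.driver.extraClassPath` and `spark.executor.extraClassPath` to it.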
