meatheadmike commented on issue #12108:
URL: https://github.com/apache/hudi/issues/12108#issuecomment-2416526133

   This is what I thought initially too. So then I deployed a bone-stock Spark 
image and used the `--packages` flag:
   ```shell
export SPARK_CONF_DIR=/opt/spark/conf
export SPARK_VERSION=3.5 # or 3.4, 3.3, 3.2
export HUDI_VERSION=1.0.0-beta2
/opt/spark/bin/pyspark \
--packages \
org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:$HUDI_VERSION,\
com.amazonaws:aws-java-sdk-bundle:1.12.770,\
org.apache.hadoop:hadoop-common:3.3.4,\
org.apache.hadoop:hadoop-client:3.3.4,\
org.apache.hadoop:hadoop-aws:3.3.4 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
--conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
--conf spark.driver.extraJavaOptions='-Divy.cache.dir=/tmp -Divy.home=/tmp'
   ```
   
   This should have eliminated any potential jar conflicts, no? Mind you, this
was done exclusively on the reader side of things, as I assume that is where the
problem is.
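   One way to sanity-check that claim, independent of Spark: scan the image's jars directory for artifacts that appear in more than one version, since a duplicate-version jar on the classpath is the usual source of these conflicts. A minimal sketch (the directory path and the filename convention `artifact-version.jar` are assumptions, not from the issue):

   ```python
import re
from collections import defaultdict
from pathlib import Path

def find_conflicting_jars(jar_dir):
    """Group jar files by artifact name and report any artifact that
    appears with more than one version (a likely classpath conflict)."""
    # Assumes names like hadoop-aws-3.3.4.jar -> ('hadoop-aws', '3.3.4')
    pattern = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")
    versions = defaultdict(set)
    for jar in Path(jar_dir).glob("*.jar"):
        m = pattern.match(jar.name)
        if m:
            versions[m.group("artifact")].add(m.group("version"))
    # Only artifacts present in multiple versions are suspicious.
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}

# e.g. find_conflicting_jars("/opt/spark/jars")  # path is hypothetical
   ```

   Running this against both `/opt/spark/jars` and the Ivy cache that `--packages` populates would show whether the "bone-stock" image really is free of duplicates.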
   

