Gatsby-Lee commented on issue #10590:
URL: https://github.com/apache/hudi/issues/10590#issuecomment-2163667858

   > > > why is it required to set these? is it really required?
   > > > ```
   > > > > "--conf",
   > > > > 
"spark.driver.extraClassPath=/usr/lib/hudi/hudi-aws-bundle-0.14.1.jar:/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.14.1.jar",
   > > > > "--conf",
   > > > > 
"spark.executor.extraClassPath=/usr/lib/hudi/hudi-aws-bundle-0.14.1.jar:/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.14.1.jar",
   > > > ```
   > > 
   > > 
   > > You can try without it. Basically, `/usr/lib/hudi/hudi-aws-bundle-0.14.1.jar` includes minor extras such as `AwsGlueCatalogSyncTool`, which behaves better than the default on a huge number of partitions when changed via the config (https://hudi.apache.org/docs/configurations/#hoodiemetasyncclienttoolclass).
   > 
   > On the Spark config side: any dependencies outside the default Spark classpath require setting the extra class path in two places, so that both the driver and executor containers can see them. You can use glob patterns to make this more concise if there are no conflicts.
   > 
   > My comments on this thread are based on the EMR environment, so the long class path shown above is also required when using the Glue metastore.
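   
   A minimal sketch of the glob suggestion above (the `/usr/lib/hudi` path is the EMR image location mentioned in this thread and may differ in other environments; Java classpaths accept a `dir/*` wildcard for jars):
   
   ```sh
   spark-submit \
     --conf "spark.driver.extraClassPath=/usr/lib/hudi/*" \
     --conf "spark.executor.extraClassPath=/usr/lib/hudi/*" \
     ...
   ```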
   
   Thank you very much for your comment.
   
   I am on the Amazon EMR on EKS environment as well.
   I've been using only the `--jars` option to load the Hudi bundles from either Maven or Amazon EMR.
   ( The bundle JARs live either on S3 or inside the Amazon EMR image. )
   
   What I am curious about is whether `--conf` is required along with `--jars`.
   
   Thank you
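   
   For reference, a minimal sketch of the `--jars` form this question refers to (jar paths are the examples quoted earlier in the thread):
   
   ```sh
   spark-submit \
     --jars /usr/lib/hudi/hudi-aws-bundle-0.14.1.jar,/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.14.1.jar \
     ...
   ```
   
   The practical difference, as I understand it: `--jars` distributes the listed jars to the driver and executors and adds them to their classpaths, whereas `spark.driver.extraClassPath` / `spark.executor.extraClassPath` only prepend entries to the JVM classpath on machines where those files already exist.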

