Tyler-Rendina commented on issue #10590:
URL: https://github.com/apache/hudi/issues/10590#issuecomment-2162910002

   > > why is it required to set these? is it really required?
   > > ```
   > > "--conf",
   > > "spark.driver.extraClassPath=/usr/lib/hudi/hudi-aws-bundle-0.14.1.jar:/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.14.1.jar",
   > > "--conf",
   > > "spark.executor.extraClassPath=/usr/lib/hudi/hudi-aws-bundle-0.14.1.jar:/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.14.1.jar",
   > > ```
   > 
   > You can try without it. Basically, `/usr/lib/hudi/hudi-aws-bundle-0.14.1.jar` includes extras such as `AwsGlueCatalogSyncTool`, which behaves better than the default when set via [`hoodie.meta.sync.client.tool.class`](https://hudi.apache.org/docs/configurations/#hoodiemetasyncclienttoolclass) on tables with a huge number of partitions.
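   As a sketch of the quoted suggestion (jar path, table name, and S3 path are hypothetical; the option key and the `AwsGlueCatalogSyncTool` class name follow the Hudi config docs linked above, so verify them against your Hudi version):
   
   ```shell
   # Hedged sketch, not a verified EMR command: swaps the metadata sync tool
   # for the Glue-aware one via --hoodie-conf on HoodieDeltaStreamer.
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     /usr/lib/hudi/hudi-utilities-bundle_2.12-0.14.1.jar \
     --target-base-path s3://my-bucket/my_table \
     --target-table my_table \
     --enable-sync \
     --hoodie-conf hoodie.meta.sync.client.tool.class=org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
   ```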
   
   On the Spark config side: any dependencies outside the default Spark classpath must be added to the extra class path in two places, so that both the driver and the executor containers can see them. You can use glob patterns to make this more concise if there are no jar conflicts.
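   For example (a minimal sketch; the paths assume the EMR Hudi layout from the thread, and the job script name is hypothetical), the JVM classpath wildcard picks up every jar in the directory, and the same value must be set for both the driver and the executor:
   
   ```shell
   # Glob form of the extra class path from the quoted EMR step args.
   # "/usr/lib/hudi/*" is a JVM classpath wildcard matching all jars in that dir;
   # both driver and executor confs are needed so each container can load Hudi.
   spark-submit \
     --conf "spark.driver.extraClassPath=/usr/lib/hudi/*" \
     --conf "spark.executor.extraClassPath=/usr/lib/hudi/*" \
     my_hudi_job.py
   ```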
   
   My comments on this thread are based on the EMR environment, so the long class path shown above is also required if you are using the Glue metastore.
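   For completeness (a hedged sketch; the factory class name is the one AWS documents for EMR, and the script name is hypothetical), pointing Spark's Hive support at the Glue Data Catalog is typically done like this:
   
   ```shell
   # Route Hive metastore calls to the AWS Glue Data Catalog on EMR.
   # The factory class comes from the AWS EMR/Glue docs; verify for your release.
   spark-submit \
     --conf "spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory" \
     --conf "spark.driver.extraClassPath=/usr/lib/hudi/*" \
     --conf "spark.executor.extraClassPath=/usr/lib/hudi/*" \
     my_hudi_job.py
   ```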


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
