ROOBALJINDAL opened a new issue, #11469:
URL: https://github.com/apache/hudi/issues/11469
**Describe the problem you faced**
We are creating empty Hudi tables from Java as follows:
```
Dataset<Row> emptyDF = spark.createDataFrame(new ArrayList<Row>(), schemaStruct);
emptyDF.write()
    .format("org.apache.hudi")
    .options(tableConf.getHudiOptions())
    .mode(SaveMode.Append)
    .save();
```
Spark conf:
```
entryPoint: /hudi/hudi-addon-edfx.jar
sparkParamsArguments = ["--class com.edifecs.em.cloud.hudi.setup.PreCreateEmptyTablesInHudi",
"--conf spark.jars=/usr/lib/hudi/hudi-utilities-bundle.jar",
"--conf spark.executor.instances=0",
"--conf spark.executor.memory=4g",
"--conf spark.driver.memory=4g",
"--conf spark.driver.cores=4",
"--conf spark.dynamicAllocation.initialExecutors=1"
```
This used to work fine but stopped working after Hudi was upgraded from
0.13.1 to 0.14.0 (EMR upgraded from 6.12 to 6.15).
I referred to a similar issue: [#2997](https://github.com/apache/hudi/issues/2997).
I also added hudi-spark3-bundle_2.12-0.14.0.jar to spark.jars, but that
did not help. I don't know why Spark cannot find the Hudi data source class.
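
To narrow this down, a small check like the one below could be run inside the driver to see whether `org.apache.hudi.DefaultSource` (the class named in the `ClassNotFoundException` in the stack trace) is visible, and from which jar. This is only a diagnostic sketch, not a fix: the class name `DataSourceClasspathCheck` and the method `locate` are hypothetical helpers, and it assumes that checking the calling thread's context class loader is a reasonable stand-in for what Spark consults during data source lookup.

```java
import java.security.CodeSource;
import java.util.Optional;

public class DataSourceClasspathCheck {

    /**
     * Returns the location a class was loaded from, or Optional.empty()
     * if the current thread's class loader cannot find it.
     */
    static Optional<String> locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the application class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // initialize=false: resolve the class without running static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself have no code source location.
            boolean hasLocation = source != null && source.getLocation() != null;
            return Optional.of(hasLocation ? source.getLocation().toString() : "(JDK runtime)");
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Defaults to the class reported missing in the stack trace.
        String name = args.length > 0 ? args[0] : "org.apache.hudi.DefaultSource";
        System.out.println(name + " -> " + locate(name).orElse("NOT FOUND on this classpath"));
    }
}
```

Because the stack trace shows the write running on a `ForkJoinPool` worker thread and the lookup going through `AppClassLoader`, it may be worth calling `locate(...)` both from the main thread and from inside the same lambda that performs the write, and comparing the two results.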
**Environment Description**
* Hudi version : 0.14.0
* AWS EMR version : 6.15
**Stacktrace**
```
24/06/18 12:02:18 ERROR PreCreateEmptyTablesInHudi: Exception encountered while generating table ehcpencountererror :
org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: org.apache.hudi. Please find packages at `https://spark.apache.org/third-party-projects.html`.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:739) ~[spark-catalyst_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:647) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:697) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:860) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at com.edifecs.em.cloud.hudi.setup.PreCreateEmptyTablesInHudi.lambda$main$0(PreCreateEmptyTablesInHudi.java:170) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) ~[?:?]
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754) ~[?:?]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) ~[?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) ~[?:?]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) ~[?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) ~[?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) ~[?:?]
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.DefaultSource
    at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) ~[?:?]
    at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) ~[?:?]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[?:?]
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:633) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at scala.util.Try$.apply(Try.scala:213) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:633) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    at scala.util.Failure.orElse(Try.scala:224) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:633) ~[spark-sql_2.12-3.4.1-amzn-2.jar:3.4.1-amzn-2]
    ... 15 more
```