chandu-1101 commented on issue #9141:
URL: https://github.com/apache/hudi/issues/9141#issuecomment-1639855549
Hi @ad1happy2go, I get the same exception again. Below are the 4 variants I tried.

spark-shell command: with `--jars` taking my custom jar and `--packages` pulling the Hudi bundle
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1 \
  --jars /home/hadoop/jars2/spark-1.0-SNAPSHOT.jar
```
spark-shell command: with `--jars` taking both my custom jar and the Hudi bundle jar
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --jars /home/hadoop/jars2/spark-1.0-SNAPSHOT.jar,/home/hadoop/hudi/hudi-release-0.12.3/packaging/hudi-spark-bundle/target/hudi-spark3.3-bundle_2.12-0.12.3.jar
```
spark-shell command: with `--jars` taking only the Hudi bundle jar (no custom jar)
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --jars /home/hadoop/hudi/hudi-release-0.12.3/packaging/hudi-spark-bundle/target/hudi-spark3.3-bundle_2.12-0.12.3.jar
```
spark-shell command: with only the `--packages` switch (no custom jar, no Hudi jar)
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1
```
Warnings I get when I run the code (the exact same code as above):
```
07-18 08:58:36 ${sys:config.appname} WARN HoodieSparkSqlWriter$: hoodie table at s3://bucket/snapshots-hudi/ge11-drop/snapshot already exists. Deleting existing data & overwriting with new data.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.int96RebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.int96RebaseModeInWrite' instead.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.int96RebaseModeInRead' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.int96RebaseModeInRead' instead.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInRead' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInRead' instead.
07-18 08:58:39 ${sys:config.appname} WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
07-18 08:58:39 ${sys:config.appname} WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
07-18 08:58:41 ${sys:config.appname} WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
07-18 08:58:41 ${sys:config.appname} WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
07-18 08:58:43 ${sys:config.appname} WARN HoodieBackedTableMetadata: Metadata table was not found at path s3://bucket/snapshots-hudi/ge11-drop/snapshot/.hoodie/metadata
```
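As a side note, the SQLConf warnings above say the `spark.sql.legacy.parquet.*` rebase configs are deprecated since Spark 3.2. Going by the warning text alone (the replacement names below are taken straight from those warnings, not verified against this setup), the same flags could be passed as:

```shell
# Non-deprecated equivalents named in the SQLConf deprecation warnings (Spark >= 3.2)
--conf spark.sql.parquet.int96RebaseModeInRead=CORRECTED \
--conf spark.sql.parquet.int96RebaseModeInWrite=CORRECTED \
--conf spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED \
--conf spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED
```

This only silences the deprecation warnings; it should not change the rebase behavior itself.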
The exception is the same again. Please note that we have columns with the same name but different case, hence the `spark.sql.caseSensitive=true` flag in the spark-shell command.
```
07-18 08:58:43 ${sys:config.appname} WARN HoodieBackedTableMetadata: Metadata table was not found at path s3://p-crm-messaging-v2/snapshots-hudi/ge11-drop/snapshot/.hoodie/metadata
07-18 08:59:03 ${sys:config.appname} ERROR HoodieSparkSqlWriter$: UPSERT failed with errors
org.apache.hudi.exception.HoodieException: Write to Hudi failed
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:153)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
  at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
  at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
  ... 49 elided
scala>
```
Kindly let me know if there is anything more I can do to get this working.