chandu-1101 commented on issue #9141:
URL: https://github.com/apache/hudi/issues/9141#issuecomment-1639855549
Hi @ad1happy2go, I get the same exception again. Below are the 4 variants I tried.

spark-shell command: with `--jars` taking my custom jar and `--packages` pulling the Hudi bundle
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1 \
  --jars /home/hadoop/jars2/spark-1.0-SNAPSHOT.jar
```
spark-shell command: with `--jars` taking both my custom jar and the Hudi bundle jar
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --jars /home/hadoop/jars2/spark-1.0-SNAPSHOT.jar,/home/hadoop/hudi/hudi-release-0.12.3/packaging/hudi-spark-bundle/target/hudi-spark3.3-bundle_2.12-0.12.3.jar
```
spark-shell command: with `--jars` taking only the Hudi bundle jar (no custom jar)
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --jars /home/hadoop/hudi/hudi-release-0.12.3/packaging/hudi-spark-bundle/target/hudi-spark3.3-bundle_2.12-0.12.3.jar
```
spark-shell command: with only the `--packages` switch (no custom jar, no Hudi jar)
```
spark-shell --driver-memory 1g --executor-memory 4g --executor-cores 1 \
  --driver-cores 1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.caseSensitive=true" \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --name ravic \
  --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1
```
Warnings I get when I run the code (the exact same code as above):
```
07-18 08:58:36 ${sys:config.appname} WARN HoodieSparkSqlWriter$: hoodie table at s3://bucket/snapshots-hudi/ge11-drop/snapshot already exists. Deleting existing data & overwriting with new data.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.int96RebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.int96RebaseModeInWrite' instead.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.int96RebaseModeInRead' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.int96RebaseModeInRead' instead.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.
07-18 08:58:38 ${sys:config.appname} WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInRead' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInRead' instead.
07-18 08:58:39 ${sys:config.appname} WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
07-18 08:58:39 ${sys:config.appname} WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
07-18 08:58:41 ${sys:config.appname} WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
07-18 08:58:41 ${sys:config.appname} WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
07-18 08:58:43 ${sys:config.appname} WARN HoodieBackedTableMetadata: Metadata table was not found at path s3://bucket/snapshots-hudi/ge11-drop/snapshot/.hoodie/metadata
```
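As a side note, the SQLConf warnings above say the `spark.sql.legacy.parquet.*` rebase configs are deprecated since Spark 3.2. Going by the warning text alone (the replacement names below are taken straight from those warnings, not verified against this setup), the same flags could be passed as:

```shell
# Non-deprecated equivalents named in the SQLConf deprecation warnings (Spark >= 3.2)
--conf spark.sql.parquet.int96RebaseModeInRead=CORRECTED \
--conf spark.sql.parquet.int96RebaseModeInWrite=CORRECTED \
--conf spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED \
--conf spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED
```

This only silences the deprecation warnings; it should not change the rebase behavior itself.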
The exception is the same again. Please note that we have columns with the same name but different case, hence the `spark.sql.caseSensitive=true` flag in the spark-shell command.
```
07-18 08:58:43 ${sys:config.appname} WARN HoodieBackedTableMetadata: Metadata table was not found at path s3://p-crm-messaging-v2/snapshots-hudi/ge11-drop/snapshot/.hoodie/metadata
07-18 08:59:03 ${sys:config.appname} ERROR HoodieSparkSqlWriter$: UPSERT failed with errors
org.apache.hudi.exception.HoodieException: Write to Hudi failed
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:153)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
  at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
  at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
  ... 49 elided
scala>
```
Kindly let me know if there is anything more I can do to get this working.