aharbunou-branch opened a new issue #3894:
URL: https://github.com/apache/hudi/issues/3894
**Describe the problem you faced**
I'm migrating Hudi from 0.8.0 to 0.9.0.
I'm testing it with a simple workflow that reads data from S3 and writes it to
a Hudi table via Spark. This workflow runs fine on 0.8.0 in production.
When I try 0.9.0 and create a brand-new table, everything works fine, but when
I try to add data to an existing Hudi table I see the following errors: `Error
syncing to metadata table.` and `Property
hoodie.datasource.write.recordkey.field not found` (see stacktrace).
The property mentioned in the error is already set as an option when I write
the data.
Could you help me identify where else I need to set it?
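For reference, the write is configured roughly like this (a sketch only: the table name, field names, and path below are placeholders, not the real production values):

```python
# Sketch of the Hudi write options used by the workflow. All values here
# ("my_table", "id", "dt", "ts", the S3 path) are placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",
    # The property from the error message, passed as a write option:
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "dt",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "upsert",
}

# The actual write (requires a running Spark session with the Hudi bundle):
# df.write.format("hudi").options(**hudi_options) \
#     .mode("append").save("s3://bucket/path/to/table")
```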
**To Reproduce**
Steps to reproduce the behavior:
1. Create a Hudi Table using 0.8.0
2. Add data to the table using 0.9.0
**Expected behavior**
Data should be added and the table should be migrated to version 2.
**Environment Description**
* Hudi version : 0.8.0 -> 0.9.0
* Spark version : 2.4.4
* Hive version : 1.2.1
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : yes
**Stacktrace**
```
21/10/30 00:00:14 INFO o.a.h.t.u.AbstractUpgradeDowngrade: Attempting to move table from version ONE to TWO
Exception in thread "main" org.apache.hudi.exception.HoodieMetadataException: Error syncing to metadata table.
    at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:459)
    at org.apache.hudi.client.AbstractHoodieWriteClient.preWrite(AbstractHoodieWriteClient.java:407)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
    at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:265)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    ...
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.recordkey.field not found
    at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:48)
    at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:58)
    at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:39)
    at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGeneratorByType(HoodieSparkKeyGeneratorFactory.java:78)
    at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:57)
    at org.apache.hudi.HoodieSparkUtils$.getPartitionColumns(HoodieSparkUtils.scala:241)
    at org.apache.hudi.HoodieSparkUtils.getPartitionColumns(HoodieSparkUtils.scala)
    at org.apache.hudi.table.upgrade.OneToTwoUpgradeHandler.getPartitionColumns(OneToTwoUpgradeHandler.java:31)
    at org.apache.hudi.table.upgrade.BaseOneToTwoUpgradeHandler.upgrade(BaseOneToTwoUpgradeHandler.java:35)
    at org.apache.hudi.table.upgrade.SparkUpgradeDowngrade.upgrade(SparkUpgradeDowngrade.java:55)
    at org.apache.hudi.table.upgrade.AbstractUpgradeDowngrade.run(AbstractUpgradeDowngrade.java:123)
    at org.apache.hudi.table.upgrade.SparkUpgradeDowngrade.run(SparkUpgradeDowngrade.java:44)
    at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:409)
    at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:167)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:106)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.syncFromInstants(HoodieBackedTableMetadataWriter.java:425)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:121)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:62)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:58)
    at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:456)
    ... 41 more
```
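Judging from the `Caused by` frames, the version 1-to-2 upgrade path (`OneToTwoUpgradeHandler` → `SimpleKeyGenerator`) appears to read `hoodie.datasource.write.recordkey.field` from the table's own `TypedProperties` rather than from the Spark write options, so one thing worth checking is whether the property is present in the table's `.hoodie/hoodie.properties` file. A hypothetical check on a local copy of that file; the sample contents below are invented to mimic what a 0.8.0-created table might hold:

```python
# Sample hoodie.properties contents (invented for illustration; a real
# 0.8.0-created table's file should be copied down from S3 and used instead).
sample = """\
hoodie.table.name=my_table
hoodie.table.version=1
hoodie.table.type=COPY_ON_WRITE
"""

# Parse the Java-properties-style key=value lines, skipping comments.
props = dict(
    line.split("=", 1)
    for line in sample.splitlines()
    if line and not line.startswith("#")
)

missing = "hoodie.datasource.write.recordkey.field" not in props
print("recordkey property missing from hoodie.properties:", missing)
# → recordkey property missing from hoodie.properties: True
```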
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]