aharbunou-branch opened a new issue #3894:
URL: https://github.com/apache/hudi/issues/3894
**Describe the problem you faced**
I'm migrating Hudi from 0.8.0 to 0.9.0.
I'm testing it with a simple workflow that reads data from S3 and writes it to
a Hudi table via Spark. This workflow runs fine on 0.8.0 in production.
When I try 0.9.0 and create a brand-new table, everything works fine, but when
I try to add data to an existing Hudi table I see the following errors: `Error
syncing to metadata table.` and `Property
hoodie.datasource.write.recordkey.field not found` (see stacktrace).
The property mentioned in the error is already set as an option when I write
the data.
Could you help me identify where else I need to set it?
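For reference, the write is configured roughly like this (a sketch only: the table name, field names, and path below are placeholders, not the real production values):

```python
# Sketch of the Hudi write options used by the workflow. All values here
# ("my_table", "id", "dt", "ts", the S3 path) are placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",
    # The property from the error message, passed as a write option:
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "dt",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "upsert",
}

# The actual write (requires a running Spark session with the Hudi bundle):
# df.write.format("hudi").options(**hudi_options) \
#     .mode("append").save("s3://bucket/path/to/table")
```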
**To Reproduce**
Steps to reproduce the behavior:
1. Create a Hudi Table using 0.8.0
2. Add data to the table using 0.9.0
**Expected behavior**
Data should be added and the table should be migrated to version 2.
**Environment Description**
* Hudi version : 0.8.0 -> 0.9.0
* Spark version : 2.4.4
* Hive version : 1.2.1
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : yes
**Stacktrace**
```
21/10/30 00:00:14 INFO o.a.h.t.u.AbstractUpgradeDowngrade: Attempting to move table from version ONE to TWO
Exception in thread "main" org.apache.hudi.exception.HoodieMetadataException: Error syncing to metadata table.
    at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:459)
    at org.apache.hudi.client.AbstractHoodieWriteClient.preWrite(AbstractHoodieWriteClient.java:407)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
    at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:265)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    ...
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.recordkey.field not found
    at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:48)
    at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:58)
    at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:39)
    at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGeneratorByType(HoodieSparkKeyGeneratorFactory.java:78)
    at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:57)
    at org.apache.hudi.HoodieSparkUtils$.getPartitionColumns(HoodieSparkUtils.scala:241)
    at org.apache.hudi.HoodieSparkUtils.getPartitionColumns(HoodieSparkUtils.scala)
    at org.apache.hudi.table.upgrade.OneToTwoUpgradeHandler.getPartitionColumns(OneToTwoUpgradeHandler.java:31)
    at org.apache.hudi.table.upgrade.BaseOneToTwoUpgradeHandler.upgrade(BaseOneToTwoUpgradeHandler.java:35)
    at org.apache.hudi.table.upgrade.SparkUpgradeDowngrade.upgrade(SparkUpgradeDowngrade.java:55)
    at org.apache.hudi.table.upgrade.AbstractUpgradeDowngrade.run(AbstractUpgradeDowngrade.java:123)
    at org.apache.hudi.table.upgrade.SparkUpgradeDowngrade.run(SparkUpgradeDowngrade.java:44)
    at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:409)
    at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:167)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:106)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.syncFromInstants(HoodieBackedTableMetadataWriter.java:425)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:121)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:62)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:58)
    at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:456)
    ... 41 more
```
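Judging from the `Caused by` frames, the version 1-to-2 upgrade path (`OneToTwoUpgradeHandler` → `SimpleKeyGenerator`) appears to read `hoodie.datasource.write.recordkey.field` from the table's own `TypedProperties` rather than from the Spark write options, so one thing worth checking is whether the property is present in the table's `.hoodie/hoodie.properties` file. A hypothetical check on a local copy of that file; the sample contents below are invented to mimic what a 0.8.0-created table might hold:

```python
# Sample hoodie.properties contents (invented for illustration; a real
# 0.8.0-created table's file should be copied down from S3 and used instead).
sample = """\
hoodie.table.name=my_table
hoodie.table.version=1
hoodie.table.type=COPY_ON_WRITE
"""

# Parse the Java-properties-style key=value lines, skipping comments.
props = dict(
    line.split("=", 1)
    for line in sample.splitlines()
    if line and not line.startswith("#")
)

missing = "hoodie.datasource.write.recordkey.field" not in props
print("recordkey property missing from hoodie.properties:", missing)
# → recordkey property missing from hoodie.properties: True
```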
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]