wangjunjie-lnnf opened a new issue, #11738:
URL: https://github.com/apache/hudi/issues/11738

   We have many Hudi tables at version 0.6.0 and want to upgrade them to 
0.14.1 or 0.15.0, so we ran some tests. When we write to a 0.6.0 table 
with a 0.15.0 client, an error occurs.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a table with Hudi 0.6.0.
   2. Write to the table with Hudi 0.15.0.
   
   **Expected behavior**
   
   The write succeeds and the table is upgraded.
   
   **Environment Description**
   
   * Hudi version : 0.6.0 & 0.15.0
   
   * Spark version : 
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : 
   
   
   
   **Stacktrace**
   
   ```scala
   Exception in thread "main" org.apache.hudi.exception.HoodieException: Config conflict(key  current value  existing value):
   RecordKey:  id  null
        at org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:229)
        at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:232)
        at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
   ```
   
   The root cause is that tables created with 0.6.0 do not record the field info in hoodie.properties:
   ```
   hoodie.table.precombine.field=x
   hoodie.table.partition.fields=y
   hoodie.table.recordkey.fields=x
   ```
   
   But when writing with the 0.15.0 client, the writer validates these fields 
against the write options, and the error occurs in that validation. The 
validation should be skipped when the current table version is too old to 
have the needed info in hoodie.properties:
   
   ```scala
   object HoodieSparkSqlWriter {
   
     private def writeInternal(sqlContext: SQLContext, 
                               mode: SaveMode,
                               optParams: Map[String, String],
                               ...) {
       
       var tableConfig = getHoodieTableConfig(sparkContext, path, mode, ...)
       
       // Validate that the write options are consistent with hoodie.properties.
       // hoodie.properties written by older versions does not record the field
       // info, so the comparison sees null and reports a config conflict.
       validateTableConfig(sqlContext.sparkSession, optParams, tableConfig, 
mode == SaveMode.Overwrite)
     }
   }
   ```
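   A minimal sketch of the skip being suggested here (all names are illustrative, not Hudi's actual API; written in Java rather than Scala for brevity): when the stored table config never recorded a key, treat it as unknown and skip the comparison instead of flagging a conflict against `null`:

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;

   public class TableConfigValidator {
       // Illustrative only: compare write options against the stored table
       // config, but skip keys that a pre-0.11 hoodie.properties never
       // recorded, rather than reporting "value vs null" as a conflict.
       static List<String> findConflicts(Map<String, String> opts,
                                         Map<String, String> tableConfig) {
           List<String> conflicts = new ArrayList<>();
           for (Map.Entry<String, String> e : opts.entrySet()) {
               String existing = tableConfig.get(e.getKey());
               if (existing == null) {
                   continue; // key absent in old tables: unknown, not a conflict
               }
               if (!existing.equals(e.getValue())) {
                   conflicts.add(e.getKey() + ": " + e.getValue() + " vs " + existing);
               }
           }
           return conflicts;
       }
   }
   ```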
   
   When we manually fill in the field info in the 0.6.0 table's 
hoodie.properties, the upgrade succeeds.
   
   <b>By the way: is it safe to upgrade directly from 0.6.0 to 0.15.0 or 0.14.1?</b>
   
   
   
   
   

