[jira] [Created] (HUDI-7017) Prevent full schema evolution from wrongly falling back to OOB

voon (Jira) Tue, 31 Oct 2023 19:41:05 -0700

voon created HUDI-7017:
--------------------------

             Summary: Prevent full schema evolution from wrongly falling back to OOB
                 Key: HUDI-7017
                 URL: https://issues.apache.org/jira/browse/HUDI-7017
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: voon



For MOR tables that have both of the following configurations enabled:

 
{code:java}
hoodie.schema.on.read.enable=true
hoodie.datasource.read.extract.partition.values.from.path=true{code}
 

 

BaseFileReader uses a *requiredSchemaReader* when reading some of the 
parquet files. This reader has an empty *internalSchemaStr*, which causes 
*Spark3XLegacyHoodieParquetInputFormat* to fall back to OOB schema evolution 
instead of using Hudi Full Schema Evolution.
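
The fallback described above can be pictured with a small, self-contained 
Scala sketch. It is not Hudi's actual code: the object FallbackSketch, the 
EvolutionMode types and chooseMode are hypothetical names introduced only for 
illustration. The only detail taken from this report is that an empty 
internalSchemaStr leads to OOB schema evolution.

```scala
// Hypothetical model of the fallback decision; none of these names exist in Hudi.
object FallbackSketch {
  sealed trait EvolutionMode
  case object FullSchemaEvolution extends EvolutionMode
  case object OobSchemaEvolution extends EvolutionMode

  // Assumption: the reader picks the mode only from the serialized internal schema.
  // A null or empty string is treated as "no internal schema available".
  def chooseMode(internalSchemaStr: String): EvolutionMode =
    Option(internalSchemaStr).filter(_.nonEmpty) match {
      case Some(_) => FullSchemaEvolution
      case None    => OobSchemaEvolution
    }
}
```

In this model, a reader built without the internal schema, i.e. 
FallbackSketch.chooseMode(""), ends up on the OOB path even when 
hoodie.schema.on.read.enable is true, which is the behavior this issue 
proposes to prevent.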

 

Although safeguards added in HUDI-5400 already force the code execution path 
to use Hudi Full Schema Evolution, we should still fix this so that future 
changes that may deprecate the use of 
*Spark3XLegacyHoodieParquetInputFormat* will not cause issues.

 

A sample test that reproduces this:
{code:java}
test("Test wrong fallback to OOB schema evolution") {
  withRecordType()(withTempDir { tmp =>
    Seq("mor").foreach { tableType =>
      val tableName = generateTableName
      val tablePath = s"${new Path(tmp.getCanonicalPath, tableName).toUri.toString}"
      if (HoodieSparkUtils.gteqSpark3_1) {
        spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert")
        spark.sql("set hoodie.schema.on.read.enable=true")
        spark.sql("set hoodie.datasource.read.extract.partition.values.from.path=true")
        // NOTE: This is required since this test uses type coercions which were
        //       only permitted in Spark 2.x and are disallowed now by default in Spark 3.x
        spark.sql("set spark.sql.storeAssignmentPolicy=legacy")
        createAndPreparePartitionTable(spark, tableName, tablePath, tableType)
        // date -> string -> date
        spark.sql(s"alter table $tableName alter column col6 type String")
        checkAnswer(spark.sql(s"select col6 from $tableName where id = 1").collect())(
          Seq("2021-12-25")
        )
      }
    }
  })
} {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
