voon created HUDI-7017:
--------------------------
Summary: Prevent full schema evolution from wrongly falling back to OOB
Key: HUDI-7017
URL: https://issues.apache.org/jira/browse/HUDI-7017
Project: Apache Hudi
Issue Type: Bug
Reporter: voon
For MOR tables that have these 2 configurations enabled:
{code:java}
hoodie.schema.on.read.enable=true
hoodie.datasource.read.extract.partition.values.from.path=true{code}
BaseFileReader will use a *requiredSchemaReader* when reading some of the
parquet files. This BaseFileReader will have an empty *internalSchemaStr*,
causing *Spark3XLegacyHoodieParquetInputFormat* to fall back to OOB
(out-of-the-box) schema evolution instead of Hudi Full Schema Evolution.
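To make the failure mode concrete, here is a minimal, self-contained sketch of the decision described above. It is a simplified model, not Hudi's actual code: the names ReaderConf, chooseMode, FullSchemaEvolution and OutOfBoxEvolution are hypothetical, and the only behaviour assumed is the one stated in this ticket (an empty internalSchemaStr leads to the OOB path even when schema on read is enabled).

```scala
// Hypothetical, simplified model of the reader-side decision; NOT Hudi's actual code.
object SchemaEvolutionFallbackSketch {
  sealed trait EvolutionMode
  case object FullSchemaEvolution extends EvolutionMode
  case object OutOfBoxEvolution extends EvolutionMode

  // Illustrative stand-in for the state a base file reader is created with.
  final case class ReaderConf(schemaOnReadEnabled: Boolean, internalSchemaStr: String)

  // Assumption taken from the ticket: full schema evolution is only chosen
  // when an internal schema string is actually present.
  def chooseMode(conf: ReaderConf): EvolutionMode =
    if (conf.schemaOnReadEnabled && conf.internalSchemaStr.nonEmpty) FullSchemaEvolution
    else OutOfBoxEvolution

  def main(args: Array[String]): Unit = {
    // Placeholder string standing in for a serialized internal schema.
    val fullReader = ReaderConf(schemaOnReadEnabled = true, internalSchemaStr = "<serialized schema>")
    // Models the requiredSchemaReader case: schema on read is on, but the string is empty.
    val requiredSchemaReader = ReaderConf(schemaOnReadEnabled = true, internalSchemaStr = "")

    println(chooseMode(fullReader))
    println(chooseMode(requiredSchemaReader))
  }
}
```

In this model the second reader silently takes the OOB branch although hoodie.schema.on.read.enable is true, which is the unintended fallback this ticket is about.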
Although HUDI-5400 added safeguards that force the code execution path to use
Hudi Full Schema Evolution, we should still fix this so that future changes
that may deprecate the use of *Spark3XLegacyHoodieParquetInputFormat* will not
cause issues.
A sample test to reproduce this:
{code:java}
test("Test wrong fallback to OOB schema evolution") {
  withRecordType()(withTempDir { tmp =>
    Seq("mor").foreach { tableType =>
      val tableName = generateTableName
      val tablePath = s"${new Path(tmp.getCanonicalPath, tableName).toUri.toString}"
      if (HoodieSparkUtils.gteqSpark3_1) {
        spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert")
        // The two configurations that trigger the wrong fallback
        spark.sql("set hoodie.schema.on.read.enable=true")
        spark.sql("set hoodie.datasource.read.extract.partition.values.from.path=true")
        // NOTE: This is required since this test uses type coercions which were
        // only permitted in Spark 2.x and are disallowed by default in Spark 3.x
        spark.sql("set spark.sql.storeAssignmentPolicy=legacy")
        createAndPreparePartitionTable(spark, tableName, tablePath, tableType)
        // date -> string -> date
        spark.sql(s"alter table $tableName alter column col6 type String")
        checkAnswer(spark.sql(s"select col6 from $tableName where id = 1").collect())(
          Seq("2021-12-25")
        )
      }
    }
  })
} {code}