[
https://issues.apache.org/jira/browse/HUDI-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-6877:
---------------------------------
Labels: pull-request-available (was: )
> Fix unqualified namespace issues in Spark3.1
> --------------------------------------------
>
> Key: HUDI-6877
> URL: https://issues.apache.org/jira/browse/HUDI-6877
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: voon
> Assignee: voon
> Priority: Major
> Labels: pull-request-available
>
> Spark 3.1 uses Avro 1.8.2, where schema resolution for any type that is
> allowed to define a namespace is strictly matched, i.e. fields are
> resolved using their fully qualified name.
>
> This means that the namespaces of the reader and writer schemas must match.
> However, when an ALTER TABLE ... RENAME TO DDL is performed, the tableName in
> _hoodie.properties_ is changed. The Avro schema generated from the
> requiredSchema struct, whose namespace is derived from the table name,
> therefore differs between the reader and writer schemas (although the field
> names and types are the same).
>
> This leads to read errors if log files already exist when the
> ALTER TABLE ... RENAME TO DDL is performed.
>
> {code:java}
> test("Test rename table") {
>   withTempDir { tmp =>
>     // Create table with INMEMORY index to generate log only mor table.
>     val tableName = generateTableName
>     spark.sql(
>       s"""
>          |create table $tableName (
>          |  id int,
>          |  name string,
>          |  price decimal(20,0),
>          |  ts long
>          |) using hudi
>          | location '${tmp.getCanonicalPath}'
>          | tblproperties (
>          |  primaryKey ='id',
>          |  type = 'mor',
>          |  preCombineField = 'ts',
>          |  hoodie.index.type = 'INMEMORY',
>          |  hoodie.compact.inline = 'true'
>          | )
>        """.stripMargin)
>     spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
>     spark.sql(s"ALTER TABLE $tableName rename to h0NewTableName")
>     spark.sql(s"insert into h0NewTableName values(2, 'a1', 10, 1001),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
>     spark.sql(s"select id, name, price, ts from h0NewTableName order by id").show(false)
>   }
> } {code}
>
> Spark 3.2 does not have this issue because it uses Avro 1.10.2, whose schema
> resolution resolves fields using their unqualified name.
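
The difference between the two resolution behaviours described above can be sketched roughly as follows. This is only an illustrative model, not Avro or Hudi code: `NamedType`, `namespaceFor`, `strictMatch` and `relaxedMatch` are hypothetical names, the type name `fixed` is a placeholder, and the `hoodie.<tableName>` namespace layout is an assumption made for the example. It compares names only and does not perform real Avro schema resolution.

```scala
// Hypothetical model of an Avro named type: a simple name plus a namespace.
final case class NamedType(name: String, namespace: String) {
  // Fully qualified name, e.g. "some.namespace.name".
  def fullName: String = if (namespace.isEmpty) name else s"$namespace.$name"
}

object NamespaceMatchSketch {
  // ASSUMPTION: the namespace is derived from the table name in this form.
  def namespaceFor(tableName: String): String = s"hoodie.$tableName"

  // Strict matching (the behaviour the issue attributes to Avro 1.8.2):
  // the fully qualified names must be identical.
  def strictMatch(writer: NamedType, reader: NamedType): Boolean =
    writer.fullName == reader.fullName

  // Relaxed matching (the behaviour the issue attributes to Avro 1.10.2):
  // only the unqualified names must be identical.
  def relaxedMatch(writer: NamedType, reader: NamedType): Boolean =
    writer.name == reader.name

  def main(args: Array[String]): Unit = {
    // Writer schema was generated before the rename, reader schema after it.
    val writer = NamedType("fixed", namespaceFor("h0"))
    val reader = NamedType("fixed", namespaceFor("h0NewTableName"))

    println(s"writer: ${writer.fullName}")
    println(s"reader: ${reader.fullName}")
    println(s"strict match:  ${strictMatch(writer, reader)}")
    println(s"relaxed match: ${relaxedMatch(writer, reader)}")
  }
}
```

Under these assumptions the strict comparison fails after the rename, because only the namespace changed, while the unqualified comparison still succeeds.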