[jira] [Updated] (HUDI-6877) Fix unqualified namespace issues in Spark3.1

ASF GitHub Bot (Jira) Wed, 20 Sep 2023 19:11:04 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HUDI-6877:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix unqualified namespace issues in Spark3.1
> --------------------------------------------
>
>                 Key: HUDI-6877
>                 URL: https://issues.apache.org/jira/browse/HUDI-6877
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>
> Spark3.1 uses Avro 1.8.2, where schema resolution for any type that is 
> allowed to declare a namespace is strictly matched, i.e. such types are 
> resolved using their fully qualified name.
>  
> This means that the namespaces of the reader and writer schemas must match. 
> However, when an ALTER TABLE RENAME TO DDL statement is performed, the 
> tableName in _hoodie.properties_ is changed. The Avro schema generated from 
> the requiredSchema struct is hence different for the reader and the writer 
> (although the field names and types are the same).
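>
> The strict matching can be sketched with plain Avro. This is a minimal 
> illustration, not Hudi's actual schema generation: the helper 
> recordSchemaFor, the record name "table_record" and the 
> "hoodie.<tableName>" namespace pattern are assumptions made for the 
> example. It only needs Avro on the classpath.
>
> {code:scala}
> import org.apache.avro.{Schema, SchemaBuilder}
>
> object NamespaceMismatchSketch {
>   // Hypothetical helper: the namespace is derived from the table name,
>   // while the record name and the fields stay the same.
>   def recordSchemaFor(tableName: String): Schema =
>     SchemaBuilder
>       .record("table_record")
>       .namespace(s"hoodie.$tableName")
>       .fields()
>       .requiredInt("id")
>       .requiredString("name")
>       .endRecord()
>
>   def main(args: Array[String]): Unit = {
>     val writer = recordSchemaFor("h0")             // before the rename
>     val reader = recordSchemaFor("h0NewTableName") // after the rename
>
>     // Comparison by fully qualified name: the namespaces differ.
>     println(writer.getFullName == reader.getFullName) // false
>     // Comparison by unqualified name: both are "table_record".
>     println(writer.getName == reader.getName)         // true
>   }
> }
> {code}
>
> Both schemas declare the same fields, yet a resolver that compares 
> getFullName treats them as different types, while one that compares 
> getName does not.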
>  
> This leads to read errors when the table already contains log files at the 
> time the ALTER TABLE RENAME TO DDL statement is performed.
>  
> {code:java}
> test("Test rename table") {
>   withTempDir { tmp =>
>     // Create the table with the INMEMORY index to generate a log-only MOR table.
>     val tableName = generateTableName
>     spark.sql(
>       s"""
>          |create table $tableName (
>          |  id int,
>          |  name string,
>          |  price decimal(20,0),
>          |  ts long
>          |) using hudi
>          | location '${tmp.getCanonicalPath}'
>          | tblproperties (
>          |  primaryKey ='id',
>          |  type = 'mor',
>          |  preCombineField = 'ts',
>          |  hoodie.index.type = 'INMEMORY',
>          |  hoodie.compact.inline = 'true'
>          | )
>      """.stripMargin)
>     spark.sql(s"insert into $tableName values" +
>       "(1, 'a1', 10, 1000),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
>     spark.sql(s"ALTER TABLE $tableName rename to h0NewTableName")
>     spark.sql(s"insert into h0NewTableName values" +
>       "(2, 'a1', 10, 1001),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
>     spark.sql(s"select id, name, price, ts from h0NewTableName order by id")
>       .show(false)
>   }
> } {code}
>  
> Spark3.2 does not have this issue, as it uses Avro 1.10.2, where schema 
> resolution matches these types using their unqualified name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
