[jira] [Updated] (HUDI-5977) Fix Date to String casts when non-vectorized reader is used

voon (Jira) Thu, 23 Mar 2023 23:09:11 -0700


 [ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


voon updated HUDI-5977:
-----------------------
    Description: 
When a column is converted from Date to String and reads go through the
non-vectorized reader, the table becomes unreadable.
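For reference, here is a minimal sketch of the conversion the reader has to perform. It assumes the expected result matches Spark's own `CAST(date AS STRING)`, which is ISO-8601 `yyyy-MM-dd`. Parquet's `DATE` logical type stores an `int32` count of days since 1970-01-01. Values written before the type change therefore have to be decoded from epoch days and rendered as strings. Rows written before the column existed (ids 1 and 2 in the test) have no value and must stay null. `DateToStringSketch` and its methods are hypothetical names for illustration, not Hudi APIs.

```scala
import java.time.LocalDate

// Hypothetical helper (not part of Hudi): models how a value written as DATE
// could be surfaced once the column's type has been evolved to STRING.
object DateToStringSketch {

  // Parquet DATE / Spark DateType values are Int days since 1970-01-01.
  def epochDaysToString(epochDays: Int): String =
    LocalDate.ofEpochDay(epochDays.toLong).toString // ISO-8601 "yyyy-MM-dd"

  // Rows written before the column existed carry no value and stay None (null).
  def convert(value: Option[Int]): Option[String] =
    value.map(epochDaysToString)

  def main(args: Array[String]): Unit = {
    val written = LocalDate.parse("2023-03-22").toEpochDay.toInt
    println(convert(Some(written))) // Some(2023-03-22)
    println(convert(None))          // None
  }
}
```

With this interpretation, the final `select` in the test would be expected to return `2023-03-22` for id 3 and null for ids 1 and 2.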

 

Test case to reproduce this issue:

 
{code:java}
test("Test Date -> String conversion when vectorized reading is not enabled") {
  val tableName = generateTableName
  spark.sql(
    s"""
       | create table $tableName (
       |  id int,
       |  name string,
       |  price double,
       |  ts long
       |) using hudi
       | partitioned by (ts)
       |tblproperties (
       |  primaryKey = 'id'
       |)
     """.stripMargin)
  spark.sql(
    s"""
       | insert into $tableName
       | select 1 as id, 'a1' as name, 10 as price, 1000 as ts
      """.stripMargin)
  spark.sql("set hoodie.schema.on.read.enable = true")
  // Add a struct column to force subsequent reads to fall back to non-vectorized reading
  spark.sql(s"alter table $tableName add column (`new_struct_col` STRUCT<f0: INTEGER, f1: STRING>)")
  spark.sql(
    s"""
       | insert into $tableName
       | values (2, 'a2', 20, struct(2, 'f_2'), 1001)
      """.stripMargin)
  spark.sql(s"alter table $tableName add column (`date_to_string_col` date)")
  spark.sql(
    s"""
       | insert into $tableName
       | values (3, 'a3', 30, struct(3, 'f_3'), date '2023-03-22', 1002)
      """.stripMargin)
  spark.sql(s"alter table $tableName alter column `date_to_string_col` type string")
  spark.sql(s"select * from $tableName").show(false)
}{code}
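The test relies on the struct column to push reads onto the non-vectorized path. Here is a rough, simplified model of that decision. It is an assumption loosely based on Spark 3.x's Parquet batch-read check, not Spark's or Hudi's actual code. In this model, the vectorized reader is chosen only when every column in the read schema is a flat, atomic type, so a single `STRUCT` column is enough to fall back to row-by-row reading. Whether nested columns can be vectorized depends on the Spark version and configuration. `VectorizedReadSketch`, the `*Col` types and `canUseVectorizedReader` are hypothetical names.

```scala
// Hypothetical, simplified model of the vectorized-vs-row reader decision;
// none of these names exist in Spark or Hudi.
object VectorizedReadSketch {
  sealed trait ColType
  case object IntCol extends ColType
  case object LongCol extends ColType
  case object DoubleCol extends ColType
  case object StringCol extends ColType
  final case class StructCol(fields: Seq[ColType]) extends ColType

  // In this model, only flat (atomic) columns can be decoded in columnar batches.
  def isAtomic(t: ColType): Boolean = t match {
    case StructCol(_) => false
    case _            => true
  }

  // Vectorized reading is chosen only when every column in the read schema is atomic.
  def canUseVectorizedReader(schema: Seq[ColType]): Boolean =
    schema.forall(isAtomic)

  def main(args: Array[String]): Unit = {
    val before = Seq(IntCol, StringCol, DoubleCol, LongCol)   // id, name, price, ts
    val after  = before :+ StructCol(Seq(IntCol, StringCol))  // + new_struct_col
    println(canUseVectorizedReader(before)) // true
    println(canUseVectorizedReader(after))  // false
  }
}
```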
 

 

  was:
When a column is converted from Date to String and reads go through the
non-vectorized reader, the table becomes unreadable.

 

Test case to reproduce this issue:

 
{code:java}
test("Test Date -> String conversion when vectorized reading is not enabled") {
  val tableName = generateTableName
  spark.sql(
    s"""
       | create table $tableName (
       |  id int,
       |  name string,
       |  price double,
       |  ts long
       |) using hudi
       | partitioned by (ts)
       |tblproperties (
       |  primaryKey = 'id'
       |)
     """.stripMargin)
  spark.sql(
    s"""
       | insert into $tableName
       | select 1 as id, 'a1' as name, 10 as price, 1000 as ts
      """.stripMargin)
  spark.sql("set hoodie.schema.on.read.enable = true")
  // Adding a struct column forces future reads to fall back to a non-vectorized reader
  spark.sql(s"alter table $tableName add column (`new_struct_col` STRUCT<f0: INTEGER, f1: STRING>)")
  spark.sql(
    s"""
       | insert into $tableName
       | values (2, 'a2', 20, struct(2, 'f_2'), 1001)
      """.stripMargin)
  spark.sql(s"alter table $tableName add column (`date_to_string_col` date)")
  spark.sql(
    s"""
       | insert into $tableName
       | values (3, 'a3', 30, struct(3, 'f_3'), date '2023-03-22', 1002)
      """.stripMargin)
  spark.sql(s"alter table $tableName alter column `date_to_string_col` type string")
  spark.sql(s"select * from $tableName").show(false)
}{code}
 

 


> Fix Date to String casts when non-vectorized reader is used
> -----------------------------------------------------------
>
>                 Key: HUDI-5977
>                 URL: https://issues.apache.org/jira/browse/HUDI-5977
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
