[jira] [Updated] (HUDI-2279) Support column name matching for insert * and update set * in merge into when sourceTable's columns contains all targetTable's columns

ASF GitHub Bot (Jira) Thu, 05 Aug 2021 08:12:05 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HUDI-2279:
---------------------------------
    Labels: pull-request-available  (was: )

> Support column name matching for insert * and update set *  in merge into 
> when sourceTable's columns contains all targetTable's columns
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2279
>                 URL: https://issues.apache.org/jira/browse/HUDI-2279
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Spark Integration
>            Reporter: 董可伦
>            Assignee: 董可伦
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Example:
> {code:java}
> val tableName = generateTableName
> // Create table
> spark.sql(
>  s"""
>  |create table $tableName (
>  | id int,
>  | name string,
>  | price double,
>  | ts long,
>  | dt string
>  |) using hudi
>  | location '${tmp.getCanonicalPath}/$tableName'
>  | options (
>  | primaryKey ='id',
>  | preCombineField = 'ts'
>  | )
>  """.stripMargin)
> spark.sql(
>   s"""
>      |merge into $tableName as t0
>      |using (
>      |  select 1 as id, '2021-05-05' as dt, 1002 as ts, 97 as price, 'a1' as 
> name union all
>      |  select 1 as id, '2021-05-05' as dt, 1003 as ts, 98 as price, 'a2' as 
> name union all
>      |  select 2 as id, '2021-05-05' as dt, 1001 as ts, 99 as price, 'a3' as 
> name
>      | ) as s0
>      |on t0.id = s0.id
>      |when matched then update set *
>      |when not matched  then insert *
>      |""".stripMargin)
> spark.sql(s"select id, name, price, ts, dt from $tableName").show(){code}
> Fow now,the result is:
> +---+----------+-----+---+---+
> | id| name|price| ts| dt|
> +---+----------+-----+---+---+
> | 2|2021-05-05| 99.0| 99| a3|
> | 1|2021-05-05| 98.0| 98| a2|
> +---+----------+-----+---+---+
> When the order of the column types of souceTable is different from that of 
> the column types of targetTable
>  
> {code:java}
> spark.sql(
>   s"""
>      |merge into ${tableName} as t0
>      |using (
>      |  select 1 as id, 'a1' as name, 1002 as ts, '2021-05-05' as dt, 97 as 
> price union all
>      |  select 1 as id, 'a2' as name, 1003 as ts, '2021-05-05' as dt, 98 as 
> price union all
>      |  select 2 as id, 'a3' as name, 1001 as ts, '2021-05-05' as dt, 99 as 
> price
>      | ) as s0
>      |on t0.id = s0.id
>      |when matched then update set *
>      |when not matched  then insert *
>      |""".stripMargin){code}
>  
> It will throw an exception:
> {code:java}
> [ERROR] 2021-08-05 21:48:53,941 org.apache.hudi.io.HoodieWriteHandle  - Error 
> writing record HoodieRecord{key=HoodieKey { recordKey=id:2 partitionPath=}, 
> currentLocation='null', newLocation='null'}
> java.lang.RuntimeException: Error in execute expression: 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer.
> Expressions is: [boundreference() AS `id`  boundreference() AS `name`  
> CAST(boundreference() AS `price` AS DOUBLE)  CAST(boundreference() AS `ts` AS 
> BIGINT)  CAST(boundreference() AS `dt` AS STRING)]
> CodeBody is: {
> ......
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to 
> java.lang.IntegerCaused by: java.lang.ClassCastException: 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer 
> at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_366797ae_4c30_4862_8222_7be486ede4f8.eval(Unknown
>  Source) at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload.org$apache$spark$sql$hudi$command$payload$ExpressionPayload$$evaluate(ExpressionPayload.scala:258)
>  ... 18 more{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-2279) Support column name matching for insert * and update set * in merge into when sourceTable's columns contains all targetTable's columns

Reply via email to