[ 
https://issues.apache.org/jira/browse/HUDI-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398460#comment-17398460
 ] 

ASF GitHub Bot commented on HUDI-2279:
--------------------------------------

pengzhiwei2018 commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-898216727


   > > Hi @dongkelun , thanks for the contribution. Overall LGTM except some minor optimizations. You can also run the test cases on Spark 3 with the following commands:
   > > > mvn clean install -DskipTests -Pspark3
   > > > mvn test -Punit-tests -Pspark3 -pl hudi-spark-datasource/hudi-spark
   > 
   > Hi @pengzhiwei2018, the result is: 'Tests: succeeded 56, failed 6, canceled 0, ignored 0, pending 0'. Two of the failures are ORC exceptions, three appear to be caused by time zone differences (which I don't know how to resolve), and the remaining one is a mismatch in the exception message. The detailed results are as follows:
   > 
   > 1. Test Different Type of Partition Column *** FAILED ***
   >    Expected Array([1,a1,10,2021-05-20 00:00:00], [2,a2,10,2021-05-20 00:00:00]), but got Array([1,a1,10.0,2021-05-20 15:00:00], [2,a2,10.0,2021-05-20 15:00:00])
   > 2. Test MergeInto Exception *** FAILED ***
   >    Expected "... for target field: '[id]' in merge into upda...", but got "... for target field: '[_ts]' in merge into upda..." (TestHoodieSqlBase.scala:86)
   > 3. test basic HoodieSparkSqlWriter functionality with datasource insert for COPY_ON_WRITE with ORC as the base file format with populate meta fields true *** FAILED ***
   > 4. test basic HoodieSparkSqlWriter functionality with datasource insert for MERGE_ON_READ with ORC as the base file format with populate meta fields true *** FAILED ***
   > 5. Test Sql Statements *** FAILED ***
   >    java.lang.IllegalArgumentException: UnExpect result for: select id, name, price, cast(dt as string) from h0_p
   >    Expect: 1 a1 10 2021-05-07 00:00:00, Actual: 1 a1 10 2021-05-07 15:00:00
   > 6. Test Create Table As Select *** FAILED ***
   >    Expected Array([1,a1,10,2021-05-06 00:00:00]), but got Array([1,a1,10,2021-05-06 15:00:00]) (TestHoodieSqlBase.scala:78)
   
   I have rebased the code onto master and run the tests for Spark 3. Except for the ORC tests, all others have passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Support column name matching for insert * and update set *  in merge into 
> when sourceTable's columns contains all targetTable's columns
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2279
>                 URL: https://issues.apache.org/jira/browse/HUDI-2279
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Spark Integration
>            Reporter: 董可伦
>            Assignee: 董可伦
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Example:
> {code:java}
> val tableName = generateTableName
> // Create table
> spark.sql(
>  s"""
>  |create table $tableName (
>  | id int,
>  | name string,
>  | price double,
>  | ts long,
>  | dt string
>  |) using hudi
>  | location '${tmp.getCanonicalPath}/$tableName'
>  | options (
>  | primaryKey ='id',
>  | preCombineField = 'ts'
>  | )
>  """.stripMargin)
> spark.sql(
>   s"""
>      |merge into $tableName as t0
>      |using (
>      |  select 1 as id, '2021-05-05' as dt, 1002 as ts, 97 as price, 'a1' as 
> name union all
>      |  select 1 as id, '2021-05-05' as dt, 1003 as ts, 98 as price, 'a2' as 
> name union all
>      |  select 2 as id, '2021-05-05' as dt, 1001 as ts, 99 as price, 'a3' as 
> name
>      | ) as s0
>      |on t0.id = s0.id
>      |when matched then update set *
>      |when not matched  then insert *
>      |""".stripMargin)
> spark.sql(s"select id, name, price, ts, dt from $tableName").show(){code}
> For now, the result is:
> +---+----------+-----+---+---+
> | id| name|price| ts| dt|
> +---+----------+-----+---+---+
> | 2|2021-05-05| 99.0| 99| a3|
> | 1|2021-05-05| 98.0| 98| a2|
> +---+----------+-----+---+---+
> When the column order of the sourceTable is different from that of the 
> targetTable:
>  
> {code:java}
> spark.sql(
>   s"""
>      |merge into ${tableName} as t0
>      |using (
>      |  select 1 as id, 'a1' as name, 1002 as ts, '2021-05-05' as dt, 97 as 
> price union all
>      |  select 1 as id, 'a2' as name, 1003 as ts, '2021-05-05' as dt, 98 as 
> price union all
>      |  select 2 as id, 'a3' as name, 1001 as ts, '2021-05-05' as dt, 99 as 
> price
>      | ) as s0
>      |on t0.id = s0.id
>      |when matched then update set *
>      |when not matched  then insert *
>      |""".stripMargin){code}
>  
> It will throw an exception:
> {code:java}
> [ERROR] 2021-08-05 21:48:53,941 org.apache.hudi.io.HoodieWriteHandle  - Error 
> writing record HoodieRecord{key=HoodieKey { recordKey=id:2 partitionPath=}, 
> currentLocation='null', newLocation='null'}
> java.lang.RuntimeException: Error in execute expression: 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer.
> Expressions is: [boundreference() AS `id`  boundreference() AS `name`  
> CAST(boundreference() AS `price` AS DOUBLE)  CAST(boundreference() AS `ts` AS 
> BIGINT)  CAST(boundreference() AS `dt` AS STRING)]
> CodeBody is: {
> ......
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer 
> at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_366797ae_4c30_4862_8222_7be486ede4f8.eval(Unknown
>  Source) at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload.org$apache$spark$sql$hudi$command$payload$ExpressionPayload$$evaluate(ExpressionPayload.scala:258)
>  ... 18 more{code}
>  
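
The exception arises because the `update set *` / `insert *` expansion binds source columns to target fields by position, so a reordered source query feeds a string into an int slot. The direction of the fix is to resolve source columns to target fields by name. A minimal, hypothetical sketch of name-based alignment (class and method names are illustrative only, not Hudi's actual internals):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameBasedAlign {
    // Reorder a source row to match the target schema by column NAME,
    // instead of relying on column position.
    static List<Object> alignByName(List<String> targetSchema,
                                    List<String> sourceSchema,
                                    List<Object> sourceRow) {
        Map<String, Object> byName = new HashMap<>();
        for (int i = 0; i < sourceSchema.size(); i++) {
            byName.put(sourceSchema.get(i), sourceRow.get(i));
        }
        List<Object> aligned = new ArrayList<>();
        for (String col : targetSchema) {
            if (!byName.containsKey(col)) {
                throw new IllegalArgumentException("missing source column: " + col);
            }
            aligned.add(byName.get(col));
        }
        return aligned;
    }

    public static void main(String[] args) {
        // Target schema from the issue's example table.
        List<String> target = Arrays.asList("id", "name", "price", "ts", "dt");
        // Source columns arrive in a different order, as in the failing MERGE INTO.
        List<String> source = Arrays.asList("id", "name", "ts", "dt", "price");
        List<Object> row = Arrays.asList(1, "a1", 1002L, "2021-05-05", 97);
        System.out.println(alignByName(target, source, row));
        // → [1, a1, 97, 1002, 2021-05-05]
    }
}
```

With name-based resolution the second MERGE INTO in the description would produce the same result as the first, regardless of the column order in the source subquery.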



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
