[jira] [Commented] (HUDI-1842) [SQL] Spark Sql Support For The Exists Hoodie Table

sivabalan narayanan (Jira) Mon, 02 Aug 2021 08:32:05 -0700


    [ https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391659#comment-17391659 ]


sivabalan narayanan commented on HUDI-1842:
-------------------------------------------

Even with a string-formatted partition_path, I guess the field-name prefix that gets added to the record key still needs to be fixed.

{code:java}
select * from hudi_ny where trip_distance = 2.78;
20210802105420  20210802105420_2_4      2019-01-01 00:34:55     2019-01-01      c5e6a617-dfc5-4051-8c1a-8daead3847af-0_2-37-62_20210802105420.parquet   2       2019-01-01 00:34:55     2019-01-01 00:47:05     1       2.78    1       N       48      239     1       12.0    0.5     0.5     1.7     0.0     0.3     15.0    NULL    2019-01-01
20210802112531  20210802112531_1_4      tpep_pickup_datetime:2022-01-01 00:34:55        date_col=2022-01-01     16767b94-3a63-4c0c-9939-dd39be4bd27c-0_1-74-3465_20210802112531.parquet 1       2022-01-01 00:34:55     2022-01-01 00:47:05     1       2.78    1       N       48      239     12.0    0.5     0.5     1.7     0.0     0.3     15.0    NULL    2022-01-01
20210802112531  20210802112531_2_3      tpep_pickup_datetime:2019-01-01 00:34:55        date_col=2019-01-01     8da20c31-296e-4321-800d-0c2b7ee3a82b-0_2-74-3466_20210802112531.parquet 3       2019-01-01 00:34:55     2019-01-01 00:47:05     1       2.78    1       N       48      239     12.0    0.5     0.5     1.7     0.0     0.3     30.0    NULL    2019-01-01
20210802112531  20210802112531_0_2      tpep_pickup_datetime:2021-01-01 00:34:55        date_col=2021-01-01     c5c72f9e-9a63-48ca-a981-4302890f5210-0_0-68-3464_20210802112531.parquet 2       2021-01-01 00:34:55     2021-01-01 00:47:05     1       2.78    1       N       48      239     12.0    0.5     0.5     1.7     0.0     0.3     25.0    NULL    2021-01-01
Time taken: 0.311 seconds, Fetched 4 row(s)
{code}
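Notice that two rows have 2019-01-01 in the last column (the partition path). Both rows represent the same record: one was already in the Hudi table and the other was added during the merge. Their record keys (third column) differ only by the field-name prefix: 2019-01-01 00:34:55 for the existing row versus tpep_pickup_datetime:2019-01-01 00:34:55 for the merged row. Ideally the merge should have been an update, and only one of these rows should have been part of the result.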
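To make the symptom concrete, here is a minimal Java sketch of why a key-based merge appends a second row instead of updating the first when the two sides render the record key differently. It assumes that rows are matched on the exact record-key string; the class `RecordKeyFormats` and its methods are hypothetical and do not model Hudi's actual key generator classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not Hudi's key generator API): two ways of
 * rendering the same record-key field, and a key-based upsert over them.
 */
final class RecordKeyFormats {

    private RecordKeyFormats() {}

    /** Plain value only, like the pre-existing row's key ("2019-01-01 00:34:55"). */
    static String plainKey(String value) {
        return value;
    }

    /** "field:value" form, like the keys of rows written by the merge. */
    static String prefixedKey(String field, String value) {
        return field + ":" + value;
    }

    /**
     * Simulated upsert: an incoming row replaces an existing row only when the
     * key strings are identical; otherwise it is added as a new row.
     * Returns a new map and leaves the input table unchanged.
     */
    static Map<String, String> upsert(Map<String, String> table, String key, String row) {
        Map<String, String> result = new LinkedHashMap<>(table);
        result.put(key, row);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        // Row already in the table, keyed by the plain value.
        table.put(plainKey("2019-01-01 00:34:55"), "existing row");

        // Same logical record arriving via the merge, keyed with the field-name prefix.
        String mismatched = prefixedKey("tpep_pickup_datetime", "2019-01-01 00:34:55");
        System.out.println(upsert(table, mismatched, "merged row").size()); // 2: appended, not updated

        // If both sides used the same key format, the upsert would replace the row.
        String consistent = plainKey("2019-01-01 00:34:55");
        System.out.println(upsert(table, consistent, "merged row").size()); // 1: updated in place
    }
}
```

The sketch deliberately says nothing about which format is correct; it only shows that both writers have to agree on one key format for the merge to behave as an update.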

 

> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---------------------------------------------------
>
>                 Key: HUDI-1842
>                 URL: https://issues.apache.org/jira/browse/HUDI-1842
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: pengzhiwei
>            Priority: Blocker
>              Labels: release-blocker
>             Fix For: 0.9.0
>
>
> In order to support Spark SQL for Hoodie, we persist some table properties to 
> hoodie.properties, e.g. primaryKey, preCombineField, and the partition columns. 
> These properties are missing for existing Hoodie tables, so we need to add code 
> in UpgradeDowngrade to support Spark SQL on existing tables.
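The backfill described in the issue can be sketched as follows: copy the existing hoodie.properties and add each missing table property without overwriting anything already set. This is a minimal sketch under stated assumptions, not the UpgradeDowngrade implementation: the class `TablePropertiesBackfill`, the property key names, and the idea that the missing values are supplied by the caller (for example from the writer's configuration) are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch: fill in table properties that an older table's
 * hoodie.properties lacks, without overwriting values already present.
 * The key names below are assumptions, not taken from this issue.
 */
final class TablePropertiesBackfill {

    static final String RECORD_KEY_FIELDS = "hoodie.table.recordkey.fields";
    static final String PRECOMBINE_FIELD = "hoodie.table.precombine.field";
    static final String PARTITION_FIELDS = "hoodie.table.partition.fields";

    private TablePropertiesBackfill() {}

    /** Returns a copy of existing with every entry of inferred added only where missing. */
    static Properties backfill(Properties existing, Map<String, String> inferred) {
        Properties upgraded = new Properties();
        upgraded.putAll(existing); // copy, so the caller's Properties is not mutated
        for (Map.Entry<String, String> e : inferred.entrySet()) {
            if (upgraded.getProperty(e.getKey()) == null) {
                upgraded.setProperty(e.getKey(), e.getValue());
            }
        }
        return upgraded;
    }

    public static void main(String[] args) {
        // Hypothetical hoodie.properties of a table created before these keys existed.
        Properties existing = new Properties();
        existing.setProperty("hoodie.table.name", "hudi_ny");

        // Hypothetical values, e.g. taken from the writer's configuration.
        Map<String, String> inferred = new LinkedHashMap<>();
        inferred.put(RECORD_KEY_FIELDS, "tpep_pickup_datetime");
        inferred.put(PRECOMBINE_FIELD, "tpep_dropoff_datetime");
        inferred.put(PARTITION_FIELDS, "date_col");

        Properties upgraded = backfill(existing, inferred);
        System.out.println(upgraded.getProperty(PARTITION_FIELDS)); // date_col
    }
}
```

Only filling absent keys keeps the step idempotent: running it twice, or on a table whose hoodie.properties already has these entries, leaves the existing values untouched.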



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
