[GitHub] [hudi] jiangbiao910 opened a new issue, #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive

GitBox Sun, 17 Jul 2022 18:34:28 -0700


jiangbiao910 opened a new issue, #6127:
URL: https://github.com/apache/hudi/issues/6127


   Hudi has recently released the latest version of Hudi, 0.11.1, which is 
pulled from Github and modified as follows for our Hadoop environment CDH6.3.2。
   Upgrading to 0.11.1 resulting use sparksql：
   `
   create table if not exists zone_test.hudi_spark_table0718_mor_0111
   (
       id                 string,
       brand_id           int,
       name         string,
       model_id   int,
       model_name string,
       etl_update_time    string,
       dt                 string,
       hh                 string
   ) using hudi
   options (
     type = 'mor',
     primaryKey = 'brand_id,vehicle_model_id',
     preCombineField = 'etl_update_time',
     hoodie.cleaner.commits.retained = '2',
     hoodie.compact.inline=true
   )
   partitioned by (dt,hh)
   ;
    
   insert into zone_test.hudi_spark_table0718_mor_0111 partition (dt,hh)
   select id,
          brand_id,
         name,
         model_id,
          model_name,
          CAST(current_timestamp AS string)  as etl_update_time,
   '20220718',
   '10'
   from zone_test.test_vehicle_status_2_hi
   ;
   `
   now I can see **3 tables** in the Hive
   
![image](https://user-images.githubusercontent.com/23710717/179433550-5bea5272-da73-4081-985e-1d7ab25857ff.png)
   But，there is **zero result** in the table 
**zone_test.hudi_spark_table0718_mor_0111** and  
**hudi_spark_table0718_mor_0111_ro 
    hudi_spark_table0718_mor_0111_rt** has **2 records**。this is a bug？
   
   However when I run this SQL:
   `set hoodie.datasource.hive_sync.skip_ro_suffix=true;
   create table if not exists zone_test.hudi_spark_table0718_mor_0111_skip
   (
       id                 string,
       brand_id           int,
       name         string,
       model_id   int,
       model_name string,
       etl_update_time    string,
       dt                 string,
       hh                 string
   ) using hudi
   options (
     type = 'mor',
     primaryKey = 'brand_id,vehicle_model_id',
     preCombineField = 'etl_update_time',
     hoodie.cleaner.commits.retained = '2',
     hoodie.compact.inline=true
   )
   partitioned by (dt,hh)
   ;
    
   insert into zone_test.hudi_spark_table0718_mor_0111_skip partition (dt,hh)
   select id,
          brand_id,
          name,
         model_id,
         model_name,
          CAST(current_timestamp AS string)  as etl_update_time,
   '20220715',
   '10'
   from zone_test.test_vehicle_status_2_hi
   ;
   `
   
![image](https://user-images.githubusercontent.com/23710717/179433798-f1b8fc93-f23b-4c4f-ac0e-f8d5d3169c66.png)
   
   now I can see 2 tables in the Hive,both  
   **hudi_spark_table0718_mor_0111_skip and  
hudi_spark_table0718_mor_0111_skip_rt** has **2recoreds**。
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] jiangbiao910 opened a new issue, #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive

Reply via email to