[GitHub] [hudi] imrewang opened a new issue, #9614: [SUPPORT]No data displayed in hive synchronization partition table

via GitHub Mon, 04 Sep 2023 18:40:46 -0700


imrewang opened a new issue, #9614:
URL: https://github.com/apache/hudi/issues/9614


   1. When I sync a **partitioned table** to Hive, do I have to manually create 
the **external table** and add the **partitions** in Hive before I can query 
the data?
   
   2. Right now I only create the external table in Hive, **without manually 
adding partitions**, and the synchronized data does not show up when I query 
the Hive table. **Is this expected?**
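   For reference, manually registering a partition in Hive would look roughly like the statement below. This is only a sketch. The partition value `2023-09-01` and the directory name under the table path are hypothetical. The real directory layout depends on how Hudi names partition paths, for example `2023-09-01` versus `pt=2023-09-01`. `SHOW PARTITIONS` lists the partitions the metastore currently has registered for the table.

   ```sql
   -- Sketch only: the partition value and directory name are hypothetical,
   -- not taken from this issue; adjust them to the actual Hudi partition path.
   -- Register one partition of the external table with the Hive metastore.
   ALTER TABLE hive_table ADD IF NOT EXISTS PARTITION (pt='2023-09-01')
     LOCATION 'hdfs://XXX/sink_to_hudi/2023-09-01';

   -- List the partitions the metastore currently knows about for this table.
   SHOW PARTITIONS hive_table;
   ```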
   
   **The behavior I expect:** after I manually create the Hive table, but 
without manually adding any Hive partitions, the data from the partitioned Hudi 
table should be synchronized and queryable in Hive.
   
   - **Flink SQL statement that writes data to Hudi:**
   
   ```sql
   CREATE TABLE sink_to_hudi (
                    .....
        `pt` string,
        PRIMARY KEY (`XXXXX`) NOT enforced
   ) partitioned BY (pt) WITH (
        'connector' = 'hudi',
        'compaction.max_memory' = '1024',
        'write.task.max.size' = '2048',
        'write.merge.max_memory' = '1024',
        'index.bootstrap.enabled' = 'false',
        'path' = 'hdfs://XXX/sink_to_hudi',
        'write.tasks' = '1',
        'hive_sync.enable' = 'true',
        'hive_sync.mode' = 'hms',
        'hive_sync.metastore.uris' = 'thrift://xxxx:9083',
        'hive_sync.table' = 'xxxxxxxx',
        'hive_sync.db' = 'xxxxxx',
        'hive_sync.username' = '',
        'hive_sync.password' = ''
   )
   ```
   
   - **Hive table creation statement:**
   
   ```sql
   
   CREATE TABLE `hive_table`(
               ......
   )
   COMMENT ''
   PARTITIONED BY ( 
     `pt` string)
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
   WITH SERDEPROPERTIES ( 
     'field.delim'='', 
     'serialization.format'='') 
   STORED AS INPUTFORMAT 
     'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
   OUTPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
     'hdfs://XXX/sink_to_hudi'
   TBLPROPERTIES (
     'transient_lastDdlTime'='1593844501');
   ```
   

