imrewang opened a new issue, #9614:
URL: https://github.com/apache/hudi/issues/9614
1. When I sync a **partitioned table** from Hudi to Hive, must I manually
create the **external table** and add each **partition** in Hive before I can
query the data?
2. Currently I have only created the external table in Hive, **without manually
adding partitions**, and I cannot see the synchronized data in the Hive table.
**Is this normal?**

**Expected behavior:** after manually creating the Hive table, but without
manually creating its partitions, the partitioned data written to Hudi should
be synchronized and queryable in Hive.

- **SQL statement that writes data to Hudi:**
```sql
CREATE TABLE sink_to_hudi (
.....
`pt` string,
PRIMARY KEY (`XXXXX`) NOT enforced
) partitioned BY (pt) WITH (
'connector' = 'hudi',
'compaction.max_memory' = '1024',
'write.task.max.size' = '2048',
'write.merge.max_memory' = '1024',
'index.bootstrap.enabled' = 'false',
'path' = 'hdfs://XXX/sink_to_hudi',
'write.tasks' = '1',
'hive_sync.enable' = 'true',
'hive_sync.mode' = 'hms',
'hive_sync.metastore.uris' = 'thrift://xxxx:9083',
'hive_sync.table' = 'xxxxxxxx',
'hive_sync.db' = 'xxxxxx',
'hive_sync.username' = '',
'hive_sync.password' = ''
)
```
- **Hive table creation statement:**
```sql
CREATE TABLE `hive_table`(
......
)
COMMENT ''
PARTITIONED BY (
`pt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'field.delim'='',
'serialization.format'='')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://XXX/sink_to_hudi'
TBLPROPERTIES (
'transient_lastDdlTime'='1593844501');
```
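
For reference, the manual partition step that the questions above hope to avoid would look roughly like the sketch below. The partition value `2023-09-01` and its directory are hypothetical and chosen only for illustration. The real location depends on how the writer lays out partition directories under the table path.

```sql
-- List the partitions currently registered in the Hive metastore.
SHOW PARTITIONS hive_table;

-- Manually register a single partition.
-- The value and directory below are hypothetical, not taken from the real table.
ALTER TABLE hive_table ADD IF NOT EXISTS
PARTITION (pt = '2023-09-01')
LOCATION 'hdfs://XXX/sink_to_hudi/2023-09-01';
```

If `SHOW PARTITIONS` returns nothing after the job has committed data, no partitions have been registered for this Hive table, which would be consistent with queries returning no rows.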