leesf created HUDI-1003:
---------------------------
Summary: Handle partitions when syncing a non-partitioned table to Hive.
Key: HUDI-1003
URL: https://issues.apache.org/jira/browse/HUDI-1003
Project: Apache Hudi
Issue Type: Bug
Components: Hive Integration
Reporter: leesf
Fix For: 0.6.0
When syncing a non-partitioned Hudi table to Hive with the following options (the relevant ones are *hoodie.datasource.hive_sync.enable*, *hoodie.datasource.hive_sync.partition_fields* and *hoodie.datasource.hive_sync.partition_extractor_class*):
.option("hoodie.datasource.hive_sync.enable", "true")
.option("hoodie.datasource.hive_sync.table", tableName)
.option("hoodie.datasource.hive_sync.username", "root")
.option("hoodie.datasource.hive_sync.password", "123456")
.option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000")
.option("hoodie.datasource.hive_sync.partition_fields", "region,country,city")
.option("hoodie.datasource.write.operation", writeOperation)
.option("hoodie.datasource.write.table.type", tableType)
.option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor")
Hive sync creates the following table:
CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
`_hoodie_commit_time` string,
`_hoodie_commit_seqno` string,
`_hoodie_record_key` string,
`_hoodie_partition_path` string,
`_hoodie_file_name` string,
`age` bigint,
`location` string,
`name` string,
`sex` string,
`ts` bigint)
PARTITIONED BY (
`region` string,
`country` string,
`city` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'file:/Users/sflee/personal/hudi_java_client_dataset'
TBLPROPERTIES (
'last_commit_time_sync'='20200606200453',
'transient_lastDdlTime'='1591445103')
However, the table actually has no partitions, so the PARTITIONED BY (region, country, city) clause is wrong, and select * from hudi_trips_cow_hive_non_partitioned returns no data.
So when a user sets *NonPartitionedExtractor* as the partition extractor and also sets *hoodie.datasource.hive_sync.partition_fields* to some fields, we need to either throw an exception or create the table with the correct DDL, as below:
CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
`_hoodie_commit_time` string,
`_hoodie_commit_seqno` string,
`_hoodie_record_key` string,
`_hoodie_partition_path` string,
`_hoodie_file_name` string,
`age` bigint,
`location` string,
`name` string,
`sex` string,
`ts` bigint)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'file:/Users/sflee/personal/hudi_java_client_dataset'
TBLPROPERTIES (
'last_commit_time_sync'='20200606201124',
'transient_lastDdlTime'='1591445493')
*I am inclined to create the table normally using the correct SQL, i.e. without the PARTITIONED BY clause.*
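A minimal Java sketch of the two options discussed above (throw an exception, or ignore the partition fields and create the table normally). It is hypothetical and is not Hudi's actual Hive sync code: the class NonPartitionedHiveSyncSketch, the methods effectivePartitionFields and createTableDdl, and the strict flag are illustrative names only; just the extractor class name comes from this report. The generated DDL is simplified (no SerDe, input/output format or TBLPROPERTIES), and partition columns are typed as string as in the DDL above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, NOT Hudi's implementation: reconciles the configured
// hive_sync.partition_fields with NonPartitionedExtractor before building DDL.
class NonPartitionedHiveSyncSketch {

  static final String NON_PARTITIONED_EXTRACTOR =
      "org.apache.hudi.hive.NonPartitionedExtractor";

  // Returns the partition fields Hive sync should actually use.
  // strict = true  -> reject the inconsistent configuration (throw).
  // strict = false -> drop the partition fields (create the table normally).
  static List<String> effectivePartitionFields(
      String extractorClass, List<String> partitionFields, boolean strict) {
    boolean nonPartitioned = NON_PARTITIONED_EXTRACTOR.equals(extractorClass);
    if (!nonPartitioned || partitionFields.isEmpty()) {
      return partitionFields; // consistent config: keep as-is
    }
    if (strict) {
      throw new IllegalArgumentException(
          "hoodie.datasource.hive_sync.partition_fields must be empty when using "
              + NON_PARTITIONED_EXTRACTOR + ", got " + partitionFields);
    }
    return Collections.emptyList();
  }

  // Builds a simplified CREATE EXTERNAL TABLE statement. PARTITIONED BY is
  // emitted only when at least one partition field remains.
  static String createTableDdl(String table, List<String[]> columns,
      List<String> partitionFields, String location) {
    List<String> cols = new ArrayList<>();
    for (String[] c : columns) {
      cols.add("`" + c[0] + "` " + c[1]); // c[0] = name, c[1] = Hive type
    }
    StringBuilder sb = new StringBuilder();
    sb.append("CREATE EXTERNAL TABLE `").append(table).append("`(")
        .append(String.join(", ", cols)).append(")");
    if (!partitionFields.isEmpty()) {
      List<String> parts = new ArrayList<>();
      for (String p : partitionFields) {
        parts.add("`" + p + "` string");
      }
      sb.append(" PARTITIONED BY (").append(String.join(", ", parts)).append(")");
    }
    sb.append(" LOCATION '").append(location).append("'");
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> configured = List.of("region", "country", "city");
    List<String> fields =
        effectivePartitionFields(NON_PARTITIONED_EXTRACTOR, configured, false);
    List<String[]> columns =
        List.of(new String[] {"age", "bigint"}, new String[] {"name", "string"});
    // The location is a placeholder path, not the one from this report.
    System.out.println(createTableDdl(
        "hudi_trips_cow_hive_non_partitioned", columns, fields, "file:/tmp/hudi_dataset"));
  }
}
```

Running main prints a CREATE EXTERNAL TABLE statement for hudi_trips_cow_hive_non_partitioned with no PARTITIONED BY clause. Passing strict = true instead makes the same configuration fail fast with an IllegalArgumentException.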