leesf created HUDI-1003:
---------------------------

             Summary: Handle partitions when sync non-partitioned table to hive.
                 Key: HUDI-1003
                 URL: https://issues.apache.org/jira/browse/HUDI-1003
             Project: Apache Hudi
          Issue Type: Bug
          Components: Hive Integration
            Reporter: leesf
             Fix For: 0.6.0


When syncing a non-partitioned Hudi table to Hive with the following options:

*option("hoodie.datasource.hive_sync.enable", "true").*
option("hoodie.datasource.hive_sync.table", tableName).
option("hoodie.datasource.hive_sync.username", "root").
option("hoodie.datasource.hive_sync.password", "123456").
option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000").
*option("hoodie.datasource.hive_sync.partition_fields", "region,country,city").*
option("hoodie.datasource.write.operation", writeOperation).
option("hoodie.datasource.write.table.type", tableType).
*option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor")*


Hive sync creates the following table:

CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
 `_hoodie_commit_time` string,
 `_hoodie_commit_seqno` string,
 `_hoodie_record_key` string,
 `_hoodie_partition_path` string,
 `_hoodie_file_name` string,
 `age` bigint,
 `location` string,
 `name` string,
 `sex` string,
 `ts` bigint)
*PARTITIONED BY (*
 *`region` string,*
 *`country` string,*
 *`city` string)*
ROW FORMAT SERDE
 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
 'file:/Users/sflee/personal/hudi_java_client_dataset'
TBLPROPERTIES (
 'last_commit_time_sync'='20200606200453',
 'transient_lastDdlTime'='1591445103')


However, the table actually has no partitions, so a query such as select * from hudi_trips_cow_hive_non_partitioned returns no data.

So when a user sets the extractor to *NonPartitionedExtractor* and also sets *hoodie.datasource.hive_sync.partition_fields* to some fields, we should either throw an exception or create a proper table like the one below:

CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
 `_hoodie_commit_time` string,
 `_hoodie_commit_seqno` string,
 `_hoodie_record_key` string,
 `_hoodie_partition_path` string,
 `_hoodie_file_name` string,
 `age` bigint,
 `location` string,
 `name` string,
 `sex` string,
 `ts` bigint)
ROW FORMAT SERDE
 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
 'file:/Users/sflee/personal/hudi_java_client_dataset'
TBLPROPERTIES (
 'last_commit_time_sync'='20200606201124',
 'transient_lastDdlTime'='1591445493')


*I am inclined to create the table normally, using the correct SQL.*
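A minimal Java sketch of the preferred fix, assuming a hypothetical DDL-building step inside Hive sync. When the extractor class is NonPartitionedExtractor, the configured partition fields are ignored and no PARTITIONED BY clause is emitted. The helper names (effectivePartitionFields, buildCreateTable, validate) are illustrative only and are not Hudi's actual HiveSyncTool API. Partition columns are typed string, as in the DDL generated above. The validate method shows the other option raised in this issue: rejecting the conflicting configuration instead of silently ignoring it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not Hudi's real Hive sync code.
class NonPartitionedDdlSketch {

    // Extractor class name as configured in this issue.
    static final String NON_PARTITIONED_EXTRACTOR = "org.apache.hudi.hive.NonPartitionedExtractor";

    /** Hypothetical helper: partition columns Hive sync should actually declare. */
    static List<String> effectivePartitionFields(String extractorClass, String partitionFieldsOption) {
        if (NON_PARTITIONED_EXTRACTOR.equals(extractorClass)) {
            // A non-partitioned table must not declare PARTITIONED BY, whatever partition_fields says.
            return Collections.emptyList();
        }
        if (partitionFieldsOption == null || partitionFieldsOption.trim().isEmpty()) {
            return Collections.emptyList();
        }
        // Split the comma-separated option, e.g. "region,country,city".
        return Arrays.stream(partitionFieldsOption.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Alternative from the issue: reject the conflicting configuration outright. */
    static void validate(String extractorClass, String partitionFieldsOption) {
        if (NON_PARTITIONED_EXTRACTOR.equals(extractorClass)
                && partitionFieldsOption != null && !partitionFieldsOption.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "partition_fields must be empty when using " + NON_PARTITIONED_EXTRACTOR);
        }
    }

    /** Hypothetical helper: builds the column list and optional PARTITIONED BY clause. */
    static String buildCreateTable(String table, Map<String, String> columns, List<String> partitionFields) {
        StringBuilder sb = new StringBuilder();
        sb.append("CREATE EXTERNAL TABLE `").append(table).append("`(");
        sb.append(columns.entrySet().stream()
                .map(e -> "`" + e.getKey() + "` " + e.getValue())
                .collect(Collectors.joining(", ")));
        sb.append(")");
        if (!partitionFields.isEmpty()) {
            // Partition columns typed as string, matching the generated DDL in the issue.
            sb.append(" PARTITIONED BY (")
              .append(partitionFields.stream()
                      .map(f -> "`" + f + "` string")
                      .collect(Collectors.joining(", ")))
              .append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("_hoodie_commit_time", "string");
        columns.put("age", "bigint");
        columns.put("name", "string");
        // Options from the issue: NonPartitionedExtractor plus three partition fields.
        List<String> parts = effectivePartitionFields(NON_PARTITIONED_EXTRACTOR, "region,country,city");
        System.out.println(buildCreateTable("hudi_trips_cow_hive_non_partitioned", columns, parts));
    }
}
```

Ignoring the fields keeps existing jobs working, at the cost of hiding a misconfiguration. The validate variant surfaces the mistake early but breaks jobs that currently set both options.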



--
This message was sent by Atlassian Jira
(v8.3.4#803005)