[
https://issues.apache.org/jira/browse/HUDI-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yajun Luo reassigned HUDI-1003:
-------------------------------
Assignee: Yajun Luo
> Handle partitions correctly when sync non-partitioned table to hive.
> --------------------------------------------------------------------
>
> Key: HUDI-1003
> URL: https://issues.apache.org/jira/browse/HUDI-1003
> Project: Apache Hudi
> Issue Type: Bug
> Components: Hive Integration
> Reporter: leesf
> Assignee: Yajun Luo
> Priority: Major
> Labels: newbe, starter
> Fix For: 0.6.0
>
>
> When syncing a non-partitioned Hudi table to Hive with the following options:
> *option("hoodie.datasource.hive_sync.enable", "true").*
> option("hoodie.datasource.hive_sync.table", tableName).
> option("hoodie.datasource.hive_sync.username", "root").
> option("hoodie.datasource.hive_sync.password", "123456").
> option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000").
> *option("hoodie.datasource.hive_sync.partition_fields",
> "region,country,city").*
> option("hoodie.datasource.write.operation", writeOperation).
> option("hoodie.datasource.write.table.type", tableType).
> *option("hoodie.datasource.hive_sync.partition_extractor_class",
> "org.apache.hudi.hive.NonPartitionedExtractor")*
>
> it creates the following table:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
> `_hoodie_commit_time` string,
> `_hoodie_commit_seqno` string,
> `_hoodie_record_key` string,
> `_hoodie_partition_path` string,
> `_hoodie_file_name` string,
> `age` bigint,
> `location` string,
> `name` string,
> `sex` string,
> `ts` bigint)
> *PARTITIONED BY (*
> *`region` string,*
> *`country` string,*
> *`city` string)*
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
> 'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
> 'last_commit_time_sync'='20200606200453',
> 'transient_lastDdlTime'='1591445103')
>
> but the table actually has no partitions, so select * from
> hudi_trips_cow_hive_non_partitioned returns no data.
> So when a user uses *NonPartitionedExtractor* and sets
> *hoodie.datasource.hive_sync.partition_fields* to some fields,
> we need to either throw an exception or create the table with proper DDL like below:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
> `_hoodie_commit_time` string,
> `_hoodie_commit_seqno` string,
> `_hoodie_record_key` string,
> `_hoodie_partition_path` string,
> `_hoodie_file_name` string,
> `age` bigint,
> `location` string,
> `name` string,
> `sex` string,
> `ts` bigint)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
> 'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
> 'last_commit_time_sync'='20200606201124',
> 'transient_lastDdlTime'='1591445493')
>
> *I am inclined to create the table normally using the correct SQL.*
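
The two options raised in the description (throw an exception, or create the table without a PARTITIONED BY clause) could be sketched roughly as follows. This is a minimal, self-contained Java sketch and not Hudi's actual sync code: the class HiveSyncPartitionSketch and its methods are hypothetical names introduced for illustration. Only the extractor class name, the option values, and the string-typed partition columns are taken from the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; not part of Hudi.
final class HiveSyncPartitionSketch {

    // Value of hoodie.datasource.hive_sync.partition_extractor_class in the issue.
    static final String NON_PARTITIONED_EXTRACTOR =
        "org.apache.hudi.hive.NonPartitionedExtractor";

    // Splits a comma-separated partition_fields value into trimmed, non-empty names.
    static List<String> parseFields(String raw) {
        List<String> fields = new ArrayList<>();
        if (raw == null) {
            return fields;
        }
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                fields.add(name);
            }
        }
        return fields;
    }

    // Decides which partition fields the generated DDL should use.
    // strict = true  -> option 1: reject the contradictory configuration.
    // strict = false -> option 2: ignore the fields and sync as non-partitioned.
    static List<String> effectivePartitionFields(
            String extractorClass, String rawFields, boolean strict) {
        List<String> fields = parseFields(rawFields);
        if (!NON_PARTITIONED_EXTRACTOR.equals(extractorClass)) {
            return fields; // partitioned table: use the fields as configured
        }
        if (strict && !fields.isEmpty()) {
            throw new IllegalArgumentException(
                "partition_fields " + fields + " set together with " + extractorClass);
        }
        return Collections.emptyList();
    }

    // Builds the PARTITIONED BY clause, or an empty string when there are no fields.
    // Columns are typed as string, matching the DDL shown in the issue.
    static String partitionClause(List<String> fields) {
        if (fields.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder(" PARTITIONED BY (");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('`').append(fields.get(i)).append("` string");
        }
        return sb.append(')').toString();
    }
}
```

With strict set to false, the configuration from the description ("region,country,city" plus NonPartitionedExtractor) yields an empty clause, which corresponds to the second CREATE EXTERNAL TABLE statement above. With strict set to true, the same configuration is rejected up front instead.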