[jira] [Assigned] (HUDI-1003) Handle partitions correctly when sync non-partitioned table to hive.

Yajun Luo (Jira) Sun, 07 Jun 2020 18:16:12 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yajun Luo reassigned HUDI-1003:
-------------------------------

    Assignee: Yajun Luo

> Handle partitions correctly when sync non-partitioned table to hive.
> --------------------------------------------------------------------
>
>                 Key: HUDI-1003
>                 URL: https://issues.apache.org/jira/browse/HUDI-1003
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: leesf
>            Assignee: Yajun Luo
>            Priority: Major
>              Labels: newbe, starter
>             Fix For: 0.6.0
>
>
> When syncing a non-partitioned Hudi table to Hive with the following options:
> *option("hoodie.datasource.hive_sync.enable", "true").*
> option("hoodie.datasource.hive_sync.table", tableName).
> option("hoodie.datasource.hive_sync.username", "root").
> option("hoodie.datasource.hive_sync.password", "123456").
> option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000").
> *option("hoodie.datasource.hive_sync.partition_fields", 
> "region,country,city").*
> option("hoodie.datasource.write.operation", writeOperation).
> option("hoodie.datasource.write.table.type", tableType).
> *option("hoodie.datasource.hive_sync.partition_extractor_class", 
> "org.apache.hudi.hive.NonPartitionedExtractor")*
>  
> it will create the following table:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
>  `_hoodie_commit_time` string,
>  `_hoodie_commit_seqno` string,
>  `_hoodie_record_key` string,
>  `_hoodie_partition_path` string,
>  `_hoodie_file_name` string,
>  `age` bigint,
>  `location` string,
>  `name` string,
>  `sex` string,
>  `ts` bigint)
> *PARTITIONED BY (*
>  *`region` string,*
>  *`country` string,*
>  *`city` string)*
> ROW FORMAT SERDE
>  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>  'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
>  'last_commit_time_sync'='20200606200453',
>  'transient_lastDdlTime'='1591445103')
>  
> but the table actually has no partitions, so select * from 
> hudi_trips_cow_hive_non_partitioned returns no data.
> So when a user uses *NonPartitionedExtractor* and sets 
> *hoodie.datasource.hive_sync.partition_fields* to some fields, 
> we need to either throw an exception or create the table with a correct statement like the one below:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
>  `_hoodie_commit_time` string,
>  `_hoodie_commit_seqno` string,
>  `_hoodie_record_key` string,
>  `_hoodie_partition_path` string,
>  `_hoodie_file_name` string,
>  `age` bigint,
>  `location` string,
>  `name` string,
>  `sex` string,
>  `ts` bigint)
> ROW FORMAT SERDE
>  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>  'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
>  'last_commit_time_sync'='20200606201124',
>  'transient_lastDdlTime'='1591445493')
>  
> *I am inclined to create the table normally, using the correct SQL.*
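
The two alternatives described above (throw an exception, or ignore the partition fields and create an unpartitioned table) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hudi's actual code or the eventual fix: the class `HiveSyncPartitionCheck` and its methods are hypothetical names; only the extractor class name and the `hoodie.datasource.hive_sync.partition_fields` option come from the issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Hudi) illustrating the two options in the issue.
final class HiveSyncPartitionCheck {

    // Extractor class name as given in the issue description.
    static final String NON_PARTITIONED_EXTRACTOR =
        "org.apache.hudi.hive.NonPartitionedExtractor";

    private HiveSyncPartitionCheck() {}

    // Splits the comma-separated value of
    // hoodie.datasource.hive_sync.partition_fields into trimmed field names.
    static List<String> parseFields(String partitionFields) {
        if (partitionFields == null || partitionFields.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(partitionFields.split(","))
            .map(String::trim)
            .filter(f -> !f.isEmpty())
            .collect(Collectors.toList());
    }

    // Option A: silently ignore the configured fields when the non-partitioned
    // extractor is selected, so no PARTITIONED BY clause would be generated.
    static List<String> effectivePartitionFields(String extractorClass,
                                                 String partitionFields) {
        if (NON_PARTITIONED_EXTRACTOR.equals(extractorClass)) {
            return Collections.emptyList();
        }
        return parseFields(partitionFields);
    }

    // Option B: fail fast on the contradictory configuration.
    static void validate(String extractorClass, String partitionFields) {
        if (NON_PARTITIONED_EXTRACTOR.equals(extractorClass)
                && !parseFields(partitionFields).isEmpty()) {
            throw new IllegalArgumentException(
                "hoodie.datasource.hive_sync.partition_fields must be empty when using "
                    + NON_PARTITIONED_EXTRACTOR);
        }
    }
}
```

With option A, the configuration quoted in the issue ("region,country,city" together with NonPartitionedExtractor) would resolve to an empty field list, which matches the preference stated above for creating the table without a PARTITIONED BY clause. Option B would reject the same configuration instead.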



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
