imperio-wxm opened a new issue #828: Synchronizing to hive partition is incorrect
URL: https://github.com/apache/incubator-hudi/issues/828

spark 2.4.0.cloudera1
hadoop 2.6.0-cdh5.11.1
hive 1.1.0-cdh5.11.1
hudi 0.4.7

> I selected some data from a Hive table, wrote it to a new table with Hudi, and then synced that table to Hive.

# My Code

```java
Dataset<Row> hiveQuery = spark.sql(
    "select timestamp,key,name,part_date from dw.xxxxx where part_date='2019-08-02' limit 10");

hiveQuery.write()
    .format("com.uber.hoodie")
    // Hive sync settings
    .option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY(), true)
    .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), "jdbc:hive2://xxxx:10000")
    .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY(), "dw")
    .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY(), true)
    .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY(), "hoodie_test")
    .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY(), "part_date")
    // Record key, partition path and precombine fields
    .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "key")
    .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "part_date")
    .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
    .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
    .mode(SaveMode.Append)
    .save("/wxm/hudi/data/hoodie_test");
```

**The job succeeds, but I found some problems with the Hive partitions of the new table.**

## 1. The partition path is incorrect.

If the data is migrated with Hive's `as select` syntax, the partition path should look like this:

```java
// insert overwrite table new_table partition(part_date) select xxx from old_table
/wxm/hudi/data/hoodie_test/part_date=2019-08-02
```

The path produced by the code above is: `/wxm/hudi/data/hoodie_test/2019-08-02`

A Hive partition directory should have the form `key=value`, but the path written by Hudi is missing the `part_date` field name.

## 2. Hive table has no partition

Running `show partitions table` does not return any partition. I think that if Hive partition fields are configured, the partition should be added automatically. As it is, queries return no data.
```java
hive> show partitions xxx;
OK
Time taken: 0.317 seconds

hive> select * from xxx limit 10;
OK
Time taken: 0.451 seconds
```

## Manual operation to query data

I then added the partition manually:

`alter table add partition(part_date='2019-08-02')`

and copied the files generated by Hudi into the partition directory:

`hadoop fs -cp /wxm/hudi/data/hoodie_test/2019-08-02/* /wxm/hudi/data/hoodie_test/part_date=2019-08-02/`

After that, I can select the data.
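To make the mismatch concrete, here is a minimal Java sketch of the two strings involved in the manual workaround: the Hive-style `key=value` partition directory, and an `ALTER TABLE ... ADD PARTITION` statement. The class `HivePartitionSketch` and both method names are hypothetical helpers for illustration, not part of Hudi or Hive. The `LOCATION` clause is standard Hive DDL and points the partition at the directory Hudi already wrote, which might avoid the copy step. This issue does not establish whether that interacts well with Hudi's own Hive sync, so treat it as an assumption.

```java
import java.util.Objects;

// Hypothetical helper for illustration only; not a Hudi or Hive API.
final class HivePartitionSketch {

    // Builds a Hive-style partition directory of the form <base>/<field>=<value>.
    static String hiveStylePath(String basePath, String field, String value) {
        Objects.requireNonNull(basePath, "basePath");
        Objects.requireNonNull(field, "field");
        Objects.requireNonNull(value, "value");
        // Drop one trailing slash so the result never contains "//".
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        return base + "/" + field + "=" + value;
    }

    // Builds Hive DDL that registers an existing directory as a partition.
    // Assumption: values contain no quotes; no escaping is performed here.
    static String addPartitionDdl(String table, String field, String value, String location) {
        return String.format(
                "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s') LOCATION '%s'",
                table, field, value, location);
    }

    public static void main(String[] args) {
        String base = "/wxm/hudi/data/hoodie_test";
        // Directory layout Hive would produce for the partition
        System.out.println(hiveStylePath(base, "part_date", "2019-08-02"));
        // Register the directory actually written by the job above
        System.out.println(
                addPartitionDdl("hoodie_test", "part_date", "2019-08-02", base + "/2019-08-02"));
    }
}
```

The generated statement could be run from the Hive CLI or over the same JDBC URL used for the sync. The table name `hoodie_test` is taken from the write options above and assumes the `dw` database is already selected.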
