onlywangyh commented on code in PR #5434:
URL: https://github.com/apache/hudi/pull/5434#discussion_r858555712


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FilePathUtils.java:
##########
@@ -420,9 +420,9 @@ public static org.apache.flink.core.fs.Path toFlinkPath(Path path) {
    * @return array of the partition fields
    */
   public static String[] extractPartitionKeys(org.apache.flink.configuration.Configuration conf) {
-    if (FlinkOptions.isDefaultValueDefined(conf, FlinkOptions.PARTITION_PATH_FIELD)) {
+    if (FlinkOptions.isDefaultValueDefined(conf, FlinkOptions.HIVE_SYNC_PARTITION_FIELDS)) {
       return new String[0];
     }
-    return conf.getString(FlinkOptions.PARTITION_PATH_FIELD).split(",");
+    return conf.getString(FlinkOptions.HIVE_SYNC_PARTITION_FIELDS).split(",");
   }

Review Comment:
   In `HiveSyncContext`, `PARTITION_PATH_FIELD` is assigned to the Hive sync partition fields. I think these two options, `PARTITION_PATH_FIELD` and `HIVE_SYNC_PARTITION_FIELDS`, have different meanings in Hudi:

   - `PARTITION_PATH_FIELD` is used by the Hudi KeyGenerator to compute the partition path.
   - `HIVE_SYNC_PARTITION_FIELDS` is used by Hive sync to declare the partition fields of the Hive table.
   
   The function _extractPartitionKeys_ should return the Hive partition field keys rather than the Hudi partition path field. Confusing the two values can cause errors in some cases.
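   A minimal sketch of the intended behavior, using a plain `Map` as a stand-in for Flink's `Configuration`. The class name, the string keys, and the empty-string check (which only approximates `FlinkOptions.isDefaultValueDefined`) are assumptions for illustration, not Hudi code. It shows the partition keys handed to Hive coming from `HIVE_SYNC_PARTITION_FIELDS`, while `PARTITION_PATH_FIELD` is left to the key generator.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   /** Hypothetical stand-in for FilePathUtils#extractPartitionKeys; not Hudi code. */
   class PartitionKeysSketch {
       // Hypothetical string keys standing in for the two FlinkOptions ConfigOptions.
       static final String PARTITION_PATH_FIELD = "PARTITION_PATH_FIELD";
       static final String HIVE_SYNC_PARTITION_FIELDS = "HIVE_SYNC_PARTITION_FIELDS";

       /** Returns the Hive partition keys, or an empty array when none are configured. */
       static String[] extractPartitionKeys(Map<String, String> conf) {
           String fields = conf.getOrDefault(HIVE_SYNC_PARTITION_FIELDS, "");
           if (fields.isEmpty()) {
               // Approximates the "default value defined" branch: no Hive partition fields set.
               return new String[0];
           }
           return fields.split(",");
       }

       public static void main(String[] args) {
           Map<String, String> conf = new HashMap<>();
           conf.put(PARTITION_PATH_FIELD, "datetime");     // read by the key generator
           conf.put(HIVE_SYNC_PARTITION_FIELDS, "inc_day"); // read by Hive sync
           System.out.println(String.join(",", extractPartitionKeys(conf))); // prints inc_day
       }
   }
   ```

   The point of the sketch is only which option is read; the key generator's own field is never consulted here.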
   
   
   For example, suppose we use `TimestampBasedAvroKeyGenerator` and set the Hudi partition path field to the same value as the Hive partition fields. This causes problems:
   ```
   PARTITION_PATH_FIELD=datetime
   HIVE_SYNC_PARTITION_FIELDS=datetime
   ```
   
   **In Hudi:** the key generator takes a value such as _1596074902000L_ and converts it to a string partition path like _2020-07-30_.
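   A sketch of that conversion with `java.time`, assuming the key generator is configured for epoch-millisecond input, a `yyyy-MM-dd` output format, and UTC. The comment does not state these settings, and this is not the `TimestampBasedAvroKeyGenerator` implementation; `PartitionPathSketch` and `toPartitionPath` are hypothetical names.

   ```java
   import java.time.Instant;
   import java.time.ZoneId;
   import java.time.format.DateTimeFormatter;

   /** Hypothetical sketch of a timestamp-to-partition-path conversion. */
   class PartitionPathSketch {
       // Assumed output format; the comment only shows 1596074902000L becoming "2020-07-30".
       private static final DateTimeFormatter OUTPUT_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

       /** Formats an epoch-millisecond timestamp as a date partition path in the given zone. */
       static String toPartitionPath(long epochMillis, ZoneId zone) {
           return Instant.ofEpochMilli(epochMillis).atZone(zone).format(OUTPUT_FORMAT);
       }

       public static void main(String[] args) {
           System.out.println(toPartitionPath(1596074902000L, ZoneId.of("UTC"))); // prints 2020-07-30
       }
   }
   ```

   The timestamp falls at 02:08 UTC (10:08 in UTC+8), so both zones yield the same date here; near midnight the configured zone would change the partition.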
   **In Hive:** we get a table like this:
   ```
    CREATE EXTERNAL TABLE `testTable`(
      `_hoodie_commit_time` string COMMENT '',       
      `_hoodie_commit_seqno` string COMMENT '',          
      `_hoodie_record_key` string COMMENT '',
      `_hoodie_partition_path` string COMMENT '',
      `_hoodie_file_name` string COMMENT '',               
      `id` int COMMENT '',
      `datetime` bigint COMMENT ''        
      )
    PARTITIONED BY (`datetime` string COMMENT '')...
   ```
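   The partition value _datetime=2020-07-30_ is also added to Hive. As a result, we can't read the original `datetime` value from this Hive table, and the table's partitioning is broken too, because the `bigint` data column and the `string` partition column share the name `datetime` and hold conflicting values.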
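   This kind of conflict could be detected before syncing by comparing the Hive partition fields with the table's data columns. The following is a hypothetical check (`PartitionConflictSketch` is not part of Hudi). It compares names case-insensitively because Hive identifiers are case-insensitive, and it omits the `_hoodie_*` meta columns for brevity.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Locale;
   import java.util.Map;
   import java.util.Set;

   /** Hypothetical pre-sync check; not part of Hudi's Hive sync. */
   class PartitionConflictSketch {
       /** Returns the partition fields whose names collide with a data column. */
       static List<String> conflictingColumns(Map<String, String> dataColumns, List<String> partitionFields) {
           // Hive identifiers are case-insensitive, so compare lower-cased names.
           Set<String> dataNames = new HashSet<>();
           for (String name : dataColumns.keySet()) {
               dataNames.add(name.toLowerCase(Locale.ROOT));
           }
           List<String> conflicts = new ArrayList<>();
           for (String field : partitionFields) {
               if (dataNames.contains(field.toLowerCase(Locale.ROOT))) {
                   conflicts.add(field);
               }
           }
           return conflicts;
       }

       public static void main(String[] args) {
           // Data columns from the example table (meta columns omitted), name -> Hive type.
           Map<String, String> dataColumns = new LinkedHashMap<>();
           dataColumns.put("id", "int");
           dataColumns.put("datetime", "bigint");
           System.out.println(conflictingColumns(dataColumns, Arrays.asList("datetime"))); // prints [datetime]
           System.out.println(conflictingColumns(dataColumns, Arrays.asList("inc_day")));  // prints []
       }
   }
   ```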
   
   Now set `PARTITION_PATH_FIELD` to a different value from `HIVE_SYNC_PARTITION_FIELDS`:
   ```
   PARTITION_PATH_FIELD=datetime
   HIVE_SYNC_PARTITION_FIELDS=inc_day
   ```
   We then get a table like this:
   ```
   CREATE EXTERNAL TABLE `testTable`(
      `_hoodie_commit_time` string COMMENT '',       
      `_hoodie_commit_seqno` string COMMENT '',          
      `_hoodie_record_key` string COMMENT '',
      `_hoodie_partition_path` string COMMENT '',
      `_hoodie_file_name` string COMMENT '',               
      `id` int COMMENT '',
      `datetime` bigint COMMENT ''        
      )
    PARTITIONED BY (`inc_day` string COMMENT '')...
   ```
   In this case we can read the _datetime_ value normally, and _inc_day_ also works as a partition field.
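   With distinct names, the Hive partition value comes from the Hudi partition path while `datetime` stays an ordinary `bigint` data column. The hypothetical helper below pairs each Hive partition field with one `/`-separated segment of the partition path to illustrate this; the one-segment-per-field layout is an assumption, not Hudi's own extraction logic.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   /** Hypothetical sketch of building a Hive partition spec; not Hudi code. */
   class PartitionSpecSketch {
       /** Pairs each Hive partition field with the matching '/'-separated segment of the partition path. */
       static Map<String, String> toPartitionSpec(String[] hivePartitionFields, String hudiPartitionPath) {
           String[] values = hudiPartitionPath.split("/");
           if (values.length != hivePartitionFields.length) {
               throw new IllegalArgumentException("Expected " + hivePartitionFields.length
                   + " partition values but got " + values.length + ": " + hudiPartitionPath);
           }
           Map<String, String> spec = new LinkedHashMap<>();
           for (int i = 0; i < hivePartitionFields.length; i++) {
               spec.put(hivePartitionFields[i], values[i]);
           }
           return spec;
       }

       public static void main(String[] args) {
           // Only the partition column is derived from the path; the `datetime` data column is untouched.
           System.out.println(toPartitionSpec(new String[] {"inc_day"}, "2020-07-30")); // prints {inc_day=2020-07-30}
       }
   }
   ```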



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
