onlywangyh commented on code in PR #5434:
URL: https://github.com/apache/hudi/pull/5434#discussion_r858555712
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FilePathUtils.java:
##########
@@ -420,9 +420,9 @@ public static org.apache.flink.core.fs.Path
toFlinkPath(Path path) {
* @return array of the partition fields
*/
public static String[]
extractPartitionKeys(org.apache.flink.configuration.Configuration conf) {
- if (FlinkOptions.isDefaultValueDefined(conf,
FlinkOptions.PARTITION_PATH_FIELD)) {
+ if (FlinkOptions.isDefaultValueDefined(conf,
FlinkOptions.HIVE_SYNC_PARTITION_FIELDS)) {
return new String[0];
}
- return conf.getString(FlinkOptions.PARTITION_PATH_FIELD).split(",");
+ return conf.getString(FlinkOptions.HIVE_SYNC_PARTITION_FIELDS).split(",");
}
Review Comment:
In HiveSyncContext, `PARTITION_PATH_FIELD` is assigned to the Hive sync
partition fields. I think these two params, `PARTITION_PATH_FIELD` and
`HIVE_SYNC_PARTITION_FIELDS`, have different meanings in Hudi.
`PARTITION_PATH_FIELD` is used by the Hudi KeyGenerator to build the partitionPath.
`HIVE_SYNC_PARTITION_FIELDS` is used by Hive to set the partition fields.
The function _extractPartitionKeys_ should return the Hive partition fields
rather than the Hudi partition path field. Confusing the two values can
cause errors.
In this case we use TimestampBasedAvroKeyGenerator and set the Hudi partition
path field to the same value as the Hive partition fields. This causes some
problems, see:
`
PARTITION_PATH_FIELD=datetime
HIVE_SYNC_PARTITION_FIELDS=datetime
`
**In Hudi:** we read the value _1596074902000L_ and convert it to a
string partition path such as _2020-07-30_.
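To make the conversion concrete, here is a minimal sketch using plain `java.time`. It is not the TimestampBasedAvroKeyGenerator code; the class and method names are hypothetical, and it assumes the value is epoch milliseconds formatted as a `yyyy-MM-dd` date in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class PartitionPathSketch {
  // Hypothetical helper (not Hudi code): formats epoch milliseconds
  // as an ISO date string in UTC, e.g. for use as a partition path.
  static String toPartitionPath(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis)
        .atZone(ZoneOffset.UTC)
        .toLocalDate()
        .format(DateTimeFormatter.ISO_LOCAL_DATE);
  }

  public static void main(String[] args) {
    long datetime = 1596074902000L; // the bigint value stored in the record
    System.out.println(toPartitionPath(datetime)); // prints 2020-07-30
  }
}
```

The record keeps the bigint value, while the partition path carries the formatted string, so the same field name ends up with two different types.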
**In Hive:** we get a table like:
```
CREATE EXTERNAL TABLE `testTable`(
`_hoodie_commit_time` string COMMENT '',
`_hoodie_commit_seqno` string COMMENT '',
`_hoodie_record_key` string COMMENT '',
`_hoodie_partition_path` string COMMENT '',
`_hoodie_file_name` string COMMENT '',
`id` int COMMENT '',
`datetime` bigint COMMENT ''
)
PARTITIONED BY (`datetime` string COMMENT '')...
```
The partition value _datetime=2020-07-30_ is also added to Hive. We
can't read the original datetime value from this Hive table, and the table
partitioning is broken, because the `datetime` column and the `datetime`
partition field conflict.
When we set `PARTITION_PATH_FIELD` to a value different from
`HIVE_SYNC_PARTITION_FIELDS`, like this:
`
PARTITION_PATH_FIELD=datetime
HIVE_SYNC_PARTITION_FIELDS=inc_day
`
we get a table like:
```
CREATE EXTERNAL TABLE `testTable`(
`_hoodie_commit_time` string COMMENT '',
`_hoodie_commit_seqno` string COMMENT '',
`_hoodie_record_key` string COMMENT '',
`_hoodie_partition_path` string COMMENT '',
`_hoodie_file_name` string COMMENT '',
`id` int COMMENT '',
`datetime` bigint COMMENT ''
)
PARTITIONED BY (`inc_day` string COMMENT '')...
```
This time we can read the _datetime_ value normally, and _inc_day_ also
works as a partition field.
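The difference between the two configurations can be sketched as a simple name-collision check. This is only an illustration: the class and method are hypothetical, nothing here is a Hudi or Hive API, and it assumes a comma-separated partition field list compared case-sensitively against the data column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartitionFieldCollisionSketch {
  // Hypothetical check (not Hudi code): returns the Hive partition fields
  // whose names are also used by a data column of the table.
  static List<String> collidingFields(List<String> dataColumns, String hivePartitionFields) {
    List<String> collisions = new ArrayList<>();
    for (String field : hivePartitionFields.split(",")) {
      String name = field.trim();
      if (!name.isEmpty() && dataColumns.contains(name)) {
        collisions.add(name);
      }
    }
    return collisions;
  }

  public static void main(String[] args) {
    List<String> dataColumns = Arrays.asList("id", "datetime");
    // First case: the partition field reuses the data column name.
    System.out.println(collidingFields(dataColumns, "datetime")); // prints [datetime]
    // Second case: a separate partition field name, no collision.
    System.out.println(collidingFields(dataColumns, "inc_day")); // prints []
  }
}
```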