onlywangyh commented on issue #5394:
URL: https://github.com/apache/hudi/issues/5394#issuecomment-1108132152

   If I keep the same params, e.g. `--partition-path-field=timestamp16, 
--hive-sync-partition-fields=timestamp16`, there will be some problems:
   1. In the schema, _timestamp16_ is a bigint. When we use _timestamp16_ as
a partition field, it becomes a string type in the Hive schema. The bigint
value can't be converted to that string, so `select timestamp16 from
testTable;` will also return null.
   2. In KeyGenerator we use _PARTITIONPATH_FIELD_NAME_ to get the partition
path, and we use _HIVE_SYNC_PARTITION_FIELDS_ as the partition field synced to
Hive. Pointing both params at the same field works fine when the field is a
string type. But TimestampBasedAvroKeyGenerator relies on timestamps for the
partition field: the field values are interpreted as timestamps, not just
converted to strings, when generating the partition path value for records.
So with TimestampBasedAvroKeyGenerator we get a string partition path like
`2020-07-30`. I think _PARTITIONPATH_FIELD_NAME_ and
_HIVE_SYNC_PARTITION_FIELDS_ should differ, so that the original partition
path field is not reused as the Hive partition field, which causes loss of
precision and conversion errors.
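
   To illustrate the mismatch, here is a minimal sketch in plain Python. It
does not call any Hudi API. The helper name, the epoch-millisecond input unit,
the `%Y-%m-%d` output format, and the sample value are all assumptions made
for illustration. It only shows that a bigint column value and the partition
path derived from it are two different strings:

```python
from datetime import datetime, timezone


def partition_path_from_epoch_millis(epoch_millis: int, fmt: str = "%Y-%m-%d") -> str:
    """Hypothetical helper: format a bigint epoch-millisecond value as a date string.

    This mimics the idea of a timestamp-based key generator (the value is
    interpreted as a timestamp, not stringified). It is not Hudi's code.
    """
    ts = datetime.fromtimestamp(epoch_millis // 1000, tz=timezone.utc)
    return ts.strftime(fmt)


# Assumed sample value for the bigint column timestamp16 (epoch milliseconds, UTC).
timestamp16 = 1596067200000

partition_path = partition_path_from_epoch_millis(timestamp16)
print(partition_path)                       # 2020-07-30
print(str(timestamp16) == partition_path)   # False: the data value and the path differ
```

   The stored column still holds the full bigint, while the partition only
keeps the day, so syncing the same name as the Hive partition field would
expose the truncated string instead of the original value.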
   

