voonhous commented on code in PR #14092: URL: https://github.com/apache/hudi/pull/14092#discussion_r2432732988
##########
rfc/rfc-80/rfc-80.md:
##########
@@ -81,8 +81,8 @@ The bucket is divided into multiple columngroups by column cluster. When columng
 ### Proposed Storage Format Changes
 After splitting the fileGroup by columngroup, the naming rules for base files and log files change. We add the cfName suffix at the end of all file names to facilitate Hudi itself to distinguish column groups. If it's not present, we assume default column group.
 So, new file name templates will be as follows:
-- Base file: [file_id]\_[write_token]\_[begin_time][_cfName].[extension]
-- Log file: [file_id]\_[begin_instant_time][_cfName].log.[version]_[write_token]
+- Base file: [file_id]\_[write_token]\_[begin_time][_cgName_cgSegment].[extension]
+- Log file: [file_id]\_[begin_instant_time][_cgName_cgSegment].log.[version]_[write_token]

Review Comment:
   Just noticed this. `begin_time` is all digits and has a fixed length, so I think we should stick to the convention of keeping `begin_time` at the end of the file name, immediately before the `extension`. I feel it's more useful to put fixed-length fields right before the extension: one can split on the period and then take the **N** characters next to it.

   I assume `cgName_cgSegment` is variable length. Placing it between `begin_time` and the extension might make extracting `begin_time` harder in the future, because users would have to fall back to a regex instead of just using a few keyboard shortcuts.
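
To illustrate the trade-off raised in the review comment above, here is a minimal sketch of extracting `begin_time` from a base file name under both layouts. It is not Hudi code. The function names, the sample file names, and the reordered template `[file_id]_[write_token][_cgName_cgSegment]_[begin_time].[extension]` are hypothetical, and the 17-digit length of `begin_time` is an assumption made for the example.

```python
import re

# Assumption for this sketch: begin_time is a fixed-length, all-digit string.
BEGIN_TIME_LEN = 17


def begin_time_fixed_suffix(file_name: str) -> str:
    """begin_time sits directly before the extension (hypothetical layout).

    Split on the last period, then take the last N characters of the stem.
    """
    stem, _, _extension = file_name.rpartition(".")
    candidate = stem[-BEGIN_TIME_LEN:]
    if len(candidate) != BEGIN_TIME_LEN or not (candidate.isascii() and candidate.isdigit()):
        raise ValueError(f"no begin_time before the extension in {file_name!r}")
    return candidate


def begin_time_before_variable_suffix(file_name: str) -> str:
    """begin_time is followed by a variable-length column group suffix.

    The offset from the period is unknown, so a pattern match is needed.
    """
    match = re.search(rf"_(\d{{{BEGIN_TIME_LEN}}})(?=[_.])", file_name)
    if match is None:
        raise ValueError(f"no begin_time found in {file_name!r}")
    return match.group(1)


# Hypothetical base file names, one per layout.
reordered = "f1-0_1-2-3_cg1_0_20251015123045678.parquet"
as_proposed = "f1-0_1-2-3_20251015123045678_cg1_0.parquet"

print(begin_time_fixed_suffix(reordered))
print(begin_time_before_variable_suffix(as_proposed))
```

The first function needs only string slicing. The second depends on a regex and returns the first 17-digit run it finds, so it could pick the wrong field if a column group name ever happened to be a 17-digit string as well.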
