voonhous commented on code in PR #14092:
URL: https://github.com/apache/hudi/pull/14092#discussion_r2432732988


##########
rfc/rfc-80/rfc-80.md:
##########
@@ -81,8 +81,8 @@ The bucket is divided into multiple columngroups by column cluster. When columng
 ### Proposed Storage Format Changes
 After splitting the fileGroup by columngroup, the naming rules for base files and log files change. We add the cfName suffix at the end of all file names to facilitate Hudi itself to distinguish column groups. If it's not present, we assume default column group.
 So, new file name templates will be as follows:  
-- Base file: [file_id]\_[write_token]\_[begin_time][_cfName].[extension]  
-- Log file: [file_id]\_[begin_instant_time][_cfName].log.[version]_[write_token]  
+- Base file: [file_id]\_[write_token]\_[begin_time][_cgName_cgSegment].[extension]  
+- Log file: [file_id]\_[begin_instant_time][_cgName_cgSegment].log.[version]_[write_token]  

Review Comment:
   Just noticed this: given that `begin_time` is all digits, it would be nicer if we stuck to the convention of `begin_time` being at the end of the file name, right before the `extension`.
   
   Since `begin_time` is a fixed-length string, I feel it's more useful to put fixed-length details right before the extension. That way one can just jump to the period and move back **N** characters.
   
   I assume `cgName_cgSegment` is variable length. Placing it between `begin_time` and the extension might make extracting `begin_time` harder in the future, since users would have to fall back to regex instead of just using a few keyboard shortcuts.


