geserdugarov commented on PR #17827:
URL: https://github.com/apache/hudi/pull/17827#issuecomment-3753294494

   > Overall thought I have here is whether we simply make this behavior adapt 
dynamically, i.e., choose sorted-merge vs hash-merge based on whether the base 
and log files are sorted or not -- instead of introducing this as a separate 
layout. This will make it very flexible for users, and even make it possible 
for users to migrate existing tables easily.
   
   I fully support this proposal. The main open question, though, is how to 
mark whether a file is sorted. For log files, we can add a corresponding flag 
to the metadata. The Parquet format also supports this: the flag can be stored 
in `keyValueMetaData` in `FileMetaData`:
   
https://github.com/apache/parquet-java/blob/aa41aa121ff29e360273675edc899ba1c2640047/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/FileMetaData.java#L52
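
To make the idea concrete, here is a minimal sketch of how a reader could pick the merge strategy from such flags. It is an illustration only, not code from this PR: the key name `hudi.sorted.by.record.key`, the class `MergeStrategyChooser`, and the enum `MergeStrategy` are all hypothetical. The maps stand in for the per-file key-value metadata (for Parquet, what `FileMetaData.getKeyValueMetaData()` returns), and a missing flag is treated as "not sorted" so that existing tables fall back to hash-merge.

```java
import java.util.List;
import java.util.Map;

public class MergeStrategyChooser {
    // Hypothetical metadata key; not an existing Hudi or Parquet constant.
    static final String SORTED_KEY = "hudi.sorted.by.record.key";

    enum MergeStrategy { SORTED_MERGE, HASH_MERGE }

    // A file counts as sorted only if the flag is present and equals "true".
    // Boolean.parseBoolean(null) returns false, so unmarked files are unsorted.
    static boolean isSorted(Map<String, String> keyValueMetaData) {
        return Boolean.parseBoolean(keyValueMetaData.get(SORTED_KEY));
    }

    // Sorted-merge is chosen only when the base file and every log file are
    // marked as sorted; otherwise fall back to hash-merge.
    static MergeStrategy choose(Map<String, String> baseFileMeta,
                                List<Map<String, String>> logFileMetas) {
        boolean allSorted = isSorted(baseFileMeta)
                && logFileMetas.stream().allMatch(MergeStrategyChooser::isSorted);
        return allSorted ? MergeStrategy.SORTED_MERGE : MergeStrategy.HASH_MERGE;
    }

    public static void main(String[] args) {
        Map<String, String> sorted = Map.of(SORTED_KEY, "true");
        Map<String, String> unmarked = Map.of();

        System.out.println(choose(sorted, List.of(sorted, sorted)));
        System.out.println(choose(sorted, List.of(sorted, unmarked)));
    }
}
```

With this rule, a file group written before the flag existed simply has no flag and keeps using hash-merge, which would let existing tables migrate gradually.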


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
