geserdugarov commented on PR #17827: URL: https://github.com/apache/hudi/pull/17827#issuecomment-3753294494
> Overall thought I have here is whether we simply make this behavior adapt dynamically i.e choose sorted-merge vs hash-merge based on whether the base, log files are sorted or not -- instead of introducing this as a separate layout. This will make it very flexible for users, and even make it possible for users to migrate existing tables easily.

I totally support this proposal. The main question, though, is how to mark whether a file is sorted or not. For log files, we can add a corresponding flag to the metadata. The Parquet format also supports this: the flag can be stored in `keyValueMetaData` in `FileMetaData`: https://github.com/apache/parquet-java/blob/aa41aa121ff29e360273675edc899ba1c2640047/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/FileMetaData.java#L52
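
To make the idea concrete, here is a rough sketch of how the reader could pick the merge strategy from such a flag. It is only an illustration, not code from this PR: the class `MergeStrategyChooser`, the enum `MergeStrategy`, and the key name `hudi.sorted.by.record.key` are all hypothetical, and plain `Map<String, String>` values stand in for the key-value metadata that would be read from the Parquet footer or from the log file metadata.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Hudi or parquet-java.
final class MergeStrategyChooser {

    // Hypothetical metadata key; the real name would be decided in the PR.
    static final String SORTED_KEY = "hudi.sorted.by.record.key";

    enum MergeStrategy { SORTED_MERGE, HASH_MERGE }

    // A file counts as sorted only if the flag is present and equals "true".
    // Files written before the flag existed have no entry and are treated as unsorted.
    static boolean isSorted(Map<String, String> keyValueMetaData) {
        return Boolean.parseBoolean(keyValueMetaData.get(SORTED_KEY));
    }

    // Sorted-merge is chosen only when the base file and every log file are sorted;
    // a single unsorted file makes the reader fall back to hash-merge.
    static MergeStrategy choose(Map<String, String> baseFileMeta,
                                List<Map<String, String>> logFileMetas) {
        boolean allSorted = isSorted(baseFileMeta)
                && logFileMetas.stream().allMatch(MergeStrategyChooser::isSorted);
        return allSorted ? MergeStrategy.SORTED_MERGE : MergeStrategy.HASH_MERGE;
    }

    public static void main(String[] args) {
        Map<String, String> sorted = Map.of(SORTED_KEY, "true");
        Map<String, String> legacy = Map.of(); // file written without the flag

        System.out.println(choose(sorted, List.of(sorted, sorted)));
        System.out.println(choose(sorted, List.of(sorted, legacy)));
    }
}
```

Treating a missing flag as "unsorted" is what would let existing tables migrate gradually: old files keep using hash-merge, and a file group switches to sorted-merge once all of its files carry the flag.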
