danny0405 commented on PR #18384: URL: https://github.com/apache/hudi/pull/18384#issuecomment-4195977844
> Users who disable it to save storage lose incremental query capability (which requires _hoodie_commit_time). Fields like _hoodie_record_key, _hoodie_partition_path, and _hoodie_file_name can be virtualized and don't need physical storage.

@prashantwason I totally agree with the pain points here, but I have some different thoughts about the solution:

1. For losing incremental query capability: can we always populate `_hoodie_commit_time`, even when `populateMetadataFields` is explicitly set to false, and add a new config flag that allows selective population?
2. For metadata fields that can be virtualized: can we treat this as a pure improvement, i.e. never populate them on the write side and always deduce them on the reader side? That way, in the future, we have a chance to get rid of them entirely (from the table schema).

In general, it seems we do not need that much flexibility to enable or disable population of each metadata field in real production environments.
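
For illustration only, the reader-side deduction suggested in point 2 could look roughly like the sketch below. It is not Hudi code: the function names, their parameters, and the single-key-field layout are hypothetical, and it assumes the partition path is simply the file's directory relative to the table base path. Only `_hoodie_commit_time` is assumed to be physically stored, as in point 1.

```python
import posixpath

# Hypothetical sketch, not a Hudi API: derive the virtualizable metadata
# fields on the reader side instead of storing them in each data file.
def virtual_meta_fields(row: dict, file_path: str, base_path: str, key_field: str) -> dict:
    # Assumption: the partition path is the file's directory relative to the table base path.
    rel = posixpath.relpath(file_path, base_path)
    return {
        # Assumption: a single key field whose value is used directly as the record key.
        "_hoodie_record_key": str(row[key_field]),
        "_hoodie_partition_path": posixpath.dirname(rel),
        "_hoodie_file_name": posixpath.basename(rel),
    }


def read_row(row: dict, file_path: str, base_path: str, key_field: str) -> dict:
    # The stored row keeps only _hoodie_commit_time; the other fields are filled in here.
    return {**row, **virtual_meta_fields(row, file_path, base_path, key_field)}
```

With a stored row such as `{"id": 7, "_hoodie_commit_time": "20240101000000"}` read from `/tbl/2024/01/01/f1.parquet` under base path `/tbl`, the sketch fills in the key, partition path, and file name while leaving the stored commit time untouched, so incremental filtering on commit time would still work.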
