danny0405 commented on PR #18384:
URL: https://github.com/apache/hudi/pull/18384#issuecomment-4195977844

   > Users who disable it to save storage lose incremental query capability
   > (which requires _hoodie_commit_time). Fields like _hoodie_record_key,
   > _hoodie_partition_path, and _hoodie_file_name can be virtualized and don't need
   > physical storage.
   
   @prashantwason I totally agree with the pain points here but have some 
different thoughts around the solution:
   
   1. To avoid losing incremental query capability, can we always populate
`_hoodie_commit_time`, even when `populateMetadataFields` is explicitly set to
false, and add a new config flag that allows selective population?
   2. For metadata fields that can be virtualized, can we treat this as a pure
improvement: never populate them on the write side and always derive them on
the reader side, so that in the future we have a chance to remove them from
the table schema entirely?
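
To make the two points concrete, here is a minimal sketch of the idea, not actual Hudi code. The class `MetaFieldSketch` and its methods are hypothetical names for illustration only. It assumes the record key can be rebuilt from a single data column, that the partition path and file name are known from the file being read, and that commit times are fixed-width timestamp strings that sort lexicographically.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these class or method names are Hudi APIs.
final class MetaFieldSketch {
    static final String COMMIT_TIME = "_hoodie_commit_time";
    static final String RECORD_KEY = "_hoodie_record_key";
    static final String PARTITION_PATH = "_hoodie_partition_path";
    static final String FILE_NAME = "_hoodie_file_name";

    // Write side: store the data columns plus the commit time only.
    public static Map<String, Object> toStoredRow(Map<String, Object> dataRow, String commitTime) {
        Map<String, Object> stored = new LinkedHashMap<>(dataRow);
        stored.put(COMMIT_TIME, commitTime);
        return stored;
    }

    // Read side: derive the virtual fields from a data column and the file context.
    public static Map<String, Object> toLogicalRow(Map<String, Object> storedRow, String keyColumn,
                                                   String partitionPath, String fileName) {
        Map<String, Object> logical = new LinkedHashMap<>(storedRow);
        logical.put(RECORD_KEY, String.valueOf(storedRow.get(keyColumn)));
        logical.put(PARTITION_PATH, partitionPath);
        logical.put(FILE_NAME, fileName);
        return logical;
    }

    // Incremental filter: keeps rows committed strictly after the given instant.
    // Assumes fixed-width timestamp strings, so string order matches time order.
    public static boolean committedAfter(Map<String, Object> storedRow, String beginInstant) {
        return ((String) storedRow.get(COMMIT_TIME)).compareTo(beginInstant) > 0;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("id", "u1");
        data.put("city", "sf");
        Map<String, Object> stored = toStoredRow(data, "20240102000000");
        Map<String, Object> logical = toLogicalRow(stored, "id", "city=sf", "file-1.parquet");
        System.out.println(stored.keySet());
        System.out.println(logical.get(RECORD_KEY));
        System.out.println(committedAfter(stored, "20240101000000"));
    }
}
```

In this sketch the stored row never carries the three virtualizable fields, so they could later be dropped from the table schema without a data rewrite, while the commit time stays available for incremental filtering.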
   
   In general, it seems we do not need that much flexibility to enable or
disable population of each metadata field in real production environments.

