vinothchandar edited a comment on pull request #1922: URL: https://github.com/apache/hudi/pull/1922#issuecomment-669699937
@bschell the metadata columns are actually heavily compressible, except for `_hoodie_record_key`. And since that one is derived using a key generator (whose implementation can change), it makes sense to pull it out as a separate field.

Here's how an incremental query on COW uses `_hoodie_commit_time` to return precisely the updated records. During a merge/compaction we may rewrite a base parquet file that contains records written at different times.
https://github.com/apache/hudi/blob/51ea27d665d8053895dd047ca85e3338b357a81d/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala#L121

It's actually a core design choice, one that sets Hudi apart from systems that only provide file-level incremental changes for updates.
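
To make the record-level vs. file-level distinction concrete, here is a minimal Python sketch. It is not Hudi code: `Record`, `base_file`, and both functions are hypothetical names invented for illustration, and the exclusive-begin / inclusive-end commit range is an assumption of this example rather than a statement about what `IncrementalRelation` does.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """Hypothetical row; the first two fields stand in for Hudi metadata columns."""
    commit_time: str  # stands in for _hoodie_commit_time
    record_key: str   # stands in for _hoodie_record_key
    value: int


# A base file rewritten at commit "003". The rewrite carries over rows that
# were last written at commits "001" and "002" next to the row updated at "003".
base_file: List[Record] = [
    Record("001", "k1", 10),
    Record("002", "k2", 20),
    Record("003", "k3", 31),
]


def file_level_changes(records: List[Record], begin: str, end: str) -> List[Record]:
    """File-level view: if any row falls in (begin, end], the whole file is returned."""
    touched = any(begin < r.commit_time <= end for r in records)
    return list(records) if touched else []


def record_level_changes(records: List[Record], begin: str, end: str) -> List[Record]:
    """Record-level view: keep only rows whose commit time falls in (begin, end]."""
    return [r for r in records if begin < r.commit_time <= end]


if __name__ == "__main__":
    # Same commit range, two different answers.
    print(len(file_level_changes(base_file, "002", "003")))
    print(record_level_changes(base_file, "002", "003"))
```

With a per-record commit time, a consumer asking for changes after commit "002" gets only the row written at "003", even though the rewritten file also holds older rows. Without it, the consumer would have to treat every row in the rewritten file as changed.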
