[GitHub] [hudi] vinothchandar edited a comment on pull request #1922: [HUDI-1152] Add option to skip syncing Hudi metadata columns

GitBox Wed, 05 Aug 2020 22:17:07 -0700


vinothchandar edited a comment on pull request #1922:
URL: https://github.com/apache/hudi/pull/1922#issuecomment-669699937



   @bschell the metadata columns are actually highly compressible, except for `_hoodie_record_key`. Since that key is derived by a key generator (whose implementation can change), it makes sense to pull it out as a separate field.
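   A rough illustration of the compressibility point follows. It uses `zlib` only as a stand-in for Parquet's dictionary/RLE encoding (an assumption; real on-disk sizes depend on the Parquet writer settings), and all values are hypothetical. In a base file written by a single commit, `_hoodie_commit_time` and `_hoodie_file_name` repeat on every row, while `_hoodie_record_key` is unique per row.

```python
import uuid
import zlib

NUM_ROWS = 10_000

# Hypothetical column values for one base file written by a single commit.
commit_time_col = ["20200805221707"] * NUM_ROWS  # same instant on every row
file_name_col = ["a1b2c3-0_0-21-30_20200805221707.parquet"] * NUM_ROWS  # hypothetical name
record_key_col = [uuid.uuid4().hex for _ in range(NUM_ROWS)]  # unique per row


def compressed_size(column):
    """Bytes after zlib-compressing the newline-joined column (a rough proxy only)."""
    return len(zlib.compress("\n".join(column).encode("utf-8")))


for name, col in [("_hoodie_commit_time", commit_time_col),
                  ("_hoodie_file_name", file_name_col),
                  ("_hoodie_record_key", record_key_col)]:
    print(f"{name}: {compressed_size(col)} bytes compressed")
```

   The repeated columns shrink to a tiny fraction of their raw size, while the random keys stay large, which matches the claim that only the record key carries real storage cost.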
   
   Here's how an incremental query on a COW table uses `_hoodie_commit_time` to return precisely the updated records. During a merge/compaction, we may rewrite a base Parquet file with records written at different times, so the file's own commit alone cannot tell which of its records actually changed.
   
   
https://github.com/apache/hudi/blob/51ea27d665d8053895dd047ca85e3338b357a81d/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala#L121
   
   It's actually a core design choice that sets Hudi apart from systems that only provide file-level incremental changes for updates.
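   A minimal sketch of record-level versus file-level change detection, in plain Python rather than Spark. It assumes incremental-pull semantics of "commits after `begin`, up to and including `end`" and uses hypothetical record keys and instant times. It is not Hudi's `IncrementalRelation` code, only an illustration of filtering on a per-record commit time. Here a merge at instant `20200805000000` rewrote a base file that still carries older, unchanged records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    record_key: str   # plays the role of _hoodie_record_key
    commit_time: str  # plays the role of _hoodie_commit_time (fixed-width instant string)
    value: str


def file_level_changes(base_file: List[Record]) -> List[Record]:
    # A file-level system only knows the file was rewritten,
    # so every record in it looks "changed".
    return list(base_file)


def record_level_changes(base_file: List[Record], begin: str, end: str) -> List[Record]:
    # Keep only records whose own commit time is in (begin, end].
    # Instants are fixed-width digit strings, so string order equals time order.
    return [r for r in base_file if begin < r.commit_time <= end]


# Base file rewritten by a merge at 20200805000000: k1 and k2 were carried over
# unchanged from earlier commits; only k3 was written by the latest commit.
base_file = [
    Record("k1", "20200801000000", "a"),
    Record("k2", "20200803000000", "b"),
    Record("k3", "20200805000000", "c"),
]

print([r.record_key for r in file_level_changes(base_file)])  # ['k1', 'k2', 'k3']
print([r.record_key for r in record_level_changes(
    base_file, "20200803000000", "20200805000000")])          # ['k3']
```

   The file-level answer over-reports `k1` and `k2`; the record-level filter returns only what the latest commit actually wrote, which is why the commit time is kept per record rather than per file.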


