[GitHub] [hudi] KarthickAN commented on issue #2066: [SUPPORT] Hudi is increasing the storage size big time

GitBox Sat, 05 Sep 2020 09:42:08 -0700


KarthickAN commented on issue #2066:
URL: https://github.com/apache/hudi/issues/2066#issuecomment-687633973



   @bvaradar Thank you for responding. We use those 5 fields because together 
they identify a unique record in our dataset. I inspected the parquet file 
produced by Hudi and can see that the hoodie record key field takes up most of 
the space, with the other Hudi metadata fields adding to it. 
   
   The total size of the file is 700MB, of which the actual data is around 
250MB; the rest goes to Hudi metadata. The hoodie record key alone accounts 
for 350MB in our case. 
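
   For anyone who wants to reproduce this kind of breakdown, here is a minimal 
sketch of one way to get per-column sizes from the parquet footer. It assumes 
pyarrow is installed; the helper names `column_sizes` and `split_hudi_meta` and 
the example path are hypothetical, not part of Hudi. It relies on the Hudi 
metadata columns sharing the `_hoodie_` name prefix.

```python
from collections import defaultdict


def column_sizes(path):
    """Sum the compressed bytes of each column across all row groups."""
    import pyarrow.parquet as pq  # assumption: pyarrow is available

    meta = pq.ParquetFile(path).metadata
    sizes = defaultdict(int)
    for rg in range(meta.num_row_groups):
        row_group = meta.row_group(rg)
        for c in range(row_group.num_columns):
            col = row_group.column(c)
            sizes[col.path_in_schema] += col.total_compressed_size
    return dict(sizes)


def split_hudi_meta(sizes, prefix="_hoodie_"):
    """Return (bytes in Hudi metadata columns, bytes in all other columns)."""
    meta = sum(v for k, v in sizes.items() if k.startswith(prefix))
    data = sum(v for k, v in sizes.items() if not k.startswith(prefix))
    return meta, data


# Example usage (hypothetical file name):
# sizes = column_sizes("some-hudi-base-file.parquet")
# for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
#     print(f"{name}: {size / 1e6:.1f} MB")
# print(split_hudi_meta(sizes))
```

   These numbers come from the column chunk metadata only, so they will not 
add up exactly to the size of the file on disk.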
   
   I noticed the community emails about making the record key virtual. When 
can we expect that to be released? In the meantime, is there any workaround 
for this issue?


