[GitHub] [hudi] bvaradar commented on issue #2252: Hudi has high S3 requests

GitBox Thu, 03 Dec 2020 19:07:13 -0800


bvaradar commented on issue #2252:
URL: https://github.com/apache/hudi/issues/2252#issuecomment-738533973



   Regarding the file sizes: as long as you are able to see the same rows, it 
is good :)
   The difference could be coming from compression. By default, Hudi writes 
Parquet files with gzip compression, whereas Spark uses snappy. Gzip is 
expected to give a better compression ratio.
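
   To compare file sizes like for like, one option is to make the Hudi write 
use the same codec as the plain Spark write. A minimal sketch, assuming a 
PySpark job: `hoodie.parquet.compression.codec` and `hoodie.table.name` are 
Hudi config keys, while the helper name `hudi_write_options`, the table name 
`my_table`, and `base_path` are placeholders made up for illustration.

```python
def hudi_write_options(table_name, codec="snappy"):
    """Hypothetical helper: Hudi write options that override the Parquet codec."""
    return {
        "hoodie.table.name": table_name,
        # Hudi defaults to gzip; set this to match Spark's snappy default.
        "hoodie.parquet.compression.codec": codec,
    }


options = hudi_write_options("my_table")  # "my_table" is a placeholder name

# With a live SparkSession and a DataFrame `df`, the options would be passed as:
# df.write.format("hudi").options(**options).mode("append").save(base_path)
```

   A real write needs the usual record key and precombine options as well; 
they are left out here because they do not affect compression.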
   
   Regarding the HEAD requests, I think they are likely coming from defensive 
checks done to ensure partitions are created. Is it possible to inspect the 
HEAD requests and see which paths are being queried?
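
   One possible way to see which paths are queried, assuming S3 server access 
logging is enabled on the bucket, is to count HEAD operations per parent path 
in the access logs. This is only a sketch: the regex covers just the leading 
fields of the access log format, the function name `head_requests_by_parent` 
is made up, and the sample lines are invented for illustration, not real logs.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log record:
# owner, bucket, [time], remote IP, requester, request ID, operation, key
LOG_PREFIX = re.compile(
    r"^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) "
    r"(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) "
)


def head_requests_by_parent(lines):
    """Count HEAD operations grouped by the parent path of the requested key."""
    counts = Counter()
    for line in lines:
        match = LOG_PREFIX.match(line)
        if not match or not match.group("operation").startswith("REST.HEAD."):
            continue
        # Keys without a "/" are counted under the key itself.
        parent = match.group("key").rsplit("/", 1)[0]
        counts[parent] += 1
    return counts


# Invented sample records, shortened after the request URI.
SAMPLE_LOGS = [
    'own1 bkt [03/Dec/2020:19:00:00 +0000] 10.0.0.1 user1 R1 REST.HEAD.OBJECT tbl/2020/12/03/a.parquet "HEAD /tbl/2020/12/03/a.parquet HTTP/1.1" 200 -',
    'own1 bkt [03/Dec/2020:19:00:01 +0000] 10.0.0.1 user1 R2 REST.HEAD.OBJECT tbl/2020/12/03/b.parquet "HEAD /tbl/2020/12/03/b.parquet HTTP/1.1" 404 NoSuchKey',
    'own1 bkt [03/Dec/2020:19:00:02 +0000] 10.0.0.1 user1 R3 REST.GET.OBJECT tbl/2020/12/03/a.parquet "GET /tbl/2020/12/03/a.parquet HTTP/1.1" 200 -',
    'own1 bkt [03/Dec/2020:19:00:03 +0000] 10.0.0.1 user1 R4 REST.HEAD.OBJECT tbl/2020/12/04/c.parquet "HEAD /tbl/2020/12/04/c.parquet HTTP/1.1" 200 -',
]

for parent, count in head_requests_by_parent(SAMPLE_LOGS).most_common():
    print(parent, count)
```

   Sorting by count should show whether the HEAD requests concentrate on 
partition directories or on individual files.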


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
