bvaradar commented on issue #2252: URL: https://github.com/apache/hudi/issues/2252#issuecomment-738533973
Regarding the file sizes: as long as you are able to see the same rows, it is good :) The difference is probably coming from compression. By default, Hudi writes its Parquet files with gzip compression, whereas Spark's default Parquet codec is snappy. Gzip is expected to give a better compression ratio, so the Hudi files can end up smaller for the same data.

Regarding the HEAD requests, I think they are likely coming from defensive checks done to ensure partitions are created. Is it possible to observe the HEAD request payloads and see which path is being queried?
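If you want a like-for-like size comparison, one option is to write the Hudi table with the same codec Spark uses. Below is a minimal sketch that assumes the PySpark Hudi datasource. `hoodie.parquet.compression.codec` is Hudi's Parquet codec setting. The helper name `hudi_write_options`, the table name, and the record-key column are hypothetical placeholders, and a real write still needs the rest of your usual Hudi options, such as the precombine field.

```python
from typing import Dict, Optional


def hudi_write_options(
    table_name: str,
    codec: str = "snappy",
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build Hudi write options with an explicit Parquet codec.

    Pinning the codec to Spark's default (snappy) makes a file-size
    comparison between Hudi output and plain Spark output like-for-like.
    """
    options = {
        "hoodie.table.name": table_name,
        # Hudi's Parquet codec setting (gzip by default, per the comment above).
        "hoodie.parquet.compression.codec": codec.lower(),
    }
    # Caller-supplied options (record key, precombine field, ...) are merged last,
    # so they override anything set above.
    if extra:
        options.update(extra)
    return options


opts = hudi_write_options(
    "my_table",  # hypothetical table name
    extra={"hoodie.datasource.write.recordkey.field": "id"},  # hypothetical key column
)
print(opts["hoodie.parquet.compression.codec"])
# With a live SparkSession and a DataFrame `df` (not shown here), the write would be:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

You can also close the gap from the other direction. Setting Spark's `spark.sql.parquet.compression.codec` to `gzip` for the plain Spark write should bring its file sizes closer to Hudi's.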

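To find out which paths the HEAD requests target, one rough approach is to tally them per path from the storage access logs. The sketch below assumes those logs record each request line in the quoted `"HEAD /path HTTP/1.1"` form, as S3 server access logs do when logging is enabled on the bucket. The function name and the sample lines are hypothetical and do not come from a real run.

```python
import re
from collections import Counter
from typing import Iterable

# Matches a quoted HTTP request line such as "HEAD /bucket/path HTTP/1.1".
# Assumption: the access log records the request line in this quoted form.
HEAD_REQUEST = re.compile(r'"HEAD (\S+) HTTP/[\d.]+"')


def count_head_paths(lines: Iterable[str]) -> Counter:
    """Count HEAD requests per requested path, ignoring any query string."""
    counts: Counter = Counter()
    for line in lines:
        match = HEAD_REQUEST.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts


sample = [  # hypothetical log excerpts, not real output
    '... "HEAD /my-bucket/hudi_table/2020/12/03 HTTP/1.1" 404 ...',
    '... "HEAD /my-bucket/hudi_table/2020/12/03 HTTP/1.1" 404 ...',
    '... "HEAD /my-bucket/hudi_table/2020/12/04 HTTP/1.1" 200 ...',
    '... "GET /my-bucket/hudi_table/2020/12/04 HTTP/1.1" 200 ...',  # not a HEAD, skipped
]
# Print the most frequently probed paths first.
for path, n in count_head_paths(sample).most_common():
    print(n, path)
```

If most of the HEAD requests hit partition paths right around the write, that would support the idea that they come from the partition-existence checks.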