bvaradar commented on issue #2252: URL: https://github.com/apache/hudi/issues/2252#issuecomment-728221544
Since you are seeing delete requests, can you check whether failures or cleaning are kicking in and inflating the number of S3 requests? Can you list your `.hoodie` folder to see if you have `.rollback` or `.clean` files? To compare apples to apples, you would have to discount those requests, as they come from additional functionality that Hudi provides on top of a plain parquet dataset.

Regarding the insert vs. bulk-insert PUT and HEAD requests, also check how many files got created in the Hudi dataset vs. the parquet dataset. You may want to tune parallelism and configure file sizing: https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoItoavoidcreatingtonsofsmallfiles

Hudi uses an optimistic approach to failure handling: it avoids writing to a tmp folder and re-copying, which performs badly on S3. To support this, it keeps additional marker files, which are tracked and deleted as part of the commit process.
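To check whether rollbacks or cleans are present, you can group the timeline files in `.hoodie` by their action suffix. This is a hypothetical helper sketch, assuming the table's `.hoodie` folder is accessible as a local or mounted path (for S3 you would list the prefix with the AWS CLI or SDK instead); the function name and path are illustrative:

```python
import os
from collections import Counter

def timeline_action_counts(hoodie_dir):
    """Group .hoodie timeline files by action suffix (commit, clean, rollback, ...)."""
    counts = Counter()
    for name in os.listdir(hoodie_dir):
        # Timeline files look like "<instant_time>.<action>",
        # e.g. "20201116103000.rollback" or "20201116103000.clean"
        ext = os.path.splitext(name)[1].lstrip(".")
        if ext:
            counts[ext] += 1
    return counts
```

A non-zero count for `rollback` or `clean` would indicate those extra requests are coming from failure handling or cleaning rather than from the write path itself.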
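For the file-sizing and parallelism tuning mentioned above, a sketch of the relevant write options might look like the following. The table name and the numeric values are illustrative assumptions, not recommendations; the right values depend on your data volume and cluster:

```python
# Illustrative Hudi write options for controlling parallelism and file sizing.
# The numbers below are example values only; tune them for your workload.
hudi_write_options = {
    "hoodie.table.name": "my_table",                 # example table name
    "hoodie.insert.shuffle.parallelism": "200",      # insert write parallelism
    "hoodie.bulkinsert.shuffle.parallelism": "200",  # bulk-insert write parallelism
    # Target max size per parquet file (~120 MB here)
    "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),
    # Files below this size are treated as "small" and appended to on later inserts
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
}

# Typical usage with a Spark DataFrame writer (sketch):
# df.write.format("hudi").options(**hudi_write_options).mode("append").save(base_path)
```

Lower parallelism and larger file-size targets generally mean fewer output files, and therefore fewer PUT/HEAD requests against S3.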
