rubenssoto opened a new issue #2463:
URL: https://github.com/apache/hudi/issues/2463


   Hello Guys,
   
   I'm testing Hudi performance in my scenarios. The table holds 4 GB of
parquet files, and I'm using 2 executors with 5 cores and 32 GB of memory each. The
operation is an upsert, because I need deduplication, and only one 200 MB file
was rewritten.
   
   <img width="1680" alt="Captura de Tela 2021-01-19 às 22 57 10" 
src="https://user-images.githubusercontent.com/36298331/105116535-b81fd980-5aa9-11eb-9bfb-3e6d25f5813b.png">
   
   The operation took 2.1 minutes, and most of that time was spent in the job
"Getting small files from partitions". That job is probably also writing the
parquet file to S3, but isn't 1.6 minutes too long to write 200 MB?
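   For context, a minimal sketch of the Hudi write options such an upsert typically uses. The table name and the record-key/precombine field names below are placeholders I made up for illustration, not values from this issue; the two file-sizing knobs are the real Hudi configs that drive the small-file lookup the job name refers to.

```python
# Hypothetical Hudi upsert options; "my_table", "id" and "updated_at" are
# placeholders, not taken from the reported scenario.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.operation": "upsert",             # dedupe on the record key
    "hoodie.datasource.write.recordkey.field": "id",           # placeholder key field
    "hoodie.datasource.write.precombine.field": "updated_at",  # placeholder ordering field
    # File-sizing knobs behind the "Getting small files from partitions" step:
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),  # files below this are upsert candidates
    "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),     # target size when rewriting files
}

# With a SparkSession in scope, the write itself would look roughly like:
# df.write.format("hudi").options(**hudi_options).mode("append").save("s3://bucket/path")
```

   Because small files under `hoodie.parquet.small.file.limit` are picked up and rewritten on each upsert, a single 200 MB rewrite per commit is expected behavior rather than a bug.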


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
