rubenssoto opened a new issue #2463: URL: https://github.com/apache/hudi/issues/2463
Hello guys, I'm testing Hudi performance in my scenarios. The table is 4 GB of parquet files, and I'm using 2 executors with 5 cores each and 32 GB of memory. The operation is upsert because I need deduplication, and only one 200 MB file was rewritten.

<img width="1680" alt="Captura de Tela 2021-01-19 às 22 57 10" src="https://user-images.githubusercontent.com/36298331/105116535-b81fd980-5aa9-11eb-9bfb-3e6d25f5813b.png">

The whole operation took 2.1 minutes, with most of the time spent in the job "Getting small files from partitions". That job is probably also writing the parquet file to S3, but isn't 1.6 minutes to write 200 MB too long?
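For context, here is a minimal sketch (PySpark-style) of the kind of upsert configuration involved in this scenario, including the small-file settings that drive the "Getting small files from partitions" step. The table and field names are hypothetical placeholders, not taken from the issue:

```python
# Hypothetical Hudi upsert options matching the scenario above.
# Table name, record key, and precombine field are placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",                    # hypothetical name
    "hoodie.datasource.write.operation": "upsert",      # dedup via upsert
    "hoodie.datasource.write.recordkey.field": "id",    # hypothetical key
    "hoodie.datasource.write.precombine.field": "ts",   # hypothetical field
    # Parquet files below this size (bytes) count as "small" and become
    # candidates for rewriting, which is what the small-files job looks up.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
    # Target maximum size (bytes) for parquet files Hudi writes.
    "hoodie.parquet.max.file.size": str(200 * 1024 * 1024),
}

# Usage (requires a SparkSession with the Hudi bundle on the classpath):
# df.write.format("hudi").options(**hudi_options).mode("append").save(s3_path)
```

With an upsert, Hudi first locates small files in the touched partitions so incoming records can be bin-packed into them, then rewrites those files whole, which is why a single 200 MB file can be rewritten even for a small batch of updates.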
