ChiehFu commented on issue #10121: URL: https://github.com/apache/hudi/issues/10121#issuecomment-1817123517
@ad1happy2go Overall, we observed an increase of up to 100% in the duration of our upsert jobs after upgrading to Hudi 0.12.3, with no significant change in the data size those jobs process. The chart below shows the duration (in minutes) of our 10-minute incremental upsert jobs; the upgrade was done on 11/11, which is the point from which job durations started increasing.

Also, generally speaking, is it normal for tasks in the write stage **"Doing partition and writing data (count at HoodieSparkSqlWriter.scala:721)"** to take up to 10 minutes to write a parquet file of roughly 100 MB-300 MB to S3? I wonder what else could contribute to the time taken by each task of that particular stage while writing parquet files.
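For reference, our incremental upsert jobs follow the standard Hudi Spark datasource write pattern, roughly like the sketch below (table name, key/precombine/partition columns, and the S3 path are placeholders, not our actual values). This code path goes through `HoodieSparkSqlWriter`, which runs the "Doing partition and writing data" stage mentioned above.

```scala
// Illustrative sketch of the upsert write; identifiers below are placeholders.
import org.apache.spark.sql.SaveMode
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig

incomingBatchDf.write
  .format("hudi")
  .option(HoodieWriteConfig.TBL_NAME.key, "my_table")   // placeholder table name
  .option(RECORDKEY_FIELD.key, "record_key")            // placeholder record key column
  .option(PRECOMBINE_FIELD.key, "event_ts")             // placeholder precombine column
  .option(PARTITIONPATH_FIELD.key, "dt")                // placeholder partition column
  .option(OPERATION.key, UPSERT_OPERATION_OPT_VAL)      // upsert operation, as above
  .mode(SaveMode.Append)
  .save("s3://bucket/path/to/table")                    // placeholder S3 base path
```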
