Reo-LEI edited a comment on issue #3118: URL: https://github.com/apache/iceberg/issues/3118#issuecomment-920547971
Actually, in our production environment we sync our MySQL data to Iceberg v2 tables with Flink. I tested this optimization by running the Spark rewrite action on these v2 tables, and found that it runs more than 2 times faster than before (456 min down to 168 min).

caseId | dataFile number | dataFile + deleteFile number | deleteFile read parallelism | target file size | executor number | executor-core number | spark job task number | elapsed time (min) | task min elapsed time | task median elapsed time | task max elapsed time
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
1 | 755 | 2585 | 1 | 100MB | 4 | 1 | 4 | 456 | 33s | 1.5m | 7.6h
2 | 755 | 2585 | 10 | 100MB | 4 | 1 | 4 | 168 | 26s | 37s | 2.8h
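As a quick sanity check on the numbers reported in this comment, here is a minimal sketch that recomputes the speedup from the two test cases. The values are copied from the comment; the variable and function names are only illustrative and not part of Iceberg or Spark.

```python
# Job elapsed time in minutes for each caseId, copied from the comment.
elapsed_min = {1: 456, 2: 168}

# Slowest single task for each caseId, reported in hours (7.6h and 2.8h),
# converted to minutes so it can be compared with the job elapsed time.
max_task_min = {1: 7.6 * 60, 2: 2.8 * 60}


def speedup(before, after):
    """Return how many times faster `after` is compared to `before`."""
    return before / after


# Speedup of the whole rewrite job (deleteFile read parallelism 1 vs 10).
job_speedup = speedup(elapsed_min[1], elapsed_min[2])

# Speedup of the slowest task alone.
max_task_speedup = speedup(max_task_min[1], max_task_min[2])

# Fraction of the job elapsed time taken by the slowest task.
straggler_share = {
    case: max_task_min[case] / elapsed_min[case] for case in elapsed_min
}

print(f"job speedup: {job_speedup:.2f}x")
print(f"slowest-task speedup: {max_task_speedup:.2f}x")
for case, share in straggler_share.items():
    print(f"case {case}: slowest task is {share:.0%} of job elapsed time")
```

The job speedup comes out at roughly 2.7x, which matches the "more than 2 times faster" claim. In both cases the slowest task (7.6h and 2.8h) accounts for essentially the entire elapsed time, so the job duration appears to be determined by that one long-running task.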
