[GitHub] [iceberg] Reo-LEI edited a comment on issue #3118: Read delete files in parallel.

GitBox Wed, 15 Sep 2021 20:32:15 -0700


Reo-LEI edited a comment on issue #3118:
URL: https://github.com/apache/iceberg/issues/3118#issuecomment-920547971



   In our production environment, we sync MySQL data to Iceberg v2 tables with 
Flink. I tested this optimization by running the Spark rewrite action on these 
v2 tables. With a deleteFile read parallelism of 10 instead of 1, the job 
finished in 168 minutes instead of 456, more than 2 times faster than before.
   
   caseId | dataFile number | dataFile + deleteFile number | deleteFile read parallelism | target file size | executor number | executor-core number | spark job task number | elapsed time (min) | task min elapsed time | task median elapsed time | task max elapsed time
   -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
   1 | 755 | 2585 | 1 | 100MB | 4 | 1 | 4 | 456 | 33s | 1.5m | 7.6h
   2 | 755 | 2585 | 10 | 100MB | 4 | 1 | 4 | 168 | 26s | 37s | 2.8h
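
   For reference, the speedups implied by the table can be worked out with a few lines of Python. This is only a rough sketch: the numbers are copied from the two rows above, and the dictionary keys (`job`, `task_min`, `task_median`, `task_max`) are names chosen here for illustration, not anything from Iceberg or Spark.

```python
# Values copied from the table above; all durations converted to seconds.
# Key names are illustrative only.
case1 = {  # deleteFile read parallelism = 1
    "job": 456 * 60,
    "task_min": 33,
    "task_median": 1.5 * 60,
    "task_max": 7.6 * 3600,
}
case2 = {  # deleteFile read parallelism = 10
    "job": 168 * 60,
    "task_min": 26,
    "task_median": 37,
    "task_max": 2.8 * 3600,
}

# Speedup = time with parallelism 1 / time with parallelism 10.
speedup = {name: case1[name] / case2[name] for name in case1}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
# job: 2.71x
# task_min: 1.27x
# task_median: 2.43x
# task_max: 2.71x
```

   The whole-job speedup matches the speedup of the slowest task, which suggests the job's elapsed time is dominated by that one long-running task.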


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



