Abacn commented on PR #26746: URL: https://github.com/apache/beam/pull/26746#issuecomment-1553459336
Ran some tests, reading from tpcds_1T.web_sales:

- on master, runner v1: 11562 logs searching for pattern gs://.../temp/BigQueryExtractTemp/42fa897c43704ed7998868ebf83e9198/
- this branch, runner v1: 5781 logs for gs://.../temp/BigQueryExtractTemp/ebd9c20b02f142439a837fbf99d72e25/. Each file is matched only 3 times (2 during split, 1 before delete). That is a decrease by half, meaning half as many match requests are made.
- on master, runner v2: 3908 logs searching for pattern "gs://.../temp/BigQueryExtractTemp/8568e876f4404544a69b996c5ff47ca4/". Each file is matched only 2 times (during split, before delete).

Tested on 10 workers (n1s1); runner v1 has 520k records/sec throughput, while runner v2 has 600k records/sec throughput.
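For reference, the arithmetic behind the log counts reported above can be checked with a short sketch. The variable names are hypothetical, and the file counts are inferred from the stated matches per file rather than reported in the test. The per-file figure for master on runner v1 additionally assumes that run exported the same number of files as the branch run, which was not measured directly.

```python
# Log counts reported in the comment (match-request log entries per run).
master_v1_logs = 11562
branch_v1_logs = 5781
master_v2_logs = 3908

# Matches per file as stated in the comment.
branch_v1_matches_per_file = 3  # 2 during split, 1 before delete
master_v2_matches_per_file = 2  # during split, before delete

# Ratio of match requests on this branch vs. master (runner v1).
reduction_ratio = branch_v1_logs / master_v1_logs

# Inferred (not reported) number of extracted files per run.
branch_files = branch_v1_logs // branch_v1_matches_per_file
v2_files = master_v2_logs // master_v2_matches_per_file

# Assumption: the master runner v1 run exported as many files as the branch run.
master_v1_matches_per_file = master_v1_logs / branch_files

print(f"branch/master match requests (runner v1): {reduction_ratio}")
print(f"inferred files, this branch runner v1: {branch_files}")
print(f"inferred files, master runner v2: {v2_files}")
print(f"implied matches per file, master runner v1: {master_v1_matches_per_file}")
```

Under those assumptions, the numbers are consistent with each file being matched roughly twice as often on master (runner v1) as on this branch.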
