Abacn commented on PR #26746:
URL: https://github.com/apache/beam/pull/26746#issuecomment-1553459336

   Ran some tests reading from tpcds_1T.web_sales:
   
   - on master, runner v1
   
   11562 log entries matched the pattern 
gs://.../temp/BigQueryExtractTemp/42fa897c43704ed7998868ebf83e9198/
   
   - this branch, runner v1
   
   5781 log entries matched the pattern 
gs://.../temp/BigQueryExtractTemp/ebd9c20b02f142439a837fbf99d72e25/; basically 
each file is matched only 3 times (2 during split, 1 before delete)
   
   That is a decrease by half (this branch makes half as many match requests as master)
   
   - on master, runner v2
   
   3908 log entries matched the pattern 
"gs://.../temp/BigQueryExtractTemp/8568e876f4404544a69b996c5ff47ca4/"; basically 
each file is matched only 2 times (once during split, once before delete)
   
   Tested on 10 workers (n1s1): runner v1 reaches a throughput of 520k 
records/sec, while runner v2 reaches 600k records/sec.
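
For anyone who wants to double-check the arithmetic behind the numbers above, here is a minimal Python sketch. It only uses the counts quoted in this comment. The helper names (`request_ratio`, `implied_file_count`) are made up for illustration, and the implied file count rests on the assumption that every matching log entry corresponds to exactly one match request for one file.

```python
# Log-entry counts quoted in the comment above.
MASTER_V1_LOGS = 11562   # master, runner v1
BRANCH_V1_LOGS = 5781    # this branch, runner v1
MASTER_V2_LOGS = 3908    # master, runner v2

# Throughput figures quoted in the comment above (records/sec).
RUNNER_V1_THROUGHPUT = 520_000
RUNNER_V2_THROUGHPUT = 600_000


def request_ratio(baseline: int, candidate: int) -> float:
    """Fraction of the baseline's match requests that the candidate issues."""
    return candidate / baseline


def implied_file_count(log_entries: int, matches_per_file: int) -> float:
    """Number of files implied by a log count.

    Assumption (not stated in the comment): each log entry is one match
    request for one file, and every file is matched the same number of times.
    """
    return log_entries / matches_per_file


if __name__ == "__main__":
    ratio = request_ratio(MASTER_V1_LOGS, BRANCH_V1_LOGS)
    print(f"branch/master match requests on runner v1: {ratio:.2f}")

    # 3 matches per file on this branch (runner v1), 2 on master (runner v2).
    print("implied files, branch v1:", implied_file_count(BRANCH_V1_LOGS, 3))
    print("implied files, master v2:", implied_file_count(MASTER_V2_LOGS, 2))

    speedup = RUNNER_V2_THROUGHPUT / RUNNER_V1_THROUGHPUT
    print(f"runner v2 / runner v1 throughput: {speedup:.2f}x")
```

Each run exported to a different temp directory, so the implied file counts do not have to agree between runs.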


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.