[GitHub] [beam] prodriguezdefino commented on pull request #24274: Support Avro GenericRecord as a valid format for StorageWrite API on BigQueryIO

via GitHub Tue, 31 Jan 2023 18:31:34 -0800


prodriguezdefino commented on PR #24274:
URL: https://github.com/apache/beam/pull/24274#issuecomment-1411367208


   Tested this change with two very similar pipelines, both processing ~250 MB/s, which:
    * read from Pub/Sub,
    * transform the payload into Avro,
    * and then write to BigQuery: one pipeline writes the GenericRecords directly, while the other first converts them to `Row` using a Beam schema (the path currently possible with the code in `master`; the alternative would be to write `TableRow`).

   As expected, the difference in resource utilization is significant.
   
   Using Beam Rows as the BigQueryIO input format:
   <img width="483" alt="Screenshot 2023-01-31 at 6 22 08 PM" src="https://user-images.githubusercontent.com/3438103/215930472-152134d4-130a-4307-bcdc-469d1d0ee482.png">
   
   Using GenericRecords as the BigQueryIO input format:
   <img width="460" alt="Screenshot 2023-01-31 at 6 22 24 PM" src="https://user-images.githubusercontent.com/3438103/215930566-af48f252-52c7-487c-94d5-5126ec4d7b0e.png">
   
   At similar runtime, vCPU utilization is 176 vCPU/hr with Rows vs 117 vCPU/hr with GenericRecords: roughly a one-third reduction when using GenericRecords, or put the other way, the Row path uses about 50% more vCPU.
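
As a quick check on the two figures above, the sketch below computes the gap both ways: as a reduction relative to the Row pipeline and as the extra cost of the Row pipeline relative to the GenericRecord one. The class and method names are hypothetical and only for illustration, and it assumes 176 vCPU/hr belongs to the Row pipeline and 117 vCPU/hr to the GenericRecord pipeline, following the order of the screenshots.

```java
import java.util.Locale;

// Hypothetical helper, not part of Beam: compares two resource-usage figures.
final class VcpuComparison {

    // Fraction of the baseline usage that is saved by the improved pipeline.
    static double relativeReduction(double baseline, double improved) {
        return (baseline - improved) / baseline;
    }

    // Fraction by which the baseline usage exceeds the improved pipeline's usage.
    static double relativeOverhead(double baseline, double improved) {
        return (baseline - improved) / improved;
    }

    public static void main(String[] args) {
        double rowsVcpu = 176.0;           // assumed: Beam Row input format
        double genericRecordsVcpu = 117.0; // assumed: GenericRecord input format

        // Prints "reduction with GenericRecords: 33.5%"
        System.out.println(String.format(Locale.ROOT,
            "reduction with GenericRecords: %.1f%%",
            100 * relativeReduction(rowsVcpu, genericRecordsVcpu)));

        // Prints "extra vCPU used by Rows: 50.4%"
        System.out.println(String.format(Locale.ROOT,
            "extra vCPU used by Rows: %.1f%%",
            100 * relativeOverhead(rowsVcpu, genericRecordsVcpu)));
    }
}
```

Which of the two percentages to quote depends on the baseline chosen, so it helps to state the baseline alongside the number.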


