[jira] [Work logged] (BEAM-10917) Implement a BigQuery bounded source using the BigQuery storage API

ASF GitHub Bot (Jira) Thu, 05 Aug 2021 11:53:06 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-10917?focusedWorklogId=634731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634731
 ]


ASF GitHub Bot logged work on BEAM-10917:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/21 18:52
            Start Date: 05/Aug/21 18:52
    Worklog Time Spent: 10m 
      Work Description: kmjung commented on pull request #15185:
URL: https://github.com/apache/beam/pull/15185#issuecomment-893699939


   > Out of curiosity, what's the average throughput when fetching data with the BQ 
Storage API in the us-central region? What would you consider reasonable 
benchmark numbers?
   
   The single-stream throughput for the storage API depends heavily on your 
schema width and the data format you're using, as well as some other factors, 
but with a ~50 column schema I would expect that the API should be capable of 
sending 40-50 MiB/s (~30k rows/second) per stream. For Java-based pipelines, 
usually the limiting factor is gRPC flow control -- pipelines usually can't 
process data as fast as the API streams it -- and I would expect the same to be 
the case here.
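
   As a rough back-of-envelope sketch of what those figures imply (not a 
benchmark): the helpers below derive the average row size from the quoted 
per-stream numbers and estimate how many streams a target aggregate rate would 
need. The function names and the 1000 MiB/s target are hypothetical, and the 
estimate assumes throughput scales linearly with stream count and that the 
consumer keeps up, which the comment above suggests is often not the case.

```python
import math

MIB = 1024 * 1024  # bytes per MiB


def implied_row_size_bytes(throughput_mib_s: float, rows_per_s: float) -> float:
    """Average bytes per row implied by a byte rate and a row rate."""
    return throughput_mib_s * MIB / rows_per_s


def streams_needed(target_mib_s: float, per_stream_mib_s: float) -> int:
    """Streams required for a target aggregate rate, assuming linear scaling."""
    return math.ceil(target_mib_s / per_stream_mib_s)


if __name__ == "__main__":
    # Figures quoted in the comment: 40-50 MiB/s at roughly 30k rows/second.
    for rate in (40, 50):
        size = implied_row_size_bytes(rate, 30_000)
        print(f"{rate} MiB/s at 30k rows/s -> about {size:.0f} bytes per row")

    # Hypothetical target of 1000 MiB/s aggregate read throughput.
    print("streams at 40 MiB/s each:", streams_needed(1000, 40))
    print("streams at 50 MiB/s each:", streams_needed(1000, 50))
```

   With a ~50 column schema, this works out to roughly 1.4-1.7 KiB per row, 
which is consistent with the two quoted figures describing the same workload.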


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 634731)
    Time Spent: 12.5h  (was: 12h 20m)

> Implement a BigQuery bounded source using the BigQuery storage API
> ------------------------------------------------------------------
>
>                 Key: BEAM-10917
>                 URL: https://issues.apache.org/jira/browse/BEAM-10917
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-py-gcp
>            Reporter: Kenneth Jung
>            Assignee: Kanthi Subramanian
>            Priority: P3
>          Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> The Java SDK contains a bounded source implementation which uses the BigQuery 
> storage API to read from BigQuery. We should implement the same for Python.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
