[ 
https://issues.apache.org/jira/browse/BEAM-6183?focusedWorklogId=173872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-173872
 ]

ASF GitHub Bot logged work on BEAM-6183:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Dec/18 02:42
            Start Date: 11/Dec/18 02:42
    Worklog Time Spent: 10m 
      Work Description: ihji commented on issue #7212: [BEAM-6183] BigQuery 
insertAll API request rate is not properly controlled
URL: https://github.com/apache/beam/pull/7212#issuecomment-446051909
 
 
   I ran a quick benchmark of 4 different implementations. The benchmark 
program is a simple streaming pipeline that generates random integers and 
inserts them into a BigQuery table. It ran on DataflowRunner with 4 workers 
for 20 minutes.
   
   ## Current master
   - writer wall time: 11 hr 0 min 56 sec
   - bytes_written: 1,281,740,388 bytes
   
   ## Dynamic throttling on top of current master
   - writer wall time: 15 hr 50 min 35 sec
   - bytes_written: 1,119,466,932 bytes
   
   ## Shared backoff on top of current master
   - writer wall time: 8 hr 43 min 59 sec
   - bytes_written: 536,049,960 bytes
   
   ## Backoff all IOExceptions PR #7189 
   - writer wall time: 13 hr 34 min 5 sec
   - bytes_written: 1,374,636,336 bytes
   
   Some observations:
   - #7189 performs better than current master because it avoids total 
worker failures, each of which delays a whole bundle by at least 10 seconds.
   - The best performance is achieved by pushing the BigQuery backend to 
its limit (as the current implementation does, with no rate control). However, 
this generates many error messages every second and puts unnecessary load 
on the BigQuery backend.
   - Dynamic throttling underperforms no throttling by about 20 percent but 
generates almost no rate-limit-exceeded error messages.
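
To make the comparisons in these observations easier to check, here is a minimal sketch that recomputes the relative differences in bytes_written from the four results above. The class, constant, and method names are hypothetical and exist only for this illustration; the byte counts are copied from the results. The comment does not say which baseline "no throttling" refers to, so the sketch compares dynamic throttling against both current master and #7189.

```java
public class BenchmarkRatios {
    // bytes_written values copied from the benchmark results above.
    static final long MASTER = 1_281_740_388L;
    static final long DYNAMIC_THROTTLING = 1_119_466_932L;
    static final long SHARED_BACKOFF = 536_049_960L;
    static final long BACKOFF_ALL_IO_EXCEPTIONS = 1_374_636_336L; // PR #7189

    /** Percentage change of value relative to baseline (negative means fewer bytes). */
    static double percentChange(long value, long baseline) {
        return 100.0 * (value - baseline) / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("#7189 vs master:              %+.1f%%%n",
                percentChange(BACKOFF_ALL_IO_EXCEPTIONS, MASTER));
        System.out.printf("dynamic throttling vs master: %+.1f%%%n",
                percentChange(DYNAMIC_THROTTLING, MASTER));
        System.out.printf("dynamic throttling vs #7189:  %+.1f%%%n",
                percentChange(DYNAMIC_THROTTLING, BACKOFF_ALL_IO_EXCEPTIONS));
        System.out.printf("shared backoff vs master:     %+.1f%%%n",
                percentChange(SHARED_BACKOFF, MASTER));
    }
}
```

By bytes_written, #7189 comes out about 7.2 percent ahead of current master, and dynamic throttling is about 12.7 percent behind current master and about 18.6 percent behind #7189. Under the assumption that bytes_written is the metric being compared, the 20 percent figure appears closest to the comparison against #7189.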

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 173872)
    Time Spent: 0.5h  (was: 20m)

> BigQuery insertAll API request rate is not properly controlled
> --------------------------------------------------------------
>
>                 Key: BEAM-6183
>                 URL: https://issues.apache.org/jira/browse/BEAM-6183
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>    Affects Versions: 2.8.0
>            Reporter: Heejong Lee
>            Assignee: Heejong Lee
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> BigQuery insertAll API request rate is not properly controlled so it produces 
> too many rate limit exceeded error messages in the worker log.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
