[
https://issues.apache.org/jira/browse/BEAM-6183?focusedWorklogId=173872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-173872
]
ASF GitHub Bot logged work on BEAM-6183:
----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Dec/18 02:42
Start Date: 11/Dec/18 02:42
Worklog Time Spent: 10m
Work Description: ihji commented on issue #7212: [BEAM-6183] BigQuery
insertAll API request rate is not properly controlled
URL: https://github.com/apache/beam/pull/7212#issuecomment-446051909
I ran a quick benchmark of four different implementations. The benchmark
program is a simple streaming pipeline that generates random integers and
inserts them into a BigQuery table. It ran on DataflowRunner with 4 workers
for 20 minutes.
## Current master
- writer wall time: 11 hr 0 min 56 sec
- bytes_written: 1,281,740,388 bytes
## Dynamic throttling on top of current master
- writer wall time: 15 hr 50 min 35 sec
- bytes_written: 1,119,466,932 bytes
## Shared backoff on top of current master
- writer wall time: 8 hr 43 min 59 sec
- bytes_written: 536,049,960 bytes
## Backoff all IOExceptions PR #7189
- writer wall time: 13 hr 34 min 5 sec
- bytes_written: 1,374,636,336 bytes
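
To make the four runs easier to compare, here is a small sketch that turns the bytes_written figures above into relative changes and average bytes per second. The class and method names are made up for illustration, and the 20-minute run length is taken from the benchmark description; nothing here is part of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (hypothetical name) for comparing the reported runs. */
public final class BenchmarkComparison {
    // bytes_written per run, copied from the results in this comment.
    public static final long MASTER = 1_281_740_388L;
    public static final long DYNAMIC_THROTTLING = 1_119_466_932L;
    public static final long SHARED_BACKOFF = 536_049_960L;
    public static final long BACKOFF_ALL_IOEXCEPTIONS = 1_374_636_336L;

    // Each run lasted 20 minutes according to the benchmark description.
    public static final long RUN_SECONDS = 20 * 60;

    /** Relative change of value against baseline, e.g. -0.10 means 10% fewer bytes. */
    public static double relativeChange(long value, long baseline) {
        return (double) (value - baseline) / baseline;
    }

    /** Average bytes written per second over the whole run. */
    public static double bytesPerSecond(long bytes) {
        return (double) bytes / RUN_SECONDS;
    }

    public static void main(String[] args) {
        Map<String, Long> runs = new LinkedHashMap<>();
        runs.put("current master", MASTER);
        runs.put("dynamic throttling", DYNAMIC_THROTTLING);
        runs.put("shared backoff", SHARED_BACKOFF);
        runs.put("backoff all IOExceptions", BACKOFF_ALL_IOEXCEPTIONS);

        for (Map.Entry<String, Long> run : runs.entrySet()) {
            System.out.printf("%-26s %,15d bytes  %,12.0f B/s  %+6.1f%% vs master%n",
                    run.getKey(),
                    run.getValue(),
                    bytesPerSecond(run.getValue()),
                    100.0 * relativeChange(run.getValue(), MASTER));
        }
    }
}
```

Using bytes written as the measure, the dynamic throttling run comes out somewhat below both current master and #7189, and the shared backoff run is well under half of master.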
Some observations:
- #7189 performs better than current master because no worker fails outright;
such a total failure delays a whole bundle by at least 10 seconds
- The best performance is achieved by pushing the BigQuery backend to its
limit (as the current implementation does, with no rate control). However,
this generates many error messages every second and puts unnecessary load on
the BigQuery backend
- Dynamic throttling underperforms no throttling by 20 percent but generates
almost no rate-limit-exceeded error messages
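
This comment does not say how the dynamic throttling variant is implemented. To illustrate the trade-off in the last observation, the sketch below shows one common form of client-side throttling: the client tracks how many requests the backend accepted and skips a growing share of new requests once rejections pile up. The class name, the `overloadRatio` parameter, and the use of cumulative rather than windowed counters are all assumptions made for brevity, not the code benchmarked here.

```java
import java.util.Random;

/**
 * Minimal client-side throttler (hypothetical; not the PR's implementation).
 * Rejection probability: max(0, (requests - K * accepts) / (requests + 1)).
 */
public final class SimpleAdaptiveThrottler {
    private final double overloadRatio; // K: how far requests may exceed accepts
    private final Random random;
    private long requests; // all attempts recorded so far
    private long accepts;  // attempts the backend accepted

    public SimpleAdaptiveThrottler(double overloadRatio, Random random) {
        this.overloadRatio = overloadRatio;
        this.random = random;
    }

    /** Probability with which the next request should be held back locally. */
    public synchronized double rejectionProbability() {
        return Math.max(0.0, (requests - overloadRatio * accepts) / (requests + 1.0));
    }

    /** Returns true if the caller should wait instead of sending right now. */
    public synchronized boolean shouldThrottle() {
        return random.nextDouble() < rejectionProbability();
    }

    /** Record the outcome of a request that was actually sent. */
    public synchronized void recordAttempt(boolean accepted) {
        requests++;
        if (accepted) {
            accepts++;
        }
    }
}
```

A caller would check `shouldThrottle()` before each insert request, sleep briefly when it returns true, and call `recordAttempt(false)` whenever the backend answers with a rate-limit error. A production version would normally count over a sliding time window so that old outcomes age out.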
Issue Time Tracking
-------------------
Worklog Id: (was: 173872)
Time Spent: 0.5h (was: 20m)
> BigQuery insertAll API request rate is not properly controlled
> --------------------------------------------------------------
>
> Key: BEAM-6183
> URL: https://issues.apache.org/jira/browse/BEAM-6183
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.8.0
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> BigQuery insertAll API request rate is not properly controlled so it produces
> too many rate limit exceeded error messages in the worker log.