[
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=258205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-258205
]
ASF GitHub Bot logged work on BEAM-7495:
----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Jun/19 23:25
Start Date: 11/Jun/19 23:25
Worklog Time Spent: 10m
Work Description: aryann commented on pull request #8832: [BEAM-7495] Add
dynamic worker rebalancing to BigQuery Storage
URL: https://github.com/apache/beam/pull/8832
This change implements splitAtFraction for BigQueryStorageStreamReader,
which allows Google Cloud Dataflow to perform dynamic worker rebalancing when
reading data from BigQuery.
Still TODO:
- Implement progress reporting. I need to implement this on the server-side
first, which I will start doing this week. As-is, all splits are optimistically
performed at 0.5.
- Update the Google API libraries. Currently, I'm setting unknown fields in
several protocol buffer. I've added TODOs everywhere that this is done, and
will update them to use the real setters once the client libraries are updated.
------------------------
Thank you for your contribution! Follow this checklist to help us
incorporate your contribution quickly and easily:
- [ ] [**Choose
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA
issue, if applicable. This will automatically link the pull request to the
issue.
- [ ] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
Post-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
| --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
| --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
Java | [](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
Python | [](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
| --- | [](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
<br> [](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
| --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
Pre-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
--- |Java | Python | Go | Website
--- | --- | --- | --- | ---
Non-portable | [](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
Portable | --- | [](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/)
| --- | ---
See
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
for trigger phrase, status and link of all Jenkins jobs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 258205)
Time Spent: 10m
Remaining Estimate: 503h 50m (was: 504h)
> Add support for dynamic worker re-balancing when reading BigQuery data using
> Cloud Dataflow
> -------------------------------------------------------------------------------------------
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Aryan Naraghi
> Priority: Major
> Original Estimate: 504h
> Time Spent: 10m
> Remaining Estimate: 503h 50m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage
> API does not support any of the facilities on the source for Dataflow to
> split streams.
>
> On the server side, the BigQuery Storage API supports splitting streams at a
> fraction. By adding support to the connector, we enable Dataflow to split
> streams, which unlocks dynamic worker re-balancing.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)