[ https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=289019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-289019 ]
ASF GitHub Bot logged work on BEAM-7495:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/19 15:59
Start Date: 05/Aug/19 15:59
Worklog Time Spent: 10m
Work Description: aryann commented on pull request #9156: [BEAM-7495] Add
fine-grained progress reporting
URL: https://github.com/apache/beam/pull/9156#discussion_r310676633
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
##########
@@ -165,8 +165,13 @@ public String toString() {
private GenericRecord record;
private T current;
private long currentOffset;
+
+ // Values used for progress reporting.
private double fractionConsumed;
- private double fractionConsumedFromLastResponse;
+ private double fractionConsumedFromPreviousResponse;
+ private double fractionConsumedFromCurrentResponse;
+ private long rowsReadFromCurrentResponse;
Review comment:
I purposefully chose long variable names to avoid additional comments.
If you don't feel that the names are sufficiently self-describing, I'm happy
to add more comments. Please let me know!
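
For readers following the diff, here is a minimal sketch of how fields with these names could combine into a fine-grained progress estimate. It assumes that each server response carries a fraction-consumed value describing progress at the end of that response, and that progress inside a response can be interpolated linearly by row count. The class ProgressEstimator, its methods, and the field totalRowsInCurrentResponse are hypothetical names used for illustration only; this is not the code from the pull request.

```java
/**
 * Hypothetical helper, not part of Apache Beam. It interpolates progress
 * between the fraction reported with the previous response and the fraction
 * reported with the current one, based on rows read so far.
 */
final class ProgressEstimator {
    // Names below mirror the fields in the diff.
    private double fractionConsumedFromPreviousResponse;
    private double fractionConsumedFromCurrentResponse;
    private long rowsReadFromCurrentResponse;

    // Assumed for this sketch: the row count of the current response is known.
    private long totalRowsInCurrentResponse;

    /** Call when a new response arrives with its reported fraction and row count. */
    void onResponse(double reportedFractionConsumed, long rowCount) {
        fractionConsumedFromPreviousResponse = fractionConsumedFromCurrentResponse;
        fractionConsumedFromCurrentResponse = reportedFractionConsumed;
        rowsReadFromCurrentResponse = 0;
        totalRowsInCurrentResponse = rowCount;
    }

    /** Call once for each row handed to the caller. */
    void onRowRead() {
        rowsReadFromCurrentResponse++;
    }

    /** Linear interpolation between the previous and current reported fractions. */
    double fractionConsumed() {
        if (totalRowsInCurrentResponse == 0) {
            // No rows to interpolate over; fall back to the last known value.
            return fractionConsumedFromPreviousResponse;
        }
        double withinResponse =
            (double) rowsReadFromCurrentResponse / totalRowsInCurrentResponse;
        return fractionConsumedFromPreviousResponse
            + (fractionConsumedFromCurrentResponse - fractionConsumedFromPreviousResponse)
                * withinResponse;
    }
}
```

Under these assumptions, a first response reporting 0.5 with four rows yields 0.125 after one row and 0.5 after all four. The estimate then advances row by row instead of jumping only when a new response arrives.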
Issue Time Tracking
-------------------
Worklog Id: (was: 289019)
Time Spent: 9h 10m (was: 9h)
Remaining Estimate: 494h 50m (was: 495h)
> Add support for dynamic worker re-balancing when reading BigQuery data using
> Cloud Dataflow
> -------------------------------------------------------------------------------------------
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Aryan Naraghi
> Assignee: Aryan Naraghi
> Priority: Major
> Original Estimate: 504h
> Time Spent: 9h 10m
> Remaining Estimate: 494h 50m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage
> API does not implement any of the source-side facilities that Dataflow needs
> to split streams.
>
> On the server side, the BigQuery Storage API supports splitting streams at a
> fraction. By adding support to the connector, we enable Dataflow to split
> streams, which unlocks dynamic worker re-balancing.