[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

ASF GitHub Bot (JIRA) Mon, 05 Aug 2019 06:21:39 -0700


     [ https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=288933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288933 ]


ASF GitHub Bot logged work on BEAM-7495:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/19 13:20
            Start Date: 05/Aug/19 13:20
    Worklog Time Spent: 10m 
      Work Description: chamikaramj commented on pull request #9156: [BEAM-7495] Add fine-grained progress reporting
URL: https://github.com/apache/beam/pull/9156#discussion_r310599544
 
 

 ##########
 File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
 ##########
 @@ -210,20 +218,28 @@ private synchronized boolean readNextRecord() throws IOException {
           return false;
         }
 
-        // N.B.: For simplicity, we update fractionConsumed once a new response is fetched, not
-        // when we reach the end of the current response. In practice, this choice is not
-        // consequential.
-        fractionConsumed = fractionConsumedFromLastResponse;
+        fractionConsumedFromPreviousResponse = fractionConsumedFromCurrentResponse;
         ReadRowsResponse nextResponse = responseIterator.next();
         decoder =
             DecoderFactory.get()
                 .binaryDecoder(
                     nextResponse.getAvroRows().getSerializedBinaryRows().toByteArray(), decoder);
-        fractionConsumedFromLastResponse = getFractionConsumed(nextResponse);
+        rowsReadFromCurrentResponse = 0L;
+        rowCountFromCurrentResponse = nextResponse.getAvroRows().getRowCount();
+        fractionConsumedFromCurrentResponse = getFractionConsumed(nextResponse);
 
 Review comment:
   I think we need to adjust the variable names here. The current naming is 
pretty confusing. Should "nextResponse" be renamed "currentResponse", since 
it is what produces "fractionConsumedFromCurrentResponse"?
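
   For illustration only, here is a minimal sketch of how the counters set in 
this hunk could drive finer-grained progress. It assumes progress is linearly 
interpolated between the fraction reported with the previous response and the 
fraction reported with the current one; that interpolation is an assumption, 
since the hunk above does not show how fractionConsumed is computed. The class 
ProgressSketch and its methods are hypothetical and are not taken from the PR; 
only the four field names come from the diff.

```java
/** Hypothetical helper for illustration; not part of BigQueryStorageStreamSource. */
final class ProgressSketch {
  // Fraction of the stream consumed as of the previous response.
  private double fractionConsumedFromPreviousResponse = 0.0;
  // Fraction of the stream consumed once the current response is fully read.
  private double fractionConsumedFromCurrentResponse = 0.0;
  // Rows handed to the caller so far from the current response.
  private long rowsReadFromCurrentResponse = 0L;
  // Total rows contained in the current response.
  private long rowCountFromCurrentResponse = 0L;

  /** Mirrors the bookkeeping in the hunk: roll the fractions forward and reset the row counter. */
  void onNewResponse(long rowCount, double fractionConsumed) {
    fractionConsumedFromPreviousResponse = fractionConsumedFromCurrentResponse;
    fractionConsumedFromCurrentResponse = fractionConsumed;
    rowsReadFromCurrentResponse = 0L;
    rowCountFromCurrentResponse = rowCount;
  }

  /** Records that one more row of the current response has been returned. */
  void onRowRead() {
    rowsReadFromCurrentResponse++;
  }

  /** Assumed approach: linear interpolation between the two reported fractions. */
  double fractionConsumed() {
    if (rowCountFromCurrentResponse == 0) {
      // An empty (or not yet received) response contributes no progress of its own.
      return fractionConsumedFromPreviousResponse;
    }
    double span = fractionConsumedFromCurrentResponse - fractionConsumedFromPreviousResponse;
    return fractionConsumedFromPreviousResponse
        + span * rowsReadFromCurrentResponse / rowCountFromCurrentResponse;
  }
}
```

   Under this sketch, the response being decoded is the one whose fraction is 
stored in fractionConsumedFromCurrentResponse, which is why the name 
"currentResponse" would read more consistently than "nextResponse".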
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 288933)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> -------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7495
>                 URL: https://issues.apache.org/jira/browse/BEAM-7495
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Aryan Naraghi
>            Assignee: Aryan Naraghi
>            Priority: Major
>   Original Estimate: 504h
>          Time Spent: 7h 50m
>  Remaining Estimate: 496h 10m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
