[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

ASF GitHub Bot (JIRA) Fri, 26 Jul 2019 08:18:04 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=283416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283416
 ]


ASF GitHub Bot logged work on BEAM-7495:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jul/19 15:17
            Start Date: 26/Jul/19 15:17
    Worklog Time Spent: 10m 
      Work Description: aryann commented on issue #9156: [BEAM-7495] Add 
fine-grained progress reporting
URL: https://github.com/apache/beam/pull/9156#issuecomment-515493741
 
 
   > You might want to look at how progress estimation is implemented for
   > block-based sources:
   >
   > https://github.com/apache/beam/blob/64c62b1e42a84746e7aa97dddfa1fce95919a6b2/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BlockBasedSource.java
   >
   > I don't see any correctness issues. My only question would be whether you
   > might want to move the computation of the exact fractional position into
   > getFractionConsumed so that it isn't (necessarily) done for every row.
   
   Thanks for the pointer to the other source. Our source doesn't have an
isStarted or isDone method. I don't think we should add those methods just to
localize the computation in getFractionConsumed, because that's more
coordination that we have to get right. Additionally, we already have to update
a number of fields in readNextRecord, such as the number of rows read from the
current response, so I think the code is clearer if the computation of the
fractional value sits next to the code that updates those fields. Otherwise,
you'd have to hop between isStarted, isDone, readNextRecord, and
getFractionConsumed to understand how the value is computed.
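
   To illustrate the trade-off, here is a minimal sketch of keeping the
fraction computation next to the per-row bookkeeping in readNextRecord. It is
not the code from the pull request: the class FractionTrackingReader, the
Response type, and its fractionAtStart / fractionAtEnd fields are hypothetical
names, and the linear interpolation within a response is an assumption made
for the example.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; names are illustrative, not the connector's actual code. */
final class FractionTrackingReader {

  /** One server response: a batch of rows plus the stream fraction at its start and end. */
  static final class Response {
    final List<String> rows;
    final double fractionAtStart;
    final double fractionAtEnd;

    Response(List<String> rows, double fractionAtStart, double fractionAtEnd) {
      this.rows = rows;
      this.fractionAtStart = fractionAtStart;
      this.fractionAtEnd = fractionAtEnd;
    }
  }

  private final Iterator<Response> responses;
  private Response current;
  private int rowsReadFromCurrent;
  private String currentRow;
  private double fractionConsumed; // updated only in readNextRecord

  FractionTrackingReader(List<Response> responses) {
    this.responses = responses.iterator();
  }

  /** Advances to the next row and updates the fraction alongside the row counters. */
  synchronized boolean readNextRecord() {
    // Move to the next response when the current one is exhausted (skips empty ones).
    while (current == null || rowsReadFromCurrent >= current.rows.size()) {
      if (!responses.hasNext()) {
        fractionConsumed = 1.0;
        return false;
      }
      current = responses.next();
      rowsReadFromCurrent = 0;
    }
    currentRow = current.rows.get(rowsReadFromCurrent);
    rowsReadFromCurrent++;

    // Assumed scheme: interpolate linearly between the response's start and end fractions.
    double span = current.fractionAtEnd - current.fractionAtStart;
    fractionConsumed =
        current.fractionAtStart + span * rowsReadFromCurrent / current.rows.size();
    return true;
  }

  synchronized String getCurrent() {
    return currentRow;
  }

  /** Returns the value maintained by readNextRecord; no state checks are needed here. */
  synchronized double getFractionConsumed() {
    return fractionConsumed;
  }
}
```

   In this sketch getFractionConsumed is a plain getter, so the reader needs
no separate isStarted or isDone state. The cost is the one the reviewer points
out: a small amount of arithmetic runs for every row.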
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 283416)
            Time Spent: 7h 20m  (was: 7h 10m)
    Remaining Estimate: 496h 40m  (was: 496h 50m)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> -------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7495
>                 URL: https://issues.apache.org/jira/browse/BEAM-7495
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Aryan Naraghi
>            Assignee: Aryan Naraghi
>            Priority: Major
>   Original Estimate: 504h
>          Time Spent: 7h 20m
>  Remaining Estimate: 496h 40m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
