[ 
https://issues.apache.org/jira/browse/BEAM-3042?focusedWorklogId=99786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99786
 ]

ASF GitHub Bot logged work on BEAM-3042:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/May/18 23:24
            Start Date: 08/May/18 23:24
    Worklog Time Spent: 10m 
      Work Description: pabloem opened a new pull request #5309: [BEAM-3042] 
Adding time tracking of batch side inputs
URL: https://github.com/apache/beam/pull/5309
 
 
   This PR improves the Cython tags for some classes and uses them to track 
the time spent reading side inputs.
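
As a rough illustration of what "tracking of time spent reading side inputs" can mean, here is a minimal pure-Python sketch. It is not the code in this PR: the `TimedIterable` class and its attribute names are hypothetical, and the actual change works through Cython-tagged classes in the SDK rather than a wrapper like this one.

```python
import time


class TimedIterable:
    """Wraps an iterable and accumulates the time spent fetching elements.

    Hypothetical helper for illustration only; not the class used in the PR.
    """

    def __init__(self, source):
        self._source = source
        self.seconds_spent = 0.0
        self.elements_read = 0

    def __iter__(self):
        iterator = iter(self._source)
        while True:
            start = time.perf_counter()
            try:
                element = next(iterator)
            except StopIteration:
                # The final, empty fetch still counts as read time.
                self.seconds_spent += time.perf_counter() - start
                return
            self.seconds_spent += time.perf_counter() - start
            self.elements_read += 1
            yield element


# Stand-in for a side input; a real one would come from the runner.
side_input = TimedIterable(range(1000))
total = sum(side_input)
print(side_input.elements_read, side_input.seconds_spent)
```

Only the time spent inside `next()` is counted, so work the consumer does between elements is not attributed to the side input.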
   
   NOTE: In any case, flag versioning should be added to this PR before it is merged.
   
   This has been benchmarked with the new 
[`sideinput_microbenchmark.py`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/sideinput_microbenchmark.py),
 and here are the results:
   
   Current performance with 500 runs:
   - Average runtime: 0.422656793594
   - Time per element: 2.64160495996e-05
   - Regression: 0% (it's the baseline)
   
   With change and flag deactivated:
   - Average runtime: 0.424214161396
   - Time per element: 2.65133850873e-05
   - Regression: 0.36%
   
   With change and flag activated:
   - Average runtime: 0.425546179771
   - Time per element: 2.65966362357e-05
   - Regression: 0.68%
   
   This is a very small regression (under 1%) in a microbenchmark that 
specifically exercises this feature.
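
For reference, the regression percentages can be recomputed from the average runtimes listed above. The sketch below assumes regression is defined as the relative increase over the baseline runtime; the function and variable names are illustrative. It gives roughly 0.37% and 0.68%, so the reported 0.36% looks truncated rather than rounded. Dividing average runtime by time per element also suggests each run processed 16000 elements.

```python
# Figures copied from the benchmark results above (seconds).
BASELINE_RUNTIME = 0.422656793594
BASELINE_PER_ELEMENT = 2.64160495996e-05
FLAG_OFF_RUNTIME = 0.424214161396
FLAG_ON_RUNTIME = 0.425546179771


def regression_percent(baseline, measured):
    """Relative slowdown of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


flag_off = regression_percent(BASELINE_RUNTIME, FLAG_OFF_RUNTIME)
flag_on = regression_percent(BASELINE_RUNTIME, FLAG_ON_RUNTIME)

# Implied number of elements per run: runtime / time per element.
elements_per_run = round(BASELINE_RUNTIME / BASELINE_PER_ELEMENT)

print(f"flag off: {flag_off:.2f}%")
print(f"flag on:  {flag_on:.2f}%")
print(f"elements per run: {elements_per_run}")
```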

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 99786)
    Time Spent: 5h  (was: 4h 50m)

> Add tracking of bytes read / time spent when reading side inputs
> ----------------------------------------------------------------
>
>                 Key: BEAM-3042
>                 URL: https://issues.apache.org/jira/browse/BEAM-3042
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: Major
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
