[
https://issues.apache.org/jira/browse/BEAM-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288159#comment-16288159
]
ASF GitHub Bot commented on BEAM-3042:
--------------------------------------
pabloem opened a new pull request #4248: [BEAM-3042] Tracking of time spent
reading side inputs, and bytes read in Dataflow.
URL: https://github.com/apache/beam/pull/4248
This pull request changes the PrefetchingSourceReader used to fetch
side input data.
Specifically, the change lets the reader track how long it spends blocked
waiting to fetch side inputs; for the Dataflow runner (which uses
NativeAvroSource), it also tracks how many bytes were read.
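As a rough illustration of the idea (not the code in the pull request), the
sketch below shows one way a prefetching reader could time how long the
consumer is blocked and count the bytes it receives. The class name
TimedPrefetchingReader, its attributes, and the assumption that records are
bytes objects are all hypothetical and are not part of Beam's API.

```python
import queue
import threading
import time

_SENTINEL = object()  # marks the end of the prefetched stream


class TimedPrefetchingReader:
    """Hypothetical sketch; this is NOT Beam's PrefetchingSourceReader."""

    def __init__(self, records, buffer_size=4):
        # `records` is assumed to be an iterable of bytes objects.
        self._records = records
        self._queue = queue.Queue(maxsize=buffer_size)
        self.blocked_seconds = 0.0  # time the consumer spent waiting
        self.bytes_read = 0         # total size of the records consumed

    def _prefetch(self):
        # Producer thread: read ahead so the consumer rarely has to wait.
        # Errors raised by the source are not propagated in this sketch.
        for record in self._records:
            self._queue.put(record)
        self._queue.put(_SENTINEL)

    def __iter__(self):
        threading.Thread(target=self._prefetch, daemon=True).start()
        while True:
            start = time.monotonic()
            item = self._queue.get()  # blocks until a record is available
            self.blocked_seconds += time.monotonic() - start
            if item is _SENTINEL:
                return
            self.bytes_read += len(item)
            yield item


reader = TimedPrefetchingReader([b"ab", b"cde"])
consumed = list(reader)
print(reader.bytes_read, reader.blocked_seconds)
```

Only the wait inside the blocking get() call is timed, so time the consumer
spends processing each record is not counted as time blocked on side inputs.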
----------------------------------------------------------------
> Add tracking of bytes read / time spent when reading side inputs
> ----------------------------------------------------------------
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
>
> It is difficult for Dataflow users to understand how modifying a pipeline or
> data set can affect how much inter-transform IO is used in their job. The
> intent of this feature request is to help users understand how side inputs
> behave when they are consumed.
> This will allow users to understand how much time and how much data their
> pipeline uses to read/write to inter-transform IO. Users will also be able to
> modify their pipelines and understand how their changes affect these IO
> metrics.
> For further information, please review the internal Google doc
> go/insights-transform-io-design-doc.