[
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634830
]
ASF GitHub Bot logged work on BEAM-6374:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/21 22:03
Start Date: 05/Aug/21 22:03
Worklog Time Spent: 10m
Work Description: lostluck opened a new pull request #15289:
URL: https://github.com/apache/beam/pull/15289
Restoring https://github.com/apache/beam/pull/10942 to narrow down where the
post submits failed previously.
-------
This adds PCollection metrics to the Go SDK, specifically Element Count and
Sampled Size.
New exec.PCollection nodes are added between every processing node in the
bundle execution graph.
* The new metrics are only added as MonitoringInfos, not the legacy protos.
* About 10ns is added per element per PCollection node due to the
atomic additions for every element.
* Elements for sizes are selected randomly, then encoded to count their
bytes (w/o window headers).
* An initial index is selected from the first [0,1,2] at bundle start up,
and each sample then pre-selects the next index from somewhere later on, at a
distance proportional to the bundle so far.
* As currently set up, it will take around 200-300 samples for the first
1M elements, so encoding overhead is limited.
* PCollections from a DataSource do 100% "sampling", since they're reading
the bytes directly anyway. The PCollection node that would have been added
after the DataSource is elided from the graph during construction, but re-used
to avoid duplicating the logic for concurrently manipulating the size
distribution.
* DataSources can properly handle CoGBKs as well, counting non-header
bytes for iterables, and state backed iterables.
* This still involves a mutex Lock for every update, so we may want to
find a lighter weight mechanism to handle the distribution samples from
DataSources, or simply opt for the same random sampling.
* A similar method could be used for DataSinks as well, but that is not
handled in this PR.
* It's important to note that the runner is already aware of the number of
bytes sent and received from the SDK side, so we may opt to remove this
entirely.
* Counts and Samples are not yet made for SideInputs, which would better
account for data consumed by DoFns.
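As a rough illustration of the per-element counting and pre-select sampling
described in the list above, here is a minimal Go sketch. It is not the code
from this PR: `pcolNode`, `sizeDist`, `sizeOf`, and the `cur/10+2` gap are
hypothetical names and constants chosen for the example, and `sizeOf` stands in
for encoding an element and counting its bytes.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
)

// sizeDist is a hypothetical count/sum/min/max distribution of sampled sizes.
type sizeDist struct {
	Count, Sum, Min, Max int64
}

// pcolNode is a hypothetical stand-in for a pass-through PCollection node.
type pcolNode struct {
	elementCount  int64 // bumped atomically for every element
	nextSampleIdx int64 // 1-based index of the next element to measure

	mu   sync.Mutex // guards dist, which a progress reporter may read concurrently
	dist sizeDist

	rng    *rand.Rand
	sizeOf func(elm interface{}) int64 // assumption: "encode and count bytes"
	next   func(elm interface{})       // downstream processing node
}

func newPcolNode(seed int64, sizeOf func(interface{}) int64, next func(interface{})) *pcolNode {
	rng := rand.New(rand.NewSource(seed))
	return &pcolNode{
		nextSampleIdx: 1 + rng.Int63n(3), // first sample is one of the first three elements
		rng:           rng,
		sizeOf:        sizeOf,
		next:          next,
	}
}

// process counts the element, measures it if its index was pre-selected,
// and then forwards it unchanged.
func (p *pcolNode) process(elm interface{}) {
	cur := atomic.AddInt64(&p.elementCount, 1)
	if cur == p.nextSampleIdx {
		// Pre-select the next index: the gap grows with the count so far.
		// The divisor 10 is an illustrative assumption, not a value from the PR.
		p.nextSampleIdx = cur + p.rng.Int63n(cur/10+2) + 1
		p.record(p.sizeOf(elm))
	}
	p.next(elm)
}

// record folds one sampled size into the distribution under the mutex.
func (p *pcolNode) record(size int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.dist.Count == 0 || size < p.dist.Min {
		p.dist.Min = size
	}
	if p.dist.Count == 0 || size > p.dist.Max {
		p.dist.Max = size
	}
	p.dist.Count++
	p.dist.Sum += size
}

// snapshot returns the element count and a copy of the size distribution.
func (p *pcolNode) snapshot() (int64, sizeDist) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return atomic.LoadInt64(&p.elementCount), p.dist
}

func main() {
	var forwarded int64
	node := newPcolNode(1,
		func(interface{}) int64 { return 8 }, // pretend every element encodes to 8 bytes
		func(interface{}) { forwarded++ })
	for i := int64(0); i < 1000000; i++ {
		node.process(i)
	}
	count, dist := node.snapshot()
	fmt.Printf("elements=%d forwarded=%d samples=%d meanSize=%d\n",
		count, forwarded, dist.Count, dist.Sum/dist.Count)
}
```

With this particular gap formula each gap averages roughly 5% of the count so
far, so the number of samples for 1M elements should land in the low hundreds;
the exact figure depends on the seed.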
Thank you @ajamato for reminding me of the pre-select method for sampling,
and @lukecwik for pointing out the DataSource can avoid separate additional
encoding costs when measuring elements.
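The DataSource idea mentioned above, measuring bytes as they are read instead
of re-encoding elements, could look roughly like the following sketch. The
one-byte length-prefix framing, `countingReader`, `readSizes`, and `sizesOf`
are all assumptions made for the example; the real DataSource decodes with the
PCollection's coder and excludes window headers, which this sketch does not
model.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countingReader wraps a reader and tracks how many bytes have been consumed.
type countingReader struct {
	r io.Reader
	n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err
}

// readSizes decodes elements framed as a 1-byte length followed by the payload
// (hypothetical framing) and returns the bytes consumed for each element.
func readSizes(r io.Reader) ([]int64, error) {
	cr := &countingReader{r: r}
	var sizes []int64
	var hdr [1]byte
	for {
		start := cr.n
		if _, err := io.ReadFull(cr, hdr[:]); err != nil {
			if err == io.EOF {
				return sizes, nil // clean end of stream between elements
			}
			return sizes, err
		}
		payload := make([]byte, hdr[0])
		if _, err := io.ReadFull(cr, payload); err != nil {
			return sizes, err
		}
		// Every element is measured: the size is the delta in bytes read.
		sizes = append(sizes, cr.n-start)
	}
}

// sizesOf is a convenience wrapper for in-memory input.
func sizesOf(data []byte) ([]int64, error) {
	return readSizes(bytes.NewReader(data))
}

func main() {
	sizes, err := sizesOf([]byte{2, 'h', 'i', 0, 3, 'a', 'b', 'c'})
	fmt.Println(sizes, err)
}
```

Because the sizes fall out of the read path, no extra encoding pass is needed,
which is why every element can be measured here rather than a random sample.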
Performance impact:
I have two jobs I use for benchmarking this: Pipeline A uses int64s as
elements and does simple passthroughs and sums, and Pipeline B uses large
protocol buffers as elements and spends a fair amount of CPU time decoding
them.
For small "fast" elements, the overhead is about 19.5% of the Go side
processing (which makes sense if elements are just being passed around or
incremented).
For large "heavy" elements, the overhead is about 0.125% of the Go side
processing.
These figures only take into account the Go SDK worker, and not any
runner side costs. This feels acceptable for the time being, though it's
possible we can improve this later, especially for "lighter" jobs.
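One way to get a feel for why the relative overhead is so much larger for
"fast" elements is a micro-benchmark along these lines. This is a hypothetical
sketch, not the benchmark pipelines mentioned above: `sumPlain`, `sumCounted`,
and `nsPerElem` are made-up stand-ins, and the numbers will vary by machine.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// sumPlain stands in for a "fast" stage that only adds int64s.
func sumPlain(elems []int64) int64 {
	var sum int64
	for _, e := range elems {
		sum += e
	}
	return sum
}

// sumCounted does the same work but also bumps an atomic counter per element,
// similar in spirit to the per-element count in a PCollection node.
func sumCounted(elems []int64, count *int64) int64 {
	var sum int64
	for _, e := range elems {
		atomic.AddInt64(count, 1)
		sum += e
	}
	return sum
}

// sink keeps the compiler from discarding the benchmarked loops.
var sink int64

// nsPerElem benchmarks f, which is assumed to process `elems` elements per call.
func nsPerElem(elems int, f func()) float64 {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			f()
		}
	})
	return float64(r.T.Nanoseconds()) / float64(r.N) / float64(elems)
}

func main() {
	elems := make([]int64, 1<<16)
	for i := range elems {
		elems[i] = int64(i)
	}
	var count int64
	plain := nsPerElem(len(elems), func() { sink += sumPlain(elems) })
	counted := nsPerElem(len(elems), func() { sink += sumCounted(elems, &count) })
	fmt.Printf("plain: %.2f ns/elem, counted: %.2f ns/elem\n", plain, counted)
}
```

When the per-element work is only an addition, a fixed per-element cost is a
large share of the total; when each element takes microseconds to decode, the
same fixed cost becomes a small fraction.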
Issue Time Tracking
-------------------
Worklog Id: (was: 634830)
Time Spent: 1.5h (was: 1h 20m)
> "elements added" for input and output collections is always empty
> -----------------------------------------------------------------
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-go
> Reporter: Andrew Brampton
> Priority: P3
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The fields for "Elements added" and "Estimated size" are always blank when
> running a Go binary on Dataflow. For example when running the word count
> example: https://pasteboard.co/HVf80BU.png
--
This message was sent by Atlassian Jira
(v8.3.4#803005)