[
https://issues.apache.org/jira/browse/BEAM-7745?focusedWorklogId=630592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630592
]
ASF GitHub Bot logged work on BEAM-7745:
----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Jul/21 15:40
Start Date: 28/Jul/21 15:40
Worklog Time Spent: 10m
Work Description: steveniemitz opened a new pull request #15235:
URL: https://github.com/apache/beam/pull/15235
The presence of streaming side-inputs in some cases can cause a large
performance penalty due to cache misses looking up `blockedMap`. During normal
operations this map will be empty/null, but previously wouldn't be persisted
(or cached). Because of this, each read would miss the cache and cause a state
fetch from windmill.
This simply calls `clear` in all cases. Doing so will ensure that the state
cell is put into the cache for subsequent reads.
An interesting note, I could only reproduce this if the side-input-using
ParDo was fused with another stage that uses state (I tried with a stateful
DoFn, but possibly other cases as well), I'm still not sure why that is...
@lukecwik @reuvenlax
------------------------
Thank you for your contribution! Follow this checklist to help us
incorporate your contribution quickly and easily:
- [x] [**Choose
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA
issue, if applicable. This will automatically link the pull request to the
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
See the [Contributor Guide](https://beam.apache.org/contribute) for more
tips on [how to make review process
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
`ValidatesRunner` compliance status (on master branch)
--------------------------------------------------------
<table>
<thead>
<tr>
<th>Lang</th>
<th>ULR</th>
<th>Dataflow</th>
<th>Flink</th>
<th>Samza</th>
<th>Spark</th>
<th>Twister2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Go</td>
<td>---</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon">
</a>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon">
</a>
</td>
<td>---</td>
</tr>
<tr>
<td>Java</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_ULR/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_ULR/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon?subject=V1">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming/lastCompletedBuild/badge/icon?subject=V1+Streaming">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon?subject=V1+Java+11">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon?subject=V2">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/lastCompletedBuild/badge/icon?subject=V2+Streaming">
</a><br>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon?subject=Java+8">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon?subject=Java+11">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon?subject=Portable">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon?subject=Portable+Streaming">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/lastCompletedBuild/badge/icon?subject=Portable">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon?subject=Portable">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon?subject=Structured+Streaming">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon">
</a>
</td>
</tr>
<tr>
<td>Python</td>
<td>---</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon?subject=V1">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon?subject=V2">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon?subject=ValCont">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon?subject=Portable">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon">
</a>
</td>
<td>---</td>
</tr>
<tr>
<td>XLang</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon">
</a>
</td>
<td>---</td>
</tr>
</tbody>
</table>
Examples testing status on various runners
--------------------------------------------------------
<table>
<thead>
<tr>
<th>Lang</th>
<th>ULR</th>
<th>Dataflow</th>
<th>Flink</th>
<th>Samza</th>
<th>Spark</th>
<th>Twister2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Go</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td>Java</td>
<td>---</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/lastCompletedBuild/badge/icon?subject=V1">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Java11_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Java11_Cron/lastCompletedBuild/badge/icon?subject=V1+Java11">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2/lastCompletedBuild/badge/icon?subject=V2">
</a><br>
</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td>Python</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td>XLang</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
</tbody>
</table>
Post-Commit SDK/Transform Integration Tests Status (on master branch)
------------------------------------------------------------------------------------------------
<table>
<thead>
<tr>
<th>Go</th>
<th>Java</th>
<th>Python</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon?subject=3.6">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon?subject=3.7">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon?subject=3.8">
</a>
</td>
</tr>
</tbody>
</table>
Pre-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
<table>
<thead>
<tr>
<th>---</th>
<th>Java</th>
<th>Python</th>
<th>Go</th>
<th>Website</th>
<th>Whitespace</th>
<th>Typescript</th>
</tr>
</thead>
<tbody>
<tr>
<td>Non-portable</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon">
</a><br>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon?subject=Tests">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon?subject=Lint">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon?subject=Docker">
</a><br>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon?subject=Docs">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon">
</a>
</td>
</tr>
<tr>
<td>Portable</td>
<td>---</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon">
</a>
</td>
<td>
<a
href="https://ci-beam.apache.org/job/beam_PreCommit_GoPortable_Cron/lastCompletedBuild/">
<img alt="Build Status"
src="https://ci-beam.apache.org/job/beam_PreCommit_GoPortable_Cron/lastCompletedBuild/badge/icon">
</a>
</td>
<td>---</td>
<td>---</td>
<td>---</td>
</tr>
</tbody>
</table>
See
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
------------------------------------------------------------------------------------------------
[](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
[](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
[](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more
information about GitHub Actions CI.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 630592)
Remaining Estimate: 0h
Time Spent: 10m
> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state
> access pattern during normal operation
> -------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7745
> URL: https://issues.apache.org/jira/browse/BEAM-7745
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Steve Niemitz
> Priority: P3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I spent some time tracking down sources of uncached state fetches in my job,
> and one large category was the interaction of StreamingSideInputDoFnRunner +
> StreamingSideInputFetcher.
> Basically, during standard operations, when the main input is NOT blocked by
> the side input, the side input fetcher will perform an uncached state read
> for every input element. Changing it to cache the blockedMap state gave me a
> ~30-40% increase in throughput in my job.
> The interaction is a little complicated, and there's a couple optimizations
> here I can see.
>
> Primarily, the blockedMap is only persisted if it is non-empty. Because the
> WindmillStateCache won't cache a null value, this means that the "nothing is
> blocked" signal is never actually cached, and will issue a state read to
> windmill for each input element. The solution here seems like it is to
> persist an empty map rather than a null when there are no blocked elements.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)