[ https://issues.apache.org/jira/browse/BEAM-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismaël Mejía updated BEAM-7745:
-------------------------------
Status: Open (was: Triage Needed)
> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state
> access pattern during normal operation
> -------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7745
> URL: https://issues.apache.org/jira/browse/BEAM-7745
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Steve Niemitz
> Priority: Major
>
> I spent some time tracking down sources of uncached state fetches in my job,
> and one large category was the interaction between
> StreamingSideInputDoFnRunner and StreamingSideInputFetcher.
> During normal operation, when the main input is NOT blocked by the side
> input, the side input fetcher performs an uncached state read for every
> input element. Changing it to cache the blockedMap state gave me a ~30-40%
> increase in throughput in my job.
> The interaction is a little complicated, and there are a couple of
> optimizations I can see here.
>
> Primarily, the blockedMap is only persisted if it is non-empty. Because the
> WindmillStateCache won't cache a null value, the "nothing is blocked" signal
> is never actually cached, so a state read is issued to Windmill for each
> input element. The fix seems to be to persist an empty map rather than a
> null when there are no blocked elements.
>
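The caching behaviour described in the issue can be illustrated with a small, self-contained sketch. This is not the Beam or Dataflow code: NullSkippingCache, CountingBackend, and countBackendReads are hypothetical stand-ins invented for this example, and the only property borrowed from the issue is that the cache refuses to store null values. Under that assumption, the sketch counts how many backend reads occur when "nothing is blocked" is persisted as null versus as an empty map.

```java
import java.util.HashMap;
import java.util.Map;

final class BlockedMapCacheSketch {

  /** Hypothetical stand-in for a state cache that never stores null values. */
  static final class NullSkippingCache {
    private final Map<String, Map<String, String>> entries = new HashMap<>();

    Map<String, String> get(String key) {
      return entries.get(key);
    }

    void put(String key, Map<String, String> value) {
      if (value != null) { // assumption taken from the issue: nulls are not cached
        entries.put(key, value);
      }
    }
  }

  /** Hypothetical stand-in for the remote state backend; counts every read. */
  static final class CountingBackend {
    private final Map<String, Map<String, String>> stored = new HashMap<>();
    int reads = 0;

    Map<String, String> read(String key) {
      reads++;
      return stored.get(key); // null when nothing was persisted
    }

    void write(String key, Map<String, String> value) {
      if (value != null) {
        stored.put(key, value);
      }
    }
  }

  /** Processes `elements` unblocked inputs and returns the number of backend reads. */
  static int countBackendReads(boolean persistEmptyMap, int elements) {
    NullSkippingCache cache = new NullSkippingCache();
    CountingBackend backend = new CountingBackend();

    // Nothing is blocked: persist either an empty map or nothing at all (null).
    Map<String, String> persisted = persistEmptyMap ? new HashMap<String, String>() : null;
    backend.write("blockedMap", persisted);

    for (int i = 0; i < elements; i++) {
      Map<String, String> blocked = cache.get("blockedMap");
      if (blocked == null) { // cache miss: fall back to the backend
        blocked = backend.read("blockedMap");
        cache.put("blockedMap", blocked); // a null result is silently dropped
      }
    }
    return backend.reads;
  }

  public static void main(String[] args) {
    System.out.println("null persisted:      " + countBackendReads(false, 1000) + " reads");
    System.out.println("empty map persisted: " + countBackendReads(true, 1000) + " reads");
  }
}
```

In this sketch, 1000 unblocked elements cause 1000 backend reads when null is persisted, but only one read when an empty map is persisted, because the empty map is cached after the first miss.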
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)