[ 
https://issues.apache.org/jira/browse/BEAM-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7745:
-------------------------------
    Status: Open  (was: Triage Needed)

> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state 
> access pattern during normal operation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7745
>                 URL: https://issues.apache.org/jira/browse/BEAM-7745
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Steve Niemitz
>            Priority: Major
>
> I spent some time tracking down sources of uncached state fetches in my job, 
> and one large category was the interaction of StreamingSideInputDoFnRunner + 
> StreamingSideInputFetcher.
> Basically, during standard operations, when the main input is NOT blocked by 
> the side input, the side input fetcher will perform an uncached state read 
> for every input element.  Changing it to cache the blockedMap state gave me a 
> ~30-40% increase in throughput in my job.
> The interaction is a little complicated, and there are a couple of 
> optimizations I can see here.
>  
> Primarily, the blockedMap is only persisted if it is non-empty.  Because the 
> WindmillStateCache won't cache a null value, the "nothing is blocked" signal 
> is never actually cached, so the fetcher issues a state read to Windmill for 
> each input element.  The fix seems to be to persist an empty map rather than 
> a null when there are no blocked elements.
>  
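
To illustrate the null-versus-empty-map point in the description, here is a minimal, self-contained sketch. It does not use the real Beam or Windmill classes: NullSkippingCache, readsFor and the "blockedMap" key are hypothetical stand-ins, and the only assumption carried over from the issue is that the cache refuses to store null values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class BlockedMapCacheSketch {

  /** Hypothetical stand-in for a state cache that will not store null values. */
  static final class NullSkippingCache<V> {
    private final Map<String, V> entries = new HashMap<>();
    int backendReads = 0;

    V get(String key, Supplier<V> backend) {
      V cached = entries.get(key);
      if (cached != null) {
        return cached; // cache hit: no backend read needed
      }
      backendReads++; // cache miss: simulate a state read to the backend
      V loaded = backend.get();
      if (loaded != null) {
        entries.put(key, loaded); // only non-null values are ever cached
      }
      return loaded;
    }
  }

  /** Counts simulated backend reads when the same state is looked up once per element. */
  public static int readsFor(Map<String, String> persistedValue, int elements) {
    NullSkippingCache<Map<String, String>> cache = new NullSkippingCache<>();
    for (int i = 0; i < elements; i++) {
      cache.get("blockedMap", () -> persistedValue);
    }
    return cache.backendReads;
  }

  public static void main(String[] args) {
    // Nothing persisted (null): every element triggers a backend read.
    System.out.println("null persisted:      " + readsFor(null, 1000));
    // Empty map persisted: one backend read, then cache hits.
    System.out.println("empty map persisted: " + readsFor(new HashMap<>(), 1000));
  }
}
```

In this toy model the null case performs 1000 backend reads and the empty-map case performs 1, which is the effect the proposed change aims for. The real runner's behavior depends on details not shown here.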



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
