[ https://issues.apache.org/jira/browse/BEAM-7745?focusedWorklogId=634348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634348 ]

ASF GitHub Bot logged work on BEAM-7745:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/21 11:41
            Start Date: 05/Aug/21 11:41
    Worklog Time Spent: 10m 
    Work Description: steveniemitz commented on a change in pull request #15235:
URL: https://github.com/apache/beam/pull/15235#discussion_r682686933



##########
File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java
##########
@@ -474,9 +474,12 @@ private WindmillValue(
 
     @Override
     public void clear() {
-      modified = true;
+      // if the value was already null (because it was already cleared) then there's no need to
+      // mark this as modified again.  This will save having to persist a clear that does nothing.

Review comment:
       good catch, I didn't really need to include this here anyway; I'll revert it.

##########
File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java
##########
@@ -474,9 +474,12 @@ private WindmillValue(
 
     @Override
     public void clear() {
-      modified = true;
+      // if the value was already null (because it was already cleared) then there's no need to

Review comment:
       reverted





Issue Time Tracking
-------------------

    Worklog Id:     (was: 634348)
    Time Spent: 1h 40m  (was: 1.5h)

> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state access pattern during normal operation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7745
>                 URL: https://issues.apache.org/jira/browse/BEAM-7745
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Steve Niemitz
>            Priority: P3
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I spent some time tracking down sources of uncached state fetches in my job, 
> and one large category was the interaction of StreamingSideInputDoFnRunner + 
> StreamingSideInputFetcher.
> Basically, during normal operation, when the main input is NOT blocked by
> the side input, the side input fetcher performs an uncached state read
> for every input element.  Changing it to cache the blockedMap state gave me a
> ~30-40% increase in throughput in my job.
> The interaction is a little complicated, and there are a couple of
> optimizations I can see here.
>  
> Primarily, the blockedMap is only persisted if it is non-empty.  Because the
> WindmillStateCache won't cache a null value, the "nothing is blocked" signal
> is never actually cached, so a state read is issued to Windmill for each
> input element.  The solution seems to be to persist an empty map rather than
> a null when there are no blocked elements.
>  
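The caching problem described in the issue can be illustrated with a toy model. This is a minimal sketch, not Beam's StreamingSideInputFetcher or WindmillStateCache: the `CachedState` class, the `simulate` method, and the `blockedMap` key are hypothetical, and the only behavior borrowed from the issue is that the cache refuses to store a null value. Storing nothing for an empty map costs one backend read per element, while storing an empty map lets the "nothing is blocked" answer be served from the cache.

```java
import java.util.HashMap;
import java.util.Map;

public class BlockedMapCacheSketch {

  // Hypothetical stand-in for a state backend with a cache in front of it.
  // Like the WindmillStateCache described in the issue, the cache never stores null.
  static final class CachedState {
    private final Map<String, Map<String, String>> backend = new HashMap<>();
    private final Map<String, Map<String, String>> cache = new HashMap<>();
    int backendReads = 0;

    Map<String, String> read(String key) {
      Map<String, String> cached = cache.get(key);
      if (cached != null) {
        return cached; // cache hit: no round trip to the backend
      }
      backendReads++; // cache miss: simulated uncached state read
      Map<String, String> value = backend.get(key); // null if nothing was persisted
      if (value != null) {
        cache.put(key, value); // an absent (null) result is never cached
      }
      return value;
    }

    // persistEmpty=false mimics persisting nothing (null) for an empty map;
    // persistEmpty=true persists the empty map so it can be cached.
    void write(String key, Map<String, String> value, boolean persistEmpty) {
      if (value.isEmpty() && !persistEmpty) {
        backend.remove(key);
        cache.remove(key);
      } else {
        backend.put(key, value);
        cache.put(key, value);
      }
    }
  }

  // Processes `elements` inputs for one key while nothing is blocked and
  // returns how many backend reads were needed.
  static int simulate(boolean persistEmpty, int elements) {
    CachedState state = new CachedState();
    String key = "blockedMap"; // hypothetical state tag
    for (int i = 0; i < elements; i++) {
      Map<String, String> blocked = state.read(key);
      if (blocked == null) {
        blocked = new HashMap<>();
      }
      // Nothing is blocked, so the map stays empty and is written back.
      state.write(key, blocked, persistEmpty);
    }
    return state.backendReads;
  }

  public static void main(String[] args) {
    System.out.println("null when empty:  " + simulate(false, 1000)); // 1000 reads
    System.out.println("empty map stored: " + simulate(true, 1000)); // 1 read
  }
}
```

In this model, only the first element misses the cache once the empty map is persisted; every later lookup is a cache hit. How much that saves in a real pipeline depends on the runner and workload, which the toy does not capture.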


