Sam Whittle created BEAM-11707:
----------------------------------

             Summary: Optimize WindmillStateCache CPU usage
                 Key: BEAM-11707
                 URL: https://issues.apache.org/jira/browse/BEAM-11707
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Sam Whittle
            Assignee: Sam Whittle


From profiling Nexmark Query11, which has many unique tags per key, I observed
that WindmillStateCache accounted for 6% of CPU usage. The cost appears to come
from maintaining the invalidation set as well as from the many reads and
inserts.

The invalidation set is maintained so that if a key encounters a processing
error or its cache token changes, we can invalidate all the entries for that
key. Currently this is done by removing all entries for the key from the
cache. An alternative that appears much more CPU efficient is to leave the
entries in the cache but make them unreachable. This can be done by having a
per-key object that is compared by object identity as part of the cache
lookup. To discard the entries for a key, we then start using a new per-key
object. Cleanup of per-key objects can be done with a weak reference map.
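
The idea above can be illustrated with a minimal, single-threaded sketch. It is
not the WindmillStateCache code: the class and method names
(TokenInvalidatingCache, KeyToken, EntryId) are hypothetical, a small
LinkedHashMap-based LRU stands in for the real weighted cache, and a plain map
of WeakReference values stands in for the weak reference map.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: invalidate a key by swapping its identity token. */
final class TokenInvalidatingCache {
  /** Per-key marker. It does not override equals, so it is compared by identity. */
  static final class KeyToken {}

  /** Cache key made of the per-key token (by identity) and the state tag. */
  static final class EntryId {
    final KeyToken token;
    final String tag;

    EntryId(KeyToken token, String tag) {
      this.token = token;
      this.tag = tag;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof EntryId)) {
        return false;
      }
      EntryId other = (EntryId) o;
      return token == other.token && tag.equals(other.tag);
    }

    @Override
    public int hashCode() {
      return 31 * System.identityHashCode(token) + tag.hashCode();
    }
  }

  // Weakly held tokens: once no cache entry references a token, it can be
  // garbage collected. Cleared references are not purged in this sketch.
  private final Map<String, WeakReference<KeyToken>> tokens = new HashMap<>();
  // Access-ordered LRU standing in for the real cache.
  private final LinkedHashMap<EntryId, Object> entries;

  TokenInvalidatingCache(final int maxEntries) {
    this.entries =
        new LinkedHashMap<EntryId, Object>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<EntryId, Object> eldest) {
            return size() > maxEntries;
          }
        };
  }

  private KeyToken tokenFor(String key) {
    WeakReference<KeyToken> ref = tokens.get(key);
    KeyToken token = (ref == null) ? null : ref.get();
    if (token == null) {
      token = new KeyToken();
      tokens.put(key, new WeakReference<>(token));
    }
    return token;
  }

  Object get(String key, String tag) {
    return entries.get(new EntryId(tokenFor(key), tag));
  }

  void put(String key, String tag, Object value) {
    entries.put(new EntryId(tokenFor(key), tag), value);
  }

  /** Constant-time invalidation: old entries stay cached but can no longer match. */
  void invalidate(String key) {
    tokens.remove(key);
  }

  int size() {
    return entries.size();
  }
}
```

In this sketch invalidate() touches only the token map, so its cost does not
depend on how many tags the key has. The unreachable entries still occupy cache
space until ordinary eviction removes them, which is the trade-off the proposal
accepts in exchange for lower CPU usage.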

Another cost of the cache is that objects are grouped by window so that they
are kept or evicted all at once. Currently, however, when reading items into
the cache we fetch the window set and then look up each tag in it. The window
set could be cached for the key to avoid multiple cache lookups. Similarly,
when putting objects we look up and insert each tag separately and then update
the cache to adjust the weight of the per-window set. This could be done once,
after all updates for the window have been made.
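
A rough sketch of the batching described above follows, under stated
assumptions: WindowBatchingCache, WindowHandle, and the length-based weigh()
function are hypothetical names invented for illustration, a HashMap stands in
for the shared cache, and the code is single-threaded. It only shows the shape
of the change: one shared-cache lookup when a window is opened and one insert
with a single weight update when it is persisted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: batch per-window reads and writes into two cache operations. */
final class WindowBatchingCache {
  /** All tags of one window, kept or evicted together. */
  static final class WindowEntries {
    final Map<String, String> byTag = new HashMap<>();
    long weight;
  }

  private final Map<String, WindowEntries> cache = new HashMap<>();
  private long cacheOps; // lookups and inserts against the shared cache
  private long totalWeight; // weight the shared cache currently accounts for

  /** Local view of one window; tag reads and writes do not touch the shared cache. */
  final class WindowHandle {
    private final String window;
    private final WindowEntries entries;
    private long weightDelta;

    WindowHandle(String window, WindowEntries entries) {
      this.window = window;
      this.entries = entries;
    }

    String get(String tag) {
      return entries.byTag.get(tag);
    }

    void put(String tag, String value) {
      String old = entries.byTag.put(tag, value);
      weightDelta += weigh(tag, value) - (old == null ? 0 : weigh(tag, old));
    }

    /** Publishes the window set and applies the accumulated weight change once. */
    void persist() {
      entries.weight += weightDelta;
      totalWeight += weightDelta;
      weightDelta = 0;
      cacheOps++;
      cache.put(window, entries);
    }
  }

  /** Fetches the window set once; later tag accesses go through the handle. */
  WindowHandle open(String window) {
    cacheOps++;
    WindowEntries existing = cache.get(window);
    return new WindowHandle(window, existing == null ? new WindowEntries() : existing);
  }

  // Assumed weight function for illustration only.
  private static long weigh(String tag, String value) {
    return tag.length() + value.length();
  }

  long cacheOps() {
    return cacheOps;
  }

  long totalWeight() {
    return totalWeight;
  }
}
```

With this shape, writing N tags for a window costs two shared-cache operations
instead of roughly one lookup and one insert per tag. A real implementation
would also need to consider concurrent access to the window set, which this
sketch ignores.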





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
