Steve Niemitz created BEAM-7568:
-----------------------------------

             Summary: Java dataflow harness re-encodes value state cells even 
if they haven't changed
                 Key: BEAM-7568
                 URL: https://issues.apache.org/jira/browse/BEAM-7568
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
    Affects Versions: 2.13.0
            Reporter: Steve Niemitz


The Java Dataflow worker seems to re-encode ValueState cells after every work 
item, even if they weren't modified.

You can see here 
[https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
 that the value is always encoded (and used to weight the cache entry) even if 
it won't be persisted back to windmill. 

This can have large performance implications if the values being stored 
are expensive or large to encode and infrequently modified.  Ideally, the weight 
would also be cached, and the value would only need to be re-encoded if it was 
changed.
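
To make the suggestion concrete, here is a minimal sketch of the idea, not the actual WindmillStateInternals code. The class CachedValueCell, its methods, and the Function-based encoder (a stand-in for a Beam Coder) are all hypothetical names made up for illustration. The cell remembers the encoded size from the last encode and skips encoding entirely when the value is clean and a weight is already known.

```java
import java.util.function.Function;

/** Hypothetical sketch of a value cell that caches its weight; not Beam's real API. */
final class CachedValueCell<T> {
  private final Function<T, byte[]> encoder; // stand-in for a Beam Coder (assumption)
  private T value;
  private boolean modified;       // true when the value must be written back
  private long cachedWeight = -1; // encoded size from the last encode, -1 if unknown
  private int encodeCount;        // instrumentation for this example only

  CachedValueCell(Function<T, byte[]> encoder, T initialValue) {
    this.encoder = encoder;
    this.value = initialValue;
  }

  T read() {
    return value;
  }

  void write(T newValue) {
    value = newValue;
    modified = true;
    cachedWeight = -1; // the old weight no longer describes the value
  }

  /**
   * Called at the end of each work item. Returns the bytes to persist,
   * or null when the value was not modified.
   */
  byte[] persist() {
    if (!modified && cachedWeight >= 0) {
      return null; // clean cell with a known weight: no encoding needed
    }
    byte[] encoded = encoder.apply(value);
    encodeCount++;
    cachedWeight = encoded.length;
    boolean wasModified = modified;
    modified = false;
    return wasModified ? encoded : null;
  }

  long weight() {
    return cachedWeight;
  }

  int encodeCount() {
    return encodeCount;
  }
}
```

In this sketch a cell that is only read gets encoded at most once (to learn its weight), no matter how many work items touch it. A real fix could likely avoid even that first encode by taking the weight from the bytes originally read from windmill, but that is an assumption about the surrounding code.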



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
