Steve Niemitz created BEAM-7568:
-----------------------------------
Summary: Java dataflow harness re-encodes value state cells even
if they haven't changed
Key: BEAM-7568
URL: https://issues.apache.org/jira/browse/BEAM-7568
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Affects Versions: 2.13.0
Reporter: Steve Niemitz
The Java Dataflow worker seems to re-encode ValueState cells after every work
item, even if they weren't modified.
You can see here
[https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
that the value is always encoded (and used to weight the cache entry) even if
it won't be persisted back to windmill.
This can have large performance implications if the values being stored
are expensive/large to encode and infrequently modified. Ideally, the weight
would also be cached, and the value would only need to be re-encoded if it was
changed.
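
A minimal sketch of the idea, not the actual WindmillStateInternals code: the
cell keeps a dirty flag and the last computed weight, and only invokes the
encoder when the value was written since the last persist. Every name here
(Encoder, CountingStringEncoder, CachedValueCell, persist, weight) is
hypothetical and made up for illustration; Encoder is a stand-in, not Beam's
Coder API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

/** Hypothetical stand-in for a coder; this is NOT Beam's Coder interface. */
interface Encoder<T> {
  byte[] encode(T value);
}

/** Test encoder that counts invocations so the saved work is observable. */
final class CountingStringEncoder implements Encoder<String> {
  int calls = 0;

  @Override
  public byte[] encode(String value) {
    calls++;
    return value.getBytes(StandardCharsets.UTF_8);
  }
}

/** Hypothetical value cell that re-encodes only after a write. */
final class CachedValueCell<T> {
  private final Encoder<T> encoder;
  private T value;
  private boolean modified = false; // true if written since the last persist()
  private long cachedWeight = 0;    // encoded size from the last real encode

  CachedValueCell(Encoder<T> encoder) {
    this.encoder = Objects.requireNonNull(encoder);
  }

  T read() {
    return value;
  }

  void write(T newValue) {
    value = Objects.requireNonNull(newValue);
    modified = true;
  }

  /**
   * Intended to run at the end of each work item. Returns the bytes to
   * persist, or null when the value is unchanged and nothing needs writing.
   */
  byte[] persist() {
    if (!modified) {
      return null; // skip the encode; cachedWeight stays valid
    }
    byte[] encoded = encoder.encode(value);
    cachedWeight = encoded.length;
    modified = false;
    return encoded;
  }

  /** Weight for the cache entry, reused across work items until the next write. */
  long weight() {
    return cachedWeight;
  }
}
```

With this shape, a cell that is written once and then only read across many
work items is encoded a single time, and the cache can keep using weight()
without touching the encoder again. The sketch assumes the stored value is not
mutated in place after write(); a mutable value changed through read() would
not set the dirty flag.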
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)