[
https://issues.apache.org/jira/browse/BEAM-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismaël Mejía updated BEAM-7568:
-------------------------------
Status: Open (was: Triage Needed)
> Java dataflow harness re-encodes value state cells even if they haven't
> changed
> -------------------------------------------------------------------------------
>
> Key: BEAM-7568
> URL: https://issues.apache.org/jira/browse/BEAM-7568
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: 2.13.0
> Reporter: Steve Niemitz
> Priority: Major
>
> The Java Dataflow worker seems to re-encode ValueState cells after every work
> item, even if they weren't modified.
> You can see here
> [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
> that the value is always encoded (and used to weight the cache entry) even
> if it won't be persisted back to windmill.
> This can have large performance implications if the values being stored
> are expensive or large to encode and are infrequently modified. Ideally, the
> weight would also be cached, and the value would only need to be re-encoded
> if it was changed.
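
For illustration, here is a minimal sketch of the behaviour the description asks for: a dirty flag and a cached weight, so that encoding only happens after a write. The class CachedValueCell, its methods, and the use of a plain Function as the encoder are all hypothetical stand-ins. This is not the WindmillStateInternals code and not a Beam API.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical value cell; illustrative only, not part of Beam. */
final class CachedValueCell<T> {
  private final Function<T, byte[]> encoder; // stand-in for a real coder
  private T value;
  private boolean dirty;     // true when the value changed since the last encode
  private long cachedWeight; // encoded size in bytes from the last encode
  private int encodeCount;   // kept only to make the effect observable

  CachedValueCell(Function<T, byte[]> encoder, T initial) {
    this.encoder = encoder;
    this.value = initial;
    this.dirty = true; // assumption: nothing has been encoded yet
  }

  T read() {
    return value;
  }

  void write(T newValue) {
    value = newValue;
    dirty = true;
  }

  /**
   * Intended to run once per work item. Encodes only if the value was
   * written since the previous call, and returns the weight to use for
   * the cache entry.
   */
  long persist() {
    if (dirty) {
      byte[] encoded = encoder.apply(value);
      encodeCount++;
      cachedWeight = encoded.length;
      dirty = false;
      // A real worker would also hand `encoded` to the backing store here.
    }
    return cachedWeight;
  }

  int encodeCount() {
    return encodeCount;
  }
}
```

With a UTF-8 string encoder, calling persist() repeatedly without an intervening write() encodes once and returns the cached weight afterwards. One caveat of this approach: it only notices changes made through write(), so a mutable value that is modified in place after read() would not be re-encoded.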
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)