[
https://issues.apache.org/jira/browse/FLINK-27934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-27934:
-----------------------------------
Labels: pull-request-available (was: )
> Python API- Inefficient deserialization/serialization of state variables
> within a batch
> ---------------------------------------------------------------------------------------
>
> Key: FLINK-27934
> URL: https://issues.apache.org/jira/browse/FLINK-27934
> Project: Flink
> Issue Type: Improvement
> Components: Stateful Functions
> Affects Versions: statefun-3.2.0
> Reporter: Frans King
> Priority: Minor
> Labels: pull-request-available
>
> In the Python API state variables can be accessed via the UserFacingContext:
> variable = context.storage.variable
> This calls into the Cell instance for that state variable, which has get() and
> set() methods. get() always deserializes from the typed_value, and set()
> always re-serializes and marks the cell dirty.
>
> This has two side effects:
> 1:
> var1 = context.storage.variable
> var2 = context.storage.variable
> id(var2) != id(var1) - they are different instances
>
> 2:
> In a large batch (say 1000 calls to the same function type and id), this can
> result in deserializing and re-serializing the same state variable 1000
> times, when it only needs to be deserialized on the first invocation in
> the batch, held in memory until the last invocation, and then re-serialized
> once before the mutations are collected.
>
> I think this can be improved by having a lazily initialized backing field in
> the Cell class, but I don't know whether the behavior described in 1 was a
> conscious design decision.
>
> Any feedback would be welcome.
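
For illustration, the lazily initialized backing field proposed in the issue could look roughly like the sketch below. This is not the actual statefun Cell class: LazyCell, flush(), the two call counters, and the use of json as the serializer are all assumptions made for the example. Only get(), set(), typed_value, and the dirty flag come from the description above.

```python
import json
from typing import Any, Optional

# Sentinel: distinguishes "not deserialized yet" from a stored value of None.
_UNSET = object()


class LazyCell:
    """Hypothetical stand-in for the SDK's Cell, with a lazy backing field."""

    def __init__(self, typed_value: Optional[bytes]):
        self.typed_value = typed_value  # serialized form received with the batch
        self.dirty = False
        self._cached = _UNSET           # the lazily initialized backing field
        self.deserialize_calls = 0      # counters exist only for this demo
        self.serialize_calls = 0

    def get(self) -> Any:
        # Deserialize on first access only; later calls return the same object.
        if self._cached is _UNSET:
            self.deserialize_calls += 1
            if self.typed_value is None:
                self._cached = None
            else:
                self._cached = json.loads(self.typed_value)
        return self._cached

    def set(self, value: Any) -> None:
        # Keep the value in memory and defer serialization to flush().
        self._cached = value
        self.dirty = True

    def flush(self) -> Optional[bytes]:
        # Serialize once, just before the mutations would be collected.
        if self.dirty and self._cached is not _UNSET:
            self.serialize_calls += 1
            self.typed_value = json.dumps(self._cached).encode("utf-8")
        return self.typed_value


# Simulate a batch of 1000 invocations against the same state variable.
cell = LazyCell(json.dumps({"count": 0}).encode("utf-8"))
for _ in range(1000):
    state = cell.get()
    state["count"] += 1
    cell.set(state)
cell.flush()
```

With this sketch the batch deserializes once and serializes once, and repeated get() calls return the same instance, which changes the behavior described in 1. One consequence to weigh: a caller that mutates the returned object in place without calling set() would not mark the cell dirty, so the change would be lost unless the cell is treated as dirty on any access.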
--
This message was sent by Atlassian Jira
(v8.20.7#820007)