[
https://issues.apache.org/jira/browse/FLINK-27934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-27934:
-----------------------------------
Labels: pull-request-available stale-minor (was: pull-request-available)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is
still Minor, please either assign yourself or give an update. Afterwards,
please remove the label or in 7 days the issue will be deprioritized.
> Python API- Inefficient deserialization/serialization of state variables
> within a batch
> ---------------------------------------------------------------------------------------
>
> Key: FLINK-27934
> URL: https://issues.apache.org/jira/browse/FLINK-27934
> Project: Flink
> Issue Type: Improvement
> Components: Stateful Functions
> Affects Versions: statefun-3.2.0
> Reporter: Frans King
> Priority: Minor
> Labels: pull-request-available, stale-minor
>
> In the Python API state variables can be accessed via the UserFacingContext:
> variable = context.storage.variable
> This calls into the Cell instance for that state variable which has get() &
> set() methods. The get() method always deserializes from the typed_value and
> the set() always re-serializes and marks the cell dirty.
>
> This has two side effects:
> 1:
> var1 = context.storage.variable
> var2 = context.storage.variable
> id(var2) != id(var1) - they are different instances
>
> 2:
> In a large batch (say 1000 calls to the same function type and id) this can
> result in deserializing and re-serializing the same state variable 1000
> times when really it only needs to be deserialized in the first invocation in
> the batch, held in memory until the last invocation and then re-serialized
> prior to collecting the mutations.
>
> I think this can be improved by having a lazily initialized backing field in
> the Cell class but I don't know if this was a conscious design decision to
> have the behavior described in 1.
>
> Any feedback would be welcome.
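
A rough sketch of the idea in the description, for illustration only. It is not the
statefun Python SDK code: EagerCell, LazyCell, flush() and run_batch() are
hypothetical names, and json stands in for whatever type serializer the state
variable really uses. EagerCell mimics the behavior described above (deserialize
on every get(), re-serialize on every set()); LazyCell keeps a lazily
initialized backing field and serializes once, when the mutations are collected.

```python
import json


class EagerCell:
    """Hypothetical stand-in: every get() deserializes, every set() serializes."""

    def __init__(self, raw: bytes):
        self.raw = raw
        self.dirty = False
        self.deserialize_calls = 0
        self.serialize_calls = 0

    def get(self):
        self.deserialize_calls += 1
        return json.loads(self.raw)

    def set(self, value):
        self.serialize_calls += 1
        self.raw = json.dumps(value).encode("utf-8")
        self.dirty = True


class LazyCell:
    """Hypothetical variant with a lazily initialized backing field."""

    _UNSET = object()  # sentinel, so a stored None is still a valid cached value

    def __init__(self, raw: bytes):
        self.raw = raw
        self.dirty = False
        self._value = LazyCell._UNSET
        self.deserialize_calls = 0
        self.serialize_calls = 0

    def get(self):
        # Deserialize only on the first access; later calls reuse the object.
        if self._value is LazyCell._UNSET:
            self.deserialize_calls += 1
            self._value = json.loads(self.raw)
        return self._value

    def set(self, value):
        # Only remember the value; serialization is deferred to flush().
        self._value = value
        self.dirty = True

    def flush(self) -> bytes:
        # Assumed to be called once, just before the mutations are collected.
        if self.dirty:
            self.serialize_calls += 1
            self.raw = json.dumps(self._value).encode("utf-8")
        return self.raw


def run_batch(cell, n: int) -> None:
    """Simulate n invocations in one batch that each read and update the state."""
    for _ in range(n):
        state = cell.get()
        state["count"] += 1
        cell.set(state)


eager = EagerCell(b'{"count": 0}')
run_batch(eager, 1000)

lazy = LazyCell(b'{"count": 0}')
run_batch(lazy, 1000)
lazy.flush()

print(eager.deserialize_calls, eager.serialize_calls)
print(lazy.deserialize_calls, lazy.serialize_calls)
```

Note that caching also changes side effect 1: with LazyCell, two reads return the
same instance, so an in-place mutation that is never followed by set() would be
visible to later reads in the batch but would not mark the cell dirty. Whether
that is acceptable is the open design question raised in the description.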
--
This message was sent by Atlassian Jira
(v8.20.10#820010)