[
https://issues.apache.org/jira/browse/FLINK-11517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nico Kruber updated FLINK-11517:
--------------------------------
Component/s: (was: Runtime / Task)
API / DataStream
> Inefficient window state access when using RocksDB state backend
> ----------------------------------------------------------------
>
> Key: FLINK-11517
> URL: https://issues.apache.org/jira/browse/FLINK-11517
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream
> Reporter: Elias Levy
> Priority: Major
>
> When using an aggregate function on a window with a process function and the
> RocksDB state backend, state access is inefficient.
> The WindowOperator calls windowState.add to merge the new element using the
> aggregate function. The add method of RocksDBAggregatingState will read the
> state, deserialize the state, call the aggregate function, deserialize the
> state, and write it out.
> If the trigger decides the window must be fired, as the the windowState.add
> does not return the state, the WindowOperator must call windowState.get to
> get it and pass it to the window process function, resulting in another read
> and deserialization.
> Finally, while the state is not passed in to the trigger, in some cases the
> trigger may have a need to access the state. That is our case. As the state
> is not passed to the trigger, we must read and deserialize the state one more
> from within the trigger.
> Thus, state must be read and deserialized three times to process a single
> element. If the state is large, this can be quite costly.
>
> Ideally windowState.add would return the state, so that the WindowOperator
> can pass it to the process function without having to read it again.
> Additionally, the state would be made available to the trigger to enable more
> use cases without having to go through the state descriptor again.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)