[
https://issues.apache.org/jira/browse/FLINK-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887094#comment-15887094
]
Xiaogang Shi commented on FLINK-5917:
-------------------------------------
[~StephanEwen] We use the cache only to avoid the costly scan. It is
initialized the first time the `size()` method is called. After that, the cache
is updated every time a new entry is inserted or an existing entry is removed.
When the backend is closed, we can simply drop the cache.
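To make the caching idea concrete, here is a minimal sketch. It is not the actual Flink code: the class `CachedSizeMap`, its `scans` counter, and the in-memory `TreeMap` standing in for RocksDB are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical map state whose backing store can only be counted by scanning. */
class CachedSizeMap<K extends Comparable<K>, V> {
    // Stand-in for the RocksDB instance (assumption: the real store has no cheap count).
    private final Map<K, V> store = new TreeMap<>();
    // -1 means the cache has not been initialized yet.
    private long cachedSize = -1;
    // Number of full scans performed, exposed only to illustrate the saving.
    int scans = 0;

    /** The costly operation: iterate over every entry to count them. */
    private long scanSize() {
        scans++;
        long n = 0;
        for (K ignored : store.keySet()) {
            n++;
        }
        return n;
    }

    void put(K key, V value) {
        boolean isNew = !store.containsKey(key);
        store.put(key, value);
        // Only maintain the cache once it exists, and only for genuinely new keys.
        if (cachedSize >= 0 && isNew) {
            cachedSize++;
        }
    }

    void remove(K key) {
        boolean existed = store.containsKey(key);
        store.remove(key);
        if (cachedSize >= 0 && existed) {
            cachedSize--;
        }
    }

    /** Scans on the first call, then answers from the cache. */
    long size() {
        if (cachedSize < 0) {
            cachedSize = scanSize();
        }
        return cachedSize;
    }

    /** Closing the backend simply drops the cache. */
    void close() {
        cachedSize = -1;
    }
}
```

Note that in this sketch every `put` and `remove` has to check whether the key already exists in order to keep the count correct.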
A better choice, I think, is to use a RocksDB entry to record the value of the
`size`. We don't need to write the value into the entry every time it's updated;
we can update it only when taking snapshots. But this requires states to be
aware of checkpointing, which is missing in our current implementation.
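A rough sketch of this second idea, under stated assumptions: the class `SnapshotSizedMap`, the reserved key `__size__`, and the `snapshot()` hook are hypothetical, and a `HashMap` stands in for RocksDB. The size lives in memory and is written to the reserved entry only when a snapshot is taken.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical state that persists its size only at snapshot time. */
class SnapshotSizedMap {
    // Hypothetical reserved key; user keys are assumed never to collide with it.
    static final String SIZE_KEY = "__size__";

    // Stand-in for the RocksDB instance.
    private final Map<String, String> store;
    // In-memory size, kept up to date on every put and remove.
    private long size;

    /** Assumes the store is either empty or was produced by snapshot(). */
    SnapshotSizedMap(Map<String, String> store) {
        this.store = store;
        String persisted = store.get(SIZE_KEY);
        this.size = (persisted == null) ? 0 : Long.parseLong(persisted);
    }

    void put(String key, String value) {
        if (store.put(key, value) == null) {
            size++; // the key was not present before
        }
    }

    void remove(String key) {
        if (store.remove(key) != null) {
            size--;
        }
    }

    long size() {
        return size;
    }

    /** Hypothetical checkpoint hook: write the size entry, then copy the store. */
    Map<String, String> snapshot() {
        store.put(SIZE_KEY, Long.toString(size));
        return new HashMap<>(store);
    }
}
```

Restoring from the returned copy recovers the size without a scan, because the size entry is consistent with the entries captured in the same snapshot.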
> Remove MapState.size()
> ----------------------
>
> Key: FLINK-5917
> URL: https://issues.apache.org/jira/browse/FLINK-5917
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Affects Versions: 1.3.0
> Reporter: Aljoscha Krettek
>
> I'm proposing to remove {{size()}} because it is a prohibitively expensive
> operation and users might not be aware of it. Instead of {{size()}}, users can
> use an iterator over all mappings to determine the size; when doing this, they
> will be aware of the fact that it is a costly operation.
> Right now, {{size()}} is only costly on the RocksDB state backend, but I think
> with future developments on the in-memory state backend it might also become
> an expensive operation there.