[
https://issues.apache.org/jira/browse/FLINK-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349909#comment-16349909
]
Stephan Ewen commented on FLINK-6763:
-------------------------------------
As a related comment - I think the whole snapshot procedure can be optimized a
bit. We can create the serializer snapshot once and then just keep the bytes and
add those to every checkpoint. In smaller state programs, the majority of
checkpoint time can be spent on serializer snapshots (still only milliseconds,
but optimization potential nonetheless).
> Inefficient PojoSerializerConfigSnapshot serialization format
> -------------------------------------------------------------
>
> Key: FLINK-6763
> URL: https://issues.apache.org/jira/browse/FLINK-6763
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing, Type Serialization System
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Till Rohrmann
> Assignee: Tzu-Li (Gordon) Tai
> Priority: Blocker
> Fix For: 1.5.0
>
>
> The {{PojoSerializerConfigSnapshot}} stores for each serializer the beginning
> offset and ending offset in the serialization stream. This information is
> also written if the serializer serialization is supposed to be ignored. The
> beginning and ending offsets are stored as a sequence of integers at the
> beginning of the serialization stream. We store this information to skip
> broken serializers.
> I think we don't need both offsets. Instead I would suggest writing the
> length of the serialized serializer first into the serialization stream and
> then the serialized serializer. This can be done in
> {{TypeSerializerSerializationUtil.writeSerializer}}. When reading the
> serializer via {{TypeSerializerSerializationUtil.tryReadSerializer}}, we can
> try to deserialize the serializer. If this operation fails, then we can skip
> the bytes of the serialized serializer because we know how long it was.
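
To illustrate the length-prefix proposal in the description, here is a minimal
sketch in plain Java. It is not the actual TypeSerializerSerializationUtil
code: LengthPrefixedBlocks and its methods are hypothetical names, and Java
serialization stands in for the real serializer format. Each block is written
as its length followed by its bytes. The reader always consumes the full block
before trying to deserialize it, so a broken entry is skipped without any
separate offset table.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper, not Flink code: illustrates length-prefixed blocks only.
final class LengthPrefixedBlocks {

    /** Writes the block length first, then the Java-serialized object. */
    static void writeBlock(DataOutputStream out, Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(value);
        }
        writeRawBlock(out, buffer.toByteArray());
    }

    /** Writes arbitrary bytes as a block, e.g. to simulate a broken serializer. */
    static void writeRawBlock(DataOutputStream out, byte[] bytes) throws IOException {
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /**
     * Reads one block and returns the object, or null if it cannot be deserialized.
     * The block is fully consumed before deserialization is attempted, so after a
     * failure the stream is already positioned at the next block.
     */
    static Object tryReadBlock(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException | RuntimeException e) {
            return null; // broken entry: skipped, the caller decides how to proceed
        }
    }
}
```

Only one integer per serializer is needed this way, instead of a begin and an
end offset stored up front.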
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)