[ https://issues.apache.org/jira/browse/KAFKA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012738#comment-16012738 ]
Tommy Becker commented on KAFKA-5256:
-------------------------------------
I originally noticed this when my state store directories were much larger than
the topics backing them, and discovered it was because the data was being
duplicated. The scenario I described above notwithstanding, this doesn't
produce incorrect results, but it wastes both disk space and CPU cycles as
RocksDB compacts the duplicate data.
> Non-checkpointed state stores should be deleted before restore
> --------------------------------------------------------------
>
> Key: KAFKA-5256
> URL: https://issues.apache.org/jira/browse/KAFKA-5256
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.2.1
> Reporter: Tommy Becker
>
> Currently, Kafka Streams will re-use an existing state store even if there is
> no checkpoint for it. This seems both inefficient (because duplicate inserts
> can be made on restore) and incorrect (records which have been deleted from
> the backing topic may still exist in the store). Since the contents of a
> store with no checkpoint are unknown, the best way to proceed would be to
> delete the store and recreate it before restoring.
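The behaviour proposed in the issue (wipe a store that has no checkpoint, then restore it from the changelog) can be sketched roughly as follows. This is a minimal in-memory model under stated assumptions, not the Kafka Streams API: `LocalStore`, `restore` and the list-based changelog are hypothetical, a `None` value stands in for a tombstone, and a real store would delete its on-disk directory (e.g. the RocksDB files) rather than replace a dict. The usage at the end shows the "incorrect" case from the issue: key "b" was deleted upstream and has since been compacted out of the changelog, so restoring on top of the old contents would never remove it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical changelog record: (key, value); value None models a tombstone.
Record = Tuple[str, Optional[str]]


@dataclass
class LocalStore:
    """Hypothetical stand-in for a local state store."""
    data: Dict[str, str] = field(default_factory=dict)
    # Offset of the next changelog record to apply; None means "no checkpoint".
    checkpoint: Optional[int] = None


def restore(store: LocalStore, changelog: List[Record]) -> LocalStore:
    """Bring `store` up to date with `changelog`.

    Without a checkpoint the existing contents are unknown (they may hold
    duplicates or records deleted upstream), so the store is discarded and
    recreated empty, then the whole changelog is replayed from offset 0.
    """
    if store.checkpoint is None:
        store.data = {}  # delete and recreate the store
        start = 0
    else:
        start = store.checkpoint  # resume from the checkpointed offset
    for key, value in changelog[start:]:
        if value is None:
            store.data.pop(key, None)  # tombstone: remove the key
        else:
            store.data[key] = value
    store.checkpoint = len(changelog)  # write a new checkpoint
    return store


if __name__ == "__main__":
    # Compacted changelog: "b" and its tombstone are no longer present.
    changelog = [("a", "1"), ("c", "3")]
    stale = LocalStore(data={"a": "1", "b": "2"})  # no checkpoint
    print(restore(stale, changelog).data)  # → {'a': '1', 'c': '3'}
```

Skipping the wipe in the no-checkpoint branch would leave "b" in the store indefinitely, and every key already present would be written a second time, which is the duplication RocksDB then has to compact away.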
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)