[ https://issues.apache.org/jira/browse/KAFKA-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Lin updated KAFKA-5829:
----------------------------
    Priority: Critical  (was: Major)

> Speedup broker startup after unclean shutdown by reducing unnecessary snapshot files deletion
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5829
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5829
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> The current Kafka implementation causes slow startup after an unclean
> shutdown. The time to load a partition can be 10X or more than what is
> actually needed. Here is the explanation with an example:
> - Say we have a partition of 20 segments. Each segment has 250 messages,
> offsets start at 0, and each message is 1 MB.
> - The broker experiences a hard kill and the index file of the first segment
> is corrupted.
> - When the broker starts up and loads the first segment, it finds that the
> index of the first segment is corrupted, so it calls
> `log.recoverSegment(...)` to recover this segment. This method calls
> `stateManager.truncateAndReload(...)`, which deletes the snapshot files whose
> offset is larger than the base offset of the first segment. Thus all snapshot
> files are deleted.
> - To rebuild the snapshot files, `log.loadSegmentFiles(...)` has to read
> every message in this partition, even if the log and index files of the other
> segments are not corrupted. This increases the time to load this partition by
> more than an order of magnitude.
> In order to address this issue, one simple solution is not to delete snapshot
> files that are larger than the given offset if only the index files need to
> be rebuilt. More specifically, we should not need to rebuild the producer
> state offset file unless the log file itself is corrupted or truncated.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
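
The arithmetic behind the example in the issue can be sketched with a small model. This is only an illustration, not Kafka code: the function `mb_read_on_load`, its parameters, and the assumption that one snapshot file exists at the end offset of each segment are all hypothetical and chosen to match the 20-segment, 250-message, 1 MB scenario described above.

```python
# Toy model of the scenario in KAFKA-5829 (not Kafka's actual implementation).
# Assumption: one producer-state snapshot exists at the end offset of every segment.

SEGMENTS = 20
MESSAGES_PER_SEGMENT = 250
MESSAGE_SIZE_MB = 1


def mb_read_on_load(corrupt_index_segment, keep_snapshots_on_index_rebuild):
    """Return how many MB of log data must be read to load the partition."""
    base_offsets = [i * MESSAGES_PER_SEGMENT for i in range(SEGMENTS)]
    # Hypothetical snapshot layout: one snapshot at each segment's end offset.
    snapshots = {base + MESSAGES_PER_SEGMENT for base in base_offsets}

    # Rebuilding a corrupted index always requires scanning that segment.
    segments_read = {corrupt_index_segment}

    if not keep_snapshots_on_index_rebuild:
        # Behaviour described in the issue: snapshots whose offset is larger
        # than the base offset of the recovered segment are deleted.
        recover_base = base_offsets[corrupt_index_segment]
        snapshots = {offset for offset in snapshots if offset <= recover_base}

    # Any segment whose snapshot is missing must be re-read to rebuild it.
    for index, base in enumerate(base_offsets):
        if base + MESSAGES_PER_SEGMENT not in snapshots:
            segments_read.add(index)

    return len(segments_read) * MESSAGES_PER_SEGMENT * MESSAGE_SIZE_MB


if __name__ == "__main__":
    current = mb_read_on_load(0, keep_snapshots_on_index_rebuild=False)
    proposed = mb_read_on_load(0, keep_snapshots_on_index_rebuild=True)
    print("MB read with snapshot deletion:", current)
    print("MB read when snapshots are kept:", proposed)
```

Under these assumptions, a corrupted index on the first segment forces a read of all 20 segments when snapshots are deleted, but only of the first segment when they are kept, which is consistent with the "more than an order of magnitude" estimate in the issue.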