GitHub user benstopford opened a pull request: https://github.com/apache/kafka/pull/2808
KIP-101: Alter Replication Protocol to use Leader Epoch rather than High Watermark for Truncation

This PR describes the addition of Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:

[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)

*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log, when a replica becomes a follower, has been moved from Partition into the ReplicaFetcherThread.
- When partitions are added to the ReplicaFetcherThread they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.

This test provides a good overview of the workflow: `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()`

The corrupted log use case is covered by the test `EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`

Remaining work: The test `EpochDrivenReplicationProtocolAcceptanceTest.shouldSurviveFastLeaderChange()` doesn't correctly reproduce the underlying issue. This will be altered later to properly support this use case.
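To make the key elements above more concrete, here is a minimal sketch of an epoch cache and the leader-side lookup a follower would rely on. It is an illustration only, not the code in this PR: the class `LeaderEpochSketch`, its methods `assign` and `endOffsetFor`, and the choice to return the log end offset for any epoch at or beyond the latest one are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-partition leader epoch cache (names are illustrative).
class LeaderEpochSketch {

    // One entry: an epoch and the offset of the first message stamped with it.
    private static final class EpochEntry {
        final int epoch;
        final long startOffset;

        EpochEntry(int epoch, long startOffset) {
            this.epoch = epoch;
            this.startOffset = startOffset;
        }
    }

    private final List<EpochEntry> entries = new ArrayList<>();

    // Record an epoch the first time a message carrying it is appended.
    // Repeated or older epochs are ignored, so each epoch keeps its first offset.
    public void assign(int epoch, long startOffset) {
        if (entries.isEmpty() || epoch > entries.get(entries.size() - 1).epoch) {
            entries.add(new EpochEntry(epoch, startOffset));
        }
    }

    // Leader side: the end offset (exclusive) of the requested epoch.
    // That is the start offset of the first later epoch, or the log end
    // offset if no later epoch exists (a simplifying assumption here).
    public long endOffsetFor(int requestedEpoch, long logEndOffset) {
        for (EpochEntry e : entries) {
            if (e.epoch > requestedEpoch) {
                return e.startOffset;
            }
        }
        return logEndOffset;
    }

    public static void main(String[] args) {
        LeaderEpochSketch leader = new LeaderEpochSketch();
        leader.assign(0, 0L);   // epoch 0 starts at offset 0
        leader.assign(1, 10L);  // epoch 1 starts at offset 10
        long leaderLogEndOffset = 25L;

        // A follower whose last known epoch is 0 asks where epoch 0 ended.
        System.out.println(leader.endOffsetFor(0, leaderLogEndOffset)); // prints 10
        // A follower already on epoch 1 is told the leader's log end offset.
        System.out.println(leader.endOffsetFor(1, leaderLogEndOffset)); // prints 25
    }
}
```

In this example a follower that had written up to offset 12 during epoch 0 would learn that epoch 0 ended at offset 10 on the leader, and would truncate its log back to that point before fetching.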
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/confluentinc/kafka kip-101-v2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2808.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2808

----
commit a96a8bbee2435bd46cd19746f61b73eeb2f94088
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-03-27T16:16:16Z

    All work to date squashed (18 commits)

    KIP-101: Push after merge.
    KIP-101: Fixes for checkstyle breaks
    KIP-101: Remove TestSuite class
    KIP-101: Comments
    KIP-101: Comments
    KIP-101: Altered logic in ReplicaFetcherThread:
    - On NoLeaderForPartition continue to poll for epochs indefinitely
    - Add synchronisation around log truncation to ensure we cannot truncate the log of a leader (light testing, more to follow, noted in TODO)
    KIP-101: Rename Epoch -> PartitionLeaderEpoch
    KIP-101: First commit based on feedback from Jun/Jason
    KIP-101: Second commit based on feedback from Jun/Jason
    KIP-101: Third commit based on feedback from Jun/Jason
    KIP-101: Fourth commit based on feedback from Jun/Jason - removed retainMatchingOffset parameter from clearOldest as not used
    KIP-101: tidy only
    KIP-101: Return Log End Offset If Undefined Epoch Requested (this covers the case of a bootstrapping broker)
    KIP-101: Altered log truncation to always be inclusive, so we always delete epochs inclusive of the passed offset, whether clearing earliest or latest entries.
    KIP-101: Add optimisation back in for previous commit.
    KIP-101: If epochOffset.endOffset() is UNSUPPORTED_EPOCH_OFFSET, which can happen during the transition phase, we should fall back to HW. Improved fuglyness too.
    KIP-101: Small tidy
    KIP-101: Refactored threading model in Abstract/ReplicaFetcherThread. Functionally identical but now the logic sits largely in the abstract class.
    KIP-101: Moved OffsetsForLeaderEpoch.getResponseFor() into ReplicaManager
    KIP-101: (1) Altered ReplicaFetcherThread to poll continuously on errors. (2) Only send epoch requests if version >= 11
    KIP-101: As segments are recovered, truncate the epoch cache with the appropriate segment
    KIP-101: Fix bug in DummyFetcherThread which was defaulting to requiring initialisation. Caused AbstractFetcherThread test to hang.
    KIP-101: Fix bug in ReplicaManager imports
    KIP-101: Fix bug in ReplicaManager imports by making all imports explicit. Also remove OffsetCheckpointFile which appears to still be in the remote repository. This was causing a compilation issue.
    KIP-101: Remove override of OffsetsTopicPartitionsProp (to 5) in PlaintextConsumerTest as it causes a test in BaseConsumerTest to fail. Will fix this issue in separate PR
    KIP-101: Rename only (OffsetsForLeaderEpochRequest)
    KIP-101: Fix merge error
    KIP-101: Fix couple more merge errors
    KIP-101: Re-enable test_zk_security_upgrade on Ismael's request
    KIP-101: Commenting out EndToEndClusterIdTest as it fails on jenkins, although passes consistently locally, including from a fresh checkout. Puzzling.
    KIP-101: Large refactor to alter the data structure used in the Request/Response classes. These are now Maps keyed by TopicPartition. Pushed this change through other code and cleaned up tests as appropriate.
    KIP-101: Addressed Jun's second round of feedback.
    KIP-101: Addressed first part of Jun's third round of feedback, this relates largely to test code.
    KIP-101: Don't assign epochs if magic byte indicates previous version.
    KIP-101: Altered the logic for clearEarliest so that it keeps the previous epoch and updates its offset to the one used to clear.
    KIP-101: Added test and removed some of the ; that aren't used

commit ab7abdbe9cb25ccb7f8cf045190a3cf812631aae
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-03-27T16:16:16Z

    All work to date squashed (18 commits)

    KIP-101: Push after merge.
    KIP-101: Fixes for checkstyle breaks
    KIP-101: Remove TestSuite class
    KIP-101: Comments
    KIP-101: Comments
    KIP-101: Altered logic in ReplicaFetcherThread:
    - On NoLeaderForPartition continue to poll for epochs indefinitely
    - Add synchronisation around log truncation to ensure we cannot truncate the log of a leader (light testing, more to follow, noted in TODO)
    KIP-101: Rename Epoch -> PartitionLeaderEpoch
    KIP-101: First commit based on feedback from Jun/Jason
    KIP-101: Second commit based on feedback from Jun/Jason
    KIP-101: Third commit based on feedback from Jun/Jason
    KIP-101: Fourth commit based on feedback from Jun/Jason - removed retainMatchingOffset parameter from clearOldest as not used
    KIP-101: tidy only
    KIP-101: Return Log End Offset If Undefined Epoch Requested (this covers the case of a bootstrapping broker)
    KIP-101: Altered log truncation to always be inclusive, so we always delete epochs inclusive of the passed offset, whether clearing earliest or latest entries.
    KIP-101: Add optimisation back in for previous commit.
    KIP-101: If epochOffset.endOffset() is UNSUPPORTED_EPOCH_OFFSET, which can happen during the transition phase, we should fall back to HW. Improved fuglyness too.
    KIP-101: Small tidy
    KIP-101: Refactored threading model in Abstract/ReplicaFetcherThread. Functionally identical but now the logic sits largely in the abstract class.
    KIP-101: Moved OffsetsForLeaderEpoch.getResponseFor() into ReplicaManager
    KIP-101: (1) Altered ReplicaFetcherThread to poll continuously on errors. (2) Only send epoch requests if version >= 11
    KIP-101: As segments are recovered, truncate the epoch cache with the appropriate segment
    KIP-101: Fix bug in DummyFetcherThread which was defaulting to requiring initialisation. Caused AbstractFetcherThread test to hang.
    KIP-101: Fix bug in ReplicaManager imports
    KIP-101: Fix bug in ReplicaManager imports by making all imports explicit. Also remove OffsetCheckpointFile which appears to still be in the remote repository.
    This was causing a compilation issue.
    KIP-101: Remove override of OffsetsTopicPartitionsProp (to 5) in PlaintextConsumerTest as it causes a test in BaseConsumerTest to fail. Will fix this issue in separate PR
    KIP-101: Rename only (OffsetsForLeaderEpochRequest)
    KIP-101: Fix merge error
    KIP-101: Fix couple more merge errors
    KIP-101: Re-enable test_zk_security_upgrade on Ismael's request
    KIP-101: Commenting out EndToEndClusterIdTest as it fails on jenkins, although passes consistently locally, including from a fresh checkout. Puzzling.
    KIP-101: Large refactor to alter the data structure used in the Request/Response classes. These are now Maps keyed by TopicPartition. Pushed this change through other code and cleaned up tests as appropriate.
    KIP-101: Addressed Jun's second round of feedback.
    KIP-101: Addressed first part of Jun's third round of feedback, this relates largely to test code.
    KIP-101: Don't assign epochs if magic byte indicates previous version.
    KIP-101: Altered the logic for clearEarliest so that it keeps the previous epoch and updates its offset to the one used to clear.
    KIP-101: Added test and removed some of the ; that aren't used

commit 2f4f171444fcc4246ba7224b99fc626ec0980b93
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T11:56:18Z

    KIP-101: fix test break

commit a281b581f0da4bdd424bc198bdff01723dd2f8e5
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T12:26:25Z

    KIP-101: just testing...

commit de672b31169f56a336bb0c046745448a90d2e290
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T12:27:27Z

    KIP-101: just testing...

commit 1a674bbe104dd7f30045b24c8d245edc83896cc5
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T13:00:46Z

    KIP-101: small changes based on Jun's review

commit 275a130bd8be320de6b584e4a6cf3e196cc1eb37
Author: Jun Rao <jun...@gmail.com>
Date:   2017-04-04T21:51:02Z

    recover leader epoch during log recovery; other minor cleanups

commit fedbd6150246d0a7adb47aaabe0a7d92a3a9a9fc
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T21:40:31Z

    KIP-101: Ensure log directory is created before we create the LeaderEpochCache (addresses a couple of Jun's feedback points)

commit adb3b98de10b712be8137fc95e2a63e3f97e8444
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T21:41:55Z

    KIP-101: Remove todo

commit 85f5b9364e97cd56194fa09b9ddaa906c1ea91ff
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T21:44:11Z

    KIP-101: Move clearLatest call so it doesn't overlap with existing clear() in truncation phase (in response to Jun's comment)

commit e52f0178e613c64f2f0a07762c4a30760f5658d5
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T21:50:23Z

    KIP-101: Tidy only

commit 2d7f55dd7ee074e304154f2d84590268ab40e237
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2017-04-04T22:44:21Z

    KIP-101: Comments only

----
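
One of the squashed commit messages in this log says that when `epochOffset.endOffset()` is `UNSUPPORTED_EPOCH_OFFSET`, which can happen during the transition phase, the follower should fall back to the high watermark. A minimal sketch of that decision follows. It is not the code from this PR: the class and method names are hypothetical, the sentinel value `-1` is an assumption, and capping the leader's answer at the follower's own log end offset is my reading of the KIP rather than something stated in this message.

```java
// Hypothetical sketch of how a follower might choose its truncation offset.
class TruncationFallbackSketch {

    // Assumed sentinel value; the commit message only names the constant.
    static final long UNSUPPORTED_EPOCH_OFFSET = -1L;

    // leaderEndOffset:      the leader's answer for the follower's latest epoch
    // followerLogEndOffset: the follower's current log end offset
    // highWatermark:        the follower's checkpointed high watermark
    public static long truncationOffset(long leaderEndOffset,
                                        long followerLogEndOffset,
                                        long highWatermark) {
        if (leaderEndOffset == UNSUPPORTED_EPOCH_OFFSET) {
            // The leader cannot answer epoch requests (for example, it is still
            // on the old protocol), so use the pre-KIP-101 behaviour.
            return highWatermark;
        }
        // Never truncate to a point beyond what the follower actually has.
        return Math.min(leaderEndOffset, followerLogEndOffset);
    }

    public static void main(String[] args) {
        // Leader reports epoch end at 10, follower has 12 messages: truncate to 10.
        System.out.println(truncationOffset(10L, 12L, 8L)); // prints 10
        // Leader cannot answer: fall back to the high watermark.
        System.out.println(truncationOffset(UNSUPPORTED_EPOCH_OFFSET, 12L, 8L)); // prints 8
    }
}
```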