[jira] [Commented] (KAFKA-340) ensure ISR has enough brokers during clean shutdown of a broker

Joel Koshy (JIRA) Mon, 01 Oct 2012 18:05:12 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467380#comment-13467380
 ]


Joel Koshy commented on KAFKA-340:
----------------------------------

This may not be too difficult to implement, but it is tricky to use, so I
would like to put down a more detailed description, as that may help surface
more subtleties and corner cases to account for. The actual invocation of
shutdown (described at the end) needs to be thought through.

The existing shutdown hook provides a conventional shutdown approach and is
convenient from an operations perspective in that it can be easily scripted
(just kill -s SIGTERM pid).  The problem with the existing shutdown logic which
this jira is meant to address is that it simply brings the broker down, which
takes partitions offline abruptly. This can lead to message loss during rolling
bounces - e.g., consider a rolling bounce of brokers B1, B2. Suppose a producer
is configured with num-request-acks -1, T1-P1 is on (B1, B2), and B2 is leader.
If B1 is bounced, it may fall out of the ISR for T1-P1. B2 continues to receive
messages to T1-P1 and commits them immediately. After B1 comes back up, if B2
is bounced, then B1 would become the new leader, but its log would be truncated
to the last HW which is smaller than the committed (and ack'd) offsets on B2.
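
The scenario above can be replayed with a toy model. This is only a sketch:
the Replica class and rolling_bounce function are hypothetical names, not Kafka
code, and the model assumes that B1 has not caught up when B2 goes down and
that B1 is elected leader anyway because it is the only live replica.

```python
# Toy model of the rolling-bounce scenario; not Kafka code.
# Replica and rolling_bounce are hypothetical names used for illustration.

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []  # messages in offset order
        self.hw = 0    # high watermark: count of messages known to be committed


def rolling_bounce():
    b1, b2 = Replica("B1"), Replica("B2")
    isr = {"B1", "B2"}
    leader = b2

    # Steady state: m0 and m1 are replicated to both brokers and committed.
    for m in ["m0", "m1"]:
        b1.log.append(m)
        b2.log.append(m)
    b1.hw = b2.hw = 2

    # B1 is bounced and falls out of the ISR for T1-P1.
    isr.discard("B1")

    # B2 is now the only ISR member, so with acks = -1 every message is
    # committed (and ack'd to the producer) as soon as B2 appends it.
    acked = []
    for m in ["m2", "m3"]:
        leader.log.append(m)
        leader.hw = len(leader.log)
        acked.append(m)

    # B1 comes back up but has not caught up yet; B2 is bounced abruptly.
    # B1 becomes leader and truncates its log to its last HW.
    b1.log = b1.log[:b1.hw]
    lost = [m for m in acked if m not in b1.log]
    return b1.log, lost
```

In this model the new leader's log ends at m1, while the producer has already
received acks for m2 and m3.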

The original proposal (to wait until there is at least one other broker in ISR
before shutting down) is also not without issues. It could lead to long waits
during clean shutdown. Furthermore, it involves moving all the leaders on the
broker that is being shut down to other brokers. Partitions that are moved later
will experience longer outages than the partitions earlier in the list. In
addition to avoiding message loss, another goal in clean shutdown should be to
keep partitions as available as possible. Jun suggested another possible
approach: before shutting down a broker, pre-emptively relinquish and move the
leadership of partitions for which it is currently leader to another broker in
ISR. One benefit of this approach is that leader transition can be done
serially, one partition at a time. So each partition will only be unavailable
for the period it takes to establish a new leader.

The actual process to shut down broker B may look something like this. (This
assumes the command is issued to the current controller but can be adapted for
the other options outlined at the end.)

- For each partition T-P that B is leader for:
  - Move leadership to another broker that is in ISR. (So if |ISR| == 1
    - meaning only B - and replication factor >= 2 then we have to wait.)
  - Issue the leaderAndISR request to the updated list of brokers in
    leaderAndISR.

- For all partitions for which B is a follower:
  - Issue a "stopFetcher" for T-P. (Explicitly stopping the fetchers helps
    reduce the possibility of timeouts on producer requests, since not doing so
    would cause B to remain in ISR longer.)
  - Issue a leaderAndISR request to the updated list of brokers in
    leaderAndISR.
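
The steps above can be sketched as a planning function on the controller side.
This is a minimal sketch under stated assumptions, not the actual controller
logic: plan_clean_shutdown and the dict layout are hypothetical, the new leader
is simply the first other ISR member, and B is assumed to be dropped from the
ISR in the leaderAndISR request that is sent out.

```python
# Sketch of the per-partition shutdown steps; names and data layout are
# hypothetical and only illustrate the ordering of requests.

def plan_clean_shutdown(broker, partitions):
    """partitions maps "T-P" -> {"leader": id, "isr": [ids], "replicas": [ids]}.

    Returns (actions, blocked). blocked lists partitions led by `broker` with
    no other ISR member: with replication factor >= 2 the controller would
    have to wait for them; with replication factor 1 there is nothing to move.
    """
    actions, blocked = [], []
    for tp, state in partitions.items():
        if state["leader"] == broker:
            candidates = [b for b in state["isr"] if b != broker]
            if not candidates:
                blocked.append(tp)
                continue
            # Move leadership to another broker that is in ISR.
            state["leader"] = candidates[0]
            state["isr"] = candidates  # assumption: B leaves the ISR here
            actions.append(("leaderAndISR", tp, state["leader"], list(state["isr"])))
        elif broker in state["replicas"]:
            # B is a follower: stop its fetcher first, then shrink the ISR.
            actions.append(("stopFetcher", tp, broker))
            state["isr"] = [b for b in state["isr"] if b != broker]
            actions.append(("leaderAndISR", tp, state["leader"], list(state["isr"])))
    return actions, blocked
```

Because the loop handles one partition at a time, each partition is only
affected while its own leaderAndISR request is in flight.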

Here are a few options for the actual clean-shutdown invocation. These are not
mutually exclusive, so we can pick one for now and implement the others later
if they make sense.

- Option 1: JMX operation on the Kafka controller to cleanly shut down a
  specified broker. (If the controller fails during its execution, then the
  operation will simply fail.) This could also be a JMX-operation on individual
  brokers, but that would require individual brokers to forward the
  "relinquishLeadership" request to the controller.
- Option 2: The broker's shutdown handler sends a "relinquishLeadership"
  request to the controller and awaits a response from the controller. See note
  below on the effect of a failure in this step.
- Option 3: Expose a new command port to issue a "cleanShutdown brokerid"
  command to the controller. As above, this could be on individual brokers as
  well. The command port would also be useful for other commands such as "print
  broker config", "dump stats", etc.

The main disadvantage with Option 2 is that if the leader transition fails then
you cannot really retry - well, you could wait/retry until the transition is
successful, but then you would need to introduce a timeout. Introducing a
timeout has its own issues - e.g., what is a reasonable timeout; also you would
be forced to wait for the timeout during a full cluster shutdown. With Options
1/3 the operation would just fail. (And if you *know* that you are doing a full
cluster shutdown you can go ahead and use the SIGTERM-based shutdown.)
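
The difference between the two behaviours can be sketched as follows. This is
an illustration only: relinquish_with_timeout, relinquish_once and the
try_relinquish callback are hypothetical names, and the clock and sleep
parameters exist just so the sketch can be exercised without real waiting.

```python
import time

# Hypothetical helpers contrasting Option 2 (retry until a timeout) with
# Options 1/3 (a single attempt that simply fails).

def relinquish_with_timeout(try_relinquish, timeout_s, retry_interval_s=1.0,
                            clock=time.monotonic, sleep=time.sleep):
    """Option 2 style: retry until leadership is moved or the deadline passes.

    try_relinquish is a callable returning True once the controller has moved
    leadership away from this broker. Returns True on success, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if try_relinquish():
            return True
        if clock() >= deadline:
            return False
        sleep(retry_interval_s)


def relinquish_once(try_relinquish):
    """Options 1/3 style: one attempt; the operator decides what to do next."""
    return try_relinquish()
```

With the retrying variant, a broker whose leadership can never be moved (as in
a full cluster shutdown) blocks for the whole timeout before giving up, which
is the cost described above.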

Typical scenarios:
- Rolling bounce. After a broker is bounced and comes back up it can take some
  time for it to return to ISR (if it fell out). In this case, the
  relinquishLeadership operation of the next broker in the bounce sequence may
  fail but would succeed after the preceding broker comes back up and returns
  to ISR.
- Full cluster shutdown. In this case, relinquishLeadership would likely fail
  for brokers that come later in the shutdown sequence. However, the user would
  know this is a full cluster shutdown and just use the usual SIGTERM-based
  shutdown.
- Replication factor of 1: similar to full cluster shutdown.

                
> ensure ISR has enough brokers during clean shutdown of a broker
> ---------------------------------------------------------------
>
>                 Key: KAFKA-340
>                 URL: https://issues.apache.org/jira/browse/KAFKA-340
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Joel Koshy
>            Priority: Blocker
>              Labels: bugs
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If we are shutting down a broker when the ISR of a partition includes only 
> that broker, we could lose some messages that have been previously committed. 
> For clean shutdown, we need to guarantee that there is at least 1 other 
> broker in ISR after the broker is shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
