[ https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419873#comment-15419873 ]
Alexey Ozeritskiy commented on KAFKA-3924:
------------------------------------------

I've hit a deadlock with that patch. Stack traces:

{code}
"ReplicaFetcherThread-3-2" #112 prio=5 os_prio=0 tid=0x00007f0acc100000 nid=0xfd54f in Object.wait() [0x00007f0b141d7000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1245)
        - locked <0x00000003d8269bc8> (a kafka.Kafka$$anon$1)
        at java.lang.Thread.join(Thread.java:1319)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
        at java.lang.Shutdown.runHooks(Shutdown.java:123)
        at java.lang.Shutdown.sequence(Shutdown.java:167)
        at java.lang.Shutdown.exit(Shutdown.java:212)
        - locked <0x00000003d8106e88> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:971)
        at kafka.server.ReplicaFetcherThread.handleOffsetOutOfRange(ReplicaFetcherThread.scala:179)
{code}

{code}
"Thread-2" #29 prio=5 os_prio=0 tid=0x00007f0a70008000 nid=0xfecbf in Object.wait() [0x00007f0b166e5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1245)
        - locked <0x00000003e5c46960> (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1319)
        at kafka.server.KafkaRequestHandlerPool$$anonfun$shutdown$3.apply(KafkaRequestHandler.scala:92)
        at kafka.server.KafkaRequestHandlerPool$$anonfun$shutdown$3.apply(KafkaRequestHandler.scala:91)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at kafka.server.KafkaRequestHandlerPool.shutdown(KafkaRequestHandler.scala:91)
        at kafka.server.KafkaServer$$anonfun$shutdown$3.apply$mcV$sp(KafkaServer.scala:559)
        at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:79)
        at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
        at kafka.utils.CoreUtils$.swallowWarn(CoreUtils.scala:51)
        at kafka.utils.Logging$class.swallow(Logging.scala:94)
        at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:51)
        at kafka.server.KafkaServer.shutdown(KafkaServer.scala:559)
        at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:49)
        at kafka.Kafka$$anon$1.run(Kafka.scala:63)
{code}

System.exit runs the shutdown hook in Thread-2 and joins it (first trace). Thread-2 in turn joins ReplicaFetcherThread-3-2 (second trace). So the two threads wait for each other forever.
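For reference, a minimal standalone sketch of that join cycle (plain Scala, not Kafka code; the object and thread names are invented): a worker thread calls System.exit, the JVM then runs and joins the shutdown hook, and the hook joins the worker, so neither join can ever return.

{code}
// Minimal sketch of the reported cycle; nothing Kafka-specific is assumed.
object ShutdownHookJoinCycle {
  def main(args: Array[String]): Unit = {
    // Plays the role of ReplicaFetcherThread.handleOffsetOutOfRange calling System.exit:
    // exit runs the registered shutdown hooks and joins them, so this call never returns.
    val worker = new Thread("fetcher-like-worker") {
      override def run(): Unit = System.exit(1)
    }

    // Plays the role of Kafka's shutdown hook (kafka.Kafka$$anon$1), which ends up
    // joining the fetcher thread: it blocks forever because the worker is itself
    // stuck inside System.exit waiting for this very hook to finish.
    Runtime.getRuntime.addShutdownHook(new Thread("shutdown-hook") {
      override def run(): Unit = worker.join()
    })

    worker.start()
    worker.join() // the JVM hangs here: a circular wait between exit and the hook
  }
}
{code}

Running jstack against this process shows the same pair of WAITING-in-Thread.join traces as above.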
> Data loss due to halting when LEO is larger than leader's LEO
> -------------------------------------------------------------
>
>                 Key: KAFKA-3924
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3924
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.0.0
>            Reporter: Maysam Yabandeh
>             Fix For: 0.10.0.1
>
>
> Currently the follower broker panics when its LEO is larger than its leader's LEO and, assuming that this is an impossible state to reach, halts the process to prevent any further damage.
> {code}
>     if (leaderEndOffset < replica.logEndOffset.messageOffset) {
>       // Prior to truncating the follower's log, ensure that doing so is not disallowed by the configuration for unclean leader election.
>       // This situation could only happen if the unclean election configuration for a topic changes while a replica is down. Otherwise,
>       // we should never encounter this situation since a non-ISR leader cannot be elected if disallowed by the broker configuration.
>       if (!LogConfig.fromProps(brokerConfig.originals, AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
>         ConfigType.Topic, topicAndPartition.topic)).uncleanLeaderElectionEnable) {
>         // Log a fatal error and shutdown the broker to ensure that data loss does not unexpectedly occur.
>         fatal("...")
>         Runtime.getRuntime.halt(1)
>       }
> {code}
> Firstly, this assumption is invalid: there are legitimate cases (examples below) in which this state can actually occur. Secondly, halting causes the broker to lose its un-flushed data, and if multiple brokers halt simultaneously there is a chance that both the leader and the followers of a partition are among the halted brokers, which would result in permanent data loss.
> Given that this is a legitimate case, we suggest replacing the halt with a graceful shutdown to avoid propagating data loss to the entire cluster.
> Details:
> One legitimate case in which this can occur is when a troubled broker shrinks its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In this case the broker has lost some data, but the controller still cannot elect the others as the leader. If the crashed broker comes back up, the controller elects it as the leader, and as a result all the other brokers now following it halt, since their LEOs are larger than those of the shrunk topics in the restarted broker. We actually had a case in which bringing up a crashed broker simultaneously took down the entire cluster, and as explained above this could result in data loss.
> The other legitimate case is when multiple brokers shut down ungracefully at the same time. In this case both the leader and the followers lose their un-flushed data, but one of them has an HW larger than the other. The controller elects the one that comes back up sooner as the leader, and if its LEO is less than its future follower's, the follower will halt (and probably lose more data). Simultaneous ungraceful shutdowns can happen due to hardware issues (e.g., a rack power failure), operator errors, or software issues (e.g., the case above that is further explained in KAFKA-3410 and KAFKA-3861 and causes simultaneous halts in multiple brokers).
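For illustration only, a minimal sketch of the direction the report suggests (this is not the actual Kafka patch; GracefulExitSketch and requestGracefulExit are invented names). It replaces the in-thread Runtime.getRuntime.halt(1) with a shutdown request that runs on its own thread, so the shutdown hooks get a chance to flush state while the calling fetcher-like thread returns and stays joinable, avoiding the join cycle reported in the comment above.

{code}
// Hedged sketch, not the real fix: request a JVM exit without blocking the caller.
object GracefulExitSketch {
  def requestGracefulExit(statusCode: Int): Unit = {
    val exiter = new Thread("graceful-exit") {
      // Unlike Runtime.getRuntime.halt, System.exit runs the registered shutdown
      // hooks, so the broker can close its components and flush logs before exiting.
      override def run(): Unit = System.exit(statusCode)
    }
    exiter.setDaemon(true) // must never be a thread the shutdown hooks try to join
    exiter.start()
  }
}
{code}

The key design point is that whichever thread the shutdown hook eventually joins must not itself be blocked inside System.exit.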