GitHub user ZoneMayor reopened a pull request:

    https://github.com/apache/kafka/pull/648

    KAFKA-2837: fix transient failure of kafka.api.ProducerBounceTest > testBrokerFailure

    I can reproduce this transient failure, although it seldom happens.
    The test code looks like this:
        // rolling bounce brokers
        for (i <- 0 until numServers) {
          for (server <- servers) {
            server.shutdown()
            server.awaitShutdown()
            server.startup()
            Thread.sleep(2000)
          }

          // Make sure the producer does not see any exception
          // in returned metadata due to broker failures
          assertTrue(scheduler.failed == false)

          // Make sure the leader still exists after bouncing brokers
          (0 until numPartitions).foreach(partition =>
            TestUtils.waitUntilLeaderIsElectedOrChanged(zkUtils, topic1, partition))
        }
    The brokers keep doing rolling restarts while the producer keeps sending
    messages.
    In every iteration of the outer loop, the test waits for a leader to be
    elected for each partition.
    If the election is slow, more messages are buffered in the
    RecordAccumulator's BufferPool.
    The buffer limit is set to 30000.
    When the buffer is exhausted, TimeoutException("Failed to allocate memory
    within the configured max blocking time") is thrown.
    Because the test sleeps for 2000 ms after every broker restart, this
    transient failure seldom happens, but the shorter the sleep, the more
    likely the failure becomes.
    For example, if the broker acting as controller is restarted, a new
    controller has to be elected first and only then the partition leaders,
    which leads to more messages being blocked in
    KafkaProducer:RecordAccumulator:BufferPool.
    In this fix, I simply enlarge the producer's buffer size to 1MB.
    @guozhangwang, could you give some comments?
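
    To make the explanation above concrete, here is a minimal sketch of why a
    larger buffer tolerates a longer leaderless window. It is a toy model, not
    Kafka's actual BufferPool: the class ToyBufferPool, the helper
    recordsBeforeExhaustion, and the 1000-byte record size are all hypothetical,
    and it assumes the 30000 limit is in bytes and that nothing is drained
    while no leader is available.

```scala
// Toy model only: NOT Kafka's BufferPool, just a bounded byte budget.
final class ToyBufferPool(val capacityBytes: Long) {
  private var usedBytes = 0L

  // Reserve `size` bytes; returns false once the budget is exhausted.
  // (A real producer would block here and eventually time out.)
  def tryAllocate(size: Int): Boolean =
    if (usedBytes + size <= capacityBytes) {
      usedBytes += size
      true
    } else {
      false
    }

  // Return bytes to the pool, as would happen after a batch is sent.
  def release(size: Int): Unit = {
    usedBytes = math.max(0L, usedBytes - size)
  }
}

object BufferExhaustionSketch {
  // Count how many records fit while nothing is drained,
  // i.e. while the partition has no leader to send to.
  def recordsBeforeExhaustion(capacityBytes: Long, recordBytes: Int): Int = {
    require(recordBytes > 0, "recordBytes must be positive")
    val pool = new ToyBufferPool(capacityBytes)
    var count = 0
    while (pool.tryAllocate(recordBytes)) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    val recordBytes = 1000 // hypothetical record size, for illustration only
    println(recordsBeforeExhaustion(30000L, recordBytes))        // 30
    println(recordsBeforeExhaustion(1024L * 1024L, recordBytes)) // 1048
  }
}
```

    Under these assumptions the 30000-byte budget is full after 30 records,
    while a 1MB budget absorbs 1048 before the next allocation would have to
    wait. In a real producer the corresponding settings are buffer.memory and
    max.block.ms.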

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ZoneMayor/kafka trunk-KAFKA-2837

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #648
    
----
commit 95374147a28208d4850f6e73f714bf418935fc2d
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-11-27T03:49:34Z

    Merge pull request #1 from apache/trunk
    
    merge

commit cec5b48b651a7efd3900cfa3c1fd0ab1eeeaa3ec
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-01T10:44:02Z

    Merge pull request #2 from apache/trunk
    
    2015-12-1

commit a119d547bf1741625ce0627073c7909992a20f15
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-04T13:42:27Z

    Merge pull request #3 from apache/trunk
    
    2015-12-04#KAFKA-2893

commit b767a8dff85fc71c75d4cf5178c3f6f03ff81bfc
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-09T10:42:30Z

    Merge pull request #5 from apache/trunk
    
    2015-12-9

commit cd5e6f4700a4387f9383b84aca0ee9c4639b1033
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-09T13:49:07Z

    KAFKA-2837: fix transient failure kafka.api.ProducerBounceTest > testBrokerFailure

commit 8ded9104a04861f789a7a990c2ddd4fc38a899cd
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-10T04:47:06Z

    Merge pull request #6 from apache/trunk
    
    2015-12-10

commit 2bcf010c73923bb24bbd9cece7e39983b2bdce0c
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-10T04:47:39Z

    KAFKA-2837: WIP

commit dae4a3cc0b564bb25121d54e65b5ad363c3e866d
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-10T04:48:21Z

    Merge branch 'trunk-KAFKA-2837' of https://github.com/ZoneMayor/kafka into trunk-KAFKA-2837

commit 7118e11813e445bca3eab65a23028e76138b136a
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-10T04:51:43Z

    KAFKA-2837: WIP

----

