GitHub user moesol opened a pull request:

    https://github.com/apache/storm/pull/1391

    (STORM-1674) Idle KafkaSpout consumes more bandwidth than needed

    * Allows minBytes in fetch request to be configured
      from KafkaConfig.fetchMinBytes.
    * Defaults new configuration KafkaConfig.fetchMinBytes to 1.
    * Defaults fetchMaxWait to 100ms instead of 10000ms.
    
    We discovered 30 megabits of traffic flowing between a set of KafkaSpouts
    and our Kafka servers even though no Kafka messages were moving.
    Using the Wireshark Kafka dissector, we were able to see that
    each FetchRequest had maxWait set to 10000
    and minBytes set to 0. When minBytes is set to 0, the Kafka server
    responds immediately when there are no messages. In turn, the KafkaSpout
    polls without any delay, causing a constant stream of FetchRequest/
    FetchResponse messages. A non-KafkaSpout client had a similar
    traffic pattern, with two key differences:
    1) minBytes was 1
    2) maxWait was 100
    With these FetchRequest parameters and no messages flowing,
    the Kafka server delays the FetchResponse by 100 ms. This reduced
    the network traffic from megabits to the low kilobits. It also
    reduced the CPU utilization of our Kafka server from 140% to 2%.
    
    Hopefully the risk of this change is low, because
    the old behavior can be restored through the API by setting
    KafkaConfig.fetchMaxWait to 10000 and
    KafkaConfig.fetchMinBytes to 0.
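
The arithmetic behind the description above can be sketched roughly as follows. This is a toy model, not code from the patch: it assumes an idle poller re-issues a fetch the moment the previous response arrives, and that the broker holds an empty fetch for maxWait whenever minBytes is at least 1. The class IdleFetchModel, its method names, and the 1 ms round trip are illustrative assumptions and are not part of storm-kafka.

```java
/** Toy model of an idle consumer that re-polls as soon as each fetch returns. */
final class IdleFetchModel {

    /** How long the broker holds a fetch when there is no new data, in ms. */
    static long idleHoldMillis(int minBytes, int maxWaitMillis) {
        // minBytes <= 0 is satisfied by an empty response, so the broker
        // answers at once; otherwise it waits for data until maxWait expires.
        return minBytes <= 0 ? 0L : (long) maxWaitMillis;
    }

    /** Fetch requests per second issued by one idle poller. */
    static double idleRequestsPerSecond(int minBytes, int maxWaitMillis,
                                        double roundTripMillis) {
        if (roundTripMillis <= 0) {
            throw new IllegalArgumentException("roundTripMillis must be positive");
        }
        double cycleMillis = roundTripMillis + idleHoldMillis(minBytes, maxWaitMillis);
        return 1000.0 / cycleMillis;
    }

    public static void main(String[] args) {
        double rtt = 1.0; // assumed network round trip in ms, not a figure from the PR

        // Settings observed before the change: minBytes = 0, maxWait = 10000.
        System.out.printf("before: %.1f req/s%n", idleRequestsPerSecond(0, 10000, rtt));

        // Defaults proposed by the change: minBytes = 1, maxWait = 100.
        System.out.printf("after:  %.1f req/s%n", idleRequestsPerSecond(1, 100, rtt));
    }
}
```

Under these assumptions maxWait has no effect while minBytes is 0, which is why a 10000 ms maxWait still produced a constant stream of requests. Restoring the old behavior as described above corresponds to the first call.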

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MoebiusSolutions/storm 0.10.x-branch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1391.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1391
    
----
commit ea1d3189c8113cc2888366fa5b24579776279886
Author: Robert Hastings <rhasti...@moesol.com>
Date:   2016-03-31T23:14:47Z

    Addresses network flood from KafkaSpout to kafka server.
    
    * Allows minBytes in fetch request to be configured
      from KafkaConfig.fetchMinBytes.
    * Defaults new configuration KafkaConfig.fetchMinBytes to 1.
    * Defaults fetchMaxWait to 100ms instead of 10000ms.
    
    We discovered 30 megabits of traffic flowing between a set of KafkaSpouts
    and our Kafka servers even though no Kafka messages were moving.
    Using the Wireshark Kafka dissector, we were able to see that
    each FetchRequest had maxWait set to 10000
    and minBytes set to 0. When minBytes is set to 0, the Kafka server
    responds immediately when there are no messages. In turn, the KafkaSpout
    polls without any delay, causing a constant stream of FetchRequest/
    FetchResponse messages. A non-KafkaSpout client had a similar
    traffic pattern, with two key differences:
    1) minBytes was 1
    2) maxWait was 100
    With these FetchRequest parameters and no messages flowing,
    the Kafka server delays the FetchResponse by 100 ms. This reduced
    the network traffic from megabits to the low kilobits. It also
    reduced the CPU utilization of our Kafka server from 140% to 2%.
    
    Hopefully the risk of this change is low, because
    the old behavior can be restored through the API by setting
    KafkaConfig.fetchMaxWait to 10000 and
    KafkaConfig.fetchMinBytes to 0.

----

