[GitHub] storm pull request #2241: STORM-2306 : Messaging subsystem redesign.

HeartSaVioR Wed, 20 Dec 2017 06:50:35 -0800

Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2241#discussion_r158009771
  
    --- Diff: docs/Performance.md ---
    @@ -0,0 +1,128 @@
    +---
    +title: Performance Tuning
    +layout: documentation
    +documentation: true
    +---
    +
    +Latency, throughput and CPU consumption are the three key dimensions 
involved in performance tuning.
    +In the following sections we discuss the settings that can used to tune 
along these dimension and understand the trade-offs.
    +
    +It is important to understand that these settings can vary depending on 
the topology, the type of hardware and the number of hosts used by the topology.
    +
    +## 1. Batch Size
    +Spouts and Bolts communicate with each other via concurrent message 
queues. The batch size determines the number of messages to be buffered before
    +the producer (spout/bolt) attempts to actually write to the downstream 
component's message queue. Inserting messages in batches to downstream
    +queues helps reduce the number of synchronization operations required for 
the inserts. Consequently this helps achieve higher throughput. However,
    +sometimes it may take a little time for the buffer to fill up, before it 
is flushed into the downstream queue. This implies that the buffered messages
    +will take longer to become visible to the downstream consumer who is 
waiting to process them. This can increase the average end-to-end latency for
    +these messages. The latency can get very bad if the batch sizes are large 
and the topology is not experiencing high traffic.
    +
    +`topology.producer.batch.size` : The batch size for writes into the 
receive queue of any spout/bolt is controlled via this setting. This setting
    +impacts the communication within a worker process. Each upstream producer 
maintains a separate batch to a component's receive queue. So if two spout
    +instances are writing to the same downstream bolt instance, each of the 
spout instances will have maintain a separate batch.
    +
    +`topology.transfer.batch.size` : Messages that are destined to a 
spout/bolt running on a different worker process, are sent to a queue called
    +the **Worker Transfer Queue**. The Worker Transfer Thread is responsible 
for draining the messages in this queue and send them to the appropriate
    +worker process over the network. This setting controls the batch size for 
writes into the Worker Transfer Queue.  This impacts the communication
    +between worker processes.
    +
    +#### Guidance
    +
    +**For Low latency:** Set batch size to 1. This basically disables 
batching. This is likely to reduce peak sustainable throughput under heavy 
traffic, but
    +not likely to impact throughput much under low/medium traffic situations.
    +**For High throughput:** Set batch size > 1. Try values like 10, 100, 1000 
or even higher and see what yields the best throughput for the topology.
    +Beyond a certain point the throughput is likely to get worse.
    +**Varying throughput:** Topologies often experience fluctuating amounts of 
incoming traffic over the day. Other topos may experience higher traffic in some
    +paths and lower throughput in other paths simultaneously. If latency is 
not a concern, a small bach size (e.g. 10) and in conjunction with the right 
flush
    +frequency may provide a reasonable compromise for such scenarios. For 
meeting stricter latency SLAs, consider setting it to 1.
    +
    +
    +## 2. Flush Tuple Frequency
    +In low/medium traffic situations or when batch size is too large, the 
batches may take too long to fill up and consequently the messages could take 
unacceptably
    +long time to become visible to downstream components. In such case, 
periodic flushing of batches is necessary to keep the messages moving and avoid 
compromising
    +latencies when batching is enabled.
    +
    +When batching has been enabled, special messages called *flush tuples* are 
inserted periodically into the receive queues of all spout and bolt instances.
    +This causes each spout/bolt instance to flush all its outstanding batches 
to their respective downstream components.
    +
    +`topology.flush.tuple.freq.millis` : This setting controls how often the 
flush tuples are generated. Flush tuples are not generated if this 
configuration is
    +set to 0 or if (`topology.producer.batch.size`=1 and 
`topology.transfer.batch.size`=1).
    +
    +
    +#### Guidance
    +Flushing interval can be used as tool to retain the higher throughput 
benefits of batching and avoid batched messages getting stuck for too long 
waiting for their.
    +batch to fill. Preferably this value should be larger than the average 
execute latencies of the bolts in the topology. Trying to flush the queues more 
frequently than
    +the amount of time it takes to produce the messages may hurt performance. 
Understanding the average execute latencies of each bolt will help determine 
the average
    +number of messages in the queues between two flushes.
    +
    +**For Low latency:** A smaller value helps achieve tighter latency SLAs.
    +**For High throughput:**  When trying to maximize throughput under high 
traffic situations, the batches are likely to get filled and flushed 
automatically.
    +To optimize for such cases, this value can be set to a higher number.
    +**Varying throughput:** If latency is not a concern, a larger value will 
optimize for high traffic situations. For meeting tighter SLAs set this to lower
    +values.
    +
    +
    +## 3. Wait Strategy
    +Wait strategies are used to conserve CPU usage by trading off some latency 
and throughput. They are applied for the following situations:
    +
    +3.1 **Spout Wait:**  In low/no traffic situations, Spout's nextTuple() may 
not produce any new emits. To prevent invoking the Spout's nextTuple,
    +this wait strategy is used between nextTuple() calls to allow the spout's 
executor thread to idle and conserve CPU. Select a strategy using 
`topology.spout.wait.strategy`.
    +
    +3.2 **Bolt Wait:** : When a bolt polls it's receive queue for new messages 
to process, it is possible that the queue is empty. This typically happens
    +in case of low/no traffic situations or when the upstream spout/bolt is 
inherently slower. This wait strategy is used in such cases. It avoids high CPU 
usage
    +due to the bolt continuously checking on a typically empty queue. Select a 
strategy using `topology.bolt.wait.strategy`. The chosen strategy can be 
further configured
    +using the `topology.bolt.wait.*` settings.
    +
    +3.3 **Backpressure Wait** : Select a strategy using 
`topology.backpressure.wait.strategy`. When a spout/bolt tries to write to a 
downstream component's receive queue,
    +there is a possibility that the queue is full. In such cases the write 
needs to be retried. This wait strategy is used to induce some idling 
in-between re-attempts for
    +conserving CPU. The chosen strategy can be further configured using the 
`topology.backpressure.wait.*` settings.
    +
    +
    +#### Built-in wait strategies:
    +
    +- **SleepSpoutWaitStrategy** : This is the only built-in strategy 
available for Spout Wait. It cannot be applied to other Wait situations. It is 
a simple static strategy that
    +calls Thread.sleep() each time. Set `topology.spout.wait.strategy` to 
`org.apache.storm.spout.SleepSpoutWaitStrategy` for using this. 
`topology.sleep.spout.wait.strategy.time.ms`
    +configures the sleep time.
    +
    +- **ProgressiveWaitStrategy** : This strategy can be used for Bolt Wait or 
Backpressure Wait situations. Set the strategy to 
'org.apache.storm.policy.WaitStrategyProgressive' to
    +select this wait strategy. This is a dynamic wait strategy that enters 
into progressively deeper states of CPU conservation if the Backpressure Wait 
or Bolt Wait situations persist.
    +It has 3 levels of idling and allows configuring how long to stay at each 
level :
    +
    +  1. No Waiting - The first few times it will return immediately. This 
does not conserve any CPU. The number of times it remains in this state is 
configured using
    +  `topology.bolt.wait.progressive.level1.count` or 
`topology.backpressure.wait.progressive.level1.count` depending which situation 
it is being used.
    +
    +  2. Park Nanos - In this state it disables the current thread for thread 
scheduling purposes, for 1 nano second using LockSupport.parkNanos(). This puts 
the CPU in a minimal
    +  conservation state. It remains in this state for 
`topology.bolt.wait.progressive.level2.count` or 
`topology.backpressure.wait.progressive.level2.count` iterations.
    +
    +  3. Thread.sleep() - In this state it calls Thread.sleep() with the value 
specified in `topology.backpressure.wait.progressive.level3.sleep.millis` or in
    +  `topology.bolt.wait.progressive.level3.sleep.millis` based on the Wait 
situation it is used in. This is the most CPU conserving level it remains in 
this level for
    +  the remaining iterations.
    +
    +
    +- **ParkWaitStrategy** : This strategy can be used for Bolt Wait or 
Backpressure Wait situations. Set the strategy to 
`org.apache.storm.policy.WaitStrategyPark` to use this.
    +This strategy disables the current thread for thread scheduling purposes 
by calling LockSupport.parkNanos(). The amount of park time is configured using 
either
    +`topology.bolt.wait.park.microsec` or 
`topology.backpressure.wait.park.microsec` based on the wait situation it is 
used. Setting the park time to 0, effectively disables
    +invocation of LockSupport.parkNanos and this mode can be used to achieve 
busy polling (which at the cost of high CPU utilization even when idle, may 
improve latency and/or throughput).
    +
    +
    +## Max.spout.pending
    +The back pressure mechanism no longer requires 
`topology.max.spout.pending`. It is recommend to set this to null (default).
    --- End diff --
    
    According to the report posted in issue, setting this option still shows 
better result and hence the tests in the report utilizes the option. Do we want 
to guide it as well, or still be better to recommend users to set this to null?

---

[GitHub] storm pull request #2241: STORM-2306 : Messaging subsystem redesign.

Reply via email to