Repository: trafficserver
Updated Branches:
  refs/heads/master e869da69d -> 47eeaf34c

docs: add initial content to the performance tuning section

Project: http://git-wip-us.apache.org/repos/asf/trafficserver/repo
Commit: http://git-wip-us.apache.org/repos/asf/trafficserver/commit/47eeaf34
Tree: http://git-wip-us.apache.org/repos/asf/trafficserver/tree/47eeaf34
Diff: http://git-wip-us.apache.org/repos/asf/trafficserver/diff/47eeaf34

Branch: refs/heads/master
Commit: 47eeaf34cdba8dccf350f59339699aab50e5e814
Parents: e869da6
Author: Jon Sime <[email protected]>
Authored: Mon Dec 15 09:48:29 2014 -0800
Committer: James Peach <[email protected]>
Committed: Mon Dec 15 16:27:53 2014 -0800

----------------------------------------------------------------------
 doc/admin/performance-tuning.en.rst | 278 ++++++++++++++++++++++++++++---
 1 file changed, 259 insertions(+), 19 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/47eeaf34/doc/admin/performance-tuning.en.rst
----------------------------------------------------------------------
diff --git a/doc/admin/performance-tuning.en.rst b/doc/admin/performance-tuning.en.rst
index 5486e90..616f299 100644
--- a/doc/admin/performance-tuning.en.rst
+++ b/doc/admin/performance-tuning.en.rst
@@ -1,8 +1,3 @@
-.. _performance-tuning:
-
-Performance Tuning
-******************
-
 .. Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
@@ -20,33 +15,278 @@ Performance Tuning
    specific language governing permissions and limitations
    under the License.
 
+.. include:: common.defs
+
+.. _performance-tuning:
+
+Performance Tuning
+******************
+
+|ATS| in its default configuration should perform suitably for running the
+included regression test suite, but will need special attention to both its own
+configuration and the environment in which it runs to perform optimally for
+production usage.
+
+There are numerous options and strategies for tuning the performance of |TS|
+and we attempt to document as many of them as possible in the sections below.
+Because |TS| offers enough flexibility to be useful for many caching and
+proxying scenarios, which tuning strategies will be most effective for any
+given use case may differ, as well as the specific values for various
+configuration options.
+
 .. toctree::
    :maxdepth: 2
 
-Before you start
+Before You Start
 ================
 
-There is no single option to that will guarantee maximum performance of
-Apache Traffic Server in every use case. There are however numerous options
-that help tune its performance under different loads and in its - often
-vastly different - use cases.
+One of the most important aspects of any attempt to optimize the performance
+of a |TS| installation is the ability to measure that installation's
+performance; both prior to and after any changes are made. To that end, it is
+strongly recommended that you establish some means to monitor and record a
+variety of performance metrics: request and response speed, latency, and
+throughput; memory and CPU utilization; and storage I/O operations.
+
+Attempts to tune a system without being able to compare the impact of changes
+made will at best result in haphazard, *feel good* results that may end up
+having no real world impact on your customers' experiences, and at worst may
+even result in lower performance than before you started. Additionally, in the
+all too common situation of budget constraints, having proper measurements of
+existing performance will greatly ease the process of focusing on those
+individual components that, should they require hardware expenditures or larger
+investments of employee time, have the highest potential gains relative to
+their cost.
 
 Building Traffic Server
 =======================
 
-A lot of speed can be gained or lost depending on the way ATS is built.
+While the default compilation settings for |TS| will produce a set of binaries
+capable of serving most caching and proxying needs, there are some build
+options worth considering in specific environments.
+
+.. TODO::
+
+    - any reasons why someone wouldn't want to just go with distro packages?
+      (other than "distro doesn't package versions i want")
+    - list relevant build options, impact each can potentially have
+
+Hardware Tuning
+===============
+
+As with any other server software, efficient allocation of hardware resources
+will have a significant impact on |TS| performance.
+
+CPU Selection
+-------------
+
+|ATS| uses a hybrid event-driven engine and multi-threaded processing model for
+handling incoming requests. As such, it is highly scalable and makes efficient
+use of modern, multicore processor architectures.
+
+.. TODO::
+
+    any benchmarks showing relative req/s improvements between 1 core, 2 core,
+    N core? diminishing rate of return? can't be totally linear, but maybe it
+    doesn't realistically drop off within the currently available options (i.e.
+    the curve holds up pretty well all the way through current four socket xeon
+    8 core systems, so given a lack of monetary constraint, adding more cores
+    is a surefire performance improvement (up to the bandwidth limits), or does
+    it fall off earlier, or can any modern 4 core saturate a 10G network link
+    given fast enough disks?)
+
+Memory Allocation
+-----------------
+
+Though |TS| stores cached content within an on-disk host database, the entire
+:ref:`cache-directory` is always maintained in memory during server operation.
+Additionally, most operating systems will maintain disk caches within system
+memory. It is also possible, and commonly advisable, to maintain an in-memory
+cache of frequently accessed content.
+
+The memory footprint of the |TS| process is largely fixed at the time of server
+startup. Your |TS| systems will need at least enough memory to satisfy basic
+operating system requirements, as well as capacity for the cache directory, and
+any memory cache you wish to use. The default settings allocate roughly 10
+megabytes of RAM cache for every gigabyte of disk cache storage, though this
+setting can be adjusted manually in :file:`records.config` using the setting
+:ts:cv:`proxy.config.cache.ram_cache.size`. |TS| will, under the default
+configuration, adjust this automatically if your system does not have enough
+physical memory to accommodate the aforementioned target.
+
+Aside from the cost of physical memory, and necessary supporting hardware to
+make use of large amounts of RAM, there is little downside to increasing the
+memory allocation of your cache servers. You will see, however, no benefit from
+sizing your memory allocation larger than the sum of your content (and index
+overhead).
+
+Disk Storage
+------------
+
+Except in cases where your entire cache may fit into system memory, your cache
+nodes will eventually need to interact with their disks. While a more detailed
+discussion of storage stratification is covered in `Cache Partitioning`_ below,
+very briefly you may be able to realize gains in performance by separating
+more frequently accessed content onto faster disks (PCIe SSDs, for instance)
+while maintaining the bulk of your on-disk cache objects, which may not receive
+the same high volume of requests, on lower-cost mechanical drives.
+
+
+
+Operating System Tuning
+========================
+
+|ATS| is supported on a variety of operating systems, and as a result the tuning
+strategies available at the OS level will vary depending upon your chosen
+platform.
+
+General Recommendations
+-----------------------
+
+TCP Keep Alive
+~~~~~~~~~~~~~~
+
+TCP Congestion Control Settings
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Ephemeral and Reserved Ports
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Jumbo Frames
+~~~~~~~~~~~~
+
+.. TODO:: would they be useful/harmful/neutral for anything other than local forward/transparent proxies?
+
+Linux
+-----
 
-Tuning the Machine
-==================
+FreeBSD
+-------
 
-Operating Systems Options
-=========================
+OmniOS / illumos
+----------------
 
-Optimal Use of Memory
+Mac OS X
+--------
+
+Traffic Server Tuning
 =====================
 
-Tuning different Thread types
+|TS| itself, of course, has many options you may want to consider adjusting to
+achieve optimal performance in your environment. Many of these settings are
+recorded in :file:`records.config` and may be adjusted with the
+:option:`traffic_line -s` command line utility while the server is operating.
+
+CPU and Thread Optimization
+---------------------------
+
+Thread Scaling
+~~~~~~~~~~~~~~
+
+By default, |TS| creates 1.5 threads per CPU core on the host system. This may
+be adjusted with the following settings in :file:`records.config`:
+
+* :ts:cv:`proxy.config.exec_thread.autoconfig`
+* :ts:cv:`proxy.config.exec_thread.autoconfig.scale`
+* :ts:cv:`proxy.config.exec_thread.limit`
+
+Thread Affinity
+~~~~~~~~~~~~~~~
+
+On multi-socket servers, such as Intel architectures with NUMA, you can adjust
+the thread affinity configuration to take advantage of cache pipelines and
+faster memory access, as well as preventing possibly costly thread migrations
+across sockets. This is adjusted with :ts:cv:`proxy.config.exec_thread.affinity`
+in :file:`records.config`. ::
+
+    CONFIG proxy.config.exec_thread.affinity INT 1
+
+Thread Stack Size
+~~~~~~~~~~~~~~~~~
+
+:ts:cv:`proxy.config.thread.default.stacksize`
+
+.. TODO::
+
+    is there ever a need to fiddle with this, outside of possibly custom developed plugins?
+
+Polling Timeout
+~~~~~~~~~~~~~~~
+
+If you are experiencing unusually or unacceptably high CPU utilization during
+idle workloads, you may consider adjusting the polling timeout with
+:ts:cv:`proxy.config.net.poll_timeout`::
+
+    CONFIG proxy.config.net.poll_timeout INT 60
+
+Memory Optimization
+-------------------
+
+:ts:cv:`proxy.config.thread.default.stacksize`
+:ts:cv:`proxy.config.cache.ram_cache.size`
+
+
+Disk Storage Optimization
+-------------------------
+
+:ts:cv:`proxy.config.cache.force_sector_size`
+:ts:cv:`proxy.config.cache.max_doc_size`
+:ts:cv:`proxy.config.cache.target_fragment_size`
+
+Cache Partitioning
+~~~~~~~~~~~~~~~~~~
+
+Network Tuning
+--------------
+
+:ts:cv:`proxy.config.net.connections_throttle`
+
+Error responses from origins are consistent and costly
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If error responses are costly for your origin server to generate, you may elect
+to have |TS| cache these responses for a period of time. The default behavior is
+to consider all of these responses to be uncacheable, which will lead to every
+client request resulting in an origin request.
+
+This behavior is controlled by both enabling the feature via
+:ts:cv:`proxy.config.http.negative_caching_enabled` and setting the cache time
+(in seconds) with :ts:cv:`proxy.config.http.negative_caching_lifetime`. ::
+
+    CONFIG proxy.config.http.negative_caching_enabled INT 1
+    CONFIG proxy.config.http.negative_caching_lifetime INT 10
+
+SSL-Specific Options
+~~~~~~~~~~~~~~~~~~~~
+
+:ts:cv:`proxy.config.ssl.max_record_size`
+:ts:cv:`proxy.config.ssl.session_cache`
+:ts:cv:`proxy.config.ssl.session_cache.size`
+
+Thread Types
+------------
+
+Logging Configuration
+---------------------
+
+.. TODO::
+
+    binary vs. ascii output
+    multiple log formats (netscape+squid+custom vs. just custom)
+    overhead to log collation
+    using direct writes vs. syslog target
+
+Plugin Tuning
+=============
+
+Common Scenarios and Pitfalls
 =============================
 
-Tuning Plugin Execution
-=======================
+While environments vary widely and |TS| is useful in a great number of different
+situations, there are at least some recurring elements that may be used as
+shortcuts to identifying problem areas, or realizing easier performance gains.
+
+.. TODO::
+
+    - origins not sending proper expiration headers (can fix at the origin (preferable) or use proxy.config.http.cache.heuristic_(min|max)_lifetime as hacky bandaids)
+    - cookies and http_auth prevent caching
+    - avoid thundering herd with read-while-writer (link to section in http-proxy-caching)
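
The two default sizing figures quoted in the added documentation (1.5 threads per CPU core, and roughly 10 megabytes of RAM cache per gigabyte of disk cache) lend themselves to a quick back-of-the-envelope estimate. The following Python sketch is not part of the commit. The function names are hypothetical, and rounding the thread count down is an assumption, since the text does not say how the server rounds.

```python
import math

# Figures taken from the documentation text in the patch.
THREADS_PER_CORE = 1.5          # "1.5 threads per CPU core"
RAM_CACHE_MB_PER_DISK_GB = 10   # "roughly 10 megabytes ... for every gigabyte"


def estimate_exec_threads(cores, scale=THREADS_PER_CORE):
    """Estimate the thread count for a host with `cores` CPU cores.

    Rounding down (and never returning less than 1) is an assumption of
    this sketch, not documented server behavior.
    """
    return max(1, math.floor(cores * scale))


def estimate_ram_cache_mb(disk_cache_gb, mb_per_gb=RAM_CACHE_MB_PER_DISK_GB):
    """Estimate the default RAM cache size in megabytes for a disk cache."""
    return disk_cache_gb * mb_per_gb


if __name__ == "__main__":
    print("threads for 8 cores:", estimate_exec_threads(8))
    print("RAM cache (MB) for 500 GB of disk:", estimate_ram_cache_mb(500))
```

Estimates like these are only a starting point for capacity planning. The documentation notes that the server adjusts the RAM cache automatically when physical memory is insufficient, so measured values on a running system take precedence.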
