Profiled serving a 500 byte file out of mod_mem_cache (CacheEnable mem /). This analysis covers one complete request starting at ap_read_request and ending with ap_graceful_stop_signalled.
I've hacked Apache 2 in several places... The most notable hack is that I remove ap_http_header_filter and send the pre-built headers straight out of the cache. This substantially reduced the time spent in apr_brigade_puts (and was good for a 4% or so performance boost). Can get a lot fancier here by making sure the buffer holding the headers is properly aligned on a page boundary. I also shortcircuit the age and freshness calculations in mod_cache (to save gettimeofday calls). I know we have performance problems with apr_time_now with potential fixes being discussed. Last hack is to do a call apr_setsocketopt(csd, APR_INCOMPLETE_READ, 1) immediately after the call to ap_graceful_stop_signaled to cause the next ap_read_request to go directly to select rather than calling read first. I think this is a good optimization because we just checked for a pipelined request (read returned EAGAIN) so no sense in calling read again. Will submit a patch for this piece later. Instructions = Cycles/Ratio I assume a large ratio is probably a bad thing and indicates, among other things, that pipeline flushes, pagefaults, etc, are occuring... I have a routine flow which should show transtions to the kernel (not sure if it will show pagefaults or not) that I have not spent a lot of time on. We spend a suprising amount of time in module log routines and logging is not enabled. Bill Space % Cycles ===== ==== ===== User 19.0 46645 Shared Library 48.6 119222 Kernel 32.4 79405 Total 245272 (4.22 per instruction) /usr/local/apachetest/bin/httpd : Subroutine Name Source File Ratio Enter % Cycles =============== =========== ===== ===== ==== ====== .core_input_filter core.c 3.15 11 1.5 3628 .ap_rgetline_core protocol.c 3.05 6 1.3 3241 .get_filter_handle util_filter.c 4.58 2 1.1 2705 .ap_find_token util.c 4.16 6 0.9 2306 .add_any_filter_handle util_filter.c 4.37 6 0.9 2090 .net_time_filter core.c 2.90 12 0.7 1713 ._ptrgl ptrgl.s 4.13 66 0.7 1637 .fix_hostname vhost.c 3.65 1 0.7 1626 .core_output_filter core.c 3.63 2 0.6 1469 .ap_update_child_status_from_indexes scoreboard.c 17.94 2 0.6 1435 .add_any_filter util_filter.c 5.46 1 0.6 1403 .ap_get_brigade util_filter.c 3.32 25 0.4 1079 .ap_set_keepalive http_protocol.c 3.50 2 0.4 1036 ._moveeq moveeq.s 4.72 14 0.4 1025 .ap_read_request protocol.c 6.32 1 0.3 834 .ap_getword_white util.c 3.09 2 0.3 784 .ap_get_mime_headers protocol.c 3.32 1 0.3 756 .ap_content_length_filter protocol.c 4.33 1 0.3 736 .ap_pass_brigade util_filter.c 4.82 5 0.2 584 .ap_recent_rfc822_date util_time.c 2.74 1 0.2 520 .log_error_core log.c 4.28 4 0.2 513 .read_request_line protocol.c 4.15 1 0.2 502 .apr_palloc glink.s 3.71 22 0.2 490 .apr_brigade_destroy glink.s 6.51 12 0.2 469 .isspace glink.s 3.45 22 0.2 455 .core_create_req core.c 7.57 1 0.2 439 .apr_table_get glink.s 4.50 16 0.2 432 .ap_run_log_transaction protocol.c 12.05 1 0.2 398 .ap_run_quick_handler config.c 13.96 1 0.2 391 .cached_explode util_time.c 5.61 1 0.1 365 .ap_run_create_request request.c 8.06 1 0.1 363 .ap_log_error log.c 3.82 4 0.1 336 .tolower glink.s 2.93 19 0.1 334 .ap_http_filter http_protocol.c 3.35 2 0.1 332 .ap_add_output_filters_by_type core.c 14.32 1 0.1 329 .apr_brigade_create glink.s 3.48 15 0.1 313 .__divi64 glink.s 12.72 4 0.1 305 .isalpha glink.s 2.64 19 0.1 301 .lookup_builtin_method http_protocol.c 10.91 1 0.1 295 .apr_setsocketopt glink.s 4.39 11 0.1 290 .ap_process_http_connection http_core.c 8.17 0 0.1 286 .writev_it_all core.c 6.07 1 0.1 273 .basic_http_header http_protocol.c 3.09 1 0.1 260 .http_create_request http_core.c 6.19 1 0.1 235 .strlen glink.s 9.76 4 0.1 234 .apr_brigade_split_line glink.s 6.39 6 0.1 230 .apr_table_make glink.s 6.28 6 0.1 226 .form_header_field http_protocol.c 2.73 2 0.1 213 .avoid_xlc_bug core.c 4.79 11 0.1 211 .ap_byterange_filter http_protocol.c 7.26 1 0.1 210 .ap_run_post_read_request protocol.c 5.88 1 0.1 200 .ap_discard_request_body http_protocol.c 2.82 1 0.1 200 .apr_pstrcatv glink.s 29.67 1 0.1 178 .strchr glink.s 7.39 4 0.1 177 .basic_http_header_check http_protocol.c 4.18 1 0.1 176 .check_pipeline_flush http_request.c 3.97 1 0.1 175 .ap_update_vhost_from_headers vhost.c 5.61 1 0.1 174 .memset glink.s 6.95 4 0.1 167 .strncasecmp glink.s 5.44 5 0.1 163 .create_empty_config config.c 7.75 1 0.1 163 .apr_brigade_puts glink.s 6.48 4 0.1 156 .ap_process_request http_request.c 5.68 1 0.1 153 .ap_graceful_stop_signalled worker.c 51.13 1 0.1 153 .apr_brigade_write glink.s 3.18 8 0.1 153 .ap_parse_uri protocol.c 3.40 1 0.1 150 .ap_set_byterange http_protocol.c 4.54 1 0.1 145 .ap_add_output_filter util_filter.c 17.40 1 0.1 139 .apr_pool_destroy glink.s 22.80 1 0.1 137 .ap_basic_http_header http_protocol.c 9.04 1 0.1 136 .apr_table_unset glink.s 10.73 2 0.1 129 .apr_sendv glink.s 20.00 1 0.0 120 .ap_update_child_status scoreboard.c 8.30 2 0.0 116 .ap_index_of_response http_protocol.c 6.09 1 0.0 116 .ap_get_limit_req_body core.c 15.43 1 0.0 108 .apr_array_make glink.s 17.97 1 0.0 108 .ap_get_server_version core.c 21.36 1 0.0 107 .remove_any_filter util_filter.c 3.44 2 0.0 107 .apr_table_mergen glink.s 8.72 2 0.0 105 .isdigit glink.s 8.20 2 0.0 98 .apr_table_overlap glink.s 16.00 1 0.0 96 .apr_pstrdup glink.s 15.73 1 0.0 94 .apr_table_addn glink.s 3.93 4 0.0 94 .ap_get_output_filter_handle util_filter.c 15.53 2 0.0 93 .apr_off_t_toa glink.s 15.53 1 0.0 93 .apr_uri_parse glink.s 15.33 1 0.0 92 .apr_brigade_partition glink.s 3.73 4 0.0 90 .apr_pool_create_ex glink.s 14.60 1 0.0 88 .apr_brigade_split glink.s 2.85 5 0.0 86 .ap_make_method_list http_protocol.c 3.25 1 0.0 81 .apr_parse_addr_port glink.s 13.10 1 0.0 79 .ap_set_content_length protocol.c 3.73 1 0.0 75 .ap_remove_output_filter util_filter.c 3.42 2 0.0 68 .ap_explode_recent_gmt util_time.c 17.05 1 0.0 68 .apr_psprintf glink.s 5.57 2 0.0 67 .ap_set_content_type http_protocol.c 2.91 1 0.0 61 .apr_bucket_flush_create glink.s 9.67 1 0.0 58 .apr_brigade_cleanup glink.s 9.17 1 0.0 55 .ap_create_request_config config.c 54.20 1 0.0 54 .ap_add_output_filter_handle util_filter.c 2.83 3 0.0 51 .ap_method_number_of http_protocol.c 2.40 1 0.0 48 .ap_finalize_request_protocol protocol.c 2.47 1 0.0 47 .apr_table_setn glink.s 2.39 3 0.0 43 .apr_time_now glink.s 6.23 1 0.0 37 .ap_add_input_filter_handle util_filter.c 2.83 2 0.0 34 .apr_pstrmemdup glink.s 5.67 1 0.0 34 .apr_bucket_eos_create glink.s 2.17 1 0.0 13 Shlib Subroutine Source File Ratio Enter % Cycles ================ =========== ===== ===== ==== ====== .memset memset.s 4.99 9 5.5 13580 .apr_table_get apr_tables.c 3.15 21 2.1 5030 .__is_wctype_std libc/__is_wctype_std.c 3.34 68 1.8 4536 .strcasecmp libaixinet/strcasecmp.c 3.47 12 1.8 4458 .apr_palloc apr_pools.c 1.97 104 1.5 3680 .apr_vformatter apr_snprintf.c 3.83 2 1.3 3231 .apr_table_setn apr_tables.c 2.97 13 1.2 2936 ._moveeq moveeq.s 3.83 11 1.1 2818 .apr_setsocketopt sockopt.c 3.53 13 1.1 2753 .__divu64 divu64.s 2.75 3 1.1 2675 .memchr libc/memchr.c 3.61 11 1.1 2668 .__divi64 divi64.s 2.71 5 1.0 2509 .strlen strlen.s 3.18 17 0.8 2077 .pthread_mutex_lock libpthreads/mutex.c 10.40 3 0.8 1841 .apr_bucket_alloc apr_buckets_alloc.c 2.72 28 0.7 1684 .apr_brigade_cleanup apr_brigade.c 2.74 17 0.7 1648 .apr_brigade_puts apr_brigade.c 2.32 5 0.6 1585 .isspace libc/isspace.c 2.84 23 0.5 1306 .apr_brigade_split_line apr_brigade.c 2.43 6 0.5 1297 .apr_brigade_write apr_brigade.c 2.22 11 0.5 1247 .apr_brigade_create apr_brigade.c 2.11 21 0.5 1199 .strncasecmp libaixinet/strcasecmp.c 2.74 6 0.5 1177 .cache_url_handler mod_cache.c 5.88 1 0.5 1147 .allocator_alloc apr_pools.c 4.47 5 0.5 1143 ._moveeq moveeq.s 4.70 16 0.5 1118 .apr_pool_cleanup_register apr_pools.c 2.17 22 0.4 1099 .apr_bucket_free apr_buckets_alloc.c 2.87 28 0.4 1080 ._ptrgl ptrgl.s 2.46 72 0.4 1063 ._ptrgl ptrgl.s 2.65 63 0.4 1000 .allocator_free apr_pools.c 4.23 5 0.4 960 .tolower libc/tolower.c 2.87 19 0.4 926 .apr_bucket_simple_copy apr_buckets_simple.c 3.39 14 0.4 902 .multi_log_transaction mod_log_config.c 28.55 1 0.4 885 .make_array_core apr_tables.c 2.40 12 0.3 790 .apr_pool_cleanup_kill apr_pools.c 4.05 12 0.3 786 .unserialize_table mod_mem_cache.c 5.17 4 0.3 781 .strchr strchr.s 2.90 5 0.3 778 .apr_array_push_noclear apr_tables.c 1.69 20 0.3 776 .isupper libc/isupper.c 1.94 19 0.3 739 .isalpha libc/isalpha.c 1.92 19 0.3 730 .apr_recv sendrecv.c 6.67 2 0.3 693 .heap_bucket_destroy apr_buckets_heap.c 2.49 17 0.3 644 .apr_uri_parse apr_uri.c 5.22 1 0.3 627 .match_headers mod_setenvif.c 16.49 1 0.3 627 .apr_table_mergen apr_tables.c 3.11 2 0.3 621 .find_entry cache_hash.c 4.42 1 0.3 619 .apr_table_overlap apr_tables.c 2.98 1 0.2 611 .pthread_mutex_unlock libpthreads/mutex.c 5.02 3 0.2 603 .ap_cache_get_cachetype cache_util.c 10.48 1 0.2 597 .overlap_hash apr_tables.c 3.50 4 0.2 588 .apr_palloc glink.s 2.62 37 0.2 583 .apr_bucket_simple_split apr_buckets_simple.c 1.82 10 0.2 564 .apr_table_unset apr_tables.c 2.72 2 0.2 564 .read libc/read.c 7.89 2 0.2 552 .spin_lock_global_ppc_up locks_ppc_up.s 16.71 3 0.2 551 .apr_brigade_destroy apr_brigade.c 2.68 12 0.2 547 .apr_bucket_alloc glink.s 3.21 28 0.2 539 .heap_bucket_read apr_buckets_heap.c 1.95 30 0.2 528 .open_entity mod_mem_cache.c 6.32 1 0.2 512 .apr_brigade_partition apr_brigade.c 3.31 4 0.2 476 .apr_table_make apr_tables.c 2.50 10 0.2 475 .islower libc/islower.c 5.89 4 0.2 471 .run_cleanups apr_pools.c 9.58 1 0.2 460 .apr_pool_cleanup_kill glink.s 6.06 12 0.2 437 .socket_bucket_read apr_buckets_socket.c 3.62 2 0.2 428 .pthread_mutex_lock glink.s 23.62 3 0.2 425 .cache_out_filter mod_cache.c 4.35 1 0.2 387 .apr_palloc glink.s 3.07 21 0.2 387 .apr_sendv sendrecv.c 8.37 1 0.2 385 .apr_pvsprintf apr_pools.c 3.91 2 0.2 383 .apr_pstrcat apr_strings.c 3.15 1 0.2 378 .conv_10 apr_snprintf.c 2.64 4 0.2 370 .apr_bucket_heap_make apr_buckets_heap.c 3.81 3 0.1 365 .apr_pool_cleanup_register glink.s 2.88 21 0.1 362 .apr_bucket_shared_split apr_buckets_refcount.c 2.00 10 0.1 360 .apr_wait_for_io_or_timeout sendrecv.c 6.40 1 0.1 352 .apr_bucket_shared_destroy glink.s 3.45 17 0.1 352 .cache_select_url cache_storage.c 4.76 1 0.1 343 .strcasecmp glink.s 5.13 11 0.1 339 .apr_table_addn apr_tables.c 2.14 4 0.1 333 .apr_pstrdup apr_strings.c 2.84 5 0.1 330 .apr_parse_addr_port sockaddr.c 5.30 1 0.1 323 .apr_bucket_shared_destroy apr_buckets_refcount.c 3.07 17 0.1 313 .kread glink.s 26.03 2 0.1 312 .pthread_mutex_unlock glink.s 16.82 3 0.1 303 .isdigit libc/isdigit.c 5.03 3 0.1 302 .apr_brigade_split apr_brigade.c 2.12 5 0.1 292 .apr_allocator_alloc glink.s 12.09 4 0.1 290 .spin_unlock_global_ppc_up locks_ppc_up.s 6.94 3 0.1 271 .apr_thread_mutex_lock thread_mutex.c 5.60 3 0.1 269 .decrement_refcount mod_mem_cache.c 10.18 1 0.1 265 .apr_pool_destroy apr_pools.c 4.97 1 0.1 263 .writev libc/write.c 13.13 1 0.1 263 ._ptrgl ptrgl.s 10.81 4 0.1 259 .strlen glink.s 5.28 8 0.1 254 .read_headers mod_mem_cache.c 3.96 1 0.1 253 .apr_bucket_simple_split glink.s 4.07 10 0.1 244 .apr_pstrcatv apr_strings.c 2.88 1 0.1 242 .read_real_time read_real_time.s 8.61 1 0.1 241 .__pthread_geterrno_addr libpthreads/lib_lock.c 4.91 4 0.1 236 .apr_pool_create_ex apr_pools.c 3.50 1 0.1 235 ._Errno libc/errno.c 3.99 3 0.1 228 .apr_bucket_shared_make glink.s 12.47 3 0.1 224 .read_body mod_mem_cache.c 5.32 1 0.1 223 .apr_atomic_dec apr_atomic.c 7.10 1 0.1 220 .apr_pstrmemdup glink.s 36.37 1 0.1 218 ._ptrgl ptrgl.s 8.62 4 0.1 207 .memchr glink.s 3.18 10 0.1 191 ._ptrgl ptrgl.s 6.33 5 0.1 190 .memcache_gdsf_algorithm mod_mem_cache.c 6.68 2 0.1 187 .apr_off_t_toa apr_strings.c 2.84 1 0.1 182 .apr_atomic_inc apr_atomic.c 6.14 1 0.1 178 .apr_pstrndup apr_strings.c 5.72 1 0.1 177 .cache_run_open_entity cache_storage.c 5.50 1 0.1 176 .ap_cache_liststr cache_util.c 4.14 2 0.1 174 .read glink.s 14.18 2 0.1 170 .apr_allocator_free glink.s 7.08 4 0.1 170 .gettimeofday libc/gettimeofday.c 3.96 1 0.1 166 .ap_cache_tokstr cache_util.c 3.66 1 0.1 165 .memcmp memcmp.s 4.41 1 0.1 159 .apr_thread_mutex_unlock thread_mutex.c 3.24 3 0.1 156 .apr_pstrmemdup apr_strings.c 3.33 2 0.1 153 .apr_table_setn glink.s 2.50 10 0.1 150 .memset glink.s 12.15 2 0.1 146 .apr_recv glink.s 11.87 2 0.1 142 .islower glink.s 5.60 4 0.1 134 .apr_thread_mutex_lock glink.s 11.12 2 0.1 133 .apr_time_now time.c 5.29 1 0.1 132 .ap_set_content_type glink.s 21.80 1 0.1 131 .__divi64 glink.s 21.70 1 0.1 130 .apr_vformatter glink.s 10.83 2 0.1 130 .strlen glink.s 4.33 5 0.1 130 .cache_hash_get glink.s 21.33 1 0.1 128 .memcmp glink.s 21.30 1 0.1 128 ._Errno glink.s 7.04 3 0.1 127 .ap_cache_get_cachetype glink.s 20.77 1 0.1 125 .ap_cache_tokstr glink.s 20.60 1 0.1 124 .apr_table_make glink.s 5.09 4 0.0 122 .apr_bucket_heap_create glink.s 10.13 2 0.0 122 .ap_cache_liststr glink.s 10.08 2 0.0 121 .cache_find glink.s 18.63 1 0.0 112 .cache_update glink.s 18.63 1 0.0 112 .cache_pq_change_priority cache_pqueue.c 5.30 1 0.0 111 .apr_bucket_shared_copy apr_buckets_refcount.c 1.69 4 0.0 108 .apr_atomic_dec glink.s 17.50 1 0.0 105 .kwritev glink.s 17.13 1 0.0 103 .gettimeofday glink.s 17.10 1 0.0 103 .apr_bucket_eos_create apr_buckets_eos.c 2.55 2 0.0 102 .apr_bucket_socket_create apr_buckets_socket.c 4.82 1 0.0 101 .cache_find cache_cache.c 7.71 1 0.0 100 .bzero glink.s 16.67 1 0.0 100 .ap_log_error glink.s 4.16 4 0.0 100 .free_proc_chain apr_pools.c 6.97 1 0.0 98 .apr_table_get glink.s 3.25 5 0.0 98 .cache_read_entity_body cache_storage.c 9.74 1 0.0 97 .writev glink.s 16.00 1 0.0 96 .apr_bucket_heap_create apr_buckets_heap.c 1.79 2 0.0 93 .time_base_to_time libc/POWER/time_base_to_time.c 1.66 1 0.0 93 .apr_pstrdup glink.s 5.16 3 0.0 93 .memset glink.s 15.47 1 0.0 93 .ap_pass_brigade glink.s 7.70 2 0.0 92 .select glink.s 15.30 1 0.0 92 .apr_bucket_shared_make apr_buckets_refcount.c 5.08 3 0.0 91 .apr_thread_mutex_lock glink.s 15.13 1 0.0 91 .apr_bucket_immortal_create apr_buckets_simple.c 3.54 1 0.0 85 .isdigit glink.s 13.77 1 0.0 83 .ap_add_output_filter glink.s 13.53 1 0.0 81 .minchild cache_pqueue.c 4.48 1 0.0 76 .apr_bucket_simple_copy glink.s 3.08 4 0.0 74 .cache_generate_key_default cache_storage.c 4.29 1 0.0 73 .cache_read_entity_body glink.s 11.87 1 0.0 71 .apr_bucket_free glink.s 2.93 4 0.0 70 .apr_bucket_eos_make apr_buckets_eos.c 4.80 2 0.0 67 .apr_array_make apr_tables.c 3.36 1 0.0 67 .cache_pq_percolate_down cache_pqueue.c 3.94 1 0.0 67 .cache_update cache_cache.c 2.16 1 0.0 67 .apr_psprintf apr_pools.c 1.83 2 0.0 66 .apr_brigade_create glink.s 10.97 1 0.0 66 .apr_thread_mutex_unlock glink.s 5.48 2 0.0 66 .ap_set_keepalive glink.s 5.45 2 0.0 65 .apr_bucket_eos_create glink.s 10.77 1 0.0 65 .apr_allocator_mutex_get apr_pools.c 15.90 2 0.0 64 .strncasecmp glink.s 10.47 1 0.0 63 .memchr glink.s 10.17 1 0.0 61 .brigade_cleanup apr_brigade.c 15.00 4 0.0 60 .apr_bucket_flush_create apr_buckets_flush.c 2.86 1 0.0 57 .cache_read_entity_headers cache_storage.c 2.60 1 0.0 57 .apr_bucket_immortal_create glink.s 9.23 1 0.0 55 .apr_getsocketopt glink.s 9.10 1 0.0 55 .apr_atomic_inc glink.s 9.00 1 0.0 54 .simple_bucket_read apr_buckets_simple.c 3.26 2 0.0 52 .apr_pstrcat glink.s 8.33 1 0.0 50 .cache_hash_get cache_hash.c 3.54 1 0.0 50 .ap_remove_output_filter glink.s 8.10 1 0.0 49 .memcache_get_pos mod_mem_cache.c 16.00 1 0.0 48 .apr_palloc glink.s 3.93 2 0.0 47 .ap_get_output_filter_handle glink.s 3.90 2 0.0 47 .apr_brigade_write glink.s 2.39 3 0.0 43 .strcasecmp glink.s 7.07 1 0.0 42 .apr_bucket_immortal_make apr_buckets_simple.c 5.86 1 0.0 41 .bzero memset.s 13.27 1 0.0 40 .apr_bucket_heap_make glink.s 6.50 1 0.0 39 .ap_basic_http_header glink.s 6.17 1 0.0 37 .apr_bucket_socket_make apr_buckets_socket.c 5.23 1 0.0 37 .apr_pstrndup glink.s 5.63 1 0.0 34 .strchr glink.s 5.23 1 0.0 31 .apr_bucket_destroy_noop apr_buckets.c 7.00 4 0.0 28 .apr_allocator_free apr_pools.c 7.00 4 0.0 28 .apr_allocator_alloc apr_pools.c 7.00 4 0.0 28 .apr_getsocketopt sockopt.c 4.47 1 0.0 27 .apr_setsocketopt glink.s 2.17 2 0.0 26 .isspace glink.s 4.07 1 0.0 24 .apr_allocator_owner_get apr_pools.c 5.00 2 0.0 20 .memcache_inc_frequency mod_mem_cache.c 2.14 1 0.0 15 .apr_bucket_flush_make apr_buckets_flush.c 2.00 1 0.0 14 .apr_brigade_puts glink.s 2.17 1 0.0 13 .memset glink.s 2.17 1 0.0 13 .cache_select_url glink.s 2.17 1 0.0 13 .apr_pstrdup glink.s 2.17 1 0.0 13 .cache_pq_change_priority glink.s 2.17 1 0.0 13 .apr_pool_cleanup_register glink.s 2.17 1 0.0 13 .apr_thread_mutex_unlock glink.s 2.17 1 0.0 13 .eos_bucket_read apr_buckets_eos.c 2.75 1 0.0 11
