Here is an updated list of the top 20 CPU-consuming functions in 2.0. As with the previous profiles that I've posted, this one reflects the delivery of server-parsed HTML pages, as measured by Quantify on Solaris.

There are two important differences, however:

1. The percentages listed here are of the program's total usr CPU time, not usr+sys. (I'm ignoring system calls for now because 2.0 has been doing well in that area.)

2. This profile is for an httpd with my directory_walk/location_walk "pre-merge" patch applied. Without this patch, the profile would look a bit different, as mod_mime's dir-merge function would account for about 15% of the total usr CPU time.
The top 20:

1. bndm 23.76%
    No problem here; the bndm algorithm, used to identify "<!--#" tokens during mod_include parsing, seems to be near-optimal for that application. As other parts of the httpd get more efficient, this function's share should grow toward 100% of the non-syscall time.

2. strcasecmp 7.77%
    Mostly from apr_table_get, apr_table_setn, and sort_overlap (used in the table overlap operations). I think we've finally reached the point where the O(n) table scans are the top bottleneck in the code.

3. strlen 5.69%
    The biggest contributor to this one is directory_walk, which seems to be making many more strlen calls in its latest version. Other major callers are apr_pool_userdata_set (my recent patch to add a "_setn" variant will partially fix this one) and the strdup calls within apr_filepath_merge and ap_location_walk.

4. apr_file_read 3.10%
    Called mostly to read the config file during startup, so I'm not worried about it.

5. memset 2.14%
    Used in apr_pcalloc.

6. find_entry 2.07%
    This is part of the implementation of the apr_hash_t get/set functions, which in turn are used mostly by the pool userdata API. If anybody can speed up this function (possibly by optimizing the hash computation?), it will benefit APR apps in general.

7. memchr 1.72%
    Most of the calls to this are from core_input_filter.

8. strchr 1.70%
    directory_walk makes about 75% of the calls to strchr.

9. apr_palloc 1.47%
    The three big callers of apr_palloc are:
    - apr_pcalloc (no obvious optimizations here)
    - apr_pstrdup (reducing the strdup calls, as described above under strlen, will fix this)
    - apr_pool_cleanup_register (I think this is mostly from apr_pool_userdata_set; my userdata patch optimizes away some of the cleanup registration)

10. apr_lock_release 1.27%
11. apr_lock_acquire 1.27%
    Most of the calls to these are from apr_file_read, and therefore affect startup rather than request processing.

12. strcmp 1.06%
    directory_walk, handle_include (in mod_include), and location_walk account for most of the calls to strcmp.

13. tolower 1.05%
    Mostly from ap_add_any_filter and the ap_strcasestr call in ap_make_content_type.

14. ap_directory_walk 1.02%
    I don't know where in directory_walk the bulk of this time is being spent, but it doesn't matter much, because the functions called from directory_walk represent much bigger opportunities for optimization.

15. qsort 0.99%
    apr_table_overlap uses qsort.

16. apr_os_thread_current 0.96%
    This is called mostly from apr_lock_acquire/release. For the reasons noted above, it doesn't have a big effect on request processing.

17. apr_table_setn 0.90%

18. apr_vformatter 0.88%
    Optimizing away the apr_psprintf calls in ap_make_etag and ap_add_common_vars would substantially reduce the time spent in apr_vformatter. Note that we're in the <1% category here, though.

19. strrchr 0.86%

20. memcmp 0.83%
    memcmp is called primarily by the apr_hash_t lookup functions.
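For anyone wondering why bndm is hard to beat for finding "<!--#": BNDM (Backward Nondeterministic DAWG Matching) examines each window of the text right to left, tracking in a bit mask which pattern factors are still possible, so on a mismatch it can often skip ahead by nearly the full pattern length. The following is a minimal textbook sketch in C, not mod_include's actual code; the name bndm_search is hypothetical, and it assumes the pattern fits in the bits of one unsigned long.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical textbook BNDM search (not mod_include's implementation).
 * Returns the offset of the first occurrence of pat[0..m) in text[0..n),
 * or -1 if there is none. Assumes 1 <= m <= bits in an unsigned long. */
static long bndm_search(const char *text, size_t n, const char *pat, size_t m)
{
    unsigned long B[UCHAR_MAX + 1];
    size_t i, pos;

    if (m == 0 || m > sizeof(unsigned long) * CHAR_BIT || m > n)
        return -1;

    /* B[c] has bit (m-1-i) set for every position i where pat[i] == c. */
    memset(B, 0, sizeof(B));
    for (i = 0; i < m; i++)
        B[(unsigned char)pat[i]] |= 1UL << (m - 1 - i);

    pos = 0;
    while (pos <= n - m) {
        size_t j = m;          /* window is read right to left */
        size_t last = m;       /* shift to apply if no prefix is found */
        unsigned long d = ~0UL;

        while (d != 0) {
            d &= B[(unsigned char)text[pos + j - 1]];
            j--;                               /* j = index just read */
            if (d & (1UL << (m - 1))) {        /* a pattern prefix ends here */
                if (j > 0)
                    last = j;                  /* remember the shortest safe shift */
                else
                    return (long)pos;          /* the whole window matched */
            }
            d <<= 1;
        }
        pos += last;
    }
    return -1;
}
```

For example, searching "abc <!--#include -->" for "<!--#" reads only the last byte of the first window ('<'), shifts by 4, and then matches at offset 4.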
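To illustrate why strcasecmp shows up so high: a table lookup of the apr_table_get kind is a linear scan that compares the requested key against each entry case-insensitively, so a single get can cost up to n strcasecmp calls. The sketch below is a simplified stand-in, assuming a hypothetical struct entry and table_get rather than APR's actual apr_table_t internals.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical simplified table entry; not APR's apr_table_entry_t. */
struct entry {
    const char *key;
    const char *val;
};

/* Hypothetical O(n) lookup: one case-insensitive comparison per entry
 * until a match is found, which is why strcasecmp dominates the profile
 * once headers/notes/subprocess_env tables grow. */
static const char *table_get(const struct entry *elts, size_t nelts,
                             const char *key)
{
    size_t i;
    for (i = 0; i < nelts; i++) {
        if (strcasecmp(elts[i].key, key) == 0)
            return elts[i].val;
    }
    return NULL;
}
```

A miss is the worst case: it compares against every entry before returning NULL.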
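On the find_entry side, every get or set has to hash the entire key before it can probe a bucket, so its cost grows with key length. As far as I know, APR's apr_hash.c uses the well-known "times 33" string hash; the sketch below shows that computation in isolation (times33_hash is a hypothetical name, not an APR function), i.e. the per-byte loop that any attempt to optimize the hash computation would target.

```c
#include <stddef.h>

/* Hypothetical helper showing the "times 33" hash over len bytes of key.
 * Each byte costs one multiply and one add; unsigned arithmetic simply
 * wraps on overflow. */
static unsigned int times33_hash(const char *key, size_t len)
{
    const unsigned char *p = (const unsigned char *)key;
    unsigned int hash = 0;
    size_t i;

    for (i = 0; i < len; i++)
        hash = hash * 33 + p[i];
    return hash;
}
```

For instance, the hash of "ab" is 97 * 33 + 98 = 3299.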