I certainly have some cycles if someone can help point me in the right direction. Right now I am at a loss as to where to dig. Profiler just shows a lot of nothing (epoll) happening as you wrote.
10sec @ 10ms polling under full load on CentOS 7 w/8x X5660 @ 2.80GHz shows...

total            self             symbol                     module
14,553 (72.8%)   14,553 (72.8%)   __epoll_wait_nocancel      libc-2.17.so
2,996 (15.0%)    2,996 (15.0%)    __pthread_cond_timedwait   libpthread-2.17.so
999 (5.0%)       999 (5.0%)       __GI_nanosleep             libc-2.17.so
703 (3.5%)       696 (3.5%)       ironbee_plugin             ts_ironbee.so

The rest is < 1% each.

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  2   2  96   0   0   0|  19k  102k|   0     0 |1676B 6813B|  12k   18k
  4  11  85   0   0   0|   0     0 | 168k  229k|   0     0 |  11k   10k
  4  10  85   0   0   0|   0  9216B| 195k  269k|   0     0 |9825  8924
  4  11  85   0   0   0|   0    68k| 229k  315k|   0     0 |  11k 9412
  5  10  85   0   0   0|   0    12k| 185k  252k|   0     0 |  13k   11k
  4  10  86   0   0   0|   0     0 |  86k  122k|   0     0 |7993  7175
  5  11  84   0   0   0|   0     0 | 369k  502k|   0     0 |  15k   13k

-B

--
Brian Rectanus

On Wed, Mar 11, 2015 at 2:41 AM, Brian Geffon <bri...@apache.org> wrote:

> I've also observed unexplained latency when it comes to transformations, I
> think it's time that we dig into this more. The reason we're observing an
> increase in latency without a corresponding increase in CPU load is because
> TS simply isn't doing anything, it appears that it's just rescheduling
> transformations in certain situations.
>
> Does anyone have cycles to investigate?
>
> On Wed, Mar 11, 2015 at 12:31 AM, Brian Rectanus <brect...@gmail.com> wrote:
>
> > All,
> >
> > I am looking for advice on tuning performance of a plugin. As some may
> > know, I have a plugin for trafficserver (using 4.2.2 w/hwloc) that does a
> > lot of inspection of the http traffic (github/ironbee). As such, it can
> > introduce a fair amount of latency due to what should be high CPU usage
> > parsing, normalizing and looking for various patterns in the HTTP. This is
> > what I expect to see at least, but that is not how the server is acting.
> >
> > What I am seeing:
> >
> > * Without plugin loaded I see great performance and machine basically idle
> > (4% cpu or so) - I am using Ixia's ixLoad to generate a very consistent
> > load.
> >
> > * With plugin loaded and fully configured, the machine is slightly more
> > than idle (12% cpu or so), but transaction per sec drops by 25x.
> >
> > * Default threads settings of 1.5 * cores seems to be very poor setting
> > (50x slower). Setting manually obscenely higher threads (200) works a bit
> > better, but best setting is 2 threads (1 is bad, 3 is bad, but 2 works some
> > 7-10 times faster). Using accept threads is also very poor.
> >
> > Best balance (far better than others) is:
> >
> > CONFIG proxy.config.exec_thread.autoconfig INT 0
> > CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1.5
> > CONFIG proxy.config.exec_thread.limit INT 2
> > CONFIG proxy.config.accept_threads INT 0
> > CONFIG proxy.config.exec_thread.affinity INT 2
> > CONFIG proxy.config.task_threads INT 3
> >
> > Everything else is pretty much default - caching is disabled. The above is
> > about 15x faster (in tx/s) than the default settings.
> >
> > * Profiling (perf and Zoom profiler) with the 2 thread max setting shows
> > that two threads are active, one far more than the other
> >
> > * Profiling with the 1.5 x cores (e.g., 12 in this case as there are 8
> > cores) shows 4-5 threads active, but far less active than with the 2 core
> > max setting - most threads are always idle
> >
> > * First thought was blocking and lock contention, but there does not seem
> > to be any seen with the profiler.
> >
> > * Next thought was malloc() speed issues, so tried jemalloc (and tcmalloc)
> > which helps slightly, but not much (we use memory pools, so much is
> > pre-allocated anyhow)
> >
> > Attached a screenshot of the profiler timeline, but not sure it will come
> > through on the list.
> > The plugin does not block, but should be using lots of
> > CPU for parsing, running regex, etc. It also uses a lot of extra RAM for
> > normalizing HTTP, etc. However I am not seeing high CPU nor am I seeing
> > high RAM usage. It is like it just cannot get CPU, but the system is idle -
> > more threads I add, the less it gets CPU as if the extra accounting is
> > getting in the way.
> >
> > * I expect high CPU utilization, but the machine is mostly idle.
> > * I expect all the cores (8 of them) to get used, but really only 1-2 are
> > somewhat used.
> > * I expect the threads to be saturated with work, but they are mostly idle.
> >
> > Any ideas why the complete lack of CPU/thread utilization?
> >
> > Any ideas what to look at?
> >
> > Any ideas what I can enable (tools I could use) to see more insight into
> > what is happening?
> >
> > Cheers!
> > -B
> >
> > --
> > Brian Rectanus
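
For what it is worth, one rough way to read the sample counts in the profile at the top of this message is to convert them into "threads' worth" of time. This is only a sketch: it assumes the profiler takes one sample per thread per 10 ms interval (the message does not say so), and the function name thread_equivalents is made up for illustration. The counts are the "total" column copied from the profile.

```python
# Sample counts ("total" column) copied from the profile in this message.
SAMPLES = {
    "__epoll_wait_nocancel": 14553,
    "__pthread_cond_timedwait": 2996,
    "__GI_nanosleep": 999,
    "ironbee_plugin": 703,
}

DURATION_S = 10.0   # "10sec"
INTERVAL_S = 0.010  # "10ms polling"


def thread_equivalents(samples, duration_s=DURATION_S, interval_s=INTERVAL_S):
    """Convert per-symbol sample counts into threads' worth of time.

    ASSUMPTION: the profiler records one sample per thread per interval,
    so one fully occupied thread contributes duration/interval samples.
    """
    per_thread = round(duration_s / interval_s)  # samples from one thread
    return {sym: count / per_thread for sym, count in samples.items()}


if __name__ == "__main__":
    for sym, threads in thread_equivalents(SAMPLES).items():
        print(f"{sym:28s} {threads:6.2f} thread-equivalents")
```

If that assumption holds, roughly 14.6 threads' worth of the run is spent waiting in epoll and well under one thread's worth is spent inside the plugin, which would be consistent with the mostly idle cores described above.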