I certainly have some cycles if someone can help point me in the right
direction. Right now I am at a loss as to where to dig. The profiler just
shows a lot of nothing (epoll) happening, as you wrote.

A 10 sec profile @ 10ms sampling under full load on CentOS 7 w/8x X5660 @
2.80GHz shows...

       total          self     symbol                    module
 14,553 (72.8%) 14,553 (72.8%) __epoll_wait_nocancel     libc-2.17.so
  2,996 (15.0%)  2,996 (15.0%) __pthread_cond_timedwait  libpthread-2.17.so
    999  (5.0%)    999  (5.0%) __GI_nanosleep            libc-2.17.so
    703  (3.5%)    696  (3.5%) ironbee_plugin            ts_ironbee.so
The rest is < 1% each.
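
As a rough sanity check on those numbers, the sketch below works out how many
threads the profiler was sampling and what share of the samples sit in wait
calls. It assumes the profiler takes one sample per thread per 10ms tick,
which is a guess about how it counts, and the function names are made up for
illustration.

```python
# Self-sample counts and percentages copied from the profile table above.
SELF_SAMPLES = {
    "__epoll_wait_nocancel": (14553, 72.8),
    "__pthread_cond_timedwait": (2996, 15.0),
    "__GI_nanosleep": (999, 5.0),
    "ironbee_plugin": (696, 3.5),
}

DURATION_S = 10.0   # length of the profiling run
INTERVAL_S = 0.010  # sampling interval

# Symbols that represent a thread waiting rather than doing work.
WAIT_SYMBOLS = ("__epoll_wait_nocancel", "__pthread_cond_timedwait", "__GI_nanosleep")


def samples_per_thread(duration_s, interval_s):
    """Samples one thread contributes, ASSUMING one sample per thread per tick."""
    return round(duration_s / interval_s)


def implied_total_samples(count, percent):
    """Total sample count implied by one row's count and its percentage."""
    return count / (percent / 100.0)


def waiting_share(table, symbols):
    """Sum of the percentages attributed to the given wait symbols."""
    return round(sum(table[s][1] for s in symbols), 1)


per_thread = samples_per_thread(DURATION_S, INTERVAL_S)
total = implied_total_samples(*SELF_SAMPLES["__epoll_wait_nocancel"])
threads = total / per_thread
waiting = waiting_share(SELF_SAMPLES, WAIT_SYMBOLS)

print(f"samples per thread: {per_thread}")
print(f"implied threads sampled: {threads:.1f}")
print(f"share of samples spent waiting: {waiting}%")
```

Under that assumption the run covers about 20 threads, and roughly 93% of all
samples are in epoll_wait, cond_timedwait or nanosleep. The 999 nanosleep
samples would be consistent with a single thread sleeping for the whole run.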

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  2   2  96   0   0   0|  19k  102k|   0     0 |1676B 6813B|  12k   18k
  4  11  85   0   0   0|   0     0 | 168k  229k|   0     0 |  11k   10k
  4  10  85   0   0   0|   0  9216B| 195k  269k|   0     0 |9825  8924
  4  11  85   0   0   0|   0    68k| 229k  315k|   0     0 |  11k 9412
  5  10  85   0   0   0|   0    12k| 185k  252k|   0     0 |  13k   11k
  4  10  86   0   0   0|   0     0 |  86k  122k|   0     0 |7993  7175
  5  11  84   0   0   0|   0     0 | 369k  502k|   0     0 |  15k   13k
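
To put the CPU columns above into perspective, here is a small sketch that
averages the rows taken under load and converts the busy share into "cores'
worth" of work. It assumes the first row is the since-boot average that dstat
prints first (so it is skipped), and it takes the core count of 8 from the
quoted message below. Helper names are illustrative only.

```python
# (usr, sys, idl) percentages copied from the dstat rows above.
ROWS = [
    (2, 2, 96),
    (4, 11, 85),
    (4, 10, 85),
    (4, 11, 85),
    (5, 10, 85),
    (4, 10, 86),
    (5, 11, 84),
]

CORES = 8  # ASSUMPTION: the "8 cores" mentioned in the quoted message


def mean_columns(rows):
    """Column-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def busy_core_equivalents(idle_percent, cores):
    """How many fully busy cores the non-idle share corresponds to."""
    return cores * (100.0 - idle_percent) / 100.0


# ASSUMPTION: the first row is dstat's since-boot average, not a load sample.
loaded = ROWS[1:]
usr, sys_pct, idl = mean_columns(loaded)
busy = busy_core_equivalents(idl, CORES)

print(f"mean usr={usr:.1f}% sys={sys_pct:.1f}% idl={idl:.1f}%")
print(f"busy core equivalents: {busy:.1f} of {CORES}")
```

With these rows the machine averages about 85% idle under load, which is
roughly 1.2 cores of work out of 8, and system time is more than twice user
time.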

-B

--
Brian Rectanus

On Wed, Mar 11, 2015 at 2:41 AM, Brian Geffon <bri...@apache.org> wrote:

> I've also observed unexplained latency when it comes to transformations; I
> think it's time that we dig into this more. The reason we're observing an
> increase in latency without a corresponding increase in CPU load is that
> TS simply isn't doing anything. It appears that it's just rescheduling
> transformations in certain situations.
>
> Does anyone have cycles to investigate?
>
> On Wed, Mar 11, 2015 at 12:31 AM, Brian Rectanus <brect...@gmail.com>
> wrote:
>
> > All,
> >
> > I am looking for advice on tuning performance of a plugin. As some may
> > know, I have a plugin for trafficserver (using 4.2.2 w/hwloc) that does a
> > lot of inspection of the http traffic (github/ironbee). As such, it can
> > introduce a fair amount of latency due to what should be high CPU usage
> > parsing, normalizing and looking for various patterns in the HTTP. This is
> > what I expect to see at least, but that is not how the server is acting.
> >
> > What I am seeing:
> >
> > * Without the plugin loaded I see great performance and the machine is
> > basically idle (4% CPU or so). I am using Ixia's ixLoad to generate a very
> > consistent load.
> >
> > * With the plugin loaded and fully configured, the machine is slightly more
> > than idle (12% CPU or so), but transactions per second drop by 25x.
> >
> > * The default thread setting of 1.5 * cores seems to be a very poor setting
> > (50x slower). Manually setting an obscenely high thread count (200) works a
> > bit better, but the best setting is 2 threads (1 is bad, 3 is bad, but 2
> > works some 7-10 times faster). Using accept threads is also very poor.
> >
> > Best balance (far better than others) is:
> >
> > CONFIG proxy.config.exec_thread.autoconfig INT 0
> > CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1.5
> > CONFIG proxy.config.exec_thread.limit INT 2
> > CONFIG proxy.config.accept_threads INT 0
> > CONFIG proxy.config.exec_thread.affinity INT 2
> > CONFIG proxy.config.task_threads INT 3
> >
> > Everything else is pretty much default - caching is disabled. The above is
> > about 15x faster (in tx/s) than the default settings.
> >
> > * Profiling (perf and Zoom profiler) with the 2 thread max setting shows
> > that two threads are active, one far more than the other
> >
> > * Profiling with the 1.5 x cores (e.g., 12 in this case as there are 8
> > cores) shows 4-5 threads active, but far less active than with the 2 thread
> > max setting - most threads are always idle
> >
> > * First thought was blocking and lock contention, but the profiler does not
> > seem to show any.
> >
> > * Next thought was malloc() speed issues, so tried jemalloc (and tcmalloc)
> > which helps slightly, but not much (we use memory pools, so much is
> > pre-allocated anyhow)
> >
> > Attached a screenshot of the profiler timeline, but not sure it will come
> > through on the list. The plugin does not block, but should be using lots of
> > CPU for parsing, running regex, etc. It also uses a lot of extra RAM for
> > normalizing HTTP, etc. However, I am not seeing high CPU nor am I seeing
> > high RAM usage. It is as if it just cannot get CPU even though the system
> > is idle: the more threads I add, the less CPU it gets, as if the extra
> > accounting is getting in the way.
> >
> > * I expect high CPU utilization, but the machine is mostly idle.
> > * I expect all the cores (8 of them) to get used, but really only 1-2 are
> > somewhat used.
> > * I expect the threads to be saturated with work, but they are mostly idle.
> >
> > Any ideas why the complete lack of CPU/thread utilization?
> >
> > Any ideas what to look at?
> >
> > Any ideas what I can enable (tools I could use) to see more insight into
> > what is happening?
> >
> > Cheers!
> > -B
> >
> > --
> > Brian Rectanus
> >
>
