Re: [gentoo-user] Re: TCP Queuing problem

2016-10-01 Thread Grant
> It turned out this was a combination of two problems which made it
> much more difficult to figure out.
>
> First of all, I didn't have enough apache2 processes.  That seems like
> it should have been obvious, but it wasn't, for two reasons.  Firstly,
> my apache2 processes are always idle or nearly idle, even when traffic
> levels are high.  But it must be the case that each request made to
> nginx and then handed off to apache2 ties up an apache2 process, even
> though my backend application server, not apache2, is the one using all
> the CPU.  The other thing that made it difficult to track down was the
> way munin graphs apache2 processes.  On my graph, busy and free
> processes only appeared as tiny dots at the bottom because apache2's
> ServerLimit, which is many times greater than the number of busy and
> free processes, is drawn on the same graph.  It would be better to draw
> MaxClients instead of ServerLimit since I think MaxClients is more
> likely to be tuned; it at least appears in the default config file on
> Gentoo.  Since busy and free apache2 processes were virtually invisible
> on the munin graph, I wasn't able to correlate their ebb and flow with
> my server's response times.
>
> Once I fixed the apache2 problem, I was sure I had it nailed.  That's
> when I emailed here a few days ago to say I think I got it.  But it
> turned out there was another problem: Odoo (formerly known as OpenERP),
> which also runs in a reverse proxy configuration behind nginx.
> Whenever someone uses Odoo on my server, it absolutely destroys
> performance for my non-Odoo website.  That would have been really easy
> to test, and I did try stopping the odoo service early on, but I ruled
> it out when the problem persisted after stopping Odoo, which I now
> realize must have been because of the apache2 problem.


The root of the Odoo problem was that I didn't have keepalive enabled
between the nginx reverse proxy server and the Odoo server.  nginx
enables keepalive by default for the client side (HTTP/1.1) but not
for the upstream side (HTTP/1.0).  I still see TCP Queuing spikes in
munin with Odoo usage, but they no longer slow down the apache2/nginx
reverse proxy running my main site.
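
For anyone hitting the same thing, the relevant nginx bits look roughly
like this -- a sketch only, with an illustrative upstream name and
Odoo's usual default port rather than my exact production config:

    upstream odoo_backend {
        server 127.0.0.1:8069;   # Odoo's usual default port
        keepalive 16;            # pool of idle keepalive connections to Odoo
    }

    server {
        location / {
            proxy_pass http://odoo_backend;
            proxy_http_version 1.1;          # nginx talks HTTP/1.0 upstream otherwise
            proxy_set_header Connection "";  # don't forward "Connection: close"
        }
    }

Without proxy_http_version 1.1 and the cleared Connection header, the
keepalive directive in the upstream block has no effect.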

- Grant



Re: [gentoo-user] Re: TCP Queuing problem

2016-09-24 Thread Grant
>> >> I haven't mentioned it yet, but several times I've seen the website
>> >> perform fine all day until I browse to it myself and then all of a
>> >> sudden it's super slow for me and my third-party monitor.  WTF???
>> >
>> > I had a similar problem once when routing through an IPsec VPN
>> > tunnel. I needed to reduce MTU in front of the tunnel to make it
>> > work correctly. But I think your problem is different.
>>
>>
>> I'm not using IPsec or a VPN.
>>
>>
>> > Does the http server backlog on the other side? Do you have
>> > performance graphs for other parts of the system to see them in
>> > relation? Maybe some router on the path doesn't work as expected.
>>
>>
>> I've attached a graph of http response time, CPU usage, and TCP
>> queueing over the past week.  It seems clear from watching top, iotop,
>> and free that my CPU is always the bottleneck on my server.
>
> What kind of application stack is running in the http server? CPU is a
> bottleneck you cannot always circumvent by throwing more CPUs at the
> problem. Maybe that stack needs tuning...
>
> At the point when requests start queuing up in the http server, the load
> on the server will rise exponentially.  It's like a traffic jam on a
> multi-lane highway. If one car brakes, things may still work. If a car
> in every lane brakes, you suddenly have a huge traffic jam backing up
> a few miles. And it takes time to recover from that. You need to solve
> the cause for "braking" in the first place and add some alternative
> routes for "cars that never brake" (static files and cacheable
> content). Each lane corresponds to one CPU. Adding more lanes when
> you have just 4 CPUs will only make the lanes slower. The key is to
> drastically lower the response times, which are much too high if I look
> at your graphs. What do memory and IO say?


It turned out this was a combination of two problems which made it
much more difficult to figure out.

First of all, I didn't have enough apache2 processes.  That seems like
it should have been obvious, but it wasn't, for two reasons.  Firstly,
my apache2 processes are always idle or nearly idle, even when traffic
levels are high.  But it must be the case that each request made to
nginx and then handed off to apache2 ties up an apache2 process, even
though my backend application server, not apache2, is the one using all
the CPU.  The other thing that made it difficult to track down was the
way munin graphs apache2 processes.  On my graph, busy and free
processes only appeared as tiny dots at the bottom because apache2's
ServerLimit, which is many times greater than the number of busy and
free processes, is drawn on the same graph.  It would be better to draw
MaxClients instead of ServerLimit since I think MaxClients is more
likely to be tuned; it at least appears in the default config file on
Gentoo.  Since busy and free apache2 processes were virtually invisible
on the munin graph, I wasn't able to correlate their ebb and flow with
my server's response times.
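
For reference, the knobs in question live in the prefork MPM config (on
Gentoo it ships under /etc/apache2/ -- check your install).  The numbers
below are only placeholders to show the shape of it, not the values I
settled on; MaxClients was renamed MaxRequestWorkers in apache 2.4:

    <IfModule mpm_prefork_module>
        StartServers            10
        MinSpareServers         10
        MaxSpareServers         20
        ServerLimit            150   # the hard ceiling munin was plotting
        MaxRequestWorkers      150   # "MaxClients" before apache 2.4
        MaxConnectionsPerChild   0
    </IfModule>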

Once I fixed the apache2 problem, I was sure I had it nailed.  That's
when I emailed here a few days ago to say I think I got it.  But it
turned out there was another problem: Odoo (formerly known as OpenERP),
which also runs in a reverse proxy configuration behind nginx.
Whenever someone uses Odoo on my server, it absolutely destroys
performance for my non-Odoo website.  That would have been really easy
to test, and I did try stopping the odoo service early on, but I ruled
it out when the problem persisted after stopping Odoo, which I now
realize must have been because of the apache2 problem.

So this was much more difficult to figure out because multiple problems
were interacting with each other.

- Grant



Re: [gentoo-user] Re: TCP Queuing problem

2016-09-22 Thread Volker Armin Hemmann
On 20.09.2016 at 21:52, Grant wrote:
>>>>>>>> My web server's response time for http requests skyrockets every
>>>>>>>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>>>>>>>> the only one that really correlates well with the slowdown is "TCP
>>>>>>>> Queuing".  It looks like I normally have about 400 packets per second
>>>>>>>> graphed as "direct copy from queue" in munin throughout the day, but 2
>>>>>>>> to 3.5 times that many are periodically graphed during work hours.  I
>>>>>>>> don't see the same pattern at all from the graph of all traffic on my
>>>>>>>> network interface which actually peaks over the weekend.  TCP Queuing
>>>>>>>> doesn't rise above 400 packets per second all weekend.  This is
>>>>>>>> consistent week after week.
>>>>>>>>
>>>>>>>> My two employees come into work during the hours in question, and they
>>>>>>>> certainly make frequent requests of the web server while at work, but
>>>>>>>> if their volume of requests were the cause of the problem then that
>>>>>>>> would be reflected in the graph of web server requests but it is not.
>>>>>>>> I do run a small MTU on the systems at work due to the config of the
>>>>>>>> modem/router we have there.
>>>>>>>>
>>>>>>>> Is this a recognizable problem to anyone?
>>>>>>>
>>>>>>> I'm in the midst of this.  Are there certain attacks I should check for?
>>>>>>
>>>>>> It looks like the TCP Queuing spike itself was due to imapproxy which
>>>>>> I've now disabled.  I'll post more info as I gather it.
>>>>>
>>>>> imapproxy was clearly affecting the TCP Queuing graph in munin but I
>>>>> still ended up with a massive TCP Queuing spike today and
>>>>> corresponding http response time issues long after I disabled
>>>>> imapproxy.  Graph attached.  I'm puzzled.
>>>>>
>>>>> - Grant
>>>>
>>>> Things to check for:
>>>> Torrent or other distributed downloads.
>>>> Download program with multiple download threads
>>>
>>> There sure shouldn't be anything like that running either on the
>>> server or in the office.  Is there a good way to find out? Maybe
>>> something that would clearly indicate it?
>>>
>>>> Maybe another proxy running? Esp. as you saw this also with imapproxy.
>>>
>>> nginx acts as a reverse proxy to apache2 but that's a pretty common
>>> config.  Nothing else that I know of.
>>>
>>> - Grant
>> Any way to find out between which hosts/servers those connections are for?
>> That might help in locating the cause.
>>
>> Eg. which of your desktops/laptops inside your network and where they are 
>> trying to connect to.
>
> The spikes are taking place on my remote server but they seem to
> roughly coincide with user activity within my own network.  My
> technical knowledge of networking internals is weak.  Does anyone know
> which tool will tell me more about the connections that are causing
> the TCP Queuing spikes?
>
> - Grant
>
>

wireshark or whatever it is called at the moment?




[gentoo-user] Re: TCP Queuing problem

2016-09-22 Thread Kai Krakow
On Wed, 21 Sep 2016 13:47:28 -0700, Grant wrote:

> >> I haven't mentioned it yet, but several times I've seen the website
> >> perform fine all day until I browse to it myself and then all of a
> >> sudden it's super slow for me and my third-party monitor.  WTF???  
> >
> > I had a similar problem once when routing through an IPsec VPN
> > tunnel. I needed to reduce MTU in front of the tunnel to make it
> > work correctly. But I think your problem is different.  
> 
> 
> I'm not using IPsec or a VPN.
> 
> 
> > Does the http server backlog on the other side? Do you have
> > performance graphs for other parts of the system to see them in
> > relation? Maybe some router on the path doesn't work as expected.  
> 
> 
> I've attached a graph of http response time, CPU usage, and TCP
> queueing over the past week.  It seems clear from watching top, iotop,
> and free that my CPU is always the bottleneck on my server.

What kind of application stack is running in the http server? CPU is a
bottleneck you cannot always circumvent by throwing more CPUs at the
problem. Maybe that stack needs tuning...

At the point when requests start queuing up in the http server, the load
on the server will rise exponentially. It's like a traffic jam on a
multi-lane highway. If one car brakes, things may still work. If a car
in every lane brakes, you suddenly have a huge traffic jam backing up
a few miles. And it takes time to recover from that. You need to solve
the cause for "braking" in the first place and add some alternative
routes for "cars that never brake" (static files and cacheable
content). Each lane corresponds to one CPU. Adding more lanes when
you have just 4 CPUs will only make the lanes slower. The key is to
drastically lower the response times, which are much too high if I look
at your graphs. What do memory and IO say?
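
For the "alternative routes", a rough nginx sketch -- paths, zone name
and backend port are made up for illustration, not taken from your
setup:

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=site_cache:10m
                     max_size=512m inactive=60m;

    server {
        # static files answered by nginx itself, no backend worker tied up
        location /static/ {
            root /var/www/site;
            expires 7d;
        }

        # cacheable dynamic content served from the cache when possible
        location / {
            proxy_cache site_cache;
            proxy_cache_valid 200 301 5m;
            proxy_pass http://127.0.0.1:8080;   # apache2 backend (assumed port)
        }
    }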

-- 
Regards,
Kai

Replies to list-only preferred.




Re: [gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Grant
>> >> I haven't mentioned it yet, but several times I've seen the website
>> >> perform fine all day until I browse to it myself and then all of a
>> >> sudden it's super slow for me and my third-party monitor.  WTF???
>> >
>> > I had a similar problem once when routing through an IPsec VPN tunnel.
>> > I needed to reduce MTU in front of the tunnel to make it work
>> > correctly. But I think your problem is different.
>>
>> I'm not using IPsec or a VPN.
>>
>> > Does the http server backlog on the other side? Do you have performance
>> > graphs for other parts of the system to see them in relation? Maybe
>> > some router on the path doesn't work as expected.
>>
>> I've attached a graph of http response time, CPU usage, and TCP
>> queueing over the past week.  It seems clear from watching top, iotop,
>> and free that my CPU is always the bottleneck on my server.
>
> I'm going to throw one more tool at you; give atop a try. (htop is nice, too,
> but atop is more powerful.)


I think I got it.  I'll post all the gory details shortly.

- Grant



Re: [gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Michael Mol
On Wednesday, September 21, 2016 01:47:28 PM Grant wrote:
> >> I haven't mentioned it yet, but several times I've seen the website
> >> perform fine all day until I browse to it myself and then all of a
> >> sudden it's super slow for me and my third-party monitor.  WTF???
> > 
> > I had a similar problem once when routing through an IPsec VPN tunnel.
> > I needed to reduce MTU in front of the tunnel to make it work
> > correctly. But I think your problem is different.
> 
> I'm not using IPsec or a VPN.
> 
> > Does the http server backlog on the other side? Do you have performance
> > graphs for other parts of the system to see them in relation? Maybe
> > some router on the path doesn't work as expected.
> 
> I've attached a graph of http response time, CPU usage, and TCP
> queueing over the past week.  It seems clear from watching top, iotop,
> and free that my CPU is always the bottleneck on my server.

I'm going to throw one more tool at you; give atop a try. (htop is nice, too, 
but atop is more powerful.)

-- 
:wq

signature.asc
Description: This is a digitally signed message part.


Re: [gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Grant
>> I haven't mentioned it yet, but several times I've seen the website
>> perform fine all day until I browse to it myself and then all of a
>> sudden it's super slow for me and my third-party monitor.  WTF???
>
> I had a similar problem once when routing through an IPsec VPN tunnel.
> I needed to reduce MTU in front of the tunnel to make it work
> correctly. But I think your problem is different.


I'm not using IPsec or a VPN.


> Does the http server backlog on the other side? Do you have performance
> graphs for other parts of the system to see them in relation? Maybe
> some router on the path doesn't work as expected.


I've attached a graph of http response time, CPU usage, and TCP
queueing over the past week.  It seems clear from watching top, iotop,
and free that my CPU is always the bottleneck on my server.

- Grant


[gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Kai Krakow
On Wed, 21 Sep 2016 22:06:37 +0200, Kai Krakow wrote:

> On Wed, 21 Sep 2016 12:37:51 -0700, Grant wrote:
> 
> >  [...]  
> >  [...]
>  [...]  
> > >
> > > You may want to set the default congestion control to fq-codel
> > > (it's in the kernel) if you're using DSL links. This may help your
> > > problem a little bit. It is most effective if you deploy traffic
> > > shaping at the same time. There was once something like
> > > wondershaper. Trick is to get the TCP queuing back inside your
> > > router (that is where you deployed pppoe) as otherwise packets
> > > will queue up in the modem (dsl modems use huge queues by
> > > default). This works by lowering the uplink bandwidth to 80-90% of
> > > measured maximum upload (the excess bandwidth is for short bursts
> > > of traffic). Traffic shaping now re-orders the packets. It should
> > > send ACK and small packets first. This should solve your queuing
> > > problem.
> > 
> > 
> > We're talking about optimizing the DSL connection at my office but
> > the server is located in a data center.  I can't imagine optimizing
> > that office DSL connection is the way to solve this even though the
> > http response slowdowns do correlate to office hours.  As a note,
> > the slowdowns are recorded by my third-party monitoring service.  
> 
> Ah, I didn't understand that correctly... But since the problem
> correlates with office hours, fixing the traffic flow problems on your
> side (timeouts, drops, etc.) is a sensible first step before working
> further on problems on the other side.

I'd probably try iperf at this stage to test if this problem occurs
with application stacks only or also occurs using only raw tcp streams.
Try upload, download, and bidirectional. Do your tests again after
modifying some parameters, maybe access the http server in parallel,
test at different times of day (off-hours and office hours). See if you
can isolate it around specific settings or tasks.
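
For example (the hostname is a placeholder; --bidir needs a reasonably
recent iperf3, otherwise just run the two directions separately):

    # on the server in the data center
    iperf3 -s

    # from the office, once during off-hours and once during office hours
    iperf3 -c server.example.com -t 30          # office -> server (upload)
    iperf3 -c server.example.com -t 30 -R       # server -> office (download)
    iperf3 -c server.example.com -t 30 --bidir  # both at once, iperf3 >= 3.7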

-- 
Regards,
Kai

Replies to list-only preferred.




[gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Kai Krakow
On Wed, 21 Sep 2016 12:53:17 -0700, Grant wrote:

>  [...]  
>  [...]  
> >>
> >> You may want to set the default congestion control to fq-codel
> >> (it's in the kernel) if you're using DSL links. This may help your
> >> problem a little bit. It is most effective if you deploy traffic
> >> shaping at the same time. There was once something like
> >> wondershaper. Trick is to get the TCP queuing back inside your
> >> router (that is where you deployed pppoe) as otherwise packets
> >> will queue up in the modem (dsl modems use huge queues by
> >> default). This works by lowering the uplink bandwidth to 80-90% of
> >> measured maximum upload (the excess bandwidth is for short bursts
> >> of traffic). Traffic shaping now re-orders the packets. It should
> >> send ACK and small packets first. This should solve your queuing
> >> problem.
> >>
> >> Between each step check dslreports.com for bufferbloat. I'm
> >> guessing it is currently way above 1000 ms while it should stay
> >> below 20-50 ms for dsl.
> >>
> >> The fq-codel congestion control fights against buffer bloat. But it
> >> can only effectively work if you're doing traffic shaping at least
> >> on your uplink (downlink may or may not be worth the effort
> >> depending on your use-case).
> >>
> >> Additionally, you can lower the priority of icmp-echo-reply this
> >> way so during icmp flooding your uplink will still work.
> >>
> >> This link may help you:
> >> https://www.bufferbloat.net/projects/codel/wiki/Cake/  
> >
> > And this:
> > https://github.com/tohojo/sqm-scripts  
> 
> 
> I haven't mentioned it yet, but several times I've seen the website
> perform fine all day until I browse to it myself and then all of a
> sudden it's super slow for me and my third-party monitor.  WTF???

I had a similar problem once when routing through an IPsec VPN tunnel.
I needed to reduce MTU in front of the tunnel to make it work
correctly. But I think your problem is different.

Does the http server backlog on the other side? Do you have performance
graphs for other parts of the system to see them in relation? Maybe
some router on the path doesn't work as expected.

-- 
Regards,
Kai

Replies to list-only preferred.




[gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Kai Krakow
On Wed, 21 Sep 2016 12:37:51 -0700, Grant wrote:

>  [...]  
>  [...]  
> >>
> >>
> >> Hi Kai, yesterday I switched my Gentoo router over to handling
> >> PPPoE and pings seem to be working properly now.  The AT&T device
> >> is now functioning as a modem only and passing everything
> >> through.  Today I'll find out if it helps with TCP Queuing and
> >> (supposedly) related http response slowdowns.  
> >
> > You may want to set the default congestion control to fq-codel
> > (it's in the kernel) if you're using DSL links. This may help your
> > problem a little bit. It is most effective if you deploy traffic
> > shaping at the same time. There was once something like
> > wondershaper. Trick is to get the TCP queuing back inside your
> > router (that is where you deployed pppoe) as otherwise packets will
> > queue up in the modem (dsl modems use huge queues by default). This
> > works by lowering the uplink bandwidth to 80-90% of measured maximum
> > upload (the excess bandwidth is for short bursts of traffic).
> > Traffic shaping now re-orders the packets. It should send ACK and
> > small packets first. This should solve your queuing problem.  
> 
> 
> We're talking about optimizing the DSL connection at my office but the
> server is located in a data center.  I can't imagine optimizing that
> office DSL connection is the way to solve this even though the http
> response slowdowns do correlate to office hours.  As a note, the
> slowdowns are recorded by my third-party monitoring service.

Ah, I didn't understand that correctly... But since the problem
correlates with office hours, fixing the traffic flow problems on your
side (timeouts, drops, etc.) is a sensible first step before working
further on problems on the other side.


-- 
Regards,
Kai

Replies to list-only preferred.




Re: [gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Grant
>> > > If that device behaves badly in router mode by blocking just all
>> > > icmp traffic instead of only icmp-echo-req, this is a good idea.
>> > > You may want to bug AT&T about this problem then. It should really
>> > > not block related icmp traffic.
>> >
>> >
>> > Hi Kai, yesterday I switched my Gentoo router over to handling PPPoE
>> > and pings seem to be working properly now.  The AT&T device is now
>> > functioning as a modem only and passing everything through.  Today
>> > I'll find out if it helps with TCP Queuing and (supposedly) related
>> > http response slowdowns.
>>
>> You may want to set the default congestion control to fq-codel (it's
>> in the kernel) if you're using DSL links. This may help your problem a
>> little bit. It is most effective if you deploy traffic shaping at the
>> same time. There was once something like wondershaper. Trick is to get
>> the TCP queuing back inside your router (that is where you deployed
>> pppoe) as otherwise packets will queue up in the modem (dsl modems use
>> huge queues by default). This works by lowering the uplink bandwidth to
>> 80-90% of measured maximum upload (the excess bandwidth is for short
>> bursts of traffic). Traffic shaping now re-orders the packets. It
>> should send ACK and small packets first. This should solve your
>> queuing problem.
>>
>> Between each step check dslreports.com for bufferbloat. I'm guessing
>> it is currently way above 1000 ms while it should stay below 20-50 ms
>> for dsl.
>>
>> The fq-codel congestion control fights against buffer bloat. But it
>> can only effectively work if you're doing traffic shaping at least on
>> your uplink (downlink may or may not be worth the effort depending on
>> your use-case).
>>
>> Additionally, you can lower the priority of icmp-echo-reply this way
>> so during icmp flooding your uplink will still work.
>>
>> This link may help you:
>> https://www.bufferbloat.net/projects/codel/wiki/Cake/
>
> And this:
> https://github.com/tohojo/sqm-scripts


I haven't mentioned it yet, but several times I've seen the website
perform fine all day until I browse to it myself and then all of a
sudden it's super slow for me and my third-party monitor.  WTF???

- Grant



[gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Kai Krakow
On Wed, 21 Sep 2016 21:29:13 +0200, Kai Krakow wrote:

> On Wed, 21 Sep 2016 07:30:40 -0700, Grant wrote:
> 
> >  [...]  
> >  [...]
>  [...]  
> > >
> > > If that device behaves badly in router mode by blocking just all
> > > icmp traffic instead of only icmp-echo-req, this is a good idea.
> > > You may want to bug AT&T about this problem then. It should really
> > > not block related icmp traffic.
> > 
> > 
> > Hi Kai, yesterday I switched my Gentoo router over to handling PPPoE
> > and pings seem to be working properly now.  The AT&T device is now
> > functioning as a modem only and passing everything through.  Today
> > I'll find out if it helps with TCP Queuing and (supposedly) related
> > http response slowdowns.  
> 
> You may want to set the default congestion control to fq-codel (it's
> in the kernel) if you're using DSL links. This may help your problem a
> little bit. It is most effective if you deploy traffic shaping at the
> same time. There was once something like wondershaper. Trick is to get
> the TCP queuing back inside your router (that is where you deployed
> pppoe) as otherwise packets will queue up in the modem (dsl modems use
> huge queues by default). This works by lowering the uplink bandwidth to
> 80-90% of measured maximum upload (the excess bandwidth is for short
> bursts of traffic). Traffic shaping now re-orders the packets. It
> should send ACK and small packets first. This should solve your
> queuing problem.
> 
> Between each step check dslreports.com for bufferbloat. I'm guessing
> it is currently way above 1000 ms while it should stay below 20-50 ms
> for dsl.
> 
> The fq-codel congestion control fights against buffer bloat. But it
> can only effectively work if you're doing traffic shaping at least on
> your uplink (downlink may or may not be worth the effort depending on
> your use-case).
> 
> Additionally, you can lower the priority of icmp-echo-reply this way
> so during icmp flooding your uplink will still work.
> 
> This link may help you:
> https://www.bufferbloat.net/projects/codel/wiki/Cake/

And this:
https://github.com/tohojo/sqm-scripts

-- 
Regards,
Kai

Replies to list-only preferred.




Re: [gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Grant
>> >> I just remembered that our AT&T modem/router does not respond to
>> >> pings.  My solution is to move PPPoE off of that device and onto my
>> >> Gentoo router so that pings pass through the AT&T device to the
>> >> Gentoo router but I haven't done that yet as I want to be on-site
>> >> for it. Could that behavior somehow be contributing to this
>> >> problem?  There does seem to be a clear correlation between user
>> >> activity at that location and the bad server behavior.
>> >
>> > If that device behaves badly in router mode by blocking just all
>> > icmp traffic instead of only icmp-echo-req, this is a good idea.
>> > You may want to bug AT&T about this problem then. It should really
>> > not block related icmp traffic.
>>
>>
>> Hi Kai, yesterday I switched my Gentoo router over to handling PPPoE
>> and pings seem to be working properly now.  The AT&T device is now
>> functioning as a modem only and passing everything through.  Today
>> I'll find out if it helps with TCP Queuing and (supposedly) related
>> http response slowdowns.
>
> You may want to set the default congestion control to fq-codel (it's in
> the kernel) if you're using DSL links. This may help your problem a
> little bit. It is most effective if you deploy traffic shaping at the
> same time. There was once something like wondershaper. Trick is to get
> the TCP queuing back inside your router (that is where you deployed
> pppoe) as otherwise packets will queue up in the modem (dsl modems use
> huge queues by default). This works by lowering the uplink bandwidth to
> 80-90% of measured maximum upload (the excess bandwidth is for short
> bursts of traffic). Traffic shaping now re-orders the packets. It
> should send ACK and small packets first. This should solve your
> queuing problem.


We're talking about optimizing the DSL connection at my office but the
server is located in a data center.  I can't imagine optimizing that
office DSL connection is the way to solve this even though the http
response slowdowns do correlate to office hours.  As a note, the
slowdowns are recorded by my third-party monitoring service.

- Grant


> Between each step check dslreports.com for bufferbloat. I'm guessing it
> is currently way above 1000 ms while it should stay below 20-50 ms for
> dsl.
>
> The fq-codel congestion control fights against buffer bloat. But it can
> only effectively work if you're doing traffic shaping at least on your
> uplink (downlink may or may not be worth the effort depending on your
> use-case).
>
> Additionally, you can lower the priority of icmp-echo-reply this way so
> during icmp flooding your uplink will still work.
>
> This link may help you:
> https://www.bufferbloat.net/projects/codel/wiki/Cake/



[gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Kai Krakow
On Wed, 21 Sep 2016 07:30:40 -0700, Grant wrote:

>  [...]  
>  [...]  
> >>
> >>
> >> I just remembered that our AT&T modem/router does not respond to
> >> pings.  My solution is to move PPPoE off of that device and onto my
> >> Gentoo router so that pings pass through the AT&T device to the
> >> Gentoo router but I haven't done that yet as I want to be on-site
> >> for it. Could that behavior somehow be contributing to this
> >> problem?  There does seem to be a clear correlation between user
> >> activity at that location and the bad server behavior.  
> >
> > If that device behaves badly in router mode by blocking just all
> > icmp traffic instead of only icmp-echo-req, this is a good idea.
> > You may want to bug AT&T about this problem then. It should really
> > not block related icmp traffic.  
> 
> 
> Hi Kai, yesterday I switched my Gentoo router over to handling PPPoE
> and pings seem to be working properly now.  The AT&T device is now
> functioning as a modem only and passing everything through.  Today
> I'll find out if it helps with TCP Queuing and (supposedly) related
> http response slowdowns.

You may want to set the default congestion control to fq-codel (it's in
the kernel) if you're using DSL links. This may help your problem a
little bit. It is most effective if you deploy traffic shaping at the
same time. There was once something like wondershaper. Trick is to get
the TCP queuing back inside your router (that is where you deployed
pppoe) as otherwise packets will queue up in the modem (dsl modems use
huge queues by default). This works by lowering the uplink bandwidth to
80-90% of measured maximum upload (the excess bandwidth is for short
bursts of traffic). Traffic shaping now re-orders the packets. It
should send ACK and small packets first. This should solve your
queuing problem.

Between each step check dslreports.com for bufferbloat. I'm guessing it
is currently way above 1000 ms while it should stay below 20-50 ms for
dsl.

The fq-codel congestion control fights against buffer bloat. But it can
only effectively work if you're doing traffic shaping at least on your
uplink (downlink may or may not be worth the effort depending on your
use-case).

Additionally, you can lower the priority of icmp-echo-reply this way so
during icmp flooding your uplink will still work.

This link may help you:
https://www.bufferbloat.net/projects/codel/wiki/Cake/
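
A bare-bones sketch of the shaping side on the Gentoo router -- the
interface name and rate are placeholders (measure the real upload
first), and fq_codel is attached as a queueing discipline, either per
interface with tc or as the default via net.core.default_qdisc:

    # placeholders: WAN interface ppp0, measured upload ~1000 kbit/s -> shape to ~850
    sysctl -w net.core.default_qdisc=fq_codel   # use fq_codel by default (kernel >= 3.12)

    tc qdisc add dev ppp0 root handle 1: htb default 10
    tc class add dev ppp0 parent 1: classid 1:10 htb rate 850kbit ceil 850kbit
    tc qdisc add dev ppp0 parent 1:10 handle 10: fq_codel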

-- 
Regards,
Kai

Replies to list-only preferred.




Re: [gentoo-user] Re: TCP Queuing problem

2016-09-21 Thread Grant
>> >> It looks like the TCP Queuing spike itself was due to imapproxy
>> >> which I've now disabled.  I'll post more info as I gather it.
>> >
>> >
>> > imapproxy was clearly affecting the TCP Queuing graph in munin but I
>> > still ended up with a massive TCP Queuing spike today and
>> > corresponding http response time issues long after I disabled
>> > imapproxy.  Graph attached.  I'm puzzled.
>>
>>
>> I just remembered that our AT&T modem/router does not respond to
>> pings.  My solution is to move PPPoE off of that device and onto my
>> Gentoo router so that pings pass through the AT&T device to the Gentoo
>> router but I haven't done that yet as I want to be on-site for it.
>> Could that behavior somehow be contributing to this problem?  There
>> does seem to be a clear correlation between user activity at that
>> location and the bad server behavior.
>
> If that device behaves badly in router mode by blocking just all icmp
> traffic instead of only icmp-echo-req, this is a good idea. You may
> want to bug AT&T about this problem then. It should really not block
> related icmp traffic.


Hi Kai, yesterday I switched my Gentoo router over to handling PPPoE
and pings seem to be working properly now.  The AT&T device is now
functioning as a modem only and passing everything through.  Today
I'll find out if it helps with TCP Queuing and (supposedly) related
http response slowdowns.

- Grant



[gentoo-user] Re: TCP Queuing problem

2016-09-20 Thread Kai Krakow
On Tue, 20 Sep 2016 06:08:31 -0700, Grant wrote:

>  [...]  
>  [...]  
> >>
> >>
> >> It looks like the TCP Queuing spike itself was due to imapproxy
> >> which I've now disabled.  I'll post more info as I gather it.  
> >
> >
> > imapproxy was clearly affecting the TCP Queuing graph in munin but I
> > still ended up with a massive TCP Queuing spike today and
> > corresponding http response time issues long after I disabled
> > imapproxy.  Graph attached.  I'm puzzled.  
> 
> 
> I just remembered that our AT&T modem/router does not respond to
> pings.  My solution is to move PPPoE off of that device and onto my
> Gentoo router so that pings pass through the AT&T device to the Gentoo
> router but I haven't done that yet as I want to be on-site for it.
> Could that behavior somehow be contributing to this problem?  There
> does seem to be a clear correlation between user activity at that
> location and the bad server behavior.

If that device behaves badly in router mode by blocking just all icmp
traffic instead of only icmp-echo-req, this is a good idea. You may
want to bug AT&T about this problem then. It should really not block
related icmp traffic.
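
On a Linux box the sane version of that filter is roughly this sketch
(it only illustrates the idea; the rest of the INPUT policy is assumed
to be handled elsewhere):

    # drop only echo-request, let the rest of ICMP through
    # (e.g. fragmentation-needed, which path MTU discovery depends on)
    iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
    iptables -A INPUT -p icmp -j ACCEPT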

-- 
Regards,
Kai

Replies to list-only preferred.




Re: [gentoo-user] Re: TCP Queuing problem

2016-09-20 Thread Alarig Le Lay
On Tue Sep 20 12:52:57 2016, Grant wrote:
> The spikes are taking place on my remote server but they seem to
> roughly coincide with user activity within my own network.  My
> technical knowledge of networking internals is weak.  Does anyone know
> which tool will tell me more about the connections that are causing
> the TCP Queuing spikes?
> 
> - Grant

As you know when the peak appears, you can tcpdump during this period.
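
For example, on the server, started when the spike shows up (interface,
ports and file name are placeholders):

    # capture full packets for the web ports while the spike is happening
    tcpdump -i eth0 -nn -s 0 -w spike-$(date +%F-%H%M).pcap 'tcp port 80 or tcp port 443'

The resulting pcap can then be opened in wireshark, as mentioned
elsewhere in the thread.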

-- 
alarig


signature.asc
Description: Digital signature


Re: [gentoo-user] Re: TCP Queuing problem

2016-09-20 Thread Grant
>>>>>>> My web server's response time for http requests skyrockets every
>>>>>>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>>>>>>> the only one that really correlates well with the slowdown is "TCP
>>>>>>> Queuing".  It looks like I normally have about 400 packets per second
>>>>>>> graphed as "direct copy from queue" in munin throughout the day, but 2
>>>>>>> to 3.5 times that many are periodically graphed during work hours.  I
>>>>>>> don't see the same pattern at all from the graph of all traffic on my
>>>>>>> network interface which actually peaks over the weekend.  TCP Queuing
>>>>>>> doesn't rise above 400 packets per second all weekend.  This is
>>>>>>> consistent week after week.
>>>>>>>
>>>>>>> My two employees come into work during the hours in question, and they
>>>>>>> certainly make frequent requests of the web server while at work, but
>>>>>>> if their volume of requests were the cause of the problem then that
>>>>>>> would be reflected in the graph of web server requests but it is not.
>>>>>>> I do run a small MTU on the systems at work due to the config of the
>>>>>>> modem/router we have there.
>>>>>>>
>>>>>>> Is this a recognizable problem to anyone?
>>>>>>
>>>>>> I'm in the midst of this.  Are there certain attacks I should check for?
>>>>>
>>>>> It looks like the TCP Queuing spike itself was due to imapproxy which
>>>>> I've now disabled.  I'll post more info as I gather it.
>>>>
>>>> imapproxy was clearly affecting the TCP Queuing graph in munin but I
>>>> still ended up with a massive TCP Queuing spike today and
>>>> corresponding http response time issues long after I disabled
>>>> imapproxy.  Graph attached.  I'm puzzled.
>>>>
>>>> - Grant
>>>
>>> Things to check for:
>>> Torrent or other distributed downloads.
>>> Download program with multiple download threads
>>
>> There sure shouldn't be anything like that running either on the
>> server or in the office.  Is there a good way to find out? Maybe
>> something that would clearly indicate it?
>>
>>> Maybe another proxy running? Esp. as you saw this also with imapproxy.
>>
>> nginx acts as a reverse proxy to apache2 but that's a pretty common
>> config.  Nothing else that I know of.
>>
>> - Grant
>
> Any way to find out between which hosts/servers those connections are for?
> That might help in locating the cause.
>
> Eg. which of your desktops/laptops inside your network and where they are 
> trying to connect to.


The spikes are taking place on my remote server but they seem to
roughly coincide with user activity within my own network.  My
technical knowledge of networking internals is weak.  Does anyone know
which tool will tell me more about the connections that are causing
the TCP Queuing spikes?

- Grant



Re: [gentoo-user] Re: TCP Queuing problem

2016-09-20 Thread J. Roeleveld
On September 20, 2016 4:53:41 PM GMT+02:00, Grant  wrote:
>>>>>> My web server's response time for http requests skyrockets every
>>>>>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>>>>>> the only one that really correlates well with the slowdown is "TCP
>>>>>> Queuing".  It looks like I normally have about 400 packets per second
>>>>>> graphed as "direct copy from queue" in munin throughout the day, but 2
>>>>>> to 3.5 times that many are periodically graphed during work hours.  I
>>>>>> don't see the same pattern at all from the graph of all traffic on my
>>>>>> network interface which actually peaks over the weekend.  TCP Queuing
>>>>>> doesn't rise above 400 packets per second all weekend.  This is
>>>>>> consistent week after week.
>>>>>>
>>>>>> My two employees come into work during the hours in question, and they
>>>>>> certainly make frequent requests of the web server while at work, but
>>>>>> if their volume of requests were the cause of the problem then that
>>>>>> would be reflected in the graph of web server requests but it is not.
>>>>>> I do run a small MTU on the systems at work due to the config of the
>>>>>> modem/router we have there.
>>>>>>
>>>>>> Is this a recognizable problem to anyone?
>>>>>
>>>>> I'm in the midst of this.  Are there certain attacks I should check for?
>>>>
>>>> It looks like the TCP Queuing spike itself was due to imapproxy which
>>>> I've now disabled.  I'll post more info as I gather it.
>>>
>>> imapproxy was clearly affecting the TCP Queuing graph in munin but I
>>> still ended up with a massive TCP Queuing spike today and
>>> corresponding http response time issues long after I disabled
>>> imapproxy.  Graph attached.  I'm puzzled.
>>>
>>> - Grant
>>
>> Things to check for:
>> Torrent or other distributed downloads.
>> Download program with multiple download threads
>
> There sure shouldn't be anything like that running either on the
> server or in the office.  Is there a good way to find out? Maybe
> something that would clearly indicate it?
>
>> Maybe another proxy running? Esp. as you saw this also with imapproxy.
>
> nginx acts as a reverse proxy to apache2 but that's a pretty common
> config.  Nothing else that I know of.
>
> - Grant

Any way to find out between which hosts/servers those connections are for?
That might help in locating the cause.

Eg. which of your desktops/laptops inside your network and where they are 
trying to connect to.

--
Joost
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [gentoo-user] Re: TCP Queuing problem

2016-09-20 Thread Grant
>>>>> My web server's response time for http requests skyrockets every
>>>>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>>>>> the only one that really correlates well with the slowdown is "TCP
>>>>> Queuing".  It looks like I normally have about 400 packets per second
>>>>> graphed as "direct copy from queue" in munin throughout the day, but 2
>>>>> to 3.5 times that many are periodically graphed during work hours.  I
>>>>> don't see the same pattern at all from the graph of all traffic on my
>>>>> network interface which actually peaks over the weekend.  TCP Queuing
>>>>> doesn't rise above 400 packets per second all weekend.  This is
>>>>> consistent week after week.
>>>>>
>>>>> My two employees come into work during the hours in question, and they
>>>>> certainly make frequent requests of the web server while at work, but
>>>>> if their volume of requests were the cause of the problem then that
>>>>> would be reflected in the graph of web server requests but it is not.
>>>>> I do run a small MTU on the systems at work due to the config of the
>>>>> modem/router we have there.
>>>>>
>>>>> Is this a recognizable problem to anyone?
>>>>
>>>> I'm in the midst of this.  Are there certain attacks I should check for?
>>>
>>> It looks like the TCP Queuing spike itself was due to imapproxy which
>>> I've now disabled.  I'll post more info as I gather it.
>>
>> imapproxy was clearly affecting the TCP Queuing graph in munin but I
>> still ended up with a massive TCP Queuing spike today and
>> corresponding http response time issues long after I disabled
>> imapproxy.  Graph attached.  I'm puzzled.
>>
>> - Grant
>
> Things to check for:
> Torrent or other distributed downloads.
> Download program with multiple download threads


There sure shouldn't be anything like that running either on the
server or in the office.  Is there a good way to find out? Maybe
something that would clearly indicate it?


> Maybe another proxy running? Esp. as you saw this also with imapproxy.


nginx acts as a reverse proxy to apache2 but that's a pretty common
config.  Nothing else that I know of.

- Grant



Re: [gentoo-user] Re: TCP Queuing problem

2016-09-20 Thread J. Roeleveld
On September 20, 2016 2:38:03 AM GMT+02:00, Grant  wrote:
>>>> My web server's response time for http requests skyrockets every
>>>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>>>> the only one that really correlates well with the slowdown is "TCP
>>>> Queuing".  It looks like I normally have about 400 packets per second
>>>> graphed as "direct copy from queue" in munin throughout the day, but 2
>>>> to 3.5 times that many are periodically graphed during work hours.  I
>>>> don't see the same pattern at all from the graph of all traffic on my
>>>> network interface which actually peaks over the weekend.  TCP Queuing
>>>> doesn't rise above 400 packets per second all weekend.  This is
>>>> consistent week after week.
>>>>
>>>> My two employees come into work during the hours in question, and they
>>>> certainly make frequent requests of the web server while at work, but
>>>> if their volume of requests were the cause of the problem then that
>>>> would be reflected in the graph of web server requests but it is not.
>>>> I do run a small MTU on the systems at work due to the config of the
>>>> modem/router we have there.
>>>>
>>>> Is this a recognizable problem to anyone?
>>>
>>> I'm in the midst of this.  Are there certain attacks I should check for?
>>
>> It looks like the TCP Queuing spike itself was due to imapproxy which
>> I've now disabled.  I'll post more info as I gather it.
>
> imapproxy was clearly affecting the TCP Queuing graph in munin but I
> still ended up with a massive TCP Queuing spike today and
> corresponding http response time issues long after I disabled
> imapproxy.  Graph attached.  I'm puzzled.
>
> - Grant

Things to check for:
Torrent or other distributed downloads.
Download program with multiple download threads

Maybe another proxy running? Esp. as you saw this also with imapproxy.

--
Joost
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



[gentoo-user] Re: TCP Queuing problem

2016-09-20 Thread Grant
>>>> My web server's response time for http requests skyrockets every
>>>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>>>> the only one that really correlates well with the slowdown is "TCP
>>>> Queuing".  It looks like I normally have about 400 packets per second
>>>> graphed as "direct copy from queue" in munin throughout the day, but 2
>>>> to 3.5 times that many are periodically graphed during work hours.  I
>>>> don't see the same pattern at all from the graph of all traffic on my
>>>> network interface which actually peaks over the weekend.  TCP Queuing
>>>> doesn't rise above 400 packets per second all weekend.  This is
>>>> consistent week after week.
>>>>
>>>> My two employees come into work during the hours in question, and they
>>>> certainly make frequent requests of the web server while at work, but
>>>> if their volume of requests were the cause of the problem then that
>>>> would be reflected in the graph of web server requests but it is not.
>>>> I do run a small MTU on the systems at work due to the config of the
>>>> modem/router we have there.
>>>>
>>>> Is this a recognizable problem to anyone?
>>>
>>>
>>> I'm in the midst of this.  Are there certain attacks I should check for?
>>
>>
>> It looks like the TCP Queuing spike itself was due to imapproxy which
>> I've now disabled.  I'll post more info as I gather it.
>
>
> imapproxy was clearly affecting the TCP Queuing graph in munin but I
> still ended up with a massive TCP Queuing spike today and
> corresponding http response time issues long after I disabled
> imapproxy.  Graph attached.  I'm puzzled.


I just remembered that our AT&T modem/router does not respond to
pings.  My solution is to move PPPoE off of that device and onto my
Gentoo router so that pings pass through the AT&T device to the Gentoo
router but I haven't done that yet as I want to be on-site for it.
Could that behavior somehow be contributing to this problem?  There
does seem to be a clear correlation between user activity at that
location and the bad server behavior.

- Grant



[gentoo-user] Re: TCP Queuing problem

2016-09-19 Thread Grant
>>> My web server's response time for http requests skyrockets every
>>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>>> the only one that really correlates well with the slowdown is "TCP
>>> Queuing".  It looks like I normally have about 400 packets per second
>>> graphed as "direct copy from queue" in munin throughout the day, but 2
>>> to 3.5 times that many are periodically graphed during work hours.  I
>>> don't see the same pattern at all from the graph of all traffic on my
>>> network interface which actually peaks over the weekend.  TCP Queuing
>>> doesn't rise above 400 packets per second all weekend.  This is
>>> consistent week after week.
>>>
>>> My two employees come into work during the hours in question, and they
>>> certainly make frequent requests of the web server while at work, but
>>> if their volume of requests were the cause of the problem then that
>>> would be reflected in the graph of web server requests but it is not.
>>> I do run a small MTU on the systems at work due to the config of the
>>> modem/router we have there.
>>>
>>> Is this a recognizable problem to anyone?
>>
>>
>> I'm in the midst of this.  Are there certain attacks I should check for?
>
>
> It looks like the TCP Queuing spike itself was due to imapproxy which
> I've now disabled.  I'll post more info as I gather it.


imapproxy was clearly affecting the TCP Queuing graph in munin but I
still ended up with a massive TCP Queuing spike today and
corresponding http response time issues long after I disabled
imapproxy.  Graph attached.  I'm puzzled.

- Grant


[gentoo-user] Re: TCP Queuing problem

2016-09-19 Thread Grant
>> My web server's response time for http requests skyrockets every
>> weekday between about 9am and 5pm.  I've gone over my munin graphs and
>> the only one that really correlates well with the slowdown is "TCP
>> Queuing".  It looks like I normally have about 400 packets per second
>> graphed as "direct copy from queue" in munin throughout the day, but 2
>> to 3.5 times that many are periodically graphed during work hours.  I
>> don't see the same pattern at all from the graph of all traffic on my
>> network interface which actually peaks over the weekend.  TCP Queuing
>> doesn't rise above 400 packets per second all weekend.  This is
>> consistent week after week.
>>
>> My two employees come into work during the hours in question, and they
>> certainly make frequent requests of the web server while at work, but
>> if their volume of requests were the cause of the problem then that
>> would be reflected in the graph of web server requests but it is not.
>> I do run a small MTU on the systems at work due to the config of the
>> modem/router we have there.
>>
>> Is this a recognizable problem to anyone?
>
>
> I'm in the midst of this.  Are there certain attacks I should check for?


It looks like the TCP Queuing spike itself was due to imapproxy which
I've now disabled.  I'll post more info as I gather it.

- Grant



[gentoo-user] Re: TCP Queuing problem

2016-09-19 Thread Grant
> My web server's response time for http requests skyrockets every
> weekday between about 9am and 5pm.  I've gone over my munin graphs and
> the only one that really correlates well with the slowdown is "TCP
> Queuing".  It looks like I normally have about 400 packets per second
> graphed as "direct copy from queue" in munin throughout the day, but 2
> to 3.5 times that many are periodically graphed during work hours.  I
> don't see the same pattern at all from the graph of all traffic on my
> network interface which actually peaks over the weekend.  TCP Queuing
> doesn't rise above 400 packets per second all weekend.  This is
> consistent week after week.
>
> My two employees come into work during the hours in question, and they
> certainly make frequent requests of the web server while at work, but
> if their volume of requests were the cause of the problem then that
> would be reflected in the graph of web server requests but it is not.
> I do run a small MTU on the systems at work due to the config of the
> modem/router we have there.
>
> Is this a recognizable problem to anyone?


I'm in the midst of this.  Are there certain attacks I should check for?

- Grant