Gang,

I have a somewhat heavily loaded Tomcat application that uses a fairly standard Apache 1.3.33 / mod_jk 1.2.15 / Tomcat 5.0.28 setup. I am doing load balancing in mod_jk, going to 2 RedHat 2.6.9 boxes that run the Tomcats. The load is split evenly across the two Tomcat boxes. The site processes anywhere between 400,000 and 700,000 JSP pages per day. I've included the configuration files at the bottom of this message.

Usually, the site runs great. No problems whatsoever (well, my code occasionally crashes, but that's to be expected). But once in a while, my mod_jk seems to suddenly start to "lose" connections to my Tomcats. mod_jk.log shows this:

[Tue Apr 04 18:23:35 2006] [info] ajp_process_callback::jk_ajp_common.c (1384): Connection aborted or network problems
[Tue Apr 04 18:23:35 2006] [info] ajp_service::jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Tue Apr 04 18:23:35 2006] [info] service::jk_lb_worker.c (711): unrecoverable error 400, request failed. Client failed in the middle of request, we can't recover to another instance.
[Tue Apr 04 18:23:35 2006] [info] jk_handler::mod_jk.c (1841): Aborting connection for worker=loadbalancer
[Tue Apr 04 18:23:35 2006] [info] ajp_process_callback::jk_ajp_common.c (1384): Connection aborted or network problems
[Tue Apr 04 18:23:35 2006] [info] ajp_service::jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Tue Apr 04 18:23:35 2006] [info] service::jk_lb_worker.c (711): unrecoverable error 400, request failed. Client failed in the middle of request, we can't recover to another instance.
[Tue Apr 04 18:23:35 2006] [info] jk_handler::mod_jk.c (1841): Aborting connection for worker=loadbalancer
[Tue Apr 04 18:23:35 2006] [info] ajp_process_callback::jk_ajp_common.c (1384): Connection aborted or network problems
[Tue Apr 04 18:23:35 2006] [info] ajp_service::jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Tue Apr 04 18:23:35 2006] [info] service::jk_lb_worker.c (711): unrecoverable error 400, request failed. Client failed in the middle of request, we can't recover to another instance.
[Tue Apr 04 18:23:35 2006] [info] jk_handler::mod_jk.c (1841): Aborting connection for worker=loadbalancer
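To see how quickly these aborts pile up, a rough Python sketch like the one below could count the "Aborting connection" entries per timestamp. It only assumes the entry layout visible in the excerpt above; the function name `aborts_per_second` is made up for illustration, and I have not run it against a full mod_jk.log.

```python
import re
from collections import Counter

# One "Aborting connection" entry as it appears in the excerpt:
# "[timestamp] [level] jk_handler::mod_jk.c (line): Aborting connection ..."
ABORT = re.compile(
    r"\[([^\]]+)\] \[\w+\] jk_handler::mod_jk\.c \(\d+\): Aborting connection"
)

def aborts_per_second(log_text):
    """Return a Counter mapping each timestamp string to its abort count.

    Works on the whole log text, so it does not matter whether entries
    sit on separate lines or run together on one line.
    """
    return Counter(ABORT.findall(log_text))
```

Feeding it the contents of mod_jk.log and printing `aborts_per_second(text).most_common(10)` would show the seconds with the most aborted requests, which may help line the bursts up against the Tomcat logs.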


When this happens, my Apache scoreboard fills up with threads in the "W" state. All new requests seem to get stuck in this state. Quickly, Apache fills up and new requests are locked out. If I look at my Tomcat logs, I see that processing has slowed and stopped. One interesting thing is that if I kill and restart Apache, the Tomcat logs go crazy with output, as if killing Apache somehow "unsticks" them.

Is something running out of sockets? If I had to guess, I'd say that it feels like mod_jk can't receive the data back from the Tomcats, but that's just a hunch. I've seen the Apache mod_jk config page, which seems to have a bunch of different parameters for dealing with "stuck" Tomcats, but I'm unsure which one to use, because I don't really know what's happening or what the problem is. I've found other posts on the web that list similar problems, but haven't seen any "Oh, this is the problem and this is the simple solution". I haven't yet dug in with netstat to see what's going on network-wise; I'm hoping there's some magic bullet configuration parameter that will help. C'moooonnnnn magic bullet!!
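For the netstat check, something like the following Python sketch could tally TCP connection states toward the AJP port the next time it happens. It assumes the usual Linux `netstat -ant` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and is meant to be run on the Apache box; the function name `tally_ajp_states` is made up for illustration, and port 8089 comes from the configs below.

```python
from collections import Counter

def tally_ajp_states(netstat_text, ajp_port=8089):
    """Count TCP connections to the AJP port, grouped by (remote address, state).

    Expects the text output of `netstat -ant` on Linux (an assumption):
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    suffix = ":%d" % ajp_port
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a complete TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if foreign.endswith(suffix):
            counts[(foreign, state)] += 1
    return counts
```

One way to use it would be to capture `netstat -ant` into a file during a hang and print the resulting counts. A large pile of connections in one state toward a single Tomcat would at least hint at where things are stuck.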

Thanks for any insight you can provide!


/kurt




Relevant configs:

My Apache is set to 1024 simultaneous connections and the JkMounts are configured properly. I have my maximum number of file descriptors (sockets) set to 65535 on all machines.

Tomcat AJP connector configs on machines tc1 and tc2

<Connector port="8089" protocol="AJP/1.3" maxThreads="400" maxProcessors="0" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8593"/>



workers.properties config

worker.list=tc1-w1,tc2-w1,status,loadbalancer
# tomcat 1 on host tc1
worker.tc1-w1.host=192.168.1.254
worker.tc1-w1.port=8089
worker.tc1-w1.type=ajp13
worker.tc1-w1.lbfactor=8

# tomcat 1 on host tc2
worker.tc2-w1.host=192.168.1.247
worker.tc2-w1.port=8089
worker.tc2-w1.type=ajp13
worker.tc2-w1.lbfactor=8

# status and loadbalancer workers
worker.status.type=status
worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tc1-w1,tc2-w1





