RE: Load balancing problem with activation=disabled
I actually achieved my solution with the activation directive, and such a configuration works. The only drawback is that when PROD recovers, all traffic keeps being handled by DR until DR itself goes down, whereas in my case PROD should take back all traffic once it has recovered.

Milos

-----Original Message-----
From: Rainer Jung [mailto:rainer.j...@kippdata.de]
Sent: 18 October 2016 11:33
To: Tomcat Users List
Subject: Re: Load balancing problem with activation=disabled

On 18.10.2016 10:10, Kozak, Milos wrote:
> Hi,
>
> I am debugging a mod_jk load-balancing configuration that has been in use for
> a long time, but with two nodes only. We have now extended it to more nodes
> and are facing a problem.
>
> The original idea was to have one PROD server and one DR server, such that
> all requests are handled by PROD and, if PROD goes down, DR takes over. To
> achieve that we set activation=disabled on the DR worker:
>
> worker.list=jkstatus,lbhierarchy
> worker.jkstatus.type=status
> worker.lbhierarchy.type=lb
> worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2
>
> worker.hierarchy-1.type=ajp13
> worker.hierarchy-1.host=host1
> worker.hierarchy-1.port=8009
> worker.hierarchy-1.socket_timeout=0
> worker.hierarchy-1.socket_keepalive=False
> worker.hierarchy-1.retries=2
> worker.hierarchy-1.connection_pool_timeout=0
> worker.hierarchy-1.lbfactor=1
> worker.hierarchy-1.redirect=hierarchy-2
>
> worker.hierarchy-2.type=ajp13
> worker.hierarchy-2.host=host2
> worker.hierarchy-2.port=8009
> worker.hierarchy-2.socket_timeout=0
> worker.hierarchy-2.socket_keepalive=False
> worker.hierarchy-2.retries=2
> worker.hierarchy-2.connection_pool_timeout=0
> worker.hierarchy-2.lbfactor=1
> worker.hierarchy-2.activation=disabled
>
> However, the current requirement is four servers chained together: one PROD
> and three DR servers, where each DR is activated when the previous worker
> goes down.
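Regarding the failback drawback above: a sketch of lb-level settings that may help traffic move back to PROD once it recovers, assuming it is session stickiness that keeps requests pinned to DR (both parameters are standard mod_jk lb attributes, but whether they solve it here is an assumption, not something confirmed in this thread):

```properties
# Sketch only: assumes sticky sessions are what keep traffic on DR.
# With stickiness off, the lb is free to route new requests back to the
# preferred (non-disabled) member once it leaves the error state.
worker.lbhierarchy.sticky_session=false
# Seconds a failed member stays in error state before mod_jk probes it
# again (default 60); lower values mean PROD is retried sooner.
worker.lbhierarchy.recover_time=60
```

Note that disabling stickiness is only acceptable if the application does not rely on server-local session state (or sessions are replicated).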
> Therefore, we prepared a configuration like this:
>
> worker.list=jkstatus,lbhierarchy,lb2hierarchy
> worker.jkstatus.type=status
> worker.lbhierarchy.type=lb
> worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2,hierarchy-3,hierarchy-4
>
> worker.hierarchy-1.type=ajp13
> worker.hierarchy-1.host=host
> worker.hierarchy-1.port=8009
> worker.hierarchy-1.socket_timeout=0
> worker.hierarchy-1.socket_keepalive=False
> worker.hierarchy-1.retries=2
> worker.hierarchy-1.connection_pool_timeout=0
> worker.hierarchy-1.lbfactor=1
> worker.hierarchy-1.redirect=hierarchy-2
>
> worker.hierarchy-2.type=ajp13
> worker.hierarchy-2.host=host
> worker.hierarchy-2.port=8010
> worker.hierarchy-2.socket_timeout=0
> worker.hierarchy-2.socket_keepalive=False
> worker.hierarchy-2.retries=2
> worker.hierarchy-2.connection_pool_timeout=0
> worker.hierarchy-2.lbfactor=1
> worker.hierarchy-2.activation=disabled
> worker.hierarchy-2.redirect=hierarchy-3
>
> worker.hierarchy-3.type=ajp13
> worker.hierarchy-3.host=host
> worker.hierarchy-3.port=8011
> worker.hierarchy-3.socket_timeout=0
> worker.hierarchy-3.socket_keepalive=False
> worker.hierarchy-3.retries=2
> worker.hierarchy-3.connection_pool_timeout=0
> worker.hierarchy-3.lbfactor=1
> worker.hierarchy-3.activation=disabled
> worker.hierarchy-3.redirect=hierarchy-4
>
> worker.hierarchy-4.type=ajp13
> worker.hierarchy-4.host=host12
> worker.hierarchy-4.port=10603
> worker.hierarchy-4.socket_timeout=0
> worker.hierarchy-4.socket_keepalive=False
> worker.hierarchy-4.retries=2
> worker.hierarchy-4.connection_pool_timeout=0
> worker.hierarchy-4.lbfactor=1
> worker.hierarchy-4.activation=disabled
> worker.hierarchy-4.redirect=hierarchy-1
>
> Initially, three servers are disabled and a redirect is specified.
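An alternative to chaining redirect/disabled workers, not discussed in the thread itself, is mod_jk's per-member distance attribute: the lb only considers members with a higher distance when all members with a lower distance are in error. A sketch of the same four-node preference order expressed that way (worker names taken from the configuration above):

```properties
# Sketch: cascading failover via distance instead of a redirect chain.
# All members stay active; mod_jk always prefers the lowest-distance
# member that is not in error state, so hierarchy-4 is used only when
# hierarchy-1 through hierarchy-3 are all down.
worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2,hierarchy-3,hierarchy-4
worker.hierarchy-1.distance=0
worker.hierarchy-2.distance=1
worker.hierarchy-3.distance=2
worker.hierarchy-4.distance=3
# With distance in place, activation=disabled and the redirect chain
# on the DR members can be dropped.
```

Failback also follows naturally: once a lower-distance member recovers, new (non-sticky) requests prefer it again.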
> The problem occurs when the hierarchy-3 worker goes down: hierarchy-4 never
> gets activated, and the mod_jk log says:
>
> All tomcat instances failed, no more workers left
>
> However, the workers list is:
> worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2,hierarchy-3,hierarchy-4
>
> which means we have four workers, so there should still be workers left...
> Here is the log from a test where I took the workers down one by one:
>
> [Tue Oct 18 09:51:30.623 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-1) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] jk_open_socket::jk_connect.c (817): connect to HOST:8009 failed (errno=111)
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1068): (hierarchy-1) Failed opening socket to (HOST:8009) (errno=111)
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [error] ajp_send_request::jk_ajp_common.c (1728): (hierarchy-1) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=111)
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-1) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [error] ajp_service::jk_ajp_common.c (2794): (hierarchy-1) connecting to tomcat failed (rc=-3, errors=5, client_errors=0).
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1595): service failed, worker hierarchy-1 is in error state
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] jk_open_socket::jk_connect.c (817): connect to HOST:8010 failed (errno=111)
> [Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1068):
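When debugging member states like this, the jkstatus worker already declared in worker.list can show each member's activation and error state live, which is easier than reading the log. A sketch of the Apache httpd side, assuming mod_jk's JkMount directive and a URL path of my own choosing (the /jkstatus path is an assumption, not from the thread):

```apacheconf
# Hypothetical mount for the jkstatus worker already declared in
# worker.list; pick any unused URL and restrict access appropriately.
<Location /jkstatus>
    JkMount jkstatus
    Require ip 127.0.0.1
</Location>
```

Requesting that URL during the failover test would show, per member, whether mod_jk considers it active, disabled, or in error at the moment hierarchy-3 is taken down.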