Re: What causes "client errors" with mod_jk

2022-05-27 Thread Rainer Jung

Hi Chris,

Am 27.05.2022 um 01:49 schrieb Christopher Schultz:

On 5/26/22 17:25, Rainer Jung wrote:

Am 26.05.2022 um 21:49 schrieb Christopher Schultz:


Would you prefer to use mod_unique_id + unique-id-logging in mod_jk over 
just adding more request-level information to the mod_jk.log? I'm kind 
of okay either way, but for my current purposes it seems more convenient 
to have all relevant information in a single place (the mod_jk.log file).


I understand, but more useful info might pile up (URI, client IP), so I 
prefer having a working correlation ID to the other major sources of 
info, Apache access log and Apache error log. Also: the current idea of 
mod_jk when a request starts is not exactly when it started in Apache, 
but when the JK service method started. So we would also need to add the 
web server start time.


For now I used the request id and switched a lot (!) of log calls to 
using the new jk_request_log instead of jk_log. So you get the id in the 
next release for many of the lines mod_jk is able to log in a request 
context. Some very low level log lines will not have access to the 
request id, but on ase of error they should always be accompanied by a 
higher level logging of the error including the request id.


We'll see how it goes. Feel free to try it, I committed everything to 
the git repos.


I'm okay adding either or both of these "features" to the JK portion of 
the code. If we are considering "enhancing" this kind of logging in the 
JK portion, I would recommend that we add request_start_time to the 
jk_ws_service; I don't see a good way to determine the nature of the 
host web server from within the JK code and it's better-done from the 
outside-in rather than the inside-out.


Unrelated: I believe the segfaults I'm seeing have to do with me simply 
updating the .so file on the disk before restarting the httpd process. 
As soon as I copy the .so file over the existing module binary, I start 
getting a string of segfaults in the log file. When I don't try to 
"hot-update" the module binary, I don't see any of that happening. (I 
also don't see any possible segfaults in my code at this point, eitaher.) 
I have httpd set up to dump cores but I think my file permissions are 
wrong so I'm not actually getting anything in there (yet).


Best regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-05-26 Thread Christopher Schultz

Rainer,

On 5/26/22 17:25, Rainer Jung wrote:

Hi Chris,

Am 26.05.2022 um 21:49 schrieb Christopher Schultz:

On 5/16/22 13:48, Christopher Schultz wrote:


I see the place in the code where the error is generated, but I'm not 
familiar enough with the code to know how to add that kind of thing. 
The function in question (ajp_process_callback) has a pointer to a 
jk_ws_service_t structure. Is it as simple as also logging like this?


   /* convert start-time to a string */
   char[32] timestamp;
   apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", 
r->r->request_time);

   /* emit a detailed log message */
   jk_log(l, JK_LOG_INFO,
  "(%s) Request to (%s %s) started at %s,%ld",
  ae->worker->name, r->method, r->req_uri, timestamp, 
r->r->request_time.tm_usec);


Does anyone think this might be generally useful?


It looks like I needed a few more things in order to get this to work:

   {
 char *timestamp = malloc(32);
 apr_time_exp_t *timerec = malloc(sizeof(apr_time_exp_t));
 apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
 apr_time_exp_gmt(timerec, ap->r->request_time);
 apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", timerec);
 /* emit a detailed log message */
 jk_log(l, JK_LOG_INFO,
    "(%s) Request to (%s %s) started at %s",
    ae->worker->name, r->method, r->req_uri, timestamp);
 free(timerec);
 free(timestamp);
   }

The compiler wouldn't let me use an automatically-allocated char[32] 
so I had to malloc it. I also had to convert from long to 
apr_time_exp_t which required another structure/malloc as well.


Finally, I had to cast the r->ws_private to apache_private_data_t* 
which required me to copy/paste the definition of 
apache_private_data_t from mod_jk.h into jk_ajp_common.c -- just as a 
test for now.


Obviously, this works only for Apache httpd.

I have it running on one of my web servers for now -- hoping to catch 
an error and see this additional logging to see if it's helpful to me. 
(LOL just segfaulted; time to go back and look.)


What would be the best way to get the "start time" of the request in a 
platform-independent way? Maybe just expand the jk_ws_service 
structure to include it?


Apart from the info chuck send via PM, I think it would be better to try 
to add a unique request ID as a correlation ID. Apache can already 
generate them using mod_unique_id and you can add them there to the 
access log.


Would you prefer to use mod_unique_id + unique-id-logging in mod_jk over 
just adding more request-level information to the mod_jk.log? I'm kind 
of okay either way, but for my current purposes it seems more convenient 
to have all relevant information in a single place (the mod_jk.log file).



Now how could we make that ID accessible from mod_jk?

We could either add it as a new item to jk_ws_service and I think it 
would be a good fit. Any server not yet providing it, eg. if we find no 
way adding it in IIS, would have a null value there (or some constant we 
init it to). We are free to add things, because we do not really provide 
a public API used elsewhere which we need to keep stable.


When we do logging in mod_jk, we use jk_log(jk_logger_t *l, ...) where 
the remaining arguments are just standard log arguments. We could add a 
new jk_request_log(jk_logger_t *l, jk_ws_service_t *s, ...) and that 
function retrieves the request ID from s and adds it to the log line.


It seems, typically where we want to log something that would benefit 
from a request ID, we have the jk_ws_service_t at hand, so could add it 
to the log call.


WDYT?


I'm okay adding either or both of these "features" to the JK portion of 
the code. If we are considering "enhancing" this kind of logging in the 
JK portion, I would recommend that we add request_start_time to the 
jk_ws_service; I don't see a good way to determine the nature of the 
host web server from within the JK code and it's better-done from the 
outside-in rather than the inside-out.


Unrelated: I believe the segfaults I'm seeing have to do with me simply 
updating the .so file on the disk before restarting the httpd process. 
As soon as I copy the .so file over the existing module binary, I start 
getting a string of segfaults in the log file. When I don't try to 
"hot-update" the module binary, I don't see any of that happening. (I 
also don't see any possible segfaults in my code at this point, either.) 
I have httpd set up to dump cores but I think my file permissions are 
wrong so I'm not actually getting anything in there (yet).


-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-05-26 Thread Rainer Jung

Hi Chris,

Am 26.05.2022 um 21:49 schrieb Christopher Schultz:

On 5/16/22 13:48, Christopher Schultz wrote:


I see the place in the code where the error is generated, but I'm not 
familiar enough with the code to know how to add that kind of thing. 
The function in question (ajp_process_callback) has a pointer to a 
jk_ws_service_t structure. Is it as simple as also logging like this?


   /* convert start-time to a string */
   char[32] timestamp;
   apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", 
r->r->request_time);

   /* emit a detailed log message */
   jk_log(l, JK_LOG_INFO,
  "(%s) Request to (%s %s) started at %s,%ld",
  ae->worker->name, r->method, r->req_uri, timestamp, 
r->r->request_time.tm_usec);


Does anyone think this might be generally useful?


It looks like I needed a few more things in order to get this to work:

   {
     char *timestamp = malloc(32);
     apr_time_exp_t *timerec = malloc(sizeof(apr_time_exp_t));
     apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
     apr_time_exp_gmt(timerec, ap->r->request_time);
     apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", timerec);
     /* emit a detailed log message */
     jk_log(l, JK_LOG_INFO,
    "(%s) Request to (%s %s) started at %s",
    ae->worker->name, r->method, r->req_uri, timestamp);
     free(timerec);
     free(timestamp);
   }

The compiler wouldn't let me use an automatically-allocated char[32] so 
I had to malloc it. I also had to convert from long to apr_time_exp_t 
which required another structure/malloc as well.


Finally, I had to cast the r->ws_private to apache_private_data_t* which 
required me to copy/paste the definition of apache_private_data_t from 
mod_jk.h into jk_ajp_common.c -- just as a test for now.


Obviously, this works only for Apache httpd.

I have it running on one of my web servers for now -- hoping to catch an 
error and see this additional logging to see if it's helpful to me. (LOL 
just segfaulted; time to go back and look.)


What would be the best way to get the "start time" of the request in a 
platform-independent way? Maybe just expand the jk_ws_service structure 
to include it?


Apart from the info chuck send via PM, I think it would be better to try 
to add a unique request ID as a correlation ID. Apache can already 
generate them using mod_unique_id and you can add them there to the 
access log.


Now how could we make that ID accessible from mod_jk?

We could either add it as a new item to jk_ws_service and I think it 
would be a good fit. Any server not yet providing it, eg. if we find no 
way adding it in IIS, would have a null value there (or some constant we 
init it to). We are free to add things, because we do not really provide 
a public API used elsewhere which we need to keep stable.


When we do logging in mod_jk, we use jk_log(jk_logger_t *l, ...) where 
the remaining arguments are just standard log arguments. We could add a 
new jk_request_log(jk_logger_t *l, jk_ws_service_t *s, ...) and that 
function retrieves the request ID from s and adds it to the log line.


It seems, typically where we want to log something that would benefit 
from a request ID, we have the jk_ws_service_t at hand, so could add it 
to the log call.


WDYT?

Best regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-05-26 Thread Christopher Schultz

Rainer,

On 5/26/22 16:46, Rainer Jung wrote:

Hi Chris,

Am 16.05.2022 um 19:48 schrieb Christopher Schultz:

I've been looking into this a little more in my production environment.

These errors are not super common, but there seems to be a steady 
trickle of errors from my two services that have human users. I see 0 
errors for my API-based services, which makes me think that I don't 
have any performance issues... I probably have human users 
disappearing for random reasons.


Could be unstable (mobile) client connections. Or people already 
clicking on the next frontend action before they received the complete 
response. But that is speculating. So it is right, you try to identify 
some individual reasons to understand more.



The errors in mod_jk.log look like this:

[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info] 
ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to 
client aborted or client network problems
[Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info] 
ajp_service::jk_ajp_common.c (2773): (myworker) sending request to 
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1595): service failed, worker myworker is in 
local error state
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1614): unrecoverable error 200, request 
failed. Client failed in the middle of request, we can't recover to 
another instance.
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker


(Note that the last message "Aborting connection for worker=myworker" 
may have a bug; my actual worker name has a name containing a hyphen 
(-) and only the text before the hyphen is being emitted in that error 
message.)


Strange, never observed that, but maybe never used a hyphen. Docs say, 
hypens are allowed. Would be interesting to do a server startup with 
trace-Logging and see where things corresponding to the name start to go 
wrong. But of course not related to sporadic client failures.


Right. I may investigate that separately, as I have a setup already with 
everything in place.


Anyway, when researching these errors, it would be helpful to me to 
know which requests are failing. By looking at the access_log, I only 
see a single request logged for 04:19:15 on that server so it's 
probably the right one, but httpd says that the response code is 302 
instead of e.g. 50x for an error.


What I typically do:

- log "%P:%{tid}P" in your Apache httpd custom LogFormat used for the 
access log.


- make sure, I log in in the Apache httpd access log the request 
timestamp including milli or microseconds (not default but 
configurable). Can be done by using the %{format}t syntax in the 
LogFormat and adding "usec_frac" to the format.


- adding %D to the access log format (duration in microseconds)

- remember that Apache logs start of request as default time stamp, but 
mod_jk logs at the moment of error, so later than start of request.


Finding the right access log line for a mod_jk error log line now means:

- filter the access log according to the PID:TID logged in the mod_jk 
error log. In your case 5859:139625025315392. We know, that the requests 
handled by one thread in one process are run strictly sequentially.


- look for the last request in this filtered list, that by access log 
line timestamp started before (or unlikely exactly at) the point in time 
given by the mod_jk access log. If you find one exactly add, it might be 
also the one directly before.


- look at the request durations of these one or two requests to double 
check, whether the times fit.


If you can spare the additional log line bytes, you can additionally log 
the end of request timestamp in the apache access log (prefix "format" 
by "begin:").


Especially by adding this "enhanced logging", it was very easy to find 
the failing requests.


Fortunately for me, the JK log is "now" and the request_time is the 
start-of-request, so I can see the delay between the two. In the cases 
I've seen since I started watching the log, it's typically a very short 
tie like 1-2 sec which shouldn't be something a user gets tired waiting for.


I was more worried like "mod_jk waited 35 seconds for a response from 
upstream and the user went away" and that's not the case.


So I'm happy to find that the reason for these errors is pathological 
user behavior and not some performance problem on my end.


Thanks,
-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-05-26 Thread Christopher Schultz

All,

On 5/26/22 15:49, Christopher Schultz wrote:

Rainer,

On 5/16/22 13:48, Christopher Schultz wrote:

Rainer,

I've been looking into this a little more in my production environment.

These errors are not super common, but there seems to be a steady 
trickle of errors from my two services that have human users. I see 0 
errors for my API-based services, which makes me think that I don't 
have any performance issues... I probably have human users 
disappearing for random reasons.


The errors in mod_jk.log look like this:

[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info] 
ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to 
client aborted or client network problems
[Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info] 
ajp_service::jk_ajp_common.c (2773): (myworker) sending request to 
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1595): service failed, worker myworker is in 
local error state
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1614): unrecoverable error 200, request 
failed. Client failed in the middle of request, we can't recover to 
another instance.
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker


(Note that the last message "Aborting connection for worker=myworker" 
may have a bug; my actual worker name has a name containing a hyphen 
(-) and only the text before the hyphen is being emitted in that error 
message.)


Anyway, when researching these errors, it would be helpful to me to 
know which requests are failing. By looking at the access_log, I only 
see a single request logged for 04:19:15 on that server so it's 
probably the right one, but httpd says that the response code is 302 
instead of e.g. 50x for an error.


When we log these kinds of errors, it would be great to know a few 
things IMO:


1. What was the URL of the request
2. How long did the client wait for the response before we found we 
couldn't write to the stream (or, conversely, the start-timestamp of 
the request as well as the timestamp of the error, which I think is 
already in the log itself)


I see the place in the code where the error is generated, but I'm not 
familiar enough with the code to know how to add that kind of thing. 
The function in question (ajp_process_callback) has a pointer to a 
jk_ws_service_t structure. Is it as simple as also logging like this?


   /* convert start-time to a string */
   char[32] timestamp;
   apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", 
r->r->request_time);

   /* emit a detailed log message */
   jk_log(l, JK_LOG_INFO,
  "(%s) Request to (%s %s) started at %s,%ld",
  ae->worker->name, r->method, r->req_uri, timestamp, 
r->r->request_time.tm_usec);


Does anyone think this might be generally useful?


It looks like I needed a few more things in order to get this to work:

   {
     char *timestamp = malloc(32);
     apr_time_exp_t *timerec = malloc(sizeof(apr_time_exp_t));
     apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
     apr_time_exp_gmt(timerec, ap->r->request_time);
     apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", timerec);
     /* emit a detailed log message */
     jk_log(l, JK_LOG_INFO,
    "(%s) Request to (%s %s) started at %s",
    ae->worker->name, r->method, r->req_uri, timestamp);
     free(timerec);
     free(timestamp);
   }

The compiler wouldn't let me use an automatically-allocated char[32] so 
I had to malloc it. I also had to convert from long to apr_time_exp_t 
which required another structure/malloc as well.


Finally, I had to cast the r->ws_private to apache_private_data_t* which 
required me to copy/paste the definition of apache_private_data_t from 
mod_jk.h into jk_ajp_common.c -- just as a test for now.


Obviously, this works only for Apache httpd.

I have it running on one of my web servers for now -- hoping to catch an 
error and see this additional logging to see if it's helpful to me. (LOL 
just segfaulted; time to go back and look.)


What would be the best way to get the "start time" of the request in a 
platform-independent way? Maybe just expand the jk_ws_service structure 
to include it?


Here is my latest code which hasn't segfaulted in a while:

{
  /* emit a detailed log message */
  char timestamp[32]; // For formatted timestamp string
  apr_time_exp_t timerec; // Required for apr_strftime
  apr_size_t retsize; // Required for apr_strftime
  apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;

  apr_time_exp_gmt(, ap->r->request_time); // Convert 
request_time to apr_time_exp_t
  apr_strftime(timestamp, , 32, "%Y-%m-%d %H:%M:%S", ); 
// Format the timestamp

  if(0 == retsize) {
strcpy(timestamp, "(overflow)");
  }

  jk_log(l, JK_LOG_INFO,
 

Re: What causes "client errors" with mod_jk

2022-05-26 Thread Rainer Jung

Hi Chris,

Am 16.05.2022 um 19:48 schrieb Christopher Schultz:

I've been looking into this a little more in my production environment.

These errors are not super common, but there seems to be a steady 
trickle of errors from my two services that have human users. I see 0 
errors for my API-based services, which makes me think that I don't have 
any performance issues... I probably have human users disappearing for 
random reasons.


Could be unstable (mobile) client connections. Or people already 
clicking on the next frontend action before they received the complete 
response. But that is speculating. So it is right, you try to identify 
some individual reasons to understand more.



The errors in mod_jk.log look like this:

[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info] 
ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to 
client aborted or client network problems
[Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info] 
ajp_service::jk_ajp_common.c (2773): (myworker) sending request to 
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1595): service failed, worker myworker is in 
local error state
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1614): unrecoverable error 200, request failed. 
Client failed in the middle of request, we can't recover to another 
instance.
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker


(Note that the last message "Aborting connection for worker=myworker" 
may have a bug; my actual worker name has a name containing a hyphen (-) 
and only the text before the hyphen is being emitted in that error 
message.)


Strange, never observed that, but maybe never used a hyphen. Docs say, 
hypens are allowed. Would be interesting to do a server startup with 
trace-Logging and see where things corresponding to the name start to go 
wrong. But of course not related to sporadic client failures.


Anyway, when researching these errors, it would be helpful to me to know 
which requests are failing. By looking at the access_log, I only see a 
single request logged for 04:19:15 on that server so it's probably the 
right one, but httpd says that the response code is 302 instead of e.g. 
50x for an error.


What I typically do:

- log "%P:%{tid}P" in your Apache httpd custom LogFormat used for the 
access log.


- make sure, I log in in the Apache httpd access log the request 
timestamp including milli or microseconds (not default but 
configurable). Can be done by using the %{format}t syntax in the 
LogFormat and adding "usec_frac" to the format.


- adding %D to the access log format (duration in microseconds)

- remember that Apache logs start of request as default time stamp, but 
mod_jk logs at the moment of error, so later than start of request.


Finding the right access log line for a mod_jk error log line now means:

- filter the access log according to the PID:TID logged in the mod_jk 
error log. In your case 5859:139625025315392. We know, that the requests 
handled by one thread in one process are run strictly sequentially.


- look for the last request in this filtered list, that by access log 
line timestamp started before (or unlikely exactly at) the point in time 
given by the mod_jk access log. If you find one exactly add, it might be 
also the one directly before.


- look at the request durations of these one or two requests to double 
check, whether the times fit.


If you can spare the additional log line bytes, you can additionally log 
the end of request timestamp in the apache access log (prefix "format" 
by "begin:").


When we log these kinds of errors, it would be great to know a few 
things IMO:


1. What was the URL of the request
2. How long did the client wait for the response before we found we 
couldn't write to the stream (or, conversely, the start-timestamp of the 
request as well as the timestamp of the error, which I think is already 
in the log itself)


I see the place in the code where the error is generated, but I'm not 
familiar enough with the code to know how to add that kind of thing. The 
function in question (ajp_process_callback) has a pointer to a 
jk_ws_service_t structure. Is it as simple as also logging like this?


   /* convert start-time to a string */
   char[32] timestamp;
   apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", 
r->r->request_time);

   /* emit a detailed log message */
   jk_log(l, JK_LOG_INFO,
  "(%s) Request to (%s %s) started at %s,%ld",
  ae->worker->name, r->method, r->req_uri, timestamp, 
r->r->request_time.tm_usec);


Does anyone think this might be generally useful?


I'll have a look at your other mail on this next.

Best regards,

Rainer


Thanks,
-chris

On 3/25/22 08:37, Christopher Schultz wrote:

Rainer,

On 3/24/22 

Re: What causes "client errors" with mod_jk

2022-05-26 Thread Christopher Schultz

Rainer,

On 5/16/22 13:48, Christopher Schultz wrote:

Rainer,

I've been looking into this a little more in my production environment.

These errors are not super common, but there seems to be a steady 
trickle of errors from my two services that have human users. I see 0 
errors for my API-based services, which makes me think that I don't have 
any performance issues... I probably have human users disappearing for 
random reasons.


The errors in mod_jk.log look like this:

[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info] 
ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to 
client aborted or client network problems
[Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info] 
ajp_service::jk_ajp_common.c (2773): (myworker) sending request to 
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1595): service failed, worker myworker is in 
local error state
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1614): unrecoverable error 200, request failed. 
Client failed in the middle of request, we can't recover to another 
instance.
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker


(Note that the last message "Aborting connection for worker=myworker" 
may have a bug; my actual worker name has a name containing a hyphen (-) 
and only the text before the hyphen is being emitted in that error 
message.)


Anyway, when researching these errors, it would be helpful to me to know 
which requests are failing. By looking at the access_log, I only see a 
single request logged for 04:19:15 on that server so it's probably the 
right one, but httpd says that the response code is 302 instead of e.g. 
50x for an error.


When we log these kinds of errors, it would be great to know a few 
things IMO:


1. What was the URL of the request
2. How long did the client wait for the response before we found we 
couldn't write to the stream (or, conversely, the start-timestamp of the 
request as well as the timestamp of the error, which I think is already 
in the log itself)


I see the place in the code where the error is generated, but I'm not 
familiar enough with the code to know how to add that kind of thing. The 
function in question (ajp_process_callback) has a pointer to a 
jk_ws_service_t structure. Is it as simple as also logging like this?


   /* convert start-time to a string */
   char[32] timestamp;
   apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", 
r->r->request_time);

   /* emit a detailed log message */
   jk_log(l, JK_LOG_INFO,
  "(%s) Request to (%s %s) started at %s,%ld",
  ae->worker->name, r->method, r->req_uri, timestamp, 
r->r->request_time.tm_usec);


Does anyone think this might be generally useful?


It looks like I needed a few more things in order to get this to work:

  {
char *timestamp = malloc(32);
apr_time_exp_t *timerec = malloc(sizeof(apr_time_exp_t));
apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
apr_time_exp_gmt(timerec, ap->r->request_time);
apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", timerec);
/* emit a detailed log message */
jk_log(l, JK_LOG_INFO,
   "(%s) Request to (%s %s) started at %s",
   ae->worker->name, r->method, r->req_uri, timestamp);
free(timerec);
free(timestamp);
  }

The compiler wouldn't let me use an automatically-allocated char[32] so 
I had to malloc it. I also had to convert from long to apr_time_exp_t 
which required another structure/malloc as well.


Finally, I had to cast the r->ws_private to apache_private_data_t* which 
required me to copy/paste the definition of apache_private_data_t from 
mod_jk.h into jk_ajp_common.c -- just as a test for now.


Obviously, this works only for Apache httpd.

I have it running on one of my web servers for now -- hoping to catch an 
error and see this additional logging to see if it's helpful to me. (LOL 
just segfaulted; time to go back and look.)


What would be the best way to get the "start time" of the request in a 
platform-independent way? Maybe just expand the jk_ws_service structure 
to include it?


-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-05-16 Thread Christopher Schultz

Rainer,

I've been looking into this a little more in my production environment.

These errors are not super common, but there seems to be a steady 
trickle of errors from my two services that have human users. I see 0 
errors for my API-based services, which makes me think that I don't have 
any performance issues... I probably have human users disappearing for 
random reasons.


The errors in mod_jk.log look like this:

[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info] 
ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to 
client aborted or client network problems
[Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info] 
ajp_service::jk_ajp_common.c (2773): (myworker) sending request to 
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1595): service failed, worker myworker is in 
local error state
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
service::jk_lb_worker.c (1614): unrecoverable error 200, request failed. 
Client failed in the middle of request, we can't recover to another 
instance.
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] 
jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker


(Note that the last message "Aborting connection for worker=myworker" 
may have a bug; my actual worker name has a name containing a hyphen (-) 
and only the text before the hyphen is being emitted in that error message.)


Anyway, when researching these errors, it would be helpful to me to know 
which requests are failing. By looking at the access_log, I only see a 
single request logged for 04:19:15 on that server so it's probably the 
right one, but httpd says that the response code is 302 instead of e.g. 
50x for an error.


When we log these kinds of errors, it would be great to know a few 
things IMO:


1. What was the URL of the request
2. How long did the client wait for the response before we found we 
couldn't write to the stream (or, conversely, the start-timestamp of the 
request as well as the timestamp of the error, which I think is already 
in the log itself)


I see the place in the code where the error is generated, but I'm not 
familiar enough with the code to know how to add that kind of thing. The 
function in question (ajp_process_callback) has a pointer to a 
jk_ws_service_t structure. Is it as simple as also logging like this?


  /* convert start-time to a string */
  char[32] timestamp;
  apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", 
r->r->request_time);

  /* emit a detailed log message */
  jk_log(l, JK_LOG_INFO,
 "(%s) Request to (%s %s) started at %s,%ld",
 ae->worker->name, r->method, r->req_uri, timestamp, 
r->r->request_time.tm_usec);


Does anyone think this might be generally useful?

Thanks,
-chris

On 3/25/22 08:37, Christopher Schultz wrote:

Rainer,

On 3/24/22 05:50, Rainer Jung wrote:

Hi Chris,

client errors in jk log are always errors occurring when mod_jk tries 
to write back what it got from the backend using web server APIs to 
the client of the web server (user, browser etc.). So they point to a 
problem between and including the web server and something in front of 
it.


Especially during performance problems, client errors are expected as 
a consequence, because whenever people try to reload, the browser 
closes the original connection and sending back response data via this 
connection later fails.


I was pretty sure this was the case. Is that specifically documented 
anywhere? If not, I'd like to clarify that in the documentation for mod_jk.


Thanks,
-chris


Am 23.03.2022 um 13:08 schrieb Christopher Schultz:

All,

What kinds of things will cause a "client error" in mod_jk's 
accounting? Does that mean things like unexpected disconnects on the 
part of the remote client (i.e. web browser), or does it mean failure 
of the jk module itself to connect (as a client) to the back-end Tomcat?


I'm starting to see situations where we have small numbers of client 
errors occurring "all the time", meaning that we accumulate maybe 
10-20 per day. If that's web browser disconnects then I don't care at 
all. If it's a problem I have with my internal networking and 
resource-allocation, then it's something I have to adjust.


Thanks,
-chris


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-03-25 Thread Christopher Schultz

Rainer,

On 3/24/22 05:50, Rainer Jung wrote:

Hi Chris,

client errors in jk log are always errors occurring when mod_jk tries to 
write back what it got from the backend using web server APIs to the 
client of the web server (user, browser etc.). So they point to a 
problem between and including the web server and something in front of it.


Especially during performance problems, client errors are expected as a 
consequence, because whenever people try to reload, the browser closes 
the original connection and sending back response data via this 
connection later fails.


I was pretty sure this was the case. Is that specifically documented 
anywhere? If not, I'd like to clarify that in the documentation for mod_jk.


Thanks,
-chris


Am 23.03.2022 um 13:08 schrieb Christopher Schultz:

All,

What kinds of things will cause a "client error" in mod_jk's 
accounting? Does that mean things like unexpected disconnects on the 
part of the remote client (i.e. web browser), or does it mean failure 
of the jk module itself to connect (as a client) to the back-end Tomcat?


I'm starting to see situations where we have small numbers of client 
errors occurring "all the time", meaning that we accumulate maybe 
10-20 per day. If that's web browser disconnects then I don't care at 
all. If it's a problem I have with my internal networking and 
resource-allocation, then it's something I have to adjust.


Thanks,
-chris


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-03-24 Thread Rainer Jung

Hi Chris,

client errors in jk log are always errors occurring when mod_jk tries to 
write back what it got from the backend using web server APIs to the 
client of the web server (user, browser etc.). So they point to a 
problem between and including the web server and something in front of it.


Especially during performance problems, client errors are expected as a 
consequence, because whenever people try to reload, the browser closes 
the original connection and sending back response data via this 
connection later fails.


Best regards,

Rainer

Am 23.03.2022 um 13:08 schrieb Christopher Schultz:

All,

What kinds of things will cause a "client error" in mod_jk's accounting? 
Does that mean things like unexpected disconnects on the part of the 
remote client (i.e. web browser), or does it mean failure of the jk 
module itself to connect (as a client) to the back-end Tomcat?


I'm starting to see situations where we have small numbers of client 
errors occurring "all the time", meaning that we accumulate maybe 10-20 
per day. If that's web browser disconnects then I don't care at all. If 
it's a problem I have with my internal networking and 
resource-allocation, then it's something I have to adjust.


Thanks,
-chris


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: What causes "client errors" with mod_jk

2022-03-23 Thread Christopher Schultz

All,

On 3/23/22 08:08, Christopher Schultz wrote:
What kinds of things will cause a "client error" in mod_jk's accounting? 
Does that mean things like unexpected disconnects on the part of the 
remote client (i.e. web browser), or does it mean failure of the jk 
module itself to connect (as a client) to the back-end Tomcat?


I'm starting to see situations where we have small numbers of client 
errors occurring "all the time", meaning that we accumulate maybe 10-20 
per day. If that's web browser disconnects then I don't care at all. If 
it's a problem I have with my internal networking and 
resource-allocation, then it's something I have to adjust.


I have items such as these in the mod_jk.log file:

[Wed Mar 23 12:07:36.192 2022] [22062:140330401956416] [info] 
ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to 
client aborted or client network problems
[Wed Mar 23 12:07:36.192 2022] [22062:140330401956416] [info] 
ajp_service::jk_ajp_common.c (2773): (myworker) sending request to 
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Wed Mar 23 12:07:36.194 2022] [22062:140330401956416] [info] 
service::jk_lb_worker.c (1595): service failed, worker myworker is in 
local error state
[Wed Mar 23 12:07:36.194 2022] [22062:140330401956416] [info] 
service::jk_lb_worker.c (1614): unrecoverable error 200, request failed. 
Client failed in the middle of request, we can't recover to another 
instance.
[Wed Mar 23 12:07:36.194 2022] [22062:140330401956416] [info] 
jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker


Those messages are somewhat confusing because some of them seem to 
indicate that the remote client (i.e. web browser) is gone, but then I 
see "worker myworker is in local error state" which looks like mod_jk is 
considering this *worker* to be in an error-state, meaning that it would 
direct traffic to another worker. I'm not seeing the worker being put 
into an ERR state. Is that log message just misleading?


Thanks,
-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



What causes "client errors" with mod_jk

2022-03-23 Thread Christopher Schultz

All,

What kinds of things will cause a "client error" in mod_jk's accounting? 
Does that mean things like unexpected disconnects on the part of the 
remote client (i.e. web browser), or does it mean failure of the jk 
module itself to connect (as a client) to the back-end Tomcat?


I'm starting to see situations where we have small numbers of client 
errors occurring "all the time", meaning that we accumulate maybe 10-20 
per day. If that's web browser disconnects then I don't care at all. If 
it's a problem I have with my internal networking and 
resource-allocation, then it's something I have to adjust.


Thanks,
-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org