Rainer,

On 5/16/22 13:48, Christopher Schultz wrote:
Rainer,

I've been looking into this a little more in my production environment.

These errors are not super common, but there seems to be a steady trickle of errors from my two services that have human users. I see 0 errors for my API-based services, which makes me think that I don't have any performance issues... I probably have human users disappearing for random reasons.

The errors in mod_jk.log look like this:

[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info] ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to client aborted or client network problems [Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info] ajp_service::jk_ajp_common.c (2773): (myworker) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1) [Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] service::jk_lb_worker.c (1595): service failed, worker myworker is in local error state [Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] service::jk_lb_worker.c (1614): unrecoverable error 200, request failed. Client failed in the middle of request, we can't recover to another instance. [Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker

(Note that the last message "Aborting connection for worker=myworker" may have a bug; my actual worker name has a name containing a hyphen (-) and only the text before the hyphen is being emitted in that error message.)

Anyway, when researching these errors, it would be helpful to me to know which requests are failing. By looking at the access_log, I only see a single request logged for 04:19:15 on that server so it's probably the right one, but httpd says that the response code is 302 instead of e.g. 50x for an error.

When we log these kinds of errors, it would be great to know a few things IMO:

1. What was the URL of the request
2. How long did the client wait for the response before we found we couldn't write to the stream (or, conversely, the start-timestamp of the request as well as the timestamp of the error, which I think is already in the log itself)

I see the place in the code where the error is generated, but I'm not familiar enough with the code to know how to add that kind of thing. The function in question (ajp_process_callback) has a pointer to a jk_ws_service_t structure. Is it as simple as also logging like this?

   /* convert start-time to a string */
   char[32] timestamp;
  apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", r->r->request_time);
   /* emit a detailed log message */
   jk_log(l, JK_LOG_INFO,
          "(%s) Request to (%s %s) started at %s,%ld",
         ae->worker->name, r->method, r->req_uri, timestamp, r->r->request_time.tm_usec);

Does anyone think this might be generally useful?

It looks like I needed a few more things in order to get this to work:

  {
    char *timestamp = malloc(32);
    apr_time_exp_t *timerec = malloc(sizeof(apr_time_exp_t));
    apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
    apr_time_exp_gmt(timerec, ap->r->request_time);
    apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", timerec);
    /* emit a detailed log message */
    jk_log(l, JK_LOG_INFO,
           "(%s) Request to (%s %s) started at %s",
           ae->worker->name, r->method, r->req_uri, timestamp);
    free(timerec);
    free(timestamp);
  }

The compiler wouldn't let me use an automatically-allocated char[32] so I had to malloc it. I also had to convert from long to apr_time_exp_t which required another structure/malloc as well.

Finally, I had to cast the r->ws_private to apache_private_data_t* which required me to copy/paste the definition of apache_private_data_t from mod_jk.h into jk_ajp_common.c -- just as a test for now.

Obviously, this works only for Apache httpd.

I have it running on one of my web servers for now -- hoping to catch an error and see this additional logging to see if it's helpful to me. (LOL just segfaulted; time to go back and look.)

What would be the best way to get the "start time" of the request in a platform-independent way? Maybe just expand the jk_ws_service structure to include it?

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to