https://bz.apache.org/bugzilla/show_bug.cgi?id=58901

            Bug ID: 58901
           Summary: Download of large file is aborted in SSL_write on
                    EINTR
           Product: Apache httpd-2
           Version: 2.4.4
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_ssl
          Assignee: [email protected]
          Reporter: [email protected]

Created attachment 33473
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33473&action=edit
test script files, one to download 20x 9m file, one to download smaller file

A misconfiguration has revealed problem with SSL_write not handling EINTR .

Problem found in 2.4.4 but code is the same in 2.4.18 so expect the issue still
occurs there as well. 

The problem occurs when large files over https://  and when the httpd process
gets a signal in our case SIGUSR1. The large SSL download is aborted. 

A possible precondition is that the large download hits threshold and then
results in synchronous write call - dont know just from some of the doco I read
: 

[Tue Jan 19 05:04:59.742393 2016] [core:trace6] [pid 1402:tid 4115659632]
core_filters.c(525): [client 10.134.120.119:59922] core_output_filter: flushing
because of THRESHOLD_MAX_BUFFER
[Tue Jan 19 05:04:59.955371 2016] [core:trace6] [pid 1402:tid 4115659632]
core_filters.c(525): [client 10.134.120.119:59922] core_output_filter: flushing
because of THRESHOLD_MAX_BUFFER

Then with getting the SIGUSR1 and the download failure we see the following
entry in the apache error_log : 

[Tue Jan 19 05:04:59.997093 2016] [ssl:info] [pid 1402:tid 4115659632]
(4)Interrupted system call: [client 10.134.120.119:59922] AH01993: SSL output
filter write failed.


The code link for 2.4.4 is here : 

https://svn.apache.org/viewvc/httpd/httpd/tags/2.4.4/modules/ssl/ssl_engine_io.c?view=markup#l761

760    outctx = (bio_filter_out_ctx_t *)filter_ctx->pbioWrite->ptr;
761        res = SSL_write(filter_ctx->pssl, (unsigned char *)data, len);
...
...
778            else if (ssl_err == SSL_ERROR_SYSCALL) {
779                ap_log_cerror(APLOG_MARK, APLOG_INFO, outctx->rc, c,
APLOGNO(01993)
780                              "SSL output filter write failed.");
781            }


So we can see SSL_ERROR_SYSCALL was the cause. 

Checking SSL_write, looks like on write fail it needs to test if can retry.  

http://stackoverflow.com/questions/24188013/openssl-and-signals

int result = SSL_write(ssl, buff, length);
if ((result < 0) && BIO_should_retry(SSL_get_wbio(ssl)))
    // need to retry


Here is sample retry code from OpenSSL

https://github.com/openssl/openssl/blob/919ba009429b3617e975933f37a23be996a33b8d/demos/bio/sconnect.c#L88

    for (;;) {
        i = BIO_write(out, &(p[off]), len);
        if (i <= 0) {
            if (BIO_should_retry(out)) {
                fprintf(stderr, "write DELAY\n");
                sleep(1);
                continue;
            } else {
                goto err;
            }
        }


There are also other samples within mod_ssl itself for checking EINTR for
retry, as per this code around SSL_read, where we can see handling for EINTR
that will resume : 

https://svn.apache.org/viewvc/httpd/httpd/tags/2.4.4/modules/ssl/ssl_engine_io.c?view=markup#l653


591            rc = SSL_read(inctx->filter_ctx->pssl, buf + bytes, wanted -
bytes);
...
...
653            else if (ssl_err == SSL_ERROR_SYSCALL) {
654                    if (APR_STATUS_IS_EAGAIN(inctx->rc)
655                            || APR_STATUS_IS_EINTR(inctx->rc)) {
656                        /* Already read something, return APR_SUCCESS
instead. */
657                        if (*len > 0) {
658                            inctx->rc = APR_SUCCESS;
659                            break;
660                        }
661                        if (inctx->block == APR_NONBLOCK_READ) {
662                            break;
663                        }
664                        continue;  /* Blocking and nothing yet?  Try again.
*/
665                    }
666                    else {
667                        ap_log_cerror(APLOG_MARK, APLOG_INFO, inctx->rc, c,
APLOGNO(01991)
668                                      "SSL input filter read failed.");
669                    }



For completeness this was from strace 

strace -v -o /tmp/strace.log -f -tt -s 16384 httpd/bin/apachectl -k start -D
SSL


13817 23:11:00.530835 <... futex resumed> ) = 0
13817 23:11:00.530889 munmap(0xf5720000, 10489856) = 0
13817 23:11:00.530968 tgkill(13817, 13821, SIGUSR1 <unfinished ...>


13821 23:11:00.531027 <... poll resumed> ) = ? ERESTART_RESTARTBLOCK (To be
rest
arted)
13817 23:11:00.531047 <... tgkill resumed> ) = 0
13821 23:11:00.531061 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
13821 23:11:00.531138 sigreturn( <unfinished ...>
13817 23:11:00.531158 futex(0xf41fdbd8, FUTEX_WAIT, 13821, NULL <unfinished
...>
13821 23:11:00.531175 <... sigreturn resumed> ) = -1 EINTR (Interrupted system
c
all)
13821 23:11:00.531210 gettimeofday({1453263060, 531221}, NULL) = 0
13821 23:11:00.531274 write(2, "[Tue Jan 19 23:11:00.531221 2016] [ssl:info]
[pi
d 13817:tid 4095728496] (4)Interrupted system call: [client
10.132.160.35:59069]
 AH01993: SSL output filter write failed.\n", 170) = 170
13821 23:11:00.531333 time(NULL)        = 1453263060
13821 23:11:00.531365 shutdown(14, 1 /* send */) = 0
13821 23:11:00.531664 poll([{fd=14, events=POLLIN}], 1, 500) = 1 ([{fd=14,
reven
ts=POLLIN}])


Reproduction steps : 

The misconfiguration that made this problem visible was adding
MaxRequestsPerChild = 1. to httpd-mpm.conf  : 

<IfModule mpm_worker_module>
    ...
    MaxRequestsPerChild      1
</IfModule>


The byproduct of that was that when one request was completed it then sent
SIGUSR1 to the process, interrupting any working threads (which included our
long running download). 

Here is the complete, MPM settings for worker, we also kept number of httpd
proceses down, so we were more likely to get multiple requests goign to the
same httpd process. 

<IfModule mpm_worker_module>
    StartServers             1
    MinSpareThreads         5
    MaxSpareThreads        25
    ThreadsPerChild         25
    MaxRequestWorkers      400
    MaxConnectionsPerChild   0
# tuneing - as per client
    ThreadLimit             128
    MaxClients             800
    ServerLimit            800
    MaxRequestsPerChild      1
</IfModule>


Then we run two scripts 
1) A script that downloads a large file (9) over SSL, 20 times. 
2) A script that repeatedly downloads a small .png file. 

(I will attach them)  

We find we can regularly interrupt the SSL download, so that none of the large
downloads succeed. 

Obviously the problem is resolved by commenting out the parameter : 
    MaxRequestsPerChild      1
Which results in signal being sent after the small request is complete (since
excceded maxrequests), and while the long download is still proceeding. 

And reduced signals will mean much less likely hood of the problem in SSL_write
occuring. 

But I am sure there are legitemate times in operation when this condition will
occur, and will result in aborted SSL download. 


Notes:

a) Problem does not occur over HTTP , only over HTTPS 
b) Problem happens with much smaller files too (300k), larger files just give
more
   chance for the problem to occur.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to