> On 28 Oct 2021, at 01:56, Maxim Dounin <[email protected]> wrote: > > Hello! > > On Thu, Oct 28, 2021 at 12:50:25AM +0300, Sergey Kandaurov wrote: > >>> On 27 Oct 2021, at 22:19, Maxim Dounin <[email protected]> wrote: >>> >>> On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote: >>> >>>>> On 11 Oct 2021, at 21:58, Maxim Dounin <[email protected]> wrote: >>>>> >>>>> # HG changeset patch >>>>> # User Maxim Dounin <[email protected]> >>>>> # Date 1633978587 -10800 >>>>> # Mon Oct 11 21:56:27 2021 +0300 >>>>> # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8 >>>>> # Parent d175cd09ac9d2bab7f7226eac3bfce196a296cc0 >>>>> Simplified sendfile_max_chunk handling. >>>>> >>>>> Previously, it was checked that sendfile_max_chunk was enabled and >>>>> almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid >>>>> delaying connections where sendfile_max_chunk wasn't reached (for example, >>>>> when sending responses smaller than sendfile_max_chunk). Now we instead >>>>> check if there are unsent data, and the connection is still ready for >>>>> writing. >>>>> Additionally we also check c->write->delayed to ignore connections already >>>>> delayed by limit_rate. >>>>> >>>>> This approach is believed to be more robust, and correctly handles >>>>> not only sendfile_max_chunk, but also internal limits of c->send_chain(), >>>>> such as sendfile() maximum supported length (ticket #1870). >>>>> >>>>> diff --git a/src/http/ngx_http_write_filter_module.c >>>>> b/src/http/ngx_http_write_filter_module.c >>>>> --- a/src/http/ngx_http_write_filter_module.c >>>>> +++ b/src/http/ngx_http_write_filter_module.c >>>>> @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t >>>>> delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate); >>>>> >>>>> if (delay > 0) { >>>>> - limit = 0; >>>>> c->write->delayed = 1; >>>>> ngx_add_timer(c->write, delay); >>>>> } >>>>> } >>>>> >>>>> - if (limit >>>>> - && c->write->ready >>>>> - && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize)) >>>>> - { >>>>> + if (chain && c->write->ready && !c->write->delayed) { >>>>> ngx_post_event(c->write, &ngx_posted_next_events); >>>>> } >>>>> >>>> >>>> Looks good. >>>> >>>> Not strictly related to this change, so FYI. I noticed a stray writev() >>>> after Linux sendfile(), when it writes more than its internal limits. >>>> >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 >>>> 0000000000000000, >>>> pos 0000000000000000, size: 0 file: 416072437, size: 3878894859 >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 >>>> s:3878894859 >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0 >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891 >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of >>>> 2147482891 @416072437 >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0 >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter >>>> 0000561528695820 >>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310 >>>> >>>> Here sendfile() partially sent 2147479552, which is above its internal >>>> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize. On the second iteration, >>>> due to this, it falls back to writev() with zero-size headers. >>>> Then, with the patch applied, it posts the next write event, as designed >>>> (previously, it would seemingly stuck instead, such as in ticket #1870). >>> >>> Interesting. >>> >>> Overall it looks harmless, but we may want to look further why >>> sendfile() only sent 2147479552 instead of 2147482891. It seems >>> that 2147479552 is in pages (524287 x 4096) despite the fact that >>> the initial offset is not page-aligned. We expect sendfile() to >>> send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096). >>> >>> Looking into Linux sendfile() manpage suggests that 2,147,479,552 >>> is a documented limit: >>> >>> sendfile() will transfer at most 0x7ffff000 (2,147,479,552) >>> bytes, returning the number of bytes actually transferred. >>> (This is true on both 32-bit and 64-bit systems.) >>> >>> This seems to be mostly arbitrary limitation appeared in Linux >>> kernel 2.6.16 >>> (https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69). >>> >>> Interesting enough, the actual limitation is not 0x7ffff000 as >>> documented, but instead MAX_RW_COUNT, which is defined as >>> (INT_MAX & PAGE_MASK). This suggests that the behaviour will be >>> actually different on platforms with larger pages. >>> >>> Something as simple as: >>> >>> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c >>> b/src/os/unix/ngx_linux_sendfile_chain.c >>> --- a/src/os/unix/ngx_linux_sendfile_chain.c >>> +++ b/src/os/unix/ngx_linux_sendfile_chain.c >>> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_ >>> */ >>> >>> send = prev_send + sent; >>> - continue; >>> } >>> >>> if (send >= limit || in == NULL) { >>> >>> should be enough to resolve this additional 0-sized writev(). >>> Untested though, as I don't have a test playground on hand where >>> 2G sendfile() can be reached. It would be great if you'll test >>> it. >>> >> >> That seems to help: >> >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write old buf t:1 f:0 >> 000055D8D328FDB0, >> pos 000055D8D328FDB0, size: 252 file: 0, size: 0 >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write new buf t:0 f:1 >> 0000000000000000, >> pos 0000000000000000, size: 0 file: 0, size: 4294967296 >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter: l:1 f:0 >> s:4294967548 >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter limit 0 >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 writev: 252 of 252 >> [.. next ngx_linux_sendfile_chain() loop iteration ..] >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: @0 2147479552 >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: 2147479552 of 2147479552 >> @0 >> [.. return from ngx_linux_sendfile_chain() on exceeded limit ..] >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter 000055D8D329C8D0 >> 2021/10/27 20:36:31 [debug] 1498568#0: *1 post event 000055D8D35CC660 > > Thanks for testing. > >>> Full patch: >>> >>> # HG changeset patch >>> # User Maxim Dounin <[email protected]> >>> # Date 1635361800 -10800 >>> # Wed Oct 27 22:10:00 2021 +0300 >>> # Node ID 859447c7b7076b676a3421597514b324b708658d >>> # Parent 2a7155733855d1c2ea1c1ded8d1a4ba654b533cb >>> Fixed sendfile() limit handling on Linux. >>> >>> On Linux starting with 2.6.16, sendfile() silently limits all operations >>> to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly >>> triggered the interrupt check, and resulted in 0-sized writev() on the >>> next loop iteration. >>> >>> Fix is to make sure the limit is always checked, so we will return from >>> the loop if the limit is already reached even if number of bytes sent is >>> not exactly equal to the number of bytes we've tried to send. >>> >>> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c >>> b/src/os/unix/ngx_linux_sendfile_chain.c >>> --- a/src/os/unix/ngx_linux_sendfile_chain.c >>> +++ b/src/os/unix/ngx_linux_sendfile_chain.c >>> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_ >>> */ >>> >>> send = prev_send + sent; >>> - continue; >>> } >>> >>> if (send >= limit || in == NULL) { >>> >> >> The change looks good to me. >> >> Btw, this should also stop exceeding the limit after several sendfile() >> calls each interrupted, on Linux 4.3+ (which is rather theoretical). > > The limiting takes "send" into account, so I don't see how the > limit can be exceeded. > >> It probably deserves updating comments in this file about the count >> parameter constraints. > > The exact behaviour does not seem to be relevant to the resulting > code (in particular, the patch does not change the > NGX_SENDFILE_MAXSIZE limit). On the other hand, I agree that it > might make sense to update the comment anyway, in particular, to > make it clear that the 2G limit is still relevant to current > kernels. I've added the following to the patch: > > @@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha > * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter > * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL, > * so we limit it to 2G-1 bytes. > + * > + * On Linux 2.6.16 and later, sendfile() silently limits the count parameter > + * to 2G minus the page size, even on 64-bit platforms. > */ > > #define NGX_SENDFILE_MAXSIZE 2147483647L > > > Full patch: > > # HG changeset patch > # User Maxim Dounin <[email protected]> > # Date 1635374871 -10800 > # Thu Oct 28 01:47:51 2021 +0300 > # Node ID 3c5679dfe561e3087a96acabe4cf73ef232acabb > # Parent 2a7155733855d1c2ea1c1ded8d1a4ba654b533cb > Fixed sendfile() limit handling on Linux. > > On Linux starting with 2.6.16, sendfile() silently limits all operations > to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly > triggered the interrupt check, and resulted in 0-sized writev() on the > next loop iteration. > > Fix is to make sure the limit is always checked, so we will return from > the loop if the limit is already reached even if number of bytes sent is > not exactly equal to the number of bytes we've tried to send. > > diff --git a/src/os/unix/ngx_linux_sendfile_chain.c > b/src/os/unix/ngx_linux_sendfile_chain.c > --- a/src/os/unix/ngx_linux_sendfile_chain.c > +++ b/src/os/unix/ngx_linux_sendfile_chain.c > @@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha > * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter > * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL, > * so we limit it to 2G-1 bytes. > + * > + * On Linux 2.6.16 and later, sendfile() silently limits the count parameter > + * to 2G minus the page size, even on 64-bit platforms. > */ > > #define NGX_SENDFILE_MAXSIZE 2147483647L > @@ -216,7 +219,6 @@ ngx_linux_sendfile_chain(ngx_connection_ > */ > > send = prev_send + sent; > - continue; > } > > if (send >= limit || in == NULL) { >
Looks fine. -- Sergey Kandaurov _______________________________________________ nginx-devel mailing list [email protected] http://mailman.nginx.org/mailman/listinfo/nginx-devel
