Hi, W dniu 26.04.2019 o 10:42, Lech Perczak pisze: > Cc [email protected] > > W dniu 25.04.2019 o 11:25, Lech Perczak pisze: >> Hello, >> >> Some time ago, after upgrading the Kernel on our i.MX6Q-based boards to >> mainline 4.18, and now to LTS 4.19 line, during stress tests we started >> noticing strange warnings coming from 'read' syscall, when page_copy_sane() >> check failed. Typical reproducibility is up to ~4 events per 24h. Warnings >> origin from different processes, mostly involved with the stress tests, but >> not necessarily with block devices we're stressing. If the warning appeared >> in process relating to block device stress test, it would be accompanied by >> corrupted data, as the read operation gets aborted. >> Our reason for dropping caches is to ensure that the actual block device >> gets accessed during the test without resorting to O_DIRECT. >> When dropping caches was disabled in the tests, the issue would also >> disappear, at least in a single-week run. >> >> Example processes caught in the last run (grepped from stacktraces): >> [38128.430694] CPU: 2 PID: 32735 Comm: glmark2 Not tainted >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [44168.081821] CPU: 2 PID: 21555 Comm: cat Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [75879.424076] CPU: 3 PID: 1478 Comm: flashcp Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [117060.951333] CPU: 1 PID: 14150 Comm: cat Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [201668.438219] CPU: 1 PID: 14370 Comm: glmark2 Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [214332.212960] CPU: 0 PID: 25633 Comm: cp Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [283484.619058] CPU: 0 PID: 5091 Comm: glmark2 Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [330256.709124] CPU: 2 PID: 4565 Comm: sensors Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> >> When I started debugging the issue, I noticed that in all cases we're >> dealing with highmem zero-order pages. In this case, page_head(page) == >> page, so page_address(page) should be equal to page_address(head). >> However, it isn't the case, as page_address(head) in each case returns zero, >> causing the value of "v" to explode, and the check to fail. >> >> At first I thought, that it could be caused by modification of struct page >> itself, but it turned out to not be the case - sometimes struct page would >> be unmodified during duration of the check. With lowmem pages, this wouldn't >> also happen, because page_address is derived directly from struct page >> pointer itself. >> >> Finally, after gathering a few traces with added debug logs I decided to >> trigger a panic on the warning and capture a vmcore. >> When analyzing the vmcore, I confirmed that the struct page_address_slot >> corresponding to the struct page had an empty list of mappings, explaining >> NULL returned from page_address(head) in page_copy_sane(). >> Yeah, I had to manually calculate hash of struct page pointer ;) >> >> In the meantime I noticed a patch by Eric Dumazet ("iov_iter: optimize >> page_copy_sane()") [1]. Applying this patch silenced the warnings, at least >> for test run going for over a week. This is quite expected, as with this >> change 'page' isn't dereferenced at all, nor is page_address() called. Here >> I suspect that applying this patch may only hide a real issue I'm facing, >> and might be missing the case of highmem pages as well. >> >> An example stacktrace I captured, with added debug logs and copy of struct >> page before and after the check in page_copy_sane: >> >> This last one shows no changes to struct page over the call to >> page_copy_sane(), where first call to page_address(page) returns a valid >> page address (page_address=2146996224) and second call returns 0 >> (head_address=0). Subsequent calls to page_address inside WARN() argument >> list also return 0. The same was true for the occurence I captured vmcore >> for. >> >> (For your reference, a patch containing my debug prints is at the end of the >> message) >> >> [337685.344204] ------------[ cut here ]------------ >> [337685.356870] WARNING: CPU: 0 PID: 30132 at >> /mnt/work/git/platforms-manifest/build_devboardimx6q/tmp/work-shared/devboardimx6q/kernel-source/lib/iov_iter.c:834 >> page_copy_sane+0x13c/0x178 >> [337685.420706] page_copy_sane: page=7721123e, offset=0, n=4096, >> v=2147000320, head=7721123e, page_address=2146996224, head_address=0, >> compound_order(head)=0, page_address(page)=0, page_address(head)=0 >> page_address(page_copy)=0, >> flags(page)=referenced|uptodate|lru|active|arch_1, >> flags(page_copy)=referenced|uptodate|lru|active|arch_1 >> [337685.494736] Modules linked in: xt_nat iptable_nat nf_nat_ipv4 nf_nat >> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables >> usb_f_mass_storage usb_f_rndis u_ether ci_hdrc_imx ci_hdrc usbmisc_imx ulpi >> libcomposite configfs udc_core >> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G W >> 4.19.32-devboardimx6q+g0a64e3717985 #1 >> [337685.558314] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) >> [337685.563600] [<8010fe44>] (unwind_backtrace) from [<8010b840>] >> (show_stack+0x10/0x14) >> [337685.570081] [<8010b840>] (show_stack) from [<807a5958>] >> (dump_stack+0x88/0x9c) >> [337685.576033] [<807a5958>] (dump_stack) from [<8011f500>] >> (__warn+0xfc/0x114) >> [337685.581721] [<8011f500>] (__warn) from [<8011f560>] >> (warn_slowpath_fmt+0x48/0x6c) >> [337685.587936] [<8011f560>] (warn_slowpath_fmt) from [<803a0664>] >> (page_copy_sane+0x13c/0x178) >> [337685.595004] [<803a0664>] (page_copy_sane) from [<803a2fe4>] >> (copy_page_to_iter+0x18/0x474) >> [337685.602020] [<803a2fe4>] (copy_page_to_iter) from [<801bd9d4>] >> (generic_file_read_iter+0x2cc/0x980) >> [337685.609794] [<801bd9d4>] (generic_file_read_iter) from [<8020949c>] >> (__vfs_read+0xf8/0x158) >> [337685.616866] [<8020949c>] (__vfs_read) from [<80209588>] >> (vfs_read+0x8c/0x118) >> [337685.622717] [<80209588>] (vfs_read) from [<80209aac>] >> (ksys_read+0x4c/0xac) >> [337685.628396] [<80209aac>] (ksys_read) from [<80101000>] >> (ret_fast_syscall+0x0/0x54) >> [337685.634668] Exception stack(0xa4e79fa8 to 0xa4e79ff0) >> [337685.638427] 9fa0: 00000008 00000000 00000003 7ed03208 >> 00000200 00000000 >> [337685.645318] 9fc0: 00000008 00000000 76fe8958 00000003 00000000 7ed03208 >> 76fe5640 7ed031b4 >> [337685.652205] 9fe0: 76fe7cf0 7ed0316c 76fbeee0 76fd1b1c >> [337685.717080] ---[ end trace 660d072e57b3d168 ]--- >> [337685.720698] page_copy 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c >> e6 9b ed l..@........\... >> [337685.720713] page_copy 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 >> 00 00 00 ................ >> [337685.720721] *page 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c >> e6 9b ed l..@........\... >> [337685.720728] *page 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 >> 00 00 00 ................ >> >> The scenario with dropping caches via /proc/sys/vm/drop_caches seems to be >> rarely used (and tested), however I suspect that during dropping caches, >> some pages might be unmapped too early causing other hard-to notice problems. >> >> As my findings are quite worrisome, I'd be glad to hear if my analysis is >> valid, and if optimization made by Eric can be considered 'the fix', or >> maybe it is too optimistic. >> Also as I'd like to investigate the issue further, I'd be glad for any hints >> on where to continue. >> >> [1] https://lkml.org/lkml/2019/2/26/686, merged as 6daef95b8c91 in master >> >> Patch containing debug logs: >> >> diff --git a/lib/iov_iter.c b/lib/iov_iter.c >> index 8be175df3075..ded7e073434c 100644 >> --- a/lib/iov_iter.c >> +++ b/lib/iov_iter.c >> @@ -815,14 +815,25 @@ bool _copy_from_iter_full_nocache(void *addr, size_t >> bytes, struct iov_iter *i) >> } >> EXPORT_SYMBOL(_copy_from_iter_full_nocache); >> >> -static inline bool page_copy_sane(struct page *page, size_t offset, size_t >> n) >> +static noinline bool page_copy_sane(struct page *page, size_t offset, >> size_t n) >> { >> + struct page page_copy = *page; >> struct page *head = compound_head(page); >> - size_t v = n + offset + page_address(page) - page_address(head); >> + size_t page_addr = (size_t) page_address(page); >> + size_t head_addr = (size_t) page_address(head); >> + size_t v = n + offset + page_addr - head_addr; >> + unsigned int order = compound_order(head); >> >> - if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head)))) >> + if (likely(n <= v && v <= (PAGE_SIZE << order))) >> return true; >> - WARN_ON(1); >> + WARN(1, "%s: page=%p, offset=%zu, n=%zu, v=%zu, head=%p, >> page_address=%zu," >> + " head_address=%zu, compound_order(head)=%u, >> page_address(page)=%zu," >> + " page_address(head)=%zu page_address(page_copy)=%zu, >> flags(page)=%pGp, flags(page_copy)=%pGp\n", >> + __func__, page, offset, n, v, head, page_addr, head_addr, order, >> + (size_t) page_address(page), (size_t) page_address(head), >> + (size_t) page_address(&page_copy), &page->flags, >> &page_copy.flags); >> + print_hex_dump_bytes("page_copy ", DUMP_PREFIX_OFFSET, &page_copy, >> sizeof(page_copy)); >> + print_hex_dump_bytes("*page ", DUMP_PREFIX_OFFSET, page, >> sizeof(*page)); >> return false; >> } >> >> -- Pozdrawiam/With kind regards, Lech Perczak Sr. Software Engineer Camlin >> Technologies Poland Limited Sp. z o.o. >> As my question went in during Linux MM & FS summit, it might have gone unnoticed. So just a kind reminder :) Maybe someone got a chance to look at it?
-- Pozdrawiam/With kind regards, Lech Perczak Sr. Software Engineer Camlin Technologies Poland Limited Sp. z o.o.

