Hi,

W dniu 26.04.2019 o 10:42, Lech Perczak pisze:
> Cc [email protected]
>
> W dniu 25.04.2019 o 11:25, Lech Perczak pisze:
>> Hello,
>>
>> Some time ago, after upgrading the kernel on our i.MX6Q-based boards to 
>> mainline 4.18, and now to the LTS 4.19 line, we started noticing strange 
>> warnings during stress tests, coming from the 'read' syscall when the 
>> page_copy_sane() check fails. Typical reproducibility is up to ~4 events 
>> per 24h. The warnings originate from different processes, mostly ones 
>> involved in the stress tests, but not necessarily related to the block 
>> devices we're stressing. When the warning appeared in a process belonging 
>> to the block device stress test, it was accompanied by corrupted data, as 
>> the read operation got aborted.
>> Our reason for dropping caches is to ensure that the actual block device 
>> gets accessed during the test without resorting to O_DIRECT.
>> When dropping caches was disabled in the tests, the issue also 
>> disappeared, at least over a week-long run.
>>
>> Example processes caught in the last run (grepped from stacktraces):
>> [38128.430694] CPU: 2 PID: 32735 Comm: glmark2 Not tainted 
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [44168.081821] CPU: 2 PID: 21555 Comm: cat Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [75879.424076] CPU: 3 PID: 1478 Comm: flashcp Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [117060.951333] CPU: 1 PID: 14150 Comm: cat Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [201668.438219] CPU: 1 PID: 14370 Comm: glmark2 Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [214332.212960] CPU: 0 PID: 25633 Comm: cp Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [283484.619058] CPU: 0 PID: 5091 Comm: glmark2 Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [330256.709124] CPU: 2 PID: 4565 Comm: sensors Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>>
>> When I started debugging the issue, I noticed that in all cases we are 
>> dealing with highmem zero-order pages. In this case compound_head(page) == 
>> page, so page_address(page) should be equal to page_address(head). 
>> However, this is not the case: page_address(head) returns zero each time, 
>> which makes the value of "v" blow up to (essentially) the full virtual 
>> address of the page, so the check fails.
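>>
>> To illustrate with the values from the trace below, here is the existing 
>> 4.19 check annotated with those numbers (an illustration only, not new 
>> code):
>>
>>         /* page_copy_sane() in lib/iov_iter.c, values from the WARN() below */
>>         size_t v = n + offset + page_address(page) - page_address(head);
>>         /*       = 4096 + 0 + 2146996224 - 0 = 2147000320                 */
>>         if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head))))
>>                 return true;
>>         /* compound_order(head) == 0, so the upper bound is just PAGE_SIZE
>>          * (4096) and the check fails although the page itself looks valid. */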
>>
>> At first I thought that it could be caused by a modification of struct 
>> page itself, but that turned out not to be the case - sometimes struct 
>> page remained completely unmodified for the duration of the check. This 
>> also cannot happen with lowmem pages, because there page_address() is 
>> derived directly from the struct page pointer itself.
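>>
>> (For reference, the lowmem path is, as far as I can tell, just the direct 
>> arithmetic mapping and cannot return NULL for a sane struct page pointer - 
>> roughly:
>>
>>         /* approximately what 4.19 does for !PageHighMem pages */
>>         static void *lowmem_page_address(const struct page *page)
>>         {
>>                 return __va(PFN_PHYS(page_to_pfn(page)));
>>         }
>>
>> so only the highmem lookup path can come back empty.)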
>>
>> Finally, after gathering a few traces with added debug logs, I decided to 
>> trigger a panic on the warning and capture a vmcore.
>> When analyzing the vmcore, I confirmed that the struct page_address_slot 
>> corresponding to the struct page had an empty list of mappings, which 
>> explains the NULL returned by page_address(head) in page_copy_sane().
>> Yeah, I had to manually calculate the hash of the struct page pointer ;)
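>>
>> (The relevant highmem lookup, as I read it in 4.19's mm/highmem.c, looks 
>> roughly like this, with locking and the list_empty() fast path omitted - 
>> so an empty per-slot list directly translates into a NULL return:
>>
>>         void *page_address(const struct page *page)
>>         {
>>                 struct page_address_slot *pas;
>>                 struct page_address_map *pam;
>>
>>                 if (!PageHighMem(page))
>>                         return lowmem_page_address(page);
>>
>>                 /* the hash of the struct page pointer selects the bucket */
>>                 pas = &page_address_htable[hash_ptr(page, PA_HASH_ORDER)];
>>                 /* walk the currently kmap()ed pages hashed into this bucket */
>>                 list_for_each_entry(pam, &pas->lh, list)
>>                         if (pam->page == page)
>>                                 return pam->virtual;
>>                 return NULL;    /* not currently mapped */
>>         }
>>
>> )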
>>
>> In the meantime I noticed a patch by Eric Dumazet ("iov_iter: optimize 
>> page_copy_sane()") [1]. Applying it silenced the warnings, at least for a 
>> test run lasting over a week. This is quite expected, as with this change 
>> 'page' isn't dereferenced at all for reads that fit a single page, nor is 
>> page_address() ever called. I suspect, however, that applying this patch 
>> may only hide the real issue I'm facing, and it might be missing the case 
>> of highmem pages as well.
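>>
>> For reference, from my reading of [1] the patched check looks roughly like 
>> this (quoted from memory, so treat it as a sketch rather than the exact 
>> merged code):
>>
>>         static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>>         {
>>                 struct page *head;
>>                 size_t v = n + offset;
>>
>>                 /* fast path: fits an order-0 page, no struct page access at all */
>>                 if (n <= v && v <= PAGE_SIZE)
>>                         return true;
>>
>>                 /* slow path: uses the page/head pointer difference instead of
>>                  * page_address(), which is what made our warnings disappear */
>>                 head = compound_head(page);
>>                 v += (page - head) << PAGE_SHIFT;
>>                 if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head))))
>>                         return true;
>>                 WARN_ON(1);
>>                 return false;
>>         }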
>>
>> Below is an example stacktrace I captured, with added debug logs and a 
>> copy of struct page taken before and after the check in page_copy_sane().
>>
>> This last occurrence shows no change to struct page across the call to 
>> page_copy_sane(): the first call to page_address(page) returns a valid 
>> address (page_address=2146996224), while the second one returns 0 
>> (head_address=0). Subsequent calls to page_address() inside the WARN() 
>> argument list also return 0. The same was true for the occurrence I 
>> captured the vmcore for.
>>
>> (For your reference, a patch containing my debug prints is at the end of the 
>> message)
>>
>> [337685.344204] ------------[ cut here ]------------
>> [337685.356870] WARNING: CPU: 0 PID: 30132 at 
>> /mnt/work/git/platforms-manifest/build_devboardimx6q/tmp/work-shared/devboardimx6q/kernel-source/lib/iov_iter.c:834
>>  page_copy_sane+0x13c/0x178
>> [337685.420706] page_copy_sane: page=7721123e, offset=0, n=4096, 
>> v=2147000320, head=7721123e, page_address=2146996224, head_address=0, 
>> compound_order(head)=0, page_address(page)=0, page_address(head)=0 
>> page_address(page_copy)=0, 
>> flags(page)=referenced|uptodate|lru|active|arch_1, 
>> flags(page_copy)=referenced|uptodate|lru|active|arch_1
>> [337685.494736] Modules linked in: xt_nat iptable_nat nf_nat_ipv4 nf_nat 
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables 
>> usb_f_mass_storage usb_f_rndis u_ether ci_hdrc_imx ci_hdrc usbmisc_imx ulpi 
>> libcomposite configfs udc_core
>> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G        W         
>> 4.19.32-devboardimx6q+g0a64e3717985 #1
>> [337685.558314] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [337685.563600] [<8010fe44>] (unwind_backtrace) from [<8010b840>] 
>> (show_stack+0x10/0x14)
>> [337685.570081] [<8010b840>] (show_stack) from [<807a5958>] 
>> (dump_stack+0x88/0x9c)
>> [337685.576033] [<807a5958>] (dump_stack) from [<8011f500>] 
>> (__warn+0xfc/0x114)
>> [337685.581721] [<8011f500>] (__warn) from [<8011f560>] 
>> (warn_slowpath_fmt+0x48/0x6c)
>> [337685.587936] [<8011f560>] (warn_slowpath_fmt) from [<803a0664>] 
>> (page_copy_sane+0x13c/0x178)
>> [337685.595004] [<803a0664>] (page_copy_sane) from [<803a2fe4>] 
>> (copy_page_to_iter+0x18/0x474)
>> [337685.602020] [<803a2fe4>] (copy_page_to_iter) from [<801bd9d4>] 
>> (generic_file_read_iter+0x2cc/0x980)
>> [337685.609794] [<801bd9d4>] (generic_file_read_iter) from [<8020949c>] 
>> (__vfs_read+0xf8/0x158)
>> [337685.616866] [<8020949c>] (__vfs_read) from [<80209588>] 
>> (vfs_read+0x8c/0x118)
>> [337685.622717] [<80209588>] (vfs_read) from [<80209aac>] 
>> (ksys_read+0x4c/0xac)
>> [337685.628396] [<80209aac>] (ksys_read) from [<80101000>] 
>> (ret_fast_syscall+0x0/0x54)
>> [337685.634668] Exception stack(0xa4e79fa8 to 0xa4e79ff0)
>> [337685.638427] 9fa0:                   00000008 00000000 00000003 7ed03208 
>> 00000200 00000000
>> [337685.645318] 9fc0: 00000008 00000000 76fe8958 00000003 00000000 7ed03208 
>> 76fe5640 7ed031b4
>> [337685.652205] 9fe0: 76fe7cf0 7ed0316c 76fbeee0 76fd1b1c
>> [337685.717080] ---[ end trace 660d072e57b3d168 ]---
>> [337685.720698] page_copy 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c 
>> e6 9b ed  l..@........\...
>> [337685.720713] page_copy 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 
>> 00 00 00  ................
>> [337685.720721] *page     00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c 
>> e6 9b ed  l..@........\...
>> [337685.720728] *page     00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 
>> 00 00 00  ................
>>
>> The scenario of dropping caches via /proc/sys/vm/drop_caches seems to be 
>> rarely used (and tested); however, I suspect that while caches are being 
>> dropped, some pages might be unmapped too early, causing other 
>> hard-to-notice problems.
>>
>> As my findings are quite worrisome, I'd be glad to hear whether my 
>> analysis is valid, and whether the optimization made by Eric can be 
>> considered 'the fix', or whether that would be too optimistic.
>> Also, as I'd like to investigate the issue further, I'd be glad for any 
>> hints on where to continue.
>>
>> [1] https://lkml.org/lkml/2019/2/26/686, merged as 6daef95b8c91 in master
>>
>> Patch containing debug logs:
>>
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index 8be175df3075..ded7e073434c 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -815,14 +815,25 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
>>  }
>>  EXPORT_SYMBOL(_copy_from_iter_full_nocache);
>>  
>> -static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>> +static noinline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>>  {
>> +       struct page page_copy = *page;
>>         struct page *head = compound_head(page);
>> -       size_t v = n + offset + page_address(page) - page_address(head);
>> +       size_t page_addr = (size_t) page_address(page);
>> +       size_t head_addr = (size_t) page_address(head);
>> +       size_t v = n + offset + page_addr - head_addr;
>> +       unsigned int order = compound_order(head);
>>  
>> -       if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head))))
>> +       if (likely(n <= v && v <= (PAGE_SIZE << order)))
>>                 return true;
>> -       WARN_ON(1);
>> +       WARN(1, "%s: page=%p, offset=%zu, n=%zu, v=%zu, head=%p, page_address=%zu,"
>> +               " head_address=%zu, compound_order(head)=%u, page_address(page)=%zu,"
>> +               " page_address(head)=%zu page_address(page_copy)=%zu, flags(page)=%pGp, flags(page_copy)=%pGp\n",
>> +            __func__, page, offset, n, v, head, page_addr, head_addr, order,
>> +           (size_t) page_address(page), (size_t) page_address(head),
>> +           (size_t) page_address(&page_copy), &page->flags, &page_copy.flags);
>> +       print_hex_dump_bytes("page_copy ", DUMP_PREFIX_OFFSET, &page_copy, sizeof(page_copy));
>> +       print_hex_dump_bytes("*page     ", DUMP_PREFIX_OFFSET, page, sizeof(*page));
>>         return false;
>>  }
>>
>> -- 
>> Pozdrawiam/With kind regards,
>> Lech Perczak
>>
>> Sr. Software Engineer
>> Camlin Technologies Poland Limited Sp. z o.o.
>>
As my question went in during the Linux MM & FS summit, it might have gone 
unnoticed, so this is just a kind reminder :)
Maybe someone has had a chance to look at it?

-- 
Pozdrawiam/With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.
