Re: [kees:devel/overflow/sanitizers] [overflow] 660787b56e: UBSAN:signed-integer-overflow_in_lib/test_memcat_p.c

2024-01-30 Thread Oliver Sang
hi, Kees,

On Tue, Jan 30, 2024 at 04:23:06PM -0800, Kees Cook wrote:
> On Tue, Jan 30, 2024 at 10:52:56PM +0800, kernel test robot wrote:
> > 

...
 
> > while testing, we observed below different (and same part) between parent 
> > and
> > this commit:
> > 
> > ea804316c9db5148 660787b56e6e97ddc34c7882cbe
> >  ---
> >fail:runs  %reproductionfail:runs
> >| | |
> >   6:60%   6:6 
> > dmesg.UBSAN:shift-out-of-bounds_in_arch/x86/kernel/cpu/intel.c
> >   6:60%   6:6 
> > dmesg.UBSAN:shift-out-of-bounds_in_arch/x86/kernel/cpu/topology.c
> >   6:60%   6:6 
> > dmesg.UBSAN:shift-out-of-bounds_in_fs/namespace.c
> >   6:60%   6:6 
> > dmesg.UBSAN:shift-out-of-bounds_in_fs/read_write.c
> >   6:60%   6:6 
> > dmesg.UBSAN:shift-out-of-bounds_in_include/linux/rhashtable.h
> >   6:60%   6:6 
> > dmesg.UBSAN:shift-out-of-bounds_in_include/net/tcp.h
> 
> Are these shift-out-of-bounds warnings new?

no, they also happen on parent commit.

thanks a lot for all guildance!

> 
> >:6  100%   6:6 
> > dmesg.UBSAN:signed-integer-overflow_in_lib/test_memcat_p.c
> 
> This is new for sure, catching an issue you show below...
> 
> > this looks like the commit uncovered issue. but since it's hard for us to 
> > back
> > port this commit to each commit while bisecting, we cannot capture the real
> > first bad commit. not sure if this report could help somebody to investigate
> > the real issue?
> 
> Yeah, I think there is an unexpected wrap-around in test_memcat_p.c:
> 
> > If you fix the issue in a separate patch/commit (i.e. not just a new 
> > version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot 
> > | Closes: 
> > https://lore.kernel.org/oe-lkp/202401302219.db90a6d5-oliver.s...@intel.com
> > 
> > 
> > [   42.894536][T1] [ cut here ]
> > [   42.895474][T1] UBSAN: signed-integer-overflow in 
> > lib/test_memcat_p.c:47:10
> > [   42.897128][T1] 6570 * 725861 cannot be represented in type 'int'
> 
> I'm surprised to see the sanitizer catching anything here since the
> kernel is built with -fno-strict-overflow, but regardless, I'll send a
> patch...
> 
> -Kees
> 
> -- 
> Kees Cook



Re: [selftests] e48d82b67a: BUG_TestSlub_RZ_alloc(Not_tainted):Redzone_overwritten

2021-03-21 Thread Oliver Sang
Hi Vlastimil,

On Wed, Mar 17, 2021 at 12:29:40PM +0100, Vlastimil Babka wrote:
> On 3/17/21 9:36 AM, kernel test robot wrote:
> > 
> > 
> > Greeting,
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: e48d82b67a2b760eedf7b95ca15f41267496386c ("[PATCH 1/2] selftests: 
> > add a kselftest for SLUB debugging functionality")
> > url: 
> > https://github.com/0day-ci/linux/commits/glittao-gmail-com/selftests-add-a-kselftest-for-SLUB-debugging-functionality/20210316-204257
> > base: 
> > https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next
> > 
> > in testcase: trinity
> > version: trinity-static-i386-x86_64-f93256fb_2019-08-28
> > with following parameters:
> > 
> > group: group-04
> > 
> > test-description: Trinity is a linux system call fuzz tester.
> > test-url: http://codemonkey.org.uk/projects/trinity/
> > 
> > 
> > on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire 
> > log/backtrace):
> > 
> > 
> > +---+---++
> > |   
> > | v5.12-rc2 | e48d82b67a |
> > +---+---++
> > | BUG_TestSlub_RZ_alloc(Not_tainted):Redzone_overwritten
> > | 0 | 69 |
> > | INFO:0x(ptrval)-0x(ptrval)@offset=#.First_byte#instead_of 
> > | 0 | 69 |
> > | INFO:Allocated_in_resiliency_test_age=#cpu=#pid=  
> > | 0 | 69 |
> > | INFO:Slab0x(ptrval)objects=#used=#fp=0x(ptrval)flags= 
> > | 0 | 69 |
> > | INFO:Object0x(ptrval)@offset=#fp=0x(ptrval)   
> > | 0 | 69 |
> > | BUG_TestSlub_next_ptr_free(Tainted:G_B):Freechain_corrupt 
> > | 0 | 69 |
> > | INFO:Freed_in_resiliency_test_age=#cpu=#pid=  
> > | 0 | 69 |
> > | 
> > BUG_TestSlub_next_ptr_free(Tainted:G_B):Wrong_object_count.Counter_is#but_counted_were
> > | 0 | 69 |
> > | BUG_TestSlub_next_ptr_free(Tainted:G_B):Redzone_overwritten   
> > | 0 | 69 |
> > | 
> > BUG_TestSlub_next_ptr_free(Tainted:G_B):Objects_remaining_in_TestSlub_next_ptr_free_on__kmem_cache_shutdown()
> >  | 0 | 69 |
> > | INFO:Object0x(ptrval)@offset= 
> > | 0 | 69 |
> > | BUG_TestSlub_1th_word_free(Tainted:G_B):Poison_overwritten
> > | 0 | 69 |
> > | BUG_TestSlub_50th_word_free(Tainted:G_B):Poison_overwritten   
> > | 0 | 69 |
> > | BUG_TestSlub_RZ_free(Tainted:G_B):Redzone_overwritten 
> > | 0 | 69 |
> > +---+---++
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot 
> > 
> > 
> > 
> > [   22.154049] random: get_random_u32 called from 
> > __kmem_cache_create+0x23/0x3e0 with crng_init=0 
> > [   22.154070] random: get_random_u32 called from 
> > cache_random_seq_create+0x7c/0x140 with crng_init=0 
> > [   22.154167] random: get_random_u32 called from allocate_slab+0x155/0x5e0 
> > with crng_init=0 
> > [   22.154690] test_slub: 1. kmem_cache: Clobber Redzone 0x12->0x(ptrval)
> > [   22.164499] 
> > =
> > [   22.166629] BUG TestSlub_RZ_alloc (Not tainted): Redzone overwritten
> > [   22.168179] 
> > -
> > [   22.168179]
> > [   22.168372] Disabling lock debugging due to kernel taint
> > [   22.168372] INFO: 0x(ptrval)-0x(ptrval) @offset=1064. First byte 0x12 
> > instead of 0xcc
> > [   22.168372] INFO: Allocated in resiliency_test+0x47/0x1be age=3 cpu=0 
> > pid=1 
> > [   22.168372] __slab_alloc+0x57/0x80 
> > [   22.168372] kmem_cache_alloc (kbuild/src/consumer/mm/slub.c:2871 
> > kbuild/src/consumer/mm/slub.c:2915 kbuild/src/consumer/mm/slub.c:2920) 
> > [   

Re: [mm] 8fd8d23ab1: WARNING:at_fs/buffer.c:#__brelse

2021-03-19 Thread Oliver Sang
Hi Minchan,

On Wed, Mar 17, 2021 at 09:29:38AM -0700, Minchan Kim wrote:
> On Wed, Mar 17, 2021 at 10:37:57AM +0800, kernel test robot wrote:
> > 
> > 
> > Greeting,
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: 8fd8d23ab10cc2fceeac25ea7b0e2eaf98e78d64 ("[PATCH v3 3/3] mm: fs: 
> > Invalidate BH LRU during page migration")
> > url: 
> > https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-replace-migrate_prep-with-lru_add_drain_all/20210311-001714
> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 
> > 144c79ef33536b4ecb4951e07dbc1f2b7fa99d32
> > 
> > in testcase: blktests
> > version: blktests-x86_64-a210761-1_20210124
> > with following parameters:
> > 
> > test: nbd-group-01
> > ucode: 0xe2
> > 
> > 
> > 
> > on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G 
> > memory
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire 
> > log/backtrace):
> > 
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot 
> > 
> > 
> > [   40.465061] WARNING: CPU: 2 PID: 5207 at fs/buffer.c:1177 __brelse 
> > (kbuild/src/consumer/fs/buffer.c:1177 kbuild/src/consumer/fs/buffer.c:1171) 
> > [   40.465066] Modules linked in: nbd loop xfs libcrc32c dm_multipath 
> > dm_mod ipmi_devintf ipmi_msghandler sd_mod t10_pi sg intel_rapl_msr 
> > intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel 
> > kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel 
> > ghash_clmulni_intel rapl i915 mei_wdt intel_cstate wmi_bmof intel_gtt 
> > drm_kms_helper syscopyarea ahci intel_uncore sysfillrect sysimgblt libahci 
> > fb_sys_fops drm libata mei_me mei intel_pch_thermal wmi video 
> > intel_pmc_core acpi_pad ip_tables
> > [   40.465086] CPU: 2 PID: 5207 Comm: mount_clear_soc Tainted: G  I 
> >   5.12.0-rc2-00062-g8fd8d23ab10c #1
> > [   40.465088] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 
> > 10/07/2015
> > [   40.465089] RIP: 0010:__brelse (kbuild/src/consumer/fs/buffer.c:1177 
> > kbuild/src/consumer/fs/buffer.c:1171) 
> > [ 40.465091] Code: 00 00 00 00 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 
> > 47 60 85 c0 74 05 f0 ff 4f 60 c3 48 c7 c7 d8 99 57 82 e8 02 5d 80 00 <0f> 
> > 0b c3 0f 1f 44 00 00 55 65 ff 05 13 79 c8 7e 53 48 c7 c3 c0 89
> 
> Hi,
> 
> Unfortunately, I couldn't set the lkp test in my local mahcine
> since installation failed(I guess my linux distribution is
> very minor)
> 
> Do you mind testing this patch? (Please replace the original
> patch with this one)

by replacing the original patch with below one, we confirmed the issue fixed. 
Thanks

> 
> From 23cfe5a8e939e2c077223e009887af8a0b5d6381 Mon Sep 17 00:00:00 2001
> From: Minchan Kim 
> Date: Tue, 2 Mar 2021 12:05:08 -0800
> Subject: [PATCH] mm: fs: Invalidate BH LRU during page migration
> 
> Pages containing buffer_heads that are in one of the per-CPU
> buffer_head LRU caches will be pinned and thus cannot be migrated.
> This can prevent CMA allocations from succeeding, which are often used
> on platforms with co-processors (such as a DSP) that can only use
> physically contiguous memory. It can also prevent memory
> hot-unplugging from succeeding, which involves migrating at least
> MIN_MEMORY_BLOCK_SIZE bytes of memory, which ranges from 8 MiB to 1
> GiB based on the architecture in use.
> 
> Correspondingly, invalidate the BH LRU caches before a migration
> starts and stop any buffer_head from being cached in the LRU caches,
> until migration has finished.
> 
> Signed-off-by: Chris Goldsworthy 
> Signed-off-by: Minchan Kim 
> ---
>  fs/buffer.c | 36 ++--
>  include/linux/buffer_head.h |  4 
>  mm/swap.c   |  5 -
>  3 files changed, 38 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 0cb7ffd4977c..e9872d0dcbf1 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1264,6 +1264,15 @@ static void bh_lru_install(struct buffer_head *bh)
>   int i;
>  
>   check_irqs_on();
> + /*
> +  * the refcount of buffer_head in bh_lru prevents dropping the
> +  * attached page(i.e., try_to_free_buffers) so it could cause
> +  * failing page migration.
> +  * Skip putting upcoming bh into bh_lru until migration is done.
> +  */
> + if (lru_cache_disabled())
> + return;
> +
>   bh_lru_lock();
>  
>   b = this_cpu_ptr(_lrus);
> @@ -1404,6 +1413,15 @@ __bread_gfp(struct block_device *bdev, sector_t block,
>  }
>  EXPORT_SYMBOL(__bread_gfp);
>  
> +static void __invalidate_bh_lrus(struct bh_lru *b)
> +{
> + int i;
> +
> + for (i = 0; i < BH_LRU_SIZE; i++) {
> + brelse(b->bhs[i]);
> + b->bhs[i] = NULL;
> + }
> +}
>  /*
>   * invalidate_bh_lrus() is called rarely - but not only at unmount.
>   * This doesn't race because it runs in each cpu either in irq
> @@ -1412,16 +1430,12 

Re: [vdpa_sim_net] 79991caf52: net/ipv4/ipmr.c:#RCU-list_traversed_in_non-reader_section

2021-03-18 Thread Oliver Sang
; | 0  | 1  |
> >> | Kernel_panic-not_syncing:Fatal_exception
> >> | 0  | 1  |
> >> | net/ipv4/ipmr.c:#RCU-list_traversed_in_non-reader_section   
> >> | 0  | 8  |
> >> | RIP:arch_local_irq_restore  
> >> | 0  | 1  |
> >> | RIP:idr_get_free
> >> | 0  | 1  |
> >> | net/ipv6/ip6mr.c:#RCU-list_traversed_in_non-reader_section  
> >> | 0  | 2  |
> >> +-+++
> >>
> >>
> >> If you fix the issue, kindly add following tag
> >> Reported-by: kernel test robot 
> >>
> >>
> >> [  890.196279] =
> >> [  890.212608] WARNING: suspicious RCU usage
> >> [  890.228281] 5.11.0-rc4-8-g79991caf5202 #1 Tainted: GW
> >> [  890.244087] -
> >> [  890.259417] net/ipv4/ipmr.c:138 RCU-list traversed in non-reader 
> >> section!!
> >> [  890.275043]
> >> [  890.275043] other info that might help us debug this:
> >> [  890.275043]
> >> [  890.318497]
> >> [  890.318497] rcu_scheduler_active = 2, debug_locks = 1
> >> [  890.346089] 2 locks held by trinity-c1/2476:
> >> [  890.360897]  #0: 888149d6f400 (>f_pos_lock){+.+.}-{3:3}, at: 
> >> __fdget_pos+0xc0/0xe0
> >> [  890.375165]  #1: 8881cabfd5c8 (>lock){+.+.}-{3:3}, at: 
> >> seq_read_iter+0xa0/0x9c0
> >> [  890.389706]
> >> [  890.389706] stack backtrace:
> >> [  890.416375] CPU: 1 PID: 2476 Comm: trinity-c1 Tainted: GW   
> >>   5.11.0-rc4-8-g79991caf5202 #1
> >> [  890.430706] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> >> 1.12.0-1 04/01/2014
> >> [  890.444971] Call Trace:
> >> [  890.458554]  dump_stack+0x15f/0x1bf
> >> [  890.471996]  ipmr_get_table+0x140/0x160
> >> [  890.485328]  ipmr_vif_seq_start+0x4d/0xe0
> >> [  890.498620]  seq_read_iter+0x1b2/0x9c0
> >> [  890.511469]  ? kvm_sched_clock_read+0x14/0x40
> >> [  890.524008]  ? sched_clock+0x1b/0x40
> >> [  890.536095]  ? iov_iter_init+0x7c/0xa0
> >> [  890.548028]  seq_read+0x2fd/0x3e0
> >> [  890.559948]  ? seq_hlist_next_percpu+0x140/0x140
> >> [  890.572204]  ? should_fail+0x78/0x2a0
> >> [  890.584189]  ? write_comp_data+0x2a/0xa0
> >> [  890.596235]  ? __sanitizer_cov_trace_pc+0x1d/0x60
> >> [  890.608134]  ? seq_hlist_next_percpu+0x140/0x140
> >> [  890.620042]  proc_reg_read+0x14e/0x180
> >> [  890.631585]  do_iter_read+0x397/0x420
> >> [  890.642843]  vfs_readv+0xf5/0x160
> >> [  890.653833]  ? vfs_iter_read+0x80/0x80
> >> [  890.664229]  ? __fdget_pos+0xc0/0xe0
> >> [  890.674236]  ? pvclock_clocksource_read+0xd9/0x1a0
> >> [  890.684259]  ? kvm_sched_clock_read+0x14/0x40
> >> [  890.693852]  ? sched_clock+0x1b/0x40
> >> [  890.702898]  ? sched_clock_cpu+0x18/0x120
> >> [  890.711648]  ? write_comp_data+0x2a/0xa0
> >> [  890.720243]  ? __sanitizer_cov_trace_pc+0x1d/0x60
> >> [  890.729290]  do_readv+0x111/0x260
> >> [  890.738205]  ? vfs_readv+0x160/0x160
> >> [  890.747154]  ? lockdep_hardirqs_on+0x77/0x100
> >> [  890.756100]  ? syscall_enter_from_user_mode+0x8a/0x100
> >> [  890.765126]  do_syscall_64+0x34/0x80
> >> [  890.773795]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [  890.782630] RIP: 0033:0x453b29
> >> [  890.791189] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 
> >> 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 
> >> <48> 3d 01 f0 ff ff 0f 83 3b 84 00 00 c3 66 2e 0f 1f 84 00 00 00 00
> >> [  890.810866] RSP: 002b:7ffcda44fb18 EFLAGS: 0246 ORIG_RAX: 
> >> 0013
> >> [  890.820764] RAX: ffda RBX: 0013 RCX: 
> >> 00453b29
> >> [  890.830792] RDX: 009a RSI: 01de1c00 RDI: 
> >> 00b9
> >> [  890.840626] RBP: 7ffcda44fbc0 R08: 722c279d69ffc468 R09: 
> >> 0400
> >> [  890.850366] R10: 0098d82a42c63c22 R11: 0246 R12: 
> >> 0002
> >> [  890.860001] R13: 7f042ae6f058 R14: 010a2830 R15: 
> >> 7f042ae6f000
> >>
> >>
> >>
> >> To reproduce:
> >>
> >> # build kernel
> >>cd linux
> >>cp config-5.11.0-rc4-8-g79991caf5202 .config
> >>make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare 
> >> modules_prepare bzImage
> >>
> >> git clone 
> >> https://urldefense.com/v3/__https://github.com/intel/lkp-tests.git__;!!GqivPVa7Brio!LfgrgVVtPAjwjqTZX8yANgsix4f3cJmAA_CcMeCVymh5XYcamWdR9dnbIQA-Qkr9TyI$
> >>  
> >> cd lkp-tests
> >> bin/lkp qemu -k  job-script # job-script is attached in 
> >> this email
> >>
> >>
> >>
> >> Thanks,
> >> Oliver Sang
> >>
> 


Re: [btrfs] 5297199a8b: xfstests.btrfs.220.fail

2021-03-17 Thread Oliver Sang
Hi Nikolay,

On Tue, Mar 09, 2021 at 10:36:52AM +0200, Nikolay Borisov wrote:
> 
> 
> On 9.03.21 г. 10:49 ч., kernel test robot wrote:
> > 
> > 
> > Greeting,
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: 5297199a8bca12b8b96afcbf2341605efb6005de ("btrfs: remove inode 
> > number cache feature")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > 
> > in testcase: xfstests
> > version: xfstests-x86_64-d41dcbd-1_20201218
> > with following parameters:
> > 
> > disk: 6HDD
> > fs: btrfs
> > test: btrfs-group-22
> > ucode: 0x28
> > 
> > test-description: xfstests is a regression test suite for xfs and other 
> > files ystems.
> > test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> > 
> > 
> > on test machine: 8 threads Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 8G 
> > memory
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire 
> > log/backtrace):
> > 
> > 
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot 
> > 
> > 2021-03-09 04:13:26 export TEST_DIR=/fs/sdb1
> > 2021-03-09 04:13:26 export TEST_DEV=/dev/sdb1
> > 2021-03-09 04:13:26 export FSTYP=btrfs
> > 2021-03-09 04:13:26 export SCRATCH_MNT=/fs/scratch
> > 2021-03-09 04:13:26 mkdir /fs/scratch -p
> > 2021-03-09 04:13:26 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 
> > /dev/sdb5 /dev/sdb6"
> > 2021-03-09 04:13:26 sed "s:^:btrfs/:" 
> > //lkp/benchmarks/xfstests/tests/btrfs-group-22
> > 2021-03-09 04:13:26 ./check btrfs/220 btrfs/221 btrfs/222 btrfs/223 
> > btrfs/224 btrfs/225 btrfs/226 btrfs/227
> > FSTYP -- btrfs
> > PLATFORM  -- Linux/x86_64 lkp-hsw-d01 5.10.0-rc7-00162-g5297199a8bca #1 
> > SMP Sat Feb 27 21:06:26 CST 2021
> > MKFS_OPTIONS  -- /dev/sdb2
> > MOUNT_OPTIONS -- /dev/sdb2 /fs/scratch
> > 
> > btrfs/220   - output mismatch (see 
> > /lkp/benchmarks/xfstests/results//btrfs/220.out.bad)
> > --- tests/btrfs/220.out 2021-01-14 07:40:58.0 +
> > +++ /lkp/benchmarks/xfstests/results//btrfs/220.out.bad 2021-03-09 
> > 04:13:32.880794446 +
> > @@ -1,2 +1,3 @@
> >  QA output created by 220
> > +Unexepcted mount options, checking for 
> > 'inode_cache,relatime,space_cache,subvol=/,subvolid=5' in 
> > 'rw,relatime,space_cache,subvolid=5,subvol=/' using 'inode_cache'
> 
> 
> Given that the commit removes the inode_cache feature that's expected, I
> assume you need to adjust your fstests configuration to not use
> inode_cache.

Thanks for information, we will change test options accordingly.



Re: [mm/highmem] 61b205f579: WARNING:at_mm/highmem.c:#__kmap_local_sched_out

2021-03-11 Thread Oliver Sang
Hi Ira,

On Thu, Mar 11, 2021 at 08:02:20AM -0800, Ira Weiny wrote:
> On Tue, Mar 09, 2021 at 08:53:04PM +, Chaitanya Kulkarni wrote:
> > Ira,
> > 
> > On 3/4/21 00:23, kernel test robot wrote:
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with gcc-9):
> > >
> > > commit: 61b205f579911a11f0b576f73275eca2aed0d108 ("mm/highmem: Convert 
> > > memcpy_[to|from]_page() to kmap_local_page()")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > >
> > > in testcase: trinity
> > > version: trinity-static-i386-x86_64-f93256fb_2019-08-28
> > > with following parameters:
> > >
> > >   runtime: 300s
> > >
> > > test-description: Trinity is a linux system call fuzz tester.
> > > test-url: http://codemonkey.org.uk/projects/trinity/
> > >
> > >
> > > on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 
> > > 8G
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire 
> > > log/backtrace):
> > 
> > Is the fix for this been posted yet ?
> 
> No.  I've been unable to reproduce it yet.

just FYI
the issue does not always happen but the rate on 61b205f579 is not low,
while we didn't observe it happen on parent commit.

bb90d4bc7b6a536b 61b205f579911a11f0b576f7327
 ---
   fail:runs  %reproductionfail:runs
   | | |
   :38  16%   6:38dmesg.EIP:__kmap_local_sched_in
   :38  16%   6:38dmesg.EIP:__kmap_local_sched_out
   :38  16%   6:38
dmesg.WARNING:at_mm/highmem.c:#__kmap_local_sched_in
   :38  16%   6:38
dmesg.WARNING:at_mm/highmem.c:#__kmap_local_sched_out

also please permit me to quote our internal analysis by Zhengjun (cced)
(Thanks a lot, Zhengjun)

"the commit has the potential to cause the issue.
It replaces " kmap_atomic" to " kmap_local_page".

Most of the two API is the same, except for " kmap_atomic" disable preemption 
and cannot sleep.
I check the issue happened when there is a preemption,  in FBC " 
kmap_local_page",
the  preemption is enabled,  the issue may happen."
"

> 
> Ira
> 
> > 
> > (asking since I didn't see the fix and my mailer is dropping emails from
> >  lkml).


Re: [tcp] 9d9b1ee0b2: packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4-mapped-v6.fail

2021-02-25 Thread Oliver Sang
Hi, Neal,

On Wed, Feb 24, 2021 at 10:13:02PM +0800, Oliver Sang wrote:
> Hi, Neal,
> 
> On Fri, Feb 19, 2021 at 09:52:04AM -0500, Neal Cardwell wrote:
> > On Thu, Feb 18, 2021 at 8:33 PM kernel test robot  
> > wrote:
> > >
> > >
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with gcc-9):
> > >
> > > commit: 9d9b1ee0b2d1c9e02b2338c4a4b0a062d2d3edac ("tcp: fix 
> > > TCP_USER_TIMEOUT with zero window")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > I have pushed to the packetdrill repo a commit that should fix this test:
> > 
> > 094da3bc77e5 (HEAD, packetdrill/master) net-test: update TCP tests for
> > USER_TIMEOUT ZWP fix
> > https://github.com/google/packetdrill/commit/094da3bc77e518d820ebc0ef8b94a5b4cf707a39
> > 
> > Can someone please pull that commit into the repo used by the test
> > bot, if needed? (Or does it automatically use the latest packetdrill
> > master branch?)
> 
> We updated our tool to use this latest packetdrill. seems improved, but not 
> totally fix.
> 
> before upgrading, we have:
> b889c7c8c02ebb0b 9d9b1ee0b2d1c9e02b2338c4a4b
>  ---
>fail:runs  %reproductionfail:runs
>| | |
>:6  100%   6:6 
> packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4-mapped-v6.fail
>:6  100%   6:6 
> packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4.fail
> 
> after upgrading, we have:
> b889c7c8c02ebb0b 9d9b1ee0b2d1c9e02b2338c4a4b
>  ---
>fail:runs  %reproductionfail:runs
>| | |
>:6  100%   5:6 
> packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4-mapped-v6.fail
>:6  100%   3:6 
> packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4.fail
> 
> 
> attached kmsg.xz and packetdrill from one run where both tests failed.

here is an update. we did't re-test parent with latest packetdrill yesterday,
so above results about b889c7c8c02ebb0b are still from old version packetdrill.

today, we did further tests based on latest packetdrill, and found the tests
always failed upon b889c7c8c02ebb0b. not sure if a kernel before your commit
(9d9b1ee0b2d1c9e02b2338c4a4b) is still valid to run latest packetdrill?

attached kmsg and test log from latest packetdrill upon parent commit FYI.


> 
> 
> > 
> > thanks,
> > neal


> Running packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt ...
> 2021-02-24 08:46:09 packetdrill/packetdrill --tolerance_usecs=4 
> packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt
> packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt:25: error handling 
> packet: live packet payload: expected 1000 bytes vs actual 2000 bytes
> packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt failed
> Running packetdrill/tests/linux/fast_retransmit/fr-4pkt-sack-linux.pkt ...
> 2021-02-24 08:46:10 packetdrill/packetdrill --tolerance_usecs=4 
> packetdrill/tests/linux/fast_retransmit/fr-4pkt-sack-linux.pkt
> packetdrill/tests/linux/fast_retransmit/fr-4pkt-sack-linux.pkt pass
> Running packetdrill/tests/linux/packetdrill/socket_err.pkt ...
> 2021-02-24 08:46:10 packetdrill/packetdrill --tolerance_usecs=4 
> packetdrill/tests/linux/packetdrill/socket_err.pkt
> packetdrill/tests/linux/packetdrill/socket_err.pkt:6: runtime error in socket 
> call: Expected non-negative result but got -1 with errno 93 (Protocol not 
> supported)
> packetdrill/tests/linux/packetdrill/socket_err.pkt failed
> Running packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt ...
> 2021-02-24 08:46:10 packetdrill/packetdrill --tolerance_usecs=4 
> packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt
> packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt:4: runtime error in 
> socket call: Expected result -99 but got -1 with errno 93 (Protocol not 
> supported)
> packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt failed
> OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-accept.pkt 
> (ipv4)]
> stdout: 
> stderr: 
> OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-accept.pkt 
> (ipv6)]
> stdout: 
> stderr: 
> OK   
> [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-connect.pkt 
> (ipv4-mapped-v6)]
> stdout: 
> stderr: 
> OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-read.pkt 
> 

Re: [tcp] 9d9b1ee0b2: packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4-mapped-v6.fail

2021-02-24 Thread Oliver Sang
Hi, Neal,

On Fri, Feb 19, 2021 at 09:52:04AM -0500, Neal Cardwell wrote:
> On Thu, Feb 18, 2021 at 8:33 PM kernel test robot  
> wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: 9d9b1ee0b2d1c9e02b2338c4a4b0a062d2d3edac ("tcp: fix 
> > TCP_USER_TIMEOUT with zero window")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> I have pushed to the packetdrill repo a commit that should fix this test:
> 
> 094da3bc77e5 (HEAD, packetdrill/master) net-test: update TCP tests for
> USER_TIMEOUT ZWP fix
> https://github.com/google/packetdrill/commit/094da3bc77e518d820ebc0ef8b94a5b4cf707a39
> 
> Can someone please pull that commit into the repo used by the test
> bot, if needed? (Or does it automatically use the latest packetdrill
> master branch?)

We updated our tool to use this latest packetdrill. seems improved, but not 
totally fix.

before upgrading, we have:
b889c7c8c02ebb0b 9d9b1ee0b2d1c9e02b2338c4a4b
 ---
   fail:runs  %reproductionfail:runs
   | | |
   :6  100%   6:6 
packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4-mapped-v6.fail
   :6  100%   6:6 
packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4.fail

after upgrading, we have:
b889c7c8c02ebb0b 9d9b1ee0b2d1c9e02b2338c4a4b
 ---
   fail:runs  %reproductionfail:runs
   | | |
   :6  100%   5:6 
packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4-mapped-v6.fail
   :6  100%   3:6 
packetdrill.packetdrill/gtests/net/tcp/user_timeout/user-timeout-probe_ipv4.fail


attached kmsg.xz and packetdrill from one run where both tests failed.


> 
> thanks,
> neal


kmsg.xz
Description: application/xz
Running packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt ...
2021-02-24 08:46:09 packetdrill/packetdrill --tolerance_usecs=4 
packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt
packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt:25: error handling 
packet: live packet payload: expected 1000 bytes vs actual 2000 bytes
packetdrill/tests/bsd/fast_retransmit/fr-4pkt-sack-bsd.pkt failed
Running packetdrill/tests/linux/fast_retransmit/fr-4pkt-sack-linux.pkt ...
2021-02-24 08:46:10 packetdrill/packetdrill --tolerance_usecs=4 
packetdrill/tests/linux/fast_retransmit/fr-4pkt-sack-linux.pkt
packetdrill/tests/linux/fast_retransmit/fr-4pkt-sack-linux.pkt pass
Running packetdrill/tests/linux/packetdrill/socket_err.pkt ...
2021-02-24 08:46:10 packetdrill/packetdrill --tolerance_usecs=4 
packetdrill/tests/linux/packetdrill/socket_err.pkt
packetdrill/tests/linux/packetdrill/socket_err.pkt:6: runtime error in socket 
call: Expected non-negative result but got -1 with errno 93 (Protocol not 
supported)
packetdrill/tests/linux/packetdrill/socket_err.pkt failed
Running packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt ...
2021-02-24 08:46:10 packetdrill/packetdrill --tolerance_usecs=4 
packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt
packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt:4: runtime error in 
socket call: Expected result -99 but got -1 with errno 93 (Protocol not 
supported)
packetdrill/tests/linux/packetdrill/socket_wrong_err.pkt failed
OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-accept.pkt 
(ipv4)]
stdout: 
stderr: 
OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-accept.pkt 
(ipv6)]
stdout: 
stderr: 
OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-connect.pkt 
(ipv4-mapped-v6)]
stdout: 
stderr: 
OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-read.pkt 
(ipv4)]
stdout: 
stderr: 
OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-read.pkt 
(ipv6)]
stdout: 
stderr: 
OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/blocking/blocking-write.pkt 
(ipv4-mapped-v6)]
stdout: 
stderr: 
OK   
[/lkp/benchmarks/packetdrill/gtests/net/tcp/close/close-local-close-then-remote-fin.pkt
 (ipv4)]
stdout: 
stderr: 
OK   
[/lkp/benchmarks/packetdrill/gtests/net/tcp/close/close-local-close-then-remote-fin.pkt
 (ipv6)]
stdout: 
stderr: 
OK   [/lkp/benchmarks/packetdrill/gtests/net/tcp/close/close-on-syn-sent.pkt 
(ipv4-mapped-v6)]
stdout: 
stderr: 
OK   
[/lkp/benchmarks/packetdrill/gtests/net/tcp/close/close-remote-fin-then-close.pkt
 (ipv4)]
stdout: 
stderr: 
OK   
[/lkp/benchmarks/packetdrill/gtests/net/tcp/close/close-remote-fin-then-close.pkt
 (ipv6)]
stdout: 
stderr: 
OK   
[/lkp/benchmarks/packetdrill/gtests/net/tcp/cwnd_moderation/cwnd-moderation-disorder-no-moderation.pkt
 (ipv4-mapped-v6)]
stdout: 
stderr: 
FAIL 

Re: [binfmt_elf] d97e11e25d: ltp.DS000.fail

2021-01-28 Thread Oliver Sang
On Tue, Jan 26, 2021 at 09:03:26AM +0100, Geert Uytterhoeven wrote:
> Hi Oliver,
> 
> On Tue, Jan 26, 2021 at 6:35 AM kernel test robot  
> wrote:
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: d97e11e25dd226c44257284f95494bb06d1ebf5a ("[PATCH v2] binfmt_elf: 
> > Fix fill_prstatus() call in fill_note_info()")
> > url: 
> > https://github.com/0day-ci/linux/commits/Geert-Uytterhoeven/binfmt_elf-Fix-fill_prstatus-call-in-fill_note_info/20210106-155236
> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 
> > e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
> 
> My patch (which you applied on top of v5.11-rc2) is a build fix for
> a commit that is not part of v5.11-rc2.  Hence the test run is invalid.

sorry for false report. we've fixed the problem. Thanks

> 
> Gr{oetje,eeting}s,
> 
> Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-20 Thread Oliver Sang
On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote:
> Thu, 14 Jan 2021 15:45:11 +0800
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break 
> > affinity initiatively")
> > https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
> > dev.2021.01.11b
> > 
> [...]
> > 
> > [   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 
> > process_one_work
> 
> Thanks for your report.
> 
> We can also break CPU affinity by checking POOL_DISASSOCIATED at attach 
> time without extra cost paid; that way we have the same behavior as at
> the unbind time.
> 
> What is more the change that makes kworker pcpu is cut because they are
> going to not help either hotplug or the mechanism of stop machine.

hi, by applying below patch, the issue still happened.

[ 4.574467] pci :00:00.0: Limiting direct PCI/PCI transfers
[ 4.575651] pci :00:01.0: Activating ISA DMA hang workarounds
[ 4.576900] pci :00:02.0: Video device with shadowed ROM at [mem 
0x000c-0x000d]
[ 4.578648] PCI: CLS 0 bytes, default 64
[ 4.579685] Unpacking initramfs...
[ 8.878031] ---[ cut here ]---
[ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 
process_one_work+0x92/0x9e0
[ 8.880688] Modules linked in:
[ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 
5.11.0-rc3-gc213503139bb #2
[ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[ 8.887539] Workqueue: 0x0 (events)
[ 8.887838] EIP: process_one_work+0x92/0x9e0
[ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 
04 24 01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 
00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 8.887838] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001
[ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
[ 8.887838] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
[ 8.887838] CR0: 80050033 CR2: 00000000 CR3: 034e3000 CR4: 000406d0
[ 8.887838] Call Trace:
[ 8.887838] ? worker_thread+0x98/0x6a0
[ 8.887838] ? worker_thread+0x2dd/0x6a0
[ 8.887838] ? kthread+0x1ba/0x1e0
[ 8.887838] ? create_worker+0x1e0/0x1e0
[ 8.887838] ? kzalloc+0x20/0x20
[ 8.887838] ? ret_from_fork+0x1c/0x28
[ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
[ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with 
crng_init=0
[ 8.887838] ---[ end trace ac461b4d54c37cfa ]---
[ 11.287055] Freeing initrd memory: 174228K
[ 11.289225] RAPL PMU: API unit is 2^-32 Joules, 0 fixed counters, 10737418240 
ms ovfl timer
[ 11.290889] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 
0x26d34b60feb, max_idle_ns: 440795225049 ns
[ 11.292884] mce: Machine check injector initialized
[ 11.313019] The force parameter has not been set to 1. The Iris poweroff 
handler will not be installed.

> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1847,22 +1847,17 @@ static void worker_attach_to_pool(struct
>  struct worker_pool *pool)
>  {
>   mutex_lock(&wq_pool_attach_mutex);
> -
> - /*
> -  * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> -  * online CPUs.  It'll be re-applied when any of the CPUs come up.
> -  */
> - set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> -
>   /*
>* The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>* stable across this function.  See the comments above the flag
>* definition for details.
>*/
> - if (pool->flags & POOL_DISASSOCIATED)
> + if (pool->flags & POOL_DISASSOCIATED) {
>   worker->flags |= WORKER_UNBOUND;
> - else
> - kthread_set_per_cpu(worker->task, true);
> + set_cpus_allowed_ptr(worker->task, cpu_possible_mask);
> + } else {
> + set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> + }
>  
>   list_add_tail(&worker->node, &pool->workers);
>   worker->pool = pool;
> @@ -4922,7 +4917,6 @@ static void unbind_workers(int cpu)
>   raw_spin_unlock_irq(&pool->lock);
>  
>   for_each_pool_worker(worker, pool) {
> - kthread_set_per_cpu(worker->task, false);
>   WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> cpu_possible_mask) < 0);
>   }
>  
> @@ -4979,7 +4973,6 @@ static void rebind_workers(struct worker
>   for_each_pool_worker(worker, pool) {
>   WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
> pool->attrs->cpumask) < 0);
> - kthread_set_per_cpu(worker->task, true);
>   }
>  
>   raw_spin_lock_irq(&pool->lock);
> --


dmesg-2.xz
Description: application/xz


Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

2021-01-20 Thread Oliver Sang
On Thu, Jan 14, 2021 at 04:42:48PM +0800, Hillf Danton wrote:
> Thu, 14 Jan 2021 15:45:11 +0800
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break 
> > affinity initiatively")
> > https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git 
> > dev.2021.01.11b
> > 
> > 
> > in testcase: rcutorture
> > version: 
> > with following parameters:
> > 
> > runtime: 300s
> > test: cpuhotplug
> > torture_type: srcud
> > 
> > test-description: rcutorture is rcutorture kernel module load/unload test.
> > test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> > 
> > 
> > on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire 
> > log/backtrace):
> > 
> > 
> > +--------------------------------------------------+------------+------------+
> > |                                                  | 6211b34f6e | d5bff968ea |
> > +--------------------------------------------------+------------+------------+
> > | boot_successes                                   | 4          | 0          |
> > | boot_failures                                    | 0          | 12         |
> > | WARNING:at_kernel/workqueue.c:#process_one_work  | 0          | 12         |
> > | EIP:process_one_work                             | 0          | 12         |
> > | WARNING:at_kernel/kthread.c:#kthread_set_per_cpu | 0          | 4          |
> > | EIP:kthread_set_per_cpu                          | 0          | 4          |
> > +--------------------------------------------------+------------+------------+
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot 
> > 
> > 
> > [   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 
> > process_one_work (kbuild/src/consumer/kernel/workqueue.c:2192) 
> > [   73.795012] Modules linked in: rcutorture torture mousedev evbug 
> > input_leds led_class psmouse pcspkr tiny_power_button button
> > [   73.795949] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 
> > 5.11.0-rc3-gd5bff968ea9c #2
> > [   73.796592] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > 1.12.0-1 04/01/2014
> > [   73.797280] Workqueue:  0x0 (rcu_gp)
> > [   73.797592] EIP: process_one_work 
> > (kbuild/src/consumer/kernel/workqueue.c:2192) 
> 
> 
> Can you run the reproducer with the changes to WQ cut?

hi, after applying the patch below, the issue still happened. The detailed dmesg is attached.

[ 2.505530] TCP: Hash tables configured (established 32768 bind 32768)
[ 2.506668] ---[ cut here ]---
[ 2.508080] WARNING: CPU: 0 PID: 23 at kernel/workqueue.c:2190 
process_one_work+0x92/0x9e0
[ 2.509963] Modules linked in:
[ 2.510692] CPU: 0 PID: 23 Comm: kworker/1:0H Not tainted 
5.11.0-rc3-00186-ge7792535d216 #2
[ 2.512608] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[ 2.514499] EIP: process_one_work+0x92/0x9e0
[ 2.515468] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 
04 24 01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 
00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 2.516539] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001
[ 2.516539] ESI: 43c04780 EDI: de7eb3ec EBP: de7f25e0 ESP: 43d83f08
[ 2.516539] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
[ 2.516539] CR0: 80050033 CR2: 00000000 CR3: 034e3000 CR4: 000406d0
[ 2.516539] Call Trace:
[ 2.516539] ? rcuwait_wake_up+0x53/0x80
[ 2.516539] ? rcuwait_wake_up+0x5/0x80
[ 2.516539] ? worker_thread+0x2dd/0x6a0
[ 2.516539] ? kthread+0x1ba/0x1e0
[ 2.516539] ? create_worker+0x1e0/0x1e0
[ 2.516539] ? kzalloc+0x20/0x20
[ 2.516539] ? ret_from_fork+0x1c/0x28
[ 2.516539] ---[ end trace 71c162214dd31179 ]---
[ 2.534063] UDP hash table entries: 2048 (order: 5, 196608 bytes, linear)
[ 2.535774] UDP-Lite hash table entries: 2048 (order: 5, 196608 bytes, linear)
[ 2.537661] NET: Registered protocol family 1

> 
> It seems odd to make kworkers per-CPU, because they are not going to
> help either hotplug or stop-machine. If this quiesces the warning, then
> we have a fresh start for breaking CPU affinity.
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1861,8 +1861,6 @@ static void worker_attach_to_pool(struct
>*/
>   if (pool->flags & POOL_DISASSOCIATED)
>   worker->flags |= WORKER_UNBOUND;
> - else
> - kthread_set_per_cpu(worker->task, true);
>  
>   list_add_tail(&worker->node, &pool->workers);
>   worker->pool = pool;
> @@ -4922,7 +4920,6 @@ static void unbind_workers(int cpu)
>   raw_spin_unlock_irq(&pool->lock);
>  
>   for_each_pool_worker(worker, pool) {
> - kthread_set_per_cpu(worker->task, false);
>   WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> cpu_possible_mask) < 0);
>

Re: [proc/wchan] 30a3a19273: leaking-addresses.proc.wchan./proc/bus/input/devices:B:KEY=1000000000007ff980000000007fffebeffdfffeffffffffffffffffffffe

2021-01-04 Thread Oliver Sang
On Sun, Jan 03, 2021 at 07:25:36PM +0100, Helge Deller wrote:
> On 1/3/21 3:27 PM, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: 30a3a192730a997bc4afff5765254175b6fb64f3 ("[PATCH] proc/wchan: Use 
> > printk format instead of lookup_symbol_name()")
> > url: 
> > https://github.com/0day-ci/linux/commits/Helge-Deller/proc-wchan-Use-printk-format-instead-of-lookup_symbol_name/20201218-010048
> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 
> > 09162bc32c880a791c6c0668ce0745cf7958f576
> >
> > in testcase: leaking-addresses
> > version: leaking-addresses-x86_64-4f19048-1_2020
> > with following parameters:
> >
> > ucode: 0xde
> >
> >
> >
> > on test machine: 4 threads Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz with 
> > 32G memory
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire 
> > log/backtrace):
> 
> I don't see anything wrong with the wchan patch 
> (30a3a192730a997bc4afff5765254175b6fb64f3),
> or that it could have leaked anything.
> 
> Maybe the kernel test robot picked up the wchan patch by mistake ?

thanks for the information. We will look into this and fix the robot if there is any problem.

> 
> Helge
> 
> 
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot 
> >
> > 2021-01-01 01:52:25 ./leaking_addresses.pl --output-raw result/scan.out
> > 2021-01-01 01:52:49 ./leaking_addresses.pl --input-raw result/scan.out 
> > --squash-by-filename
> >
> > Total number of results from scan (incl dmesg): 156538
> >
> > dmesg output:
> > [0.058490] mapped IOAPIC to ff5fb000 (fec0)
> >
> > Results squashed by filename (excl dmesg). Displaying [ 
> > ], 
> > [1 _error_injection_whitelist] 0xc0a254b0
> > [25 __bug_table] 0xc01e0070
> > [46 .orc_unwind_ip] 0xc009f3a0
> > [6 __tracepoints_strings] 0xc027d7d0
> > [50 .strtab] 0xc00b9b88
> > [1 .rodata.cst16.mask2] 0xc00a70e0
> > [1 key] 10007 ff9807ff febeffdfffef fffe
> > [50 .note.Linux] 0xc009f024
> > [41 .data] 0xc00a1000
> > [6 .static_call.text] 0xc0274b44
> > [1 _ftrace_eval_map] 0xc0a20148
> > [10 .data.once] 0xc04475b4
> > [7 .static_call_sites] 0xc0a22088
> > [6 __tracepoints_ptrs] 0xc027d7bc
> > [7 .fixup] 0xc00852ea
> > [49 __mcount_loc] 0xc009f03c
> > [19 __param] 0xc009f378
> > [38 .rodata.str1.8] 0xc009f170
> > [1 ___srcu_struct_ptrs] 0xc0355000
> > [14 .altinstr_replacement] 0xc04349ca
> > [154936 kallsyms] 8100 T startup_64
> > [50 .gnu.linkonce.this_module] 0xc00a1140
> > [24 __ksymtab_strings] 0xc00e2048
> > [31 .bss] 0xc00a1500
> > [42 .rodata.str1.1] 0xc009f09c
> > [9 .init.rodata] 0xc00b8000
> > [11 __ex_table] 0xc00bd128
> > [14 .parainstructions] 0xc03b5d88
> > [6 __tracepoints] 0xc02818c0
> > [1 .rodata.cst16.mask1] 0xc00a70d0
> > [18 __dyndbg] 0xc00a10c8
> > [5 .altinstr_aux] 0xc0143a49
> > [22 .smp_locks] 0xc009f094
> > [2 .rodata.cst16.bswap_mask] 0xc005e070
> > [40 .init.text] 0xc00b7000
> > [4 .init.data] 0xc00e7000
> > [10 .data..read_mostly] 0xc00a1100
> > [14 .altinstructions] 0xc0446846
> > [6 __bpf_raw_tp_map] 0xc0281720
> > [50 .note.gnu.build-id] 0xc009f000
> > [6 _ftrace_events] 0xc0281780
> > [140 printk_formats] 0x82341767 : "CPU_ON"
> > [25 __jump_table] 0xc00a
> > [37 .exit.text] 0xc009ec70
> > [50 .text] 0xc009e000
> > [35 .text.unlikely] 0xc009ebaf
> > [18 __ksymtab] 0xc00e203c
> > [46 .orc_unwind] 0xc009f544
> > [1 .data..cacheline_aligned] 0xc081d8c0
> > [2 .noinstr.text] 0xc04b8d00
> > [1 uevent] KEY=10007 ff9807ff febeffdfffef 
> > fffe
> > [50 modules] netconsole 20480 0 - Live 0xc00cb000
> > [337 blacklist] 0x81c00880-0x81c008a0   asm_exc_overflow
> > [1 .rodata.cst32.byteshift_table] 0xc00a7100
> > [2 wchan] 0xc93c/proc/bus/input/devices: B: KEY=10007 
> > ff9807ff febeffdfffef fffe
> > [6 .ref.data] 0xc02817a0
> > [14 __ksymtab_gpl] 0xc03b503c
> > [42 .rodata] 0xc009f2c0
> > [50 .symtab] 0xc00b9000
> >
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp install job.yaml  # job file is attached in this email
> > bin/lkp run job.yaml
> >
> >
> >
> > Thanks,
> > Oliver Sang
> >
> 


Re: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression

2020-12-07 Thread Oliver Sang
Hi David,

On Fri, Dec 04, 2020 at 11:51:48AM +, David Howells wrote:
> kernel test robot  wrote:
> 
> > FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to 
> > commit:
> > 
> > 
> > commit: 9bd0e337c633aed3e8ec3c7397b7ae0b8436f163 ("[PATCH 01/29] iov_iter: 
> > Switch to using a table of operations")
> 
> Out of interest, would it be possible for you to run this on the tail of the
> series on the same hardware?

sorry for the late reply. Below is the result adding the tail of the series:
* ded69a6991fe0 
(linux-review/David-Howells/RFC-iov_iter-Switch-to-using-an-ops-table/20201121-222344)
 iov_iter: Remove iterate_all_kinds() and iterate_and_advance()

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/pwrite1/will-it-scale/0x42e

commit: 
  27bba9c532a8d21050b94224ffd310ad0058c353
  9bd0e337c633aed3e8ec3c7397b7ae0b8436f163
  ded69a6991fe0094f36d96bf1ace2a9636428676

27bba9c532a8d210 9bd0e337c633aed3e8ec3c7397b ded69a6991fe0094f36d96bf1ac
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
  28443113-4.8%   27064036-4.8%   27084904
will-it-scale.24.processes
   1185129-4.8%1127667-4.8%1128537
will-it-scale.per_process_ops
  28443113-4.8%   27064036-4.8%   27084904
will-it-scale.workload
 13.84+1.0%  13.98+0.3%  13.89
boot-time.dhcp
  1251 ±  9% -17.2%   1035 ± 10%  -9.1%   1137 ±  5%  
slabinfo.dmaengine-unmap-16.active_objs
  1251 ±  9% -17.2%   1035 ± 10%  -9.1%   1137 ±  5%  
slabinfo.dmaengine-unmap-16.num_objs
  1052 ±  6%  -1.1%   1041 ±  5% -13.4% 911.75 ± 10%  
slabinfo.task_group.active_objs
  1052 ±  6%  -1.1%   1041 ±  5% -13.4% 911.75 ± 10%  
slabinfo.task_group.num_objs
 31902 ±  5%  -5.6%  30124 ±  7%  -8.3%  29265 ±  4%  
slabinfo.vm_area_struct.active_objs
 32163 ±  5%  -5.4%  30441 ±  6%  -8.0%  29602 ±  4%  
slabinfo.vm_area_struct.num_objs
 73.46 ± 48% -59.7%  29.59 ±100%-100.0%   0.00
sched_debug.cfs_rq:/.MIN_vruntime.avg
  2386 ± 23% -40.5%   1420 ±100%-100.0%   0.00
sched_debug.cfs_rq:/.MIN_vruntime.max
393.92 ± 33% -48.5% 202.85 ±100%-100.0%   0.00
sched_debug.cfs_rq:/.MIN_vruntime.stddev
 73.46 ± 48% -59.7%  29.60 ±100%-100.0%   0.00
sched_debug.cfs_rq:/.max_vruntime.avg
  2386 ± 23% -40.5%   1420 ±100%-100.0%   0.00
sched_debug.cfs_rq:/.max_vruntime.max
393.92 ± 33% -48.5% 202.94 ±100%-100.0%   0.00
sched_debug.cfs_rq:/.max_vruntime.stddev
  0.00 ±  9% -13.5%   0.00 ±  3%  -2.9%   0.00 ± 13%  
sched_debug.cpu.next_balance.stddev
-18.50   +33.5% -24.70   -41.9% -10.75
sched_debug.cpu.nr_uninterruptible.min
411.75 ± 58% +76.8% 728.00 ± 32% +59.2% 655.50 ± 50%  
numa-vmstat.node0.nr_active_anon
 34304 ±  2% -35.6%  22103 ± 48%  +8.6%  37243 ± 26%  
numa-vmstat.node0.nr_anon_pages
 36087 ±  2% -31.0%  24915 ± 43%  +7.0%  38606 ± 27%  
numa-vmstat.node0.nr_inactive_anon
  2233 ± 51% +60.4%   3582 ±  7%  -7.7%   2062 ± 51%  
numa-vmstat.node0.nr_shmem
411.75 ± 58% +76.8% 728.00 ± 32% +59.2% 655.50 ± 50%  
numa-vmstat.node0.nr_zone_active_anon
 36087 ±  2% -31.0%  24915 ± 43%  +7.0%  38606 ± 27%  
numa-vmstat.node0.nr_zone_inactive_anon
 24265 ±  3% +51.3%  36707 ± 29% -12.2%  21315 ± 47%  
numa-vmstat.node1.nr_anon_pages
 25441 ±  2% +44.9%  36858 ± 29%  -9.9%  22912 ± 47%  
numa-vmstat.node1.nr_inactive_anon
537.25 ± 20% +22.8% 659.50 ± 10% +14.5% 615.00 ± 21%  
numa-vmstat.node1.nr_page_table_pages
 25441 ±  2% +44.9%  36858 ± 29%  -9.9%  22912 ± 47%  
numa-vmstat.node1.nr_zone_inactive_anon
  1649 ± 58% +76.7%   2913 ± 32% +59.0%   2621 ± 50%  
numa-meminfo.node0.Active
  1649 ± 58% +76.7%   2913 ± 32% +59.0%   2621 ± 50%  
numa-meminfo.node0.Active(anon)
137223 ±  2% -35.6%  88410 ± 48%  +8.6% 148973 ± 26%  
numa-meminfo.node0.AnonPages
164997 ±  9% -28.4% 118095 ± 42%  +6.9% 176340 ± 23%  
numa-meminfo.node0.AnonPages.max
144353 ±  2% -31.0%  99656 ± 43%  +7.0% 154424 ± 27%  
numa-meminfo.node0.Inactive
  

Re: [drm/fb] 1d46491d4a: WARNING:at_drivers/gpu/drm/drm_fb_helper.c:#drm_fb_helper_damage_work[drm_kms_helper]

2020-12-03 Thread Oliver Sang
15] DR6: fffe0ff0 DR7: 0400
> > [   28.300768] Call Trace:
> > [   28.302117]  process_one_work+0x31b/0x7b0
> > [   28.303532]  ? process_one_work+0x272/0x7b0
> > [   28.304976]  worker_thread+0x29a/0x5d0
> > [   28.308712]  ? process_one_work+0x7b0/0x7b0
> > [   28.310129]  kthread+0x181/0x1a0
> > [   28.311464]  ? process_one_work+0x7b0/0x7b0
> > [   28.312878]  ? kthread_create_worker_on_cpu+0x30/0x30
> > [   28.314318]  ret_from_fork+0x1c/0x28
> > [   28.315645] CPU: 0 PID: 122 Comm: kworker/0:2 Tainted: GE
> >  5.10.0-rc3-01102-g1d46491d4a08 #1
> > [   28.317414] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > 1.12.0-1 04/01/2014
> > [   28.319096] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper]
> > [   28.320674] Call Trace:
> > [   28.321947]  dump_stack+0x6d/0x8b
> > [   28.323278]  __warn.cold+0x24/0x49
> > [   28.324639]  ? drm_fb_helper_damage_work+0x109/0x2d0 [drm_kms_helper]
> > [   28.326167]  ? drm_fb_helper_damage_work+0x109/0x2d0 [drm_kms_helper]
> > [   28.327670]  ? drm_fb_helper_damage_work+0x109/0x2d0 [drm_kms_helper]
> > [   28.329165]  report_bug+0xb0/0xf0
> > [   28.330438]  ? irq_work_queue+0x13/0x70
> > [   28.331729]  ? exc_overflow+0x60/0x60
> > [   28.333002]  handle_bug+0x2a/0x50
> > [   28.334227]  exc_invalid_op+0x28/0x80
> > [   28.335462]  handle_exception+0x15d/0x15d
> > [   28.336729] EIP: drm_fb_helper_damage_work+0x109/0x2d0 [drm_kms_helper]
> > [   28.338148] Code: 47 10 8b 58 2c 85 db 0f 84 bc 01 00 00 e8 1f f0 da f4 
> > 89 74 24 0c 89 5c 24 08 89 44 24 04 c7 04 24 98 c1 40 df e8 f7 50 1d f5 
> > <0f> 0b 31 c9 c7 04 24 01 00 00 00 ba 01 00 00 00 b8 3c e8 40 df e8
> > [   28.341442] EAX: 0036 EBX: c1c91420 ECX:  EDX: 
> > [   28.342910] ESI: fff4 EDI: d2014000 EBP: d2c0dee4 ESP: d2c0de9c
> > [   28.344372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010292
> > [   28.345883]  ? cpu_latency_qos_write+0xeb/0xf0
> > [   28.347203]  ? run_init_process+0x5b/0x158
> > [   28.348483]  ? run_init_process+0x5b/0x158
> > [   28.349714]  ? exc_overflow+0x60/0x60
> > [   28.350895]  ? drm_fb_helper_damage_work+0x109/0x2d0 [drm_kms_helper]
> > [   28.352244]  process_one_work+0x31b/0x7b0
> > [   28.353432]  ? process_one_work+0x272/0x7b0
> > [   28.354599]  worker_thread+0x29a/0x5d0
> > [   28.355730]  ? process_one_work+0x7b0/0x7b0
> > [   28.356894]  kthread+0x181/0x1a0
> > [   28.357942]  ? process_one_work+0x7b0/0x7b0
> > [   28.359019]  ? kthread_create_worker_on_cpu+0x30/0x30
> > [   28.360134]  ret_from_fork+0x1c/0x28
> > [   28.376652] irq event stamp: 9469
> > [   28.377678] hardirqs last  enabled at (9477): [] 
> > console_unlock+0x515/0x650
> > [   28.378986] hardirqs last disabled at (9484): [] 
> > console_unlock+0x425/0x650
> > [   28.380284] softirqs last  enabled at (9464): [] 
> > __do_softirq+0x3fd/0x57c
> > [   28.381595] softirqs last disabled at (9381): [] 
> > call_on_stack+0x4c/0x60
> > [   28.382878] ---[ end trace b5fac24d1c204ab3 ]---
> > 
> > 
> > To reproduce:
> > 
> >  # build kernel
> > cd linux
> > cp config-5.10.0-rc3-01102-g1d46491d4a08 .config
> > make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare 
> > modules_prepare bzImage modules
> > make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 INSTALL_MOD_PATH= 
> > modules_install
> > cd 
> > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> > 
> > 
> >  git clone https://github.com/intel/lkp-tests.git
> >  cd lkp-tests
> >  bin/lkp qemu -k  -m modules.cgz job-script # job-script 
> > is attached in this email
> > 
> > 
> > 
> > Thanks,
> > Oliver Sang
> > 
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> 





Re: [mm/gup] 47e29d32af: phoronix-test-suite.npb.FT.A.total_mop_s -45.0% regression

2020-11-19 Thread Oliver Sang
On Wed, Nov 18, 2020 at 10:17:27AM -0800, Dan Williams wrote:
> On Wed, Nov 18, 2020 at 5:51 AM Jan Kara  wrote:
> >
> > On Mon 16-11-20 19:35:31, John Hubbard wrote:
> > >
> > > On 11/16/20 6:48 PM, kernel test robot wrote:
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -45.0% regression of 
> > > > phoronix-test-suite.npb.FT.A.total_mop_s due to commit:
> > > >
> > >
> > > That's a huge slowdown...
> > >
> > > >
> > > > commit: 47e29d32afba11b13efb51f03154a8cf22fb4360 ("mm/gup: 
> > > > page->hpage_pinned_refcount: exact pin counts for huge pages")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > ...but that commit happened in April, 2020. Surely if this were a serious
> > > issue we would have some other indication...is this worth following up
> > > on?? I'm inclined to ignore it, honestly.
> >
> > Why this was detected so late is a fair question although it doesn't quite
> > invalidate the report...
> 
> I don't know what specifically happened in this case, perhaps someone
> from the lkp team can comment? 

- some extra phoronix test suites have been enabled/fixed gradually, so we
will have better coverage
- we scan kernel releases within the year to baseline performance; this may
trigger a bisection if one release has regressed and not recovered.

With this continuous effort, 0-day CI can detect such changes on mainline.

> However, the myth / contention that
> "surely someone else would have noticed by now" is why the lkp project
> was launched. Kernels regressed without much complaint and it wasn't
> until much later in the process, around the time enterprise distros
> rebased to new kernels, did end users start filing performance loss
> regression reports. Given -stable kernel releases, 6-7 months is still
> faster than many end user upgrade cycles to new kernel baselines.


Re: [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression

2020-11-18 Thread Oliver Sang
> > [ASCII trend chart elided: jxrendermark operations_per_second over time;
> > bisect-good samples (*) hold steady around 9000 while the bisect-bad
> > sample (O) sits near 4000-5000, matching the reported -54.0% regression.]
> > 
> > 
> > 
> > 
> > [*] bisect-good sample
> > [O] bisect-bad  sample
> > 
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are 
> > provided
> > for informational purposes only. Any difference in system hardware or 
> > software
> > design or configuration may affect actual performance.
> > 
> > 
> > Thanks,
> > Oliver Sang
> > 


Re: [LKP] Re: [IB/srpt] c804af2c1d: last_state.test.blktests.exit_code.143

2020-11-02 Thread Oliver Sang
On Mon, Nov 02, 2020 at 09:59:29AM -0400, Jason Gunthorpe wrote:
> On Mon, Nov 02, 2020 at 10:02:36PM +0800, Sang, Oliver wrote:
> > Hi,
> > 
> > want to check whether all the fixes have been merged into mainline?
> > 
> > we found below commit merged rdma updates into mainline
> 
> rc2 probably fixes the error these logs have
> 
> But I think you'll hit a WARN_ON that isn't fixed yet

Thanks a lot for the information! We'll check on rc2 and get back to you
if we need more help. Thanks

> 
> Jason


Re: [btrfs] 3b54a0a703: WARNING:at_fs/btrfs/inode.c:#btrfs_finish_ordered_io[btrfs]

2020-09-16 Thread Oliver Sang
On Wed, Sep 16, 2020 at 12:20:12PM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/9/16 上午11:32, Oliver Sang wrote:
> > On Tue, Sep 15, 2020 at 04:00:40PM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2020/9/15 下午3:40, Qu Wenruo wrote:
> >>>
> >>>
> >>> On 2020/9/15 下午1:54, Oliver Sang wrote:
> >>>> On Wed, Sep 09, 2020 at 03:49:30PM +0800, Qu Wenruo wrote:
> >>>>>
> >>>>>
> >>>>> On 2020/9/9 下午3:08, kernel test robot wrote:
> >>>>>> Greeting,
> >>>>>>
> >>>>>> FYI, we noticed the following commit (built with gcc-9):
> >>>>>>
> >>>>>> commit: 3b54a0a703f17d2b1317d24beefcdcca587a7667 ("[PATCH v3 3/5] 
> >>>>>> btrfs: Detect unbalanced tree with empty leaf before crashing btree 
> >>>>>> operations")
> >>>>>> url: 
> >>>>>> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Enhanced-runtime-defence-against-fuzzed-images/20200809-201720
> >>>>>> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git 
> >>>>>> for-next
> >>>>>>
> >>>>>> in testcase: fio-basic
> >>>>>> with following parameters:
> >>>>>>
> >>>>>>runtime: 300s
> >>>>>>disk: 1SSD
> >>>>>>fs: btrfs
> >>>>>>nr_task: 100%
> >>>>>>test_size: 128G
> >>>>>>rw: write
> >>>>>>bs: 4k
> >>>>>>ioengine: sync
> >>>>>>cpufreq_governor: performance
> >>>>>>ucode: 0x42c
> >>>>>>fs2: nfsv4
> >>>>>>
> >>>>>> test-description: Fio is a tool that will spawn a number of threads or 
> >>>>>> processes doing a particular type of I/O action as specified by the 
> >>>>>> user.
> >>>>>> test-url: https://github.com/axboe/fio
> >>>>>>
> >>>>>>
> >>>>>> on test machine: 96 threads Intel(R) Xeon(R) Platinum 8260L CPU @ 
> >>>>>> 2.40GHz with 128G memory
> >>>>>>
> >>>>>> caused below changes (please refer to attached dmesg/kmsg for entire 
> >>>>>> log/backtrace):
> >>>>>>
> >>>>>>
> >>>>>> ++++
> >>>>>> |  
> >>>>>>   | 2703206ff5 | 3b54a0a703 |
> >>>>>> ++++
> >>>>>> | boot_successes   
> >>>>>>   | 9  | 0  |
> >>>>>> | boot_failures
> >>>>>>   | 4  ||
> >>>>>> | 
> >>>>>> Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#)
> >>>>>>  | 4  ||
> >>>>>> ++++
> >>>>>>
> >>>>>>
> >>>>>> If you fix the issue, kindly add following tag
> >>>>>> Reported-by: kernel test robot 
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> According to the full dmesg, it's invalid nritems causing transaction 
> >>>>> abort.
> >>>>>
> >>>>> I'm not sure if it's caused by corrupts fs or something else.
> >>>>>
> >>>>> If intel guys can reproduce it reliably, would you please add such debug
> >>>>> diff to output extra info?
> >>>>
> >>>> Hi Qu, sorry for the late reply. We double-confirmed that the issue
> >>>> can be reproduced reliably.
> >>>> The error only occurs on the first bad commit, not the parent commit.
> >>>>
> >>>> below is the output from applying your patch, for extra info
> >>>> [   42.539443] 

Re: [sched/fair] 0b0695f2b3: phoronix-test-suite.compress-gzip.0.seconds 19.8% regression

2020-06-02 Thread Oliver Sang
On Tue, Jun 02, 2020 at 01:23:19PM +0800, Oliver Sang wrote:
> On Fri, May 29, 2020 at 07:26:01PM +0200, Vincent Guittot wrote:
> > On Mon, 25 May 2020 at 10:02, Vincent Guittot
> >  wrote:
> > >
> > > On Thu, 21 May 2020 at 10:28, Oliver Sang  wrote:
> > > >
> > > > On Wed, May 20, 2020 at 03:04:48PM +0200, Vincent Guittot wrote:
> > > > > On Thu, 14 May 2020 at 19:09, Vincent Guittot
> > > > >  wrote:
> > > > > >
> > > > > > Hi Oliver,
> > > > > >
> > > > > > On Thu, 14 May 2020 at 16:05, kernel test robot 
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi Vincent Guittot,
> > > > > > >
> > > > > > > Below report FYI.
> > > > > > > Last year, we actually reported an improvement "[sched/fair] 
> > > > > > > 0b0695f2b3:
> > > > > > > vm-scalability.median 3.1% improvement" on link [1].
> > > > > > > but now we found the regression on pts.compress-gzip.
> > > > > > > This seems align with what showed in "[v4,00/10] sched/fair: 
> > > > > > > rework the CFS
> > > > > > > load balance" (link [2]), where showed the reworked load balance 
> > > > > > > could have
> > > > > > > both positive and negative effect for different test suites.
> > > > > >
> > > > > > We have tried to run  all possible use cases but it's impossible to
> > > > > > covers all so there were a possibility that one that is not covered,
> > > > > > would regressed.
> > > > > >
> > > > > > > And also from link [3], the patch set risks regressions.
> > > > > > >
> > > > > > > We also confirmed this regression on another platform
> > > > > > > (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 8G memory),
> > > > > > > below is the data (lower is better).
> > > > > > > v5.4                                      4.1
> > > > > > > fcf0553db6f4c79387864f6e4ab4a891601f395e  4.01
> > > > > > > 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912  4.89
> > > > > > > v5.5                                      5.18
> > > > > > > v5.6                                      4.62
> > > > > > > v5.7-rc2                                  4.53
> > > > > > > v5.7-rc3                                  4.59
> > > > > > >
> > > > > > > It seems there are some recovery on latest kernels, but not fully 
> > > > > > > back.
> > > > > > > We were just wondering whether you could share some lights the 
> > > > > > > further works
> > > > > > > on the load balance after patch set [2] which could cause the 
> > > > > > > performance
> > > > > > > change?
> > > > > > > And whether you have plan to refine the load balance algorithm 
> > > > > > > further?
> > > > > >
> > > > > > I'm going to have a look at your regression to understand what is
> > > > > > going wrong and how it can be fixed
> > > > >
> > > > > I have run the benchmark on my local setups to try to reproduce the
> > > > > regression and I don't see the regression. But my setups are different
> > > > > from your so it might be a problem specific to yours
> > > >
> > > > Hi Vincent, which OS are you using? We found the regression on Clear OS,
> > > > but it cannot reproduce on Debian.
> > > > On 
> > > > https://www.phoronix.com/scan.php?page=article&item=mac-win-linux2018&num=5
> > > > it was mentioned that -
> > > > Gzip compression is much faster out-of-the-box on Clear Linux due to it 
> > > > exploiting
> > > > multi-threading capabilities compared to the other operating systems 
> > > > Gzip support.
> > >
> > > I'm using Debian, I haven't noticed it was only on Clear OS.
> > > I'm going to look at it. Could you send me traces in the meantime ?
> > 
> > I run more tests to understand the problem. Even if Clear Linux uses
> > multithreading, the system is not overloaded and there is a
> > significant amount of idle time. This means that we use the has_spare
> > capacity path that spreads tasks on the system. At least that is what
> > I have seen in the KVM image. Beside this, I think that I hav

Re: [sched/fair] 0b0695f2b3: phoronix-test-suite.compress-gzip.0.seconds 19.8% regression

2020-06-01 Thread Oliver Sang
On Fri, May 29, 2020 at 07:26:01PM +0200, Vincent Guittot wrote:
> On Mon, 25 May 2020 at 10:02, Vincent Guittot
>  wrote:
> >
> > On Thu, 21 May 2020 at 10:28, Oliver Sang  wrote:
> > >
> > > On Wed, May 20, 2020 at 03:04:48PM +0200, Vincent Guittot wrote:
> > > > On Thu, 14 May 2020 at 19:09, Vincent Guittot
> > > >  wrote:
> > > > >
> > > > > Hi Oliver,
> > > > >
> > > > > On Thu, 14 May 2020 at 16:05, kernel test robot 
> > > > >  wrote:
> > > > > >
> > > > > > Hi Vincent Guittot,
> > > > > >
> > > > > > Below report FYI.
> > > > > > Last year, we actually reported an improvement "[sched/fair] 
> > > > > > 0b0695f2b3:
> > > > > > vm-scalability.median 3.1% improvement" on link [1].
> > > > > > but now we found the regression on pts.compress-gzip.
> > > > > > This seems align with what showed in "[v4,00/10] sched/fair: rework 
> > > > > > the CFS
> > > > > > load balance" (link [2]), where showed the reworked load balance 
> > > > > > could have
> > > > > > both positive and negative effect for different test suites.
> > > > >
> > > > > We have tried to run  all possible use cases but it's impossible to
> > > > > covers all so there were a possibility that one that is not covered,
> > > > > would regressed.
> > > > >
> > > > > > And also from link [3], the patch set risks regressions.
> > > > > >
> > > > > > We also confirmed this regression on another platform
> > > > > > (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 8G memory),
> > > > > > below is the data (lower is better).
> > > > > > v5.4                                      4.1
> > > > > > fcf0553db6f4c79387864f6e4ab4a891601f395e  4.01
> > > > > > 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912  4.89
> > > > > > v5.5                                      5.18
> > > > > > v5.6                                      4.62
> > > > > > v5.7-rc2                                  4.53
> > > > > > v5.7-rc3                                  4.59
> > > > > >
> > > > > > It seems there are some recovery on latest kernels, but not fully 
> > > > > > back.
> > > > > > We were just wondering whether you could share some lights the 
> > > > > > further works
> > > > > > on the load balance after patch set [2] which could cause the 
> > > > > > performance
> > > > > > change?
> > > > > > And whether you have plan to refine the load balance algorithm 
> > > > > > further?
> > > > >
> > > > > I'm going to have a look at your regression to understand what is
> > > > > going wrong and how it can be fixed
> > > >
> > > > I have run the benchmark on my local setups to try to reproduce the
> > > > regression, and I don't see it. But my setups are different
> > > > from yours, so it might be a problem specific to your environment.
> > >
> > > Hi Vincent, which OS are you using? We found the regression on Clear OS,
> > > but it cannot be reproduced on Debian.
> > > On 
> > > https://www.phoronix.com/scan.php?page=article=mac-win-linux2018=5
> > > it was mentioned that:
> > > Gzip compression is much faster out-of-the-box on Clear Linux due to it
> > > exploiting multi-threading capabilities compared to the other operating
> > > systems' Gzip support.
> >
> > I'm using Debian; I hadn't noticed it was only on Clear OS.
> > I'm going to look at it. Could you send me traces in the meantime?
> 
> I ran more tests to understand the problem. Even though Clear Linux uses
> multithreading, the system is not overloaded and there is a
> significant amount of idle time. This means that we use the has_spare
> capacity path that spreads tasks on the system. At least that is what
> I have seen in the KVM image. Besides this, I think that I have been
> able to reproduce the problem on my platform with debian using pigz
> instead of gzip for the compress-gzip-1.2.0 test. On my platform, I
> can see a difference when I enable all CPU idle states, whereas there
> is no performance difference when only the shallowest idle state is
> enabled.
> 
> The new load balance rework is more efficient at spreading tasks on
> the system and one side effect could be that there is more idle

Re: [sched/fair] 0b0695f2b3: phoronix-test-suite.compress-gzip.0.seconds 19.8% regression

2020-05-21 Thread Oliver Sang
On Wed, May 20, 2020 at 03:04:48PM +0200, Vincent Guittot wrote:
> On Thu, 14 May 2020 at 19:09, Vincent Guittot
>  wrote:
> >
> > Hi Oliver,
> >
> > On Thu, 14 May 2020 at 16:05, kernel test robot  
> > wrote:
> > >
> > > Hi Vincent Guittot,
> > >
> > > Below report FYI.
> > > Last year, we actually reported an improvement, "[sched/fair] 0b0695f2b3:
> > > vm-scalability.median 3.1% improvement", on link [1],
> > > but now we have found a regression on pts.compress-gzip.
> > > This seems to align with "[v4,00/10] sched/fair: rework the CFS
> > > load balance" (link [2]), which showed the reworked load balance
> > > could have both positive and negative effects for different test suites.
> >
> > We have tried to run all possible use cases, but it's impossible to
> > cover them all, so there is a possibility that one that is not covered
> > would regress.
> >
> > > And also from link [3], the patch set risks regressions.
> > >
> > > We also confirmed this regression on another platform
> > > (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 8G memory),
> > > below is the data (lower is better).
> > > v5.4  4.1
> > > fcf0553db6f4c79387864f6e4ab4a891601f395e  4.01
> > > 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912  4.89
> > > v5.5  5.18
> > > v5.6  4.62
> > > v5.7-rc2  4.53
> > > v5.7-rc3  4.59
> > >
> > > It seems there is some recovery on the latest kernels, but not fully back.
> > > We were just wondering whether you could shed some light on the further
> > > work on load balancing after patch set [2] that could cause the performance
> > > change, and whether you have plans to refine the load balance algorithm
> > > further?
> >
> > I'm going to have a look at your regression to understand what is
> > going wrong and how it can be fixed
> 
> I have run the benchmark on my local setups to try to reproduce the
> regression, and I don't see it. But my setups are different
> from yours, so it might be a problem specific to your environment.

Hi Vincent, which OS are you using? We found the regression on Clear OS,
but it cannot be reproduced on Debian.
On https://www.phoronix.com/scan.php?page=article=mac-win-linux2018=5
it was mentioned that:
Gzip compression is much faster out-of-the-box on Clear Linux due to it
exploiting multi-threading capabilities compared to the other operating
systems' Gzip support.

> 
> After analysing the benchmark, it doesn't overload the system and is
> mainly based on 1 main gzip thread with few others waking up and
> sleeping around.
> 
> I thought that scheduler could be too aggressive when trying to
> balance the threads on your system, which could generate more task
> migrations and impact the performance. But this doesn't seem to be the
> case because perf-stat.i.cpu-migrations is -8%. On the other side, the
> context switch is +16% and more interestingly idle state C1E and C6
> usages increase more than 50%. I don't know if we can rely on this
> value or not but I wonder if it could be that threads are now spread
> on different CPUs which generates idle time on the busy CPUs but the
> added time to enter/leave these states hurts the performance.
> 
> Could you capture some traces of both kernels? Tracing sched events
> should be enough to understand the behavior
> 
> Regards,
> Vincent
> 
> >
> > Thanks
> > Vincent
> >
> > > thanks
> > >
> > > [1] 
> > > https://lists.01.org/hyperkitty/list/l...@lists.01.org/thread/SANC7QLYZKUNMM6O7UNR3OAQAKS5BESE/
> > > [2] https://lore.kernel.org/patchwork/cover/1141687/
> > > [3] 
> > > https://www.phoronix.com/scan.php?page=news_item=Linux-5.5-Scheduler


Re: [sched/fair] 0b0695f2b3: phoronix-test-suite.compress-gzip.0.seconds 19.8% regression

2020-05-18 Thread Oliver Sang
On Fri, May 15, 2020 at 10:12:26PM +0800, Hillf Danton wrote:
> 
> On Fri, 15 May 2020 09:43:39 +0800 Oliver Sang wrote:
> > On Thu, May 14, 2020 at 07:09:35PM +0200, Vincent Guittot wrote:
> > > Hi Oliver,
> > > 
> > > On Thu, 14 May 2020 at 16:05, kernel test robot  
> > > wrote:
> > > >
> > > > Hi Vincent Guittot,
> > > >
> > > > Below report FYI.
> > > > Last year, we actually reported an improvement, "[sched/fair] 0b0695f2b3:
> > > > vm-scalability.median 3.1% improvement", on link [1],
> > > > but now we have found a regression on pts.compress-gzip.
> > > > This seems to align with "[v4,00/10] sched/fair: rework the CFS
> > > > load balance" (link [2]), which showed the reworked load balance
> > > > could have both positive and negative effects for different test suites.
> > > 
> > > We have tried to run all possible use cases, but it's impossible to
> > > cover them all, so there is a possibility that one that is not covered
> > > would regress.
> > > 
> > > > And also from link [3], the patch set risks regressions.
> > > >
> > > > We also confirmed this regression on another platform
> > > > (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 8G memory),
> > > > below is the data (lower is better).
> > > > v5.4  4.1
> > > > fcf0553db6f4c79387864f6e4ab4a891601f395e  4.01
> > > > 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912  4.89
> > > > v5.5  5.18
> > > > v5.6  4.62
> > > > v5.7-rc2  4.53
> > > > v5.7-rc3  4.59
> > > >
> > > > It seems there is some recovery on the latest kernels, but not fully back.
> 
> Hi
> 
> Here is a tiny diff for improving balance in the overloaded case. Hopefully
> it will help you spot the factors behind the regression.

Thanks Hillf!
Just wondering: what's the target release of the patch below?

> 
> Hillf
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8683,15 +8683,12 @@ find_idlest_group(struct sched_domain *s
>   struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
>   struct sg_lb_stats local_sgs, tmp_sgs;
>   struct sg_lb_stats *sgs;
> - unsigned long imbalance;
> + unsigned long hal, lal;
>   struct sg_lb_stats idlest_sgs = {
>   .avg_load = UINT_MAX,
>   .group_type = group_overloaded,
>   };
>  
> - imbalance = scale_load_down(NICE_0_LOAD) *
> - (sd->imbalance_pct-100) / 100;
> -
>   do {
>   int local_group;
>  
> @@ -8744,31 +8741,26 @@ find_idlest_group(struct sched_domain *s
>  
>   switch (local_sgs.group_type) {
>   case group_overloaded:
> - case group_fully_busy:
> - /*
> -  * When comparing groups across NUMA domains, it's possible for
> -  * the local domain to be very lightly loaded relative to the
> -  * remote domains but "imbalance" skews the comparison making
> -  * remote CPUs look much more favourable. When considering
> -  * cross-domain, add imbalance to the load on the remote node
> -  * and consider staying local.
> -  */
> -
> - if ((sd->flags & SD_NUMA) &&
> - ((idlest_sgs.avg_load + imbalance) >= local_sgs.avg_load))
> - return NULL;
> + if (idlest_sgs.avg_load < local_sgs.avg_load) {
> + hal = local_sgs.avg_load;
> + lal = idlest_sgs.avg_load;
> + } else {
> + lal = local_sgs.avg_load;  /*  low avg load */
> + hal = idlest_sgs.avg_load; /* high avg load */
> + }
>  
> - /*
> -  * If the local group is less loaded than the selected
> -  * idlest group don't try and push any tasks.
> -  */
> - if (idlest_sgs.avg_load >= (local_sgs.avg_load + imbalance))
> + /* No push if groups are balanced in terms of load */
> + if (100 * hal <= sd->imbalance_pct * lal)
>   return NULL;
>  
> - if (100 * local_sgs.avg_load <= sd->imbalance_pct * idlest_sgs.avg_load)
> + /* No push if it only grows imbalance */
> + if (hal == idlest_sgs.avg_load)
>   return NULL;
>   break;

Re: [sched/fair] 0b0695f2b3: phoronix-test-suite.compress-gzip.0.seconds 19.8% regression

2020-05-14 Thread Oliver Sang
On Thu, May 14, 2020 at 07:09:35PM +0200, Vincent Guittot wrote:
> Hi Oliver,
> 
> On Thu, 14 May 2020 at 16:05, kernel test robot  wrote:
> >
> > Hi Vincent Guittot,
> >
> > Below report FYI.
> > Last year, we actually reported an improvement, "[sched/fair] 0b0695f2b3:
> > vm-scalability.median 3.1% improvement", on link [1],
> > but now we have found a regression on pts.compress-gzip.
> > This seems to align with "[v4,00/10] sched/fair: rework the CFS
> > load balance" (link [2]), which showed the reworked load balance
> > could have both positive and negative effects for different test suites.
> 
> We have tried to run all possible use cases, but it's impossible to
> cover them all, so there is a possibility that one that is not covered
> would regress.
> 
> > And also from link [3], the patch set risks regressions.
> >
> > We also confirmed this regression on another platform
> > (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 8G memory),
> > below is the data (lower is better).
> > v5.4  4.1
> > fcf0553db6f4c79387864f6e4ab4a891601f395e  4.01
> > 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912  4.89
> > v5.5  5.18
> > v5.6  4.62
> > v5.7-rc2  4.53
> > v5.7-rc3  4.59
> >
> > It seems there is some recovery on the latest kernels, but not fully back.
> > We were just wondering whether you could shed some light on the further
> > work on load balancing after patch set [2] that could cause the performance
> > change, and whether you have plans to refine the load balance algorithm
> > further?
> 
> I'm going to have a look at your regression to understand what is
> going wrong and how it can be fixed

Thanks a lot!

> 
> Thanks
> Vincent
> 


Re: [LKP] [fs/namei.c] e013ec23b8: WARNING:at_fs/dcache.c:#dentry_free

2019-09-04 Thread Oliver Sang
On Wed, Sep 04, 2019 at 02:52:40PM +0800, Oliver Sang wrote:
> On Sat, Aug 31, 2019 at 04:42:46PM +0100, Al Viro wrote:
> > On Sat, Aug 31, 2019 at 09:09:17PM +0800, kernel test robot wrote:
> > 
> > > [   13.886602] WARNING: CPU: 0 PID: 541 at fs/dcache.c:338 
> > > dentry_free+0x7f/0x90
> > > [   13.889208] Modules linked in:
> > > [   13.890276] CPU: 0 PID: 541 Comm: readlink Not tainted 
> > > 5.3.0-rc1-8-ge013ec23b8231 #1
> > > [   13.892699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > BIOS 1.10.2-1 04/01/2014
> > > [   13.895419] RIP: 0010:dentry_free+0x7f/0x90
> > > [   13.896739] Code: f0 75 cb 48 8d be b0 00 00 00 48 83 c4 08 48 c7 c6 
> > > 60 8d cd a5 e9 51 69 e4 ff 48 89 3c 24 48 c7 c7 f8 a9 cb a6 e8 7f 37 e3 
> > > ff <0f> 0b 48 8b 34 24 eb 8f 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> > > [   13.901957] RSP: 0018:b5524063fe38 EFLAGS: 00010282
> > > [   13.903527] RAX: 0024 RBX: 9941878040c0 RCX: 
> > > a706aa08
> > > [   13.905566] RDX:  RSI: 0096 RDI: 
> > > 0246
> > > [   13.907612] RBP:  R08: 0280 R09: 
> > > 0033
> > > [   13.909664] R10:  R11: b5524063fce8 R12: 
> > > 994187804118
> > > [   13.911711] R13: 99427a81 R14: 994187d7c8f0 R15: 
> > > 99427a810b80
> > > [   13.913753] FS:  () GS:9942bfc0() 
> > > knlGS:
> > > [   13.916187] CS:  0010 DS: 002b ES: 002b CR0: 80050033
> > > [   13.917892] CR2: 0937458b CR3: 6800a000 CR4: 
> > > 06f0
> > > [   13.919925] Call Trace:
> > > [   13.920840]  __dentry_kill+0x13c/0x1a0
> > > [   13.922076]  path_put+0x12/0x20
> > > [   13.923148]  free_fs_struct+0x1b/0x30
> > > [   13.924346]  do_exit+0x304/0xc40
> > > [   13.925438]  ? __schedule+0x25d/0x670
> > > [   13.926642]  do_group_exit+0x3a/0xa0
> > > [   13.927817]  __ia32_sys_exit_group+0x14/0x20
> > > [   13.929160]  do_fast_syscall_32+0xa9/0x340
> > > [   13.930565]  entry_SYSENTER_compat+0x7f/0x91
> > > [   13.931924] ---[ end trace 02c6706eb2c2ebf2 ]---
> > > 
> > > 
> > > To reproduce:
> > > 
> > > # build kernel
> > >   cd linux
> > >   cp config-5.3.0-rc1-8-ge013ec23b8231 .config
> > >   make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig prepare 
> > > modules_prepare bzImage
> > > 
> > > git clone https://github.com/intel/lkp-tests.git
> > > cd lkp-tests
> > > bin/lkp qemu -k  job-script # job-script is attached in 
> > > this email
> > 
> > Can't reproduce here...
> 
> any details on the failure when using this reproducer?
> 
> > 
> > I see one potential problem in there, but I would expect it to have the
> > opposite effect (I really don't believe that it's a ->d_count wraparound -
> > that would've taken much longer than a minute, if nothing else).
> > 
> > How reliably is it reproduced on your setup and does the following have
> > any impact, one way or another?
> 
> It is always reproduced. We noticed that your branch was rebased. If it
> still has the problem, we will let you know.

By testing the HEAD commit of the current branch (below), the issue is gone:

commit 46c46f8df9aa425cc4d6bc89d57a6fedf83dc797 (HEAD -> work.namei, 
origin/work.namei)
Author: Al Viro 
Date:   Sat Jul 27 16:29:22 2019 -0400

> 
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 412479e4c258..671c3c1a3425 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -643,10 +643,8 @@ static bool legitimize_root(struct nameidata *nd)
> >  {
> > if (!nd->root.mnt || (nd->flags & LOOKUP_ROOT))
> > return true;
> > -   if (unlikely(!legitimize_path(nd, >root, nd->root_seq)))
> > -   return false;
> > nd->flags |= LOOKUP_ROOT_GRABBED;
> > -   return true;
> > +   return legitimize_path(nd, >root, nd->root_seq);
> >  }
> >  
> >  /*
> > ___
> > LKP mailing list
> > l...@lists.01.org
> > https://lists.01.org/mailman/listinfo/lkp


Re: [LKP] [fs/namei.c] e013ec23b8: WARNING:at_fs/dcache.c:#dentry_free

2019-09-04 Thread Oliver Sang
On Sat, Aug 31, 2019 at 04:42:46PM +0100, Al Viro wrote:
> On Sat, Aug 31, 2019 at 09:09:17PM +0800, kernel test robot wrote:
> 
> > [   13.886602] WARNING: CPU: 0 PID: 541 at fs/dcache.c:338 
> > dentry_free+0x7f/0x90
> > [   13.889208] Modules linked in:
> > [   13.890276] CPU: 0 PID: 541 Comm: readlink Not tainted 
> > 5.3.0-rc1-8-ge013ec23b8231 #1
> > [   13.892699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > 1.10.2-1 04/01/2014
> > [   13.895419] RIP: 0010:dentry_free+0x7f/0x90
> > [   13.896739] Code: f0 75 cb 48 8d be b0 00 00 00 48 83 c4 08 48 c7 c6 60 
> > 8d cd a5 e9 51 69 e4 ff 48 89 3c 24 48 c7 c7 f8 a9 cb a6 e8 7f 37 e3 ff 
> > <0f> 0b 48 8b 34 24 eb 8f 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> > [   13.901957] RSP: 0018:b5524063fe38 EFLAGS: 00010282
> > [   13.903527] RAX: 0024 RBX: 9941878040c0 RCX: 
> > a706aa08
> > [   13.905566] RDX:  RSI: 0096 RDI: 
> > 0246
> > [   13.907612] RBP:  R08: 0280 R09: 
> > 0033
> > [   13.909664] R10:  R11: b5524063fce8 R12: 
> > 994187804118
> > [   13.911711] R13: 99427a81 R14: 994187d7c8f0 R15: 
> > 99427a810b80
> > [   13.913753] FS:  () GS:9942bfc0() 
> > knlGS:
> > [   13.916187] CS:  0010 DS: 002b ES: 002b CR0: 80050033
> > [   13.917892] CR2: 0937458b CR3: 6800a000 CR4: 
> > 06f0
> > [   13.919925] Call Trace:
> > [   13.920840]  __dentry_kill+0x13c/0x1a0
> > [   13.922076]  path_put+0x12/0x20
> > [   13.923148]  free_fs_struct+0x1b/0x30
> > [   13.924346]  do_exit+0x304/0xc40
> > [   13.925438]  ? __schedule+0x25d/0x670
> > [   13.926642]  do_group_exit+0x3a/0xa0
> > [   13.927817]  __ia32_sys_exit_group+0x14/0x20
> > [   13.929160]  do_fast_syscall_32+0xa9/0x340
> > [   13.930565]  entry_SYSENTER_compat+0x7f/0x91
> > [   13.931924] ---[ end trace 02c6706eb2c2ebf2 ]---
> > 
> > 
> > To reproduce:
> > 
> > # build kernel
> > cd linux
> > cp config-5.3.0-rc1-8-ge013ec23b8231 .config
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig prepare 
> > modules_prepare bzImage
> > 
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k  job-script # job-script is attached in 
> > this email
> 
> Can't reproduce here...

any details on the failure when using this reproducer?

> 
> I see one potential problem in there, but I would expect it to have the
> opposite effect (I really don't believe that it's a ->d_count wraparound -
> that would've taken much longer than a minute, if nothing else).
> 
> How reliably is it reproduced on your setup and does the following have
> any impact, one way or another?

It is always reproduced. We noticed that your branch was rebased. If it still
has the problem, we will let you know.

> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 412479e4c258..671c3c1a3425 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -643,10 +643,8 @@ static bool legitimize_root(struct nameidata *nd)
>  {
>   if (!nd->root.mnt || (nd->flags & LOOKUP_ROOT))
>   return true;
> - if (unlikely(!legitimize_path(nd, >root, nd->root_seq)))
> - return false;
>   nd->flags |= LOOKUP_ROOT_GRABBED;
> - return true;
> + return legitimize_path(nd, >root, nd->root_seq);
>  }
>  
>  /*


Re: [ext4] [confidence: ] 2f7f60cf9f: WARNING:at_lib/list_debug.c:#__list_add_valid

2019-08-29 Thread Oliver Sang
2.495780]  ext4_es_register_shrinker+0x53/0x130 [ext4]
> > [   62.497235]  ext4_fill_super+0x1cd4/0x3ad0 [ext4]
> > [   62.498521]  ? ext4_calculate_overhead+0x4a0/0x4a0 [ext4]
> > [   62.499946]  mount_bdev+0x173/0x1b0
> > [   62.501120]  legacy_get_tree+0x27/0x40
> > [   62.502315]  vfs_get_tree+0x25/0xf0
> > [   62.503421]  do_mount+0x691/0x9c0
> > [   62.504516]  ? memdup_user+0x4b/0x70
> > [   62.505793]  ksys_mount+0x80/0xd0
> > [   62.506858]  __x64_sys_mount+0x21/0x30
> > [   62.507979]  do_syscall_64+0x5b/0x1f0
> > [   62.509194]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   62.510491] RIP: 0033:0x7f8b2320f48a
> > [   62.511589] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 
> > 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 
> > <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d de f9 2a 00 f7 d8 64 89 01 48
> > [   62.515429] RSP: 002b:7ffdcb5920e8 EFLAGS: 0206 ORIG_RAX: 
> > 00a5
> > [   62.517274] RAX: ffda RBX: 5564e2fd94d5 RCX: 
> > 7f8b2320f48a
> > [   62.518865] RDX: 5564e2fd94d5 RSI: 5564e2fd6b08 RDI: 
> > 7ffdcb593edf
> > [   62.520461] RBP: 7ffdcb593edf R08:  R09: 
> > 5564e2fd94d5
> > [   62.522183] R10:  R11: 0206 R12: 
> > 5564e2fd6b08
> > [   62.523770] R13: 00005564e2fd6b67 R14: 02f5 R15: 
> > 
> > [   62.525457] ---[ end trace 6c35045d811b284c ]---
> > 
> > 
> > To reproduce:
> > 
> > # build kernel
> > cd linux
> > cp config-5.3.0-rc5-00283-g2f7f60cf9fbcd .config
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig prepare 
> > modules_prepare bzImage modules
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 
> > INSTALL_MOD_PATH= modules_install
> > cd 
> > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> > 
> > 
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k  -m modules.cgz job-script # job-script is 
> > attached in this email
> > 
> > 
> > 
> > Thanks,
> > Oliver Sang
> > 
>