Re: [PATCH v2] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-14 Thread Qiang Zhang
On Sat, Apr 13, 2024 at 09:21:38PM +0900, Masami Hiramatsu wrote:
>Hi Qiang,
>
>I found xbc_free_mem() missed to check !addr. When I booted kernel without
>bootconfig data but with "bootconfig" cmdline, I got a kernel crash below;
>
>
>[2.394904] [ cut here ]
>[2.396490] kernel BUG at arch/x86/mm/physaddr.c:28!
>[2.398176] invalid opcode:  [#1] PREEMPT SMP PTI
>[2.399388] CPU: 7 PID: 1 Comm: swapper/0 Tainted: G N 
>6.9.0-rc3-4-g121fbb463836 #10
>[2.401579] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>1.15.0-1 04/01/2014
>[2.403247] RIP: 0010:__phys_addr+0x40/0x60
>[2.404196] Code: 48 2b 05 fb a4 3d 01 48 05 00 00 00 80 48 39 c7 72 17 0f 
>b6 0d ee 9e c0 01 48 89 c2 48 d3 ea 48 85 d2 75 05 c3 cc cc cc cc 90 <0f> 0b 
>48 03 05 e7 e2 9d 01 48 81 ff ff ff ff 1f 76 e8 90 0f6
>[2.407250] RSP: :c9013f18 EFLAGS: 00010287
>[2.407991] RAX: 7780 RBX: 81c17940 RCX: 
>008a
>[2.408891] RDX: 008b RSI: 88800775f320 RDI: 
>8000
>[2.409727] RBP:  R08:  R09: 
>
>[2.410555] R10: 888005028a60 R11: 008a R12: 
>
>[2.411423] R13:  R14:  R15: 
>
>[2.412155] FS:  () GS:88807d9c() 
>knlGS:
>[2.412970] CS:  0010 DS:  ES:  CR0: 80050033
>[2.413550] CR2:  CR3: 02a48000 CR4: 
>06b0
>[2.414264] Call Trace:
>[2.414520]  
>[2.414755]  ? die+0x37/0x90
>[2.415062]  ? do_trap+0xe3/0x110
>[2.415451]  ? __phys_addr+0x40/0x60
>[2.415822]  ? do_error_trap+0x9c/0x120
>[2.416215]  ? __phys_addr+0x40/0x60
>[2.416573]  ? __phys_addr+0x40/0x60
>[2.416968]  ? exc_invalid_op+0x53/0x70
>[2.417358]  ? __phys_addr+0x40/0x60
>[2.417709]  ? asm_exc_invalid_op+0x1a/0x20
>[2.418122]  ? __pfx_kernel_init+0x10/0x10
>[2.418569]  ? __phys_addr+0x40/0x60
>[2.418960]  _xbc_exit+0x74/0xc0
>[2.419374]  kernel_init+0x3a/0x1c0
>[2.419764]  ret_from_fork+0x34/0x50
>[2.420132]  ? __pfx_kernel_init+0x10/0x10
>[2.420578]  ret_from_fork_asm+0x1a/0x30
>[2.420973]  
>[2.421200] Modules linked in:
>[2.421598] ---[ end trace  ]---
>[2.422053] RIP: 0010:__phys_addr+0x40/0x60
>[2.422484] Code: 48 2b 05 fb a4 3d 01 48 05 00 00 00 80 48 39 c7 72 17 0f 
>b6 0d ee 9e c0 01 48 89 c2 48 d3 ea 48 85 d2 75 05 c3 cc cc cc cc 90 <0f> 0b 
>48 03 05 e7 e2 9d 01 48 81 ff ff ff ff 1f 76 e8 90 0f6
>[2.424294] RSP: :c9013f18 EFLAGS: 00010287
>[2.424769] RAX: 7780 RBX: 81c17940 RCX: 
>008a
>[2.425378] RDX: 008b RSI: 88800775f320 RDI: 
>8000
>[2.425993] RBP:  R08:  R09: 
>
>[2.426589] R10: 888005028a60 R11: 008a R12: 
>
>[2.427156] R13:  R14:  R15: 
>
>[2.427746] FS:  () GS:88807d9c() 
>knlGS:
>[2.428368] CS:  0010 DS:  ES:  CR0: 80050033
>[2.428820] CR2:  CR3: 02a48000 CR4: 
>06b0
>[2.429373] Kernel panic - not syncing: Fatal exception
>[2.429982] Kernel Offset: disabled
>[2.430261] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
>Adding below patch fixed it.
>
>diff --git a/lib/bootconfig.c b/lib/bootconfig.c
>index f9a45adc6307..8841554432d5 100644
>--- a/lib/bootconfig.c
>+++ b/lib/bootconfig.c
>@@ -65,7 +65,7 @@ static inline void __init xbc_free_mem(void *addr, size_t 
>size, bool early)
> {
>   if (early)
>   memblock_free(addr, size);
>-  else
>+  else if (addr)
>   memblock_free_late(__pa(addr), size);
> }
> 
>Can you update with this fix?

Sure.

>
>Thank you,
>
>
>On Fri, 12 Apr 2024 22:18:20 +0900
>Masami Hiramatsu (Google)  wrote:
>
>> On Fri, 12 Apr 2024 18:49:41 +0800
>> qiang4.zh...@linux.intel.com wrote:
>> 
>> > From: Qiang Zhang 
>> > 
>> > By the time xbc memory is freed in xbc_exit(), memblock may already have
>> > handed memory over to the buddy allocator, so it no longer makes sense to
>> > free memory back to memblock. memblock_free() called from xbc_exit() even
>> > causes UAF bugs on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled,
>> > such as x86. The following KASAN logs

Re: [PATCH RESEND] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-12 Thread Qiang Zhang
On Fri, Apr 12, 2024 at 04:34:48PM +0900, Masami Hiramatsu wrote:
>On Fri, 12 Apr 2024 10:41:04 +0800
>qiang4.zh...@linux.intel.com wrote:
>
>> From: Qiang Zhang 
>> 
>> By the time xbc memory is freed, memblock has already handed memory over to
>> the buddy allocator, so it no longer makes sense to free memory back to
>> memblock. memblock_free() called from xbc_exit() even causes UAF bugs on
>> architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled, such as x86. The
>> following KASAN logs show this case.
>> 
>> [9.410890] 
>> ==
>> [9.418962] BUG: KASAN: use-after-free in 
>> memblock_isolate_range+0x12d/0x260
>> [9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1
>> 
>> [9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
>> 6.9.0-rc3-00208-g586b5dfb51b9 #5
>> [9.446403] Hardware name: Intel Corporation RPLP LP5 
>> (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS 
>> IRPPN02.01.01.00.00.19.015.D- Dec 28 2023
>> [9.460789] Call Trace:
>> [9.463518]  
>> [9.465859]  dump_stack_lvl+0x53/0x70
>> [9.469949]  print_report+0xce/0x610
>> [9.473944]  ? __virt_addr_valid+0xf5/0x1b0
>> [9.478619]  ? memblock_isolate_range+0x12d/0x260
>> [9.483877]  kasan_report+0xc6/0x100
>> [9.487870]  ? memblock_isolate_range+0x12d/0x260
>> [9.493125]  memblock_isolate_range+0x12d/0x260
>> [9.498187]  memblock_phys_free+0xb4/0x160
>> [9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
>> [9.508021]  ? mutex_unlock+0x7e/0xd0
>> [9.512111]  ? __pfx_mutex_unlock+0x10/0x10
>> [9.516786]  ? kernel_init_freeable+0x2d4/0x430
>> [9.521850]  ? __pfx_kernel_init+0x10/0x10
>> [9.526426]  xbc_exit+0x17/0x70
>> [9.529935]  kernel_init+0x38/0x1e0
>> [9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
>> [9.538601]  ret_from_fork+0x2c/0x50
>> [9.542596]  ? __pfx_kernel_init+0x10/0x10
>> [9.547170]  ret_from_fork_asm+0x1a/0x30
>> [9.551552]  
>> 
>> [9.555649] The buggy address belongs to the physical page:
>> [9.561875] page: refcount:0 mapcount:0 mapping: 
>> index:0x1 pfn:0x45dd30
>> [9.570821] flags: 0x200(node=0|zone=2)
>> [9.576271] page_type: 0x()
>> [9.580167] raw: 0200 ea0011774c48 ea0012ba1848 
>> 
>> [9.588823] raw: 0001   
>> 
>> [9.597476] page dumped because: kasan: bad access detected
>> 
>> [9.605362] Memory state around the buggy address:
>> [9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 00 00
>> [9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 00 00
>> [9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>> ff ff
>> [9.634930]^
>> [9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>> ff ff
>> [9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>> ff ff
>> [9.654675] 
>> ==
>> 
>
>Oops, good catch! Indeed, it is too late to use memblock_free().
>
>BTW, is it safe to call memblock_free_late() in the early boot stage?
>xbc_free_mem() will also be called from xbc_init().
>If not, we need a custom internal __xbc_exit() or xbc_cleanup()
>which is called from xbc_init() and uses memblock_free().

No, memblock_free_late() can't be used early.
Exit and Cleanup sound alike and would be confusing. Maybe adding an early
flag, as in _xbc_exit(bool early), is clearer. I will push a v2 for this.
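
A minimal sketch of what that could look like, folding the early flag from the
v2 hunk quoted earlier in this digest together with the !addr guard Masami
suggests (names follow lib/bootconfig.c; the actual v2/v3 code may differ):

static inline void __init xbc_free_mem(void *addr, size_t size, bool early)
{
	if (early)		/* xbc_init() error path: memblock still owns the memory */
		memblock_free(addr, size);
	else if (addr)		/* late path via xbc_exit(): pages now belong to buddy */
		memblock_free_late(__pa(addr), size);
}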

>
>Thank you,
>
>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Qiang Zhang 
>> ---
>>  lib/bootconfig.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
>> index c59d26068a64..4524ee944df0 100644
>> --- a/lib/bootconfig.c
>> +++ b/lib/bootconfig.c
>> @@ -63,7 +63,7 @@ static inline void * __init xbc_alloc_mem(size_t size)
>>  
>>  static inline void __init xbc_free_mem(void *addr, size_t size)
>>  {
>> -memblock_free(addr, size);
>> +memblock_free_late(__pa(addr), size);
>>  }
>>  
>>  #else /* !__KERNEL__ */
>> -- 
>> 2.39.2
>> 
>> 
>
>
>-- 
>Masami Hiramatsu (Google) 



Re: [PATCH] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-11 Thread Qiang Zhang
On Fri, Apr 12, 2024 at 10:03:26AM +0800, qiang4.zh...@linux.intel.com wrote:
>From: Qiang Zhang 
>
>By the time xbc memory is freed, memblock has already handed memory over to
>the buddy allocator, so it no longer makes sense to free memory back to
>memblock. memblock_free() called from xbc_exit() even causes UAF bugs on
>architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled, such as x86. The
>following KASAN logs show this case.
>
>[9.410890] 
>==
>[9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260
>[9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1
>
>[9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
>6.9.0-rc3-00208-g586b5dfb51b9 #5
>[9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP 
>LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D- Dec 28 2023
>[9.460789] Call Trace:
>[9.463518]  
>[9.465859]  dump_stack_lvl+0x53/0x70
>[9.469949]  print_report+0xce/0x610
>[9.473944]  ? __virt_addr_valid+0xf5/0x1b0
>[9.478619]  ? memblock_isolate_range+0x12d/0x260
>[9.483877]  kasan_report+0xc6/0x100
>[9.487870]  ? memblock_isolate_range+0x12d/0x260
>[9.493125]  memblock_isolate_range+0x12d/0x260
>[9.498187]  memblock_phys_free+0xb4/0x160
>[9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
>[9.508021]  ? mutex_unlock+0x7e/0xd0
>[9.512111]  ? __pfx_mutex_unlock+0x10/0x10
>[9.516786]  ? kernel_init_freeable+0x2d4/0x430
>[9.521850]  ? __pfx_kernel_init+0x10/0x10
>[9.526426]  xbc_exit+0x17/0x70
>[9.529935]  kernel_init+0x38/0x1e0
>[9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
>[9.538601]  ret_from_fork+0x2c/0x50
>[9.542596]  ? __pfx_kernel_init+0x10/0x10
>[9.547170]  ret_from_fork_asm+0x1a/0x30
>[9.551552]  
>
>[9.555649] The buggy address belongs to the physical page:
>[9.561875] page: refcount:0 mapcount:0 mapping: index:0x1 
>pfn:0x45dd30
>[9.570821] flags: 0x200(node=0|zone=2)
>[9.576271] page_type: 0x()
>[9.580167] raw: 0200 ea0011774c48 ea0012ba1848 
>
>[9.588823] raw: 0001   
>
>[9.597476] page dumped because: kasan: bad access detected
>
>[9.605362] Memory state around the buggy address:
>[9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>00
>[9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>00
>[9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>ff
>[9.634930]^
>[9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>ff
>[9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>ff
>[9.654675] 
>==

Sorry, I forgot to Cc stable. Will send a new one.

>
>Signed-off-by: Qiang Zhang 
>---
> lib/bootconfig.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/lib/bootconfig.c b/lib/bootconfig.c
>index c59d26068a64..4524ee944df0 100644
>--- a/lib/bootconfig.c
>+++ b/lib/bootconfig.c
>@@ -63,7 +63,7 @@ static inline void * __init xbc_alloc_mem(size_t size)
> 
> static inline void __init xbc_free_mem(void *addr, size_t size)
> {
>-  memblock_free(addr, size);
>+  memblock_free_late(__pa(addr), size);
> }
> 
> #else /* !__KERNEL__ */
>-- 
>2.39.2
>



[PATCH] irq_work: record irq_work_queue() call stack

2021-03-31 Thread qiang . zhang
From: Zqiang 

Add the irq_work_queue() call stack into the KASAN auxiliary
stack in order to improve KASAN reports. This lets us know
where the irq_work was queued.

Signed-off-by: Zqiang 
---
 kernel/irq_work.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index e8da1e71583a..23a7a0ba1388 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -19,7 +19,7 @@
 #include 
 #include 
 #include 
-
+#include 
 
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
@@ -70,6 +70,9 @@ bool irq_work_queue(struct irq_work *work)
if (!irq_work_claim(work))
return false;
 
+   /*record irq_work call stack in order to print it in KASAN reports*/
+   kasan_record_aux_stack(work);
+
/* Queue the entry and raise the IPI if needed. */
preempt_disable();
__irq_work_queue_local(work);
@@ -98,6 +101,8 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
if (!irq_work_claim(work))
return false;
 
+   kasan_record_aux_stack(work);
+
preempt_disable();
if (cpu != smp_processor_id()) {
/* Arch remote IPI send/receive backend aren't NMI safe */
-- 
2.17.1



[PATCH] lib: stackdepot: turn depot_lock spinlock to raw_spinlock

2021-03-29 Thread qiang . zhang
From: Zqiang 

[2.670635] BUG: sleeping function called from invalid context
at kernel/locking/rtmutex.c:951
[2.670638] in_atomic(): 0, irqs_disabled(): 1, non_block: 0,
pid: 19, name: pgdatinit0
[2.670768] Call Trace:
[2.670800]  dump_stack+0x93/0xc2
[2.670826]  ___might_sleep.cold+0x1b2/0x1f1
[2.670838]  rt_spin_lock+0x3b/0xb0
[2.670838]  stack_depot_save+0x1b9/0x440
[2.670838]  kasan_save_stack+0x32/0x40
[2.670838]  kasan_record_aux_stack+0xa5/0xb0
[2.670838]  __call_rcu+0x117/0x880
[2.670838]  __exit_signal+0xafb/0x1180
[2.670838]  release_task+0x1d6/0x480
[2.670838]  exit_notify+0x303/0x750
[2.670838]  do_exit+0x678/0xcf0
[2.670838]  kthread+0x364/0x4f0
[2.670838]  ret_from_fork+0x22/0x30

On an RT system the spin_lock is replaced by a sleepable rt_mutex lock.
__call_rcu() disables interrupts before calling kasan_record_aux_stack(),
which triggers the calltrace above, so replace the stackdepot spinlock
with a raw_spinlock.
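
The PREEMPT_RT constraint behind this, as a sketch (an illustration of the
locking rule, not stackdepot code; the lock names are made up):

	unsigned long flags;

	local_irq_save(flags);			/* e.g. the __call_rcu() path above      */
	spin_lock(&some_lock);			/* BUG on RT: spinlock_t is an rt_mutex
						 * and may sleep with IRQs disabled      */
	spin_unlock(&some_lock);
	raw_spin_lock(&some_raw_lock);		/* OK: raw_spinlock_t never sleeps       */
	raw_spin_unlock(&some_raw_lock);
	local_irq_restore(flags);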

Reported-by: Andrew Halaney 
Signed-off-by: Zqiang 
---
 lib/stackdepot.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 49f67a0c6e5d..df9179f4f441 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
 static int depot_index;
 static int next_slab_inited;
 static size_t depot_offset;
-static DEFINE_SPINLOCK(depot_lock);
+static DEFINE_RAW_SPINLOCK(depot_lock);
 
 static bool init_stack_slab(void **prealloc)
 {
@@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(unsigned long 
*entries,
prealloc = page_address(page);
}
 
-   spin_lock_irqsave(&depot_lock, flags);
+   raw_spin_lock_irqsave(&depot_lock, flags);
 
found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
@@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(unsigned long 
*entries,
WARN_ON(!init_stack_slab());
}
 
-   spin_unlock_irqrestore(&depot_lock, flags);
+   raw_spin_unlock_irqrestore(&depot_lock, flags);
 exit:
if (prealloc) {
/* Nobody used this memory, ok to free it. */
-- 
2.17.1



[PATCH v2] loop: call __loop_clr_fd() with lo_mutex locked to avoid autoclear race

2021-03-26 Thread qiang . zhang
From: Zqiang 

lo->lo_refcnt = 0

CPU0                                    CPU1
lo_open()                               lo_open()
 mutex_lock(&lo->lo_mutex)
 atomic_inc(&lo->lo_refcnt)
  lo_refcnt == 1
 mutex_unlock(&lo->lo_mutex)
                                         mutex_lock(&lo->lo_mutex)
                                         atomic_inc(&lo->lo_refcnt)
                                          lo_refcnt == 2
                                         mutex_unlock(&lo->lo_mutex)
loop_clr_fd()
 mutex_lock(&lo->lo_mutex)
 atomic_read(&lo->lo_refcnt) > 1
 lo->lo_flags |= LO_FLAGS_AUTOCLEAR     lo_release()
 mutex_unlock(&lo->lo_mutex)
 return                                  mutex_lock(&lo->lo_mutex)
                                         atomic_dec_return(&lo->lo_refcnt)
                                          lo_refcnt == 1
                                         mutex_unlock(&lo->lo_mutex)
                                         return

lo_release()
 mutex_lock(&lo->lo_mutex)
 atomic_dec_return(&lo->lo_refcnt)
  lo_refcnt == 0
 lo->lo_flags & LO_FLAGS_AUTOCLEAR
  == true
 mutex_unlock(&lo->lo_mutex)            loop_control_ioctl()
                                          case LOOP_CTL_REMOVE:
                                           mutex_lock(&lo->lo_mutex)
                                           atomic_read(&lo->lo_refcnt) == 0
 __loop_clr_fd(lo, true)                   mutex_unlock(&lo->lo_mutex)
  mutex_lock(&lo->lo_mutex)                loop_remove(lo)
                                            mutex_destroy(&lo->lo_mutex)
  ..                                        kfree(lo)
                                             data race

When tasks on two CPUs perform the above operations on the same lo device, a
data race can occur. Do not drop lo->lo_mutex before calling __loop_clr_fd(),
so that the refcnt and LO_FLAGS_AUTOCLEAR checks in lo_release() stay in sync.
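
A sketch of the resulting locking rule, distilled from the diff below rather
than quoted from it: __loop_clr_fd() is now always entered with lo->lo_mutex
already held and the caller drops the mutex afterwards:

	mutex_lock(&lo->lo_mutex);
	lo->lo_state = Lo_rundown;
	err = __loop_clr_fd(lo, false);	/* runs entirely under lo_mutex */
	mutex_unlock(&lo->lo_mutex);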

Fixes: 6cc8e7430801 ("loop: scale loop device by introducing per device lock")
Signed-off-by: Zqiang 
---
 v1->v2:
 Modify the title and commit message. 

 drivers/block/loop.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d58d68f3c7cd..5712f1698a66 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1201,7 +1201,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
bool partscan = false;
int lo_number;
 
-   mutex_lock(&lo->lo_mutex);
if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
err = -ENXIO;
goto out_unlock;
@@ -1257,7 +1256,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
lo_number = lo->lo_number;
loop_unprepare_queue(lo);
 out_unlock:
-   mutex_unlock(&lo->lo_mutex);
if (partscan) {
/*
 * bd_mutex has been held already in release path, so don't
@@ -1288,12 +1286,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
 * protects us from all the other places trying to change the 'lo'
 * device.
 */
-   mutex_lock(&lo->lo_mutex);
+
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
lo->lo_state = Lo_unbound;
-   mutex_unlock(&lo->lo_mutex);
 
/*
 * Need not hold lo_mutex to fput backing file. Calling fput holding
@@ -1332,9 +1329,10 @@ static int loop_clr_fd(struct loop_device *lo)
return 0;
}
lo->lo_state = Lo_rundown;
+   err = __loop_clr_fd(lo, false);
mutex_unlock(&lo->lo_mutex);
 
-   return __loop_clr_fd(lo, false);
+   return err;
 }
 
 static int
@@ -1916,13 +1914,12 @@ static void lo_release(struct gendisk *disk, fmode_t 
mode)
if (lo->lo_state != Lo_bound)
goto out_unlock;
lo->lo_state = Lo_rundown;
-   mutex_unlock(&lo->lo_mutex);
/*
 * In autoclear mode, stop the loop thread
 * and remove configuration after last close.
 */
__loop_clr_fd(lo, true);
-   return;
+   goto out_unlock;
} else if (lo->lo_state == Lo_bound) {
/*
 * Otherwise keep thread (if running) and config,
-- 
2.17.1



[PATCH] loop: Fix use of unsafe lo->lo_mutex locks

2021-03-25 Thread qiang . zhang
From: Zqiang 

lo->lo_refcnt = 0

CPU0                                    CPU1
lo_open()                               lo_open()
 mutex_lock(&lo->lo_mutex)
 atomic_inc(&lo->lo_refcnt)
  lo_refcnt == 1
 mutex_unlock(&lo->lo_mutex)
                                         mutex_lock(&lo->lo_mutex)
                                         atomic_inc(&lo->lo_refcnt)
                                          lo_refcnt == 2
                                         mutex_unlock(&lo->lo_mutex)
loop_clr_fd()
 mutex_lock(&lo->lo_mutex)
 atomic_read(&lo->lo_refcnt) > 1
 lo->lo_flags |= LO_FLAGS_AUTOCLEAR     lo_release()
 mutex_unlock(&lo->lo_mutex)
 return                                  mutex_lock(&lo->lo_mutex)
                                         atomic_dec_return(&lo->lo_refcnt)
                                          lo_refcnt == 1
                                         mutex_unlock(&lo->lo_mutex)
                                         return

lo_release()
 mutex_lock(&lo->lo_mutex)
 atomic_dec_return(&lo->lo_refcnt)
  lo_refcnt == 0
 lo->lo_flags & LO_FLAGS_AUTOCLEAR
  == true
 mutex_unlock(&lo->lo_mutex)            loop_control_ioctl()
                                          case LOOP_CTL_REMOVE:
                                           mutex_lock(&lo->lo_mutex)
                                           atomic_read(&lo->lo_refcnt) == 0
 __loop_clr_fd(lo, true)                   mutex_unlock(&lo->lo_mutex)
  mutex_lock(&lo->lo_mutex)                loop_remove(lo)
                                            mutex_destroy(&lo->lo_mutex)
  ..                                        kfree(lo)
                                             UAF

When tasks on two CPUs perform the above operations on the same lo device, a
use-after-free (UAF) may occur.

Fixes: 6cc8e7430801 ("loop: scale loop device by introducing per device lock")
Signed-off-by: Zqiang 
---
 drivers/block/loop.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d58d68f3c7cd..5712f1698a66 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1201,7 +1201,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
bool partscan = false;
int lo_number;
 
-   mutex_lock(&lo->lo_mutex);
if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
err = -ENXIO;
goto out_unlock;
@@ -1257,7 +1256,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
lo_number = lo->lo_number;
loop_unprepare_queue(lo);
 out_unlock:
-   mutex_unlock(&lo->lo_mutex);
if (partscan) {
/*
 * bd_mutex has been held already in release path, so don't
@@ -1288,12 +1286,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
 * protects us from all the other places trying to change the 'lo'
 * device.
 */
-   mutex_lock(&lo->lo_mutex);
+
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
lo->lo_state = Lo_unbound;
-   mutex_unlock(&lo->lo_mutex);
 
/*
 * Need not hold lo_mutex to fput backing file. Calling fput holding
@@ -1332,9 +1329,10 @@ static int loop_clr_fd(struct loop_device *lo)
return 0;
}
lo->lo_state = Lo_rundown;
+   err = __loop_clr_fd(lo, false);
mutex_unlock(&lo->lo_mutex);
 
-   return __loop_clr_fd(lo, false);
+   return err;
 }
 
 static int
@@ -1916,13 +1914,12 @@ static void lo_release(struct gendisk *disk, fmode_t 
mode)
if (lo->lo_state != Lo_bound)
goto out_unlock;
lo->lo_state = Lo_rundown;
-   mutex_unlock(&lo->lo_mutex);
/*
 * In autoclear mode, stop the loop thread
 * and remove configuration after last close.
 */
__loop_clr_fd(lo, true);
-   return;
+   goto out_unlock;
} else if (lo->lo_state == Lo_bound) {
/*
 * Otherwise keep thread (if running) and config,
-- 
2.17.1



[PATCH v3] bpf: Fix memory leak in copy_process()

2021-03-16 Thread qiang . zhang
From: Zqiang 

syzbot reported a memory leak as follows:
BUG: memory leak
unreferenced object 0x888101b41d00 (size 120):
  comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
  backtrace:
[] alloc_pid+0x66/0x560
[] copy_process+0x1465/0x25e0
[] kernel_clone+0xf3/0x670
[] kernel_thread+0x61/0x80
[] call_usermodehelper_exec_work
[] call_usermodehelper_exec_work+0xc4/0x120
[] process_one_work+0x2c9/0x600
[] worker_thread+0x59/0x5d0
[] kthread+0x178/0x1b0
[] ret_from_fork+0x1f/0x30

unreferenced object 0x888110ef5c00 (size 232):
  comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
  backtrace:
[] kmem_cache_zalloc
[] __alloc_file+0x1f/0xf0
[] alloc_empty_file+0x69/0x120
[] alloc_file+0x33/0x1b0
[] alloc_file_pseudo+0xb2/0x140
[] create_pipe_files+0x138/0x2e0
[] umd_setup+0x33/0x220
[] call_usermodehelper_exec_async+0xb4/0x1b0
[] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh files and the tgid
need to be released.

Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that 
populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b...@syzkaller.appspotmail.com
Signed-off-by: Zqiang 
---
 v1->v2:
 Check whether the tgid pointer is valid.
 v2->v3:
 Add a common umd_cleanup_helper() and export it as a
 symbol which the driver here can use.

 include/linux/usermode_driver.h   |  1 +
 kernel/bpf/preload/bpf_preload_kern.c | 15 +++
 kernel/usermode_driver.c  | 18 ++
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h
index 073a9e0ec07d..ad970416260d 100644
--- a/include/linux/usermode_driver.h
+++ b/include/linux/usermode_driver.h
@@ -14,5 +14,6 @@ struct umd_info {
 int umd_load_blob(struct umd_info *info, const void *data, size_t len);
 int umd_unload_blob(struct umd_info *info);
 int fork_usermode_driver(struct umd_info *info);
+void umd_cleanup_helper(struct umd_info *info);
 
 #endif /* __LINUX_USERMODE_DRIVER_H__ */
diff --git a/kernel/bpf/preload/bpf_preload_kern.c 
b/kernel/bpf/preload/bpf_preload_kern.c
index 79c5772465f1..356c4ca4f530 100644
--- a/kernel/bpf/preload/bpf_preload_kern.c
+++ b/kernel/bpf/preload/bpf_preload_kern.c
@@ -61,8 +61,10 @@ static int finish(void)
if (n != sizeof(magic))
return -EPIPE;
tgid = umd_ops.info.tgid;
-   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
-   umd_ops.info.tgid = NULL;
+   if (tgid) {
+   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+   umd_cleanup_helper(&umd_ops.info);
+   }
return 0;
 }
 
@@ -80,10 +82,15 @@ static int __init load_umd(void)
 
 static void __exit fini_umd(void)
 {
+   struct pid *tgid;
bpf_preload_ops = NULL;
/* kill UMD in case it's still there due to earlier error */
-   kill_pid(umd_ops.info.tgid, SIGKILL, 1);
-   umd_ops.info.tgid = NULL;
+   tgid = umd_ops.info.tgid;
+   if (tgid) {
+   kill_pid(tgid, SIGKILL, 1);
+   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+   umd_cleanup_helper(&umd_ops.info);
+   }
umd_unload_blob(&umd_ops.info);
 }
 late_initcall(load_umd);
diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c
index 0b35212ffc3d..6372deae27a0 100644
--- a/kernel/usermode_driver.c
+++ b/kernel/usermode_driver.c
@@ -140,13 +140,23 @@ static void umd_cleanup(struct subprocess_info *info)
 
/* cleanup if umh_setup() was successful but exec failed */
if (info->retval) {
-   fput(umd_info->pipe_to_umh);
-   fput(umd_info->pipe_from_umh);
-   put_pid(umd_info->tgid);
-   umd_info->tgid = NULL;
+   umd_cleanup_helper(umd_info);
}
 }
 
+/**
+ * umd_cleanup_helper - release the resources which allocated in umd_setup
+ * @info: information about usermode driver
+ */
+void umd_cleanup_helper(struct umd_info *info)
+{
+   fput(info->pipe_to_umh);
+   fput(info->pipe_from_umh);
+   put_pid(info->tgid);
+   info->tgid = NULL;
+}
+EXPORT_SYMBOL_GPL(umd_cleanup_helper);
+
 /**
  * fork_usermode_driver - fork a usermode driver
  * @info: information about usermode driver (shouldn't be NULL)
-- 
2.17.1



[PATCH v2] bpf: Fix memory leak in copy_process()

2021-03-15 Thread qiang . zhang
From: Zqiang 

syzbot reported a memory leak as follows:
BUG: memory leak
unreferenced object 0x888101b41d00 (size 120):
  comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
  backtrace:
[] alloc_pid+0x66/0x560
[] copy_process+0x1465/0x25e0
[] kernel_clone+0xf3/0x670
[] kernel_thread+0x61/0x80
[] call_usermodehelper_exec_work
[] call_usermodehelper_exec_work+0xc4/0x120
[] process_one_work+0x2c9/0x600
[] worker_thread+0x59/0x5d0
[] kthread+0x178/0x1b0
[] ret_from_fork+0x1f/0x30

unreferenced object 0x888110ef5c00 (size 232):
  comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
  backtrace:
[] kmem_cache_zalloc
[] __alloc_file+0x1f/0xf0
[] alloc_empty_file+0x69/0x120
[] alloc_file+0x33/0x1b0
[] alloc_file_pseudo+0xb2/0x140
[] create_pipe_files+0x138/0x2e0
[] umd_setup+0x33/0x220
[] call_usermodehelper_exec_async+0xb4/0x1b0
[] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh files and the tgid
need to be released.

Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that 
populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b...@syzkaller.appspotmail.com
Signed-off-by: Zqiang 
---
 v1->v2:
 Check whether the tgid pointer is valid.

 kernel/bpf/preload/bpf_preload_kern.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/preload/bpf_preload_kern.c 
b/kernel/bpf/preload/bpf_preload_kern.c
index 79c5772465f1..5009875f01d3 100644
--- a/kernel/bpf/preload/bpf_preload_kern.c
+++ b/kernel/bpf/preload/bpf_preload_kern.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "bpf_preload.h"
 
@@ -20,6 +21,14 @@ static struct bpf_preload_ops umd_ops = {
.owner = THIS_MODULE,
 };
 
+static void bpf_preload_umh_cleanup(struct umd_info *info)
+{
+   fput(info->pipe_to_umh);
+   fput(info->pipe_from_umh);
+   put_pid(info->tgid);
+   info->tgid = NULL;
+}
+
 static int preload(struct bpf_preload_info *obj)
 {
int magic = BPF_PRELOAD_START;
@@ -61,8 +70,10 @@ static int finish(void)
if (n != sizeof(magic))
return -EPIPE;
tgid = umd_ops.info.tgid;
-   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
-   umd_ops.info.tgid = NULL;
+   if (tgid) {
+   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+   bpf_preload_umh_cleanup(&umd_ops.info);
+   }
return 0;
 }
 
@@ -80,10 +91,15 @@ static int __init load_umd(void)
 
 static void __exit fini_umd(void)
 {
+   struct pid *tgid;
bpf_preload_ops = NULL;
/* kill UMD in case it's still there due to earlier error */
-   kill_pid(umd_ops.info.tgid, SIGKILL, 1);
-   umd_ops.info.tgid = NULL;
+   tgid = umd_ops.info.tgid;
+   if (tgid) {
+   kill_pid(tgid, SIGKILL, 1);
+   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+   bpf_preload_umh_cleanup(&umd_ops.info);
+   }
umd_unload_blob(&umd_ops.info);
 }
 late_initcall(load_umd);
-- 
2.17.1



[PATCH] bpf: Fix memory leak in copy_process()

2021-03-15 Thread qiang . zhang
From: Zqiang 

syzbot reported a memory leak as follows:
BUG: memory leak
unreferenced object 0x888101b41d00 (size 120):
  comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
  backtrace:
[] alloc_pid+0x66/0x560
[] copy_process+0x1465/0x25e0
[] kernel_clone+0xf3/0x670
[] kernel_thread+0x61/0x80
[] call_usermodehelper_exec_work
[] call_usermodehelper_exec_work+0xc4/0x120
[] process_one_work+0x2c9/0x600
[] worker_thread+0x59/0x5d0
[] kthread+0x178/0x1b0
[] ret_from_fork+0x1f/0x30

unreferenced object 0x888110ef5c00 (size 232):
  comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
  backtrace:
[] kmem_cache_zalloc
[] __alloc_file+0x1f/0xf0
[] alloc_empty_file+0x69/0x120
[] alloc_file+0x33/0x1b0
[] alloc_file_pseudo+0xb2/0x140
[] create_pipe_files+0x138/0x2e0
[] umd_setup+0x33/0x220
[] call_usermodehelper_exec_async+0xb4/0x1b0
[] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh files and the tgid
need to be released.

Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that 
populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b...@syzkaller.appspotmail.com
Signed-off-by: Zqiang 
---
 kernel/bpf/preload/bpf_preload_kern.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/preload/bpf_preload_kern.c 
b/kernel/bpf/preload/bpf_preload_kern.c
index 79c5772465f1..5a6226f3243f 100644
--- a/kernel/bpf/preload/bpf_preload_kern.c
+++ b/kernel/bpf/preload/bpf_preload_kern.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "bpf_preload.h"
 
@@ -20,6 +21,14 @@ static struct bpf_preload_ops umd_ops = {
.owner = THIS_MODULE,
 };
 
+static void bpf_preload_umh_cleanup(struct umd_info *info)
+{
+   fput(info->pipe_to_umh);
+   fput(info->pipe_from_umh);
+   put_pid(info->tgid);
+   info->tgid = NULL;
+}
+
 static int preload(struct bpf_preload_info *obj)
 {
int magic = BPF_PRELOAD_START;
@@ -62,7 +71,7 @@ static int finish(void)
return -EPIPE;
tgid = umd_ops.info.tgid;
wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
-   umd_ops.info.tgid = NULL;
+   bpf_preload_umh_cleanup(&umd_ops.info);
return 0;
 }
 
@@ -80,10 +89,13 @@ static int __init load_umd(void)
 
 static void __exit fini_umd(void)
 {
+   struct pid *tgid;
bpf_preload_ops = NULL;
/* kill UMD in case it's still there due to earlier error */
-   kill_pid(umd_ops.info.tgid, SIGKILL, 1);
-   umd_ops.info.tgid = NULL;
+   tgid = umd_ops.info.tgid;
+   kill_pid(tgid, SIGKILL, 1);
+   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+   bpf_preload_umh_cleanup(&umd_ops.info);
umd_unload_blob(&umd_ops.info);
 }
 late_initcall(load_umd);
-- 
2.17.1



[PATCH] ARM: Fix incorrect use of smp_processor_id() by syzbot report

2021-03-11 Thread qiang . zhang
From: Zqiang 

BUG: using smp_processor_id() in preemptible [] code:
syz-executor.0/15841
caller is debug_smp_processor_id+0x20/0x24
lib/smp_processor_id.c:64

smp_processor_id() must only be used in a code segment where preemption
has been disabled; otherwise, with preemption enabled, the task can
migrate and the returned CPU number may no longer identify the per-CPU
data of the processor the code is actually running on.
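
A sketch of the usual way to make such a per-CPU lookup preemption-safe; this
illustrates the pattern, not necessarily the fix that was finally applied:

	harden_branch_predictor_fn_t fn;

	fn = per_cpu(harden_branch_predictor_fn, get_cpu()); /* disables preemption */
	if (fn)
		fn();
	put_cpu();                                           /* re-enables it */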

Reported-by: syzbot 
Fixes: f5fe12b1eaee ("ARM: spectre-v2: harden user aborts in kernel space")
Signed-off-by: Zqiang 
---
 arch/arm/include/asm/system_misc.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/system_misc.h 
b/arch/arm/include/asm/system_misc.h
index 66f6a3ae68d2..61916dc7d361 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -21,8 +21,10 @@ typedef void (*harden_branch_predictor_fn_t)(void);
 DECLARE_PER_CPU(harden_branch_predictor_fn_t, harden_branch_predictor_fn);
 static inline void harden_branch_predictor(void)
 {
+   preempt_disable();
harden_branch_predictor_fn_t fn = per_cpu(harden_branch_predictor_fn,
  smp_processor_id());
+   preempt_enable();
if (fn)
fn();
 }
-- 
2.29.2



[PATCH v2] workqueue: Move the position of debug_work_activate() in __queue_work()

2021-02-17 Thread qiang . zhang
From: Zqiang 

debug_work_activate() should only be called once it is known that the
work will actually be inserted, because if the wq is in __WQ_DRAINING
state the insertion may fail.

Fixes: e41e704bc4f4 ("workqueue: improve destroy_workqueue() debuggability")
Signed-off-by: Zqiang 
Reviewed-by: Lai Jiangshan 
---
 v1->v2:
 add Fixes tag.

 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0d150da252e8..21fb00b52def 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1412,7 +1412,6 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
 */
lockdep_assert_irqs_disabled();
 
-   debug_work_activate(work);
 
/* if draining, only works from the same workqueue are allowed */
if (unlikely(wq->flags & __WQ_DRAINING) &&
@@ -1494,6 +1493,7 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
worklist = &pwq->delayed_works;
}
 
+   debug_work_activate(work);
insert_work(pwq, work, worklist, work_flags);
 
 out:
-- 
2.25.1



[PATCH] workqueue: Remove rcu_read_lock/unlock() in workqueue_congested()

2021-02-17 Thread qiang . zhang
From: Zqiang 

The RCU read-side critical section is already marked by
preempt_disable()/preempt_enable(), which is equivalent to
rcu_read_lock_sched()/rcu_read_unlock_sched() (and, with the consolidated
RCU flavors, also protects normal RCU readers), so the extra
rcu_read_lock()/rcu_read_unlock() pair can be removed.
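
An illustration of why the extra marking is redundant on kernels with the
consolidated RCU flavors (a generic sketch, not the workqueue code itself):

	preempt_disable();		/* already a legal RCU reader section */
	p = rcu_dereference_sched(gp);	/* RCU-protected pointers may be used */
	do_something(p);
	preempt_enable();		/* reader section ends                */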

Signed-off-by: Zqiang 
---
 kernel/workqueue.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0d150da252e8..c599835ad6c3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4540,7 +4540,6 @@ bool workqueue_congested(int cpu, struct workqueue_struct 
*wq)
struct pool_workqueue *pwq;
bool ret;
 
-   rcu_read_lock();
preempt_disable();
 
if (cpu == WORK_CPU_UNBOUND)
@@ -4553,7 +4552,6 @@ bool workqueue_congested(int cpu, struct workqueue_struct 
*wq)
 
ret = !list_empty(&pwq->delayed_works);
preempt_enable();
-   rcu_read_unlock();
 
return ret;
 }
-- 
2.25.1



[PATCH v5] kvfree_rcu: Release page cache under memory pressure

2021-02-11 Thread qiang . zhang
From: Zqiang 

Add an operation to the shrinker callback that frees each CPU's existing
krcp page cache, and while a shrink is in progress simply schedule the
page-fill work with a delay, to avoid refilling pages while the krcp page
cache is being freed.
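
The interplay this sets up, summarized across the hunks below (sketch only;
names as in the diff):

	/* shrinker count side: under memory pressure, ask the filler to back off */
	atomic_set(&backoff_page_cache_fill, 1);

	/* run_page_cache_worker(): if a shrink was seen, refill after a delay */
	if (atomic_xchg(&backoff_page_cache_fill, 0))
		queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, HZ);
	else
		/* otherwise refill immediately via the hrtimer as before */
		hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);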

Signed-off-by: Zqiang 
Co-developed-by: Uladzislau Rezki (Sony) 
---
 v1->v4:
 During the test the page shrinker is pretty active because of the low-memory
 condition: the callback drains the cache whereas the kvfree_rcu() path
 refills it right away, creating a kind of vicious circle.
 Following Vlad Rezki's suggestion, avoid this by scheduling the fill work as
 a periodic delayed work with an HZ delay, which is easy to do.
 v4->v5:
 Change the commit message and use xchg() in place of WRITE_ONCE().

 kernel/rcu/tree.c | 49 +++
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1ae1e52f638..f1fba23f5036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3139,7 +3139,7 @@ struct kfree_rcu_cpu {
bool initialized;
int count;
 
-   struct work_struct page_cache_work;
+   struct delayed_work page_cache_work;
atomic_t work_in_progress;
struct hrtimer hrtimer;
 
@@ -3395,7 +3395,7 @@ schedule_page_work_fn(struct hrtimer *t)
struct kfree_rcu_cpu *krcp =
container_of(t, struct kfree_rcu_cpu, hrtimer);
 
-   queue_work(system_highpri_wq, &krcp->page_cache_work);
+   queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
return HRTIMER_NORESTART;
 }
 
@@ -3404,7 +3404,7 @@ static void fill_page_cache_func(struct work_struct *work)
struct kvfree_rcu_bulk_data *bnode;
struct kfree_rcu_cpu *krcp =
container_of(work, struct kfree_rcu_cpu,
-   page_cache_work);
+   page_cache_work.work);
unsigned long flags;
bool pushed;
int i;
@@ -3428,15 +3428,21 @@ static void fill_page_cache_func(struct work_struct 
*work)
atomic_set(&krcp->work_in_progress, 0);
 }
 
+static atomic_t backoff_page_cache_fill = ATOMIC_INIT(0);
+
 static void
 run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 {
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
!atomic_xchg(&krcp->work_in_progress, 1)) {
-   hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC,
-   HRTIMER_MODE_REL);
-   krcp->hrtimer.function = schedule_page_work_fn;
-   hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+   if (atomic_xchg(&backoff_page_cache_fill, 0)) {
+   queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, HZ);
+   } else {
+   hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC,
+   HRTIMER_MODE_REL);
+   krcp->hrtimer.function = schedule_page_work_fn;
+   hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+   }
}
 }
 
@@ -3571,19 +3577,44 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
+static int free_krc_page_cache(struct kfree_rcu_cpu *krcp)
+{
+   unsigned long flags;
+   struct llist_node *page_list, *pos, *n;
+   int freed = 0;
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   page_list = llist_del_all(&krcp->bkvcache);
+   krcp->nr_bkv_objs = 0;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+   llist_for_each_safe(pos, n, page_list) {
+   free_page((unsigned long)pos);
+   freed++;
+   }
+
+   return freed;
+}
+
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
int cpu;
unsigned long count = 0;
+   unsigned long flags;
 
/* Snapshot count of all CPUs */
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count += READ_ONCE(krcp->count);
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   count += krcp->nr_bkv_objs;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
}
 
+   atomic_set(&backoff_page_cache_fill, 1);
return count;
 }
 
@@ -3598,6 +3629,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct 
shrink_control *sc)
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count = krcp->count;
+   count += free_krc_page_cache(krcp);
+
raw_spin_lock_irqsave(&krcp->lock, flags);
if (krcp->monitor_todo)
kfree_rcu_drain_unlock(krcp, flags);
@@ -4574,7 +4607,7 @@ static void __init kfree_rcu_batch_init(void)
}
 
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
-   INIT_WORK(&krcp->page_cache_work, fill_page_cache_func);
+   INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
krcp->initialized = true;
}
if (register_shrinker(&kfree_rcu_shrinker))
-- 

[PATCH] workqueue: Move the position of debug_work_activate() in __queue_work()

2021-02-11 Thread qiang . zhang
From: Zqiang 

debug_work_activate() should only be called once it is known that the
work will actually be inserted, because if the wq is in __WQ_DRAINING
state the insertion may fail.

Signed-off-by: Zqiang 
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0d150da252e8..21fb00b52def 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1412,7 +1412,6 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
 */
lockdep_assert_irqs_disabled();
 
-   debug_work_activate(work);
 
/* if draining, only works from the same workqueue are allowed */
if (unlikely(wq->flags & __WQ_DRAINING) &&
@@ -1494,6 +1493,7 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
worklist = &pwq->delayed_works;
}
 
+   debug_work_activate(work);
insert_work(pwq, work, worklist, work_flags);
 
 out:
-- 
2.25.1



[PATCH v4] kvfree_rcu: Release page cache under memory pressure

2021-02-07 Thread qiang . zhang
From: Zqiang 

Add an operation to free each CPU's existing krcp page cache when
the system is under memory pressure.

Signed-off-by: Zqiang 
Co-developed-by: Uladzislau Rezki (Sony) 
---
 v1->v2->v3->v4:
 During the test the page shrinker is pretty active because of the low-memory
 condition: the callback drains the cache whereas the kvfree_rcu() path
 refills it right away, creating a kind of vicious circle.
 Following Vlad Rezki's suggestion, avoid this by scheduling the fill work as
 a periodic delayed work with an HZ delay, which is easy to do.

 kernel/rcu/tree.c | 50 +++
 1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1ae1e52f638..f3b772eef468 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3139,7 +3139,7 @@ struct kfree_rcu_cpu {
bool initialized;
int count;
 
-   struct work_struct page_cache_work;
+   struct delayed_work page_cache_work;
atomic_t work_in_progress;
struct hrtimer hrtimer;
 
@@ -3395,7 +3395,7 @@ schedule_page_work_fn(struct hrtimer *t)
struct kfree_rcu_cpu *krcp =
container_of(t, struct kfree_rcu_cpu, hrtimer);
 
-   queue_work(system_highpri_wq, &krcp->page_cache_work);
+   queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
return HRTIMER_NORESTART;
 }
 
@@ -3404,7 +3404,7 @@ static void fill_page_cache_func(struct work_struct *work)
struct kvfree_rcu_bulk_data *bnode;
struct kfree_rcu_cpu *krcp =
container_of(work, struct kfree_rcu_cpu,
-   page_cache_work);
+   page_cache_work.work);
unsigned long flags;
bool pushed;
int i;
@@ -3428,15 +3428,22 @@ static void fill_page_cache_func(struct work_struct 
*work)
atomic_set(&krcp->work_in_progress, 0);
 }
 
+static bool backoff_page_cache_fill;
+
 static void
 run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 {
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
!atomic_xchg(&krcp->work_in_progress, 1)) {
-   hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC,
-   HRTIMER_MODE_REL);
-   krcp->hrtimer.function = schedule_page_work_fn;
-   hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+   if (READ_ONCE(backoff_page_cache_fill)) {
+   queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, HZ);
+   WRITE_ONCE(backoff_page_cache_fill, false);
+   } else {
+   hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC,
+   HRTIMER_MODE_REL);
+   krcp->hrtimer.function = schedule_page_work_fn;
+   hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+   }
}
 }
 
@@ -3571,19 +3578,44 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
+static int free_krc_page_cache(struct kfree_rcu_cpu *krcp)
+{
+   unsigned long flags;
+   struct llist_node *page_list, *pos, *n;
+   int freed = 0;
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   page_list = llist_del_all(&krcp->bkvcache);
+   krcp->nr_bkv_objs = 0;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+   llist_for_each_safe(pos, n, page_list) {
+   free_page((unsigned long)pos);
+   freed++;
+   }
+
+   return freed;
+}
+
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
int cpu;
unsigned long count = 0;
+   unsigned long flags;
 
/* Snapshot count of all CPUs */
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count += READ_ONCE(krcp->count);
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   count += krcp->nr_bkv_objs;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
}
 
+   WRITE_ONCE(backoff_page_cache_fill, true);
return count;
 }
 
@@ -3598,6 +3630,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct 
shrink_control *sc)
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count = krcp->count;
+   count += free_krc_page_cache(krcp);
+
raw_spin_lock_irqsave(&krcp->lock, flags);
if (krcp->monitor_todo)
kfree_rcu_drain_unlock(krcp, flags);
@@ -4574,7 +4608,7 @@ static void __init kfree_rcu_batch_init(void)
}
 
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
-   INIT_WORK(&krcp->page_cache_work, fill_page_cache_func);
+   INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
krcp->initialized = true;
}
if (register_shrinker(&kfree_rcu_shrinker))
-- 
2.17.1



[PATCH] uprobes: Fix kasan UAF reported by syzbot

2021-02-02 Thread qiang . zhang
From: Zqiang 

Call Trace:
 __dump_stack [inline]
 dump_stack+0x107/0x163
 print_address_description.constprop.0.cold+0x5b/0x2f8
 __kasan_report [inline]
 kasan_report.cold+0x7c/0xd8
 uprobe_cmp [inline]
 __uprobe_cmp [inline]
 rb_find_add [inline]
 __insert_uprobe [inline]
 insert_uprobe [inline]
 alloc_uprobe [inline]
 __uprobe_register+0x70f/0x850
 ..
 __do_sys_perf_event_open+0x647/0x2e60
 do_syscall_64+0x2d/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Allocated by task 12710:
 kzalloc [inline]
 alloc_uprobe [inline]
 __uprobe_register+0x19c/0x850
 trace_uprobe_enable [inline]
 trace_uprobe_register+0x443/0x880
 ...
 __do_sys_perf_event_open+0x647/0x2e60
 do_syscall_64+0x2d/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 12710:
 kfree+0xe5/0x7b0
 put_uprobe [inline]
 put_uprobe+0x13b/0x190
 uprobe_apply+0xfc/0x130
 uprobe_perf_open [inline]
 trace_uprobe_register+0x5c9/0x880
 ...
 __do_sys_perf_event_open+0x647/0x2e60
 do_syscall_64+0x2d/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

 Fix the reference count that is lost in the __find_uprobe() function.
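
A sketch of the reference-count balance the one-line change below restores
(illustration only; caller details simplified):

	uprobe = find_uprobe(inode, offset);	/* now returns with an extra ref
						 * taken via get_uprobe()        */
	if (uprobe) {
		/* ... use uprobe ... */
		put_uprobe(uprobe);		/* balanced; without the get, this
						 * put underflowed the count and
						 * freed the uprobe too early    */
	}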

Fixes: c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
Reported-by: syzbot+1182ffb2063c5d087...@syzkaller.appspotmail.com
Signed-off-by: Zqiang 
---
 kernel/events/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 7e15b2efdd87..6addc9780319 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -661,7 +661,7 @@ static struct uprobe *__find_uprobe(struct inode *inode, 
loff_t offset)
struct rb_node *node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
 
if (node)
-   return __node_2_uprobe(node);
+   return get_uprobe(__node_2_uprobe(node));
 
return NULL;
 }
-- 
2.17.1



[PATCH v3] kvfree_rcu: Release page cache under memory pressure

2021-01-30 Thread qiang . zhang
From: Zqiang 

Add an operation to free each CPU's existing krcp page cache when
the system is under memory pressure.

Signed-off-by: Zqiang 
---
 kernel/rcu/tree.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1ae1e52f638..644b0f3c7b9f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3571,17 +3571,41 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
+static int free_krc_page_cache(struct kfree_rcu_cpu *krcp)
+{
+   unsigned long flags;
+   struct llist_node *page_list, *pos, *n;
+   int freed = 0;
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   page_list = llist_del_all(&krcp->bkvcache);
+   krcp->nr_bkv_objs = 0;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+   llist_for_each_safe(pos, n, page_list) {
+   free_page((unsigned long)pos);
+   freed++;
+   }
+
+   return freed;
+}
+
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
int cpu;
unsigned long count = 0;
+   unsigned long flags;
 
/* Snapshot count of all CPUs */
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count += READ_ONCE(krcp->count);
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   count += krcp->nr_bkv_objs;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
}
 
return count;
@@ -3598,6 +3622,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct 
shrink_control *sc)
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count = krcp->count;
+   count += free_krc_page_cache(krcp);
+
raw_spin_lock_irqsave(&krcp->lock, flags);
if (krcp->monitor_todo)
kfree_rcu_drain_unlock(krcp, flags);
-- 
2.17.1



[PATCH v2] kvfree_rcu: Release page cache under memory pressure

2021-01-28 Thread qiang . zhang
From: Zqiang 

Add an operation to free each CPU's existing krcp page cache when
the system is under memory pressure.

Signed-off-by: Zqiang 
---
 kernel/rcu/tree.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1ae1e52f638..ec098910d80b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3571,17 +3571,40 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
+static int free_krc_page_cache(struct kfree_rcu_cpu *krcp)
+{
+   unsigned long flags;
+   struct kvfree_rcu_bulk_data *bnode;
+   int i;
+
+   for (i = 0; i < rcu_min_cached_objs; i++) {
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   bnode = get_cached_bnode(krcp);
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+   if (!bnode)
+   break;
+   free_page((unsigned long)bnode);
+   }
+
+   return i;
+}
+
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
int cpu;
unsigned long count = 0;
+   unsigned long flags;
 
/* Snapshot count of all CPUs */
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count += READ_ONCE(krcp->count);
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   count += krcp->nr_bkv_objs;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
}
 
return count;
@@ -3598,6 +3621,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct 
shrink_control *sc)
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count = krcp->count;
+   count += free_krc_page_cache(krcp);
+
raw_spin_lock_irqsave(&krcp->lock, flags);
if (krcp->monitor_todo)
kfree_rcu_drain_unlock(krcp, flags);
-- 
2.17.1



[PATCH] kvfree_rcu: Release page cache under memory pressure

2021-01-28 Thread qiang . zhang
From: Zqiang 

Add an operation to free each CPU's existing krcp page cache when
the system is under memory pressure.

Signed-off-by: Zqiang 
---
 kernel/rcu/tree.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1ae1e52f638..4e1c14b12bdd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3571,17 +3571,41 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
+static inline int free_krc_page_cache(struct kfree_rcu_cpu *krcp)
+{
+   unsigned long flags;
+   struct kvfree_rcu_bulk_data *bnode;
+   int i, num = 0;
+
+   for (i = 0; i < rcu_min_cached_objs; i++) {
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   bnode = get_cached_bnode(krcp);
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+   if (!bnode)
+   break;
+   free_page((unsigned long)bnode);
+   num++;
+   }
+
+   return num;
+}
+
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
int cpu;
unsigned long count = 0;
+   unsigned long flags;
 
/* Snapshot count of all CPUs */
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
count += READ_ONCE(krcp->count);
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   count += krcp->nr_bkv_objs;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
}
 
return count;
@@ -3604,6 +3628,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct 
shrink_control *sc)
else
raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
+   count += free_krc_page_cache(krcp);
+
sc->nr_to_scan -= count;
freed += count;
 
-- 
2.17.1



[PATCH] sched/core: add rcu_read_lock/unlock() protection

2021-01-26 Thread qiang . zhang
From: Zqiang 

for_each_process_thread() is an RCU read-side operation, so it needs to be
protected by rcu_read_lock()/rcu_read_unlock().

Signed-off-by: Zqiang 
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8c5481077c9c..c3f0103fdf53 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7738,6 +7738,7 @@ static void dump_rq_tasks(struct rq *rq, const char 
*loglvl)
lockdep_assert_held(&rq->lock);
 
printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, 
rq->nr_running);
+   rcu_read_lock();
for_each_process_thread(g, p) {
if (task_cpu(p) != cpu)
continue;
@@ -7747,6 +7748,7 @@ static void dump_rq_tasks(struct rq *rq, const char 
*loglvl)
 
printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
}
+   rcu_read_unlock();
 }
 
 int sched_cpu_dying(unsigned int cpu)
-- 
2.17.1



[PATCH] PM: remove PF_WQ_WORKER mask

2021-01-24 Thread qiang . zhang
From: Zqiang 

A kworker is also a kernel thread, so it is already covered by the
PF_KTHREAD mask; remove the redundant PF_WQ_WORKER mask.

Signed-off-by: Zqiang 
---
 kernel/power/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index 45b054b7b5ec..50cc63534486 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -235,7 +235,7 @@ void thaw_kernel_threads(void)
 
read_lock(&tasklist_lock);
for_each_process_thread(g, p) {
-   if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
+   if (p->flags & PF_KTHREAD)
__thaw_task(p);
}
read_unlock(&tasklist_lock);
-- 
2.17.1



[PATCH] rcu: Release per-cpu krcp page cache when CPU going offline

2021-01-20 Thread qiang . zhang
From: Zqiang 

If a CPU goes offline, the corresponding krcp's page cache cannot be
used until the CPU comes back online, and the CPU may never come online
again; this commit therefore frees the krcp's page cache when a CPU
goes offline.

Signed-off-by: Zqiang 
---
 kernel/rcu/tree.c | 47 ++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e04e336bee42..2eaf6f287483 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -158,6 +158,9 @@ static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
 
+static void krc_offline(unsigned int cpu, bool off);
+static void free_krc_page_cache(int cpu);
+
 /* rcuc/rcub kthread realtime priority */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
 module_param(kthread_prio, int, 0444);
@@ -2457,6 +2460,9 @@ int rcutree_dead_cpu(unsigned int cpu)
 
// Stop-machine done, so allow nohz_full to disable tick.
tick_dep_clear(TICK_DEP_BIT_RCU);
+
+   krc_offline(cpu, true);
+   free_krc_page_cache(cpu);
return 0;
 }
 
@@ -3169,6 +3175,7 @@ struct kfree_rcu_cpu {
 
struct llist_head bkvcache;
int nr_bkv_objs;
+   bool offline;
 };
 
 static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
@@ -3220,6 +3227,8 @@ static inline bool
 put_cached_bnode(struct kfree_rcu_cpu *krcp,
struct kvfree_rcu_bulk_data *bnode)
 {
+   if (krcp->offline)
+   return false;
// Check the limit.
if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
return false;
@@ -3230,6 +3239,39 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
 
 }
 
+static void krc_offline(unsigned int cpu, bool off)
+{
+   unsigned long flags;
+   struct kfree_rcu_cpu *krcp;
+
+   krcp = per_cpu_ptr(&krc, cpu);
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   if (off)
+   krcp->offline = true;
+   else
+   krcp->offline = false;
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+}
+
+static void free_krc_page_cache(int cpu)
+{
+   unsigned long flags;
+   struct kfree_rcu_cpu *krcp;
+   int i;
+   struct kvfree_rcu_bulk_data *bnode;
+
+   krcp = per_cpu_ptr(&krc, cpu);
+
+   for (i = 0; i < rcu_min_cached_objs; i++) {
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   bnode = get_cached_bnode(krcp);
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+   if (!bnode)
+   break;
+   free_page((unsigned long)bnode);
+   }
+}
+
 /*
  * This function is invoked in workqueue context after a grace period.
  * It frees all the objects queued on ->bhead_free or ->head_free.
@@ -3549,7 +3591,8 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
kasan_record_aux_stack(ptr);
success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
if (!success) {
-   run_page_cache_worker(krcp);
+   if (!krcp->offline)
+   run_page_cache_worker(krcp);
 
if (head == NULL)
// Inline if kvfree_rcu(one_arg) call.
@@ -4086,6 +4129,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
rcu_spawn_cpu_nocb_kthread(cpu);
WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1);
 
+   krc_offline(cpu, false);
return 0;
 }
 
@@ -4591,6 +4635,7 @@ static void __init kfree_rcu_batch_init(void)
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
INIT_WORK(&krcp->page_cache_work, fill_page_cache_func);
krcp->initialized = true;
+   krcp->offline = true;
}
if (register_shrinker(&kfree_rcu_shrinker))
pr_err("Failed to register kfree_rcu() shrinker!\n");
-- 
2.29.2



[PATCH] workqueue: tracing the name of the workqueue instead of it's address

2021-01-04 Thread qiang . zhang
From: Zqiang 

Trace the name of the workqueue instead of its address; the new
format is as follows.

workqueue_queue_work: work struct=84e3df56 function=
drm_fb_helper_dirty_work workqueue=events req_cpu=256 cpu=1

This makes it clear which workqueue a work item was queued on.

Signed-off-by: Zqiang 
---
 include/trace/events/workqueue.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/workqueue.h b/include/trace/events/workqueue.h
index 9b8ae961acc5..970cc2ea2850 100644
--- a/include/trace/events/workqueue.h
+++ b/include/trace/events/workqueue.h
@@ -30,7 +30,7 @@ TRACE_EVENT(workqueue_queue_work,
TP_STRUCT__entry(
__field( void *,work)
__field( void *,function)
-   __field( void *,workqueue)
+   __field( const char *,  workqueue)
__field( unsigned int,  req_cpu )
__field( unsigned int,  cpu )
),
@@ -38,12 +38,12 @@ TRACE_EVENT(workqueue_queue_work,
TP_fast_assign(
__entry->work   = work;
__entry->function   = work->func;
-   __entry->workqueue  = pwq->wq;
+   __entry->workqueue  = pwq->wq->name;
__entry->req_cpu= req_cpu;
__entry->cpu= pwq->pool->cpu;
),
 
-   TP_printk("work struct=%p function=%ps workqueue=%p req_cpu=%u cpu=%u",
+   TP_printk("work struct=%p function=%ps workqueue=%s req_cpu=%u cpu=%u",
  __entry->work, __entry->function, __entry->workqueue,
  __entry->req_cpu, __entry->cpu)
 );
-- 
2.17.1



[PATCH] ipc/sem.c: Convert kfree_rcu() to call_rcu() in freeary function

2020-12-30 Thread qiang . zhang
From: Zqiang 

freeary() is called with a spinlock held, and kfree_rcu() may end up
calling synchronize_rcu(), so a schedule could happen inside the
spinlock critical section; replace kfree_rcu() with call_rcu().

Fixes: 693a8b6eecce ("ipc,rcu: Convert call_rcu(free_un) to kfree_rcu()")
Signed-off-by: Zqiang 
---
 ipc/sem.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index f6c30a85dadf..12c3184347d9 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1132,6 +1132,13 @@ static int count_semcnt(struct sem_array *sma, ushort 
semnum,
return semcnt;
 }
 
+static void free_un(struct rcu_head *head)
+{
+   struct sem_undo *un = container_of(head, struct sem_undo, rcu);
+
+   kfree(un);
+}
+
 /* Free a semaphore set. freeary() is called with sem_ids.rwsem locked
  * as a writer and the spinlock for this semaphore set hold. sem_ids.rwsem
  * remains locked on exit.
@@ -1152,7 +1159,7 @@ static void freeary(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
un->semid = -1;
list_del_rcu(&un->list_proc);
spin_unlock(&un->ulp->lock);
-   kfree_rcu(un, rcu);
+   call_rcu(&un->rcu, free_un);
}
 
/* Wake up all pending processes and let them fail with EIDRM. */
-- 
2.17.1



[PATCH] udlfb: Fix memory leak in dlfb_usb_probe

2020-12-14 Thread qiang . zhang
From: Zqiang 

dlfb_alloc_urb_list() is called from dlfb_usb_probe(); if an error
occurs after that call, dlfb_free_urb_list() needs to be called to
release the allocated URB list.

BUG: memory leak
unreferenced object 0x88810adde100 (size 32):
  comm "kworker/1:0", pid 17, jiffies 4294947788 (age 19.520s)
  hex dump (first 32 bytes):
10 30 c3 0d 81 88 ff ff c0 fa 63 12 81 88 ff ff  .0c.
00 30 c3 0d 81 88 ff ff 80 d1 3a 08 81 88 ff ff  .0:.
  backtrace:
[<19512953>] kmalloc include/linux/slab.h:552 [inline]
[<19512953>] kzalloc include/linux/slab.h:664 [inline]
[<19512953>] dlfb_alloc_urb_list drivers/video/fbdev/udlfb.c:1892 
[inline]
[<19512953>] dlfb_usb_probe.cold+0x289/0x988 
drivers/video/fbdev/udlfb.c:1704
[<72160152>] usb_probe_interface+0x177/0x370 
drivers/usb/core/driver.c:396
[] really_probe+0x159/0x480 drivers/base/dd.c:554
[] driver_probe_device+0x84/0x100 drivers/base/dd.c:738
[] __device_attach_driver+0xee/0x110 drivers/base/dd.c:844
[] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:431
[<463fbcb4>] __device_attach+0x122/0x250 drivers/base/dd.c:912
[] bus_probe_device+0xc6/0xe0 drivers/base/bus.c:491
[<364bbda5>] device_add+0x5ac/0xc30 drivers/base/core.c:2936
[] usb_set_configuration+0x9de/0xb90 
drivers/usb/core/message.c:2159
[] usb_generic_driver_probe+0x8c/0xc0 
drivers/usb/core/generic.c:238
[<1830872b>] usb_probe_device+0x5c/0x140 
drivers/usb/core/driver.c:293
[] really_probe+0x159/0x480 drivers/base/dd.c:554
[] driver_probe_device+0x84/0x100 drivers/base/dd.c:738
[] __device_attach_driver+0xee/0x110 drivers/base/dd.c:844
[] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:431

Reported-by: syzbot+c9e365d7f450e8aa6...@syzkaller.appspotmail.com
Signed-off-by: Zqiang 
---
 drivers/video/fbdev/udlfb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/video/fbdev/udlfb.c b/drivers/video/fbdev/udlfb.c
index f9b3c1cb9530..b9cdd02c1000 100644
--- a/drivers/video/fbdev/udlfb.c
+++ b/drivers/video/fbdev/udlfb.c
@@ -1017,6 +1017,7 @@ static void dlfb_ops_destroy(struct fb_info *info)
}
vfree(dlfb->backing_buffer);
kfree(dlfb->edid);
+   dlfb_free_urb_list(dlfb);
usb_put_dev(dlfb->udev);
kfree(dlfb);
 
-- 
2.17.1



[PATCH] kasan: fix slab double free when cpu-hotplug

2020-12-04 Thread qiang . zhang
From: Zqiang 

When a CPU goes offline, the per-cpu quarantine's 'offline' flag is
set. After that, if quarantine_put() is called on this CPU, it frees
the object and also returns false; because false is returned, the
slab memory manager frees the object again, causing a double free.
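
For context, a rough caller-side sketch (simplified and hypothetical,
not the exact mainline code) of why freeing the object here leads to
a double free:

/*
 * Simplified sketch: the slab free path decides who frees the object
 * based on quarantine_put()'s return value.
 */
static bool kasan_slab_free_sketch(struct kmem_cache *cache, void *object)
{
	/*
	 * true  -> object is parked in the quarantine, slab must not free it
	 * false -> slab goes on to free the object itself
	 */
	return quarantine_put(cache, object);
}

With the offline path both calling qlink_free() and returning false,
the object ends up with two owners; dropping the qlink_free() call
leaves exactly one.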

Fixes: 41ab1aae781f ("kasan: fix object remaining in offline per-cpu 
quarantine")
Signed-off-by: Zqiang 
---
 mm/kasan/quarantine.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index d98b516f372f..55783125a767 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -194,7 +194,6 @@ bool quarantine_put(struct kmem_cache *cache, void *object)
 
q = this_cpu_ptr(&cpu_quarantine);
if (q->offline) {
-   qlink_free(&meta->quarantine_link, cache);
local_irq_restore(flags);
return false;
}
-- 
2.17.1



[PATCH v2] rcu: kasan: record and print kvfree_call_rcu call stack

2020-11-19 Thread qiang . zhang
From: Zqiang 

Add a kasan_record_aux_stack() call in kvfree_call_rcu() to record
the caller's stack.
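
As a hedged illustration (struct foo and drop_foo() are made-up
names), the recorded auxiliary stack corresponds to a call site like
this:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	struct rcu_head rcu;
	char buf[64];
};

static void drop_foo(struct foo *f)
{
	/*
	 * Ends up in kvfree_call_rcu(); with this patch the stack of this
	 * call site is recorded and shown as an aux stack in KASAN reports.
	 */
	kvfree_rcu(f, rcu);
}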

Cc: Walter Wu 
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: "Paul E. McKenney" 
Signed-off-by: Zqiang 
---
 v1->v2:
 Add Cc tags.

 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index da3414522285..a252b2f0208d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3506,7 +3506,7 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
success = true;
goto unlock_return;
}
-
+   kasan_record_aux_stack(ptr);
success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
if (!success) {
run_page_cache_worker(krcp);
-- 
2.17.1



[PATCH] srcu: Remove srcu_cblist_invoking member from sdp

2020-11-18 Thread qiang . zhang
From: Zqiang 

The workqueue already guarantees that multiple queueings of the same
sdp->work execute sequentially in rcu_gp_wq, so srcu_cblist_invoking
is not needed to prevent concurrent execution; remove it.

Signed-off-by: Zqiang 
---
 include/linux/srcutree.h | 1 -
 kernel/rcu/srcutree.c| 8 ++--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 9cfcc8a756ae..62d8312b5451 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -31,7 +31,6 @@ struct srcu_data {
struct rcu_segcblist srcu_cblist;   /* List of callbacks.*/
unsigned long srcu_gp_seq_needed;   /* Furthest future GP needed. */
unsigned long srcu_gp_seq_needed_exp;   /* Furthest future exp GP. */
-   bool srcu_cblist_invoking;  /* Invoking these CBs? */
struct timer_list delay_work;   /* Delay for CB invoking */
struct work_struct work;/* Context for CB invoking. */
struct rcu_head srcu_barrier_head;  /* For srcu_barrier() use. */
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 3c5e2806e0b9..c4d5cd2567a6 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -134,7 +134,6 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp, 
bool is_static)
sdp = per_cpu_ptr(ssp->sda, cpu);
spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
rcu_segcblist_init(&sdp->srcu_cblist);
-   sdp->srcu_cblist_invoking = false;
sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq;
sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq;
sdp->mynode = &snp_first[cpu / levelspread[level]];
@@ -1254,14 +1253,11 @@ static void srcu_invoke_callbacks(struct work_struct 
*work)
spin_lock_irq_rcu_node(sdp);
rcu_segcblist_advance(&sdp->srcu_cblist,
  rcu_seq_current(&ssp->srcu_gp_seq));
-   if (sdp->srcu_cblist_invoking ||
-   !rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) {
+   if (!rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) {
spin_unlock_irq_rcu_node(sdp);
return;  /* Someone else on the job or nothing to do. */
}
 
-   /* We are on the job!  Extract and invoke ready callbacks. */
-   sdp->srcu_cblist_invoking = true;
rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
len = ready_cbs.len;
spin_unlock_irq_rcu_node(sdp);
@@ -1282,7 +1278,7 @@ static void srcu_invoke_callbacks(struct work_struct 
*work)
rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
   rcu_seq_snap(&ssp->srcu_gp_seq));
-   sdp->srcu_cblist_invoking = false;
+
more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
spin_unlock_irq_rcu_node(sdp);
if (more)
-- 
2.17.1



[PATCH] rcu: kasan: record and print kvfree_call_rcu call stack

2020-11-17 Thread qiang . zhang
From: Zqiang 

Add a kasan_record_aux_stack() call in kvfree_call_rcu() to record
the caller's stack.

Signed-off-by: Zqiang 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index da3414522285..a252b2f0208d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3506,7 +3506,7 @@ void kvfree_call_rcu(struct rcu_head *head, 
rcu_callback_t func)
success = true;
goto unlock_return;
}
-
+   kasan_record_aux_stack(ptr);
success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
if (!success) {
run_page_cache_worker(krcp);
-- 
2.17.1



[PATCH] kthread_worker: Add flush delayed work func

2020-11-11 Thread qiang . zhang
From: Zqiang 

Add kthread_flush_delayed_work(), which waits for the last queueing
of a delayed kthread work to finish executing.
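
A minimal usage sketch of the proposed helper (my_worker, my_dwork
and my_work_fn are made-up names):

#include <linux/err.h>
#include <linux/jiffies.h>
#include <linux/kthread.h>
#include <linux/printk.h>

static struct kthread_delayed_work my_dwork;

static void my_work_fn(struct kthread_work *work)
{
	pr_info("delayed work executed\n");
}

static int my_example(void)
{
	struct kthread_worker *worker;

	worker = kthread_create_worker(0, "my_worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	kthread_init_delayed_work(&my_dwork, my_work_fn);
	kthread_queue_delayed_work(worker, &my_dwork, msecs_to_jiffies(100));

	/* Proposed helper: returns only after the last queueing has run. */
	kthread_flush_delayed_work(&my_dwork);

	kthread_destroy_worker(worker);
	return 0;
}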

Signed-off-by: Zqiang 
---
 kernel/kthread.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index a5eceecd4513..1afe399ccd02 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1045,6 +1045,30 @@ void kthread_flush_work(struct kthread_work *work)
 }
 EXPORT_SYMBOL_GPL(kthread_flush_work);
 
+/*
+ * kthread_flush_delayed_work - flush a kthread_delayed_work
+ * @dwork: dwork to flush
+ *
+ * Wait for the last queueing of @dwork to finish executing.
+ */
+void kthread_flush_delayed_work(struct kthread_delayed_work *dwork)
+{
+   struct kthread_work *work = &dwork->work;
+   struct kthread_worker *worker = work->worker;
+   unsigned long flags;
+
+   if (del_timer_sync(&dwork->timer)) {
+   raw_spin_lock_irqsave(&worker->lock, flags);
+   list_del_init(&work->node);
+   if (!work->canceling)
+   kthread_insert_work(worker, work, &worker->work_list);
+
+   raw_spin_unlock_irqrestore(&worker->lock, flags);
+   }
+   kthread_flush_work(work);
+}
+EXPORT_SYMBOL_GPL(kthread_flush_delayed_work);
+
 /*
  * This function removes the work from the worker queue. Also it makes sure
  * that it won't get queued later via the delayed work's timer.
-- 
2.17.1



[PATCH v2] kthread_worker: re-set CPU affinities if CPU come online

2020-10-28 Thread qiang . zhang
From: Zqiang 

When a CPU goes offline, a 'kthread_worker' bound to that CPU can
run anywhere. When the CPU comes back online, restore the
'kthread_worker' affinity via a cpuhp notifier.
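
For illustration, a hedged sketch of the kind of user this affects:
a worker bound to a CPU at creation, whose affinity the new cpuhp
callback restores after that CPU goes offline and comes back
("my_worker" is a made-up name):

#include <linux/err.h>
#include <linux/kthread.h>

static struct kthread_worker *start_bound_worker(int cpu)
{
	struct kthread_worker *worker;

	/*
	 * The worker is pinned to @cpu at creation time; without this
	 * patch the binding is silently lost across a CPU offline/online
	 * cycle.
	 */
	worker = kthread_create_worker_on_cpu(cpu, 0, "my_worker/%d", cpu);
	if (IS_ERR(worker))
		return NULL;

	return worker;
}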

Signed-off-by: Zqiang 
---
 v1->v2:
 rename variable kworker_online to kthread_worker_online.
 add 'cpuhp_node' and 'bind_cpu' init in KTHREAD_WORKER_INIT.
 add a comment explaining for WARN_ON_ONCE.

 include/linux/kthread.h |  4 
 kernel/kthread.c| 36 +++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 65b81e0c494d..c28963e87b18 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -93,6 +93,8 @@ struct kthread_worker {
struct list_headdelayed_work_list;
struct task_struct  *task;
struct kthread_work *current_work;
+   struct hlist_node   cpuhp_node;
+   int bind_cpu;
 };
 
 struct kthread_work {
@@ -112,6 +114,8 @@ struct kthread_delayed_work {
.lock = __RAW_SPIN_LOCK_UNLOCKED((worker).lock),\
.work_list = LIST_HEAD_INIT((worker).work_list),\
.delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\
+   .cpuhp_node = {.next = NULL, .pprev = NULL},\
+   .bind_cpu = -1, \
}
 
 #define KTHREAD_WORK_INIT(work, fn){   \
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 34516b0a6eb7..6c66df585225 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -28,8 +28,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
+static enum cpuhp_state kthread_worker_online;
 
 static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
@@ -649,6 +651,8 @@ void __kthread_init_worker(struct kthread_worker *worker,
lockdep_set_class_and_name(&worker->lock, key, name);
INIT_LIST_HEAD(&worker->work_list);
INIT_LIST_HEAD(&worker->delayed_work_list);
+   worker->bind_cpu = -1;
+   INIT_HLIST_NODE(&worker->cpuhp_node);
 }
 EXPORT_SYMBOL_GPL(__kthread_init_worker);
 
@@ -744,8 +748,11 @@ __kthread_create_worker(int cpu, unsigned int flags,
if (IS_ERR(task))
goto fail_task;
 
-   if (cpu >= 0)
+   if (cpu >= 0) {
+   cpuhp_state_add_instance_nocalls(kthread_worker_online, 
&worker->cpuhp_node);
kthread_bind(task, cpu);
+   worker->bind_cpu = cpu;
+   }
 
worker->flags = flags;
worker->task = task;
@@ -1230,6 +1237,9 @@ void kthread_destroy_worker(struct kthread_worker *worker)
if (WARN_ON(!task))
return;
 
+   if (worker->bind_cpu >= 0)
+   cpuhp_state_remove_instance_nocalls(kthread_worker_online, 
&worker->cpuhp_node);
+
kthread_flush_worker(worker);
kthread_stop(task);
WARN_ON(!list_empty(&worker->work_list));
@@ -1237,6 +1247,30 @@ void kthread_destroy_worker(struct kthread_worker 
*worker)
 }
 EXPORT_SYMBOL(kthread_destroy_worker);
 
+static int kthread_worker_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+   struct kthread_worker *worker = hlist_entry(node, struct 
kthread_worker, cpuhp_node);
+   struct task_struct *task = worker->task;
+
+   /* as we're called from CPU_ONLINE, the following shouldn't fail */
+   if (cpu == worker->bind_cpu)
+   WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpumask_of(cpu)) < 0);
+   return 0;
+}
+
+static __init int kthread_worker_hotplug_init(void)
+{
+   int ret;
+
+   ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, 
"kthread-worker/online",
+   kthread_worker_cpu_online, NULL);
+   if (ret < 0)
+   return ret;
+   kthread_worker_online = ret;
+   return 0;
+}
+core_initcall(kthread_worker_hotplug_init);
+
 /**
  * kthread_use_mm - make the calling kthread operate on an address space
  * @mm: address space to operate on
-- 
2.17.1



[PATCH] io-wq: set task TASK_INTERRUPTIBLE state before schedule_timeout

2020-10-26 Thread qiang . zhang
From: Zqiang 

In the io_wqe_worker() thread, once the work on wqe->work_list has
been finished and the list is empty, __io_worker_idle() returns false
with the task still in TASK_RUNNING state; the state needs to be set
to TASK_INTERRUPTIBLE before calling schedule_timeout(), so use
schedule_timeout_interruptible() instead.
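
For reference, a small sketch of the two equivalent idle-wait idioms
(a simplified, hypothetical helper; schedule_timeout_interruptible()
sets the task state itself before sleeping):

#include <linux/sched.h>

static void idle_wait_sketch(long timeout)
{
	/* Open-coded form: the state must be set before schedule_timeout(). */
	set_current_state(TASK_INTERRUPTIBLE);
	schedule_timeout(timeout);

	/* Helper used by this patch: sets TASK_INTERRUPTIBLE internally. */
	schedule_timeout_interruptible(timeout);
}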

Signed-off-by: Zqiang 
---
 fs/io-wq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 02894df7656d..5f0626935b64 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -618,7 +618,7 @@ static int io_wqe_worker(void *data)
raw_spin_unlock_irq(&wqe->lock);
if (signal_pending(current))
flush_signals(current);
-   if (schedule_timeout(WORKER_IDLE_TIMEOUT))
+   if (schedule_timeout_interruptible(WORKER_IDLE_TIMEOUT))
continue;
/* timed out, exit unless we're the fixed worker */
if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
-- 
2.17.1



[PATCH] kthread_worker: re-set CPU affinities if CPU come online

2020-10-26 Thread qiang . zhang
From: Zqiang 

When a CPU goes offline, a 'kthread_worker' bound to that CPU can
run anywhere. When the CPU comes back online, restore the
'kthread_worker' affinity via a cpuhp notifier.

Signed-off-by: Zqiang 
---
 include/linux/kthread.h |  2 ++
 kernel/kthread.c| 35 ++-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 65b81e0c494d..5acbf2e731cb 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -93,6 +93,8 @@ struct kthread_worker {
struct list_headdelayed_work_list;
struct task_struct  *task;
struct kthread_work *current_work;
+   struct hlist_node   cpuhp_node;
+   int bind_cpu;
 };
 
 struct kthread_work {
diff --git a/kernel/kthread.c b/kernel/kthread.c
index e29773c82b70..68968832777f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -28,8 +28,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
+static enum cpuhp_state kworker_online;
 
 static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
@@ -649,6 +651,8 @@ void __kthread_init_worker(struct kthread_worker *worker,
lockdep_set_class_and_name(&worker->lock, key, name);
INIT_LIST_HEAD(&worker->work_list);
INIT_LIST_HEAD(&worker->delayed_work_list);
+   worker->bind_cpu = -1;
+   INIT_HLIST_NODE(&worker->cpuhp_node);
 }
 EXPORT_SYMBOL_GPL(__kthread_init_worker);
 
@@ -737,8 +741,11 @@ __kthread_create_worker(int cpu, unsigned int flags,
if (IS_ERR(task))
goto fail_task;
 
-   if (cpu >= 0)
+   if (cpu >= 0) {
kthread_bind(task, cpu);
+   worker->bind_cpu = cpu;
+   cpuhp_state_add_instance_nocalls(kworker_online, 
&worker->cpuhp_node);
+   }
 
worker->flags = flags;
worker->task = task;
@@ -1220,6 +1227,9 @@ void kthread_destroy_worker(struct kthread_worker *worker)
if (WARN_ON(!task))
return;
 
+   if (worker->bind_cpu >= 0)
+   cpuhp_state_remove_instance_nocalls(kworker_online, 
&worker->cpuhp_node);
+
kthread_flush_worker(worker);
kthread_stop(task);
WARN_ON(!list_empty(&worker->work_list));
@@ -1227,6 +1237,29 @@ void kthread_destroy_worker(struct kthread_worker 
*worker)
 }
 EXPORT_SYMBOL(kthread_destroy_worker);
 
+static int kworker_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+   struct kthread_worker *worker = hlist_entry(node, struct 
kthread_worker, cpuhp_node);
+   struct task_struct *task = worker->task;
+
+   if (cpu == worker->bind_cpu)
+   WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpumask_of(cpu)) < 0);
+   return 0;
+}
+
+static __init int kthread_worker_hotplug_init(void)
+{
+   int ret;
+
+   ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, 
"kthread-worker/online",
+   kworker_cpu_online, NULL);
+   if (ret < 0)
+   return ret;
+   kworker_online = ret;
+   return 0;
+}
+subsys_initcall(kthread_worker_hotplug_init);
+
 /**
  * kthread_use_mm - make the calling kthread operate on an address space
  * @mm: address space to operate on
-- 
2.17.1



[PATCH] io-wq: fix 'task->pi_lock' spin lock protect

2020-10-23 Thread qiang . zhang
From: Zqiang 

The CPU-affinity setter do_set_cpus_allowed() may operate on the
task's rq, which requires holding the rq lock; replace the bare
'pi_lock' spinlock protection with task_rq_lock().

Signed-off-by: Zqiang 
---
 fs/io-wq.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index d3165ce339c2..6ea3e0224e63 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -1209,11 +1209,13 @@ static bool io_wq_worker_affinity(struct io_worker 
*worker, void *data)
 {
struct task_struct *task = worker->task;
unsigned long flags;
+   struct rq_flags rf;
+   struct rq *rq;
 
-   raw_spin_lock_irqsave(&task->pi_lock, flags);
+   rq = task_rq_lock(task, &rf);
do_set_cpus_allowed(task, cpumask_of_node(worker->wqe->node));
task->flags |= PF_NO_SETAFFINITY;
-   raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+   task_rq_unlock(rq, task, &rf);
return false;
 }
 
-- 
2.17.1



[PATCH] workqueue: replace call_rcu with kfree_rcu

2020-10-14 Thread qiang . zhang
From: Zqiang 

The pwq's RCU callback only releases the 'pwq' itself, so
'kfree_rcu' can be used instead of 'call_rcu'.
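
As a generic, hedged illustration of the conversion pattern (struct
foo is a made-up type): kfree_rcu() only needs the offset of the
embedded rcu_head, so an RCU callback whose only job is to free the
object can be dropped:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int data;
	struct rcu_head rcu;
};

static void release_foo(struct foo *f)
{
	/* Equivalent to call_rcu(&f->rcu, <a callback that only frees f>). */
	kfree_rcu(f, rcu);
}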

Signed-off-by: Zqiang 
---
 kernel/workqueue.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ac088ce6059b..8d4fe649631a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3653,11 +3653,6 @@ static struct worker_pool *get_unbound_pool(const struct 
workqueue_attrs *attrs)
return NULL;
 }
 
-static void rcu_free_pwq(struct rcu_head *rcu)
-{
-   kmem_cache_free(pwq_cache,
-   container_of(rcu, struct pool_workqueue, rcu));
-}
 
 /*
  * Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt
@@ -3683,7 +3678,7 @@ static void pwq_unbound_release_workfn(struct work_struct 
*work)
put_unbound_pool(pool);
mutex_unlock(_pool_mutex);
 
-   call_rcu(&pwq->rcu, rcu_free_pwq);
+   kfree_rcu(pwq, rcu);
 
/*
 * If we're the last pwq going away, @wq is already dead and no one
-- 
2.17.1



[PATCH v4] kthread_worker: Prevent queuing delayed work from timer_fn when it is being canceled

2020-10-14 Thread qiang . zhang
From: Zqiang 

There is a small race window when a delayed work is being canceled and
the work still might be queued from the timer_fn:

CPU0                                CPU1
kthread_cancel_delayed_work_sync()
   __kthread_cancel_work_sync()
 __kthread_cancel_work()
work->canceling++;
  kthread_delayed_work_timer_fn()
   kthread_insert_work();

BUG: kthread_insert_work() should not get called when work->canceling
is set.

Cc: 
Reviewed-by: Petr Mladek 
Acked-by: Tejun Heo 
Signed-off-by: Zqiang 
--- 
 v1->v2->v3:
 Change the description of the problem and add 'Reviewed-by' tags.
 v3->v4:
 Add 'stable' and 'Acked-by' tags.

 kernel/kthread.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3edaa380dc7b..85a2c9b32049 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -897,7 +897,8 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
/* Move the work from worker->delayed_work_list. */
WARN_ON_ONCE(list_empty(&work->node));
list_del_init(&work->node);
-   kthread_insert_work(worker, work, &worker->work_list);
+   if (!work->canceling)
+   kthread_insert_work(worker, work, &worker->work_list);
 
raw_spin_unlock_irqrestore(>lock, flags);
 }
-- 
2.17.1



[PATCH] usb: gadget: function: printer: Fix usb function descriptors leak

2020-10-14 Thread qiang . zhang
From: Zqiang 

If an error occurs after usb_assign_descriptors() has been called,
usb_free_all_descriptors() needs to be called to release the memory
occupied by the function descriptors.

Signed-off-by: Zqiang 
---
 drivers/usb/gadget/function/f_printer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/function/f_printer.c 
b/drivers/usb/gadget/function/f_printer.c
index 64a4112068fc..2f1eb2e81d30 100644
--- a/drivers/usb/gadget/function/f_printer.c
+++ b/drivers/usb/gadget/function/f_printer.c
@@ -1162,6 +1162,7 @@ static int printer_func_bind(struct usb_configuration *c,
printer_req_free(dev->in_ep, req);
}
 
+   usb_free_all_descriptors(f);
return ret;
 
 }
-- 
2.17.1