On Wed, Feb 13, 2019 at 10:50 PM Pankaj Gupta <[email protected]> wrote:
>
>
> Hi Yongxin,
>
> >
> > Hi experts,
> >
> > Could anyone tell me whether Linux nvdimm driver is RT compatible?
> >
> > When I was testing PMEM performance with fio using the following command, I
> > got calltrace below.
> >
> > # fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw
> > -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -runtime=100
> > -group_reporting -name=mytest
> >
> >     BUG: scheduling while atomic: fio/2514/0x00000002
> >     Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt
> >     iTCO_vendor_support intel_powerclamp coretemp
> >     crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 crypto_simd
> >     cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit
> >     pcc_cpufreq wmi libnvdimm acpi_pad
> >     acpi_power_meter
> >     Preemption disabled at:
> >     [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> >     CPU: 44 PID: 2514 Comm: fio Tainted: G        W
> >     4.18.20-rt8-preempt-rt #1
> >     Call Trace:
> >      dump_stack+0x4f/0x6a
> >      ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> >      __schedule_bug.cold.17+0x38/0x55
> >      __schedule+0x484/0x6c0
> >      ? _raw_spin_lock+0x17/0x40
> >      schedule+0x3d/0xe0
> >      rt_spin_lock_slowlock_locked+0x118/0x2a0
> >      rt_spin_lock_slowlock+0x57/0x90
> >      rt_spin_lock+0x52/0x60
> >      btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
> >      btt_make_request+0x1b1/0x320 [nd_btt]
> >      generic_make_request+0x1dc/0x3f0
> >      submit_bio+0x49/0x140
> >
> > nd_region_acquire_lane() disables preemption with get_cpu() which causes
> > "scheduling while atomic" spews on RT.
> >
> > Is it safe to replace get_cpu()/put_cpu() with
> > get_cpu_light()/put_cpu_light() in
> > nd_region_acquire_lane()/nd_region_release_lane()?
>
> get_cpu() disables preemption on a core, while get_cpu_light() only
> prevents task migration across cores. If we change this to
> get/put_cpu_light(), I think there is a possibility of a race here.
>
> > After this replacement, the codes protected by
> > nd_region_release_lane/nd_region_release_lane would become pre-emptible.
> > So are these codes reentrant?
>
> Another thing I see is that when get_cpu() is called in
> "nd_region_acquire_lane", there is no corresponding call to "put_cpu"
> in the same function. It is called only in "nd_region_release_lane",
> which means preemption stays disabled for the entire span between
> these calls.
>
> Here, we cannot change the preemption disable to a migration disable
> for mainline, as there is code that can race as a result.
>
> IIUC, the RT-only patchset will have the same problem as well: the
> spin_lock protection is taken only on a condition, and the path
> before/without the spinlock may also race.
> We could use something like local_lock_* => local_lock_cpu.
>
> These are my two cents.

The btt implementation assumes that the btt code will not be
re-entered on the same cpu. So the preemption disable is there for
performance, but for correctness the implementation can fall back to
using a lock while only migration is disabled. In other words, I think
changing nd_region_acquire_lane() as follows would be sufficient for
RT kernels:

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e2818f94f292..e720fc42bbc0 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -947,18 +947,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
 {
        unsigned int cpu, lane;
+       struct nd_percpu_lane *ndl_lock, *ndl_count;

-       cpu = get_cpu();
-       if (nd_region->num_lanes < nr_cpu_ids) {
-               struct nd_percpu_lane *ndl_lock, *ndl_count;
+       cpu = get_cpu_light();

-               lane = cpu % nd_region->num_lanes;
-               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-               if (ndl_count->count++ == 0)
-                       spin_lock(&ndl_lock->lock);
-       } else
-               lane = cpu;
+       lane = cpu % nd_region->num_lanes;
+       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
+       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+       if (ndl_count->count++ == 0)
+               spin_lock(&ndl_lock->lock);

        return lane;
 }
@@ -966,17 +963,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);

 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 {
-       if (nd_region->num_lanes < nr_cpu_ids) {
-               unsigned int cpu = get_cpu();
-               struct nd_percpu_lane *ndl_lock, *ndl_count;
-
-               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-               if (--ndl_count->count == 0)
-                       spin_unlock(&ndl_lock->lock);
-               put_cpu();
-       }
-       put_cpu();
+       unsigned int cpu = get_cpu_light();
+       struct nd_percpu_lane *ndl_lock, *ndl_count;
+
+       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
+       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+       if (--ndl_count->count == 0)
+               spin_unlock(&ndl_lock->lock);
+       put_cpu_light();
 }
 EXPORT_SYMBOL(nd_region_release_lane);
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm
