Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
On Thu, Dec 7, 2017 at 11:54 PM, Jason Yanwrote: > > We have sent a patchset to fix this and to enhance libsas hotplug. > Please refer to https://lkml.org/lkml/2017/9/6/142 > > And I'm going to send a new version soon. Thanks for working on it! Please make sure they will be queued for -stable too, since 3.14, 4.1 and 4.9 are all affected.
Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
On Thu, Dec 7, 2017 at 4:40 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote: > On Thu, Dec 7, 2017 at 2:57 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote: >> On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.ga...@huawei.com> wrote: >>> On 28/11/2017 17:04, Cong Wang wrote: >>>> >>>> I don't understand, the only caller of sas_unregister_domain_devices() >>>> is sas_deform_port(). >>>> >>> >>> And sas_deform_port() may be called from another worker on the same queue, >>> right? As in sas_phye_loss_of_signal()->sas_deform_port() >> >> Oh, good catch! I didn't notice this subtle call path. >> >> Do you have any better idea to fix this? We saw this on 4.9 too. >> > > I think we can just cancel the destruct work before calling > sas_port_delete(). This should work even if it is called in > another work. > This assumes sas_port_delete() could release resources recursively in the hierarchy, this is true for sysfs but perhaps not true for other resources...
Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
On Thu, Dec 7, 2017 at 2:57 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote: > On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.ga...@huawei.com> wrote: >> On 28/11/2017 17:04, Cong Wang wrote: >>> >>> I don't understand, the only caller of sas_unregister_domain_devices() >>> is sas_deform_port(). >>> >> >> And sas_deform_port() may be called from another worker on the same queue, >> right? As in sas_phye_loss_of_signal()->sas_deform_port() > > Oh, good catch! I didn't notice this subtle call path. > > Do you have any better idea to fix this? We saw this on 4.9 too. > I think we can just cancel the destruct work before calling sas_port_delete(). This should work even if it is called in another work. So does the attached (untested) patch make any sense now? diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c index 60de66252fa2..bc512d65e2ca 100644 --- a/drivers/scsi/libsas/sas_discover.c +++ b/drivers/scsi/libsas/sas_discover.c @@ -565,6 +565,21 @@ int sas_discover_event(struct asd_sas_port *port, enum discover_event ev) return 0; } +static void sas_cancel_work(struct sas_work *sw) +{ + cancel_work_sync(>work); +} + +void sas_cancel_event(struct asd_sas_port *port, enum discover_event ev) +{ + struct sas_discovery *disc; + + if (!port) + return; + disc = >disc; + sas_cancel_work(>disc_work[ev].work); +} + /** * sas_init_disc -- initialize the discovery struct in the port * @port: pointer to struct port diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c index d3c5297c6c89..89e37640e26c 100644 --- a/drivers/scsi/libsas/sas_port.c +++ b/drivers/scsi/libsas/sas_port.c @@ -219,6 +219,7 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone) if (port->num_phys == 1) { sas_unregister_domain_devices(port, gone); + sas_cancel_event(port, DISCE_DESTRUCT); sas_port_delete(port->port); port->port = NULL; } else { diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h index 6df6fe0c2198..5b8a7fadd9b4 100644 --- a/include/scsi/libsas.h +++ b/include/scsi/libsas.h @@ -680,6 +680,7 @@ int sas_ex_revalidate_domain(struct domain_device *); void sas_unregister_domain_devices(struct asd_sas_port *port, int gone); void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *); int sas_discover_event(struct asd_sas_port *, enum discover_event ev); +void sas_cancel_event(struct asd_sas_port *port, enum discover_event ev); int sas_discover_sata(struct domain_device *); int sas_discover_end_dev(struct domain_device *);
Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.ga...@huawei.com> wrote: > On 28/11/2017 17:04, Cong Wang wrote: >> >> I don't understand, the only caller of sas_unregister_domain_devices() >> is sas_deform_port(). >> > > And sas_deform_port() may be called from another worker on the same queue, > right? As in sas_phye_loss_of_signal()->sas_deform_port() Oh, good catch! I didn't notice this subtle call path. Do you have any better idea to fix this? We saw this on 4.9 too. > > The device destruct takes place in a separate worker from which > sas_deform_port() is called, but the same queue. So we have this queued > destruct happen after the port is fully deformed -> hence the WARN. > > I guess you only tested your patch on disks attached through an expander. I have very limited scsi hardware, so my testing is limited too.
Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
On Tue, Nov 28, 2017 at 3:18 AM, John Garry <john.ga...@huawei.com> wrote: > On 28/11/2017 08:20, Johannes Thumshirn wrote: >> >> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote: >>> >>> We saw dozens of the following kernel waring: >>> >>> WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 >>> sysfs_remove_group+0x54/0x88() >>> sysfs group 81ab7670 not found for kobject '6:0:3:0' >>> Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp >>> kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core >>> lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport >>> tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp >>> pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas >>> dca ipv6 >>> CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1 >> >> >> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel >> warning during rmmod()" which went into v4.14-rc6. >> > > Is that the same issue? I think Cong Wang is just trying to deal with the > longstanding libsas hotplug WARN. Right, we saw it on both 4.1 and 3.14, clearly an old bug. > > We at Huawei are still working to fix it. Our patchset is under internal > test at the moment. > > As for this patch: >> drivers/scsi/libsas/sas_discover.c | 7 ++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/libsas/sas_discover.c >> b/drivers/scsi/libsas/sas_discover.c >> index 60de66252fa2..27c11fc7aa2b 100644 >> --- a/drivers/scsi/libsas/sas_discover.c >> +++ b/drivers/scsi/libsas/sas_discover.c >> @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port, >> struct domain_device *dev) >> } >> } >> >> +static void sas_flush_work(struct asd_sas_port *port) >> +{ >> + scsi_flush_work(port->ha->core.shost); >> +} >> + >> void sas_unregister_domain_devices(struct asd_sas_port *port, int gone) >> { >> struct domain_device *dev, *n; >> @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port >> *port, int gone) >> list_for_each_entry_safe(dev, n, >disco_list, disco_list_node) >> sas_unregister_dev(port, dev); >> >> + sas_flush_work(port); > > How can this work as sas_unregister_domain_devices() may be called from the > same workqueue which you're trying to flush? I don't understand, the only caller of sas_unregister_domain_devices() is sas_deform_port().
Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
On Tue, Nov 28, 2017 at 12:20 AM, Johannes Thumshirn <jthumsh...@suse.de> wrote: > On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote: >> We saw dozens of the following kernel waring: >> >> WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 >> sysfs_remove_group+0x54/0x88() >> sysfs group 81ab7670 not found for kobject '6:0:3:0' >> Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel >> kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich >> mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag >> inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core >> i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6 >> CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1 > > This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel > warning during rmmod()" which went into v4.14-rc6. I don't see the full backtrace in commit fbce4d97fd43, but it is probably not rmmod path in our case.
[PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
We saw dozens of the following kernel waring: WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 sysfs_remove_group+0x54/0x88() sysfs group 81ab7670 not found for kobject '6:0:3:0' Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6 CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1 Hardware name: WIWYNN Lyra/JD/S2600GZ, BIOS SE5C600.86B.02.03.2004.030620151456 03/06/2015 Workqueue: scsi_wq_6 sas_destruct_devices [libsas] 88056c393ba8 81544a6d 88056c393bf8 0009 88056c393be8 81069b4c 88081790d078 811dad37 81ab7670 88081b29dc10 Call Trace: [] dump_stack+0x4d/0x63 [] warn_slowpath_common+0xa1/0xbb [] ? sysfs_remove_group+0x54/0x88 [] warn_slowpath_fmt+0x46/0x48 [] ? kernfs_find_and_get_ns+0x4d/0x58 [] sysfs_remove_group+0x54/0x88 [] dpm_sysfs_remove+0x50/0x55 [] device_del+0x47/0x1ec [] ? mutex_unlock+0x16/0x18 [] device_unregister+0x48/0x54 [] bsg_unregister_queue+0x5f/0x86 [] __scsi_remove_device+0x3a/0xc3 [] scsi_remove_device+0x26/0x33 [] scsi_remove_target+0x134/0x19b [] sas_rphy_remove+0x2c/0x72 [scsi_transport_sas] [] sas_rphy_delete+0x13/0x1f [scsi_transport_sas] [] sas_destruct_devices+0x58/0x79 [libsas] [] process_one_work+0x19b/0x2d1 [] worker_thread+0x1dd/0x2bb [] ? cancel_delayed_work+0x72/0x72 [] kthread+0xa5/0xad [] ? task_work_add+0xd/0x53 [] ? __kthread_parkme+0x61/0x61 [] ret_from_fork+0x42/0x70 [] ? __kthread_parkme+0x61/0x61 It looks like we don't wait for sas destruct work properly on tear down path, at least sas_deform_port() calls sas_unregister_domain_devices() to schedule destruct work to a workqueue and then calls sas_port_delete() to remove the related sysfs files concurrently. Dan tried to fix this with a different way: https://patchwork.kernel.org/patch/6450921/ but that patch is never applied. I take a better approach as suggested by Johannes, that is waiting for pending destruct work to remove child sysfs files and then removing the parent sysfs files. Cc: Dan Williams <dan.j.willi...@intel.com> Cc: Johannes Thumshirn <jthumsh...@suse.de> Cc: Praveen Murali <pmur...@logicube.com> Cc: "James E.J. Bottomley" <j...@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.peter...@oracle.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com> --- drivers/scsi/libsas/sas_discover.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c index 60de66252fa2..27c11fc7aa2b 100644 --- a/drivers/scsi/libsas/sas_discover.c +++ b/drivers/scsi/libsas/sas_discover.c @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev) } } +static void sas_flush_work(struct asd_sas_port *port) +{ + scsi_flush_work(port->ha->core.shost); +} + void sas_unregister_domain_devices(struct asd_sas_port *port, int gone) { struct domain_device *dev, *n; @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port *port, int gone) list_for_each_entry_safe(dev, n, >disco_list, disco_list_node) sas_unregister_dev(port, dev); + sas_flush_work(port); port->port->rphy = NULL; - } void sas_device_set_phy(struct domain_device *dev, struct sas_port *port) -- 2.13.0
Re: [PATCH] net: return value of skb_linearize should be handled in Linux kernel
On Tue, Dec 6, 2016 at 10:27 PM, Zhouyi Zhou <zhouzho...@gmail.com> wrote: > On Wed, Dec 7, 2016 at 1:02 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote: >> On Mon, Dec 5, 2016 at 11:10 PM, Zhouyi Zhou <zhouzho...@gmail.com> wrote: >>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c >>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c >>> index 2a653ec..ab787cb 100644 >>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c >>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c >>> @@ -490,7 +490,11 @@ int ixgbe_fcoe_ddp(struct ixgbe_adapter *adapter, >>> */ >>> if ((fh->fh_r_ctl == FC_RCTL_DD_SOL_DATA) && >>> (fctl & FC_FC_END_SEQ)) { >>> - skb_linearize(skb); >>> + int err = 0; >>> + >>> + err = skb_linearize(skb); >>> + if (err) >>> + return err; >> >> >> You can reuse 'rc' instead of adding 'err'. > rc here is meaningful for the length of data being ddped. If using rc > here, a successful > skb_linearize will assign rc to 0. Right, I thought it returns 0 on success. >> >> >> >>> crc = (struct fcoe_crc_eof *)skb_put(skb, sizeof(*crc)); >>> crc->fcoe_eof = FC_EOF_T; >>> } >>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>> index fee1f29..4926d48 100644 >>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>> @@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector >>> *q_vector, >>> total_rx_bytes += ddp_bytes; >>> total_rx_packets += DIV_ROUND_UP(ddp_bytes, >>> mss); >>> - } >>> - if (!ddp_bytes) { >>> + } else { >>> dev_kfree_skb_any(skb); >>> continue; >>> } >> >> >> This piece doesn't seem to be related. > if ddp_bytes is negative there will be some error, I think the skb > should not pass to upper layer. You misunderstand my point, this return value is for ixgbe_fcoe_ddp() not skb_linearize(), you need to make it a separate patch because this patch, as in $subject, only fixes skb_linearize(). -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: return value of skb_linearize should be handled in Linux kernel
On Mon, Dec 5, 2016 at 11:10 PM, Zhouyi Zhouwrote: > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c > b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c > index 2a653ec..ab787cb 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c > @@ -490,7 +490,11 @@ int ixgbe_fcoe_ddp(struct ixgbe_adapter *adapter, > */ > if ((fh->fh_r_ctl == FC_RCTL_DD_SOL_DATA) && > (fctl & FC_FC_END_SEQ)) { > - skb_linearize(skb); > + int err = 0; > + > + err = skb_linearize(skb); > + if (err) > + return err; You can reuse 'rc' instead of adding 'err'. > crc = (struct fcoe_crc_eof *)skb_put(skb, sizeof(*crc)); > crc->fcoe_eof = FC_EOF_T; > } > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > index fee1f29..4926d48 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > @@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector > *q_vector, > total_rx_bytes += ddp_bytes; > total_rx_packets += DIV_ROUND_UP(ddp_bytes, > mss); > - } > - if (!ddp_bytes) { > + } else { > dev_kfree_skb_any(skb); > continue; > } This piece doesn't seem to be related. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html