Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()

2017-12-09 Thread Cong Wang
On Thu, Dec 7, 2017 at 11:54 PM, Jason Yan  wrote:
>
> We have sent a patchset to fix this and to enhance libsas hotplug.
> Please refer to https://lkml.org/lkml/2017/9/6/142
>
> And I'm going to send a new version soon.

Thanks for working on it! Please make sure they will be queued
for -stable too, since 3.14, 4.1 and 4.9 are all affected.


Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()

2017-12-07 Thread Cong Wang
On Thu, Dec 7, 2017 at 4:40 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 2:57 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.ga...@huawei.com> wrote:
>>> On 28/11/2017 17:04, Cong Wang wrote:
>>>>
>>>> I don't understand, the only caller of sas_unregister_domain_devices()
>>>> is sas_deform_port().
>>>>
>>>
>>> And sas_deform_port() may be called from another worker on the same queue,
>>> right? As in sas_phye_loss_of_signal()->sas_deform_port()
>>
>> Oh, good catch! I didn't notice this subtle call path.
>>
>> Do you have any better idea to fix this? We saw this on 4.9 too.
>>
>
> I think we can just cancel the destruct work before calling
> sas_port_delete(). This should work even if it is called in
> another work.
>

This assumes sas_port_delete() could release resources recursively
in the hierarchy, this is true for sysfs but perhaps not true for other
resources...


Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()

2017-12-07 Thread Cong Wang
On Thu, Dec 7, 2017 at 2:57 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.ga...@huawei.com> wrote:
>> On 28/11/2017 17:04, Cong Wang wrote:
>>>
>>> I don't understand, the only caller of sas_unregister_domain_devices()
>>> is sas_deform_port().
>>>
>>
>> And sas_deform_port() may be called from another worker on the same queue,
>> right? As in sas_phye_loss_of_signal()->sas_deform_port()
>
> Oh, good catch! I didn't notice this subtle call path.
>
> Do you have any better idea to fix this? We saw this on 4.9 too.
>

I think we can just cancel the destruct work before calling
sas_port_delete(). This should work even if it is called in
another work.

So does the attached (untested) patch make any sense now?
diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
index 60de66252fa2..bc512d65e2ca 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -565,6 +565,21 @@ int sas_discover_event(struct asd_sas_port *port, enum 
discover_event ev)
return 0;
 }
 
+static void sas_cancel_work(struct sas_work *sw)
+{
+   cancel_work_sync(>work);
+}
+
+void sas_cancel_event(struct asd_sas_port *port, enum discover_event ev)
+{
+   struct sas_discovery *disc;
+
+   if (!port)
+   return;
+   disc = >disc;
+   sas_cancel_work(>disc_work[ev].work);
+}
+
 /**
  * sas_init_disc -- initialize the discovery struct in the port
  * @port: pointer to struct port
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index d3c5297c6c89..89e37640e26c 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -219,6 +219,7 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 
if (port->num_phys == 1) {
sas_unregister_domain_devices(port, gone);
+   sas_cancel_event(port, DISCE_DESTRUCT);
sas_port_delete(port->port);
port->port = NULL;
} else {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 6df6fe0c2198..5b8a7fadd9b4 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -680,6 +680,7 @@ int  sas_ex_revalidate_domain(struct domain_device *);
 void sas_unregister_domain_devices(struct asd_sas_port *port, int gone);
 void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *);
 int  sas_discover_event(struct asd_sas_port *, enum discover_event ev);
+void sas_cancel_event(struct asd_sas_port *port, enum discover_event ev);
 
 int  sas_discover_sata(struct domain_device *);
 int  sas_discover_end_dev(struct domain_device *);


Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()

2017-12-07 Thread Cong Wang
On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.ga...@huawei.com> wrote:
> On 28/11/2017 17:04, Cong Wang wrote:
>>
>> I don't understand, the only caller of sas_unregister_domain_devices()
>> is sas_deform_port().
>>
>
> And sas_deform_port() may be called from another worker on the same queue,
> right? As in sas_phye_loss_of_signal()->sas_deform_port()

Oh, good catch! I didn't notice this subtle call path.

Do you have any better idea to fix this? We saw this on 4.9 too.

>
> The device destruct takes place in a separate worker from which
> sas_deform_port() is called, but the same queue. So we have this queued
> destruct happen after the port is fully deformed -> hence the WARN.
>
> I guess you only tested your patch on disks attached through an expander.

I have very limited scsi hardware, so my testing is limited too.


Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()

2017-11-28 Thread Cong Wang
On Tue, Nov 28, 2017 at 3:18 AM, John Garry <john.ga...@huawei.com> wrote:
> On 28/11/2017 08:20, Johannes Thumshirn wrote:
>>
>> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>>>
>>> We saw dozens of the following kernel waring:
>>>
>>>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224
>>> sysfs_remove_group+0x54/0x88()
>>>  sysfs group 81ab7670 not found for kobject '6:0:3:0'
>>>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp
>>> kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core
>>> lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport
>>> tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp
>>> pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas
>>> dca ipv6
>>>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>>
>>
>> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
>> warning during rmmod()" which went into v4.14-rc6.
>>
>
> Is that the same issue? I think Cong Wang is just trying to deal with the
> longstanding libsas hotplug WARN.

Right, we saw it on both 4.1 and 3.14, clearly an old bug.


>
> We at Huawei are still working to fix it. Our patchset is under internal
> test at the moment.
>
> As for this patch:
>>  drivers/scsi/libsas/sas_discover.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c
>> b/drivers/scsi/libsas/sas_discover.c
>> index 60de66252fa2..27c11fc7aa2b 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port,
>> struct domain_device *dev)
>>   }
>>  }
>>
>> +static void sas_flush_work(struct asd_sas_port *port)
>> +{
>> + scsi_flush_work(port->ha->core.shost);
>> +}
>> +
>>  void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>>  {
>>   struct domain_device *dev, *n;
>> @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port
>> *port, int gone)
>>   list_for_each_entry_safe(dev, n, >disco_list, disco_list_node)
>>   sas_unregister_dev(port, dev);
>>
>> + sas_flush_work(port);
>
> How can this work as sas_unregister_domain_devices() may be called from the
> same workqueue which you're trying to flush?


I don't understand, the only caller of sas_unregister_domain_devices()
is sas_deform_port().


Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()

2017-11-28 Thread Cong Wang
On Tue, Nov 28, 2017 at 12:20 AM, Johannes Thumshirn <jthumsh...@suse.de> wrote:
> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>> We saw dozens of the following kernel waring:
>>
>>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 
>> sysfs_remove_group+0x54/0x88()
>>  sysfs group 81ab7670 not found for kobject '6:0:3:0'
>>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel 
>> kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich 
>> mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag 
>> inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core 
>> i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6
>>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>
> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
> warning during rmmod()" which went into v4.14-rc6.

I don't see the full backtrace in commit fbce4d97fd43, but it is probably
not rmmod path in our case.


[PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()

2017-11-27 Thread Cong Wang
We saw dozens of the following kernel waring:

 WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 sysfs_remove_group+0x54/0x88()
 sysfs group 81ab7670 not found for kobject '6:0:3:0'
 Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel 
kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich 
mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag 
inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core 
i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6
 CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
 Hardware name: WIWYNN Lyra/JD/S2600GZ, BIOS 
SE5C600.86B.02.03.2004.030620151456 03/06/2015
 Workqueue: scsi_wq_6 sas_destruct_devices [libsas]
   88056c393ba8 81544a6d 88056c393bf8
  0009 88056c393be8 81069b4c 88081790d078
  811dad37  81ab7670 88081b29dc10
 Call Trace:
  [] dump_stack+0x4d/0x63
  [] warn_slowpath_common+0xa1/0xbb
  [] ? sysfs_remove_group+0x54/0x88
  [] warn_slowpath_fmt+0x46/0x48
  [] ? kernfs_find_and_get_ns+0x4d/0x58
  [] sysfs_remove_group+0x54/0x88
  [] dpm_sysfs_remove+0x50/0x55
  [] device_del+0x47/0x1ec
  [] ? mutex_unlock+0x16/0x18
  [] device_unregister+0x48/0x54
  [] bsg_unregister_queue+0x5f/0x86
  [] __scsi_remove_device+0x3a/0xc3
  [] scsi_remove_device+0x26/0x33
  [] scsi_remove_target+0x134/0x19b
  [] sas_rphy_remove+0x2c/0x72 [scsi_transport_sas]
  [] sas_rphy_delete+0x13/0x1f [scsi_transport_sas]
  [] sas_destruct_devices+0x58/0x79 [libsas]
  [] process_one_work+0x19b/0x2d1
  [] worker_thread+0x1dd/0x2bb
  [] ? cancel_delayed_work+0x72/0x72
  [] kthread+0xa5/0xad
  [] ? task_work_add+0xd/0x53
  [] ? __kthread_parkme+0x61/0x61
  [] ret_from_fork+0x42/0x70
  [] ? __kthread_parkme+0x61/0x61

It looks like we don't wait for sas destruct work properly
on tear down path, at least sas_deform_port() calls
sas_unregister_domain_devices() to schedule destruct work
to a workqueue and then calls sas_port_delete() to remove
the related sysfs files concurrently.

Dan tried to fix this with a different way:

 https://patchwork.kernel.org/patch/6450921/

but that patch is never applied. I take a better approach
as suggested by Johannes, that is waiting for pending destruct
work to remove child sysfs files and then removing the parent
sysfs files.

Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Johannes Thumshirn <jthumsh...@suse.de>
Cc: Praveen Murali <pmur...@logicube.com>
Cc: "James E.J. Bottomley" <j...@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
---
 drivers/scsi/libsas/sas_discover.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
index 60de66252fa2..27c11fc7aa2b 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port, struct 
domain_device *dev)
}
 }
 
+static void sas_flush_work(struct asd_sas_port *port)
+{
+   scsi_flush_work(port->ha->core.shost);
+}
+
 void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
 {
struct domain_device *dev, *n;
@@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port 
*port, int gone)
list_for_each_entry_safe(dev, n, >disco_list, disco_list_node)
sas_unregister_dev(port, dev);
 
+   sas_flush_work(port);
port->port->rphy = NULL;
-
 }
 
 void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
-- 
2.13.0



Re: [PATCH] net: return value of skb_linearize should be handled in Linux kernel

2016-12-07 Thread Cong Wang
On Tue, Dec 6, 2016 at 10:27 PM, Zhouyi Zhou <zhouzho...@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 1:02 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> On Mon, Dec 5, 2016 at 11:10 PM, Zhouyi Zhou <zhouzho...@gmail.com> wrote:
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c 
>>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
>>> index 2a653ec..ab787cb 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
>>> @@ -490,7 +490,11 @@ int ixgbe_fcoe_ddp(struct ixgbe_adapter *adapter,
>>>  */
>>> if ((fh->fh_r_ctl == FC_RCTL_DD_SOL_DATA) &&
>>> (fctl & FC_FC_END_SEQ)) {
>>> -   skb_linearize(skb);
>>> +   int err = 0;
>>> +
>>> +   err = skb_linearize(skb);
>>> +   if (err)
>>> +   return err;
>>
>>
>> You can reuse 'rc' instead of adding 'err'.
> rc here is meaningful for the length of data being ddped. If using rc
> here, a successful
> skb_linearize will assign rc to 0.

Right, I thought it returns 0 on success.


>>
>>
>>
>>> crc = (struct fcoe_crc_eof *)skb_put(skb, sizeof(*crc));
>>> crc->fcoe_eof = FC_EOF_T;
>>> }
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
>>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> index fee1f29..4926d48 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> @@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector 
>>> *q_vector,
>>> total_rx_bytes += ddp_bytes;
>>> total_rx_packets += DIV_ROUND_UP(ddp_bytes,
>>>  mss);
>>> -   }
>>> -   if (!ddp_bytes) {
>>> +   } else {
>>> dev_kfree_skb_any(skb);
>>> continue;
>>> }
>>
>>
>> This piece doesn't seem to be related.
> if ddp_bytes is negative there will be some error, I think the skb
> should not pass to upper layer.

You misunderstand my point, this return value is for ixgbe_fcoe_ddp()
not skb_linearize(), you need to make it a separate patch because this
patch, as in $subject, only fixes skb_linearize().
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: return value of skb_linearize should be handled in Linux kernel

2016-12-06 Thread Cong Wang
On Mon, Dec 5, 2016 at 11:10 PM, Zhouyi Zhou  wrote:
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
> index 2a653ec..ab787cb 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
> @@ -490,7 +490,11 @@ int ixgbe_fcoe_ddp(struct ixgbe_adapter *adapter,
>  */
> if ((fh->fh_r_ctl == FC_RCTL_DD_SOL_DATA) &&
> (fctl & FC_FC_END_SEQ)) {
> -   skb_linearize(skb);
> +   int err = 0;
> +
> +   err = skb_linearize(skb);
> +   if (err)
> +   return err;


You can reuse 'rc' instead of adding 'err'.



> crc = (struct fcoe_crc_eof *)skb_put(skb, sizeof(*crc));
> crc->fcoe_eof = FC_EOF_T;
> }
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index fee1f29..4926d48 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector 
> *q_vector,
> total_rx_bytes += ddp_bytes;
> total_rx_packets += DIV_ROUND_UP(ddp_bytes,
>  mss);
> -   }
> -   if (!ddp_bytes) {
> +   } else {
> dev_kfree_skb_any(skb);
> continue;
> }


This piece doesn't seem to be related.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html