Re: [PATCHv2] tcmu: Add fifo type waiter list support to avoid starvation

2017-07-12 Thread Mike Christie
On 07/12/2017 02:16 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> The fifo type waiter list holds the udevs that are waiting for blocks
> from the global data pool. The unmap thread tries to feed the udevs at
> the head of the waiter list; if the available global free blocks are
> not enough, it stops traversing the list and does not wake up the
> rest.
> 
> Signed-off-by: Xiubo Li 
>

Looks ok to me.

Reviewed-by: Mike Christie 



Re: [PATCH v3 5/7] libsas: add a new workqueue to run probe/destruct discovery event

2017-07-12 Thread wangyijing


在 2017/7/13 0:50, John Garry 写道:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> Sometimes we want to synchronize libsas probe or destruct inside a SAS
>> discovery work item, e.g. when libsas revalidates the domain. We need
>> to split the probe and destruct work off from the scsi host workqueue.
>>
>> Signed-off-by: Yijing Wang 
>> CC: John Garry 
>> CC: Johannes Thumshirn 
>> CC: Ewan Milne 
>> CC: Christoph Hellwig 
>> CC: Tomas Henzl 
>> CC: Dan Williams 
>> ---
>>  drivers/scsi/libsas/sas_discover.c | 13 -
>>  drivers/scsi/libsas/sas_init.c |  8 
>>  include/scsi/libsas.h  |  1 +
>>  3 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c 
>> b/drivers/scsi/libsas/sas_discover.c
>> index 5d4a3a8..a25d648 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -559,7 +559,18 @@ static void sas_chain_work(struct sas_ha_struct *ha, 
>> struct sas_work *sw)
>>   * not racing against draining
>>   */
>>  sas_port_get(port);
>> -ret = scsi_queue_work(ha->core.shost, &sw->work);
>> +/*
>> + * The probe and destruct discovery events can be raised from within
>> + * other discovery events, such as the discover-domain and
>> + * revalidate-domain events. In some cases we need to execute probe
>> + * and destruct synchronously, so run the probe and destruct
>> + * discovery events on a separate, new workqueue.
> 
> Can we just use ha->disc_q for all discovery events for simplicity? Would 
> this solve the disco mutex you try to solve in patch 7/7?

No, because a SAS discovery probe/destruct event may be queued from within
another discovery work item (like sas_revalidate_domain):
sas_revalidate_domain
  sas_ex_revalidate_domain
    sas_rediscover
      sas_rediscover_dev
        sas_unregister_devs_sas_addr
          sas_unregister_dev
            sas_discover_event(dev->port, DISCE_DESTRUCT)
      sas_discover_new
        sas_ex_discover_devices
          sas_ex_discover_dev
            sas_ex_discover_end_dev
              sas_discover_end_dev
                sas_discover_event(dev->port, DISCE_PROBE)

So we could not synchronously wait for a probe or destruct discovery event
from within sas_revalidate_domain if they were queued on the same
workqueue; it would block forever.


> 
>> + */
>> +if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
>> +ret = queue_work(ha->disc_q, &sw->work);
>> +else
>> +ret = scsi_queue_work(ha->core.shost, &sw->work);
>> +
>>  if (ret != 1)
> 
> Uhh, we are mixing bool and int here... but we're forced to by 
> scsi_queue_work()

Eagle eye :), I could check the return value separately.

Thanks!
Yijing.


> 
>>  sas_port_put(port);
>>  }
>> diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
>> index 2f3b736..df1d78b 100644
>> --- a/drivers/scsi/libsas/sas_init.c
>> +++ b/drivers/scsi/libsas/sas_init.c
>> @@ -152,6 +152,13 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
>>  if (!sas_ha->event_q)
>>  goto Undo_ports;
>>
>> +snprintf(name, 64, "%s_disc_q", dev_name(sas_ha->dev));
>> +sas_ha->disc_q = create_singlethread_workqueue(name);
>> +if(!sas_ha->disc_q) {
>> +destroy_workqueue(sas_ha->event_q);
>> +goto Undo_ports;
>> +}
>> +
>>  INIT_LIST_HEAD(&sas_ha->eh_done_q);
>>  INIT_LIST_HEAD(&sas_ha->eh_ata_q);
>>
>> @@ -187,6 +194,7 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
>>  __sas_drain_work(sas_ha);
>> mutex_unlock(&sas_ha->drain_mutex);
>>  destroy_workqueue(sas_ha->event_q);
>> +destroy_workqueue(sas_ha->disc_q);
>>
>>  return 0;
>>  }
>> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
>> index c2ef05e..4bcb9fe 100644
>> --- a/include/scsi/libsas.h
>> +++ b/include/scsi/libsas.h
>> @@ -406,6 +406,7 @@ struct sas_ha_struct {
>>  struct device *dev;  /* should be set */
>>  struct module *lldd_module; /* should be set */
>>  struct workqueue_struct *event_q;
>> +struct workqueue_struct *disc_q;
>>
>>  u8 *sas_addr;  /* must be set */
>>  u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];
>>



Re: [PATCH v3 6/7] libsas: add wait-complete support to sync discovery event

2017-07-12 Thread wangyijing


在 2017/7/12 21:51, John Garry 写道:
> On 10/07/2017 08:06, Yijing Wang wrote:
>>
>>  static void sas_chain_event(int event, unsigned long *pending,
>> @@ -592,9 +596,9 @@ int sas_discover_event(struct asd_sas_port *port, enum 
>> discover_event ev)
>>  {
>>  struct sas_discovery *disc;
>>
>> +disc = &port->disc;
>>  if (!port)
>>  return 0;
>> -disc = &port->disc;
>>
>>  BUG_ON(ev >= DISC_NUM_EVENTS);
>>
>> diff --git a/drivers/scsi/libsas/sas_expander.c 
>> b/drivers/scsi/libsas/sas_expander.c
>> index 570b2cb..9d26c28 1
> 
> I was just looking through the code and I noticed this, above. Is there a 
> specific reason to move the NULL check, or was it modified accidentally?
> 
> I mean, if port is NULL I don't think we would get as far as checking it as 
> we would have already de-referenced it.

Oh, sorry, it's an accidental change, good catch, thanks!




Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-12 Thread wangyijing
 There is no special meaning to the pool size. If there is a flutter of 
 more than 25 events, notifying sas events will return an error, and the 
 further handling depends on the LLDD drivers.
 I hope libsas could do more in this case, but for now that seems a 
 little difficult; this patch may be an interim fix until we find a 
 perfect solution.
>>>
>>> The principle of having a fixed-size pool is ok, even though the pool size 
>>> needs more consideration.
>>>
>>> However my issue is how to handle pool exhaustion. For a start, relaying 
>>> info to the LLDD that the event notification failed is probably not the way 
>>> to go. I only now noticed "scsi: sas: scsi_queue_work can fail, so make 
>>> callers aware" made it into the kernel; as I mentioned in response to this 
>>> patch, the LLDD does not know how to handle this (and no LLDDs do actually 
>>> handle this).
>>>
>>> I would say it is better to shut down the PHY from libsas (As Dan mentioned 
>>> in the v1 series) when the pool exhausts, under the assumption that the PHY 
>>> has gone into some erroneous state. The user can later re-enable the PHY 
>>> from sysfs, if required.
>>
>> I considered this suggestion, and what I am worried about is: first, if 
>> we disable the phy once the sas event pool exhausts, it may hurt the 
>> processing of the pending sas events which have already been queued,
> 
> I don't see how it affects currently queued events - they should just be 
> processed normally. As for LLDD reporting events when the pool is exhausted, 
> they are just lost.

So if we disable a phy, does it really not affect the already-queued sas 
event processing, which includes accessing the phy to find the target device?

> 
>> second, if the phy was disabled and no one triggers re-enabling it via 
>> sysfs, the LLDD has no way to post new sas phy events.
> 
> For the extreme scenario of pool becoming exhausted and PHY being disabled, 
> it should remain disabled until user takes some action to fix originating 
> problem.

So we should print an explicit message to tell the user what happened and 
how to fix it.

Thanks!
Yijing.

> 
>>
>> Thanks!
>> Yijing.
>>
>>>
>>> Much appreciated,
>>> John
>>>

 Thanks!
 Yijing.

>
> Thanks,
> John



Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread wangyijing


在 2017/7/12 17:59, John Garry 写道:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> This patchset is based on Johannes's patch
>> "scsi: sas: scsi_queue_work can fail, so make callers aware".
>>
>> The libsas hotplug code currently has some issues; Dan Williams reported
>> a similar bug here before:
>> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>>
>> The issues we have found:
>> 1. if the LLDD burst-reports lots of phy-up/phy-down sas events, some
>>    events may be lost because the same sas event is already pending;
>>    in the end the libsas topology may differ from the hardware.
>> 2. on receiving a phy-down sas event, libsas calls sas_deform_port to
>>    remove devices: it first deletes the sas port, then puts a
>>    destruction discovery event in a new work item and queues it at the
>>    tail of the workqueue. Once the sas port is deleted, its child
>>    devices are deleted too, so when the destruction work starts it
>>    finds the target device already removed and reports a sysfs warning.
>> 3. since a hotplug process is divided into several work items, a phy-up
>>    sas event may be inserted between the phy-down work items, like:
>>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) --->
>>    PHYE_LOSS_OF_SIGNAL
>>    The hot-remove flow would then be broken by the PORTE_BYTES_DMAED
>>    event; this is not what we expect, and issues would occur.
>>
>> The first patch fixes the lost sas events, and the second one introduces
>> wait-complete to fix the hotplug ordering issues.
>>
> 
> I quickly tested this for basic hotplug.
> 
> Before:
> root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
> root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
> root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
> root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  102.570694] sysfs group 'power' not found for kobject 
> '0:0:7:0'
> [  102.577250] [ cut here ]
> [  102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 
> sysfs_remove_group+0x8c/0x94
> [  102.590110] Modules linked in:
> [  102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 
> 4.12.0-rc1-00032-g3ab81fc #1907
> [  102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 
> UEFI Nemo 1.7 RC3 06/23/2017
> [  102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
> [  102.615822] task: 8017d4793400 task.stack: 8017b7e7
> [  102.621728] PC is at sysfs_remove_group+0x8c/0x94
> [  102.626419] LR is at sysfs_remove_group+0x8c/0x94
> [  102.631109] pc : [] lr : [] pstate: 
> 6045
> [  102.638490] sp : 8017b7e73b80
> [  102.641791] x29: 8017b7e73b80 x28: 8017db010800
> [  102.647091] x27: 08e27000 x26: 8017d43e6600
> [  102.652390] x25: 8017b828 x24: 0003
> [  102.657689] x23: 8017b78864b0 x22: 8017b784c988
> [  102.662988] x21: 8017b7886410 x20: 08ee9dd0
> [  102.668288] x19:  x18: 08a1b678
> [  102.673587] x17: 000e x16: 0007
> [  102.678886] x15:  x14: 00a3
> [  102.684185] x13: 0033 x12: 0028
> [  102.689484] x11: 08f3be58 x10: 
> [  102.694783] x9 : 043c x8 : 6f6b20726f662064
> [  102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
> [  102.705382] x5 :  x4 : 
> [  102.710681] x3 :  x2 : 08e427e0
> [  102.715980] x1 :  x0 : 0033
> [  102.721279] ---[ end trace c216cc1451d5f7ec ]---
> [  102.725882] Call trace:
> [  102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
> [  102.734742] 39a0:    
> 0001
> [  102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050 
> 
> [  102.750372] 39e0: 8017b78864b0 0003 8017b828 
> 8017d43e6600
> [  102.758188] 3a00: 08e27000 8017db010800 8017d4793400 
> 
> [  102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 
> ffc8
> [  102.773818] 3a40: 8017b7e73a70 0810c12c 0033 
> 
> [  102.781633] 3a60: 08e427e0   
> 
> [  102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 
> 043c
> [  102.797264] 3aa0:  08f3be58 0028 
> 0033
> [  102.805079] 3ac0: 00a3  0007 
> 000e
> [  102.812895] [] sysfs_remove_group+0x8c/0x94
> [  102.818628] [] dpm_sysfs_remove+0x58/0x68
> [  102.824188] [] 


[PATCH] libiscsi: Remove iscsi_destroy_session

2017-07-12 Thread Khazhismel Kumykov
iscsi_session_teardown was the only user of this function, which is
currently just shorthand for iscsi_remove_session + iscsi_free_session.

Signed-off-by: Khazhismel Kumykov 
---
 with "libiscsi: Fix use-after-free race during iscsi_session_teardown"
 removing the last user.

 drivers/scsi/scsi_transport_iscsi.c | 16 
 include/scsi/scsi_transport_iscsi.h |  1 -
 2 files changed, 17 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index a424eaeafeb0..924ac408d8a9 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2210,22 +2210,6 @@ void iscsi_free_session(struct iscsi_cls_session 
*session)
 }
 EXPORT_SYMBOL_GPL(iscsi_free_session);
 
-/**
- * iscsi_destroy_session - destroy iscsi session
- * @session: iscsi_session
- *
- * Can be called by a LLD or iscsi_transport. There must not be
- * any running connections.
- */
-int iscsi_destroy_session(struct iscsi_cls_session *session)
-{
-   iscsi_remove_session(session);
-   ISCSI_DBG_TRANS_SESSION(session, "Completing session destruction\n");
-   iscsi_free_session(session);
-   return 0;
-}
-EXPORT_SYMBOL_GPL(iscsi_destroy_session);
-
 /**
  * iscsi_create_conn - create iscsi class connection
  * @session: iscsi cls session
diff --git a/include/scsi/scsi_transport_iscsi.h 
b/include/scsi/scsi_transport_iscsi.h
index 6183d20a01fb..b266d2a3bcb1 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -434,7 +434,6 @@ extern struct iscsi_cls_session 
*iscsi_create_session(struct Scsi_Host *shost,
unsigned int target_id);
 extern void iscsi_remove_session(struct iscsi_cls_session *session);
 extern void iscsi_free_session(struct iscsi_cls_session *session);
-extern int iscsi_destroy_session(struct iscsi_cls_session *session);
 extern struct iscsi_cls_conn *iscsi_create_conn(struct iscsi_cls_session *sess,
int dd_size, uint32_t cid);
 extern int iscsi_destroy_conn(struct iscsi_cls_conn *conn);
-- 
2.13.2.932.g7449e964c-goog





[PATCH] libiscsi: Fix use-after-free race during iscsi_session_teardown

2017-07-12 Thread Khazhismel Kumykov
Session attributes exposed through sysfs were freed before the device
was destroyed, resulting in a potential use-after-free. Free these
attributes after removing the device.

Signed-off-by: Khazhismel Kumykov 
---
 drivers/scsi/libiscsi.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 42381adf0769..f9199bebaec7 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2851,9 +2851,6 @@ EXPORT_SYMBOL_GPL(iscsi_session_setup);
 /**
  * iscsi_session_teardown - destroy session, host, and cls_session
  * @cls_session: iscsi session
- *
- * The driver must have called iscsi_remove_session before
- * calling this.
  */
 void iscsi_session_teardown(struct iscsi_cls_session *cls_session)
 {
@@ -2863,6 +2860,8 @@ void iscsi_session_teardown(struct iscsi_cls_session 
*cls_session)
 
	iscsi_pool_free(&session->cmdpool);
 
+   iscsi_remove_session(session);
+
kfree(session->password);
kfree(session->password_in);
kfree(session->username);
@@ -2877,7 +2876,8 @@ void iscsi_session_teardown(struct iscsi_cls_session 
*cls_session)
kfree(session->portal_type);
kfree(session->discovery_parent_type);
 
-   iscsi_destroy_session(cls_session);
+   iscsi_free_session(cls_session);
+
iscsi_host_dec_session_cnt(shost);
module_put(owner);
 }
-- 
2.13.2.932.g7449e964c-goog





[RFC v3 1/2] scsi: generate uevent for SCSI sense code

2017-07-12 Thread Song Liu
This patch adds the capability for the SCSI layer to generate a uevent
for SCSI sense codes. The feature is gated by CONFIG_SCSI_SENSE_UEVENT.

We can configure which sense keys generate a uevent for each device
through the sysfs entry sense_event_filter, which is a bitmap of
"sense keys to generate uevents for". For example, the following enables
uevents for MEDIUM_ERROR (0x03) and HARDWARE_ERROR (0x04) on scsi
drive sdc:

echo 0x000c > /sys/block/sdc/device/sense_event_filter

Here is an example output captured by udevadm:

KERNEL[587.353177] change   /devices/pci:00/XX
ACTION=change
CDB=\x88\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00
DEVPATH=/devices/pci:00/:00:01.0/:01:00.0/host6/XX
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_SENSE=1
SENSE_BUFFER=\x72\x03\x11\x14\x00\x00\x00\x34\x00\x0a\x80 
SENSE_CODE=3/11/14
SEQNUM=4796
SUBSYSTEM=scsi

Signed-off-by: Song Liu 
---
 drivers/scsi/Kconfig   | 14 +++
 drivers/scsi/scsi_error.c  | 43 ++
 drivers/scsi/scsi_lib.c| 58 +-
 drivers/scsi/scsi_sysfs.c  | 51 
 include/scsi/scsi_common.h |  6 +
 include/scsi/scsi_device.h | 27 -
 6 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index d384f4f..0fb672b 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -226,6 +226,20 @@ config SCSI_LOGGING
  there should be no noticeable performance impact as long as you have
  logging turned off.
 
+config SCSI_SENSE_UEVENT
+   bool "SCSI sense code logging"
+   depends on SCSI
+   default n
+   ---help---
+ This turns on uevent for SCSI sense code.
+
+ You can configure which sense keys generate uevent for each device
+ through sysfs entry sense_event_filter. For example, the following
+ enables uevent for MEDIUM_ERROR (0x03) and HARDWARE_ERROR (0x04)
+ on scsi drive sdc:
+
+ echo 0x000c > /sys/block/sdc/device/sense_event_filter
+
 config SCSI_SCAN_ASYNC
bool "Asynchronous SCSI scanning"
depends on SCSI
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ac31964..b8ef869 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -426,6 +426,48 @@ static void scsi_report_sense(struct scsi_device *sdev,
}
 }
 
+/*
+ * generate uevent when receiving sense code from device
+ */
+static void scsi_send_sense_uevent(struct scsi_device *sdev,
+  struct scsi_cmnd *scmd,
+  struct scsi_sense_hdr *sshdr)
+{
+#ifdef CONFIG_SCSI_SENSE_UEVENT
+   struct scsi_event *evt;
+   unsigned char sb_len;
+
+   if (!test_bit(sshdr->sense_key & 0xf,
+ &sdev->sense_event_filter))
+   return;
+   evt = sdev_evt_alloc(SDEV_EVT_SCSI_SENSE, GFP_ATOMIC);
+   if (!evt)
+   return;
+
+   evt->sense_evt_data.cmnd = kzalloc(scmd->cmd_len, GFP_ATOMIC);
+   if (!evt->sense_evt_data.cmnd)
+   goto alloc_cmd_fail;
+
+   sb_len = scsi_sense_data_length(scmd->sense_buffer);
+
+   evt->sense_evt_data.sense_buffer = kzalloc(sb_len, GFP_ATOMIC);
+   if (!evt->sense_evt_data.sense_buffer)
+   goto alloc_sense_fail;
+
+   evt->sense_evt_data.cmd_len = scmd->cmd_len;
+   evt->sense_evt_data.sb_len = sb_len;
+   memcpy(evt->sense_evt_data.cmnd, scmd->cmnd, scmd->cmd_len);
+   memcpy(evt->sense_evt_data.sense_buffer, scmd->sense_buffer, sb_len);
+
+   sdev_evt_send(sdev, evt);
+   return;
+alloc_sense_fail:
+   kfree(evt->sense_evt_data.cmnd);
+alloc_cmd_fail:
+   kfree(evt);
+#endif
+}
+
 /**
  * scsi_check_sense - Examine scsi cmd sense
  * @scmd:  Cmd to have sense checked.
@@ -446,6 +488,7 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
return FAILED;  /* no valid sense data */
 
	scsi_report_sense(sdev, &sshdr);
+	scsi_send_sense_uevent(sdev, scmd, &sshdr);
 
	if (scsi_sense_is_deferred(&sshdr))
return NEEDS_RETRY;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 41c19c7..67cc0fb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2677,7 +2677,15 @@ EXPORT_SYMBOL(scsi_device_set_state);
 static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
 {
int idx = 0;
-   char *envp[3];
+   char *envp[5];  /* SDEV_EVT_SCSI_SENSE needs most entries (4) */
+#ifdef CONFIG_SCSI_SENSE_UEVENT
+   char *buf = NULL;
+   int i;
+   int buf_size;
+   int offset;
+   struct scsi_sense_hdr sshdr;
+#endif
+
 
switch (evt->evt_type) {
case SDEV_EVT_MEDIA_CHANGE:
@@ -2702,6 +2710,49 @@ static void scsi_evt_emit(struct scsi_device *sdev, 
struct scsi_event *evt)
case 

[RFC v3 0/2] generate uevent for SCSI sense code

2017-07-12 Thread Song Liu
This change follows up on our discussion about event logging for media
management during LSF/MM 2017.

Changes from RFC v2:
1. change rate limit of events from per device to per SCSI host
2. fix bugs from feedbacks

Please kindly let me know your thoughts on this.

Thanks,
Song

Song Liu (2):
  scsi: generate uevent for SCSI sense code
  scsi: add rate limit to scsi sense code uevent

 drivers/scsi/Kconfig   | 14 +++
 drivers/scsi/hosts.c   |  4 
 drivers/scsi/scsi_error.c  | 59 ++
 drivers/scsi/scsi_lib.c| 58 -
 drivers/scsi/scsi_sysfs.c  | 51 +++
 include/scsi/scsi_common.h |  6 +
 include/scsi/scsi_device.h | 27 -
 include/scsi/scsi_host.h   | 13 ++
 8 files changed, 230 insertions(+), 2 deletions(-)

--
2.9.3


[RFC v3 2/2] scsi: add rate limit to scsi sense code uevent

2017-07-12 Thread Song Liu
This patch adds a rate limit to SCSI sense code uevents. Currently,
the rate limit is hard-coded to 64 events per second per Scsi_Host.

The code tracks the nanosecond times of the latest 64 events in a
circular buffer. When a new event arrives, its time is compared against
the oldest tracked time in the buffer. If the difference is smaller
than one second, the new event is dropped.

Signed-off-by: Song Liu 
---
 drivers/scsi/hosts.c  |  4 
 drivers/scsi/scsi_error.c | 16 
 include/scsi/scsi_host.h  | 13 +
 3 files changed, 33 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 831a1c8..219481b 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -408,6 +408,10 @@ struct Scsi_Host *scsi_host_alloc(struct 
scsi_host_template *sht, int privsize)
	init_waitqueue_head(&shost->host_wait);
	mutex_init(&shost->scan_mutex);
 
+#ifdef CONFIG_SCSI_SENSE_UEVENT
+   spin_lock_init(&shost->latest_event_lock);
+#endif
+
	index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
if (index < 0)
goto fail_kfree;
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index b8ef869..a0f5aab 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -436,10 +436,26 @@ static void scsi_send_sense_uevent(struct scsi_device 
*sdev,
 #ifdef CONFIG_SCSI_SENSE_UEVENT
struct scsi_event *evt;
unsigned char sb_len;
+   struct Scsi_Host *shost = sdev->host;
+   unsigned long flags;
+   u64 time_ns;
 
	if (!test_bit(sshdr->sense_key & 0xf,
		      &sdev->sense_event_filter))
return;
+
+	time_ns = ktime_to_ns(ktime_get());
+	spin_lock_irqsave(&shost->latest_event_lock, flags);
+	if (time_ns - shost->latest_event_times[shost->latest_event_idx] <
+	    NSEC_PER_SEC) {
+		spin_unlock_irqrestore(&shost->latest_event_lock, flags);
+		return;
+	}
+	shost->latest_event_times[shost->latest_event_idx] = time_ns;
+	shost->latest_event_idx = (shost->latest_event_idx + 1) %
+		MAX_SENSE_EVENT_PER_SECOND;
+	spin_unlock_irqrestore(&shost->latest_event_lock, flags);
+
evt = sdev_evt_alloc(SDEV_EVT_SCSI_SENSE, GFP_ATOMIC);
if (!evt)
return;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index afb0481..8aacb15 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -731,6 +731,19 @@ struct Scsi_Host {
 */
struct device *dma_dev;
 
+#ifdef CONFIG_SCSI_SENSE_UEVENT
+#define MAX_SENSE_EVENT_PER_SECOND 64
+   /*
+* To rate limit uevents to MAX_SENSE_EVENT_PER_SECOND, we track
+* nano second time of MAX_SENSE_EVENT_PER_SECOND most recent
+* events. If there are already MAX_SENSE_EVENT_PER_SECOND in the
+* past seconds, new event is dropped.
+*/
+   u64 latest_event_times[MAX_SENSE_EVENT_PER_SECOND];
+   int latest_event_idx;
+   spinlock_t  latest_event_lock;
+#endif
+
/*
 * We should ensure that this is aligned, both for better performance
 * and also because some compilers (m68k) don't automatically force
-- 
2.9.3





Re: [PATCH v7 0/6] g_NCR5380: PDMA fixes and cleanup

2017-07-12 Thread Martin K. Petersen

Finn,

> Ondrej, would you please test this new series?

Applied to 4.14/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 0/7] aacraid: split off EH functions

2017-07-12 Thread Martin K. Petersen

Hannes,

> this patchset is split off from the original patchset to rework SCSI
> EH function parameters. As it's pretty much independent and an
> improvement to the existing EH callback functions, I'm posting it
> separately.

Applied to 4.14/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2 2/2] smartpqi: limit transfer length to 1MB

2017-07-12 Thread Martin K. Petersen

Don,

> The smartpqi firmware will bypass the cache for any request larger
> than 1MB, so we should cap the request size to avoid any
> performance degradation in kernels later than v4.3

Ping on these...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: libfc: pass an error pointer to fc_disc_error()

2017-07-12 Thread Martin K. Petersen

Dan,

> This patch is basically to silence a static checker warning.
>
> drivers/scsi/libfc/fc_disc.c:326 fc_disc_error()
> warn: passing a valid pointer to 'PTR_ERR'
>
> It doesn't affect runtime because it treats -ENOMEM and a valid pointer
> the same.  But the documentation says we should be passing an error
> pointer.

Applied to 4.13/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: hisi_sas: make several const arrays static

2017-07-12 Thread Martin K. Petersen

Colin,

> Don't populate various tables on the stack but make them static const.
> Makes the object code smaller by over 280 bytes:

Applied to 4.13/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qla2xxx: Off by one in qlt_ctio_to_cmd()

2017-07-12 Thread Martin K. Petersen

Dan,

> There are "req->num_outstanding_cmds" elements in the
> req->outstanding_cmds[] array so the > here should be >=.

Applied to 4.13/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] scsi: sg: fix SG_DXFER_FROM_DEV transfers

2017-07-12 Thread Martin K. Petersen

Johannes,

> SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set
> it to NULL for the old sg_io read/write interface, but must have a
> length bigger than 0. This fixes a regression introduced by commit
> 28676d869bbb ("scsi: sg: check for valid direction before starting the
> request")

Applied to 4.13/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] virtio_scsi: always read VPD pages for multiqueue too

2017-07-12 Thread Martin K. Petersen

Paolo,

> Multi-queue virtio-scsi uses a different scsi_host_template struct.
> Add the .device_alloc field there, too.

Applied to 4.13/scsi-fixes by hand. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH][V2] scsi: qedf: fix spelling mistake: "offlading" -> "offloading"

2017-07-12 Thread Martin K. Petersen

Colin,

> Trivial fix to a spelling mistake in a QEDF_INFO message, and removal
> of a duplicated "since" (thanks to Tyrel Datwyler for spotting the
> latter issue).

Applied to 4.13/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qedi: fix another spelling mistake: "alloction" -> "allocation"

2017-07-12 Thread Martin K. Petersen

Colin,

> Trivial fix to spelling mistake in QEDF_ERR message.  I should have
> also included this in a previous fix, but I only just spotted this one.

Applied to 4.13/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: isci: fix typo in function names

2017-07-12 Thread Martin K. Petersen

Colin,

> There are a couple of typos in function names and spelling of request
> where the letters u and e are swapped:
>
> scu_ssp_reqeust_construct_task_context
> scu_sata_reqeust_construct_task_context
>
> Fix the spelling of request.

Applied to 4.13/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] cxlflash: return -EFAULT if copy_from_user() fails

2017-07-12 Thread Martin K. Petersen

Dan,

> The copy_from/to_user() functions return the number of bytes remaining
> to be copied but we had intended to return -EFAULT here.
>
> Fixes: bc88ac47d5cb ("scsi: cxlflash: Support AFU debug")
> Signed-off-by: Dan Carpenter 

Applied to 4.13/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v1 1/1] qedi: Add support for Boot from SAN over iSCSI offload

2017-07-12 Thread Martin K. Petersen

Nilesh,

> This patch adds support for Boot from SAN over iSCSI offload.
> The iSCSI boot information in the NVRAM is populated under
> /sys/firmware/iscsi_bootX/ using qed NVM-image reading API and
> further exported to open-iscsi to perform iSCSI login enabling
> boot over offload iSCSI interface in a Boot from SAN environment.

Applied to 4.13/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] tcmu: Fix possible to/from address overflow when doing the memcpy

2017-07-12 Thread Mike Christie
On 07/12/2017 02:51 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> For most cases sg->length equals PAGE_SIZE, so this bug won't
> be triggered. Otherwise it will crash the kernel, for example when
> all segments' sg->length equal 1K.
> 
> Signed-off-by: Xiubo Li 
> ---
>  drivers/target/target_core_user.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 8bf0823..9030c2a 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -590,8 +590,6 @@ static int scatter_data_area(struct tcmu_dev *udev,
>   block_remaining);
>   to_offset = get_block_offset_user(udev, dbi,
>   block_remaining);
> - offset = DATA_BLOCK_SIZE - block_remaining;
> - to += offset;
>  
>   if (*iov_cnt != 0 &&
>   to_offset == iov_tail(*iov)) {
> @@ -602,8 +600,10 @@ static int scatter_data_area(struct tcmu_dev *udev,
>   (*iov)->iov_len = copy_bytes;
>   }
>   if (copy_data) {
> - memcpy(to, from + sg->length - sg_remaining,
> - copy_bytes);
> + offset = DATA_BLOCK_SIZE - block_remaining;
> + memcpy(to + offset,
> +from + sg->length - sg_remaining,
> +copy_bytes);
>   tcmu_flush_dcache_range(to, copy_bytes);
>   }
>   sg_remaining -= copy_bytes;
> @@ -664,9 +664,8 @@ static void gather_data_area(struct tcmu_dev *udev, 
> struct tcmu_cmd *cmd,
>   copy_bytes = min_t(size_t, sg_remaining,
>   block_remaining);
>   offset = DATA_BLOCK_SIZE - block_remaining;
> - from += offset;
>   tcmu_flush_dcache_range(from, copy_bytes);
> - memcpy(to + sg->length - sg_remaining, from,
> + memcpy(to + sg->length - sg_remaining, from + offset,
>   copy_bytes);
>  
>   sg_remaining -= copy_bytes;
> 

Nice.

Reviewed-by: Mike Christie 


Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage

2017-07-12 Thread Christoph Hellwig
On Wed, Jul 12, 2017 at 12:10:02PM -0400, Alan Stern wrote:
> This is pretty conclusive.  The problem comes about because
> usb_stor_control_thread() calls scsi_mq_done() while holding
> shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that
> same lock.
> 
> I don't know why this didn't show up in earlier kernels.  I guess some
> element of the call chain listed above must be new in 4.12.
> 
> Christoph, what's the best way to fix this?  Should usb-storage release
> the host lock before issuing the ->scsi_done callback?  If so, does
> that change need to be applied to any kernels before 4.12?

4.12 switched to blk-mq by default, and while the old code used
a softirq for completions, which is always a different context,
the blk-mq code might execute in the same context it's called in.

So yes, for that we'd need to drop host_lock.  But I wonder how
many more of these are lingering somewhere and if we can find
another workaround.


Re: [PATCH v3 5/7] libsas: add a new workqueue to run probe/destruct discovery event

2017-07-12 Thread John Garry

On 10/07/2017 08:06, Yijing Wang wrote:

Sometimes, we want sync libsas probe or destruct in sas discovery work,
like when libsas revalidate domain. We need to split probe and destruct
work from the scsi host workqueue.

Signed-off-by: Yijing Wang 
CC: John Garry 
CC: Johannes Thumshirn 
CC: Ewan Milne 
CC: Christoph Hellwig 
CC: Tomas Henzl 
CC: Dan Williams 
---
 drivers/scsi/libsas/sas_discover.c | 13 -
 drivers/scsi/libsas/sas_init.c |  8 
 include/scsi/libsas.h  |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
index 5d4a3a8..a25d648 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -559,7 +559,18 @@ static void sas_chain_work(struct sas_ha_struct *ha, 
struct sas_work *sw)
 * not racing against draining
 */
sas_port_get(port);
-   ret = scsi_queue_work(ha->core.shost, >work);
+   /*
+* discovery event probe and destruct would be called in other
+* discovery event like discover domain and revalidate domain
+* events, in some cases, we need to sync execute probe and destruct
+* events, so run probe and destruct discover events except in a new
+* workqueue.


Can we just use ha->disc_q for all discovery events for simplicity? 
Would this solve the disco mutex you try to solve in patch 7/7?



+*/
+   if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
+   ret = queue_work(ha->disc_q, >work);
+   else
+   ret = scsi_queue_work(ha->core.shost, >work);
+
if (ret != 1)


Uhh, we are mixing bool and int here... but we're forced to by 
scsi_queue_work()



sas_port_put(port);
 }
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 2f3b736..df1d78b 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -152,6 +152,13 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
if (!sas_ha->event_q)
goto Undo_ports;

+   snprintf(name, 64, "%s_disc_q", dev_name(sas_ha->dev));
+   sas_ha->disc_q = create_singlethread_workqueue(name);
+   if(!sas_ha->disc_q) {
+   destroy_workqueue(sas_ha->event_q);
+   goto Undo_ports;
+   }
+
INIT_LIST_HEAD(_ha->eh_done_q);
INIT_LIST_HEAD(_ha->eh_ata_q);

@@ -187,6 +194,7 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
__sas_drain_work(sas_ha);
mutex_unlock(_ha->drain_mutex);
destroy_workqueue(sas_ha->event_q);
+   destroy_workqueue(sas_ha->disc_q);

return 0;
 }
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index c2ef05e..4bcb9fe 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -406,6 +406,7 @@ struct sas_ha_struct {
struct device *dev;   /* should be set */
struct module *lldd_module; /* should be set */
struct workqueue_struct *event_q;
+   struct workqueue_struct *disc_q;

u8 *sas_addr; /* must be set */
u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];






Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage

2017-07-12 Thread Alan Stern
On Thu, 13 Jul 2017, Arthur Marsh wrote:

> Thanks for the advice, I've enabled CONFIG_FRAME_POINTER and managed to 
> catch 1 more log triggered by issuing a
> 
> blkid
> 
> command after inserting a USB stick.
> 
> The problem is that I can't be certain of triggering the problem when 
> testing kernels using git-bisect, so my git-bisect efforts are not 
> guaranteed to get a result.

> Jul 12 18:51:00 localhost kernel: [ 1206.802306] NMI watchdog: Watchdog 
> detected hard LOCKUP on cpu 1
> Jul 12 18:51:00 localhost kernel: [ 1206.802308] Modules linked in: cmac arc4 
> ecb md4 nls_utf8 cifs ccm dns_resolver fscache bnep bluetooth hmac drbg 
> ansi_cprng ecdh_generic nfc rfkill cpufreq_userspace cpufreq_conservative 
> snd_hrtimer cpufreq_powersave binfmt_misc fuse snd_emu10k1_synth 
> snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi_event snd_seq 
> max6650 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp 
> libiscsi scsi_transport_iscsi parport_pc ppdev lp parport ir_lirc_codec 
> lirc_dev rtl2832_sdr videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 
> videobuf2_core videodev media fc0012 rtl2832 i2c_mux dvb_usb_rtl28xxu 
> dvb_usb_v2 amdkfd dvb_core rc_core edac_mce_amd radeon kvm_amd kvm irqbypass 
> wmi_bmof snd_hda_codec_hdmi snd_emu10k1 snd_hda_intel ttm sg drm_kms_helper 
> snd_hda_codec snd_util_mem snd_hda_core
> Jul 12 18:51:00 localhost kernel: [ 1206.802336]  snd_ac97_codec snd_rawmidi 
> snd_seq_device snd_hwdep snd_pcm_oss drm snd_mixer_oss ac97_bus snd_pcm 
> pcspkr evdev k10temp i2c_algo_bit serio_raw snd_timer emu10k1_gp snd gameport 
> sp5100_tco soundcore button acpi_cpufreq asus_atk0110 wmi ext4 mbcache crc16 
> jbd2 crc32c_generic btrfs raid6_pq xor uas usb_storage sr_mod cdrom sd_mod 
> hid_generic usbhid hid ata_generic ohci_pci ahci firewire_ohci libahci 
> pata_atiixp firewire_core crc_itu_t i2c_piix4 ehci_pci ohci_hcd libata 
> ehci_hcd usbcore scsi_mod r8169 mii
> Jul 12 18:51:00 localhost kernel: [ 1206.802359] CPU: 1 PID: 142 Comm: 
> usb-storage Not tainted 4.12.0+ #2751
> Jul 12 18:51:00 localhost kernel: [ 1206.802359] Hardware name: System 
> manufacturer System Product Name/M3A78 PRO, BIOS 170101/27/2011
> Jul 12 18:51:00 localhost kernel: [ 1206.802361] task: 927c610eef80 
> task.stack: ba9501b78000
> Jul 12 18:51:00 localhost kernel: [ 1206.802366] RIP: 
> 0010:native_queued_spin_lock_slowpath+0x159/0x1c0
> Jul 12 18:51:00 localhost kernel: [ 1206.802366] RSP: 0018:ba9501b7bdc8 
> EFLAGS: 0002
> Jul 12 18:51:00 localhost kernel: [ 1206.802368] RAX: 0101 RBX: 
> 927c662f3030 RCX: 0001
> Jul 12 18:51:00 localhost kernel: [ 1206.802368] RDX: 0101 RSI: 
> 0001 RDI: 927c662f3030
> Jul 12 18:51:00 localhost kernel: [ 1206.802369] RBP: ba9501b7bde0 R08: 
> 0101 R09: 0100
> Jul 12 18:51:00 localhost kernel: [ 1206.802370] R10:  R11: 
> 7fff R12: 0082
> Jul 12 18:51:00 localhost kernel: [ 1206.802370] R13: 2003 R14: 
> 1d4c R15: 927c610eef80
> Jul 12 18:51:00 localhost kernel: [ 1206.802371] FS:  () 
> GS:927c6fc4() knlGS:
> Jul 12 18:51:00 localhost kernel: [ 1206.802372] CS:  0010 DS:  ES:  
> CR0: 80050033
> Jul 12 18:51:00 localhost kernel: [ 1206.802373] CR2: 0030007995d8 CR3: 
> 00017b144000 CR4: 06e0
> Jul 12 18:51:00 localhost kernel: [ 1206.802374] Call Trace:
> Jul 12 18:51:00 localhost kernel: [ 1206.802378]  ? 
> _raw_spin_lock_irqsave+0x59/0x68
> Jul 12 18:51:00 localhost kernel: [ 1206.802389]  scsi_eh_scmd_add+0x3f/0x150 
> [scsi_mod]
> Jul 12 18:51:00 localhost kernel: [ 1206.802393]  
> scsi_softirq_done+0xb5/0x150 [scsi_mod]
> Jul 12 18:51:00 localhost kernel: [ 1206.802396]  
> __blk_mq_complete_request+0xd4/0x160
> Jul 12 18:51:00 localhost kernel: [ 1206.802397]  
> blk_mq_complete_request+0x18/0x20
> Jul 12 18:51:00 localhost kernel: [ 1206.802402]  scsi_mq_done+0x21/0x80 
> [scsi_mod]
> Jul 12 18:51:00 localhost kernel: [ 1206.802405]  
> usb_stor_control_thread+0xf4/0x250 [usb_storage]
> Jul 12 18:51:00 localhost kernel: [ 1206.802407]  kthread+0x125/0x140
> Jul 12 18:51:00 localhost kernel: [ 1206.802409]  ? 
> fill_inquiry_response+0x20/0x20 [usb_storage]
> Jul 12 18:51:00 localhost kernel: [ 1206.802410]  ? 
> kthread_create_on_node+0x70/0x70
> Jul 12 18:51:00 localhost kernel: [ 1206.802412]  ret_from_fork+0x25/0x30
> Jul 12 18:51:00 localhost kernel: [ 1206.802413] Code: c2 74 a2 89 c2 89 d0 
> 66 31 c0 39 c6 74 ea 4d 85 c9 c6 07 01 74 21 41 c7 41 08 01 00 00 00 eb 85 41 
> ff c9 75 04 f3 c3 f3 90 8b 07 <84> c0 75 f8 66 c7 07 01 00 c3 f3 90 4c 8b 09 
> 4d 85 c9 74 f6 eb 

This is pretty conclusive.  The problem comes about because
usb_stor_control_thread() calls scsi_mq_done() while holding
shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that
> same lock.

Re: [PATCH] scsi: default to scsi-mq

2017-07-12 Thread Jonathan Cameron
On Wed, 12 Jul 2017 14:18:14 +
Bart Van Assche  wrote:

> On Wed, 2017-07-12 at 09:26 +0100, John Garry wrote:
> > > > What block driver controls the block device for which the performance
> > > > regression
> > > > has been observed? How many hardware queues were created by that block
> > > > driver
> > > > (see also /sys/block/*/mq/...)?  
> > 
> > Just confirming that we have only 1 queue:
> > /sys/block/sdc/mq/0 as example  
> 
> Hello John,
> 
> Can you also check the I/O scheduler that has been selected?
> 
> Thanks,
> 
> Bart.
Hi Bart,

Original numbers were with mq-deadline (not deliberately specified, so the
default for this setup). To flesh them out a bit, I've
run the equivalent test with all the options on today's linux-next.

iodepth=2048, 4k blocks read only 6 disks 6 processes.

SMMU disabled for now due to ongoing work to reduce its impact.

none : 716k IOPS
mq-deadline : 305k IOPS
kyber: 321k IOPS

noop, with scsi-mq disabled using the kernel command-line option: 937k IOPS

Thanks,

Jonathan
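For reference, numbers like the above could be gathered with a fio job along these lines. This is a hypothetical sketch: the device names, runtime, and job layout are assumptions; only iodepth, block size, and read workload match the stated parameters. The scheduler is switched per device via `/sys/block/<dev>/queue/scheduler`, and scsi-mq is disabled at boot with `scsi_mod.use_blk_mq=0`.

```ini
; hypothetical job file: 4k sequential reads, iodepth=2048,
; one process per disk (6 disks -> 6 job sections)
[global]
ioengine=libaio
direct=1
rw=read
bs=4k
iodepth=2048
runtime=60
time_based
group_reporting

[job-sdc]
filename=/dev/sdc

[job-sdd]
filename=/dev/sdd
; ... one section per remaining disk ...
```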



Re: [PATCH 2/6] SCSI: use blk_mq_run_hw_queues() in scsi_kick_queue()

2017-07-12 Thread Bart Van Assche
On Wed, 2017-07-12 at 11:15 +0800, Ming Lei wrote:
> On Tue, Jul 11, 2017 at 07:57:53PM +, Bart Van Assche wrote:
> > On Wed, 2017-07-12 at 02:20 +0800, Ming Lei wrote:
> > > Now SCSI won't stop queue, and not necessary to use
> > > blk_mq_start_hw_queues(), so switch to blk_mq_run_hw_queues()
> > > instead.
> > > 
> > > Cc: "James E.J. Bottomley" 
> > > Cc: "Martin K. Petersen" 
> > > Cc: linux-scsi@vger.kernel.org
> > > Signed-off-by: Ming Lei 
> > > ---
> > >  drivers/scsi/scsi_lib.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index f6097b89d5d3..91d890356b78 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -333,7 +333,7 @@ void scsi_device_unbusy(struct scsi_device *sdev)
> > >  static void scsi_kick_queue(struct request_queue *q)
> > >  {
> > >   if (q->mq_ops)
> > > - blk_mq_start_hw_queues(q);
> > > + blk_mq_run_hw_queues(q, false);
> > >   else
> > >   blk_run_queue(q);
> > >  }
> > 
> > Hello Ming,
> > 
> > Now that we have separate flags to represent the "stopped" and "quiesced"
> > states, wouldn't it be better not to modify scsi_kick_queue() but instead to
> > stop a SCSI hardware queue again if scsi_queue_rq() returns 
> > BLK_STS_RESOURCE?
> > See also commits 36e3cf273977 ("scsi: Avoid that SCSI queues get stuck") and
> > commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues").
> 
> As you can see in the following patches, all stop/start queue APIs will
> be removed, and the 'stopped' state will become an internal state.
> 
> It doesn't make sense to clear 'stopped' for SCSI anymore, since the
> queue won't actually be stopped. So we need this change.

Hello Ming,

As Jens already noticed, this approach won't work properly for concurrent I/O
to multiple LUNs associated with the same SCSI host. This approach won't work
properly for dm-mpath either. Sorry but I'm not convinced that it's possible
to replace the stop/start queue API for all block drivers by an algorithm
that is based on estimating the queue depth.

Bart.

Re: [PATCH] scsi: default to scsi-mq

2017-07-12 Thread Bart Van Assche
On Wed, 2017-07-12 at 09:26 +0100, John Garry wrote:
> > > What block driver controls the block device for which the performance
> > > regression
> > > has been observed? How many hardware queues were created by that block
> > > driver
> > > (see also /sys/block/*/mq/...)?
> 
> Just confirming that we have only 1 queue:
> /sys/block/sdc/mq/0 as example

Hello John,

Can you also check the I/O scheduler that has been selected?

Thanks,

Bart.

Re: [PATCH v3 6/7] libsas: add wait-complete support to sync discovery event

2017-07-12 Thread John Garry

On 10/07/2017 08:06, Yijing Wang wrote:


 static void sas_chain_event(int event, unsigned long *pending,
@@ -592,9 +596,9 @@ int sas_discover_event(struct asd_sas_port *port, enum 
discover_event ev)
 {
struct sas_discovery *disc;

+   disc = >disc;
if (!port)
return 0;
-   disc = >disc;

BUG_ON(ev >= DISC_NUM_EVENTS);

diff --git a/drivers/scsi/libsas/sas_expander.c 
b/drivers/scsi/libsas/sas_expander.c
index 570b2cb..9d26c28 1


I was just looking through the code and I noticed this, above. Is there 
a specific reason to move the NULL check, or was it modified accidentally?


I mean, if port is NULL I don't think we would get as far as checking it 
as we would have already de-referenced it.





Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread Johannes Thumshirn
On Wed, Jul 12, 2017 at 10:59:27AM +0100, John Garry wrote:
> After:
> ...
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  446.193336] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is
> gone
> [  446.249205] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [  446.325201] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [  446.373189] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [  446.421187] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [  446.457232] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [  446.477151] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [  446.482373] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result:
> hostbyte=0x04 driverbyte=0x00
> [  446.491238] sd 0:0:1:0: [sdb] Stopping disk
> [  446.495419] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
> [  446.525227] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [  446.569249] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [  446.576872] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
> 
> root@(none)$
> 
> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.

This is awesome. I hope I have some time to review the patches themselves
soon.

Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-12 Thread John Garry

On 12/07/2017 09:47, wangyijing wrote:



On 2017/7/12 16:17, John Garry wrote:

On 12/07/2017 03:06, wangyijing wrote:

-unsigned long port_events_pending;
-unsigned long phy_events_pending;
+struct asd_sas_event   port_events[PORT_POOL_SIZE];
+struct asd_sas_event   phy_events[PHY_POOL_SIZE];

 int error;


Hi Yijing,

So now we are creating a static pool of events per PHY/port, instead of having 
1 static work struct per event per PHY/port. So, for sure, this avoids the 
dynamic event issue of system memory exhaustion which we discussed in v1+v2 
series. And it seems to possibly remove issue of losing SAS events.

But how did you determine the pool size for a PHY/port? It would seem to be 5 * 
#phy events or #port events (which is also 5, I figure by coincidence). How does 
this deal with flutter of >25 events?


There is no special meaning to the pool size; on a flutter of > 25 events, 
notifying sas events will return an error, and the further work depends on the 
LLDD drivers.
I hope libsas could do more work in this case, but for now it seems a little 
difficult; this patch may be an interim fix until we find a perfect solution.


The principle of having a fixed-size pool is ok, even though the pool size 
needs more consideration.

However my issue is how to handle pool exhaustion. For a start, relaying info to the LLDD 
that the event notification failed is probably not the way to go. I only now noticed 
"scsi: sas: scsi_queue_work can fail, so make callers aware" made it into the 
kernel; as I mentioned in response to this patch, the LLDD does not know how to handle 
this (and no LLDDs do actually handle this).

I would say it is better to shut down the PHY from libsas (As Dan mentioned in 
the v1 series) when the pool exhausts, under the assumption that the PHY has 
gone into some erroneous state. The user can later re-enable the PHY from 
sysfs, if required.


I considered this suggestion, and what I am worried about is: first, if we 
disable the phy once the sas event pool exhausts, it may hurt the processing of 
pending sas events which have already been queued,


I don't see how it affects currently queued events - they should just be 
processed normally. As for LLDD reporting events when the pool is 
exhausted, they are just lost.



second, if the phy is disabled and no one triggers the re-enable via sysfs, the LLDD 
has no way to post new sas phy events.


For the extreme scenario of pool becoming exhausted and PHY being 
disabled, it should remain disabled until user takes some action to fix 
originating problem.




Thanks!
Yijing.



Much appreciated,
John



Thanks!
Yijing.



Thanks,
John



Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread John Garry

On 10/07/2017 08:06, Yijing Wang wrote:

This patchset is based on Johannes's patch
"scsi: sas: scsi_queue_work can fail, so make callers aware"

Now libsas hotplug has some issues; Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. if the LLDD bursts lots of phy-up/phy-down sas events, some events
   may be lost because the same sas event is still pending; in the end the
   libsas topology may differ from the hardware.
2. on receiving a phy-down sas event, libsas calls sas_deform_port to remove
   devices; it first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the workqueue.
   Once the sas port is deleted, its children devices are deleted too,
   so when the destruction work starts, it finds the target device has
   already been removed and reports a sysfs warning.
3. since a hotplug process is divided into several works, if a phy-up
   sas event is inserted between the phy-down works, like
   destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   the hot-remove flow would be broken by the PORTE_BYTES_DMAED event; that's not
   what we expect, and issues would occur.

The first patch fixes the sas event loss, and the second one introduces
wait-complete
to fix the hotplug ordering issues.



I quickly tested this for basic hotplug.

Before:
root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  102.570694] sysfs group 'power' not found for kobject 
'0:0:7:0'

[  102.577250] [ cut here ]
[  102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 
sysfs_remove_group+0x8c/0x94

[  102.590110] Modules linked in:
[  102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 
4.12.0-rc1-00032-g3ab81fc #1907
[  102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 UEFI Nemo 1.7 RC3 06/23/2017

[  102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
[  102.615822] task: 8017d4793400 task.stack: 8017b7e7
[  102.621728] PC is at sysfs_remove_group+0x8c/0x94
[  102.626419] LR is at sysfs_remove_group+0x8c/0x94
[  102.631109] pc : [] lr : [] 
pstate: 6045

[  102.638490] sp : 8017b7e73b80
[  102.641791] x29: 8017b7e73b80 x28: 8017db010800
[  102.647091] x27: 08e27000 x26: 8017d43e6600
[  102.652390] x25: 8017b828 x24: 0003
[  102.657689] x23: 8017b78864b0 x22: 8017b784c988
[  102.662988] x21: 8017b7886410 x20: 08ee9dd0
[  102.668288] x19:  x18: 08a1b678
[  102.673587] x17: 000e x16: 0007
[  102.678886] x15:  x14: 00a3
[  102.684185] x13: 0033 x12: 0028
[  102.689484] x11: 08f3be58 x10: 
[  102.694783] x9 : 043c x8 : 6f6b20726f662064
[  102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
[  102.705382] x5 :  x4 : 
[  102.710681] x3 :  x2 : 08e427e0
[  102.715980] x1 :  x0 : 0033
[  102.721279] ---[ end trace c216cc1451d5f7ec ]---
[  102.725882] Call trace:
[  102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
[  102.734742] 39a0:    
0001
[  102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050 

[  102.750372] 39e0: 8017b78864b0 0003 8017b828 
8017d43e6600
[  102.758188] 3a00: 08e27000 8017db010800 8017d4793400 

[  102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 
ffc8
[  102.773818] 3a40: 8017b7e73a70 0810c12c 0033 

[  102.781633] 3a60: 08e427e0   

[  102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 
043c
[  102.797264] 3aa0:  08f3be58 0028 
0033
[  102.805079] 3ac0: 00a3  0007 
000e

[  102.812895] [] sysfs_remove_group+0x8c/0x94
[  102.818628] [] dpm_sysfs_remove+0x58/0x68
[  102.824188] [] device_del+0xf8/0x2d0
[  102.829312] [] device_unregister+0x14/0x2c
[  102.834959] [] bsg_unregister_queue+0x60/0x98
[  102.840866] [] __scsi_remove_device+0xa0/0xbc



[  151.331854] 3bc0: 081f21ac 803370c0
[  151.336718] [] 

Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-12 Thread wangyijing


On 2017/7/12 16:17, John Garry wrote:
> On 12/07/2017 03:06, wangyijing wrote:
 -unsigned long port_events_pending;
 -unsigned long phy_events_pending;
 +struct asd_sas_event   port_events[PORT_POOL_SIZE];
 +struct asd_sas_event   phy_events[PHY_POOL_SIZE];

  int error;
>>>
>>> Hi Yijing,
>>>
>>> So now we are creating a static pool of events per PHY/port, instead of 
>>> having 1 static work struct per event per PHY/port. So, for sure, this 
>>> avoids the dynamic event issue of system memory exhaustion which we 
>>> discussed in v1+v2 series. And it seems to possibly remove issue of losing 
>>> SAS events.
>>>
>>> But how did you determine the pool size for a PHY/port? It would seem to be 
>>> 5 * #phy events or #port events (which is also 5, I figure by coincidence). 
>>> How does this deal with flutter of >25 events?
>>
>> There is no special meaning to the pool size; on a flutter of > 25 events, 
>> notifying sas events will return an error, and the further work depends 
>> on the LLDD drivers.
>> I hope libsas could do more work in this case, but for now it seems a little 
>> difficult; this patch may be an interim fix until we find a perfect solution.
> 
> The principle of having a fixed-size pool is ok, even though the pool size 
> needs more consideration.
> 
> However my issue is how to handle pool exhaustion. For a start, relaying info 
> to the LLDD that the event notification failed is probably not the way to go. 
> I only now noticed "scsi: sas: scsi_queue_work can fail, so make callers 
> aware" made it into the kernel; as I mentioned in response to this patch, the 
> LLDD does not know how to handle this (and no LLDDs do actually handle this).
> 
> I would say it is better to shut down the PHY from libsas (As Dan mentioned 
> in the v1 series) when the pool exhausts, under the assumption that the PHY 
> has gone into some erroneous state. The user can later re-enable the PHY from 
> sysfs, if required.

I considered this suggestion, and what I am worried about is: first, if we 
disable the phy once the sas event pool exhausts, it may hurt the processing of 
pending sas events which have already been queued;
second, if the phy is disabled and no one triggers the re-enable via sysfs, the LLDD 
has no way to post new sas phy events.

Thanks!
Yijing.

> 
> Much appreciated,
> John
> 
>>
>> Thanks!
>> Yijing.
>>
>>>
>>> Thanks,
>>> John



Re: [PATCH] scsi: default to scsi-mq

2017-07-12 Thread John Garry

What block driver controls the block device for which the performance
regression has been observed? How many hardware queues were created by
that block driver (see also /sys/block/*/mq/...)?




Just confirming that we have only 1 queue:
/sys/block/sdc/mq/0 as example


Hi Bart,

Here's the shost init for our SCSI LLDD:
http://elixir.free-electrons.com/linux/latest/source/drivers/scsi/hisi_sas/hisi_sas_main.c#L1736


So we don't set nr_hw_queues (which would mean = 0), so this should set
shost->tag_set.nr_hw_queues to 1 in scsi_mq_setup_tags().

FWIW, I can confirm sysfs entry when I get hw access tomorrow.

John

I'm asking this because the number of hardware queues controls which
I/O scheduler is selected as default. From block/elevator.c:






if (q->mq_ops) {
        if (q->nr_hw_queues == 1)
                e = elevator_get("mq-deadline", false);
        if (!e)
                return 0;
} else
        e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);

Bart.




___
linuxarm mailing list
linux...@huawei.com
http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm

.






RE: [PATCH v2 11/15] megaraid_sas: Set device queue_depth same as HBA can_queue value in scsi-mq mode

2017-07-12 Thread Shivasharan Srikanteshwara
> -Original Message-
> From: Kashyap Desai [mailto:kashyap.de...@broadcom.com]
> Sent: Tuesday, July 11, 2017 9:18 PM
> To: Christoph Hellwig; Shivasharan Srikanteshwara
> Cc: linux-scsi@vger.kernel.org; martin.peter...@oracle.com;
> the...@redhat.com; j...@linux.vnet.ibm.com; Sumit Saxena; h...@suse.com
> Subject: RE: [PATCH v2 11/15] megaraid_sas: Set device queue_depth same as
> HBA can_queue value in scsi-mq mode
>
> > -Original Message-
> > From: Christoph Hellwig [mailto:h...@lst.de]
> > Sent: Tuesday, July 11, 2017 7:28 PM
> > To: Shivasharan S
> > Cc: linux-scsi@vger.kernel.org; martin.peter...@oracle.com;
> > the...@redhat.com; j...@linux.vnet.ibm.com;
> > kashyap.de...@broadcom.com; sumit.sax...@broadcom.com;
> h...@suse.com;
> > h...@lst.de
> > Subject: Re: [PATCH v2 11/15] megaraid_sas: Set device queue_depth
> > same
> as
> > HBA can_queue value in scsi-mq mode
> >
> > On Wed, Jul 05, 2017 at 05:00:25AM -0700, Shivasharan S wrote:
> > > Currently driver sets default queue_depth for VDs at 256 and JBODs
> > > based on interface type, ie., for SAS JBOD QD will be 64, for SATA
> JBOD QD
> > will be 32.
> > > During performance runs with scsi-mq enabled, we are seeing better
> > > results by setting QD same as HBA queue_depth.
> >
> > Please no scsi-mq specifics.  just do this unconditionally.
>
> Chris - the intent for the mq-specific check is mainly that sequential
> workloads on HDDs take a penalty due to an mq scheduler issue.
> We did this exercise prior to mq-deadline support.
>
> Making a generic change for non-mq and mq would be good, but some users
> may not like to see a regression.
> E.g., with QD = 32 for a SATA PD, file system creation may be faster
> compared to a large QD. There may be soft merging at the block layer due
> to queue depth throttling. Eventually, FS creation goes faster due to IO
> merges, but the same will not be true if we change the queue depth logic
> (i.e., increase the device queue depth to the HBA QD).
>
> We have choice to completely remove this patch and ask users to do sysfs
> settings in case of scsi-mq performance issue for HDD sequential work
> load.
> Having this patch, we want to provide better QD settings as default from
> driver.
>
>
> Thanks, Kashyap

Hi Christoph,
As Kashyap mentioned, the performance issues seen were specific to the
scsi-mq enabled case when running sequential workloads with HDDs.
Making this generic might result in regressions in some scenarios for
non-mq. That was the idea behind making the change specific to scsi-mq only.

Let us know if you are ok with having this as is, or we could remove this
patch completely and have users manually tune queue depth settings if they
see performance issues with scsi-mq enabled.

Thanks,
Shivasharan
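The trade-off Kashyap and Shivasharan describe (a small device queue depth throttles dispatch, letting sequential requests back-merge in the software queue before they reach the device) can be shown with a deliberately crude model. Nothing below is megaraid_sas or block-layer code; the names and the completion-free queue model are purely illustrative:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: while the device queue is full, new sequential requests
 * sit in the software queue where adjacent ones back-merge, so a small
 * device queue depth yields far fewer dispatched requests.  Completions
 * are ignored for simplicity; this is an illustration, not the block
 * layer.
 */
struct req { unsigned long start, len; };

static size_t submit_sequential(unsigned int queue_depth, size_t nr_reqs,
                                unsigned long len)
{
    size_t inflight = 0, dispatched = 0;
    struct req pending = { 0, 0 };

    for (size_t i = 0; i < nr_reqs; i++) {
        unsigned long start = i * len;

        if (inflight < queue_depth) {
            /* device has room: dispatched immediately, no merge chance */
            inflight++;
            dispatched++;
        } else if (pending.len && pending.start + pending.len == start) {
            pending.len += len;     /* back-merge in the software queue */
        } else {
            pending.start = start;  /* becomes a new pending request */
            pending.len = len;
            dispatched++;
        }
    }
    return dispatched;
}
```

Under this model, raising the device queue depth to the HBA value dispatches more requests immediately and leaves fewer opportunities to merge, which is the regression Kashyap is wary of for FS-creation style workloads.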


Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-12 Thread John Garry

On 12/07/2017 03:06, wangyijing wrote:

-unsigned long port_events_pending;
-unsigned long phy_events_pending;
+struct asd_sas_event   port_events[PORT_POOL_SIZE];
+struct asd_sas_event   phy_events[PHY_POOL_SIZE];

 int error;


Hi Yijing,

So now we are creating a static pool of events per PHY/port, instead of having 
1 static work struct per event per PHY/port. So, for sure, this avoids the 
dynamic event issue of system memory exhaustion which we discussed in v1+v2 
series. And it seems to possibly remove issue of losing SAS events.

But how did you determine the pool size for a PHY/port? It would seem to be 5 * 
#phy events or #port events (which is also 5, I figure by coincidence). How does 
this deal with flutter of >25 events?


There is no special meaning to the pool size. If there is a flutter of > 25
events, the sas event notify calls will return an error, and the further
handling depends on the LLDD drivers.
I hope libsas could do more work in this case, but for now that seems a
little difficult; this patch may be an interim fix until we find a perfect
solution.


The principle of having a fixed-size pool is ok, even though the pool
size needs more consideration.
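A minimal userspace sketch of the fixed-size pool being discussed, with the 5-slot size mirroring PORT_POOL_SIZE/PHY_POOL_SIZE from the patch (the locking, work structs, and event fields of the real libsas code are omitted, and the names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EVENT_POOL_SIZE 5   /* mirrors PORT_POOL_SIZE/PHY_POOL_SIZE */

/* Simplified stand-in for struct asd_sas_event: just an in-use flag
 * and an event type. */
struct sas_event {
    bool in_use;
    int type;
};

struct phy_event_pool {
    struct sas_event events[EVENT_POOL_SIZE];
};

/* Claim a free slot for a new event; returns NULL when the pool is
 * exhausted, which is exactly the case the thread is debating
 * (report an error to the LLDD vs. shut down the PHY). */
static struct sas_event *pool_get(struct phy_event_pool *pool, int type)
{
    for (size_t i = 0; i < EVENT_POOL_SIZE; i++) {
        if (!pool->events[i].in_use) {
            pool->events[i].in_use = true;
            pool->events[i].type = type;
            return &pool->events[i];
        }
    }
    return NULL;
}

/* Return a slot once the event's work function has run. */
static void pool_put(struct sas_event *ev)
{
    ev->in_use = false;
}
```

With a static array like this, memory use is bounded per PHY/port, but a flutter larger than the pool size still surfaces as a failed `pool_get()`, so the exhaustion policy remains the open question.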


However my issue is how to handle pool exhaustion. For a start, relaying 
info to the LLDD that the event notification failed is probably not the 
way to go. I only now noticed "scsi: sas: scsi_queue_work can fail, so 
make callers aware" made it into the kernel; as I mentioned in response 
to this patch, the LLDD does not know how to handle this (and no LLDDs 
do actually handle this).


I would say it is better to shut down the PHY from libsas (As Dan 
mentioned in the v1 series) when the pool exhausts, under the assumption 
that the PHY has gone into some erroneous state. The user can later 
re-enable the PHY from sysfs, if required.


Much appreciated,
John



Thanks!
Yijing.



Thanks,
John












Re: [PATCH] scsi: libfc: pass an error pointer to fc_disc_error()

2017-07-12 Thread Johannes Thumshirn
On Wed, Jul 12, 2017 at 10:30:22AM +0300, Dan Carpenter wrote:
> This patch is basically to silence a static checker warning.
> 
> drivers/scsi/libfc/fc_disc.c:326 fc_disc_error()
> warn: passing a valid pointer to 'PTR_ERR'
> 
> It doesn't affect runtime because it treats -ENOMEM and a valid pointer
> the same.  But the documentation says we should be passing an error
> pointer.

Acked-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH] tcmu: Fix possible to/from address overflow when doing the memcpy

2017-07-12 Thread lixiubo
From: Xiubo Li 

In most cases sg->length equals PAGE_SIZE, so this bug won't be
triggered. Otherwise it can crash the kernel, for example when all
segments' sg->length equal 1K.

Signed-off-by: Xiubo Li 
---
 drivers/target/target_core_user.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 8bf0823..9030c2a 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -590,8 +590,6 @@ static int scatter_data_area(struct tcmu_dev *udev,
block_remaining);
to_offset = get_block_offset_user(udev, dbi,
block_remaining);
-   offset = DATA_BLOCK_SIZE - block_remaining;
-   to += offset;
 
if (*iov_cnt != 0 &&
to_offset == iov_tail(*iov)) {
@@ -602,8 +600,10 @@ static int scatter_data_area(struct tcmu_dev *udev,
(*iov)->iov_len = copy_bytes;
}
if (copy_data) {
-   memcpy(to, from + sg->length - sg_remaining,
-   copy_bytes);
+   offset = DATA_BLOCK_SIZE - block_remaining;
+   memcpy(to + offset,
+  from + sg->length - sg_remaining,
+  copy_bytes);
tcmu_flush_dcache_range(to, copy_bytes);
}
sg_remaining -= copy_bytes;
@@ -664,9 +664,8 @@ static void gather_data_area(struct tcmu_dev *udev, struct 
tcmu_cmd *cmd,
copy_bytes = min_t(size_t, sg_remaining,
block_remaining);
offset = DATA_BLOCK_SIZE - block_remaining;
-   from += offset;
tcmu_flush_dcache_range(from, copy_bytes);
-   memcpy(to + sg->length - sg_remaining, from,
+   memcpy(to + sg->length - sg_remaining, from + offset,
copy_bytes);
 
sg_remaining -= copy_bytes;
-- 
1.8.3.1
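The corrected destination arithmetic can be illustrated with a userspace sketch: the offset into the data block is recomputed from `block_remaining` on every copy and applied to the block base, instead of advancing the base pointer once and then effectively adding the offset again on later fragments. `DATA_BLOCK_SIZE` and the fragment loop are modeled after the patch; the iovec and bitmap handling of the real driver are omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DATA_BLOCK_SIZE 4096    /* as in target_core_user.c */

/*
 * Copy source fragments into one data block.  The offset is derived
 * fresh from block_remaining for each fragment, so the destination
 * stays inside the block even when fragments are smaller than the
 * block (e.g. 1K sg entries, the crash case in the changelog).
 */
static size_t scatter_fragments(char *block, const char **frags,
                                const size_t *lens, size_t nr_frags)
{
    size_t block_remaining = DATA_BLOCK_SIZE;

    for (size_t i = 0; i < nr_frags && block_remaining; i++) {
        size_t copy_bytes = lens[i] < block_remaining ?
                            lens[i] : block_remaining;
        size_t offset = DATA_BLOCK_SIZE - block_remaining;

        memcpy(block + offset, frags[i], copy_bytes);
        block_remaining -= copy_bytes;
    }
    return DATA_BLOCK_SIZE - block_remaining;    /* bytes filled */
}
```

With PAGE_SIZE-sized fragments the first copy fills the whole block and the faulty accumulation never got a second iteration, which is why the bug went unnoticed in the common case.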





[PATCH] scsi: qedi: Fix return code in qedi_ep_connect()

2017-07-12 Thread Dan Carpenter
We shouldn't be writing over the "ret" variable.  It means we return
ERR_PTR(0) which is NULL and it results in a NULL dereference in the
caller.

Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver 
framework.")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 80edd28b635f..37da9a8b43b1 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -824,7 +824,7 @@ qedi_ep_connect(struct Scsi_Host *shost, struct sockaddr 
*dst_addr,
u32 iscsi_cid = QEDI_CID_RESERVED;
u16 len = 0;
char *buf = NULL;
-   int ret;
+   int ret, tmp;
 
if (!shost) {
ret = -ENXIO;
@@ -940,10 +940,10 @@ qedi_ep_connect(struct Scsi_Host *shost, struct sockaddr 
*dst_addr,
 
 ep_rel_conn:
qedi->ep_tbl[iscsi_cid] = NULL;
-   ret = qedi_ops->release_conn(qedi->cdev, qedi_ep->handle);
-   if (ret)
+   tmp = qedi_ops->release_conn(qedi->cdev, qedi_ep->handle);
+   if (tmp)
QEDI_WARN(&qedi->dbg_ctx, "release_conn returned %d\n",
- ret);
+ tmp);
 ep_free_sq:
qedi_free_sq(qedi, qedi_ep);
 ep_conn_exit:
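The pattern being fixed generalizes to any error path that runs cleanup calls. A hedged userspace sketch (the `fake_*` functions are hypothetical stand-ins, not the real qedi_ops):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the offload and cleanup calls. */
static int fake_offload_conn(int should_fail)
{
    return should_fail ? -EINVAL : 0;
}

static int fake_release_conn(void)
{
    return 0;   /* cleanup succeeds */
}

/*
 * Sketch of the fixed error path: the cleanup status goes into a
 * separate "tmp" so the original failure code survives.  Had we written
 * "ret = fake_release_conn()", a successful cleanup would overwrite the
 * error with 0, and the caller (which wraps the result in ERR_PTR())
 * would end up dereferencing ERR_PTR(0), i.e. NULL.
 */
static int ep_connect(int should_fail)
{
    int ret, tmp;

    ret = fake_offload_conn(should_fail);
    if (ret) {
        tmp = fake_release_conn();
        if (tmp)
            fprintf(stderr, "release_conn returned %d\n", tmp);
        return ret;     /* still the original failure, not 0 */
    }
    return 0;
}
```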


[PATCH] scsi: libfc: pass an error pointer to fc_disc_error()

2017-07-12 Thread Dan Carpenter
This patch is basically to silence a static checker warning.

drivers/scsi/libfc/fc_disc.c:326 fc_disc_error()
warn: passing a valid pointer to 'PTR_ERR'

It doesn't affect runtime because it treats -ENOMEM and a valid pointer
the same.  But the documentation says we should be passing an error
pointer.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/scsi/libfc/fc_disc.c b/drivers/scsi/libfc/fc_disc.c
index fd501f8dbb11..8660f923ace0 100644
--- a/drivers/scsi/libfc/fc_disc.c
+++ b/drivers/scsi/libfc/fc_disc.c
@@ -573,7 +573,7 @@ static void fc_disc_gpn_ft_resp(struct fc_seq *sp, struct 
fc_frame *fp,
event = DISC_EV_FAILED;
}
if (error)
-   fc_disc_error(disc, fp);
+   fc_disc_error(disc, ERR_PTR(error));
else if (event != DISC_EV_NONE)
fc_disc_done(disc, event);
fc_frame_free(fp);
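For reference, the error-pointer convention that makes `fc_disc_error(disc, ERR_PTR(error))` the correct call can be sketched in userspace. The helpers mirror the semantics of the kernel's include/linux/err.h, though the real ones operate on `unsigned long`:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* as in include/linux/err.h */

/* A small negative errno is folded into the top MAX_ERRNO bytes of the
 * address space, a range no valid pointer occupies, so one return value
 * can carry either a pointer or an error code. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

Passing a valid pointer such as `fp` to PTR_ERR() merely reinterprets the address as a number, which is why both the documentation and the static checker insist on `ERR_PTR(error)` at the call site.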


[PATCHv2] tcmu: Add fifo type waiter list support to avoid starvation

2017-07-12 Thread lixiubo
From: Xiubo Li 

The fifo type waiter list will hold the udevs that are waiting for
blocks from the global data pool. The unmap thread will try to feed the
udevs at the front of the waiter list; if the available global free
blocks are not enough, it will stop traversing the list and abort waking
up the others.

Signed-off-by: Xiubo Li 
---
 drivers/target/target_core_user.c | 104 --
 1 file changed, 88 insertions(+), 16 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 80ee130..8bf0823 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -107,6 +107,8 @@ struct tcmu_nl_cmd {
 struct tcmu_dev {
struct list_head node;
struct kref kref;
+   struct list_head waiter;
+
struct se_device se_dev;
 
char *name;
@@ -132,7 +134,7 @@ struct tcmu_dev {
wait_queue_head_t wait_cmdr;
struct mutex cmdr_lock;
 
-   bool waiting_global;
+   uint32_t waiting_blocks;
uint32_t dbi_max;
uint32_t dbi_thresh;
DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
@@ -176,9 +178,33 @@ struct tcmu_cmd {
 
 static struct task_struct *unmap_thread;
 static wait_queue_head_t unmap_wait;
+/*
+ * To avoid dead lock, the mutex locks order should always be:
+ *
+ * mutex_lock(_udev_mutex);
+ * ...
+ * mutex_lock(_dev->cmdr_lock);
+ * mutex_unlock(_dev->cmdr_lock);
+ * ...
+ * mutex_unlock(_udev_mutex);
+ */
 static DEFINE_MUTEX(root_udev_mutex);
 static LIST_HEAD(root_udev);
 
+/*
+ * To avoid deadlock, the mutex lock order should always be:
+ *
+ * mutex_lock(&root_udev_waiter_mutex);
+ * ...
+ * mutex_lock(&tcmu_dev->cmdr_lock);
+ * mutex_unlock(&tcmu_dev->cmdr_lock);
+ * ...
+ * mutex_unlock(&root_udev_waiter_mutex);
+ */
+static DEFINE_MUTEX(root_udev_waiter_mutex);
+/* The data blocks global pool waiter list */
+static LIST_HEAD(root_udev_waiter);
+
 static atomic_t global_db_count = ATOMIC_INIT(0);
 
 static struct kmem_cache *tcmu_cmd_cache;
@@ -321,6 +347,11 @@ static int tcmu_genl_set_features(struct sk_buff *skb, 
struct genl_info *info)
 #define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
 #define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
 
+static inline bool is_in_waiter_list(struct tcmu_dev *udev)
+{
+   return !!udev->waiting_blocks;
+}
+
 static void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd, uint32_t len)
 {
struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
@@ -377,8 +408,6 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
 {
int i;
 
-   udev->waiting_global = false;
-
for (i = tcmu_cmd->dbi_cur; i < tcmu_cmd->dbi_cnt; i++) {
if (!tcmu_get_empty_block(udev, tcmu_cmd))
goto err;
@@ -386,9 +415,7 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
return true;
 
 err:
-   udev->waiting_global = true;
-   /* Try to wake up the unmap thread */
-   wake_up(&unmap_wait);
+   udev->waiting_blocks += tcmu_cmd->dbi_cnt - i;
return false;
 }
 
@@ -795,12 +822,26 @@ static inline size_t tcmu_cmd_get_cmd_size(struct 
tcmu_cmd *tcmu_cmd,
 
while (!is_ring_space_avail(udev, tcmu_cmd, command_size, data_length)) 
{
int ret;
+   bool is_waiting_blocks = is_in_waiter_list(udev);
DEFINE_WAIT(__wait);
 
prepare_to_wait(&udev->wait_cmdr, &__wait, TASK_INTERRUPTIBLE);

pr_debug("sleeping for ring space\n");
mutex_unlock(&udev->cmdr_lock);
+
+   /*
+* Try to insert the current udev into the waiter list and
+* then wake up the unmap thread
+*/
+   if (is_waiting_blocks) {
+   mutex_lock(&root_udev_waiter_mutex);
+   list_add_tail(&udev->waiter, &root_udev_waiter);
+   mutex_unlock(&root_udev_waiter_mutex);
+
+   wake_up(&unmap_wait);
+   }
+
if (udev->cmd_time_out)
ret = schedule_timeout(
msecs_to_jiffies(udev->cmd_time_out));
@@ -1023,8 +1064,6 @@ static unsigned int tcmu_handle_completions(struct 
tcmu_dev *udev)
if (mb->cmd_tail == mb->cmd_head)
del_timer(&udev->timeout); /* no more pending cmds */

-   wake_up(&udev->wait_cmdr);
-
return handled;
 }
 
@@ -1121,7 +1160,17 @@ static int tcmu_irqcontrol(struct uio_info *info, s32 
irq_on)
struct tcmu_dev *tcmu_dev = container_of(info, struct tcmu_dev, 
uio_info);
 
mutex_lock(&tcmu_dev->cmdr_lock);
-   tcmu_handle_completions(tcmu_dev);
+   /*
+* If the current udev is also in the waiter list, this will
+* make sure that the other waiters in the list are fed ahead
+* of it.
+*/
+   if (is_in_waiter_list(tcmu_dev)) {
+   wake_up(&unmap_wait);
+  
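The FIFO wake-up policy described in the changelog can be sketched in userspace: walk the waiter list in order and stop at the first device whose block demand cannot be met, so devices queued earlier are never starved by later ones. The names are illustrative; the real driver uses list_head, mutexes, and wait queues:

```c
#include <assert.h>
#include <stddef.h>

/* One waiting device: how many data blocks it still needs, and whether
 * the feeder has woken it. */
struct waiter {
    unsigned int waiting_blocks;
    int woken;
    struct waiter *next;
};

/* Feed waiters in FIFO order; stop at the first one that cannot be
 * fully satisfied ("abort waking up the others" in the changelog).
 * Returns the number of waiters woken. */
static unsigned int feed_waiters(struct waiter *head, unsigned int free_blocks)
{
    unsigned int woken = 0;

    for (struct waiter *w = head; w; w = w->next) {
        if (w->waiting_blocks > free_blocks)
            break;                  /* front of queue keeps priority */
        free_blocks -= w->waiting_blocks;
        w->waiting_blocks = 0;
        w->woken = 1;
        woken++;
    }
    return woken;
}
```

Stopping at the first unsatisfiable waiter, rather than skipping it, is what gives the scheme its starvation-avoidance property: a large request at the head of the list is served before any smaller requests queued behind it.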

Re: [PATCH] hpsa: add support for legacy boards

2017-07-12 Thread Hannes Reinecke
On 07/12/2017 09:11 AM, Christoph Hellwig wrote:
> On Tue, Jul 11, 2017 at 06:58:36PM +0300, Meelis Roos wrote:
>>> The 5i controller is probably too old for the hpsa driver to support. 
>>> The hpsa driver is looking for information to determine if the drive is 
>>> online/offline and
>>> this information is not available.
>>>
>>> What was the original issue you were having with the cciss driver?
>>
>> Christoph Hellwig updated block layer with "block: Make most 
>> scsi_req_init() calls implicit" and at first try, cciss was left without 
>> the needed initialization. This caused OOPS in udev probing but the 
>> system worked. The issue was fixed by Christoph quickly.
> 
> The original patch is from Bart, but otherwise correct.
> 
>> But he suggested it might be worth trying hpsa driver instead of cciss, 
>> with a longer term goal to to move users of cciss over to hpsa if 
>> possible. Now that I have tested it, it seems not all older cards are 
>> supported in hpsa - it's more than ID-s and interrupt masks.
> 
> Seems like it.  And the idea behind this game is that we'd like to slowly
> get rid of old request_fn based drivers.  Given that hpsa supports very
> similar hardware cciss is a target for removal once that hardware is
> switched over.  But it seems like we'll need more work in this area..
> 
I'll give it a shot.
I seem to have an oldish cciss board floating about; let's see how far I
get with those.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] hpsa: add support for legacy boards

2017-07-12 Thread Christoph Hellwig
On Tue, Jul 11, 2017 at 06:58:36PM +0300, Meelis Roos wrote:
> > The 5i controller is probably too old for the hpsa driver to support. 
> > The hpsa driver is looking for information to determine if the drive is 
> > online/offline and
> > this information is not available.
> > 
> > What was the original issue you were having with the cciss driver?
> 
> Christoph Hellwig updated block layer with "block: Make most 
> scsi_req_init() calls implicit" and at first try, cciss was left without 
> the needed initialization. This caused OOPS in udev probing but the 
> system worked. The issue was fixed by Christoph quickly.

The original patch is from Bart, but otherwise correct.

> But he suggested it might be worth trying hpsa driver instead of cciss, 
> with a longer term goal to to move users of cciss over to hpsa if 
> possible. Now that I have tested it, it seems not all older cards are 
> supported in hpsa - it's more than ID-s and interrupt masks.

Seems like it.  And the idea behind this game is that we'd like to slowly
get rid of old request_fn based drivers.  Given that hpsa supports very
similar hardware cciss is a target for removal once that hardware is
switched over.  But it seems like we'll need more work in this area..

> 
> -- 
> Meelis Roos (mr...@linux.ee)
---end quoted text---