Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-08 Thread Wei Fang


On 2016/12/8 23:39, James Bottomley wrote:
> On Thu, 2016-12-08 at 11:22 +0800, Wei Fang wrote:
>> Hi, James, Ewan,
>>
>> On 2016/12/8 10:33, James Bottomley wrote:
>>> On Thu, 2016-12-08 at 10:28 +0800, Wei Fang wrote:
 Hi, James, Ewan,

 On 2016/12/8 7:43, James Bottomley wrote:
> On Wed, 2016-12-07 at 15:30 -0500, Ewan D. Milne wrote:
>> On Wed, 2016-12-07 at 12:09 -0800, James Bottomley wrote:
>>> Hm, it looks like the state set in scsi_sysfs_add_sdev() is
>>> bogus. 
>>>  We expect the state to have been properly set before that
>>> (in
>>> scsi_add_lun), so can we not simply remove it?
>>>
>>> James
>>>
>>
>> I was considering that, but...
>>
>> enum scsi_device_state {
>> SDEV_CREATED = 1,   /* device created but not
>> added
>> to
>> sysfs
>> 
>> 
>>  
>> 
>> 
>>   
>>  * Only internal commands
>> allowed
>> (for inq) */
>>
>> So it seems the intent was for the state to not change until
>> then.
>
> I think this is historical.  There was a change somewhere that
> moved
> the sysfs state handling out of the sdev stat to is_visible, so
> the
> sdev state no-longer reflects  it.
>
>> The call to set the SDEV_RUNNING state earlier in
>> scsi_add_lun()
>> was added with:
>>
>> commit 6f4267e3bd1211b3d09130e626b0b3d885077610
>> Author: James Bottomley <
>> james.bottom...@hansenpartnership.com>
>> Date:   Fri Aug 22 16:53:31 2008 -0500
>>
>> [SCSI] Update the SCSI state model to allow blocking in
>> the
>> created state
>>
>> Which allows the device to go into ->BLOCK (which is needed,
>> since it
>> actually happens).
>>
>> Should we remove the call from scsi_sysfs_add_sdev() and
>> change
>> the
>> comment in scsi_device.h to reflect the intent?

 This sounds reasonable.

> Assuming someone with the problem actually tests it, yes.

 This problem can be stably reproduced on Zengxi Chen's machine,
 who
 reported the bug. We can test it on this machine.

 The patch is as below, just for sure:

 diff --git a/drivers/scsi/scsi_sysfs.c
 b/drivers/scsi/scsi_sysfs.c
 index 0734927..82dfe07 100644
 --- a/drivers/scsi/scsi_sysfs.c
 +++ b/drivers/scsi/scsi_sysfs.c
 @@ -1204,10 +1204,6 @@ int scsi_sysfs_add_sdev(struct scsi_device
 *sdev)
 struct request_queue *rq = sdev->request_queue;
 struct scsi_target *starget = sdev->sdev_target;

 -   error = scsi_device_set_state(sdev, SDEV_RUNNING);
 -   if (error)
 -   return error;
 -
>>
>> I looked through those code and found that if we fix this bug
>> by removing setting the state in scsi_sysfs_add_sdev(), it
>> can't be fixed completely:
>>
>> scsi_device_set_state(sdev, SDEV_RUNNING) in scsi_add_lun() and
>> scsi_device_set_state(sdev, SDEV_CREATED_BLOCK) in
>> scsi_internal_device_block()
>> can be called simultaneously. Because there is no synchronization
>> between scsi_device_set_state(), those calls may both return
>> success, and the state may be SDEV_RUNNING after that, and the
>> device queue is stopped.
> 
> As Bart said, we've had this problem for a while, but the theoretical
> issue never really shows.  Unless it's suddenly exposed for you, lets
> go with the simple fix (if you confirm it works) and defer the far more
> complex issue of concurrent state changes.

OK, I'll test it and send patch v2.

Thanks,
Wei

>>> That's it, although not the second hunk: CREATED still means device
>>> not
>>> added to sysfs.  It's just that RUNNING now doesn't mean it is.
>>>
>>> James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-08 Thread James Bottomley
On Thu, 2016-12-08 at 11:22 +0800, Wei Fang wrote:
> Hi, James, Ewan,
> 
> On 2016/12/8 10:33, James Bottomley wrote:
> > On Thu, 2016-12-08 at 10:28 +0800, Wei Fang wrote:
> > > Hi, James, Ewan,
> > > 
> > > On 2016/12/8 7:43, James Bottomley wrote:
> > > > On Wed, 2016-12-07 at 15:30 -0500, Ewan D. Milne wrote:
> > > > > On Wed, 2016-12-07 at 12:09 -0800, James Bottomley wrote:
> > > > > > Hm, it looks like the state set in scsi_sysfs_add_sdev() is
> > > > > > bogus. 
> > > > > >  We expect the state to have been properly set before that
> > > > > > (in
> > > > > > scsi_add_lun), so can we not simply remove it?
> > > > > > 
> > > > > > James
> > > > > > 
> > > > > 
> > > > > I was considering that, but...
> > > > > 
> > > > > enum scsi_device_state {
> > > > > SDEV_CREATED = 1,   /* device created but not
> > > > > added
> > > > > to
> > > > > sysfs
> > > > > 
> > > > > 
> > > > >  
> > > > > 
> > > > > 
> > > > >   
> > > > >  * Only internal commands
> > > > > allowed
> > > > > (for inq) */
> > > > > 
> > > > > So it seems the intent was for the state to not change until
> > > > > then.
> > > > 
> > > > I think this is historical.  There was a change somewhere that
> > > > moved
> > > > the sysfs state handling out of the sdev stat to is_visible, so
> > > > the
> > > > sdev state no-longer reflects  it.
> > > > 
> > > > > The call to set the SDEV_RUNNING state earlier in
> > > > > scsi_add_lun()
> > > > > was added with:
> > > > > 
> > > > > commit 6f4267e3bd1211b3d09130e626b0b3d885077610
> > > > > Author: James Bottomley <
> > > > > james.bottom...@hansenpartnership.com>
> > > > > Date:   Fri Aug 22 16:53:31 2008 -0500
> > > > > 
> > > > > [SCSI] Update the SCSI state model to allow blocking in
> > > > > the
> > > > > created state
> > > > > 
> > > > > Which allows the device to go into ->BLOCK (which is needed,
> > > > > since it
> > > > > actually happens).
> > > > > 
> > > > > Should we remove the call from scsi_sysfs_add_sdev() and
> > > > > change
> > > > > the
> > > > > comment in scsi_device.h to reflect the intent?
> > > 
> > > This sounds reasonable.
> > > 
> > > > Assuming someone with the problem actually tests it, yes.
> > > 
> > > This problem can be stably reproduced on Zengxi Chen's machine,
> > > who
> > > reported the bug. We can test it on this machine.
> > > 
> > > The patch is as below, just for sure:
> > > 
> > > diff --git a/drivers/scsi/scsi_sysfs.c
> > > b/drivers/scsi/scsi_sysfs.c
> > > index 0734927..82dfe07 100644
> > > --- a/drivers/scsi/scsi_sysfs.c
> > > +++ b/drivers/scsi/scsi_sysfs.c
> > > @@ -1204,10 +1204,6 @@ int scsi_sysfs_add_sdev(struct scsi_device
> > > *sdev)
> > > struct request_queue *rq = sdev->request_queue;
> > > struct scsi_target *starget = sdev->sdev_target;
> > > 
> > > -   error = scsi_device_set_state(sdev, SDEV_RUNNING);
> > > -   if (error)
> > > -   return error;
> > > -
> 
> I looked through those code and found that if we fix this bug
> by removing setting the state in scsi_sysfs_add_sdev(), it
> can't be fixed completely:
> 
> scsi_device_set_state(sdev, SDEV_RUNNING) in scsi_add_lun() and
> scsi_device_set_state(sdev, SDEV_CREATED_BLOCK) in
> scsi_internal_device_block()
> can be called simultaneously. Because there is no synchronization
> between scsi_device_set_state(), those calls may both return
> success, and the state may be SDEV_RUNNING after that, and the
> device queue is stopped.

As Bart said, we've had this problem for a while, but the theoretical
issue never really shows.  Unless it's suddenly exposed for you, lets
go with the simple fix (if you confirm it works) and defer the far more
complex issue of concurrent state changes.

James


> Thanks,
> Wei
> 
> > That's it, although not the second hunk: CREATED still means device
> > not
> > added to sysfs.  It's just that RUNNING now doesn't mean it is.
> > 
> > James
> > 
> > 
> > 
> > .
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-08 Thread Ewan D. Milne
On Thu, 2016-12-08 at 14:38 +0800, Wei Fang wrote:
> Hi, James, Ewan, Bart,
> 
> On 2016/12/8 11:22, Wei Fang wrote:
> > I looked through those code and found that if we fix this bug
> > by removing setting the state in scsi_sysfs_add_sdev(), it
> > can't be fixed completely:
> > 
> > scsi_device_set_state(sdev, SDEV_RUNNING) in scsi_add_lun() and
> > scsi_device_set_state(sdev, SDEV_CREATED_BLOCK) in 
> > scsi_internal_device_block()
> > can be called simultaneously. Because there is no synchronization
> > between scsi_device_set_state(), those calls may both return
> > success, and the state may be SDEV_RUNNING after that, and the
> > device queue is stopped.
> 
> Can we fix it in this way:
> 
> Add a state lock to make sure the result of simultaneously calling
> of scsi_device_set_state() is not unpredictable.
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 253ee74..80cb493 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2457,10 +2457,16 @@ EXPORT_SYMBOL(scsi_test_unit_ready);
>  int
>  scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>  {
> -   enum scsi_device_state oldstate = sdev->sdev_state;
> +   enum scsi_device_state oldstate;
> +   unsigned long flags;
> +
> +   spin_lock_irqsave(>state_lock, flags);
> +   oldstate = sdev->sdev_state;
> 
> -   if (state == oldstate)
> +   if (state == oldstate) {
> +   spin_unlock_irqrestore(>state_lock, flags);
> return 0;
> +   }
> 
> switch (state) {
> case SDEV_CREATED:
> @@ -2558,9 +2564,11 @@ scsi_device_set_state(struct scsi_device *sdev, enum 
> scsi_device_state state)
> 
> }
> sdev->sdev_state = state;
> +   spin_unlock_irqrestore(>state_lock, flags);
> return 0;
> 
>   illegal:
> +   spin_unlock_irqrestore(>state_lock, flags);
> SCSI_LOG_ERROR_RECOVERY(1,
> sdev_printk(KERN_ERR, sdev,
> "Illegal state transition %s->%s",
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 6f7128f..ba2f38f 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -238,6 +238,7 @@ static struct scsi_device *scsi_alloc_sdev(struct 
> scsi_target *starget,
> INIT_LIST_HEAD(>starved_entry);
> INIT_LIST_HEAD(>event_list);
> spin_lock_init(>list_lock);
> +   spin_lock_init(>state_lock);
> mutex_init(>inquiry_mutex);
> INIT_WORK(>event_work, scsi_evt_thread);
> INIT_WORK(>requeue_work, scsi_requeue_run_queue);
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 0734927..82dfe07 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1204,10 +1204,6 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> struct request_queue *rq = sdev->request_queue;
> struct scsi_target *starget = sdev->sdev_target;
> 
> -   error = scsi_device_set_state(sdev, SDEV_RUNNING);
> -   if (error)
> -   return error;
> -
> error = scsi_target_add(starget);
> if (error)
> return error;
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 8990e58..e00764e 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -31,7 +31,7 @@ struct scsi_mode_data {
>  enum scsi_device_state {
> SDEV_CREATED = 1,   /* device created but not added to sysfs
>  * Only internal commands allowed (for inq) */
> -   SDEV_RUNNING,   /* device properly configured
> +   SDEV_RUNNING,   /* device properly initialized
>  * All commands allowed */
> SDEV_CANCEL,/* beginning to delete device
>  * Only error handler commands allowed */
> @@ -207,6 +207,7 @@ struct scsi_device {
> void*handler_data;
> 
> unsigned char   access_state;
> +   spinlock_t  state_lock;
> enum scsi_device_state sdev_state;
> unsigned long   sdev_data[0];
>  } __attribute__((aligned(sizeof(unsigned long;
> 
> Haven't tested yet. Sending this for your opinion.
> 
> Thanks,
> Wei
> 

You would presumably need to take your lock in scsi_internal_device_unblock()
as well, since it also checks and updates sdev_state directly.  There are also
places like scsi_device_resume() that examine the state before deciding to
call scsi_device_set_state().

-Ewan




--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Wei Fang
Hi, James, Ewan, Bart,

On 2016/12/8 11:22, Wei Fang wrote:
> I looked through those code and found that if we fix this bug
> by removing setting the state in scsi_sysfs_add_sdev(), it
> can't be fixed completely:
> 
> scsi_device_set_state(sdev, SDEV_RUNNING) in scsi_add_lun() and
> scsi_device_set_state(sdev, SDEV_CREATED_BLOCK) in 
> scsi_internal_device_block()
> can be called simultaneously. Because there is no synchronization
> between scsi_device_set_state(), those calls may both return
> success, and the state may be SDEV_RUNNING after that, and the
> device queue is stopped.

Can we fix it in this way:

Add a state lock to make sure the result of simultaneously calling
of scsi_device_set_state() is not unpredictable.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 253ee74..80cb493 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2457,10 +2457,16 @@ EXPORT_SYMBOL(scsi_test_unit_ready);
 int
 scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 {
-   enum scsi_device_state oldstate = sdev->sdev_state;
+   enum scsi_device_state oldstate;
+   unsigned long flags;
+
+   spin_lock_irqsave(>state_lock, flags);
+   oldstate = sdev->sdev_state;

-   if (state == oldstate)
+   if (state == oldstate) {
+   spin_unlock_irqrestore(>state_lock, flags);
return 0;
+   }

switch (state) {
case SDEV_CREATED:
@@ -2558,9 +2564,11 @@ scsi_device_set_state(struct scsi_device *sdev, enum 
scsi_device_state state)

}
sdev->sdev_state = state;
+   spin_unlock_irqrestore(>state_lock, flags);
return 0;

  illegal:
+   spin_unlock_irqrestore(>state_lock, flags);
SCSI_LOG_ERROR_RECOVERY(1,
sdev_printk(KERN_ERR, sdev,
"Illegal state transition %s->%s",
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 6f7128f..ba2f38f 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -238,6 +238,7 @@ static struct scsi_device *scsi_alloc_sdev(struct 
scsi_target *starget,
INIT_LIST_HEAD(>starved_entry);
INIT_LIST_HEAD(>event_list);
spin_lock_init(>list_lock);
+   spin_lock_init(>state_lock);
mutex_init(>inquiry_mutex);
INIT_WORK(>event_work, scsi_evt_thread);
INIT_WORK(>requeue_work, scsi_requeue_run_queue);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 0734927..82dfe07 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1204,10 +1204,6 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
struct request_queue *rq = sdev->request_queue;
struct scsi_target *starget = sdev->sdev_target;

-   error = scsi_device_set_state(sdev, SDEV_RUNNING);
-   if (error)
-   return error;
-
error = scsi_target_add(starget);
if (error)
return error;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 8990e58..e00764e 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -31,7 +31,7 @@ struct scsi_mode_data {
 enum scsi_device_state {
SDEV_CREATED = 1,   /* device created but not added to sysfs
 * Only internal commands allowed (for inq) */
-   SDEV_RUNNING,   /* device properly configured
+   SDEV_RUNNING,   /* device properly initialized
 * All commands allowed */
SDEV_CANCEL,/* beginning to delete device
 * Only error handler commands allowed */
@@ -207,6 +207,7 @@ struct scsi_device {
void*handler_data;

unsigned char   access_state;
+   spinlock_t  state_lock;
enum scsi_device_state sdev_state;
unsigned long   sdev_data[0];
 } __attribute__((aligned(sizeof(unsigned long;

Haven't tested yet. Sending this for your opinion.

Thanks,
Wei

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Wei Fang
Hi, James, Ewan,

On 2016/12/8 10:33, James Bottomley wrote:
> On Thu, 2016-12-08 at 10:28 +0800, Wei Fang wrote:
>> Hi, James, Ewan,
>>
>> On 2016/12/8 7:43, James Bottomley wrote:
>>> On Wed, 2016-12-07 at 15:30 -0500, Ewan D. Milne wrote:
 On Wed, 2016-12-07 at 12:09 -0800, James Bottomley wrote:
> Hm, it looks like the state set in scsi_sysfs_add_sdev() is
> bogus. 
>  We expect the state to have been properly set before that (in
> scsi_add_lun), so can we not simply remove it?
>
> James
>

 I was considering that, but...

 enum scsi_device_state {
 SDEV_CREATED = 1,   /* device created but not added
 to
 sysfs
 
  
 
   
  * Only internal commands allowed
 (for inq) */

 So it seems the intent was for the state to not change until
 then.
>>>
>>> I think this is historical.  There was a change somewhere that
>>> moved
>>> the sysfs state handling out of the sdev stat to is_visible, so the
>>> sdev state no-longer reflects  it.
>>>
 The call to set the SDEV_RUNNING state earlier in scsi_add_lun()
 was added with:

 commit 6f4267e3bd1211b3d09130e626b0b3d885077610
 Author: James Bottomley 
 Date:   Fri Aug 22 16:53:31 2008 -0500

 [SCSI] Update the SCSI state model to allow blocking in the
 created state

 Which allows the device to go into ->BLOCK (which is needed,
 since it
 actually happens).

 Should we remove the call from scsi_sysfs_add_sdev() and change
 the
 comment in scsi_device.h to reflect the intent?
>>
>> This sounds reasonable.
>>
>>> Assuming someone with the problem actually tests it, yes.
>>
>> This problem can be stably reproduced on Zengxi Chen's machine, who
>> reported the bug. We can test it on this machine.
>>
>> The patch is as below, just for sure:
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 0734927..82dfe07 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1204,10 +1204,6 @@ int scsi_sysfs_add_sdev(struct scsi_device
>> *sdev)
>> struct request_queue *rq = sdev->request_queue;
>> struct scsi_target *starget = sdev->sdev_target;
>>
>> -   error = scsi_device_set_state(sdev, SDEV_RUNNING);
>> -   if (error)
>> -   return error;
>> -

I looked through those code and found that if we fix this bug
by removing setting the state in scsi_sysfs_add_sdev(), it
can't be fixed completely:

scsi_device_set_state(sdev, SDEV_RUNNING) in scsi_add_lun() and
scsi_device_set_state(sdev, SDEV_CREATED_BLOCK) in scsi_internal_device_block()
can be called simultaneously. Because there is no synchronization
between scsi_device_set_state(), those calls may both return
success, and the state may be SDEV_RUNNING after that, and the
device queue is stopped.

Thanks,
Wei

> That's it, although not the second hunk: CREATED still means device not
> added to sysfs.  It's just that RUNNING now doesn't mean it is.
> 
> James
> 
> 
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread James Bottomley
On Thu, 2016-12-08 at 10:28 +0800, Wei Fang wrote:
> Hi, James, Ewan,
> 
> On 2016/12/8 7:43, James Bottomley wrote:
> > On Wed, 2016-12-07 at 15:30 -0500, Ewan D. Milne wrote:
> > > On Wed, 2016-12-07 at 12:09 -0800, James Bottomley wrote:
> > > > Hm, it looks like the state set in scsi_sysfs_add_sdev() is
> > > > bogus. 
> > > >  We expect the state to have been properly set before that (in
> > > > scsi_add_lun), so can we not simply remove it?
> > > > 
> > > > James
> > > > 
> > > 
> > > I was considering that, but...
> > > 
> > > enum scsi_device_state {
> > > SDEV_CREATED = 1,   /* device created but not added
> > > to
> > > sysfs
> > > 
> > >  
> > > 
> > >   
> > >  * Only internal commands allowed
> > > (for inq) */
> > > 
> > > So it seems the intent was for the state to not change until
> > > then.
> > 
> > I think this is historical.  There was a change somewhere that
> > moved
> > the sysfs state handling out of the sdev stat to is_visible, so the
> > sdev state no-longer reflects  it.
> > 
> > > The call to set the SDEV_RUNNING state earlier in scsi_add_lun()
> > > was added with:
> > > 
> > > commit 6f4267e3bd1211b3d09130e626b0b3d885077610
> > > Author: James Bottomley 
> > > Date:   Fri Aug 22 16:53:31 2008 -0500
> > > 
> > > [SCSI] Update the SCSI state model to allow blocking in the
> > > created state
> > > 
> > > Which allows the device to go into ->BLOCK (which is needed,
> > > since it
> > > actually happens).
> > > 
> > > Should we remove the call from scsi_sysfs_add_sdev() and change
> > > the
> > > comment in scsi_device.h to reflect the intent?
> 
> This sounds reasonable.
> 
> > Assuming someone with the problem actually tests it, yes.
> 
> This problem can be stably reproduced on Zengxi Chen's machine, who
> reported the bug. We can test it on this machine.
> 
> The patch is as below, just for sure:
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 0734927..82dfe07 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1204,10 +1204,6 @@ int scsi_sysfs_add_sdev(struct scsi_device
> *sdev)
> struct request_queue *rq = sdev->request_queue;
> struct scsi_target *starget = sdev->sdev_target;
> 
> -   error = scsi_device_set_state(sdev, SDEV_RUNNING);
> -   if (error)
> -   return error;
> -

That's it, although not the second hunk: CREATED still means device not
added to sysfs.  It's just that RUNNING now doesn't mean it is.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Wei Fang
Hi, James, Ewan,

On 2016/12/8 7:43, James Bottomley wrote:
> On Wed, 2016-12-07 at 15:30 -0500, Ewan D. Milne wrote:
>> On Wed, 2016-12-07 at 12:09 -0800, James Bottomley wrote:
>>> Hm, it looks like the state set in scsi_sysfs_add_sdev() is bogus. 
>>>  We expect the state to have been properly set before that (in
>>> scsi_add_lun), so can we not simply remove it?
>>>
>>> James
>>>
>>
>> I was considering that, but...
>>
>> enum scsi_device_state {
>> SDEV_CREATED = 1,   /* device created but not added to
>> sysfs
>>  
>>   
>>  * Only internal commands allowed
>> (for inq) */
>>
>> So it seems the intent was for the state to not change until then.
> 
> I think this is historical.  There was a change somewhere that moved
> the sysfs state handling out of the sdev stat to is_visible, so the
> sdev state no-longer reflects  it.
> 
>> The call to set the SDEV_RUNNING state earlier in scsi_add_lun()
>> was added with:
>>
>> commit 6f4267e3bd1211b3d09130e626b0b3d885077610
>> Author: James Bottomley 
>> Date:   Fri Aug 22 16:53:31 2008 -0500
>>
>> [SCSI] Update the SCSI state model to allow blocking in the
>> created state
>>
>> Which allows the device to go into ->BLOCK (which is needed, since it
>> actually happens).
>>
>> Should we remove the call from scsi_sysfs_add_sdev() and change the
>> comment in scsi_device.h to reflect the intent?

This sounds reasonable.

> Assuming someone with the problem actually tests it, yes.

This problem can be stably reproduced on Zengxi Chen's machine, who
reported the bug. We can test it on this machine.

The patch is as below, just for sure:

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 0734927..82dfe07 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1204,10 +1204,6 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
struct request_queue *rq = sdev->request_queue;
struct scsi_target *starget = sdev->sdev_target;

-   error = scsi_device_set_state(sdev, SDEV_RUNNING);
-   if (error)
-   return error;
-
error = scsi_target_add(starget);
if (error)
return error;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 8990e58..08636ea 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -29,7 +29,7 @@ struct scsi_mode_data {
  * scsi_lib:scsi_device_set_state().
  */
 enum scsi_device_state {
-   SDEV_CREATED = 1,   /* device created but not added to sysfs
+   SDEV_CREATED = 1,   /* device created
 * Only internal commands allowed (for inq) */
SDEV_RUNNING,   /* device properly configured
 * All commands allowed */

Thanks,
Wei

>> I have not verified the async vs. non-async scan path yet but it 
>> looks like it would be OK.
> 
> I did.  The async device addition occurs after scsi_add_lun(), so it
> rules the state change in both cases.
> 
> James
> 
> 
>> -Ewan
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
>> in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread James Bottomley
On Wed, 2016-12-07 at 15:30 -0500, Ewan D. Milne wrote:
> On Wed, 2016-12-07 at 12:09 -0800, James Bottomley wrote:
> > Hm, it looks like the state set in scsi_sysfs_add_sdev() is bogus. 
> >  We expect the state to have been properly set before that (in
> > scsi_add_lun), so can we not simply remove it?
> > 
> > James
> > 
> 
> I was considering that, but...
> 
> enum scsi_device_state {
> SDEV_CREATED = 1,   /* device created but not added to
> sysfs
>  
>   
>  * Only internal commands allowed
> (for inq) */
> 
> So it seems the intent was for the state to not change until then.

I think this is historical.  There was a change somewhere that moved
the sysfs state handling out of the sdev stat to is_visible, so the
sdev state no-longer reflects  it.

> The call to set the SDEV_RUNNING state earlier in scsi_add_lun()
> was added with:
> 
> commit 6f4267e3bd1211b3d09130e626b0b3d885077610
> Author: James Bottomley 
> Date:   Fri Aug 22 16:53:31 2008 -0500
> 
> [SCSI] Update the SCSI state model to allow blocking in the
> created state
> 
> Which allows the device to go into ->BLOCK (which is needed, since it
> actually happens).
> 
> Should we remove the call from scsi_sysfs_add_sdev() and change the
> comment in scsi_device.h to reflect the intent?

Assuming someone with the problem actually tests it, yes.

> I have not verified the async vs. non-async scan path yet but it 
> looks like it would be OK.

I did.  The async device addition occurs after scsi_add_lun(), so it
rules the state change in both cases.

James


> -Ewan
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Ewan D. Milne
On Wed, 2016-12-07 at 12:09 -0800, James Bottomley wrote:
> Hm, it looks like the state set in scsi_sysfs_add_sdev() is bogus.  We
> expect the state to have been properly set before that (in
> scsi_add_lun), so can we not simply remove it?
> 
> James
> 

I was considering that, but...

enum scsi_device_state {
SDEV_CREATED = 1,   /* device created but not added to sysfs

   
 * Only internal commands allowed (for inq) */

So it seems the intent was for the state to not change until then.

The call to set the SDEV_RUNNING state earlier in scsi_add_lun()
was added with:

commit 6f4267e3bd1211b3d09130e626b0b3d885077610
Author: James Bottomley 
Date:   Fri Aug 22 16:53:31 2008 -0500

[SCSI] Update the SCSI state model to allow blocking in the created state

Which allows the device to go into ->BLOCK (which is needed, since it
actually happens).

Should we remove the call from scsi_sysfs_add_sdev() and change the
comment in scsi_device.h to reflect the intent?

I have not verified the async vs. non-async scan path yet but it looks
like it would be OK.

-Ewan


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread James Bottomley
On Wed, 2016-12-07 at 14:24 -0500, Ewan D. Milne wrote:
> On Wed, 2016-12-07 at 10:16 -0800, James Bottomley wrote:
> > On Wed, 2016-12-07 at 12:40 -0500, Ewan D. Milne wrote:
> > > On Wed, 2016-12-07 at 08:55 -0800, Bart Van Assche wrote:
> > > > On 12/07/2016 08:48 AM, Bart Van Assche wrote:
> > > > > It's a known bug. Some time ago I posted a patch that 
> > > > > serializes all scsi_device_set_state() calls but I have not 
> > > > > yet found it in the list archives. However, that patch has 
> > > > > not yet been merged.
> > > > 
> > > > See also https://www.spinics.net/lists/linux-scsi/msg66966.html
> > > > .
> > > > 
> > > > Bart.
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux
> > > > -scsi" in
> > > > the body of a message to majord...@vger.kernel.org
> > > > More majordomo info at  
> > > > http://vger.kernel.org/majordomo-info.html
> > > 
> > > Yes, however that patch does not fix Wei Fang's issue.  In fact I
> > > just received a crash dump that appears to be the same thing.  It
> > > looks like the rport went away right after the initial INQUIRY, 
> > > so we set the state to SDEV_BLOCK and stop the queue, and then 
> > > the scan code continues and sets the state back to SDEV_RUNNING.
> > 
> > So here's the violation of the state model.  the rport went CREATED
> > ->BLOCK which is wrong: it should go CREATED->CREATED_BLOCK and 
> > then the add code would set it to BLOCK instead of RUNNING.
> > 
> > The question to diagnose is why CREATED->BLOCK worked.
> > 
> > James
> > 
> 
> I believe scsi_add_lun() changed the state from CREATED->RUNNING 
> which allowed the state to change from RUNNING->BLOCK, and then
> scsi_sysfs_add_sdev() called scsi_device_set_state() which changed
> the state from BLOCK->RUNNING.  But did not restart the queue.
> 
> I have a debug kernel out to the site that found this to make sure,
> assuming they can reproduce this, but I don't see any other way it 
> could have happened.

Hm, it looks like the state set in scsi_sysfs_add_sdev() is bogus.  We
expect the state to have been properly set before that (in
scsi_add_lun), so can we not simply remove it?

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Ewan D. Milne
On Wed, 2016-12-07 at 10:16 -0800, James Bottomley wrote:
> On Wed, 2016-12-07 at 12:40 -0500, Ewan D. Milne wrote:
> > On Wed, 2016-12-07 at 08:55 -0800, Bart Van Assche wrote:
> > > On 12/07/2016 08:48 AM, Bart Van Assche wrote:
> > > > It's a known bug. Some time ago I posted a patch that serializes 
> > > > all scsi_device_set_state() calls but I have not yet found it in 
> > > > the list archives. However, that patch has not yet been merged.
> > > 
> > > See also https://www.spinics.net/lists/linux-scsi/msg66966.html.
> > > 
> > > Bart.
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux
> > > -scsi" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > Yes, however that patch does not fix Wei Fang's issue.  In fact I 
> > just received a crash dump that appears to be the same thing.  It 
> > looks like the rport went away right after the initial INQUIRY, so we 
> > set the state to SDEV_BLOCK and stop the queue, and then the scan 
> > code continues and sets the state back to SDEV_RUNNING.
> 
> So here's the violation of the state model.  the rport went CREATED
> ->BLOCK which is wrong: it should go CREATED->CREATED_BLOCK and then
> the add code would set it to BLOCK instead of RUNNING.
> 
> The question to diagnose is why CREATED->BLOCK worked.
> 
> James
> 

I believe scsi_add_lun() changed the state from CREATED->RUNNING which
allowed the state to change from RUNNING->BLOCK, and then
scsi_sysfs_add_sdev() called scsi_device_set_state() which changed
the state from BLOCK->RUNNING.  But did not restart the queue.

I have a debug kernel out to the site that found this to make sure,
assuming they can reproduce this, but I don't see any other way it could
have happened.

-Ewan



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread James Bottomley
On Wed, 2016-12-07 at 12:40 -0500, Ewan D. Milne wrote:
> On Wed, 2016-12-07 at 08:55 -0800, Bart Van Assche wrote:
> > On 12/07/2016 08:48 AM, Bart Van Assche wrote:
> > > It's a known bug. Some time ago I posted a patch that serializes 
> > > all scsi_device_set_state() calls but I have not yet found it in 
> > > the list archives. However, that patch has not yet been merged.
> > 
> > See also https://www.spinics.net/lists/linux-scsi/msg66966.html.
> > 
> > Bart.
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux
> > -scsi" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Yes, however that patch does not fix Wei Fang's issue.  In fact I 
> just received a crash dump that appears to be the same thing.  It 
> looks like the rport went away right after the initial INQUIRY, so we 
> set the state to SDEV_BLOCK and stop the queue, and then the scan 
> code continues and sets the state back to SDEV_RUNNING.

So here's the violation of the state model.  the rport went CREATED
->BLOCK which is wrong: it should go CREATED->CREATED_BLOCK and then
the add code would set it to BLOCK instead of RUNNING.

The question to diagnose is why CREATED->BLOCK worked.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Ewan D. Milne
On Wed, 2016-12-07 at 08:55 -0800, Bart Van Assche wrote:
> On 12/07/2016 08:48 AM, Bart Van Assche wrote:
> > It's a known bug. Some time ago I posted a patch that serializes all
> > scsi_device_set_state() calls but I have not yet found it in the list
> > archives. However, that patch has not yet been merged.
> 
> See also https://www.spinics.net/lists/linux-scsi/msg66966.html.
> 
> Bart.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Yes, however that patch does not fix Wei Fang's issue.  In fact I just
received a crash dump that appears to be the same thing.  It looks like
the rport went away right after the initial INQUIRY, so we set the state
to SDEV_BLOCK and stop the queue, and then the scan code continues and
sets the state back to SDEV_RUNNING.  Then, when the devloss timer
expires, we call scsi_target_unblock w/SDEV_TRANSPORT_OFFLINE, but the
SDEV_RUNNING state prevents the queue from being restarted, so a
subsequent command (i.e. the ALUA page 83 inquiry command) is stuck on
the stopped queue.  (The dump shows 3 devices on the target with queues
running in SDEV_TRANSPORT_OFFLINE, and 1 device currently being scanned
with the queue stopped in SDEV_RUNNING.)

It seems to me the problem is that scsi_device_set_state() is allowing
the caller to transition SDEV_BLOCK -> SDEV_RUNNING without actually
restarting the queue and that should be an illegal transition.

-Ewan


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Bart Van Assche

On 12/06/2016 11:03 PM, Wei Fang wrote:

On 2016/12/7 12:40, Bart Van Assche wrote:

I am aware that commit 5c10e63c943b caused the behavior change. But that
doesn't mean that a fix has to undo the changes introduced by that
commit. We do not only want to make sure that the SCSI core works as
intended but also that the SCSI core code is as easy to comprehend as
reasonably possible. Adding "&& sdev->sdev_state != SDEV_RUNNING" in
scsi_internal_device_unblock() would require a long comment to explain
why that code has been added. I think modifying scsi_sysfs_add_sdev()
such that it does not unblock devices will result in code that is easier
to understand.


Agree that we should make the code easier to comprehend if possible :)

If we modify scsi_sysfs_add_sdev() as below:
...
if (scsi_device_created(sdev))
error = scsi_device_set_state(sdev, SDEV_RUNNING);
if (error)
error = scsi_device_set_state(sdev, SDEV_BLOCK);
...
there's a chance that the state will be changed to SDEV_RUNNING.


That's not what I meant. What I meant is to test the sdev state in
scsi_sysfs_add_sdev() before scsi_device_set_state() is called.


Actually I don't know quite well about the synchronization of
scsi_device_set_state(). There are so many cases it can be called
simultaneously, will the state become a unpredictable value, or this
is tolerated?


It's a known bug. Some time ago I posted a patch that serializes all 
scsi_device_set_state() calls but I have not yet found it in the list 
archives. However, that patch has not yet been merged.


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-07 Thread Bart Van Assche

On 12/07/2016 08:48 AM, Bart Van Assche wrote:

It's a known bug. Some time ago I posted a patch that serializes all
scsi_device_set_state() calls but I have not yet found it in the list
archives. However, that patch has not yet been merged.


See also https://www.spinics.net/lists/linux-scsi/msg66966.html.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-06 Thread Wei Fang
Hi, Bart,

On 2016/12/7 12:40, Bart Van Assche wrote:
> I am aware that commit 5c10e63c943b caused the behavior change. But that 
> doesn't mean that a fix has to undo the changes introduced by that 
> commit. We do not only want to make sure that the SCSI core works as 
> intended but also that the SCSI core code is as easy to comprehend as 
> reasonably possible. Adding "&& sdev->sdev_state != SDEV_RUNNING" in 
> scsi_internal_device_unblock() would require a long comment to explain 
> why that code has been added. I think modifying scsi_sysfs_add_sdev() 
> such that it does not unblock devices will result in code that is easier 
> to understand.

Agree that we should make the code easier to comprehend if possible :)

If we modify scsi_sysfs_add_sdev() as below:
...
if (scsi_device_created(sdev))
error = scsi_device_set_state(sdev, SDEV_RUNNING);
if (error)
error = scsi_device_set_state(sdev, SDEV_BLOCK);
...
there's a chance that the state will be changed to SDEV_RUNNING.

If a SCSI device is blocked after the check of the device's creating
and before being changed to SDEV_RUNNING state, the state will still
become SDEV_RUNNING. If we fix this problem in this way, we need
introduce a way to synchronize those code.

Actually I don't know quite well about the synchronization of
scsi_device_set_state(). There are so many cases it can be called
simultaneously, will the state become a unpredictable value, or this
is tolerated?

Thanks,
Wei

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-06 Thread Bart Van Assche
On 12/06/16 19:41, Wei Fang wrote:
> On 2016/12/7 10:45, Bart Van Assche wrote:
>> The purpose of the scsi_device_set_state(sdev, SDEV_RUNNING) call in
>> scsi_sysfs_add_sdev() is to change the device state from SDEV_CREATED
>> into SDEV_RUNNING. Have you tried to modify scsi_sysfs_add_sdev() such
>> that it only changes the device state into SDEV_RUNNING if the current
>> state is SDEV_CREATED and also such that it changes SDEV_CREATED_BLOCK
>> into SDEV_BLOCK?
>
> Does those code in scsi_add_lun() have done this thing?
> ...
> ret = scsi_device_set_state(sdev, SDEV_RUNNING);
> if (ret) {
> ret = scsi_device_set_state(sdev, SDEV_BLOCK);
> ...
> }
> ...
> You mean we shouldn't change the device state from SDEV_BLOCK
> into SDEV_RUNNING in scsi_sysfs_add_sdev()?
>
> I thought it doesn't matter that the state is changed from SDEV_BLOCK
> to SDEV_RUNNING in scsi_sysfs_add_sdev(), if the queue can be unblocked
> in scsi_internal_device_unblock() in SDEV_RUNNING state. But it
> was broken since commit 5c10e63c943b.

Hello Wei,

I am aware that commit 5c10e63c943b caused the behavior change. But that 
doesn't mean that a fix has to undo the changes introduced by that 
commit. We do not only want to make sure that the SCSI core works as 
intended but also that the SCSI core code is as easy to comprehend as 
reasonably possible. Adding "&& sdev->sdev_state != SDEV_RUNNING" in 
scsi_internal_device_unblock() would require a long comment to explain 
why that code has been added. I think modifying scsi_sysfs_add_sdev() 
such that it does not unblock devices will result in code that is easier 
to understand.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-06 Thread Wei Fang
Hi, Bart,

On 2016/12/7 10:45, Bart Van Assche wrote:
> On 12/06/16 17:21, Wei Fang wrote:
>> The state of the scsi device first is changed to SDEV_BLOCK in
>> scsi_add_lun() as you mentioned, then it will be changed to SDEV_RUNNING
>> in scsi_sysfs_add_sdev().
> 
> Hello Wei,
> 
> The purpose of the scsi_device_set_state(sdev, SDEV_RUNNING) call in 
> scsi_sysfs_add_sdev() is to change the device state from SDEV_CREATED 
> into SDEV_RUNNING. Have you tried to modify scsi_sysfs_add_sdev() such 
> that it only changes the device state into SDEV_RUNNING if the current 
> state is SDEV_CREATED and also such that it changes SDEV_CREATED_BLOCK 
> into SDEV_BLOCK?

Does those code in scsi_add_lun() have done this thing?
...
ret = scsi_device_set_state(sdev, SDEV_RUNNING);
if (ret) {
ret = scsi_device_set_state(sdev, SDEV_BLOCK);
...
}
...
You mean we shouldn't change the device state from SDEV_BLOCK
into SDEV_RUNNING in scsi_sysfs_add_sdev()?

I thought it doesn't matter that the state is changed from SDEV_BLOCK
to SDEV_RUNNING in scsi_sysfs_add_sdev(), if the queue can be unblocked
in scsi_internal_device_unblock() in SDEV_RUNNING state. But it
was broken since commit 5c10e63c943b.

Thanks,
Wei

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-06 Thread Bart Van Assche
On 12/06/16 17:21, Wei Fang wrote:
> The state of the scsi device first is changed to SDEV_BLOCK in
> scsi_add_lun() as you mentioned, then it will be changed to SDEV_RUNNING
> in scsi_sysfs_add_sdev().

Hello Wei,

The purpose of the scsi_device_set_state(sdev, SDEV_RUNNING) call in 
scsi_sysfs_add_sdev() is to change the device state from SDEV_CREATED 
into SDEV_RUNNING. Have you tried to modify scsi_sysfs_add_sdev() such 
that it only changes the device state into SDEV_RUNNING if the current 
state is SDEV_CREATED and also such that it changes SDEV_CREATED_BLOCK 
into SDEV_BLOCK?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-06 Thread Wei Fang
Hi, Bart,

On 2016/12/6 23:51, Bart Van Assche wrote:
> On 12/06/16 01:12, Wei Fang wrote:
>> The scsi device is being setted to the SDEV_RUNNING state at the end of
>> the scan work. When the remote port reappears, scsi_target_unblock()
>> will be called, but the QUEUE_FLAG_STOPPED flag will not be cleared,
>> since scsi_internal_device_unblock() ignores SCSI devices in SDEV_RUNNING
>> state. It results in a permanent stop of the scsi device's request
>> queue. Every requests sended to it will be blocked.
> 
> Hello Wei,
> 
> scsi_device_set_state() does not allow the transition from 
> SDEV_CREATED_BLOCK to SDEV_RUNNING. If a SCSI device is blocked after it 
> has been added to the __devices list and before scsi_add_lun() finishes 
> then I think the scan code will change its state into SDEV_BLOCK. Are 
> you sure what you described above is what happened?

Yes, we already encountered this case on out machine:

The state of the scsi device first is changed to SDEV_BLOCK in
scsi_add_lun() as you metioned, then it will be changed to SDEV_RUNNING
in scsi_sysfs_add_sdev().

Thanks,
Wei

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-06 Thread Bart Van Assche
On 12/06/16 01:12, Wei Fang wrote:
> The scsi device is being setted to the SDEV_RUNNING state at the end of
> the scan work. When the remote port reappears, scsi_target_unblock()
> will be called, but the QUEUE_FLAG_STOPPED flag will not be cleared,
> since scsi_internal_device_unblock() ignores SCSI devices in SDEV_RUNNING
> state. It results in a permanent stop of the scsi device's request
> queue. Every requests sended to it will be blocked.

Hello Wei,

scsi_device_set_state() does not allow the transition from 
SDEV_CREATED_BLOCK to SDEV_RUNNING. If a SCSI device is blocked after it 
has been added to the __devices list and before scsi_add_lun() finishes 
then I think the scan code will change its state into SDEV_BLOCK. Are 
you sure what you described above is what happened?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] scsi: avoid a permanent stop of the scsi device's request queue

2016-12-06 Thread Wei Fang
A scan work can run simultaneously with fc_remote_port_delete().
If a scsi device is added to the ->__devices list in the scan work,
it can be touched and will be blocked in scsi_target_block(), which
will be called in fc_remote_port_delete(), and QUEUE_FLAG_STOPPED
flag will be setted to the scsi device's request queue.

The scsi device is being setted to the SDEV_RUNNING state at the end of
the scan work. When the remote port reappears, scsi_target_unblock()
will be called, but the QUEUE_FLAG_STOPPED flag will not be cleared,
since scsi_internal_device_unblock() ignores SCSI devices in SDEV_RUNNING
state. It results in a permanent stop of the scsi device's request
queue. Every requests sended to it will be blocked.

It looks like it's a regression caused by:
commit 5c10e63c943b
("[SCSI] limit state transitions in scsi_internal_device_unblock")

Fix this by restarting the device queue if the state is SDEV_RUNNING.

Reported-by: Zengxi Chen 
Signed-off-by: Wei Fang 
---
 drivers/scsi/scsi_lib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9ca1f17..253ee74 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2942,7 +2942,8 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
else
sdev->sdev_state = SDEV_CREATED;
} else if (sdev->sdev_state != SDEV_CANCEL &&
-sdev->sdev_state != SDEV_OFFLINE)
+sdev->sdev_state != SDEV_OFFLINE &&
+sdev->sdev_state != SDEV_RUNNING)
return -EINVAL;
 
if (q->mq_ops) {
-- 
2.4.11

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html