Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-14 Thread wangyijing
Hi, I'm sorry to say that I have to stop the libsas hotplug improvement work. I
will resign from Huawei, so I no longer have the time or hardware to continue
working on this issue. John is very familiar with this work and has provided a
lot of good suggestions, so if John likes, I would be glad for him to take over
this work. My colleague Jason Yan could also help.


Thanks!
Yijing.


On 2017/7/10 15:06, Yijing Wang wrote:
> This patchset is based on Johannes's patch
> "scsi: sas: scsi_queue_work can fail, so make callers aware"
> 
> The libsas hotplug code currently has some issues; Dan Williams reported
> a similar bug here before:
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
> 
> The issues we have found:
> 1. If the LLDD reports a burst of phy-up/phy-down SAS events, some events
>    may be lost because an identical SAS event is already pending; in the
>    end the libsas topology may differ from the hardware.
> 2. On receiving a phy-down SAS event, libsas calls sas_deform_port to
>    remove devices: it first deletes the SAS port, then puts a destruction
>    discovery event in a new work and queues it at the tail of the
>    workqueue. Once the SAS port is deleted, its child devices are deleted
>    too, so when the destruction work runs it finds the target device has
>    already been removed, and reports a sysfs warning.
> 3. Since a hotplug process is divided into several works, a phy-up SAS
>    event can be interleaved into the phy-down works, like
>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>    so the hot-removal flow is broken by the PORTE_BYTES_DMAED event, which
>    is not what we expect, and issues occur.
> 
> The first patch fixes the lost SAS events, and the second one introduces
> wait-complete support to fix the hotplug ordering issues.
> 
> v2->v3: some code improvements suggested by Johannes and John,
>   split v2 patch 2 into several small patches.
> v1->v2: some code improvements suggested by John Garry
> 
> Yijing Wang (7):
>   libsas: Use static sas event pool to appease sas event lost
>   libsas: remove unused port_gone_completion
>   libsas: Use new workqueue to run sas event
>   libsas: add sas event wait-complete support
>   libsas: add a new workqueue to run probe/destruct discovery event
>   libsas: add wait-complete support to sync discovery event
>   libsas: release disco mutex during waiting in sas_ex_discover_end_dev
> 
>  drivers/scsi/libsas/sas_discover.c |  58 +++---
>  drivers/scsi/libsas/sas_event.c    | 212 -
>  drivers/scsi/libsas/sas_expander.c |  22 +++-
>  drivers/scsi/libsas/sas_init.c |  21 ++--
>  drivers/scsi/libsas/sas_internal.h |  64 +++
>  drivers/scsi/libsas/sas_phy.c  |  48 +++--
>  drivers/scsi/libsas/sas_port.c |  22 ++--
>  include/scsi/libsas.h  |  27 +++--
>  8 files changed, 373 insertions(+), 101 deletions(-)
> 
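Issue 1 above — a single "pending" bit per event type collapsing a burst of notifications into one queued work — can be modeled in a few lines of userspace C. This is a sketch with invented names, not libsas code:

```c
/* Model of the old per-event pending bit: a second notification arriving
 * while the first is still pending is silently discarded, so the final
 * libsas state can miss the hardware's last transition. */
#include <stdbool.h>

static bool phyup_pending; /* models the old phy_events_pending bit */
static int works_queued;   /* notifications that actually reached the queue */

static bool notify_phy_up(void)
{
	if (phyup_pending)     /* the same event is already queued... */
		return false;  /* ...so this notification is dropped */
	phyup_pending = true;
	works_queued++;
	return true;
}

static void run_queued_work(void)
{
	phyup_pending = false; /* the work ran and cleared the bit */
}
```

Three phy-up notifications in a burst queue only one work; any transition between them is lost, which is exactly why the topology can end up differing from the hardware.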



Re: [PATCH v3 4/7] libsas: add sas event wait-complete support

2017-07-14 Thread wangyijing


On 2017/7/14 14:51, Hannes Reinecke wrote:
> On 07/10/2017 09:06 AM, Yijing Wang wrote:
>> Introduce wait-complete support for libsas SAS event processing, so that
>> SAS port creation/destruction can be executed synchronously.
>>
>> Signed-off-by: Yijing Wang 
>> CC: John Garry 
>> CC: Johannes Thumshirn 
>> CC: Ewan Milne 
>> CC: Christoph Hellwig 
>> CC: Tomas Henzl 
>> CC: Dan Williams 
>> ---
>>  drivers/scsi/libsas/sas_discover.c | 41 
>> --
>>  drivers/scsi/libsas/sas_internal.h | 34 +++
>>  drivers/scsi/libsas/sas_port.c |  4 
>>  include/scsi/libsas.h  |  5 -
>>  4 files changed, 72 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>> index 60de662..5d4a3a8 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -525,16 +525,43 @@ static void sas_revalidate_domain(struct work_struct *work)
>>  	mutex_unlock(&ha->disco_mutex);
>>  }
>>  
>> +static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
>> +[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
>> +[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
>> +[DISCE_PROBE] = sas_probe_devices,
>> +[DISCE_SUSPEND] = sas_suspend_devices,
>> +[DISCE_RESUME] = sas_resume_devices,
>> +[DISCE_DESTRUCT] = sas_destruct_devices,
>> +};
>> +
>> +/* a simple wrapper for sas discover event functions */
>> +static void sas_discover_common_fn(struct work_struct *work)
>> +{
>> +struct sas_discovery_event *ev = to_sas_discovery_event(work);
>> +struct asd_sas_port *port = ev->port;
>> +
>> +sas_event_fns[ev->type](work);
>> +sas_port_put(port);
>> +}
>> +
>> +
>>  /* -- Events -- */
>>  
>>  static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
>>  {
>> +	int ret;
>> +	struct sas_discovery_event *ev = to_sas_discovery_event(&sw->work);
>> +	struct asd_sas_port *port = ev->port;
>> +
>>  	/* chained work is not subject to SA_HA_DRAINING or
>>  	 * SAS_HA_REGISTERED, because it is either submitted in the
>>  	 * workqueue, or known to be submitted from a context that is
>>  	 * not racing against draining
>>  	 */
>> -	scsi_queue_work(ha->core.shost, &sw->work);
>> +	sas_port_get(port);
>> +	ret = scsi_queue_work(ha->core.shost, &sw->work);
>> +	if (ret != 1)
>> +		sas_port_put(port);
>>  }
>>  
>>  static void sas_chain_event(int event, unsigned long *pending,
>> @@ -575,18 +602,10 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
>>  {
>>  int i;
>>  
>> -static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
>> -[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
>> -[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
>> -[DISCE_PROBE] = sas_probe_devices,
>> -[DISCE_SUSPEND] = sas_suspend_devices,
>> -[DISCE_RESUME] = sas_resume_devices,
>> -[DISCE_DESTRUCT] = sas_destruct_devices,
>> -};
>> -
>>  disc->pending = 0;
>>  for (i = 0; i < DISC_NUM_EVENTS; i++) {
>> -INIT_SAS_WORK(>disc_work[i].work, sas_event_fns[i]);
>> +INIT_SAS_WORK(>disc_work[i].work, sas_discover_common_fn);
>>  disc->disc_work[i].port = port;
>> +disc->disc_work[i].type = i;
>>  }
>>  }
>> diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
>> index f03ce64..890b5d26 100644
>> --- a/drivers/scsi/libsas/sas_internal.h
>> +++ b/drivers/scsi/libsas/sas_internal.h
>> @@ -100,6 +100,40 @@ void sas_free_device(struct kref *kref);
>>  extern const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS];
>>  extern const work_func_t sas_port_event_fns[PORT_NUM_EVENTS];
>>  
>> +static void sas_complete_event(struct kref *kref)
>> +{
>> +	struct asd_sas_port *port = container_of(kref, struct asd_sas_port, ref);
>> +
>> +	complete_all(&port->completion);
>> +}
>> +
>> +static inline void sas_port_put(struct asd_sas_port *port)
>> +{
>> +	if (port->is_sync)
>> +		kref_put(&port->ref, sas_complete_event);
>> +}
>> +
>> +static inline void sas_port_wait_init(struct asd_sas_port *port)
>> +{
>> +	init_completion(&port->completion);
>> +	kref_init(&port->ref);
>> +	port->is_sync = true;
>> +}
>> +
>> +static inline void sas_port_wait_completion(struct asd_sas_port *port)
>> +{
>> +	sas_port_put(port);
>> +	wait_for_completion(&port->completion);
>> +	port->is_sync = false;
>> +}
>> +
>> +static inline void sas_port_get(struct asd_sas_port *port)
>> +{
>> +	if (port && port->is_sync)
>> +		kref_get(&port->ref);
>> +}
>> +
>>  #ifdef CONFIG_SCSI_SAS_HOST_SMP
>>  extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
>>  
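The intended use of the helpers in this patch can be sketched in userspace C. This is illustrative only: a pthread mutex/condvar pair stands in for the kernel's kref and completion, and all names (`model_port`, `port_wait_*`) are invented:

```c
/* Userspace model of the sas_port_wait_* pattern: a refcount that fires a
 * completion when it drops to zero. Not libsas code. */
#include <pthread.h>
#include <stdbool.h>

struct model_port {
	pthread_mutex_t lock;
	pthread_cond_t  done; /* models struct completion */
	int             ref;  /* models the kref */
	bool            is_sync;
};

static void port_wait_init(struct model_port *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->done, NULL);
	p->ref = 1; /* kref_init() starts at 1: the waiter's reference */
	p->is_sync = true;
}

static void port_get(struct model_port *p)
{
	if (!p->is_sync)
		return;
	pthread_mutex_lock(&p->lock);
	p->ref++; /* each queued work takes a reference */
	pthread_mutex_unlock(&p->lock);
}

static void port_put(struct model_port *p)
{
	if (!p->is_sync)
		return;
	pthread_mutex_lock(&p->lock);
	if (--p->ref == 0)
		pthread_cond_broadcast(&p->done); /* complete_all() */
	pthread_mutex_unlock(&p->lock);
}

/* Drop the initial reference, then sleep until all queued works put theirs. */
static void port_wait_completion(struct model_port *p)
{
	pthread_mutex_lock(&p->lock);
	if (--p->ref == 0)
		pthread_cond_broadcast(&p->done);
	while (p->ref > 0)
		pthread_cond_wait(&p->done, &p->lock);
	pthread_mutex_unlock(&p->lock);
	p->is_sync = false;
}
```

A caller brackets queueing with port_wait_init()/port_wait_completion(); each work is pinned by port_get() when queued and released by port_put() when it finishes (or fails to queue), so the waiter wakes only after the last outstanding work.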

Re: [PATCH v3 7/7] libsas: release disco mutex during waiting in sas_ex_discover_end_dev

2017-07-13 Thread wangyijing


On 2017/7/14 0:10, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> The disco mutex was introduced to prevent domain rediscovery from
>> competing with ATA error handling (commit 87c8331). If we already hold
>> the lock in sas_revalidate_domain and synchronously execute probe, a
>> deadlock occurs, because sas_probe_sata() also needs to hold disco_mutex.
>> Since the disco mutex is used to prevent revalidate-domain from running
>> during the ATA error handler, it should be safe to release it while
>> syncing the probe: no new revalidate-domain event will be processed until
>> the sync returns and the current revalidate-domain finishes.
>>
> 
> So with your changes we have a chain of synchronised events, running in
> separate queues. In theory it sounds ok.
> 
> Then, as you said in the commit message, sas_revalidate_domain() holds the
> disco mutex while *all* these chained events occur; so we will continue to
> hold the mutex until we have revalidated the domain, meaning until we have
> finished destroying or probing new devices.
> 
> But in the domain revalidation, when you discover a new ATA device,
> sas_probe_sata() wants to grab the disco mutex and you just temporarily
> release it, even though adding a new ATA device kicks in EH. This defeats
> the principle of using a mutex at all, which is (according to the 87c8331
> commit message) to mutually exclude the domain re-discovery (which has not
> actually finished) and the ATA EH (and device destruction).
> 
> Anyway, since we are synchronising this series of events (broadcast event,
> domain rediscovery, and device destruction), surely it should be possible
> to include the ATA EH as well, so we can actually get rid of the disco
> mutex finally, right?

Yes, the disco mutex makes this issue complex. I checked the commit history:
Dan introduced the disco mutex and the probe/destruct discovery events, so it
seems to need a big rework of the libsas processing logic. I am sorry that I
have no more time to deal with it; I will leave today. If you like, you could
rework my patchset or add additional changes based on it.



> 
> Note: I think that there is a problem which you have not seen. Consider
> removing an ATA disk with IO active, connected to an expander:
> - LLDD sends broadcast event
> - sas_revalidate_domain(), which grabs disco mutex
> - revalidate finds dev is gone
> - destruct device, which calls sas_rphy_delete
> - this waits on command queue to drain
> - commands time out and EH thread kicks in
> - sas_ata_strategy_handler() called
> - domain revalidation disable attempted
> - try to grab disco mutex = Deadlock.
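The chain above is a self-deadlock on disco_mutex: the revalidation path already holds the mutex when the EH path it ends up waiting on tries to take it again. A minimal userspace model (not libsas code; `pthread_mutex_trylock` stands in for the `mutex_lock()` that would block forever, and both function names are invented):

```c
/* Model of the deadlock chain: revalidation holds the disco mutex, device
 * destruction blocks on EH, and EH then needs the same mutex. */
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t disco_mutex = PTHREAD_MUTEX_INITIALIZER;

/* models sas_revalidate_domain() entering with the mutex held */
static void revalidate_domain_enter(void)
{
	pthread_mutex_lock(&disco_mutex);
}

/* models the EH path trying to disable domain revalidation; for a normal
 * (non-recursive) mutex, trylock reports EBUSY instead of hanging */
static int eh_disable_revalidation(void)
{
	return pthread_mutex_trylock(&disco_mutex);
}
```

EBUSY here is precisely the state in which the real `mutex_lock()` would sleep forever, since the holder is (transitively) waiting on the EH thread.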

Yes, it's an issue I hadn't found.


Thanks!
Yijing.




Hi John, I also agree with reworking the disco mutex.


> 
> Thanks,
> John
> 
>> Signed-off-by: Yijing Wang 
>> CC: John Garry 
>> CC: Johannes Thumshirn 
>> CC: Ewan Milne 
>> CC: Christoph Hellwig 
>> CC: Tomas Henzl 
>> CC: Dan Williams 
>> ---
>>  drivers/scsi/libsas/sas_expander.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
>> index 9d26c28..077024e 100644
>> --- a/drivers/scsi/libsas/sas_expander.c
>> +++ b/drivers/scsi/libsas/sas_expander.c
>> @@ -776,6 +776,7 @@ static struct domain_device *sas_ex_discover_end_dev(
>>  	struct ex_phy *phy = &parent_ex->ex_phy[phy_id];
>>  	struct domain_device *child = NULL;
>>  	struct sas_rphy *rphy;
>> +	bool prev_lock;
>>  	int res;
>>
>>  	if (phy->attached_sata_host || phy->attached_sata_ps)
>> @@ -803,6 +804,7 @@ static struct domain_device *sas_ex_discover_end_dev(
>>  	sas_ex_get_linkrate(parent, child, phy);
>>  	sas_device_set_phy(child, phy->port);
>>
>> +	prev_lock = mutex_is_locked(&parent->port->ha->disco_mutex);
>>  #ifdef CONFIG_SCSI_SAS_ATA
>>  	if ((phy->attached_tproto & SAS_PROTOCOL_STP) || phy->attached_sata_dev) {
>>  		res = sas_get_ata_info(child, phy);
>> @@ -832,7 +834,11 @@ static struct domain_device *sas_ex_discover_end_dev(
>>  			       SAS_ADDR(parent->sas_addr), phy_id, res);
>>  			goto out_list_del;
>>  		}
>> +		if (prev_lock)
>> +			mutex_unlock(&parent->port->ha->disco_mutex);
>>  		sas_disc_wait_completion(child->port, DISCE_PROBE);
>> +		if (prev_lock)
>> +			mutex_lock(&parent->port->ha->disco_mutex);
>>
>>  	} else
>>  #endif
>> @@ -861,7 +867,11 @@ static struct domain_device *sas_ex_discover_end_dev(
>>  			       SAS_ADDR(parent->sas_addr), phy_id, res);
>>  			goto out_list_del;
>>  		}
>> +		if (prev_lock)
>> +			mutex_unlock(&parent->port->ha->disco_mutex);
>>  		sas_disc_wait_completion(child->port, DISCE_PROBE);
>> +		if (prev_lock)
>> +			mutex_lock(&parent->port->ha->disco_mutex);
>>  	} else {
>>  

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-13 Thread wangyijing


On 2017/7/13 16:08, John Garry wrote:
> On 13/07/2017 02:37, wangyijing wrote:
>>> > So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.
>> Oh, did I make a mistake? Is the hotplug result you tested with this
>> patchset applied fine?
>>
>> Thanks!
>> Yijing.
> 
> Well basic hotplug is fine, as below. I did not do any robust testing.
> 

OK, thanks. I tested with and without fio running; the results are both fine.

Thanks!
Yijing.

> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  180.147676] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
> [  180.216558] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [  180.280548] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [  180.352556] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [  180.432495] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [  180.508492] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [  180.527577] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [  180.532728] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: 
> hostbyte=0x04 driverbyte=0x00
> [  180.541591] sd 0:0:1:0: [sdb] Stopping disk
> [  180.545767] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: 
> hostbyte=0x04 driverbyte=0x00
> [  180.612491] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [  180.696452] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [  180.703221] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
> 
> root@(none)$ echo 1 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  185.937831] hisi_sas_v2_hw HISI0162:01: phyup: phy7 
> link_rate=11
> [  185.996575] scsi 0:0:8:0: Direct-Access SanDisk  LT0200MO P404 PQ: 0 
> ANSI: 6
> [  187.059642] ata2.00: ATA-8: HGST HUS724040ALA640, MFAOA8B0, max UDMA/133
> [  187.066341] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [  187.073278] ata2.00: ATA Identify Device Log not supported
> [  187.078755] ata2.00: Security Log not supported
> [  187.085239] ata2.00: ATA Identify Device Log not supported
> [  187.090715] ata2.00: Security Log not supported
> [  187.095236] ata2.00: configured for UDMA/133
> [  187.136917] scsi 0:0:9:0: Direct-Access ATA  HGST HUS724040AL A8B0 
> PQ: 0 ANSI: 5
> [  187.187612] sd 0:0:9:0: [sdb] 7814037168 512-byte logical blocks: (4.00 
> TB/3.64 TiB)
> [  187.195365] sd 0:0:9:0: [sdb] Write Protect is off
> [  187.200161] sd 0:0:9:0: [sdb] Write cache: enabled, read cache: enabled, 
> doesn't support DPO or FUA
> [  187.223844] sd 0:0:9:0: [sdb] Attached SCSI disk
> [  187.225498] scsi 0:0:10:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  187.243864] sd 0:0:8:0: [sda] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  187.285879] sd 0:0:8:0: [sda] Write Protect is off
> [  187.367898] sd 0:0:8:0: [sda] Write cache: disabled, read cache: disabled, 
> supports DPO and FUA
> [  187.524043] scsi 0:0:11:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  187.701505] sd 0:0:10:0: [sdc] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  187.743547] sd 0:0:10:0: [sdc] Write Protect is off
> [  187.822546] scsi 0:0:12:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  187.825531] sd 0:0:10:0: [sdc] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.000167] sd 0:0:11:0: [sdd] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.042205] sd 0:0:11:0: [sdd] Write Protect is off
> [  188.121527] scsi 0:0:13:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  188.124274] sd 0:0:11:0: [sdd] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.298942] sd 0:0:12:0: [sde] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.340960] sd 0:0:12:0: [sde] Write Protect is off
> [  188.420023] scsi 0:0:14:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  188.422969] sd 0:0:12:0: [sde] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.597501] sd 0:0:13:0: [sdf] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.605069] sd 0:0:8:0: [sda] Attached SCSI disk
> [  188.639520] sd 0:0:13:0: [sdf] Write Protect is off
> [  188.682445] scsi 0:0:15:0: Enclosure 12G SAS  Expander  RevB PQ: 0 
> ANSI: 6
> [  188.721540] sd 0:0:13:0: [sdf] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.896399] sd 0:0:14:0: [sdg] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.938445] sd 0:0:14:0: [sdg] Write Protect is off
> [  189.020444] sd 0:0:14:0: [sdg] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  189.060608] sd 0:0:10:0: [sdc] Attached

Re: [PATCH v3 5/7] libsas: add a new workqueue to run probe/destruct discovery event

2017-07-12 Thread wangyijing


On 2017/7/13 0:50, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> Sometimes we want to sync the libsas probe or destruct events within a
>> SAS discovery work, e.g. when libsas revalidates the domain. For that,
>> the probe and destruct work needs to be split off from the scsi host
>> workqueue.
>>
>> Signed-off-by: Yijing Wang 
>> CC: John Garry 
>> CC: Johannes Thumshirn 
>> CC: Ewan Milne 
>> CC: Christoph Hellwig 
>> CC: Tomas Henzl 
>> CC: Dan Williams 
>> ---
>>  drivers/scsi/libsas/sas_discover.c | 13 -
>>  drivers/scsi/libsas/sas_init.c |  8 
>>  include/scsi/libsas.h  |  1 +
>>  3 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>> index 5d4a3a8..a25d648 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -559,7 +559,18 @@ static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
>>  	 * not racing against draining
>>  	 */
>>  	sas_port_get(port);
>> -	ret = scsi_queue_work(ha->core.shost, &sw->work);
>> +	/*
>> +	 * The probe and destruct discovery events are queued from within
>> +	 * other discovery events, such as discover domain and revalidate
>> +	 * domain. In some cases we need to execute probe and destruct
>> +	 * synchronously, so run the probe and destruct discovery events
>> +	 * on a separate workqueue.
> 
> Can we just use ha->disc_q for all discovery events for simplicity? Would
> this solve the disco mutex issue you try to solve in patch 7/7?

No, since we can queue a SAS discovery probe/destruct event from inside
another SAS discovery work (like sas_revalidate_domain):

sas_revalidate_domain
    sas_ex_revalidate_domain
        sas_rediscover
            sas_rediscover_dev
                sas_unregister_devs_sas_addr
                    sas_unregister_dev
                        sas_discover_event(dev->port, DISCE_DESTRUCT)
            sas_discover_new
                sas_ex_discover_devices
                    sas_ex_discover_dev
                        sas_ex_discover_end_dev
                            sas_discover_end_dev
                                sas_discover_event(dev->port, DISCE_PROBE)

So we could not sync a probe or destruct discovery event from within
sas_revalidate_domain if they were queued on the same workqueue, or it
would block forever.
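The self-wait described here can be shown with a toy single-threaded workqueue model in userspace C (invented names, not kernel code): a work that queues a second work on its own queue and then tries to wait for it can never see it run, because the queue's only worker is busy running the waiter; a second queue breaks the cycle.

```c
/* Toy model of a single-threaded workqueue and the probe-on-same-queue
 * deadlock. "busy" models the queue's only worker thread being occupied. */
#include <stdbool.h>

#define QMAX 8

struct wq {
	void (*fn[QMAX])(void);
	int head, tail;
	bool busy;
};

static struct wq host_q, disc_q;

static void wq_queue(struct wq *q, void (*fn)(void))
{
	q->fn[q->tail++ % QMAX] = fn;
}

/* Returns true if a work could be run to completion right now. */
static bool wq_flush_one(struct wq *q)
{
	if (q->busy || q->head == q->tail)
		return false; /* worker is inside another work: would deadlock */
	q->busy = true;
	q->fn[q->head++ % QMAX]();
	q->busy = false;
	return true;
}

static bool probe_ran;
static void probe_work(void) { probe_ran = true; }

static bool same_q_deadlocked;
static void revalidate_on_same_q(void)
{
	wq_queue(&host_q, probe_work);  /* DISCE_PROBE on our own queue */
	/* "wait for probe": try to run it while we are still running */
	same_q_deadlocked = !wq_flush_one(&host_q);
}

static void revalidate_on_disc_q(void)
{
	wq_queue(&disc_q, probe_work);  /* DISCE_PROBE on the separate queue */
	while (wq_flush_one(&disc_q))   /* that queue's worker is free */
		;
}
```

With one queue the probe can only run after the revalidation work returns (too late for a synchronous wait); with ha->disc_q it runs while the revalidation work is still in flight.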


> 
>> +	 */
>> +	if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
>> +		ret = queue_work(ha->disc_q, &sw->work);
>> +	else
>> +		ret = scsi_queue_work(ha->core.shost, &sw->work);
>> +
>>  if (ret != 1)
> 
> Uhh, we are mixing bool and int here... but we're forced to by 
> scsi_queue_work()

Eagle eye :), I could check the return value separately.

Thanks!
Yijing.


> 
>>  sas_port_put(port);
>>  }
>> diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
>> index 2f3b736..df1d78b 100644
>> --- a/drivers/scsi/libsas/sas_init.c
>> +++ b/drivers/scsi/libsas/sas_init.c
>> @@ -152,6 +152,13 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
>>  if (!sas_ha->event_q)
>>  goto Undo_ports;
>>
>> +snprintf(name, 64, "%s_disc_q", dev_name(sas_ha->dev));
>> +sas_ha->disc_q = create_singlethread_workqueue(name);
>> +	if (!sas_ha->disc_q) {
>> +destroy_workqueue(sas_ha->event_q);
>> +goto Undo_ports;
>> +}
>> +
>>  	INIT_LIST_HEAD(&sas_ha->eh_done_q);
>>  	INIT_LIST_HEAD(&sas_ha->eh_ata_q);
>>
>> @@ -187,6 +194,7 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
>>  __sas_drain_work(sas_ha);
>>  	mutex_unlock(&sas_ha->drain_mutex);
>>  destroy_workqueue(sas_ha->event_q);
>> +destroy_workqueue(sas_ha->disc_q);
>>
>>  return 0;
>>  }
>> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
>> index c2ef05e..4bcb9fe 100644
>> --- a/include/scsi/libsas.h
>> +++ b/include/scsi/libsas.h
>> @@ -406,6 +406,7 @@ struct sas_ha_struct {
>>  struct device *dev;  /* should be set */
>>  struct module *lldd_module; /* should be set */
>>  struct workqueue_struct *event_q;
>> +struct workqueue_struct *disc_q;
>>
>>  u8 *sas_addr;  /* must be set */
>>  u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];
>>
> 
> 
> 
> .
> 



Re: [PATCH v3 6/7] libsas: add wait-complete support to sync discovery event

2017-07-12 Thread wangyijing


On 2017/7/12 21:51, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>>
>>  static void sas_chain_event(int event, unsigned long *pending,
>> @@ -592,9 +596,9 @@ int sas_discover_event(struct asd_sas_port *port, enum discover_event ev)
>>  {
>>  	struct sas_discovery *disc;
>>
>> +	disc = &port->disc;
>>  	if (!port)
>>  		return 0;
>> -	disc = &port->disc;
>>
>>  BUG_ON(ev >= DISC_NUM_EVENTS);
>>
>> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
>> index 570b2cb..9d26c28 1
> 
> I was just looking through the code and I noticed this, above. Is there a
> specific reason to move the NULL check, or was it modified accidentally?
> 
> I mean, if port is NULL, I don't think we would get as far as checking it,
> as we would have already de-referenced it.

Oh, sorry, it's an accidental change. Good catch, thanks!

> 
> 
> 
> .
> 



Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-12 Thread wangyijing
 There is no special meaning to the pool size; if there is a flutter of
 more than 25 events, the SAS event notification will return an error,
 and further handling depends on the LLDD. I hope libsas could do more
 in this case, but for now it seems a little difficult; this patch may
 be an interim fix until we find a better solution.
>>>
>>> The principle of having a fixed-size pool is ok, even though the pool
>>> size needs more consideration.
>>>
>>> However my issue is how to handle pool exhaustion. For a start, relaying 
>>> info to the LLDD that the event notification failed is probably not the way 
>>> to go. I only now noticed "scsi: sas: scsi_queue_work can fail, so make 
>>> callers aware" made it into the kernel; as I mentioned in response to this 
>>> patch, the LLDD does not know how to handle this (and no LLDDs do actually 
>>> handle this).
>>>
>>> I would say it is better to shut down the PHY from libsas (As Dan mentioned 
>>> in the v1 series) when the pool exhausts, under the assumption that the PHY 
>>> has gone into some erroneous state. The user can later re-enable the PHY 
>>> from sysfs, if required.
>>
>> I considered this suggestion, and what I am worried about is: first, if
>> we disable the phy once the SAS event pool is exhausted, it may hurt the
>> processing of SAS events that are already queued,
> 
> I don't see how it affects currently queued events - they should just be 
> processed normally. As for LLDD reporting events when the pool is exhausted, 
> they are just lost.

So if we disable a phy, does it not affect the processing of already-queued
SAS events, including those that access the phy to find the target device?

> 
>> second, if the phy is disabled and no one triggers a re-enable via sysfs,
>> the LLDD has no way to post new SAS phy events.
> 
> For the extreme scenario of pool becoming exhausted and PHY being disabled, 
> it should remain disabled until user takes some action to fix originating 
> problem.

So we should print an explicit message to tell the user what happened and
how to fix it.

Thanks!
Yijing.
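The fixed-size pool plus the exhaustion policy discussed here (fail the notification, disable the phy, log a recovery hint) can be sketched in userspace C. The pool size of 25 and every name below are assumptions for illustration, not the actual patch:

```c
/* Model of a fixed-size per-phy event pool with an exhaustion policy:
 * when no slot is free, fail the notification, mark the phy disabled,
 * and tell the user how to recover via sysfs. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define PHY_POOL_SIZE 25

struct phy_event { bool in_use; int type; };

struct model_phy {
	struct phy_event pool[PHY_POOL_SIZE];
	bool enabled;
};

static struct phy_event *alloc_event(struct model_phy *phy, int type)
{
	for (size_t i = 0; i < PHY_POOL_SIZE; i++) {
		if (!phy->pool[i].in_use) {
			phy->pool[i].in_use = true;
			phy->pool[i].type = type;
			return &phy->pool[i];
		}
	}
	return NULL; /* pool exhausted */
}

static int notify_event(struct model_phy *phy, int type)
{
	if (!phy->enabled)
		return -1;
	if (!alloc_event(phy, type)) {
		/* assume the phy is misbehaving; shut it down and say so */
		phy->enabled = false;
		fprintf(stderr,
			"phy event pool exhausted; phy disabled, re-enable via sysfs\n");
		return -1;
	}
	return 0;
}
```

Slots would be returned to the pool when the corresponding work completes; the point here is only the exhaustion path: the phy stays off until a user re-enables it, and the message makes that actionable.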

> 
>>
>> Thanks!
>> Yijing.
>>
>>>
>>> Much appreciated,
>>> John
>>>

 Thanks!
 Yijing.

>
> Thanks,
> John
>
>
> .
>


 .

>>>
>>>
>>>
>>> .
>>>
>>
>>
>> .
>>
> 
> 
> 
> .
> 



Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread wangyijing


On 2017/7/12 17:59, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> This patchset is based on Johannes's patch
>> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>>
>> The libsas hotplug code currently has some issues; Dan Williams reported
>> a similar bug here before:
>> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>>
>> The issues we have found:
>> 1. If the LLDD reports a burst of phy-up/phy-down SAS events, some events
>>    may be lost because an identical SAS event is already pending; in the
>>    end the libsas topology may differ from the hardware.
>> 2. On receiving a phy-down SAS event, libsas calls sas_deform_port to
>>    remove devices: it first deletes the SAS port, then puts a destruction
>>    discovery event in a new work and queues it at the tail of the
>>    workqueue. Once the SAS port is deleted, its child devices are deleted
>>    too, so when the destruction work runs it finds the target device has
>>    already been removed, and reports a sysfs warning.
>> 3. Since a hotplug process is divided into several works, a phy-up SAS
>>    event can be interleaved into the phy-down works, like
>>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>>    so the hot-removal flow is broken by the PORTE_BYTES_DMAED event, which
>>    is not what we expect, and issues occur.
>>
>> The first patch fixes the lost SAS events, and the second one introduces
>> wait-complete support to fix the hotplug ordering issues.
>>
>>
> 
> I quickly tested this for basic hotplug.
> 
> Before:
> root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
> root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
> root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
> root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  102.570694] sysfs group 'power' not found for kobject 
> '0:0:7:0'
> [  102.577250] [ cut here ]
> [  102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 
> sysfs_remove_group+0x8c/0x94
> [  102.590110] Modules linked in:
> [  102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 
> 4.12.0-rc1-00032-g3ab81fc #1907
> [  102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 
> UEFI Nemo 1.7 RC3 06/23/2017
> [  102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
> [  102.615822] task: 8017d4793400 task.stack: 8017b7e7
> [  102.621728] PC is at sysfs_remove_group+0x8c/0x94
> [  102.626419] LR is at sysfs_remove_group+0x8c/0x94
> [  102.631109] pc : [] lr : [] pstate: 
> 6045
> [  102.638490] sp : 8017b7e73b80
> [  102.641791] x29: 8017b7e73b80 x28: 8017db010800
> [  102.647091] x27: 08e27000 x26: 8017d43e6600
> [  102.652390] x25: 8017b828 x24: 0003
> [  102.657689] x23: 8017b78864b0 x22: 8017b784c988
> [  102.662988] x21: 8017b7886410 x20: 08ee9dd0
> [  102.668288] x19:  x18: 08a1b678
> [  102.673587] x17: 000e x16: 0007
> [  102.678886] x15:  x14: 00a3
> [  102.684185] x13: 0033 x12: 0028
> [  102.689484] x11: 08f3be58 x10: 
> [  102.694783] x9 : 043c x8 : 6f6b20726f662064
> [  102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
> [  102.705382] x5 :  x4 : 
> [  102.710681] x3 :  x2 : 08e427e0
> [  102.715980] x1 :  x0 : 0033
> [  102.721279] ---[ end trace c216cc1451d5f7ec ]---
> [  102.725882] Call trace:
> [  102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
> [  102.734742] 39a0:    
> 0001
> [  102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050 
> 
> [  102.750372] 39e0: 8017b78864b0 0003 8017b828 
> 8017d43e6600
> [  102.758188] 3a00: 08e27000 8017db010800 8017d4793400 
> 
> [  102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 
> ffc8
> [  102.773818] 3a40: 8017b7e73a70 0810c12c 0033 
> 
> [  102.781633] 3a60: 08e427e0   
> 
> [  102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 
> 043c
> [  102.797264] 3aa0:  08f3be58 0028 
> 0033
> [  102.805079] 3ac0: 00a3  0007 
> 000e
> [  102.812895] [] sysfs_remove_group+0x8c/0x94
> [  102.818628] [] dpm_sysfs_remove+0x58/0x68
> [  102.824188] [] 


Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-12 Thread wangyijing


On 2017/7/12 16:17, John Garry wrote:
> On 12/07/2017 03:06, wangyijing wrote:
>>>> -unsigned long port_events_pending;
>>>> -unsigned long phy_events_pending;
>>>> +struct asd_sas_event   port_events[PORT_POOL_SIZE];
>>>> +struct asd_sas_event   phy_events[PHY_POOL_SIZE];
>>>>
>>>>  int error;
>>>
>>> Hi Yijing,
>>>
>>> So now we are creating a static pool of events per PHY/port, instead of 
>>> having 1 static work struct per event per PHY/port. So, for sure, this 
>>> avoids the dynamic event issue of system memory exhaustion which we 
>>> discussed in v1+v2 series. And it seems to possibly remove issue of losing 
>>> SAS events.
>>>
>>> But how did you determine the pool size for a PHY/port? It would seem to be 
>>> 5 * #phy events or #port events (which is also 5, I figure by coincidence). 
>>> How does this deal with flutter of >25 events?
>>
>> There is no special meaning for the pool size; if more than 25 events
>> flutter in, notifying further sas events will return an error, and the
>> next step is up to the LLDD drivers.
>> I hope libsas could do more in this case, but for now that seems a little
>> difficult; this patch may be an interim fix until we find a better solution.
> 
> The principal of having a fixed-sized pool is ok, even though the pool size 
> needs more consideration.
> 
> However my issue is how to handle pool exhaustion. For a start, relaying info 
> to the LLDD that the event notification failed is probably not the way to go. 
> I only now noticed "scsi: sas: scsi_queue_work can fail, so make callers 
> aware" made it into the kernel; as I mentioned in response to this patch, the 
> LLDD does not know how to handle this (and no LLDDs do actually handle this).
> 
> I would say it is better to shut down the PHY from libsas (As Dan mentioned 
> in the v1 series) when the pool exhausts, under the assumption that the PHY 
> has gone into some erroneous state. The user can later re-enable the PHY from 
> sysfs, if required.

I considered this suggestion, and I have two worries: first, if we disable
the phy as soon as the sas event pool exhausts, it may hurt the processing
of sas events that are already queued;
second, if the phy is disabled and no one triggers the re-enable via sysfs,
the LLDD has no way to post new sas phy events.
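John's fixed-pool suggestion can be sketched in plain C (a userspace simulation; names such as `phy_event_pool` and `notify_phy_event` here are hypothetical, not the actual libsas code, and the kernel version would need locking): when the pool exhausts, the notify path fails and marks the PHY disabled instead of silently dropping the event.

```c
#include <stdbool.h>
#include <stddef.h>

#define PHY_POOL_SIZE 25              /* 5 event types * 5 slots, as discussed */

struct phy_event {
    int type;
    bool in_use;
};

struct phy_event_pool {
    struct phy_event slots[PHY_POOL_SIZE];
    bool phy_disabled;                /* set when the pool exhausts */
};

/* Claim a free slot; NULL means the pool is exhausted. */
static struct phy_event *pool_alloc(struct phy_event_pool *pool, int type)
{
    for (size_t i = 0; i < PHY_POOL_SIZE; i++) {
        if (!pool->slots[i].in_use) {
            pool->slots[i].in_use = true;
            pool->slots[i].type = type;
            return &pool->slots[i];
        }
    }
    return NULL;
}

static void pool_free(struct phy_event_pool *pool, struct phy_event *ev)
{
    ev->in_use = false;
}

/* Notify path: on exhaustion, disable the PHY rather than drop silently. */
static bool notify_phy_event(struct phy_event_pool *pool, int type)
{
    struct phy_event *ev = pool_alloc(pool, type);

    if (!ev) {
        pool->phy_disabled = true;    /* assume a fluttering/broken PHY */
        return false;
    }
    /* queue_work() on ev would go here in the real code */
    return true;
}
```

Freeing a slot when the work completes makes the pool usable again, which is where the re-enable-from-sysfs question above comes in.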

Thanks!
Yijing.

> 
> Much appreciated,
> John
> 
>>
>> Thanks!
>> Yijing.
>>
>>>
>>> Thanks,
>>> John
>>>
>>>
>>> .
>>>
>>
>>
>> .
>>
> 
> 
> 
> .
> 



Re: [PATCH v3 2/7] libsas: remove unused port_gone_completion

2017-07-11 Thread wangyijing


On 2017/7/11 23:54, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> No one uses the port_gone_completion in struct asd_sas_port,
>> clean it out.
> 
> This seems like a reasonable tidy-up patch which could be taken in isolation, 
> having no dependency on the rest of the series.

Yes.

> 
>>
>> Signed-off-by: Yijing Wang 
>> ---
>>  include/scsi/libsas.h | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
>> index c41328d..628f48b 100644
>> --- a/include/scsi/libsas.h
>> +++ b/include/scsi/libsas.h
>> @@ -263,8 +263,6 @@ struct sas_discovery {
>>  /* The port struct is Class:RW, driver:RO */
>>  struct asd_sas_port {
>>  /* private: */
>> -struct completion port_gone_completion;
>> -
>>  struct sas_discovery disc;
>>  struct domain_device *port_dev;
>>  spinlock_t dev_list_lock;
>>
> 
> 
> 
> .
> 



Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-11 Thread wangyijing
>> -unsigned long port_events_pending;
>> -unsigned long phy_events_pending;
>> +struct asd_sas_event   port_events[PORT_POOL_SIZE];
>> +struct asd_sas_event   phy_events[PHY_POOL_SIZE];
>>
>>  int error;
> 
> Hi Yijing,
> 
> So now we are creating a static pool of events per PHY/port, instead of 
> having 1 static work struct per event per PHY/port. So, for sure, this avoids 
> the dynamic event issue of system memory exhaustion which we discussed in 
> v1+v2 series. And it seems to possibly remove issue of losing SAS events.
> 
> But how did you determine the pool size for a PHY/port? It would seem to be 5 
> * #phy events or #port events (which is also 5, I figure by coincidence). How 
> does this deal with flutter of >25 events?

There is no special meaning for the pool size; if more than 25 events
flutter in, notifying further sas events will return an error, and the next
step is up to the LLDD drivers.
I hope libsas could do more in this case, but for now that seems a little
difficult; this patch may be an interim fix until we find a better solution.

Thanks!
Yijing.

> 
> Thanks,
> John
> 
> 
> .
> 



Re: [PATCH v2 1/2] libsas: Don't process sas events in static works

2017-06-15 Thread wangyijing


On 2017/6/15 16:00, John Garry wrote:
> On 15/06/2017 08:37, wangyijing wrote:
>>
>>
>> On 2017/6/14 21:08, John Garry wrote:
>>> On 14/06/2017 10:04, wangyijing wrote:
>>>>>>  static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event 
>>>>>> event)
>>>>>>>>  {
>>>>>>>> +struct sas_ha_event *ev;
>>>>>>>> +
>>>>>>>>  BUG_ON(event >= HA_NUM_EVENTS);
>>>>>>>>
>>>>>>>> -sas_queue_event(event, &sas_ha->pending,
>>>>>>>> -&sas_ha->ha_events[event].work, sas_ha);
>>>>>>>> +ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
>>>>>>>> +if (!ev)
>>>>>>>> +return;
>>>>>> GFP_ATOMIC allocations can fail and then no events will be queued *and* 
>>>>>> we
>>>>>> don't report the error back to the caller.
>>>>>>
>>>> Yes, it's really a problem, but I don't find a better solution, do you 
>>>> have some suggestion ?
>>>>
>>>
>>> Dan raised an issue with this approach, regarding a malfunctioning PHY 
>>> which spews out events. I still don't think we're handling it safely. 
>>> Here's the suggestion:
>>> - each asd_sas_phy owns a finite-sized pool of events
>>> - when the event pool becomes exhausted, libsas stops queuing events 
>>> (obviously) and disables the PHY in the LLDD
>>> - upon attempting to re-enable the PHY from sysfs, libsas first checks that 
>>> the pool is still not exhausted
>>>
>>> If you cannot find a good solution, then let us know and we can help.
>>
>> Hi John and Dan, what events did you see on the malfunctioning PHY? If the
>> event is PORTE_BROADCAST_RCVD, then since libsas always calls
>> sas_revalidate_domain() for every PORTE_BROADCAST_RCVD, what about keeping a
>> single broadcast waiting (not queued in the workqueue) and discarding the
>> others? If the events are of other types, things may become knotty.
>>
> 
> As I mentioned in the v1 series discussion, I found a poorly connected 
> expander PHY was spewing out PHY up and loss of signal events continuously. 
> This is the sort of situation we should protect against. Current solution is 
> ok, as it uses a static event per port/PHY/HA.
> 
> The point is that we cannot allow a PHY to continuously send events to 
> libsas, which may lead to memory exhaustion.

The current solution won't introduce memory exhaustion, but it's still not
ok, since the root of this issue is that normal events may be lost.
If we cannot identify the abnormal PHY, I think your mem pool idea is a
candidate solution.

> 
> John
> 
>>
>>>
>>> John
>>>
>>>
>>> .
>>>
>>
>>
>> .
>>
> 
> 
> 
> .
> 



Re: [PATCH v2 1/2] libsas: Don't process sas events in static works

2017-06-15 Thread wangyijing


On 2017/6/14 21:08, John Garry wrote:
> On 14/06/2017 10:04, wangyijing wrote:
>>>>  static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event 
>>>> event)
>>>> >>  {
>>>> >> +struct sas_ha_event *ev;
>>>> >> +
>>>> >>  BUG_ON(event >= HA_NUM_EVENTS);
>>>> >>
>>>> >> -sas_queue_event(event, &sas_ha->pending,
>>>> >> -&sas_ha->ha_events[event].work, sas_ha);
>>>> >> +ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
>>>> >> +if (!ev)
>>>> >> +return;
>>> > GFP_ATOMIC allocations can fail and then no events will be queued *and* we
>>> > don't report the error back to the caller.
>>> >
>> Yes, it's really a problem, but I don't find a better solution, do you have 
>> some suggestion ?
>>
> 
> Dan raised an issue with this approach, regarding a malfunctioning PHY which 
> spews out events. I still don't think we're handling it safely. Here's the 
> suggestion:
> - each asd_sas_phy owns a finite-sized pool of events
> - when the event pool becomes exhausted, libsas stops queuing events 
> (obviously) and disables the PHY in the LLDD
> - upon attempting to re-enable the PHY from sysfs, libsas first checks that 
> the pool is still not exhausted
> 
> If you cannot find a good solution, then let us know and we can help.

Hi John and Dan, what events did you see on the malfunctioning PHY? If the
event is PORTE_BROADCAST_RCVD, then since libsas always calls
sas_revalidate_domain() for every PORTE_BROADCAST_RCVD, what about keeping a
single broadcast waiting (not queued in the workqueue) and discarding the
others? If the events are of other types, things may become knotty.
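The broadcast-coalescing idea above could look roughly like this (a userspace sketch; `bcast_state` and both function names are hypothetical): at most one PORTE_BROADCAST_RCVD is queued at a time, and since every broadcast triggers a full revalidation anyway, merging a burst into one pending event loses no information.

```c
#include <stdbool.h>

/* Hypothetical coalescing of PORTE_BROADCAST_RCVD: at most one queued
 * instance; events arriving while one is queued are merged, and events
 * arriving while the worker runs get queued again afterwards. */
struct bcast_state {
    bool queued;        /* a broadcast work is queued but not yet running */
    int revalidations;  /* how many times revalidate actually ran */
};

static bool notify_broadcast(struct bcast_state *st)
{
    if (st->queued)
        return false;   /* merged into the pending one; nothing is lost */
    st->queued = true;
    return true;        /* queue_work(revalidate) in the real code */
}

static void revalidate_worker(struct bcast_state *st)
{
    st->queued = false; /* clear *before* revalidating, so a broadcast
                         * arriving during revalidation re-queues the work */
    st->revalidations++;
}
```

The ordering of the clear matters: clearing the flag before running the revalidation guarantees a broadcast that arrives mid-revalidation is not swallowed.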


> 
> John
> 
> 
> .
> 



Re: [PATCH v2 1/2] libsas: Don't process sas events in static works

2017-06-14 Thread wangyijing


On 2017/6/14 17:18, Johannes Thumshirn wrote:
> On 06/14/2017 11:04 AM, wangyijing wrote:
>>>>  static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event 
>>>> event)
>>>>  {
>>>> +  struct sas_ha_event *ev;
>>>> +
>>>>BUG_ON(event >= HA_NUM_EVENTS);
>>>>  
>>>> -  sas_queue_event(event, &sas_ha->pending,
>>>> -  &sas_ha->ha_events[event].work, sas_ha);
>>>> +  ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
>>>> +  if (!ev)
>>>> +  return;
>>> GFP_ATOMIC allocations can fail and then no events will be queued *and* we
>>> don't report the error back to the caller.
>>>
>>
>> Yes, it's really a problem, but I don't find a better solution, do you have 
>> some suggestion ?
> 
> Something like this?
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c
> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index d622db502ec9..27cd7307d308 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -894,7 +894,11 @@ static int hisi_sas_controller_reset(struct
> hisi_hba *hisi_hba)
> hisi_sas_release_tasks(hisi_hba);
> spin_unlock_irqrestore(&hisi_hba->lock, flags);
> 
> -   sas_ha->notify_ha_event(sas_ha, HAE_RESET);
> +   rc = sas_ha->notify_ha_event(sas_ha, HAE_RESET);
> +   if (rc) {
> +   dev_warn(dev, "failed to queue HA event (%d)\n",
> rc);
> +   goto out;
> +   }
> dev_dbg(dev, "controller reset successful!\n");
> } else
> return -1;
> diff --git a/drivers/scsi/libsas/sas_event.c
> b/drivers/scsi/libsas/sas_event.c
> index aadbd5314c5c..6c91fb31f891 100644
> --- a/drivers/scsi/libsas/sas_event.c
> +++ b/drivers/scsi/libsas/sas_event.c
> @@ -116,7 +116,7 @@ void sas_enable_revalidation(struct sas_ha_struct *ha)
> mutex_unlock(&ha->disco_mutex);
>  }
> 
> -static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event
> event)
> +static int notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event
> event)
>  {
> BUG_ON(event >= HA_NUM_EVENTS);
> 
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index dd0f72c95abe..5d9cd6d20855 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -415,7 +415,7 @@ struct sas_ha_struct {
> * their siblings when forming wide ports */
> 
> /* LLDD calls these to notify the class of an event. */
> -   void (*notify_ha_event)(struct sas_ha_struct *, enum ha_event);
> +   int (*notify_ha_event)(struct sas_ha_struct *, enum ha_event);
> void (*notify_port_event)(struct asd_sas_phy *, enum port_event);
> void (*notify_phy_event)(struct asd_sas_phy *, enum phy_event);

This makes sense: return a failure status and report it to the callers.


> 
> 



Re: [PATCH v2 2/2] libsas: Enhance libsas hotplug

2017-06-14 Thread wangyijing
>> In this patch, we try to solve these issues in the following steps:
>> 1. create a new workqueue to run the sas event works, instead of the scsi
>>    host workqueue: because we may block a sas event work, we cannot block
>>    the normal scsi works. When libsas receives a phy down event,
>>    sas_deform_port is called, and now we block sas_deform_port to wait for
>>    the destruction work to finish; in sas_destruct_devices we may wait on
>>    the ata error handler, which can take a long time, so doing all of this
>>    in the scsi host workqueue could block other scsi works for too long.
>> 2. create a new workqueue to run the sas discovery event works, instead of
>>    the scsi host workqueue: in some cases, e.g. the revalidate domain
>>    event, we may unregister a sas device and discover a new one, and we
>>    must serialize these, waiting for the removal to finish before starting
>>    a new discovery. So we must put the probe and destruct discovery events
>>    in a new workqueue to avoid deadlock.
>> 3. introduce an asd_sas_port level wait-complete and a sas_discovery level
>>    wait-complete; we use the former to make sas event processing atomic
>>    and the latter to make sas discovery synchronous.
>> 4. remove disco_mutex in sas_revalidate_domain: since sas_revalidate_domain
>>    now synchronizes with the destruct discovery event execution, there is
>>    no need to take the disco mutex there.
> 
> The way you've written the changelog suggests this patch should be split
> into 4 patches, each one taking care of one of your change items.

I will split it in the next version.

Thanks!
Yijing.


> 



Re: [PATCH v2 1/2] libsas: Don't process sas events in static works

2017-06-14 Thread wangyijing
>>  static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event 
>> event)
>>  {
>> +struct sas_ha_event *ev;
>> +
>>  BUG_ON(event >= HA_NUM_EVENTS);
>>  
>> -sas_queue_event(event, &sas_ha->pending,
>> -&sas_ha->ha_events[event].work, sas_ha);
>> +ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
>> +if (!ev)
>> +return;
> GFP_ATOMIC allocations can fail and then no events will be queued *and* we
> don't report the error back to the caller.
> 

Yes, it's really a problem, but I haven't found a better solution; do you
have a suggestion?

> 
> 
>> index 64e9cdd..c227a8b 100644
>> --- a/drivers/scsi/libsas/sas_init.c
>> +++ b/drivers/scsi/libsas/sas_init.c
>> @@ -111,10 +111,6 @@ void sas_hash_addr(u8 *hashed, const u8 *sas_addr)
>>  
>>  void sas_hae_reset(struct work_struct *work)
>>  {
>> -struct sas_ha_event *ev = to_sas_ha_event(work);
>> -struct sas_ha_struct *ha = ev->ha;
>> -
>> -clear_bit(HAE_RESET, >pending);
>>  }
> 
> I don't really get why you need a stubbed out sas_hae_reset(). Can't we
> just kill it if it doesn't have anything left to do?

I don't know this function's history; I agree we can clean it out.

> 
> 



Re: [PATCH 2/2] libsas: Enhance libsas hotplug

2017-05-25 Thread wangyijing
Hi John, thanks for your review and comments!

On 2017/5/25 17:04, John Garry wrote:
> Hi,
> 
> There are some comments, inline.
> 
> In general, if it works, it looks ok.
> 
> Other reviews would be greatly appreciated - Hannes, Christoph, Johannes, Dan 
> - please.
> 
>> Libsas complete a hotplug event notified by LLDD in several works,
>> for example, if libsas receive a PHYE_LOSS_OF_SIGNAL, we process it
>> in following steps:
>>
>> notify_phy_event[interrupt context]
>> sas_queue_event[queue work on shost->work_q]
>> sas_phye_loss_of_signal[running in shost->work_q]
>> sas_deform_port[remove sas port]
>> sas_unregister_dev
>> sas_discover_event[queue destruct work on 
>> shost->work_q tail]
>>
>> In the above case, the whole hotplug completes in two works: the sas port
>> is removed first, then the destruction of the device is put in another
>> work and queued at the tail of the workqueue. Since the sas port is the
>> parent of the children rphy devices, removing the sas port first also
>> deletes the children rphy devices; when the destruction work runs, it
>> finds the target has already been removed and reports a sysfs warning
>> calltrace.
>>
>> queue tail                                                queue head
>> DISCE_DESTRUCT ---> PORTE_BYTES_DMAED event ---> PHYE_LOSS_OF_SIGNAL [running]
>>
>> There are other hotplug issues in the current framework: in the above
>> case, if a hot-add sas event is queued between the hot-remove works, the
>> hotplug order is broken and unexpected issues can happen.
>>
>> In this patch, we try to solve these issues in following steps:
>> 1. create a new workqueue to run the sas event works, instead of the scsi
>>    host workqueue: because we may block a sas event work, we cannot block
>>    the normal scsi works.
> 
> What do we block the event work for?

When libsas receives a phy down event, sas_deform_port is called, and now we
block sas_deform_port to wait for the destruction work to finish. In
sas_destruct_devices we may wait on the ata error handler, which can take a
long time, so if we did all of this in the scsi host workqueue, libsas could
block other scsi works for too long.

> 
>> 2. create a new workqueue to run the sas discovery event works, instead of
>>    the scsi host workqueue: in some cases, e.g. the revalidate domain
>>    event, we may unregister a sas device and discover a new one, and we
>>    must serialize these, waiting for the removal to finish before starting
>>    a new discovery. So we must put the probe and destruct discovery events
>>    in a new workqueue to avoid deadlock.
>> 3. introduce an asd_sas_port level wait-complete and a sas_discovery level
>>    wait-complete; we use the former to make sas event processing atomic
>>    and the latter to make sas discovery synchronous.
>> 4. remove disco_mutex in sas_revalidate_domain: since sas_revalidate_domain
>>    now synchronizes with the destruct discovery event execution, there is
>>    no need to take the disco mutex there.
>>
>> Signed-off-by: Yijing Wang 
>> ---
>>  drivers/scsi/libsas/sas_discover.c | 58 
>> --
>>  drivers/scsi/libsas/sas_event.c|  2 +-
>>  drivers/scsi/libsas/sas_expander.c |  9 +-
>>  drivers/scsi/libsas/sas_init.c | 31 +++-
>>  drivers/scsi/libsas/sas_internal.h | 50 
>>  drivers/scsi/libsas/sas_port.c |  4 +++
>>  include/scsi/libsas.h  | 11 +++-
>>  7 files changed, 146 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c 
>> b/drivers/scsi/libsas/sas_discover.c
>> index 60de662..43e8a1e 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct 
>> *work)
>>  struct domain_device *ddev = port->port_dev;
>>
>>  /* prevent revalidation from finding sata links in recovery */
>> -mutex_lock(&ha->disco_mutex);
>> if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
>>  SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
>>  port->id, task_pid_nr(current));
>> -goto out;
>> +return;
>>  }
>>
>> clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
>> @@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct 
>> *work)
>>
>>  SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
>>  port->id, task_pid_nr(current), res);
>> - out:
>> -mutex_unlock(&ha->disco_mutex);
>> +}
>> +
>> +static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
>> +[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
>> +[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
>> +[DISCE_PROBE] = sas_probe_devices,
>> +[DISCE_SUSPEND] = sas_suspend_devices,
>> +[DISCE_RESUME] = sas_resume_devices,
>> +[DISCE_DESTRUCT] = 

Re: [PATCH 1/2] libsas: Don't process sas events in static works

2017-05-23 Thread wangyijing
>>
> 
> I have seen this scenario on our development board when we have a bad 
> physical cable connection - the PHY continually goes up and down in a loop.
> 
> So, in this regard, it is worth safeguarding against this scenario.

OK, I will reconsider this case.

Thanks!
Yijing.


> 
> John
> 
> 
> .
> 



Re: [PATCH 1/2] libsas: Don't process sas events in static works

2017-05-21 Thread wangyijing
Hi Dan, thanks for your review and comments!

On 2017/5/21 11:44, Dan Williams wrote:
> On Fri, May 19, 2017 at 11:39 PM, Yijing Wang  wrote:
>> Now libsas hotplug work is static, LLDD driver queue
>> the hotplug work into shost->work_q. If LLDD driver
>> burst post lots hotplug events to libsas, the hotplug
>> events may pending in the workqueue like
>>
>> shost->work_q
>> new work[PORTE_BYTES_DMAED] --> |[PHYE_LOSS_OF_SIGNAL][PORTE_BYTES_DMAED] -> 
>> processing
>> |<---wait worker to process>|
>> In this case, when a new PORTE_BYTES_DMAED event comes in, libsas tries to
>> queue it to shost->work_q, but the same work is already pending, so it is
>> lost. Finally, libsas deletes the related sas port and sas devices, while
>> the LLDD driver expects libsas to add the sas port and devices (per the
>> last sas event).
>>
>> This patch remove the static defined hotplug work, and use dynamic work to
>> avoid missing hotplug events.
> 
> If we go this route we don't even need:
> 
> sas_port_event_fns
> sas_phy_event_fns
> sas_ha_event_fns

Yes, these three fns are not necessary; they are just there to avoid lots of
kfree calls in the phy/port/ha event fns.

> 
> ...just specify the target routine directly to INIT_WORK() and remove
> the indirection.
> 
> I also think for safety this should use a mempool that guarantees that
> events can continue to be processed under system memory pressure.

What I am worried about is that it would still fail if the mempool runs
empty during memory pressure.
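Dan's mempool idea, reduced to a userspace sketch (hypothetical names; the kernel would use mempool_create()/mempool_alloc() instead): allocation falls back to a preallocated reserve under memory pressure, and only fails once the reserve itself is exhausted, which is exactly the residual case worried about above.

```c
#include <stdlib.h>
#include <stdbool.h>

#define RESERVE 8   /* hypothetical minimum number of guaranteed events */

struct ev {
    int type;
    bool from_reserve;
};

struct ev_mempool {
    struct ev reserve[RESERVE];
    bool used[RESERVE];
};

/* Try the normal allocator first, then fall back to the reserve; NULL
 * only when the reserve is also exhausted. */
static struct ev *ev_alloc(struct ev_mempool *mp, bool heap_ok)
{
    if (heap_ok) {
        struct ev *e = malloc(sizeof(*e));

        if (e) {
            e->from_reserve = false;
            return e;
        }
    }
    for (int i = 0; i < RESERVE; i++) {
        if (!mp->used[i]) {
            mp->used[i] = true;
            mp->reserve[i].from_reserve = true;
            return &mp->reserve[i];
        }
    }
    return NULL;
}

/* Freeing a reserve element replenishes the pool. */
static void ev_free(struct ev_mempool *mp, struct ev *e)
{
    if (e->from_reserve)
        mp->used[e - mp->reserve] = false;
    else
        free(e);
}
```

The guarantee is therefore conditional: as long as queued events complete and are freed, the reserve refills; a burst larger than the reserve while nothing completes is still the failure case.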

> Also, have you considered the case when a broken phy starts throwing a
> constant stream of events? Is there a point at which libsas should
> stop queuing events and disable the phy?

Not yet, I haven't hit this issue in a real case, but I agree it's really a
problem on some broken hardware. I don't think it's an easy problem; we
could improve it step by step.

Thanks!
Yijing.

> 
> .
> 



Re: [PATCH v2 0/5] Re-order scsi_remove_host and sas_remove_host in SAS HBA LLDDs

2017-04-21 Thread wangyijing

>>
>> Please repost the series against the current tree.  Also a cover
>> letter with a bit more explanation would be good.
>>
> 
> Great, I think that my colleagues can send an updated version next week.

I will send a new version with a cover letter in the coming days; before
that, we need to run detailed tests on real hardware against the current
mainline.

Thanks!
Yijing.

> 
> Much appreciated,
> John
> 
>> .
>>
> 
> 
> .
> 


Re: [RFC PATCH] scsi: libsas: fix WARN on device removal

2016-11-22 Thread wangyijing
>>
>> The events are not lost.
> 
> In sas_queue_event(), if there is a particular event pending for a port/PHY, 
> we cannot queue further events of the same type for that port/PHY. I think my 
> colleagues found an issue where we try to enqueue multiple complementary events.

Yes, we found this issue in our local tests.

> 
>> The new problem this patch introduces is
>> delaying sas port deletion where it was previously immediate.  So now
>> we can get into a situation where the port has gone down and can start
>> processing a port up event before the previous deletion work has run.
>>

> And it's a very noisy warning, as in 6K lines on the console when an
> expander is unplugged.


 Does something like this modulate the failure?
>>
>> I'm curious if we simply need to fix the double deletion of the
>> sas_port bsg queue, could you try the changes below?
>>
> 
> No, I just tested it on a root port and we get the same WARN.
> 

 diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
 index 60b651bfaa01..11401e5c88ba 100644
  --- a/drivers/scsi/scsi_transport_sas.c
 +++ b/drivers/scsi/scsi_transport_sas.c
 @@ -262,9 +262,10 @@ static void sas_bsg_remove(struct Scsi_Host
 *shost, struct sas_rphy *rphy
  {
 struct request_queue *q;

 -   if (rphy)
 +   if (rphy) {
 q = rphy->q;
 -   else
 +   rphy->q = NULL;
 +   } else
 q = to_sas_host_attrs(shost)->q;

 if (!q)

 .

>>>
>>>
>>
>> .
>>
> 
> 
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] scsi: libsas: fix WARN on device removal

2016-11-11 Thread wangyijing
>>> I have not seen the flutter issue. I am just trying to solve the horrible 
>>> WARN dump.
>>> However I do understand that there may be a issue related to how we queue 
>>> the events; there was a recent attempt to fix this, but it came to nothing:
>>> https://www.spinics.net/lists/linux-scsi/msg1.html
>>
>> We found libsas hotplug several problems:
>> 1. sysfs warning calltrace(like the case you found);
> 
> Maybe you can then review my patch.

I did; I think your solution to fix the sysfs calltrace issue is ok, but
what I worry about is that we still need to fix the remaining issues. So it
would be better if we could fix all the issues at once.

> 
>> 2. hot-add and hot-remove work events may process out of order;
>> 3. in some extreme cases, libsas may miss some events, if the same event is 
>> still pending in workqueue.
>>
> 
> Can you tell me how to recreate #2 and #3?

Qilin Chen and Yousong He helped me reproduce it; I told them to reply to
this mail with the test steps.
Some of our tests make the sas phy link flutter, so the hardware posts phy
down and phy up events in sequence.

1. The scsi host workqueue receives the phy down and phy up events.
2. While processing the phy down, sas_deform_port posts a new destruct event
to the scsi host workqueue, so the workqueue looks like
[phy down - phy up - destruct].

So the phy down logic is split apart by the phy up; it is not atomic, not
safe, and something unexpected can happen.

For case 3, we make the hardware burst-post lots of phy up/phy down event
pairs; if libsas is still processing a phy up event, the next phy up event
cannot be queued to the scsi host workqueue again, so it is lost, which is
not what we expect.

> 
>> It's a complex issue, we posted two patches, try to fix these issues, but 
>> now few people are interested in it  :(
>>
> 
> IIRC, you sent as RFC and got a "reviewed-by" from Hannes, so I'm not sure 
> what else you want. BTW, I thought that the changes were quite drastic.

I agree, the changes seem somewhat drastic. But I think the current libsas
hotplug framework has a big flaw.

> 
> John
> 
>>>

 Alternatively we need a mechanism to cancel in-flight port shutdown
 requests when we start re-attaching devices before queued port
 destruction events have run.

 .

>>>
>>>
>>> ___
>>> linuxarm mailing list
>>> linux...@huawei.com
>>> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm
>>>
>>> .
>>>
>>
>>
>> .
>>
> 
> 
> 
> .
> 



Re: [RFC][PATCH v2 0/2] Improve libsas hotplug

2016-11-11 Thread wangyijing
Hi James, sorry to bother you. These two patches try to fix several issues
in libsas; Dan Williams and John Garry also found similar issues and posted
some patches before. Dan Williams's solution fixes the sysfs warning
calltrace, but may introduce a new flutter issue.

In these two patches, we introduce a new workqueue to fix the flutter issue.
Do you have time to look at them? Your comments are important to us; they
help us know whether we are going in the right direction to fix these issues.

Thanks!
Yijing.

On 2016/9/27 11:15, Yijing Wang wrote:
> v1-v2: Fix memory allocation issue in interrupt context.
> 
> Yijing Wang (2):
>   libsas: Alloc dynamic work to avoid missing sas events
>   libsas: Fix hotplug issue in libsas
> 
>  drivers/scsi/libsas/sas_ata.c   |  34 ++---
>  drivers/scsi/libsas/sas_discover.c  | 245 
> ++--
>  drivers/scsi/libsas/sas_event.c |  61 +
>  drivers/scsi/libsas/sas_expander.c  |  54 ++--
>  drivers/scsi/libsas/sas_init.c  |  31 -
>  drivers/scsi/libsas/sas_internal.h  |  45 ++-
>  drivers/scsi/libsas/sas_phy.c   |  50 +++-
>  drivers/scsi/libsas/sas_port.c  |  35 --
>  drivers/scsi/libsas/sas_scsi_host.c |  23 
>  include/scsi/libsas.h   |  13 +-
>  include/scsi/sas_ata.h  |   4 +-
>  11 files changed, 404 insertions(+), 191 deletions(-)
> 



Re: [RFC PATCH] scsi: libsas: fix WARN on device removal

2016-11-11 Thread wangyijing
> 
> They're not the same. I don't see how your solution properly deals with 
> remote sas_port deletion.
> 
> When we unplug a device connected to an expander, can't the sas_port be 
> deleted twice, in sas_unregister_devs_sas_addr() from domain revalidation and 
> also now in sas_destruct_devices()? I think that this gives a NULL 
> dereference.
> And we still get the WARN as the sas_port has still been deleted before the 
> device.
> 
> In my solution, we should always delete the sas_port after the attached 
> device.
> 
>>>
>>> i.e. it moves the port destruction to the workqueue and still suffers
>>> from the flutter problem:
>>>
>>> http://marc.info/?l=linux-scsi=143801026028006=2
>>> http://marc.info/?l=linux-scsi=143801971131073=2
>>>
>>> Perhaps we instead need to quiet this warning?
>>>
>>> http://marc.info/?l=linux-scsi=143802229932175=2
> 
> I have not seen the flutter issue. I am just trying to solve the horrible 
> WARN dump.
> However I do understand that there may be a issue related to how we queue the 
> events; there was a recent attempt to fix this, but it came to nothing:
> https://www.spinics.net/lists/linux-scsi/msg1.html

We found several problems in libsas hotplug:
1. sysfs warning calltrace (like the case you found);
2. hot-add and hot-remove work events may be processed out of order;
3. in some extreme cases, libsas may miss events if the same event is still
pending in the workqueue.

It's a complex issue; we posted two patches to try to fix these problems,
but so far few people are interested in them :(
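The event-loss mechanism in point 3 comes from the old static-work scheme, sketched here in miniature (a userspace simulation with hypothetical names, not the libsas API): one pending flag per event type means a second identical event arriving before the worker runs is silently dropped.

```c
#include <stdbool.h>

enum { PHYE_UP, PHYE_DOWN, NUM_PHY_EVENTS };

struct phy_state {
    bool pending[NUM_PHY_EVENTS];   /* one static work per event type */
    int handled[NUM_PHY_EVENTS];    /* how many events the worker saw */
};

/* A second identical event arriving while the first is still pending
 * is dropped, which is how a phy up/down flutter loses events. */
static bool queue_event_once(struct phy_state *phy, int ev)
{
    if (phy->pending[ev])
        return false;               /* same event already pending: lost */
    phy->pending[ev] = true;
    return true;
}

static void run_worker(struct phy_state *phy, int ev)
{
    phy->pending[ev] = false;
    phy->handled[ev]++;
}
```

If the LLDD posts up, down, up faster than the worker runs, the second up vanishes, and libsas's view of the topology ends up differing from the hardware's.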

> 
> Cheers,
> John
> 
>>
>> Alternatively we need a mechanism to cancel in-flight port shutdown
>> requests when we start re-attaching devices before queued port
>> destruction events have run.
>>
>> .
>>
> 
> 
> ___
> linuxarm mailing list
> linux...@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm
> 
> .
> 



Re: Question for Patch"libsas: fix "sysfs group not found" warnings at port teardown time"

2016-02-05 Thread wangyijing
Yes, it seems to be the same issue.

On 2016/2/6 1:22, Dāvis Mosāns wrote:
> 2016-02-05 11:20 GMT+02:00 wangyijing <wangyij...@huawei.com>:
>>
>> Hi Dan and Praveen,
>>I found a patch titled "libsas: fix "sysfs group not found" warnings at
>> port teardown time" by google,
>> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>>
> 
>  Maybe this is related?
> 
>  kernel: WARNING: CPU: 4 PID: 16820 at fs/sysfs/group.c:237
>  sysfs_remove_group+0x8b/0x90()
>  kernel: sysfs group 818a3b60 not found for kobject 'end_device-7:7'
>  [...]
>  kernel: Workqueue: scsi_wq_7 sas_destruct_devices [libsas]
> 
>  I'm getting this warning when removing disk from HighPoint RocketRAID 2760
>  controller (mvsas driver)
>  See https://bugzilla.kernel.org/show_bug.cgi?id=71021#c3
> 
> 



Warning Calltrace when hotplug sas disk

2016-02-04 Thread wangyijing
Hi list, I tried hotplugging a disk on my machine, but when I hot-removed
the disk, I found a warning calltrace.

When we try to unplug a disk,

the LLDD reports a loss_of_signal event to libsas, so:

sas_deform_port
sas_unregister_domain_devices
sas_unregister_dev
queue the destruct to scsi work queue
sas_port_delete
device_del(port)  // the port device is the parent of the phy and end
device, so in this case we first delete the parent kobject and then delete
the children devices.
..
sas_destruct_devices
sas_rphy_delete
...

It seems to be caused by deleting the parent device before the children
devices. This is my personal analysis; if anyone could comment on it, I
would appreciate it very much, thanks.
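The parent-before-children ordering described above, reduced to a miniature userspace sketch (hypothetical names, not the driver-core API): deleting the port first tears down the child too, so the destruct work that runs later finds the device already gone, which is when the sysfs warning fires.

```c
#include <stdbool.h>

/* A device is just a registered/unregistered flag here; in the kernel,
 * deleting a parent kobject also removes its children's sysfs entries. */
struct dev {
    bool registered;
};

/* Deleting the parent (sas port) takes the child (rphy) with it. */
static void delete_parent(struct dev *parent, struct dev *child)
{
    parent->registered = false;
    if (child && child->registered)
        child->registered = false;  /* children go with the parent */
}

/* The destruct work, queued to run later: returns false when it finds
 * the device already removed (the "sysfs group not found" case). */
static bool destruct_child(struct dev *child)
{
    if (!child->registered)
        return false;
    child->registered = false;
    return true;
}
```

Running the destruct work before the port deletion (child first, then parent) avoids the warning, which matches the suggestion in the threads above to always delete the sas_port after the attached device.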


WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:224 sysfs_remove_group+0xa0/0xa4()
kobj 8013e8389410 sysfs group (power)80a2dbe8 not found for kobject 
'0:0:1:0'
Modules linked in:
CPU: 2 PID: 6 Comm: kworker/u64:0 Not tainted 4.1.6+ #160
Hardware name: Hisilicon PhosphorV660 Development Board (DT)
Workqueue: scsi_wq_0 sas_destruct_devices
Call trace:
[] dump_backtrace+0x0/0x124
[] show_stack+0x10/0x1c
[] dump_stack+0x78/0x98
[] warn_slowpath_common+0x98/0xd0
[] warn_slowpath_fmt+0x4c/0x58
[] sysfs_remove_group+0x9c/0xa4
[] dpm_sysfs_remove+0x54/0x94
[] device_del+0x58/0x24c
[] device_unregister+0x10/0x2c
[] bsg_unregister_queue+0xbc/0xf8
[] __scsi_remove_device+0x9c/0xbc
[] scsi_remove_device+0x44/0x64
[] scsi_remove_target+0x198/0x258
[] sas_rphy_remove+0x8c/0xb4
[] sas_rphy_delete+0x34/0x54
[] sas_destruct_devices+0x60/0x98
[] process_one_work+0x13c/0x344
[] worker_thread+0x13c/0x494
[] kthread+0xd8/0xf0
---[ end trace b69dffc64eb59f96 ]---




