Hi all,

I am seeing an issue while using an LSI-3008-based adapter (mpt3sas driver) on a PowerPC system (although I am not yet convinced it is architecture dependent). When I create a RAID1 volume, the physical disk devices are "hidden" as expected; however, the various kernel objects end up out of sync. The corresponding bits in the "sd_index_ida" bitmap get cleared, and the symlink in /sys/dev/block for the disk's major:minor pair is removed, but none of the other sysfs entries for that major:minor pair are removed.

The next time a new device is added (for example, during another RAID volume create or delete), the recently freed index, and with it the same major:minor number, is picked up again from the "sd_index_ida" bitmap. The attempt to create the new device's sysfs entries then fails with EEXIST, because an entry with the same name still exists. This failure goes unhandled, and later the kernel panics in sd_probe_async while dereferencing an (apparently) invalid backing_dev_info structure, presumably left invalid by the EEXIST error.

A reboot clears this up (both the bitmap and sysfs), and the second RAID volume (if a create was done) then shows up normally. However, even if the panic were avoided by better error handling in sd_probe_async, we would still be unable to create more than one RAID volume without rebooting.

I am wondering whether this issue has been seen elsewhere, and what might be going wrong. For mpt3sas, it appears that the firmware largely drives the hiding/exposing of devices, but I don't see a problem with the ordering of those events. I suspect the driver may be failing to set up the device attributes correctly, which the proper sysfs handling depends on.

I am seeing this on Ubuntu 16.04, but also see it on the upstream kernel. Oddly, it does not happen on RHEL 7.2 (an older kernel).

A possibly related issue we see is that when a RAID volume is deleted, none of its device nodes (in /dev or in /sys) are removed, although they are unusable. Deleting before creating does not produce the panic, so I believe the "sd_index_ida" bitmap is not being updated by the delete.


Any help would be appreciated.

Thanks,

Doug
