On 03/16/2017 01:49 PM, James Bottomley wrote:
On Wed, 2017-03-15 at 19:39 -0400, Martin K. Petersen wrote:
Maurizio Lombardi <mlomb...@redhat.com> writes:

With multipath, it may happen that the same device is passed to
enclosure_add_device() multiple times and that the
enclosure_add_links() function fails to create the symlinks because
the device's sysfs directory entry is still NULL.  In this case,
the
links will never be created because all the subsequent calls to
enclosure_add_device() will immediately fail with EEXIST.
James?
Well I don't think the patch is the correct way to do this.  The
problem is that if we encounter an error creating the links, we
shouldn't add the device to the enclosure.  There's no need of a
links_created variable (see below).

However, more interesting is why the link creation failed in the first
place.  The device clearly seems to exist because it was added to sysfs
at time index 19.2 and the enclosure didn't try to use it until 60.0.
  Can you debug this a bit more, please?  I can't see anything specific
to multipath in the trace, so whatever this is looks like it could
happen in the single path case as well.

James

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 65fed71..ae89082 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device *edev, int 
component,
                         struct device *dev)
  {
        struct enclosure_component *cdev;
+       int err;
if (!edev || component >= edev->components)
                return -EINVAL;
@@ -384,12 +385,15 @@ int enclosure_add_device(struct enclosure_device *edev, 
int component,
        if (cdev->dev == dev)
                return -EEXIST;
- if (cdev->dev)
+       if (cdev->dev) {
                enclosure_remove_links(cdev);
-
-       put_device(cdev->dev);
-       cdev->dev = get_device(dev);
-       return enclosure_add_links(cdev);
+               put_device(cdev->dev);
+               cdev->dev = NULL;
+       }
+       err = enclosure_add_links(cdev);
+       if (!err)
+               cdev->dev = get_device(dev);
+       return err;
  }
  EXPORT_SYMBOL_GPL(enclosure_add_device);
After stumbling across the NULL pointer panic, I was able to use Maurizio's second patch below:

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 65fed71..6ac07ea 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
                         struct device *dev)
 {
        struct enclosure_component *cdev;
+       int err;

        if (!edev || component >= edev->components)
                return -EINVAL;
@@ -384,12 +385,17 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
        if (cdev->dev == dev)
                return -EEXIST;

-       if (cdev->dev)
+       if (cdev->dev) {
                enclosure_remove_links(cdev);
-
-       put_device(cdev->dev);
+               put_device(cdev->dev);
+       }
        cdev->dev = get_device(dev);
-       return enclosure_add_links(cdev);
+       err = enclosure_add_links(cdev);
+       if (err) {
+               cdev->dev = NULL;
+               put_device(cdev->dev);
+       }
+       return err;
 }
 EXPORT_SYMBOL_GPL(enclosure_add_device);


I am able to pass my testing with this patch. I don't see an official submit of this patch, but will respond to it when I see one. Again, I am seeing the problem even without multipath.

Reply via email to