Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 04:58 PM, Jason Gunthorpe wrote:
> On Wed, Mar 01, 2017 at 03:49:04PM -0700, Logan Gunthorpe wrote:
> 
>> Seems to me like an elegant solution would be to implement a 'cdev_kill'
>> function which could kill all the processes using a cdev. Thus, during
>> an unbind, a driver could call it and be sure that there are no users
>> left and it can safely allow the devres unwind to continue. Then no
>> difficult and racy 'alive' flags would be necessary and it would be much
>> easier on drivers.
> 
> That could help, but this would mean cdev would have to insert a shim
> to grab locks around the various file ops.

Hmm, I was hoping something more along the lines of actually killing the
processes instead of just shimming away fops.

> AFAIK TPM is correct and has been robustly tested now. We have a 'vtpm'
> driver that agressively uses hot-unplug.

Ah, thanks for the explanation of how that works. I didn't notice the
semaphore usage.

Switchtec is a bit more tricky because a) there's no upper level driver
to handle things and b) userspace may be inside a wait_for_completion
(via read or poll) that needs to be completed. If a so called
'cdev_kill' could actually just kill these processes it would be a bit
easier.

Currently, in the Switchtec code, there's a timeout if the interrupt
doesn't fire (which occurs if the pci device has been torn down) and the
code will check an alive bit (under mutex protection) and error out if
it's not alive.

Because of how poll works, I don't see how I can just hold a semaphore
inside every fops call like the tpm code does.

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 04:58 PM, Jason Gunthorpe wrote:
> On Wed, Mar 01, 2017 at 03:49:04PM -0700, Logan Gunthorpe wrote:
> 
>> Seems to me like an elegant solution would be to implement a 'cdev_kill'
>> function which could kill all the processes using a cdev. Thus, during
>> an unbind, a driver could call it and be sure that there are no users
>> left and it can safely allow the devres unwind to continue. Then no
>> difficult and racy 'alive' flags would be necessary and it would be much
>> easier on drivers.
> 
> That could help, but this would mean cdev would have to insert a shim
> to grab locks around the various file ops.

Hmm, I was hoping something more along the lines of actually killing the
processes instead of just shimming away fops.

> AFAIK TPM is correct and has been robustly tested now. We have a 'vtpm'
> driver that agressively uses hot-unplug.

Ah, thanks for the explanation of how that works. I didn't notice the
semaphore usage.

Switchtec is a bit more tricky because a) there's no upper level driver
to handle things and b) userspace may be inside a wait_for_completion
(via read or poll) that needs to be completed. If a so called
'cdev_kill' could actually just kill these processes it would be a bit
easier.

Currently, in the Switchtec code, there's a timeout if the interrupt
doesn't fire (which occurs if the pci device has been torn down) and the
code will check an alive bit (under mutex protection) and error out if
it's not alive.

Because of how poll works, I don't see how I can just hold a semaphore
inside every fops call like the tpm code does.

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 05:23 PM, Logan Gunthorpe wrote:
> Because of how poll works, I don't see how I can just hold a semaphore
> inside every fops call like the tpm code does.

On 01/03/17 04:58 PM, Jason Gunthorpe wrote:
> Both infiniband and TPM have the 'detach' flavour of unplug where the
> user is not forced to close their open cdevs. For simpler cases you
> can just wait for all cdevs to close with a simple rwsem, much like
> sysfs does already during device_del.

Though, after reading your email again, a hard fence on close sounds
like a good idea. Tomorrow I'll post a v6 that uses that approach.


Thanks,

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 05:23 PM, Logan Gunthorpe wrote:
> Because of how poll works, I don't see how I can just hold a semaphore
> inside every fops call like the tpm code does.

On 01/03/17 04:58 PM, Jason Gunthorpe wrote:
> Both infiniband and TPM have the 'detach' flavour of unplug where the
> user is not forced to close their open cdevs. For simpler cases you
> can just wait for all cdevs to close with a simple rwsem, much like
> sysfs does already during device_del.

Though, after reading your email again, a hard fence on close sounds
like a good idea. Tomorrow I'll post a v6 that uses that approach.


Thanks,

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Bjorn Helgaas
On Tue, Feb 28, 2017 at 10:11:56AM -0700, Logan Gunthorpe wrote:
> 
> > This driver doesn't have anything to do with the PCI core, other than
> > using the pci_register_driver() interface (just like all other drivers
> > for PCI-connected devices), so drivers/pci doesn't really feel like
> > the right place for it.  Putting it in drivers/pci leads to the sort
> > of confusion you mentioned above ("To make this entirely clear ...").
> > 
> > Would drivers/perf or drivers/misc/switchtec/ be possible places for
> > it?
> 
> I made a similar argument when we made the decision of where to put the
> code. In the end, the device _is_ a PCI Switch and someone going through
> menuconfig or the source tree would probably look there first.
> 
> As for drivers/perf, our device does a fair bit more than performance
> counters so it doesn't seem like it really fits in there. drivers/misc
> just seems like a dumping ground which we'd prefer not to contribute to.
> We also considered drivers/char (seeing it exposes a char device), but
> that also seems like a dumping ground with stuff that belongs and other
> stuff that just ended up stuck between the cracks.
> 
> If you still feel strongly about this we can move it into misc, but I
> think from an organizational perspective pci/switch makes a bit more sense.

It's not a perfect fit in drivers/pci because it's not bus
infrastructure and I don't want to be the default maintainer of it,
but I agree there's not really an alternative that's clearly better,
so let's leave it where it is for now.

> In any case, I also wish we could have had this discussion 3 months ago
> when we posted the RFC and not when I have people pushing to get this
> merged.

You and me both!  I hope some of those same people are pushing to help
avoid the tragedy of the commons by helping review other
contributions.

Bjorn


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Bjorn Helgaas
On Tue, Feb 28, 2017 at 10:11:56AM -0700, Logan Gunthorpe wrote:
> 
> > This driver doesn't have anything to do with the PCI core, other than
> > using the pci_register_driver() interface (just like all other drivers
> > for PCI-connected devices), so drivers/pci doesn't really feel like
> > the right place for it.  Putting it in drivers/pci leads to the sort
> > of confusion you mentioned above ("To make this entirely clear ...").
> > 
> > Would drivers/perf or drivers/misc/switchtec/ be possible places for
> > it?
> 
> I made a similar argument when we made the decision of where to put the
> code. In the end, the device _is_ a PCI Switch and someone going through
> menuconfig or the source tree would probably look there first.
> 
> As for drivers/perf, our device does a fair bit more than performance
> counters so it doesn't seem like it really fits in there. drivers/misc
> just seems like a dumping ground which we'd prefer not to contribute to.
> We also considered drivers/char (seeing it exposes a char device), but
> that also seems like a dumping ground with stuff that belongs and other
> stuff that just ended up stuck between the cracks.
> 
> If you still feel strongly about this we can move it into misc, but I
> think from an organizational perspective pci/switch makes a bit more sense.

It's not a perfect fit in drivers/pci because it's not bus
infrastructure and I don't want to be the default maintainer of it,
but I agree there's not really an alternative that's clearly better,
so let's leave it where it is for now.

> In any case, I also wish we could have had this discussion 3 months ago
> when we posted the RFC and not when I have people pushing to get this
> merged.

You and me both!  I hope some of those same people are pushing to help
avoid the tragedy of the commons by helping review other
contributions.

Bjorn


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Jason Gunthorpe
On Wed, Mar 01, 2017 at 05:23:38PM -0700, Logan Gunthorpe wrote:

> > That could help, but this would mean cdev would have to insert a shim
> > to grab locks around the various file ops.
> 
> Hmm, I was hoping something more along the lines of actually killing the
> processes instead of just shimming away fops.

That would probably make most cdev users unhappy, it is not what we
want in tpm or infiniband, for instance.

> > AFAIK TPM is correct and has been robustly tested now. We have a 'vtpm'
> > driver that agressively uses hot-unplug.
> 
> Switchtec is a bit more tricky because a) there's no upper level driver
> to handle things

Introducing a light split between 'the upper part that owns the cdev'
and 'the lower part that owns the hardware' makes things much easier
to understand in a driver and it becomes clearer where, eg, devm
actions should be linked (ie probably not to the cdev part)

> and b) userspace may be inside a wait_for_completion (via read or
> poll) that needs to be completed. If a so called 'cdev_kill' could
> actually just kill these processes it would be a bit easier.

For TPM, poll could be something like:

static unsigned int tpm_poll(struct file *filp,
 struct poll_table_struct *wait)
{
   poll_wait(filp, >poll_wait, wait);
   if (tpm_try_get_ops(chip)) {
  mask = chip->ops->driver_do_poll(...);
  tpm_put_ops(chip);
   } else 
  mask = POLLIN | POLLRDHUP | POLLOUT | POLLERR | POLLHUP;
   return mask;
}

And we would trigger chip->poll_wait in the unregister.

wait_for_completion is similar, drop the rwsem while sleeping, add
'ops = NULL' to the sleeping condition test, trigger the wait on
unregister then reacquire the rwsem and test ops on wake.

Jason


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Jason Gunthorpe
On Wed, Mar 01, 2017 at 05:23:38PM -0700, Logan Gunthorpe wrote:

> > That could help, but this would mean cdev would have to insert a shim
> > to grab locks around the various file ops.
> 
> Hmm, I was hoping something more along the lines of actually killing the
> processes instead of just shimming away fops.

That would probably make most cdev users unhappy, it is not what we
want in tpm or infiniband, for instance.

> > AFAIK TPM is correct and has been robustly tested now. We have a 'vtpm'
> > driver that agressively uses hot-unplug.
> 
> Switchtec is a bit more tricky because a) there's no upper level driver
> to handle things

Introducing a light split between 'the upper part that owns the cdev'
and 'the lower part that owns the hardware' makes things much easier
to understand in a driver and it becomes clearer where, eg, devm
actions should be linked (ie probably not to the cdev part)

> and b) userspace may be inside a wait_for_completion (via read or
> poll) that needs to be completed. If a so called 'cdev_kill' could
> actually just kill these processes it would be a bit easier.

For TPM, poll could be something like:

static unsigned int tpm_poll(struct file *filp,
 struct poll_table_struct *wait)
{
   poll_wait(filp, >poll_wait, wait);
   if (tpm_try_get_ops(chip)) {
  mask = chip->ops->driver_do_poll(...);
  tpm_put_ops(chip);
   } else 
  mask = POLLIN | POLLRDHUP | POLLOUT | POLLERR | POLLHUP;
   return mask;
}

And we would trigger chip->poll_wait in the unregister.

wait_for_completion is similar, drop the rwsem while sleeping, add
'ops = NULL' to the sleeping condition test, trigger the wait on
unregister then reacquire the rwsem and test ops on wake.

Jason


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 05:32 PM, Bjorn Helgaas wrote:
> It's not a perfect fit in drivers/pci because it's not bus
> infrastructure and I don't want to be the default maintainer of it,
> but I agree there's not really an alternative that's clearly better,
> so let's leave it where it is for now.

Sounds good, thanks.

It would be nice if it could stay where it is for organization but go
through other maintainers trees (though I'm not sure who). I understand
why you wouldn't want to take on the maintenance of it. Hopefully,
myself and the other Microsemi developers will be able to do the bulk of
the maintenance work for the driver.

Thanks,

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 05:32 PM, Bjorn Helgaas wrote:
> It's not a perfect fit in drivers/pci because it's not bus
> infrastructure and I don't want to be the default maintainer of it,
> but I agree there's not really an alternative that's clearly better,
> so let's leave it where it is for now.

Sounds good, thanks.

It would be nice if it could stay where it is for organization but go
through other maintainers trees (though I'm not sure who). I understand
why you wouldn't want to take on the maintenance of it. Hopefully,
myself and the other Microsemi developers will be able to do the bulk of
the maintenance work for the driver.

Thanks,

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Jason Gunthorpe
On Wed, Mar 01, 2017 at 03:49:04PM -0700, Logan Gunthorpe wrote:

> Seems to me like an elegant solution would be to implement a 'cdev_kill'
> function which could kill all the processes using a cdev. Thus, during
> an unbind, a driver could call it and be sure that there are no users
> left and it can safely allow the devres unwind to continue. Then no
> difficult and racy 'alive' flags would be necessary and it would be much
> easier on drivers.

That could help, but this would mean cdev would have to insert a shim
to grab locks around the various file ops.

> > 3) A couple of drivers drivers/char/tpm doesn't seem to have any
> > protection at all and appears like they would continue to use io
> > operations even after the they may get unmapped because the char device
> > persists.

AFAIK TPM is correct and has been robustly tested now. We have a 'vtpm'
driver that agressively uses hot-unplug.

Firstly, TPM correctly tracks krefs on the 'chip', so the pointer is
always valid.

Next, anytime someone has a 'chip' pointer and wants to execute a TPM
operation they have to call tpm_try_get_ops and hold that lock for the
duration of their usage. Internally it does this:

down_read(>ops_sem);
if (!chip->ops)
goto out_lock; // FAILURE

In the cdev fops case this will cause the fop to return EPIPE if it
fails. cdev holds this read side lock while it is running TPM commands
inside the driver.

This meshes with this code called by tpm_chip_unregister():

down_write(>ops_sem);
chip->ops = NULL;
up_write(>ops_sem);

tpm_chip_unregister uses this to provide a strong fence guarentee to
the driver. After tpm_chip_unregister return:
 - Nothing is running in a TPM ops callback currently
 - Nothing will ever call a TPM ops callback in future

This allows the TPM drivers to be trivial: call tpm_chip_unregister(),
kill the IRQ, and unmap the io resource, unload the module code.

TPM drivers cannot associate HW resources to the 'chip' struct device,
as it is unwound and devm triggered *during* tpm_chip_unregister and
does not enjoy the strong fence guarantee.

infiniband has a similar 'strong fence' unregister function. I think
it is best-practice for subsystem design to provide that because
drivers will never get it right on their own :(

Both infiniband and TPM have the 'detach' flavour of unplug where the
user is not forced to close their open cdevs. For simpler cases you
can just wait for all cdevs to close with a simple rwsem, much like
sysfs does already during device_del.

Jason


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Jason Gunthorpe
On Wed, Mar 01, 2017 at 03:49:04PM -0700, Logan Gunthorpe wrote:

> Seems to me like an elegant solution would be to implement a 'cdev_kill'
> function which could kill all the processes using a cdev. Thus, during
> an unbind, a driver could call it and be sure that there are no users
> left and it can safely allow the devres unwind to continue. Then no
> difficult and racy 'alive' flags would be necessary and it would be much
> easier on drivers.

That could help, but this would mean cdev would have to insert a shim
to grab locks around the various file ops.

> > 3) A couple of drivers drivers/char/tpm doesn't seem to have any
> > protection at all and appears like they would continue to use io
> > operations even after the they may get unmapped because the char device
> > persists.

AFAIK TPM is correct and has been robustly tested now. We have a 'vtpm'
driver that agressively uses hot-unplug.

Firstly, TPM correctly tracks krefs on the 'chip', so the pointer is
always valid.

Next, anytime someone has a 'chip' pointer and wants to execute a TPM
operation they have to call tpm_try_get_ops and hold that lock for the
duration of their usage. Internally it does this:

down_read(>ops_sem);
if (!chip->ops)
goto out_lock; // FAILURE

In the cdev fops case this will cause the fop to return EPIPE if it
fails. cdev holds this read side lock while it is running TPM commands
inside the driver.

This meshes with this code called by tpm_chip_unregister():

down_write(>ops_sem);
chip->ops = NULL;
up_write(>ops_sem);

tpm_chip_unregister uses this to provide a strong fence guarentee to
the driver. After tpm_chip_unregister return:
 - Nothing is running in a TPM ops callback currently
 - Nothing will ever call a TPM ops callback in future

This allows the TPM drivers to be trivial: call tpm_chip_unregister(),
kill the IRQ, and unmap the io resource, unload the module code.

TPM drivers cannot associate HW resources to the 'chip' struct device,
as it is unwound and devm triggered *during* tpm_chip_unregister and
does not enjoy the strong fence guarantee.

infiniband has a similar 'strong fence' unregister function. I think
it is best-practice for subsystem design to provide that because
drivers will never get it right on their own :(

Both infiniband and TPM have the 'detach' flavour of unplug where the
user is not forced to close their open cdevs. For simpler cases you
can just wait for all cdevs to close with a simple rwsem, much like
sysfs does already during device_del.

Jason


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 03:59 PM, Keith Busch wrote:
> Okay, I see. Was mistakenliy looking at v4. The v5 looks right.

Great! Thanks for reviewing that for me.

Logan



Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 03:59 PM, Keith Busch wrote:
> Okay, I see. Was mistakenliy looking at v4. The v5 looks right.

Great! Thanks for reviewing that for me.

Logan



Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 03:26 PM, Keith Busch wrote:
> I think this is from using the managed device resource API to request the
> irq actions. The scope of the resource used to be tied to the pci_dev's
> dev, but now it's the new switchec class dev, which has a different
> lifetime while open references exist, so it's not releasing the irq's.

The scope of the IRQ was originally tied to the pci_dev. Then in v4 I
tied it to the switchtec device in order to try and keep using the pci
device after unbind. This didn't work, so I switched it back to using
the pci_dev. (This seems to be the way most drivers work anyway.)


> One thing about the BUG_ON that is confusing me is how it's getting
> to free_msi_irq's BUG in v4 or v5. I don't see any part releasing the
> allocated ones. Maybe the devres API is harder to use than having the
> driver manage all the resources...

free_msi_irqs seems to be called via pci_disable_device in pcim_release
which devres will call during release of the PCI device and before all
the references to the pci_dev are freed (I tried adding an extra
get_device which gets put in the child devices release -- this didn't work):

 [ 1079.845616] Call Trace:
 [ 1079.845652]  ? pcim_release+0x35/0x96
 [ 1079.845691]  ? release_nodes+0x15b/0x17c
 [ 1079.845730]  ? device_release_driver_internal+0x12d/0x1cb
 [ 1079.845771]  ? unbind_store+0x59/0x89
 [ 1079.845809]  ? kernfs_fop_write+0xe7/0x129
 [ 1079.845847]  ? __vfs_write+0x1c/0xa2
 [ 1079.845885]  ? kmem_cache_alloc+0xc5/0x131
 [ 1079.845923]  ? fput+0xd/0x7d
 [ 1079.845958]  ? filp_close+0x5a/0x61
 [ 1079.845993]  ? vfs_write+0xa2/0xe4
 [ 1079.846028]  ? SyS_write+0x48/0x73
 [ 1079.846066]  ? entry_SYSCALL_64_fastpath+0x13/0x94

v5 is correct because it registers the irqs against the pci_dev (with
devm_request_irq) and thus they get freed in time as part of the devres
unwind.

Logan




Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 03:26 PM, Keith Busch wrote:
> I think this is from using the managed device resource API to request the
> irq actions. The scope of the resource used to be tied to the pci_dev's
> dev, but now it's the new switchec class dev, which has a different
> lifetime while open references exist, so it's not releasing the irq's.

The scope of the IRQ was originally tied to the pci_dev. Then in v4 I
tied it to the switchtec device in order to try and keep using the pci
device after unbind. This didn't work, so I switched it back to using
the pci_dev. (This seems to be the way most drivers work anyway.)


> One thing about the BUG_ON that is confusing me is how it's getting
> to free_msi_irq's BUG in v4 or v5. I don't see any part releasing the
> allocated ones. Maybe the devres API is harder to use than having the
> driver manage all the resources...

free_msi_irqs seems to be called via pci_disable_device in pcim_release
which devres will call during release of the PCI device and before all
the references to the pci_dev are freed (I tried adding an extra
get_device which gets put in the child devices release -- this didn't work):

 [ 1079.845616] Call Trace:
 [ 1079.845652]  ? pcim_release+0x35/0x96
 [ 1079.845691]  ? release_nodes+0x15b/0x17c
 [ 1079.845730]  ? device_release_driver_internal+0x12d/0x1cb
 [ 1079.845771]  ? unbind_store+0x59/0x89
 [ 1079.845809]  ? kernfs_fop_write+0xe7/0x129
 [ 1079.845847]  ? __vfs_write+0x1c/0xa2
 [ 1079.845885]  ? kmem_cache_alloc+0xc5/0x131
 [ 1079.845923]  ? fput+0xd/0x7d
 [ 1079.845958]  ? filp_close+0x5a/0x61
 [ 1079.845993]  ? vfs_write+0xa2/0xe4
 [ 1079.846028]  ? SyS_write+0x48/0x73
 [ 1079.846066]  ? entry_SYSCALL_64_fastpath+0x13/0x94

v5 is correct because it registers the irqs against the pci_dev (with
devm_request_irq) and thus they get freed in time as part of the devres
unwind.

Logan




Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Keith Busch
On Wed, Mar 01, 2017 at 03:37:03PM -0700, Logan Gunthorpe wrote:
> On 01/03/17 03:26 PM, Keith Busch wrote:
> > I think this is from using the managed device resource API to request the
> > irq actions. The scope of the resource used to be tied to the pci_dev's
> > dev, but now it's the new switchec class dev, which has a different
> > lifetime while open references exist, so it's not releasing the irq's.
> 
> The scope of the IRQ was originally tied to the pci_dev. Then in v4 I
> tied it to the switchtec device in order to try and keep using the pci
> device after unbind. This didn't work, so I switched it back to using
> the pci_dev. (This seems to be the way most drivers work anyway.)

Okay, I see. Was mistakenliy looking at v4. The v5 looks right.


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Keith Busch
On Wed, Mar 01, 2017 at 03:37:03PM -0700, Logan Gunthorpe wrote:
> On 01/03/17 03:26 PM, Keith Busch wrote:
> > I think this is from using the managed device resource API to request the
> > irq actions. The scope of the resource used to be tied to the pci_dev's
> > dev, but now it's the new switchec class dev, which has a different
> > lifetime while open references exist, so it's not releasing the irq's.
> 
> The scope of the IRQ was originally tied to the pci_dev. Then in v4 I
> tied it to the switchtec device in order to try and keep using the pci
> device after unbind. This didn't work, so I switched it back to using
> the pci_dev. (This seems to be the way most drivers work anyway.)

Okay, I see. Was mistakenliy looking at v4. The v5 looks right.


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe
Hey,

Seems to me like an elegant solution would be to implement a 'cdev_kill'
function which could kill all the processes using a cdev. Thus, during
an unbind, a driver could call it and be sure that there are no users
left and it can safely allow the devres unwind to continue. Then no
difficult and racy 'alive' flags would be necessary and it would be much
easier on drivers.

However, I don't think any such thing exists at the moment and it's not
likely to be done in the near term. I'm reasonably confident in the
correctness of v5 of my driver (especially when compared to other
drivers) and unless someone can describe how it's wrong or a better
solution I'd rather see it merged as is. If and when a better approach
arrives I'd happily patch it to improve the situation.

Logan

On 01/03/17 03:24 PM, Logan Gunthorpe wrote:
> 
> 
> On 01/03/17 02:41 PM, Bjorn Helgaas wrote:
>> I don't think this is indicating a bug in the PCI core (although I do
>> think a BUG_ON() here is an excessive response).  I think it's an
>> indication that the driver didn't disconnect its ISR.  Without more
>> details of the failure it's hard to tell if the BUG_ON is a symptom of
>> a problem in the driver or what.
> 
> Yes, my assumption was that when you force an unbind on the PCI core,
> it's designed to stop using the PCI device right away even if there are
> users using it. Thus it becomes the drivers responsibility to handle
> this situation.
> 
>> An "alive" flag feels racy, and I can't tell if it's really the best
>> way to deal with this, or if it's just avoiding the issue.  There must
>> be other drivers with the same cleanup issue -- do they handle it the
>> same way?
> 
> I haven't done a comprehensive search, but it's very common for people
> to use (and this is what I've adopted again in v5):
> 
> devm_request_irq(>dev, ...)
> 
> In this way, the IRQs are released with the pci_dev (or often platform)
> and thus the BUG_ON never hits. However, it means any user space program
> waiting on an IRQ (like via a cdev call) will hang unless handled with
> other means. Exactly what those means are seems driver specific and not
> always obvious. I wouldn't be surprised if a lot of drivers get this
> aspect wrong.
> 
> A couple examples I've looked at:
> 
> 1) drivers/dax/dax.c uses an alive flag without any mutexes, atomics or
> anything. So I don't know if it's racy or perhaps correct for other reasons.
> 
> 2) drivers/char/hw_random has a drop_current_rng that looks like it
> could easily be racy with the get_current_rng in the userspace flow.
> 
> 3) A couple of drivers drivers/char/tpm doesn't seem to have any
> protection at all and appears like they would continue to use io
> operations even after the they may get unmapped because the char device
> persists.
> 
> So I'm not sure where you'd find a driver that does it correctly and in
> a simpler way..
> 
> Another thing: based on comments in [1], a lot of people don't seem to
> realize that cdev instances can persist long after cdev_del so it's
> probably very common for drivers to get this wrong.
> 
> Logan
> 
> [1] https://lists.01.org/pipermail/linux-nvdimm/2017-February/009001.html




Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe
Hey,

Seems to me like an elegant solution would be to implement a 'cdev_kill'
function which could kill all the processes using a cdev. Thus, during
an unbind, a driver could call it and be sure that there are no users
left and it can safely allow the devres unwind to continue. Then no
difficult and racy 'alive' flags would be necessary and it would be much
easier on drivers.

However, I don't think any such thing exists at the moment and it's not
likely to be done in the near term. I'm reasonably confident in the
correctness of v5 of my driver (especially when compared to other
drivers) and unless someone can describe how it's wrong or a better
solution I'd rather see it merged as is. If and when a better approach
arrives I'd happily patch it to improve the situation.

Logan

On 01/03/17 03:24 PM, Logan Gunthorpe wrote:
> 
> 
> On 01/03/17 02:41 PM, Bjorn Helgaas wrote:
>> I don't think this is indicating a bug in the PCI core (although I do
>> think a BUG_ON() here is an excessive response).  I think it's an
>> indication that the driver didn't disconnect its ISR.  Without more
>> details of the failure it's hard to tell if the BUG_ON is a symptom of
>> a problem in the driver or what.
> 
> Yes, my assumption was that when you force an unbind on the PCI core,
> it's designed to stop using the PCI device right away even if there are
> users using it. Thus it becomes the drivers responsibility to handle
> this situation.
> 
>> An "alive" flag feels racy, and I can't tell if it's really the best
>> way to deal with this, or if it's just avoiding the issue.  There must
>> be other drivers with the same cleanup issue -- do they handle it the
>> same way?
> 
> I haven't done a comprehensive search, but it's very common for people
> to use (and this is what I've adopted again in v5):
> 
> devm_request_irq(>dev, ...)
> 
> In this way, the IRQs are released with the pci_dev (or often platform)
> and thus the BUG_ON never hits. However, it means any user space program
> waiting on an IRQ (like via a cdev call) will hang unless handled with
> other means. Exactly what those means are seems driver specific and not
> always obvious. I wouldn't be surprised if a lot of drivers get this
> aspect wrong.
> 
> A couple examples I've looked at:
> 
> 1) drivers/dax/dax.c uses an alive flag without any mutexes, atomics or
> anything. So I don't know if it's racy or perhaps correct for other reasons.
> 
> 2) drivers/char/hw_random has a drop_current_rng that looks like it
> could easily be racy with the get_current_rng in the userspace flow.
> 
> 3) A couple of drivers drivers/char/tpm doesn't seem to have any
> protection at all and appears like they would continue to use io
> operations even after the they may get unmapped because the char device
> persists.
> 
> So I'm not sure where you'd find a driver that does it correctly and in
> a simpler way..
> 
> Another thing: based on comments in [1], a lot of people don't seem to
> realize that cdev instances can persist long after cdev_del so it's
> probably very common for drivers to get this wrong.
> 
> Logan
> 
> [1] https://lists.01.org/pipermail/linux-nvdimm/2017-February/009001.html




Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Keith Busch
On Wed, Mar 01, 2017 at 03:41:20PM -0600, Bjorn Helgaas wrote:
> On Sat, Feb 25, 2017 at 11:53:13PM -0700, Logan Gunthorpe wrote:
> > Changes since v4:
> > 
> > * Turns out pushing the pci release code into the device release
> >   function didn't work as I would have liked. If you try to unbind the
> >   device with an instance open, then you hit a kernel bug at
> >   drivers/pci/msi.c:371. (This didn't occur in v3.)
> 
> This is in free_msi_irqs():
> 
>   368for_each_pci_msi_entry(entry, dev)
>   369if (entry->irq)
>   370for (i = 0; i < entry->nvec_used; i++)
>   371BUG_ON(irq_has_action(entry->irq + i));
> 
> I don't think this is indicating a bug in the PCI core (although I do
> think a BUG_ON() here is an excessive response).  I think it's an
> indication that the driver didn't disconnect its ISR.  Without more
> details of the failure it's hard to tell if the BUG_ON is a symptom of
> a problem in the driver or what.
> 
> An "alive" flag feels racy, and I can't tell if it's really the best
> way to deal with this, or if it's just avoiding the issue.  There must
> be other drivers with the same cleanup issue -- do they handle it the
> same way?

I think this is from using the managed device resource API to request the
irq actions. The scope of the resource used to be tied to the pci_dev's
dev, but now it's the new switchec class dev, which has a different
lifetime while open references exist, so it's not releasing the irq's.

One thing about the BUG_ON that is confusing me is how it's getting
to free_msi_irq's BUG in v4 or v5. I don't see any part releasing the
allocated ones. Maybe the devres API is harder to use than having the
driver manage all the resources...


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Keith Busch
On Wed, Mar 01, 2017 at 03:41:20PM -0600, Bjorn Helgaas wrote:
> On Sat, Feb 25, 2017 at 11:53:13PM -0700, Logan Gunthorpe wrote:
> > Changes since v4:
> > 
> > * Turns out pushing the pci release code into the device release
> >   function didn't work as I would have liked. If you try to unbind the
> >   device with an instance open, then you hit a kernel bug at
> >   drivers/pci/msi.c:371. (This didn't occur in v3.)
> 
> This is in free_msi_irqs():
> 
>   368for_each_pci_msi_entry(entry, dev)
>   369if (entry->irq)
>   370for (i = 0; i < entry->nvec_used; i++)
>   371BUG_ON(irq_has_action(entry->irq + i));
> 
> I don't think this is indicating a bug in the PCI core (although I do
> think a BUG_ON() here is an excessive response).  I think it's an
> indication that the driver didn't disconnect its ISR.  Without more
> details of the failure it's hard to tell if the BUG_ON is a symptom of
> a problem in the driver or what.
> 
> An "alive" flag feels racy, and I can't tell if it's really the best
> way to deal with this, or if it's just avoiding the issue.  There must
> be other drivers with the same cleanup issue -- do they handle it the
> same way?

I think this is from using the managed device resource API to request the
irq actions. The scope of the resource used to be tied to the pci_dev's
dev, but now it's the new switchec class dev, which has a different
lifetime while open references exist, so it's not releasing the irq's.

One thing about the BUG_ON that is confusing me is how it's getting
to free_msi_irq's BUG in v4 or v5. I don't see any part releasing the
allocated ones. Maybe the devres API is harder to use than having the
driver manage all the resources...


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 02:41 PM, Bjorn Helgaas wrote:
> I don't think this is indicating a bug in the PCI core (although I do
> think a BUG_ON() here is an excessive response).  I think it's an
> indication that the driver didn't disconnect its ISR.  Without more
> details of the failure it's hard to tell if the BUG_ON is a symptom of
> a problem in the driver or what.

Yes, my assumption was that when you force an unbind on the PCI core,
it's designed to stop using the PCI device right away even if there are
users using it. Thus it becomes the drivers responsibility to handle
this situation.

> An "alive" flag feels racy, and I can't tell if it's really the best
> way to deal with this, or if it's just avoiding the issue.  There must
> be other drivers with the same cleanup issue -- do they handle it the
> same way?

I haven't done a comprehensive search, but it's very common for people
to use (and this is what I've adopted again in v5):

devm_request_irq(>dev, ...)

In this way, the IRQs are released with the pci_dev (or often platform)
and thus the BUG_ON never hits. However, it means any user space program
waiting on an IRQ (like via a cdev call) will hang unless handled with
other means. Exactly what those means are seems driver specific and not
always obvious. I wouldn't be surprised if a lot of drivers get this
aspect wrong.

A couple examples I've looked at:

1) drivers/dax/dax.c uses an alive flag without any mutexes, atomics or
anything. So I don't know if it's racy or perhaps correct for other reasons.

2) drivers/char/hw_random has a drop_current_rng that looks like it
could easily be racy with the get_current_rng in the userspace flow.

3) A couple of drivers drivers/char/tpm doesn't seem to have any
protection at all and appears like they would continue to use io
operations even after the they may get unmapped because the char device
persists.

So I'm not sure where you'd find a driver that does it correctly and in
a simpler way..

Another thing: based on comments in [1], a lot of people don't seem to
realize that cdev instances can persist long after cdev_del so it's
probably very common for drivers to get this wrong.

Logan

[1] https://lists.01.org/pipermail/linux-nvdimm/2017-February/009001.html



>>   To solve this, we've moved the pci release code back into the
>>   unregister function and reintroduced an alive flag. This time,
>>   however, the alive flag is protected by mrpc_mutex and we're very
>>   careful about what happens to devices still in use (they should
>>   all be released through the timeout path and an ENODEV error
>>   returned to userspace; while new commands are blocked with the
>>   same error).


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Logan Gunthorpe


On 01/03/17 02:41 PM, Bjorn Helgaas wrote:
> I don't think this is indicating a bug in the PCI core (although I do
> think a BUG_ON() here is an excessive response).  I think it's an
> indication that the driver didn't disconnect its ISR.  Without more
> details of the failure it's hard to tell if the BUG_ON is a symptom of
> a problem in the driver or what.

Yes, my assumption was that when you force an unbind on the PCI core,
it's designed to stop using the PCI device right away even if there are
users using it. Thus it becomes the drivers responsibility to handle
this situation.

> An "alive" flag feels racy, and I can't tell if it's really the best
> way to deal with this, or if it's just avoiding the issue.  There must
> be other drivers with the same cleanup issue -- do they handle it the
> same way?

I haven't done a comprehensive search, but it's very common for people
to use (and this is what I've adopted again in v5):

devm_request_irq(>dev, ...)

In this way, the IRQs are released with the pci_dev (or often platform)
and thus the BUG_ON never hits. However, it means any user space program
waiting on an IRQ (like via a cdev call) will hang unless handled with
other means. Exactly what those means are seems driver specific and not
always obvious. I wouldn't be surprised if a lot of drivers get this
aspect wrong.

A couple examples I've looked at:

1) drivers/dax/dax.c uses an alive flag without any mutexes, atomics or
anything. So I don't know if it's racy or perhaps correct for other reasons.

2) drivers/char/hw_random has a drop_current_rng that looks like it
could easily be racy with the get_current_rng in the userspace flow.

3) A couple of drivers drivers/char/tpm doesn't seem to have any
protection at all and appears like they would continue to use io
operations even after the they may get unmapped because the char device
persists.

So I'm not sure where you'd find a driver that does it correctly and in
a simpler way..

Another thing: based on comments in [1], a lot of people don't seem to
realize that cdev instances can persist long after cdev_del so it's
probably very common for drivers to get this wrong.

Logan

[1] https://lists.01.org/pipermail/linux-nvdimm/2017-February/009001.html



>>   To solve this, we've moved the pci release code back into the
>>   unregister function and reintroduced an alive flag. This time,
>>   however, the alive flag is protected by mrpc_mutex and we're very
>>   careful about what happens to devices still in use (they should
>>   all be released through the timeout path and an ENODEV error
>>   returned to userspace; while new commands are blocked with the
>>   same error).


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Bjorn Helgaas
On Sat, Feb 25, 2017 at 11:53:13PM -0700, Logan Gunthorpe wrote:
> Changes since v4:
> 
> * Turns out pushing the pci release code into the device release
>   function didn't work as I would have liked. If you try to unbind the
>   device with an instance open, then you hit a kernel bug at
>   drivers/pci/msi.c:371. (This didn't occur in v3.)

This is in free_msi_irqs():

  368for_each_pci_msi_entry(entry, dev)
  369if (entry->irq)
  370for (i = 0; i < entry->nvec_used; i++)
  371BUG_ON(irq_has_action(entry->irq + i));

I don't think this is indicating a bug in the PCI core (although I do
think a BUG_ON() here is an excessive response).  I think it's an
indication that the driver didn't disconnect its ISR.  Without more
details of the failure it's hard to tell if the BUG_ON is a symptom of
a problem in the driver or what.

An "alive" flag feels racy, and I can't tell if it's really the best
way to deal with this, or if it's just avoiding the issue.  There must
be other drivers with the same cleanup issue -- do they handle it the
same way?

>   To solve this, we've moved the pci release code back into the
>   unregister function and reintroduced an alive flag. This time,
>   however, the alive flag is protected by mrpc_mutex and we're very
>   careful about what happens to devices still in use (they should
>   all be released through the timeout path and an ENODEV error
>   returned to userspace; while new commands are blocked with the
>   same error).


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-03-01 Thread Bjorn Helgaas
On Sat, Feb 25, 2017 at 11:53:13PM -0700, Logan Gunthorpe wrote:
> Changes since v4:
> 
> * Turns out pushing the pci release code into the device release
>   function didn't work as I would have liked. If you try to unbind the
>   device with an instance open, then you hit a kernel bug at
>   drivers/pci/msi.c:371. (This didn't occur in v3.)

This is in free_msi_irqs():

  368for_each_pci_msi_entry(entry, dev)
  369if (entry->irq)
  370for (i = 0; i < entry->nvec_used; i++)
  371BUG_ON(irq_has_action(entry->irq + i));

I don't think this is indicating a bug in the PCI core (although I do
think a BUG_ON() here is an excessive response).  I think it's an
indication that the driver didn't disconnect its ISR.  Without more
details of the failure it's hard to tell if the BUG_ON is a symptom of
a problem in the driver or what.

An "alive" flag feels racy, and I can't tell if it's really the best
way to deal with this, or if it's just avoiding the issue.  There must
be other drivers with the same cleanup issue -- do they handle it the
same way?

>   To solve this, we've moved the pci release code back into the
>   unregister function and reintroduced an alive flag. This time,
>   however, the alive flag is protected by mrpc_mutex and we're very
>   careful about what happens to devices still in use (they should
>   all be released through the timeout path and an ENODEV error
>   returned to userspace; while new commands are blocked with the
>   same error).


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-02-28 Thread Greg Kroah-Hartman
On Tue, Feb 28, 2017 at 10:11:56AM -0700, Logan Gunthorpe wrote:
> In any case, I also wish we could have had this discussion 3 months ago
> when we posted the RFC and not when I have people pushing to get this
> merged.

Relax, there is no rush here, no deadlines, etc.  Moving the files to a
different directly takes all of 5 minutes, max.

mellow out dude,

greg k-h


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-02-28 Thread Greg Kroah-Hartman
On Tue, Feb 28, 2017 at 10:11:56AM -0700, Logan Gunthorpe wrote:
> In any case, I also wish we could have had this discussion 3 months ago
> when we posted the RFC and not when I have people pushing to get this
> merged.

Relax, there is no rush here, no deadlines, etc.  Moving the files to a
different directly takes all of 5 minutes, max.

mellow out dude,

greg k-h


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-02-28 Thread Logan Gunthorpe

> This driver doesn't have anything to do with the PCI core, other than
> using the pci_register_driver() interface (just like all other drivers
> for PCI-connected devices), so drivers/pci doesn't really feel like
> the right place for it.  Putting it in drivers/pci leads to the sort
> of confusion you mentioned above ("To make this entirely clear ...").
> 
> Would drivers/perf or drivers/misc/switchtec/ be possible places for
> it?

I made a similar argument when we made the decision of where to put the
code. In the end, the device _is_ a PCI Switch and someone going through
menuconfig or the source tree would probably look there first.

As for drivers/perf, our device does a fair bit more than performance
counters so it doesn't seem like it really fits in there. drivers/misc
just seems like a dumping ground which we'd prefer not to contribute to.
We also considered drivers/char (seeing it exposes a char device), but
that also seems like a dumping ground with stuff that belongs and other
stuff that just ended up stuck between the cracks.

If you still feel strongly about this we can move it into misc, but I
think from an organizational perspective pci/switch makes a bit more sense.

In any case, I also wish we could have had this discussion 3 months ago
when we posted the RFC and not when I have people pushing to get this
merged.

Thanks,

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-02-28 Thread Logan Gunthorpe

> This driver doesn't have anything to do with the PCI core, other than
> using the pci_register_driver() interface (just like all other drivers
> for PCI-connected devices), so drivers/pci doesn't really feel like
> the right place for it.  Putting it in drivers/pci leads to the sort
> of confusion you mentioned above ("To make this entirely clear ...").
> 
> Would drivers/perf or drivers/misc/switchtec/ be possible places for
> it?

I made a similar argument when we made the decision of where to put the
code. In the end, the device _is_ a PCI Switch and someone going through
menuconfig or the source tree would probably look there first.

As for drivers/perf, our device does a fair bit more than performance
counters so it doesn't seem like it really fits in there. drivers/misc
just seems like a dumping ground which we'd prefer not to contribute to.
We also considered drivers/char (seeing it exposes a char device), but
that also seems like a dumping ground with stuff that belongs and other
stuff that just ended up stuck between the cracks.

If you still feel strongly about this we can move it into misc, but I
think from an organizational perspective pci/switch makes a bit more sense.

In any case, I also wish we could have had this discussion 3 months ago
when we posted the RFC and not when I have people pushing to get this
merged.

Thanks,

Logan


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-02-28 Thread Bjorn Helgaas
Hi Logan,

On Sat, Feb 25, 2017 at 11:53:13PM -0700, Logan Gunthorpe wrote:
> ...
> This is a continuation of the RFC we posted lasted month [1] which
> proposes a management driver for Microsemi's Switchtec line of PCI
> switches. This hardware is still looking to be used in the Open
> Compute Platform
> 
> To make this entirely clear: the Switchtec products are compliant
> with the PCI specifications and are supported today with the standard
> in-kernel driver. However, these devices also expose a management endpoint
> on a separate PCI function address which can be used to perform some
> advanced operations. This is a driver for that function. See the patch
> for more information.
> 
> Since the RFC, we've made the changes requested by Greg Kroah-Hartman
> and Keith Busch, and we've also fleshed out a number of features. We've
> added a couple of IOCTLs and sysfs attributes which are documented in
> the patch. Significant work has also been done on the userspace tool
> which is available under a GPL license at [2]. We've also had testing
> done by some of the interested parties.
> 
> We hope to see this work included in either 4.11 or 4.12 assuming a
> smooth review process.
> 
> The patch is based off of the v4.10-rc6 release.
> 
> Thanks for your review,
> 
> Logan
> 
> [1] https://www.spinics.net/lists/linux-pci/msg56897.html
> [2] https://github.com/sbates130272/switchtec-user
> 
> Logan Gunthorpe (4):
>   MicroSemi Switchtec management interface driver
>   switchtec: Add user interface documentation
>   switchtec: Add sysfs attributes to the Switchtec driver
>   switchtec: Add IOCTLs to the Switchtec driver
> 
>  Documentation/ABI/testing/sysfs-class-switchtec |   96 ++
>  Documentation/ioctl/ioctl-number.txt|1 +
>  Documentation/switchtec.txt |   80 ++
>  MAINTAINERS |   11 +
>  drivers/pci/Kconfig |1 +
>  drivers/pci/Makefile|1 +
>  drivers/pci/switch/Kconfig  |   13 +
>  drivers/pci/switch/Makefile |1 +
>  drivers/pci/switch/switchtec.c  | 1535 
> +++
>  include/uapi/linux/switchtec_ioctl.h|  132 ++
>  10 files changed, 1871 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-switchtec
>  create mode 100644 Documentation/switchtec.txt
>  create mode 100644 drivers/pci/switch/Kconfig
>  create mode 100644 drivers/pci/switch/Makefile
>  create mode 100644 drivers/pci/switch/switchtec.c
>  create mode 100644 include/uapi/linux/switchtec_ioctl.h

This driver doesn't have anything to do with the PCI core, other than
using the pci_register_driver() interface (just like all other drivers
for PCI-connected devices), so drivers/pci doesn't really feel like
the right place for it.  Putting it in drivers/pci leads to the sort
of confusion you mentioned above ("To make this entirely clear ...").

Would drivers/perf or drivers/misc/switchtec/ be possible places for
it?

Bjorn


Re: [PATCH v5 0/4] New Microsemi PCI Switch Management Driver

2017-02-28 Thread Bjorn Helgaas
Hi Logan,

On Sat, Feb 25, 2017 at 11:53:13PM -0700, Logan Gunthorpe wrote:
> ...
> This is a continuation of the RFC we posted lasted month [1] which
> proposes a management driver for Microsemi's Switchtec line of PCI
> switches. This hardware is still looking to be used in the Open
> Compute Platform
> 
> To make this entirely clear: the Switchtec products are compliant
> with the PCI specifications and are supported today with the standard
> in-kernel driver. However, these devices also expose a management endpoint
> on a separate PCI function address which can be used to perform some
> advanced operations. This is a driver for that function. See the patch
> for more information.
> 
> Since the RFC, we've made the changes requested by Greg Kroah-Hartman
> and Keith Busch, and we've also fleshed out a number of features. We've
> added a couple of IOCTLs and sysfs attributes which are documented in
> the patch. Significant work has also been done on the userspace tool
> which is available under a GPL license at [2]. We've also had testing
> done by some of the interested parties.
> 
> We hope to see this work included in either 4.11 or 4.12 assuming a
> smooth review process.
> 
> The patch is based off of the v4.10-rc6 release.
> 
> Thanks for your review,
> 
> Logan
> 
> [1] https://www.spinics.net/lists/linux-pci/msg56897.html
> [2] https://github.com/sbates130272/switchtec-user
> 
> Logan Gunthorpe (4):
>   MicroSemi Switchtec management interface driver
>   switchtec: Add user interface documentation
>   switchtec: Add sysfs attributes to the Switchtec driver
>   switchtec: Add IOCTLs to the Switchtec driver
> 
>  Documentation/ABI/testing/sysfs-class-switchtec |   96 ++
>  Documentation/ioctl/ioctl-number.txt|1 +
>  Documentation/switchtec.txt |   80 ++
>  MAINTAINERS |   11 +
>  drivers/pci/Kconfig |1 +
>  drivers/pci/Makefile|1 +
>  drivers/pci/switch/Kconfig  |   13 +
>  drivers/pci/switch/Makefile |1 +
>  drivers/pci/switch/switchtec.c  | 1535 
> +++
>  include/uapi/linux/switchtec_ioctl.h|  132 ++
>  10 files changed, 1871 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-switchtec
>  create mode 100644 Documentation/switchtec.txt
>  create mode 100644 drivers/pci/switch/Kconfig
>  create mode 100644 drivers/pci/switch/Makefile
>  create mode 100644 drivers/pci/switch/switchtec.c
>  create mode 100644 include/uapi/linux/switchtec_ioctl.h

This driver doesn't have anything to do with the PCI core, other than
using the pci_register_driver() interface (just like all other drivers
for PCI-connected devices), so drivers/pci doesn't really feel like
the right place for it.  Putting it in drivers/pci leads to the sort
of confusion you mentioned above ("To make this entirely clear ...").

Would drivers/perf or drivers/misc/switchtec/ be possible places for
it?

Bjorn