Re: [Nouveau] [PATCH v2] ALSA: hda: Continue to probe when codec probe fails

2021-04-10 Thread Lukas Wunner
On Sat, Apr 10, 2021 at 04:51:27PM +0100, Roy Spliet wrote:
> Can I ask someone with more
> technical knowledge of snd_hda_intel and vgaswitcheroo to brainstorm about
> the possible challenges of nouveau taking matters into its own hand rather
> than keeping this PCI quirk around?

It sounds to me like the HDA is not powered if no cable is plugged in.
What is reponsible then for powering it up or down, firmware code on
the GPU or in the host's BIOS?

Ideally, we should try to find out how to control HDA power from the
operating system rather than trying to cooperate with whatever firmware
is doing.  If we have that capability, the OS should power the HDA up
and down as it sees fit.

Thanks,

Lukas


Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered

2021-03-28 Thread Lukas Wunner
On Wed, Mar 17, 2021 at 01:02:07PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/17/21 12:01 PM, Lukas Wunner wrote:
> > If the events are ignored, the driver of the device in the hotplug slot
> > is not unbound and rebound.  So the driver must be able to cope with
> > loss of TLPs during DPC recovery and it must be able to cope with
> > whatever state the endpoint device is in after DPC recovery.
> > Is this really safe?  How does the nvme driver deal with it?
> 
> During DPC recovery, in pcie_do_recovery() function, we use
> report_frozen_detected() to notify all devices attached to the port
> about the fatal error. After this notification, we expect all
> affected devices to halt its IO transactions.
> 
> Regarding state restoration, after successful recovery, we use
> report_slot_reset() to notify about the slot/link reset. So device
> drivers are expected to restore the device to working state after this
> notification.

Thanks a lot for the explanation.


> I am not sure how pure firmware DPC recovery works. Is there a platform
> which uses this combination? For firmware DPC model, spec does not clarify
> following points.
> 
> 1. Who will notify the affected device drivers to halt the IO transactions.
> 2. Who is responsible to restore the state of the device after link reset.
> 
> IMO, pure firmware DPC does not support seamless recovery. I think after it
> clears the DPC trigger status, it might expect hotplug handler be responsible
> for device recovery.
> 
> I don't want to add fix to the code path that I don't understand. This is the
> reason for extending this logic to pure firmware DPC case.

I agree, let's just declare synchronization of pciehp with
pure firmware DPC recovery as unsupported for now.


I've just submitted a refined version of my patch to the list:
https://lore.kernel.org/linux-pci/b70e19324bbdded90b728a5687aa78dc17c20306.1616921228.git.lu...@wunner.de/

If you could give this new version a whirl I'd be grateful.

This version contains more code comments and kernel-doc.

There's now a check in dpc_completed() whether the DPC Status
register contains "all ones", which can happen when a DPC-capable
hotplug port is hot-removed, i.e. for cascaded DPC-capable hotplug
ports.

I've also realized that the previous version was prone to races
which are theoretical but should nonetheless be avoided:
E.g., previously the DLLSC event was only removed from "events"
if the link is still up after DPC recovery.  However if DPC
triggers and recovers multiple times in a row, the link may
happen to be up but a new DLLSC event may have been picked up
in "pending_events" which should be ignored.  I've solved this
by inverting the logic such that DLLSC is *always* removed from
"events", and if the link is unexpectedly *down* after successful
recovery, a DLLSC event is synthesized.

Thanks,

Lukas


Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered

2021-03-28 Thread Lukas Wunner
On Sat, Mar 27, 2021 at 10:49:45PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/16/21 9:13 PM, Lukas Wunner wrote:
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -707,6 +707,17 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> > }
> > /*
> > +* Ignore Link Down/Up caused by Downstream Port Containment
> > +* if recovery from the error succeeded.
> > +*/
> > +   if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
> > +   ctrl->state == ON_STATE) {
> > +   atomic_and(~PCI_EXP_SLTSTA_DLLSC, >pending_events);
> 
> Why modify pending_events here. It should be already be zero right?

"pending_events" is expected to contain the Link Up event
after successful recovery, whereas "events" contains the
Link Down event (if DPC was triggered).

pciehp is structured around the generic irq core's separation
of hardirq handler (runs in interrupt context) and irq thread
(runs in task context).  The hardirq handler pciehp_isr() picks
up events from the Slot Status register and stores them in
"pending_events" for later consumption by the irq thread
pciehp_ist().  The irq thread performs long running tasks such
as slot bringup and bringdown.  The irq thread is also allowed
to sleep.

While pciehp_ist() awaits completion of DPC recovery, a DLLSC
event will be picked up by pciehp_isr() which is caused by
link retraining.  That event is contained in "pending_events",
so after successful recovery, pciehp_ist() can just delete it.

Thanks,

Lukas


Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered

2021-03-17 Thread Lukas Wunner
On Wed, Mar 17, 2021 at 12:22:41PM -0700, Raj, Ashok wrote:
> On Wed, Mar 17, 2021 at 08:09:52PM +0100, Lukas Wunner wrote:
> > On Wed, Mar 17, 2021 at 10:45:21AM -0700, Dan Williams wrote:
> > > Ah, ok, we're missing a flush of the hotplug event handler after the
> > > link is up to make sure the hotplug handler does not see the Link Up.
> > > I'm not immediately seeing how the new proposal ensures that there is
> > > no Link Up event still in flight after DPC completes its work.
> > > Wouldn't it be required to throw away Link Up to Link Up transitions?
> > 
> > If you look at the new code added to pciehp_ist() by my patch...
> > 
> >   atomic_and(~PCI_EXP_SLTSTA_DLLSC, >pending_events);
> >   if (pciehp_check_link_active(ctrl) > 0)
> >   events &= ~PCI_EXP_SLTSTA_DLLSC;
> 
> When you have a Surprise Link Down and without any DPC, the link trains
> back up. Aren't we expected to take the slot down and add it as if a remove
> and add happens?
> 
> without this change if slot-status == ON_STATE, DLLSC means we would power
> the slot off. Then we check link_active and bring the slot back on isn't
> it?

Yes to both questions.  That behavior should still be the same
with the patch.  Do you think it's not?


Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered

2021-03-17 Thread Lukas Wunner
On Wed, Mar 17, 2021 at 10:45:21AM -0700, Dan Williams wrote:
> Ah, ok, we're missing a flush of the hotplug event handler after the
> link is up to make sure the hotplug handler does not see the Link Up.
> I'm not immediately seeing how the new proposal ensures that there is
> no Link Up event still in flight after DPC completes its work.
> Wouldn't it be required to throw away Link Up to Link Up transitions?

If you look at the new code added to pciehp_ist() by my patch...

  atomic_and(~PCI_EXP_SLTSTA_DLLSC, >pending_events);
  if (pciehp_check_link_active(ctrl) > 0)
  events &= ~PCI_EXP_SLTSTA_DLLSC;

... the atomic_and() ignores the Link Up event which was picked
up by the hardirq handler pciehp_isr() while pciehp_ist() waited for
link recovery.  Afterwards, the Link Down event is only ignored if the
link is still up:  If the link has gone down again before the call to
pciehp_check_link_active(), that event is honored immediately (because
the DLLSC event is then not filtered).  If it goes down after the call,
that event will be picked up by pciehp_isr().  Thus, only the DLLSC
events caused by DPC are ignored, but no others.

A DLLSC event caused by surprise removal during DPC may be incorrectly
ignored, but the expectation is that the ensuing Presence Detect Changed
event will still cause bringdown of the slot after DPC has completed.
Hardware does exist which erroneously hardwires Presence Detect to zero,
but that's rare and DPC-capable systems are likely not affected.

I've realized today that I forgot to add a "synchronize_hardirq(irq);"
before the call to atomic_and().  Sorry, that will be fixed in the
next iteration.

Thanks,

Lukas


Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered

2021-03-17 Thread Lukas Wunner
On Wed, Mar 17, 2021 at 10:54:09AM -0700, Sathyanarayanan Kuppuswamy Natarajan 
wrote:
> Flush of hotplug event after successful recovery, and a simulated
> hotplug link down event after link recovery fails should solve the
> problems raised by Lukas. I assume Lukas' proposal adds this support.
> I will check his patch shortly.

Thank you!

I'd like to get a better understanding of the issues around hotplug/DPC,
specifically I'm wondering:

If DPC recovery was successful, what is the desired behavior by pciehp,
should it ignore the Link Down/Up or bring the slot down and back up
after DPC recovery?

If the events are ignored, the driver of the device in the hotplug slot
is not unbound and rebound.  So the driver must be able to cope with
loss of TLPs during DPC recovery and it must be able to cope with
whatever state the endpoint device is in after DPC recovery.
Is this really safe?  How does the nvme driver deal with it?

Also, if DPC is handled by firmware, your patch does not ignore the
Link Down/Up events, so pciehp brings down the slot when DPC is
triggered, then brings it up after succesful recovery.  In a code
comment, you write that this behavior is okay because there's "no
race between hotplug and DPC recovery".  However, Sinan wrote in
2018 that one of the issues with hotplug versus DPC is that pciehp
may turn off slot power and thereby foil DPC recovery.  (Power off =
cold reset, whereas DPC recovery = warm reset.)  This can occur
as well if DPC is handled by firmware.

So I guess pciehp should make an attempt to await DPC recovery even
if it's handled by firmware?  Or am I missing something?  We may be
able to achieve that by polling the DPC Trigger Status bit and
DLLLA bit, but it won't work as perfectly as with native DPC support.

Finally, you write in your commit message that there are "a lot of
stability issues" if pciehp and DPC are allowed to recover freely
without proper serialization.  What are these issues exactly?
(Beyond the slot power issue mentioned above, and that the endpoint
device's driver should presumably not be unbound if DPC recovery
was successful.)

Thanks!

Lukas


Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered

2021-03-16 Thread Lukas Wunner
On Tue, Mar 16, 2021 at 10:08:31PM -0700, Dan Williams wrote:
> On Tue, Mar 16, 2021 at 9:14 PM Lukas Wunner  wrote:
> >
> > On Fri, Mar 12, 2021 at 07:32:08PM -0800, 
> > sathyanarayanan.kuppusw...@linux.intel.com wrote:
> > > + if ((events == PCI_EXP_SLTSTA_DLLSC) && is_dpc_reset_active(pdev)) {
> > > + ctrl_info(ctrl, "Slot(%s): DLLSC event(DPC), skipped\n",
> > > +   slot_name(ctrl));
> > > + ret = IRQ_HANDLED;
> > > + goto out;
> > > + }
> >
> > Two problems here:
> >
> > (1) If recovery fails, the link will *remain* down, so there'll be
> > no Link Up event.  You've filtered the Link Down event, thus the
> > slot will remain in ON_STATE even though the device in the slot is
> > no longer accessible.  That's not good, the slot should be brought
> > down in this case.
> 
> Can you elaborate on why that is "not good" from the end user
> perspective? From a driver perspective the device driver context is
> lost and the card needs servicing. The service event starts a new
> cycle of slot-attention being triggered and that syncs the slot-down
> state at that time.

All of pciehp's code assumes that if the link is down, the slot must be
off.  A slot which is in ON_STATE for a prolonged period of time even
though the link is down is an oddity the code doesn't account for.

If the link goes down, the slot should be brought into OFF_STATE.
(It's okay though to delay bringdown until DPC recovery has completed
unsuccessfully, which is what the patch I'm proposing does.)

I don't understand what you mean by "service event".  Someone unplugging
and replugging the NVMe drive?


> > (2) If recovery succeeds, there's a race where pciehp may call
> > is_dpc_reset_active() *after* dpc_reset_link() has finished.
> > So both the DPC Trigger Status bit as well as pdev->dpc_reset_active
> > will be cleared.  Thus, the Link Up event is not filtered by pciehp
> > and the slot is brought down and back up even though DPC recovery
> > was succesful, which seems undesirable.
> 
> The hotplug driver never saw the Link Down, so what does it do when
> the slot transitions from Link Up to Link Up? Do you mean the Link
> Down might fire after the dpc recovery has completed if the hotplug
> notification was delayed?

If the Link Down is filtered and the Link Up is not, pciehp will
bring down the slot and then bring it back up.  That's because pciehp
can't really tell whether a DLLSC event is Link Up or Link Down.

It just knows that the link was previously up, is now up again,
but must have been down intermittently, so transactions to the
device in the slot may have been lost and the slot is therefore
brought down for safety.  Because the link is up, it is then
brought back up.

Thanks,

Lukas


Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered

2021-03-16 Thread Lukas Wunner
On Fri, Mar 12, 2021 at 07:32:08PM -0800, 
sathyanarayanan.kuppusw...@linux.intel.com wrote:
> + if ((events == PCI_EXP_SLTSTA_DLLSC) && is_dpc_reset_active(pdev)) {
> + ctrl_info(ctrl, "Slot(%s): DLLSC event(DPC), skipped\n",
> +   slot_name(ctrl));
> + ret = IRQ_HANDLED;
> + goto out;
> + }

Two problems here:

(1) If recovery fails, the link will *remain* down, so there'll be
no Link Up event.  You've filtered the Link Down event, thus the
slot will remain in ON_STATE even though the device in the slot is
no longer accessible.  That's not good, the slot should be brought
down in this case.

(2) If recovery succeeds, there's a race where pciehp may call
is_dpc_reset_active() *after* dpc_reset_link() has finished.
So both the DPC Trigger Status bit as well as pdev->dpc_reset_active
will be cleared.  Thus, the Link Up event is not filtered by pciehp
and the slot is brought down and back up even though DPC recovery
was succesful, which seems undesirable.

The only viable solution I see is to wait until DPC has completed.
Sinan (+cc) proposed something along those lines a couple years back:

https://patchwork.ozlabs.org/project/linux-pci/patch/20180818065126.77912-1-ok...@kernel.org/

Included below please find my suggestion for how to fix this.
I've sort of combined yours and Sinan's approach, but I'm
using a waitqueue (Sinan used polling) and I'm using atomic bitops
on pdev->priv_flags (you're using an atomic_t instead, which needs
additionally space in struct pci_dev).  Note: It's compile-tested
only, I don't have any DPC-capable hardware at my disposal.

Would this work for you?  If so, I can add a commit message to the
patch and submit it properly.  Let me know what you think.  Thanks!

-- >8 --
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by DPC

Signed-off-by: Lukas Wunner 
---
 drivers/pci/hotplug/pciehp_hpc.c | 11 +
 drivers/pci/pci.h|  4 
 drivers/pci/pcie/dpc.c   | 51 ++--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index fb3840e..bcc018e 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -707,6 +707,17 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
}
 
/*
+* Ignore Link Down/Up caused by Downstream Port Containment
+* if recovery from the error succeeded.
+*/
+   if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) &&
+   ctrl->state == ON_STATE) {
+   atomic_and(~PCI_EXP_SLTSTA_DLLSC, >pending_events);
+   if (pciehp_check_link_active(ctrl) > 0)
+   events &= ~PCI_EXP_SLTSTA_DLLSC;
+   }
+
+   /*
 * Disable requests have higher priority than Presence Detect Changed
 * or Data Link Layer State Changed events.
 */
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9684b46..e5ae5e8 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -392,6 +392,8 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DPC_RECOVERED 1
+#define PCI_DPC_RECOVERING 2
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
@@ -446,10 +448,12 @@ struct rcec_ea {
 void pci_dpc_init(struct pci_dev *pdev);
 void dpc_process_error(struct pci_dev *pdev);
 pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
+bool pci_dpc_recovered(struct pci_dev *pdev);
 #else
 static inline void pci_save_dpc_state(struct pci_dev *dev) {}
 static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
 static inline void pci_dpc_init(struct pci_dev *pdev) {}
+static inline bool pci_dpc_recovered(struct pci_dev *pdev) { return false; }
 #endif
 
 #ifdef CONFIG_PCIEPORTBUS
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index e05aba8..7328d9c4 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -71,6 +71,44 @@ void pci_restore_dpc_state(struct pci_dev *dev)
pci_write_config_word(dev, dev->dpc_cap + PCI_EXP_DPC_CTL, *cap);
 }
 
+static bool dpc_completed(struct pci_dev *pdev)
+{
+   u16 status;
+
+   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, );
+   if (status & PCI_EXP_DPC_STATUS_TRIGGER)
+   return false;
+
+   if (test_bit(PCI_DPC_RECOVERING, >priv_flags))
+   return false;
+
+   return true;
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(dpc_completed_waitqueue);
+
+bool pci_dpc_recovered(struct pci_dev *pdev)
+{
+   struct pci_host_bridge *host;
+
+   if (!pdev->dpc_cap)
+   return false;
+
+   /*
+* If DPC is owned by firmware and EDR is not supported, th

Re: [PATCH] spi: stm32: drop devres version of spi_register_master

2021-03-16 Thread Lukas Wunner
On Fri, Mar 12, 2021 at 11:34:46AM +0100, Alain Volmat wrote:
> --- a/drivers/spi/spi-stm32.c
> +++ b/drivers/spi/spi-stm32.c
> @@ -1960,6 +1960,7 @@ static int stm32_spi_remove(struct platform_device 
> *pdev)
>   struct spi_master *master = platform_get_drvdata(pdev);
>   struct stm32_spi *spi = spi_master_get_devdata(master);
>  
> + spi_unregister_master(master);
>   spi->cfg->disable(spi);
>  
>   if (master->dma_tx)

This introduces a use-after-free because spi_unregister_master()
drops the last reference on the spi_master allocation (which includes
the struct stm32_spi), causing it to be freed, yet the stm32_spi
struct is accessed afterwards.

You need to convert the driver to devm_spi_alloc_master() to
fix the use-after-free.  See commit 6cfd39e212de for an example.

Thanks,

Lukas


Re: [PATCH] efi/apple-properties: Handle device properties with software node API

2021-03-04 Thread Lukas Wunner
On Thu, Mar 04, 2021 at 11:28:37AM +0300, Heikki Krogerus wrote:
> The old device property API is going to be removed.
> Replacing the device_add_properties() call with the software
> node API equivalent, device_create_managed_software_node().
> 
> Signed-off-by: Heikki Krogerus 

Acked-by: Lukas Wunner 


Re: [PATCH v2 04/15] PCI: Add pci_find_vsec_capability() to find a specific VSEC

2021-02-03 Thread Lukas Wunner
On Wed, Feb 03, 2021 at 09:11:03AM +, Gustavo Pimentel wrote:
> On Wed, Feb 3, 2021 at 7:51:3, Lukas Wunner  wrote:
> > On Wed, Feb 03, 2021 at 01:54:49AM +, Gustavo Pimentel wrote:
> > > On Tue, Feb 2, 2021 at 18:8:55, Lukas Wunner  wrote:
> > > > As the name implies, the capability is "vendor-specific", so it is
> > > > perfectly possible that two vendors use the same VSEC ID for different
> > > > things.
> > > > 
> > > > To make sure you're looking for the right capability, you need to pass
> > > > a u16 vendor into this function and bail out if dev->vendor is
> > > > different.
> > > 
> > > This function will be called by the driver that will pass the correct 
> > > device which will be already pointing to the config space associated with
> > > the endpoint for instance. Because the driver is already attached to the 
> > > endpoint through the vendor ID and device ID specified, there is no need 
> > > to do that validation, it will be redundant.
> > 
> > Okay.  Please amend the kernel-doc to make it explicit that it's the
> > caller's responsibility to check the vendor ID.
> 
> I don't think that would be necessary, as I said, the 'struct pci_dev *' 
> already points exclusively for the device' config space, which contains 
> all the capabilities for that particular device by his turn will be 
> attached to a specific driver by the Vendor and Device IDs to a specific 
> driver, that will know, firstly search for the specific device vendor ID, 
>  and then secondly how to decode it, and thirdly to do something with it.

The helper you're adding may not only be called from drivers but also
from generic PCI code (such as set_pcie_thunderbolt()).  In that case
the vendor ID is arbitrary.  Also, it doesn't *hurt* documenting this
requirement, does it?

Thanks,

Lukas


Re: [PATCH v2 04/15] PCI: Add pci_find_vsec_capability() to find a specific VSEC

2021-02-02 Thread Lukas Wunner
On Wed, Feb 03, 2021 at 01:54:49AM +, Gustavo Pimentel wrote:
> On Tue, Feb 2, 2021 at 18:8:55, Lukas Wunner  wrote:
> > As the name implies, the capability is "vendor-specific", so it is
> > perfectly possible that two vendors use the same VSEC ID for different
> > things.
> > 
> > To make sure you're looking for the right capability, you need to pass
> > a u16 vendor into this function and bail out if dev->vendor is different.
> 
> This function will be called by the driver that will pass the correct 
> device which will be already pointing to the config space associated with 
> the endpoint for instance. Because the driver is already attached to the 
> endpoint through the vendor ID and device ID specified, there is no need 
> to do that validation, it will be redundant.

Okay.  Please amend the kernel-doc to make it explicit that it's the
caller's responsibility to check the vendor ID.

Thanks,

Lukas


Re: [PATCH v2 04/15] PCI: Add pci_find_vsec_capability() to find a specific VSEC

2021-02-02 Thread Lukas Wunner
On Tue, Feb 02, 2021 at 01:40:18PM +0100, Gustavo Pimentel wrote:
>  /**
> + * pci_find_vsec_capability - Find a vendor-specific extended capability
> + * @dev: PCI device to query
> + * @cap: vendor-specific capability ID code
> + *
> + * Returns the address of the vendor-specific structure that matches the
> + * requested capability ID code within the device's PCI configuration space
> + * or 0 if it does not find a match.
> + */
> +u16 pci_find_vsec_capability(struct pci_dev *dev, int vsec_cap_id)
> +{

As the name implies, the capability is "vendor-specific", so it is perfectly
possible that two vendors use the same VSEC ID for different things.

To make sure you're looking for the right capability, you need to pass
a u16 vendor into this function and bail out if dev->vendor is different.


> + u16 vsec;
> + u32 header;
> +
> + vsec = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_VNDR);
> + while (vsec) {
> + if (pci_read_config_dword(dev, vsec + PCI_VSEC_HDR,
> +   ) == PCIBIOS_SUCCESSFUL &&
> + PCI_VSEC_CAP_ID(header) == vsec_cap_id)
> + return vsec;
> +
> + vsec = pci_find_next_ext_capability(dev, vsec,
> + PCI_EXT_CAP_ID_VNDR);
> + }

FWIW, a more succinct implementation would be:

while ((vsec = pci_find_next_ext_capability(...))) { ... }

See set_pcie_thunderbolt() in drivers/pci/probe.c for an example.
Please consider refactoring that function to use your new helper.

Thanks,

Lukas


[tip: efi/urgent] efi/apple-properties: Reinstate support for boolean properties

2021-01-27 Thread tip-bot2 for Lukas Wunner
The following commit has been merged into the efi/urgent branch of tip:

Commit-ID: 355845b738e76445c8522802552146d96cb4afa7
Gitweb:
https://git.kernel.org/tip/355845b738e76445c8522802552146d96cb4afa7
Author:Lukas Wunner 
AuthorDate:Thu, 31 Dec 2020 06:10:32 +01:00
Committer: Ard Biesheuvel 
CommitterDate: Thu, 31 Dec 2020 10:28:53 +01:00

efi/apple-properties: Reinstate support for boolean properties

Since commit 4466bf82821b ("efi/apple-properties: use
PROPERTY_ENTRY_U8_ARRAY_LEN"), my MacBook Pro issues a -ENODATA error
when trying to assign EFI properties to the discrete GPU:

pci :01:00.0: assigning 56 device properties
pci :01:00.0: error -61 assigning properties

That's because some of the properties have no value.  They're booleans
whose presence can be checked by drivers, e.g. "use-backlight-blanking".

Commit 6e98503dba64 ("efi/apple-properties: Remove redundant attribute
initialization from unmarshal_key_value_pairs()") employed a trick to
store such booleans as u8 arrays (which is the data type used for all
other EFI properties on Macs):  It cleared the property_entry's
"is_array" flag, thereby denoting that the value is stored inline in the
property_entry.

Commit 4466bf82821b erroneously removed that trick.  It was probably a
little fragile to begin with.

Reinstate support for boolean properties by explicitly invoking the
PROPERTY_ENTRY_BOOL() initializer for properties with zero-length value.

Fixes: 4466bf82821b ("efi/apple-properties: use PROPERTY_ENTRY_U8_ARRAY_LEN")
Cc: 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Lukas Wunner 
Link: 
https://lore.kernel.org/r/be958bda75331a011d53c696d1deec8dccd06fd2.1609388549.git.lu...@wunner.de
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/apple-properties.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/apple-properties.c 
b/drivers/firmware/efi/apple-properties.c
index 34f53d8..e192648 100644
--- a/drivers/firmware/efi/apple-properties.c
+++ b/drivers/firmware/efi/apple-properties.c
@@ -3,8 +3,9 @@
  * apple-properties.c - EFI device properties on Macs
  * Copyright (C) 2016 Lukas Wunner 
  *
- * Note, all properties are considered as u8 arrays.
- * To get a value of any of them the caller must use 
device_property_read_u8_array().
+ * Properties are stored either as:
+ * u8 arrays which can be retrieved with device_property_read_u8_array() or
+ * booleans which can be queried with device_property_present().
  */
 
 #define pr_fmt(fmt) "apple-properties: " fmt
@@ -88,8 +89,12 @@ static void __init unmarshal_key_value_pairs(struct 
dev_header *dev_header,
 
entry_data = ptr + key_len + sizeof(val_len);
entry_len = val_len - sizeof(val_len);
-   entry[i] = PROPERTY_ENTRY_U8_ARRAY_LEN(key, entry_data,
-  entry_len);
+   if (entry_len)
+   entry[i] = PROPERTY_ENTRY_U8_ARRAY_LEN(key, entry_data,
+  entry_len);
+   else
+   entry[i] = PROPERTY_ENTRY_BOOL(key);
+
if (dump_properties) {
dev_info(dev, "property: %s\n", key);
print_hex_dump(KERN_INFO, pr_fmt(), DUMP_PREFIX_OFFSET,


Re: [PATCH nf-next v4 1/5] net: sched: Micro-optimize egress handling

2021-01-24 Thread Lukas Wunner
On Sat, Jan 23, 2021 at 07:26:24PM -0800, Jakub Kicinski wrote:
> On Fri, 22 Jan 2021 09:47:01 +0100 Lukas Wunner wrote:
> > sch_handle_egress() returns either the skb or NULL to signal to its
> > caller __dev_queue_xmit() whether a packet should continue to be
> > processed.
> > 
> > The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a
> > NULL pointer deref right at its top.
> > 
> > But the compiler doesn't know that.  So if sch_handle_egress() signals
> > success by returning the skb, the "if (!skb) goto out;" statement
> > results in a gratuitous NULL pointer check in the Assembler output.
> 
> Which exact compiler are we talking about it? Did you report this?
> As Eric pointed the compiler should be able to figure this out quite
> easily.

I tested with gcc 8, 9, 10.

No need to report as it's the expected behavior with
-fno-delete-null-pointer-checks, whose motivation appears
questionable though (per my preceding e-mail).

Thanks,

Lukas


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-14 Thread Lukas Wunner
On Tue, Jan 12, 2021 at 09:28:32AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2021 at 5:20 AM Lukas Wunner  wrote:
> > > Variable declarations in for-loops is the only one I can think of. I
> > > think that would clean up some code (and some macros), but might not
> > > be compelling on its own.
> >
> > Anonymous structs/unions.  I used to have a use case for that in
> > struct efi_dev_path in include/linux/efi.h, but Ard Biesheuvel
> > refactored it in a gnu89-compatible way for v5.7 with db8952e7094f.
> 
> We use anonymous structs/unions extensively and all over the place already.

Yes, my apologies, I mixed things up.

Back in 2016 when I authored 46cd4b75cd0e, what I wanted to do was
include an unnamed "struct efi_generic_dev_path;" in struct efi_dev_path:

struct efi_dev_path {
struct efi_generic_dev_path;
union {
struct {
u32 hid;
u32 uid;
} acpi;
struct {
u8 fn;
u8 dev;
} pci;
};
} __attribute ((packed));

The alternative is to copy-paste the elements of struct efi_dev_path
or to give it a name such as "header" (which is what db8952e7094f
subsequently did).  Both options seemed inelegant to me.

However it turns out this feature requires -fms-extensions.
It's not part of -std=gnu11.

So coming back to topic, yes there doesn't seem to be too much to
be gained from moving to -std=gnu11 aside from variable declarations
in for-loops.

(And it really has to be -std=gnu11 because -std=c11 fails to compile.)

Thanks,

Lukas


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-12 Thread Lukas Wunner
On Fri, Jan 08, 2021 at 12:02:53PM -0800, Linus Torvalds wrote:
> I appreciate Arnd pointing out "--std=gnu11", though. What are the
> actual relevant language improvements?
> 
> Variable declarations in for-loops is the only one I can think of. I
> think that would clean up some code (and some macros), but might not
> be compelling on its own.

Anonymous structs/unions.  I used to have a use case for that in
struct efi_dev_path in include/linux/efi.h, but Ard Biesheuvel
refactored it in a gnu89-compatible way for v5.7 with db8952e7094f.

[The above was copy-pasted from last time this discussion came up
in July 2020.  Back then, Kirill Shutemov likewise mentioned the
local variables in loops feature:
https://lore.kernel.org/lkml/20200710111724.m4jaci73pykal...@wunner.de/
]

Thanks,

Lukas


Re: [PATCH 4/7] spi: spi-geni-qcom: Add support for GPI dma

2021-01-11 Thread Lukas Wunner
On Mon, Jan 11, 2021 at 08:46:48PM +0530, Vinod Koul wrote:
> @@ -328,8 +609,34 @@ static int spi_geni_init(struct spi_geni_master *mas)
>   spi_tx_cfg &= ~CS_TOGGLE;
>   writel(spi_tx_cfg, se->base + SE_SPI_TRANS_CFG);
>  
> + mas->tx = dma_request_slave_channel(mas->dev, "tx");
> + if (IS_ERR_OR_NULL(mas->tx)) {
> + dev_err(mas->dev, "Failed to get tx DMA ch %ld", 
> PTR_ERR(mas->tx));
> + ret = PTR_ERR(mas->tx);
> + goto out_pm;
> + } else {
> + mas->rx = dma_request_slave_channel(mas->dev, "rx");
> + if (IS_ERR_OR_NULL(mas->rx)) {
> + dev_err(mas->dev, "Failed to get rx DMA ch %ld", 
> PTR_ERR(mas->rx));
> + dma_release_channel(mas->tx);
> + ret = PTR_ERR(mas->rx);
> + goto out_pm;
> + }

These channels need to be released in spi_geni_remove().

Also, you may want to fall back to PIO mode if channel allocation fails.

Thanks,

Lukas


Re: Time to re-enable Runtime PM per default for PCI devcies?

2021-01-04 Thread Lukas Wunner
On Thu, Dec 31, 2020 at 10:38:12AM +0100, Heiner Kallweit wrote:
> On 31.12.2020 05:07, Lukas Wunner wrote:
> > FWIW, if platform_pci_power_manageable() returns true, it can probably
> > be assumed that allowing runtime PM by default is okay.  So as a first
> > step, you may want to call that instead of adding a new callback.
> 
> I don't think that's sufficient. Most likely all the broken old systems
> return true for platform_pci_power_manageable().

platform_pci_power_manageable() is not a global flag, but rather
a per-device flag whether the platform is capable of power-managing
that device.  E.g. for the ACPI platform, it indicates that objects
such as _PS0 or _PS3 are present in the device's namespace.

My point is that if the platform can power-manage a device,
then it ought to be safe to enable runtime PM by default for it.

If you insist on a "big hammer" approach by turning on runtime PM
by default for everything, you risk regressions.  You can avoid
that by going for a smart approach which enables runtime PM in
cases when it's safe.

Thanks,

Lukas


Re: [PATCH RESEND v2 2/2] Add support for Realtek RTL838x/RTL839x SoC SPI controllers

2020-12-31 Thread Lukas Wunner
On Wed, Dec 30, 2020 at 12:19:04AM +0100, Bert Vermeulen wrote:
> +static inline void wait_ready(struct rtspi *rtspi)
> +{
> + while (!(readl(REG(RTL8380_SPI_SFCSR)) & RTL8380_SPI_SFCSR_RDY))
> + ;
> +}

I'd suggest calling cpu_relax() in the loop's body.


> + err = devm_spi_register_controller(>dev, ctrl);

Since you're invoking devm_spi_register_controller() on probe,
the controller must not be unregistered explicitly on remove.
So the ->remove hook can be dropped altogether:

> +static int realtek_spi_remove(struct platform_device *pdev)
> +{
> + struct spi_controller *ctrl = platform_get_drvdata(pdev);
> +
> + spi_unregister_controller(ctrl);
> +
> + return 0;
> +}
[...]
> + .remove = realtek_spi_remove,

The ->probe hook otherwise LGTM.

Thanks,

Lukas


Re: Time to re-enable Runtime PM per default for PCI devcies?

2020-12-30 Thread Lukas Wunner
On Wed, Dec 30, 2020 at 11:56:04PM +0100, Heiner Kallweit wrote:
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3024,7 +3024,9 @@ void pci_pm_init(struct pci_dev *dev)
>   u16 status;
>   u16 pmc;
>  
> - pm_runtime_forbid(>dev);
> + if (pci_acpi_forbid_runtime_pm())
> + pm_runtime_forbid(>dev);
> +

Generic PCI code usually does not call ACPI-specific functions directly,
but rather through a pci_platform_pm_ops callback.

FWIW, if platform_pci_power_manageable() returns true, it can probably
be assumed that allowing runtime PM by default is okay.  So as a first
step, you may want to call that instead of adding a new callback.

Thanks,

Lukas


Re: Time to re-enable Runtime PM per default for PCI devcies?

2020-12-29 Thread Lukas Wunner
> On Tue, Nov 17, 2020 at 04:56:09PM +0100, Heiner Kallweit wrote:
> > With Runtime PM disabled e.g. the PHY on network devices may remain
> > powered up even with no cable plugged in, affecting battery lifetime
> > on mobile devices. Currently we have to rely on the respective distro
> > or user to enable Runtime PM via sysfs (echo auto > power/control).
> > Some devices work around this restriction by calling pm_runtime_allow
> > in their probe routine, even though that's not recommended by
> > https://www.kernel.org/doc/Documentation/power/pci.txt
> > 
> > Disabling Runtime PM per default seems to be a big hammer, a quirk
> > for affected devices / systems may had been better. And we still
> > have the option to disable Runtime PM for selected devices via sysfs.

Removing the recommendation in pci.txt and allowing runtime PM in more
drivers by default seems sensible to me.  Such an incremental approach
is less risky with regards to regressions than a big hammer.  Heiner,
care to submit patches to that effect?

Thanks,

Lukas


Re: [PATCH 1/2] Add support for Cadence XSPI controller

2020-12-12 Thread Lukas Wunner
On Wed, Dec 09, 2020 at 08:57:57AM +0100, Jayshri Pawar wrote:
> + master = spi_alloc_master(dev, sizeof(*cdns_xspi));
> + if (!master) {
> + ret = -ENOMEM;
> + dev_err(dev, "Failed to allocate memory for spi_master\n");
> + goto err_no_mem;
> + }

Please use devm_spi_alloc_master() to simplify the probe error path.
It was introduced in v5.10-rc5 with commit 5e844cc37a5c and is also
available in 5.9-stable and 5.4-stable.

The memory allocater already emits an error message if it can't satisfy
a request.  Emitting an additional message here isn't really necessary.


> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + cdns_xspi->iobase = devm_ioremap_resource(cdns_xspi->dev, res);
> + if (IS_ERR(cdns_xspi->iobase)) {
> + ret = PTR_ERR(cdns_xspi->iobase);
> + dev_err(dev, "Failed to remap controller base address\n");
> + goto err_spi_master;
> + }

Please use devm_platform_ioremap_resource().

Thanks,

Lukas


Re: [PATCH v3 4/9] spi: tegra210-quad: Add support for Tegra210 QSPI controller

2020-12-12 Thread Lukas Wunner
On Fri, Dec 11, 2020 at 01:15:58PM -0800, Sowjanya Komatineni wrote:
> Tegra SoC has a Quad SPI controller starting from Tegra210.

The probe/remove hooks LGTM now.  Just two tiny nits that caught my eye:


> + master = devm_spi_alloc_master(>dev, sizeof(*tqspi));
> + if (!master) {
> + dev_err(>dev, "failed to allocate spi_master\n");
> + return -ENOMEM;
> + }

The memory allocater already emits an error message if it can't satisfy
a request.  Emitting an additional message here isn't really necessary.


> +exit_free_irq:
> + free_irq(qspi_irq, tqspi);
> +exit_pm_disable:
> + pm_runtime_disable(>dev);
> + tegra_qspi_deinit_dma(tqspi);
> + return ret;
> +}
> +
> +static int tegra_qspi_remove(struct platform_device *pdev)
> +{
> + struct spi_master *master = platform_get_drvdata(pdev);
> + struct tegra_qspi *tqspi = spi_master_get_devdata(master);
> +
> + spi_unregister_master(master);
> + free_irq(tqspi->irq, tqspi);
> + tegra_qspi_deinit_dma(tqspi);
> + pm_runtime_disable(>dev);

The order of tegra_qspi_deinit_dma() and pm_runtime_disable() is
reversed in the remove hook vis-a-vis the probe error path.
It's nicer to use the same order as in the probe error path.

Thanks,

Lukas


Re: [Patch v2 1/1] PCI: pciehp: Add support for handling MRL events

2020-12-10 Thread Lukas Wunner
On Thu, Dec 03, 2020 at 02:51:24PM -0800, Raj, Ashok wrote:
> - Press ATTN, 
> - Slot is empty
> - After 5 seconds synthetic PDC arrives.
>   but since no presence and no link active, we leave slot in 
>   BLINKINGON_STATE, and led keeps blinking
> 
> if someone were to add a card after the 5 seconds, no hot-add is processed
> since we don't get notifications for PDC events when ATTN exists.
> 
> Can we automatically cancel the blinking and return slot back to OFF_STATE?

Yes.


> If the operation initiated by the attention button fails for any reason, it
> is recommended that system software present an error message explaining
> failure via a software user interface, or add the error message to system
> log.

Ah so we're supposed to log a message if the slot is empty.
That needs to be added then to the code snippet I sent you
yesterday in response to your off-list e-mail.


> Alternately we can also choose to subscribe to PDC, but ignore if slot is
> in OFF_STATE. So we let ATTN drive the add. But if PDC happens and we are
> in BLINKINGON_STATE, then we can process the hot-add? Spec says a software
> recommendation, but i think the cancel after 5 seconds seems better?

That approach seems more complicated.  It's better to stop blinking
and return to OFF_STATE if after the 5 second interval the slot is
found to be empty.

Thanks,

Lukas


Re: [PATCH v1 3/7] spi: qspi-tegra: Add support for Tegra210 QSPI controller

2020-12-07 Thread Lukas Wunner
On Mon, Dec 07, 2020 at 04:14:53PM -0800, Sowjanya Komatineni wrote:
> On 12/6/20 10:16 AM, Lukas Wunner wrote:
> > However, be sure to use the devm variant to *allocate* the SPI controller,
> > i.e. use devm_spi_alloc_master() instead of spi_alloc_master().
> 
> Thanks Lukas. I see devm_spi_alloc_master() in 5.4 but not from 5.5

devm_spi_alloc_master() was introduced in v5.10-rc5 with commit
5e844cc37a5c and then backported to 5.9-stable and 5.4-stable.

Patches are pending to also backport it to 4.19-stable, 4.14-stable,
4.9-stable and 4.4-stable.

If your development branch is based on v5.5, just cherry-pick
5e844cc37a5c and you should be good to go.

Thanks,

Lukas


Re: [PATCH v1 3/7] spi: qspi-tegra: Add support for Tegra210 QSPI controller

2020-12-06 Thread Lukas Wunner
On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
> + ret = devm_spi_register_master(>dev, master);
[...]
> +static int tegra_qspi_remove(struct platform_device *pdev)
> +{
> + struct spi_master *master = platform_get_drvdata(pdev);
> + struct tegra_qspi_data  *tqspi = spi_master_get_devdata(master);
> +
> + free_irq(tqspi->irq, tqspi);
> +
> + tegra_qspi_deinit_dma_param(tqspi, false);
> + tegra_qspi_deinit_dma_param(tqspi, true);
> +
> + pm_runtime_disable(>dev);
> + if (!pm_runtime_status_suspended(>dev))
> + tegra_qspi_runtime_suspend(>dev);
> +
> + return 0;
> +}

With devm_spi_register_master(), the SPI controller is unregistered
*after* tegra_qspi_remove().  SPI transactions may still be ongoing
until the SPI controller is unregistered, yet you perform teardown
steps (such as freeing the IRQ) while it is still registered.

Bottom line is, you can't use devm_spi_register_master() in this case.
You need to use spi_register_master() and explicitly call
spi_unregister_master() in tegra_qspi_remove() *before* performing
teardown steps.

However, be sure to use the devm variant to *allocate* the SPI controller,
i.e. use devm_spi_alloc_master() instead of spi_alloc_master().

Thanks,

Lukas


Re: Patch "spi: Fix controller unregister order" has been added to the 4.4-stable tree

2020-12-05 Thread Lukas Wunner
On Sat, Oct 10, 2020 at 04:41:09PM +0800, yangerkun wrote:
> ?? 2020/6/16 9:56, Sasha Levin :
> > This is a note to let you know that I've just added the patch titled
> > 
> >  spi: Fix controller unregister order
> > 
> > to the 4.4-stable tree which can be found at:
[...]
> > --- a/drivers/spi/spi.c
> > +++ b/drivers/spi/spi.c
> > @@ -1922,11 +1922,12 @@ void spi_unregister_master(struct spi_master 
> > *master)
> > dev_err(>dev, "queue remove failed\n");
> > }
> > +   device_for_each_child(>dev, NULL, __unregister);
> > +
> 
> This is a wrong patch. We should move this line before
> spi_destroy_queue, but we didn't. 4.9 stable exists this
> problem too.

Hi Sasha, Hi Greg,

below please find a patch for the 4.9-stable tree to fix the backporting
issue reported above.

Thanks!

-- >8 --
Subject: [PATCH] spi: Fix controller unregister order harder

Commit c7e41e1caa71 sought to backport upstream commit 84855678add8 to
the 4.9-stable tree but erroneously inserted a line at the wrong place.
Fix it.

Fixes: c7e41e1caa71 ("spi: Fix controller unregister order")
Reported-by: yangerkun 
Signed-off-by: Lukas Wunner 
---
 drivers/spi/spi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index c7c9ca3178ad..e0632ee1723b 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2070,13 +2070,13 @@ static int __unregister(struct device *dev, void *null)
  */
 void spi_unregister_master(struct spi_master *master)
 {
+   device_for_each_child(>dev, NULL, __unregister);
+
if (master->queued) {
if (spi_destroy_queue(master))
dev_err(>dev, "queue remove failed\n");
}
 
-   device_for_each_child(>dev, NULL, __unregister);
-
mutex_lock(_lock);
list_del(>list);
mutex_unlock(_lock);
-- 
2.29.2



Re: [v4,2/3] PCI: mediatek: Add new generation controller support

2020-12-03 Thread Lukas Wunner
On Mon, Nov 30, 2020 at 11:30:05AM -0600, Bjorn Helgaas wrote:
> On Mon, Nov 23, 2020 at 02:45:13PM +0800, Jianjun Wang wrote:
> > On Thu, 2020-11-19 at 14:28 -0600, Bjorn Helgaas wrote:
> > > > +static int mtk_pcie_setup(struct mtk_pcie_port *port)
> > > > +{
[...]
> > > > +   /* Try link up */
> > > > +   err = mtk_pcie_startup_port(port);
> > > > +   if (err) {
> > > > +   dev_notice(dev, "PCIe link down\n");
> > > > +   goto err_setup;
> > > 
> > > Generally it should not be a fatal error if the link is not up at
> > > probe-time.  You may be able to hot-add a device, or the device may
> > > have some external power control that will power it up later.
> > 
> > This is for the power saving requirement. If there is no device
> > connected with the PCIe slot, the PCIe MAC and PHY should be powered
> > off.
> > 
> > Is there any standard flow to support power down the hardware at
> > probe-time if no device is connected and power it up when hot-add a
> > device?
> 
> That's a good question.  I assume this looks like a standard PCIe
> hot-add event?
> 
> When you hot-add a device, does the Root Port generate a Presence
> Detect Changed interrupt?  The pciehp driver should field that
> interrupt and turn on power to the slot via the Power Controller
> Control bit in the Slot Control register.
> 
> Does your hardware require something more than that to control the MAC
> and PHY power?

Power saving of unused PCIe ports is generally achieved through the
runtime PM framework.  When a PCIe port runtime suspends, the PCIe
core will transition it to D3hot.  On top of that, the platform
may be able to transition the port to D3cold.  Currently only the
ACPI platform supports that.  Conceivably, devicetree-based systems
may want to disable certain clocks or regulators when a PCIe port
runtime suspends.  I think we do not support that yet but it could
be added to drivers/pci/pcie/portdrv*.

A hotplug port is expected to signal PDC and DLLSC interrupts even
when in D3hot.  At least that's our experience with Thunderbolt.
To support hotplug interrupts in D3cold, some external mechanism
(such as a PME) is necessary to wake up the port on hotplug.
This is also supported with recent Thunderbolt systems.

Because we've seen various incompatibilities when runtime suspending
PCIe ports, certain conditions must be satisfied for runtime PM
to be enabled.  They're encoded in pci_bridge_d3_possible().
Generally, hotplug ports only runtime suspend if they belong to
a Thunderbolt controller or if the ACPI platform explicitly allows
runtime PM (through presence of a _PR3 method or a device property).
Non-hotplug ports runtime suspend if the BIOS is newer than 2015
(as specified by DMI).

Obviously, this policy is very x86-focussed because both Thunderbolt
and DMI are only really a thing on x86.  That's about to change though
because Apple's new arm64-based Macs have Thunderbolt integrated into
the SoC and arm64 SoCs are making inroads in the datacenter, which is
an important use case for PCIe hotplug (hot-swappable NVMe drives).
So we may have to amend pci_bridge_d3_possible() to whitelist
PCIe ports for runtime PM on specific arches or systems.

Thanks,

Lukas


Re: [PATCH 1/3] spi: Loongson: Add Loongson 3A+7A SPI controller driver support

2020-11-26 Thread Lukas Wunner
On Mon, Nov 23, 2020 at 05:19:06PM +0800, Qing Zhang wrote:
> +static struct platform_device loongson_spi_device = {
> + .name   = "loongson-spi",
> + .id = 0,
> + .num_resources  = ARRAY_SIZE(loongson_spi_resources),
> + .resource   = loongson_spi_resources,
> +};

Why isn't this struct allocated at runtime?


> + if (ret == PCIBIOS_SUCCESSFUL) {
> + loongson_spi_resources[1].start = v8;
> + loongson_spi_resources[1].end = v8;
> + platform_device_register(_spi_device);
> + }
> +
> +err_out:
> + return ret;
> +}
> +
> +static void loongson_spi_pci_unregister(struct pci_dev *pdev)
> +{
> + pci_release_region(pdev, 0);
> +}

Seems like platform_device_unregister() is missing here.

Thanks,

Lukas


Re: [Patch v2 1/1] PCI: pciehp: Add support for handling MRL events

2020-11-22 Thread Lukas Wunner
On Sat, Nov 21, 2020 at 05:42:03PM -0800, Ashok Raj wrote:
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 
> events)
>  {
>   int present, link_active;
> + u8 getstatus = 0;
>  
>   /*
>* If the slot is on and presence or link has changed, turn it off.
> @@ -246,6 +259,20 @@ void pciehp_handle_presence_or_link_change(struct 
> controller *ctrl, u32 events)
>   if (events & PCI_EXP_SLTSTA_PDC)
>   ctrl_info(ctrl, "Slot(%s): Card not present\n",
> slot_name(ctrl));
> + if (events & PCI_EXP_SLTSTA_MRLSC)
> + ctrl_info(ctrl, "Slot(%s): Latch %s\n",
> +   slot_name(ctrl), getstatus ? "Open" : 
> "Closed");

This message will currently always be "Latch closed".  It should be
"Latch open" instead because if the slot was up, the latch must have
been closed.  So an MRLSC event can only mean that the latch is now open.
The "getstatus" variable can be removed.


> + /*
> +  * PCIe Base Spec 5.0 Chapter 6.7.1.3 states.
> +  *
> +  * If an MRL Sensor is implemented without a corresponding MRL 
> Sensor input
> +  * on the Hot-Plug Controller, it is recommended that the MRL 
> Sensor be
> +  * routed to power fault input of the Hot-Plug Controller.
> +  * This allows an active adapter to be powered off when the MRL 
> is opened."
> +  *
> +  * This seems to suggest that the slot should be brought down 
> as soon as MRL
> +  * is opened.
> +  */
>   pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>   break;

The code comment is not wrapped at 80 chars and a bit long.
I'd move it to the commit message and keep only a shortened version here.

The "SURPRISE_REMOVAL" may now be problematic because the card may still
be in the slot (both presence and link still up) with only the MRL open.
My suggestion would be to add a local variable "bool safe_removal"
which is initialized to "SAFE_REMOVAL".  In the two if-clauses for
DLLSC and PDC, it is set to SURPRISE_REMOVAL.


> @@ -275,6 +302,13 @@ void pciehp_handle_presence_or_link_change(struct 
> controller *ctrl, u32 events)
>   if (link_active)
>   ctrl_info(ctrl, "Slot(%s): Link Up\n",
> slot_name(ctrl));
> + /*
> +  * If slot is closed && ATTN button exists
> +  * don't continue, let the ATTN button
> +  * drive the hot-plug
> +  */
> + if (((events & PCI_EXP_SLTSTA_MRLSC) && ATTN_BUTTN(ctrl)))
> + return;
>   ctrl->request_result = pciehp_enable_slot(ctrl);
>   break;

Hm, if the Attention Button is pressed with MRL still open, the slot is
not brought up.  If the MRL is subsequently closed, it is still not
brought up.  I guess the slot keeps blinking and one has to push the
button to abort the operation, then press it once more to attempt
another slot bringup.  The spec doesn't seem to say how such a situation
should be handled. Oh well.

I'm wondering if this is the right place to bail out:  Immediately
before the above hunk, the button_work is canceled, so it can't later
trigger bringup of the slot.  Shouldn't the above check be in the
code block with the "Turn the slot on if it's occupied or link is up"
comment?

You're also not unlocking the state_lock here before bailing out of
the function.


> @@ -710,8 +710,10 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>   down_read(>reset_lock);
>   if (events & DISABLE_SLOT)
>   pciehp_handle_disable_request(ctrl);
> - else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
> + else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC |
> +PCI_EXP_SLTSTA_MRLSC))
>   pciehp_handle_presence_or_link_change(ctrl, events);
> +
>   up_read(>reset_lock);

Unnecessary newline added.


> @@ -768,6 +770,14 @@ static void pcie_enable_notification(struct controller 
> *ctrl)
>   cmd |= PCI_EXP_SLTCTL_ABPE;
>   else
>   cmd |= PCI_EXP_SLTCTL_PDCE;
> +
> + /*
> +  * If MRL sensor is present, then subscribe for MRL
> +  * Changes to be notified as well.
> +  */
> + if (MRL_SENS(ctrl))
> + cmd |= PCI_EXP_SLTCTL_MRLSCE;
> +

The code comment doesn't add much information, so can probably be
dropped.

You need to add PCI_EXP_SLTCTL_MRLSCE to the "mask" variable in this
function (before PFDE, as in pcie_disable_notification()).
I don't think the interrupt is enabled at all if it's not added to
"mask", has this patch been tested at all?

Something else:  When pciehp probes, it should check whether the slot
is up even though 

Re: [PATCH 1/1] pci: pciehp: Handle MRL interrupts to enable slot for hotplug.

2020-11-21 Thread Lukas Wunner
On Thu, Nov 19, 2020 at 02:08:07PM -0800, Raj, Ashok wrote:
> On Thu, Nov 19, 2020 at 08:51:20AM +0100, Lukas Wunner wrote:
> > If an Attention Button is present, the current behavior is to bring up
> > the hotplug slot as soon as presence or link is detected.  We don't wait
> > for a button press.  This is intended as a convience feature to bring up
> > slots as quickly as possible, but the behavior can be changed if the
> > button press and 5 second delay is preferred.
> 
> Looks like we still don't subscribe to PDC if ATTN is present? So you don't
> get an event until someone presses the button to process hotplug right?
[...]
> Looks like we still wait for button press to process. When you don't have a
> power control yes the DLLSC would kick in and we would process them. but if
> you have power control, you won't turn on until the button press?

Right.

For hotplug ports without a power controller, we could in principle achieve
the same behavior (i.e. not bring up the slot until the button is pressed)
by not subscribing to DLLSC events as well (if an Attention Button is
present).

However there are hotplug ports out there with incorrect data in the
Slot Capabilities register:  They claim to have an Attention Button
but in reality don't have one.  In those cases we're still able to
bring up and down the slot right now.  Whereas if we changed the
behavior, those hotplug ports would no longer work.  (We'd not
bring up the slot until the button is pressed but there is no button.)

E.g. this bugzilla contains dmesg output for a 5.4 kernel which is able
to bring up and down the slot correctly even though the Slot Capabilities
register incorrectly claims to have an Attention Button as well as
Command Complete support:
https://bugzilla.kernel.org/show_bug.cgi?id=209283


> > In any case the behavior in response to an Attention Button press should
> > be the same regardless whether an MRL is present or not.  (The spec
> > doesn't seem to say otherwise.)
> 
> True, Looks like that is a Linux behavior, certainly not a spec
> recommendation. Our validation teams tell Windows also behaves such. i.e
> when ATTN exists button press drivers the hot-add. Linux doesn't need to
> follow suite though. 
> 
> In that case do we need to subscribe to PDC even when ATTN is present?

Hm, I'd just leave it the way it is for now if it works.


> > > + /*
> > > +  * If ATTN is present and MRL is triggered
> > > +  * ignore the Presence Change Event.
> > > +  */
> > > + if (ATTN_BUTTN(ctrl) && (events & PCI_EXP_SLTSTA_MRLSC))
> > > + events &= ~PCI_EXP_SLTSTA_PDC;
> > 
> > An Attention Button press results in a synthesized PDC event after
> > 5 seconds, which may get lost due to the above if-statement.
> 
> When its synthesized you don't get the MRLSC? So we won't nuke the PDC then
> correct?

I just meant to say, pciehp_queue_pushbutton_work() will synthesize
a PDC event after 5 seconds and with the above code snippet, if an
MRL event happens simultaneously, that synthesized PDC event would
be lost.  So I'd just drop the above code snippet.  I think you
just need to subscribe to MRL events and propagate them to
pciehp_handle_presence_or_link_change().  There, you'd bring down
the slot if an MRL event has occurred (same as DLLSC or PDC).
Then, check whether MRL is closed.  If so, and if presence or link
is up, try to bring up the slot.

If the MRL is open when slot or presence has gone up, the slot is not
brought up.  But once MRL is closed, there'll be another MRL event and
*then* the slot is brought up.

Thanks,

Lukas


Re: [PATCH 1/1] pci: pciehp: Handle MRL interrupts to enable slot for hotplug.

2020-11-18 Thread Lukas Wunner
Hi Ashok,

my sincere apologies for the delay.

On Fri, Sep 25, 2020 at 04:01:38PM -0700, Ashok Raj wrote:
> When Mechanical Retention Lock (MRL) is present, Linux doesn't process
> those change events.
> 
> The following changes need to be enabled when MRL is present.
> 
> 1. Subscribe to MRL change events in SlotControl.
> 2. When MRL is closed,
>- If there is no ATTN button, then POWER on the slot.
>- If there is ATTN button, and an MRL event pending, ignore
>  Presence Detect. Since we want ATTN button to drive the
>  hotplug event.

So I understand you have a hardware platform which has both MRL and
Attention Button on its hotplug slots?  It may be useful to name the
hardware platform in the commit message.

If an Attention Button is present, the current behavior is to bring up
the hotplug slot as soon as presence or link is detected.  We don't wait
for a button press.  This is intended as a convience feature to bring up
slots as quickly as possible, but the behavior can be changed if the
button press and 5 second delay is preferred.

In any case the behavior in response to an Attention Button press should
be the same regardless whether an MRL is present or not.  (The spec
doesn't seem to say otherwise.)


> + if (MRL_SENS(ctrl)) {
> + pciehp_get_latch_status(ctrl, );
> + /*
> +  * If slot is closed && ATTN button exists
> +  * don't continue, let the ATTN button
> +  * drive the hot-plug
> +  */
> + if (!getstatus && ATTN_BUTTN(ctrl))
> + return;
> + }

For the sake of readability I'd suggest adding a pciehp_latch_closed()
helper similar to pciehp_card_present_or_link_active() which returns
true if no MRL is present (i.e. !MRL_SENS(ctrl)), otherwise retrieves
and returns the status with pciehp_get_latch_status().


> +void pciehp_handle_mrl_change(struct controller *ctrl)
> +{
> + u8 getstatus = 0;
> + int present, link_active;
> +
> + pciehp_get_latch_status(ctrl, );
> +
> + present = pciehp_card_present(ctrl);
> + link_active = pciehp_check_link_active(ctrl);
> +
> + ctrl_info(ctrl, "Slot(%s): Card %spresent\n",
> +   slot_name(ctrl), present ? "" : "not ");
> +
> + ctrl_info(ctrl, "Slot(%s): Link %s\n",
> +   slot_name(ctrl), link_active ? "Up" : "Down");

This duplicates a lot of code from pciehp_handle_presence_or_link_change(),
which I think suggests handling MRL events in that function.  After all,
an MRL event may lead to bringup or bringdown of a slot similar to
a link or presence event.

Basically pciehp_handle_presence_or_link_change() just needs to be
amended to bring down the slot if an MRL event occurred, and
afterwards only bring up the slot if MRL is closed.  Like this:

-   if (present <= 0 && link_active <= 0) {
+   if ((present <= 0 && link_active <= 0) || !latch_closed) {
mutex_unlock(>state_lock);
return;
}


> + /*
> +  * Need to handle only MRL Open. When MRL is closed with
> +  * a Card Present, either the ATTN button, or the PDC indication
> +  * should power the slot and add the card in the slot
> +  */

I agree with the second sentence.  You may want to refer to PCIe Base Spec
r4.0, section 6.7.1.3 either in the commit message or a code comment,
which says:

"If an MRL Sensor is implemented without a corresponding MRL Sensor input
on the Hot-Plug Controller, it is recommended that the MRL Sensor be
routed to power fault input of the Hot-Plug Controller.
This allows an active adapter to be powered off when the MRL is opened."

This seems to suggest that the slot should be brought down as soon as MRL
is opened.


> + /*
> +  * If MRL is triggered, if ATTN button exists, ignore the event.
> +  */
> + if (ATTN_BUTTN(ctrl) && (events & PCI_EXP_SLTSTA_MRLSC))
> + events &= ~PCI_EXP_SLTSTA_PDC;

Hm, the spec doesn't seem to imply a connection between presence of
an MRL and presence of an Attention Button, so I find this confusing.


> + /*
> +  * If ATTN is present and MRL is triggered
> +  * ignore the Presence Change Event.
> +  */
> + if (ATTN_BUTTN(ctrl) && (events & PCI_EXP_SLTSTA_MRLSC))
> + events &= ~PCI_EXP_SLTSTA_PDC;

An Attention Button press results in a synthesized PDC event after
5 seconds, which may get lost due to the above if-statement.

Thanks,

Lukas


Re: [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks

2020-11-15 Thread Lukas Wunner
On Sun, Nov 15, 2020 at 04:11:43PM +0100, Thomas Gleixner wrote:
> Unfortunately there is no way to tell the APIC "Mask vector X" and the
> dump kernel does neither know which device it comes from nor does it
> have enumerated PCI completely which would reset the device and shutup
> the spew. Due to the interrupt storm it does not get that far.

Can't we just set DisINTx, clear MSI Enable and clear MSI-X Enable
on all active PCI devices in the crashing kernel before starting the
crash kernel?  So that the crash kernel starts with a clean slate?

Guilherme's original patches from 2018 iterate over all 256 PCI buses.
That might impact boot time negatively.  The reason he has to do that
is because the crashing kernel doesn't know which devices exist and
which have interrupts enabled.  However the crashing kernel has that
information.  It should either disable interrupts itself or pass the
necessary information to the crashing kernel as setup_data or whatever.

Guilherme's patches add a "clearmsi" command line parameter.  I guess
the idea is that the parameter is always passed to the crash kernel
but the patches don't do that, which seems odd.

Thanks,

Lukas


Re: Use after free in bcm2835_spi_remove()

2020-10-28 Thread Lukas Wunner
On Thu, Oct 15, 2020 at 01:53:35PM +0100, Mark Brown wrote:
> On Thu, Oct 15, 2020 at 07:38:29AM +0200, Lukas Wunner wrote:
> > On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > How about holding a ref on the controller as long as the SPI driver
> > is bound to the controller's parent device?  See below for a patch,
> > compile-tested only for lack of an SPI-equipped machine.
[...]
> > +   spi_controller_get(ctlr);
> > +   ret = devm_add_action(dev, __spi_controller_put, ctlr);
> > +   if (ret) {
> > +   kfree(ctlr);
> 
> This feels a bit icky - we're masking a standard use after free bug that
> affects devm in general, not just this instance, and so while it will
> work it doesn't feel great.  If we did do this it'd need more comments
> and should probably be conditional on using the feature.  TBH I'm just
> thinking it's better to just remove the feature, it's clearly error
> prone and pretty redundant with devm.  I'm not sure any memory savings
> it's delivering are worth the sharp edges.

A combined memory allocation for struct spi_controller and the private
data has more benefits than just memory savings:  Having them adjacent
in memory reduces the chance of cache misses.  Also, one can get from
one struct to the other with a cheap subtraction (using container_of())
instead of having to chase pointers.  So it helps performance.  And a
lack of pointers arguably helps security.

Most subsystems embed the controller struct in the private data, but
there *is* precedence for doing it the other way round.  E.g. the IIO
subsystem likewise appends the private data to the controller struct.
So I think that's fine, it need not and should not be changed.

The problem is that the ->probe() and ->remove() code is currently
asymmetric, which is unintuitive:  On ->probe(), there's an allocation
step and a registration step:

spi_alloc_master()
spi_register_controller()

Whereas on ->remove(), there's no step to free the master which would
balance the prior alloc step:

spi_unregister_controller()

That's because the spi_controller struct is ref-counted and the last
ref is usually dropped by spi_unregister_controller().  If the private
data is accessed after the spi_unregister_controller() step, a ref
needs to be held.

I maintain that it would be more intuitive to automatically hold a ref.
We could offer a devm_spi_alloc_master() function which holds this ref
and automatically releases it on unbind.

There are three drivers which call spi_alloc_master() with a size of zero
for the private data.  In these three cases it is fine to free the
spi_controller struct upon spi_unregister_controller().  So these drivers
can continue to use spi_alloc_master().  All other drivers could be
changed to use the new devm_spi_alloc_master(), or I could scrutinize
each of them and convert to the new function only if necessary.

Does this sound more convincing to you?

Thanks,

Lukas


Re: Use after free in bcm2835_spi_remove()

2020-10-22 Thread Lukas Wunner
On Wed, Oct 14, 2020 at 02:20:16PM -0700, Florian Fainelli wrote:
> In bcm2835_spi_remove(), spi_controller_unregister() will free the ctlr
> reference which will lead to an use after free in bcm2835_release_dma().
> 
> To avoid this use after free, allocate the bcm2835_spi structure with a
> different lifecycle than the spi_controller structure such that we
> unregister the SPI controller, free up all the resources and finally let
> device managed allocations free the bcm2835_spi structure.
[...]
> - if (ctlr->dma_tx) {
> - dmaengine_terminate_sync(ctlr->dma_tx);
> + if (dma_tx) {
> + dmaengine_terminate_sync(dma_tx);
>  
>   if (bs->fill_tx_desc)
>   dmaengine_desc_free(bs->fill_tx_desc);
>  
>   if (bs->fill_tx_addr)
> - dma_unmap_page_attrs(ctlr->dma_tx->device->dev,
> + dma_unmap_page_attrs(dma_tx->device->dev,
>bs->fill_tx_addr, sizeof(u32),
>DMA_TO_DEVICE,
>DMA_ATTR_SKIP_CPU_SYNC);
>  
> - dma_release_channel(ctlr->dma_tx);
> - ctlr->dma_tx = NULL;
> + dma_release_channel(dma_tx);
>   }

You must set ctlr->dma_tx and ctlr->dma_rx to NULL because the driver
checks their value in a couple of places.

E.g. bcm2835_spi_setup() checks ctlr->dma_rx.

Likewise, the error paths of bcm2835_dma_init() and bcm2835_spi_probe()
call bcm2835_dma_release() and the latter checks ctlr->dma_tx and
ctlr->dma_rx to determine whether DMA was set up, hence needs to be
torn down.


> + bs = devm_kzalloc(>dev, sizeof(*bs), GFP_KERNEL);
> + if (!bs)
> + return -ENOMEM;
> +
>   ctlr = spi_alloc_master(>dev, ALIGN(sizeof(*bs),
> dma_get_cache_alignment()));

You can set the second argument to spi_alloc_master() to 0
to conserve memory.

Thanks,

Lukas


Re: Use after free in bcm2835_spi_remove()

2020-10-14 Thread Lukas Wunner
[cc += Sascha]

On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> > > Apparently the problem is that spi_unregister_controller() drops the
> > > last ref on the controller, causing it to be freed, and afterwards we
> > > access the controller's private data, which is part of the same
> > > allocation as struct spi_controller:
> 
> Right, the proposed patch is yet another way to fix the issue - it all
> comes back to the fact that you shouldn't be using the driver data after
> unregistering if it was allocated as part of allocating the controller.
> This framework feature is unfortunately quite error prone.

How about holding a ref on the controller as long as the SPI driver
is bound to the controller's parent device?  See below for a patch,
compile-tested only for lack of an SPI-equipped machine.

Makes sense or dumb idea?

If this approach is deemed to be a case of "midlayer fallacy",
we could alternatively do this in a library function which drivers
opt-in to.  Or, given that the majority of drivers seems to be affected,
make it the default and allow drivers to opt-out.

-- >8 --

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 0cab239..5afa275 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2399,6 +2399,11 @@ static ssize_t slave_store(struct device *dev, struct 
device_attribute *attr,
 extern struct class spi_slave_class;   /* dummy */
 #endif
 
+static void __spi_controller_put(void *ctlr)
+{
+   spi_controller_put(ctlr);
+}
+
 /**
  * __spi_alloc_controller - allocate an SPI master or slave controller
  * @dev: the controller, possibly using the platform_bus
@@ -2414,6 +2419,7 @@ static ssize_t slave_store(struct device *dev, struct 
device_attribute *attr,
  * This call is used only by SPI controller drivers, which are the
  * only ones directly touching chip registers.  It's how they allocate
  * an spi_controller structure, prior to calling spi_register_controller().
+ * The structure is accessible as long as the SPI driver is bound to @dev.
  *
  * This must be called from context that can sleep.
  *
@@ -2429,6 +2435,7 @@ struct spi_controller *__spi_alloc_controller(struct 
device *dev,
 {
struct spi_controller   *ctlr;
size_t ctlr_size = ALIGN(sizeof(*ctlr), dma_get_cache_alignment());
+   int ret;
 
if (!dev)
return NULL;
@@ -2449,6 +2456,13 @@ struct spi_controller *__spi_alloc_controller(struct 
device *dev,
pm_suspend_ignore_children(>dev, true);
spi_controller_set_devdata(ctlr, (void *)ctlr + ctlr_size);
 
+   spi_controller_get(ctlr);
+   ret = devm_add_action(dev, __spi_controller_put, ctlr);
+   if (ret) {
+   kfree(ctlr);
+   return NULL;
+   }
+
return ctlr;
 }
 EXPORT_SYMBOL_GPL(__spi_alloc_controller);


Re: Use after free in bcm2835_spi_remove()

2020-10-14 Thread Lukas Wunner
On Tue, Oct 13, 2020 at 04:48:42PM -0700, Florian Fainelli wrote:
> With KASAN now working on ARM 32-bit, I was able to get the following
> trace upon reboot which invokes bcm2835_spi_shutdown() calling
> bcm2835_spi_remove(), the same can be triggered by doing a driver unbind:

Thank you for the report.  Apparently the problem is that
spi_unregister_controller() drops the last ref on the controller,
causing it to be freed, and afterwards we access the controller's
private data, which is part of the same allocation as
struct spi_controller:

bcm2835_spi_remove()
  spi_unregister_controller()
device_unregister()
  put_device()
spi_controller_release()  #  spi_master_class.dev_release()
  kfree(ctlr)
  bcm2835_dma_release(ctlr, bs)
  ...

However, when I submitted commit 9dd277ff92d0, I double-checked that
the kfree() happens after bcm2835_spi_remove() has finished and I
even wrote in the commit message:

   "Note that
the struct spi_controller as well as the driver-private data are not
freed until after bcm2835_spi_remove() has finished, so accessing them
is safe."

I'm puzzled now that it doesn't work as intended.  I do not see any
recent commits which changed the behavior, so I must have made a
mistake and missed something.

The below patch should fix the issue.  Could you verify that?
Unfortunately I do not have access to a RasPi currently.

An alternative to this patch would be a devm function which acquires
a ref on the spi controller on ->probe() and automatically releases it
after ->remove() has finished.  This could be used by other SPI drivers
as well.

Thanks,

Lukas

-- >8 --

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 41986ac..5254fda 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -1377,6 +1377,7 @@ static int bcm2835_spi_remove(struct platform_device 
*pdev)
 
bcm2835_debugfs_remove(bs);
 
+   spi_controller_get(ctlr);
spi_unregister_controller(ctlr);
 
bcm2835_dma_release(ctlr, bs);
@@ -1386,6 +1387,7 @@ static int bcm2835_spi_remove(struct platform_device 
*pdev)
   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
 
clk_disable_unprepare(bs->clk);
+   spi_controller_put(ctlr);
 
return 0;
 }



Re: [PATCH] PCI: pciehp: Add check for DL_ACTIVE bit in pciehp_check_link_status()

2020-10-09 Thread Lukas Wunner
On Thu, Oct 08, 2020 at 12:43:17PM +0530, Sanjay R Mehta wrote:
> On 10/7/2020 1:08 AM, Lukas Wunner wrote:
> > On Tue, Oct 06, 2020 at 01:24:28PM -0500, Sanjay R Mehta wrote:
> I am supposed to use PCI_EXP_LNKSTA_DLLLA bit in my patch but have
> used PCI_EXP_DPC_CAP_DL_ACTIVE.
> 
> The correct code should be as below,
> 
> - if ((lnk_status & PCI_EXP_LNKSTA_LT) ||
> + if (((lnk_status & PCI_EXP_LNKSTA_LT) &
> +  !(lnk_status & PCI_EXP_LNKSTA_DLLLA )) ||

So you want to ignore a set Link Training bit if the DLLLA bit is also
set (i.e. the link is up).  Why do you need this?  Is there broken AMD
hardware which fails to clear the Link Training bit when the LTSSM
exits the Configuration/Recovery state?

Again, please note that you need && instead of &.

Thanks,

Lukas

> >> Signed-off-by: Sanjay R Mehta 
> >> ---
> >>  drivers/pci/hotplug/pciehp_hpc.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/hotplug/pciehp_hpc.c 
> >> b/drivers/pci/hotplug/pciehp_hpc.c
> >> index 53433b3..81d1348 100644
> >> --- a/drivers/pci/hotplug/pciehp_hpc.c
> >> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> >> @@ -309,7 +309,8 @@ int pciehp_check_link_status(struct controller *ctrl)
> >>
> >>   pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, _status);
> >>   ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status);
> >> - if ((lnk_status & PCI_EXP_LNKSTA_LT) ||
> >> + if (((lnk_status & PCI_EXP_LNKSTA_LT) &
> >> +  !(lnk_status & PCI_EXP_DPC_CAP_DL_ACTIVE)) ||
> >>   !(lnk_status & PCI_EXP_LNKSTA_NLW)) {
> >>   ctrl_err(ctrl, "link training error: status %#06x\n",
> >>lnk_status);
> >> --
> >> 2.7.4


Re: [PATCH] PCI: pciehp: Add check for DL_ACTIVE bit in pciehp_check_link_status()

2020-10-06 Thread Lukas Wunner
On Tue, Oct 06, 2020 at 01:24:28PM -0500, Sanjay R Mehta wrote:
> if DL_ACTIVE bit is set it means that there is no need to check
> PCI_EXP_LNKSTA_LT bit, as DL_ACTIVE would have set only if the link
> is already trained. Hence adding a check which takes care of this
> scenario.

Sorry for being dense but I don't understand this at all:

The PCI_EXP_DPC_CAP_DL_ACTIVE bit which you check here indicates
that the port is capable of sending an ERR_COR interrupt whenever
the link transitions from inactive to active.

What is the connection to the PCI_EXP_LNKSTA_LT bit (which indicates
that the link is still being trained)?

Also, the negation of a bitwise AND is always either 0 or 1
(!(lnk_status & PCI_EXP_DPC_CAP_DL_ACTIVE)), so bit 0 is set or not set.
However PCI_EXP_LNKSTA_LT is bit 11.  A bitwise AND of bit 11 and 0 is
always 0, so the expression can never be 1.

Am I missing something?

Thanks,

Lukas

> Signed-off-by: Sanjay R Mehta 
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c 
> b/drivers/pci/hotplug/pciehp_hpc.c
> index 53433b3..81d1348 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -309,7 +309,8 @@ int pciehp_check_link_status(struct controller *ctrl)
>  
>   pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, _status);
>   ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status);
> - if ((lnk_status & PCI_EXP_LNKSTA_LT) ||
> + if (((lnk_status & PCI_EXP_LNKSTA_LT) &
> +  !(lnk_status & PCI_EXP_DPC_CAP_DL_ACTIVE)) ||
>   !(lnk_status & PCI_EXP_LNKSTA_NLW)) {
>   ctrl_err(ctrl, "link training error: status %#06x\n",
>lnk_status);
> -- 
> 2.7.4


Re: [PATCH v7 3/5] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC

2020-10-04 Thread Lukas Wunner
On Sat, Oct 03, 2020 at 03:55:12AM -0400, Ethan Zhao wrote:
> When root port has DPC capability and it is enabled, then triggered by
> errors, DPC DLLSC and PDC etc interrupts will be sent to DPC driver, pciehp
> drivers almost at the same time.

Do the DLLSC and PDC events occur as a result of handling the error
or do they occur independently?

If the latter, I don't see how we can tell whether the card in the
slot is still the same.

If the former, holding the hotplug slot's reset_lock and doing something
along the lines of pciehp_reset_slot() (or calling it directly) might
solve the race.

Thanks,

Lukas


Re: [PATCH 2/5 V2] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC

2020-09-29 Thread Lukas Wunner
On Tue, Sep 29, 2020 at 05:46:41PM +0800, Ethan Zhao wrote:
> On Tue, Sep 29, 2020 at 4:29 PM Lukas Wunner  wrote:
> > On Sun, Sep 27, 2020 at 11:27:46AM -0400, Sinan Kaya wrote:
> > > On 9/26/2020 11:28 PM, Ethan Zhao wrote:
> > > > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > > > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > > > @@ -710,8 +710,10 @@ static irqreturn_t pciehp_ist(int irq, void 
> > > > *dev_id)
> > > > down_read(>reset_lock);
> > > > if (events & DISABLE_SLOT)
> > > > pciehp_handle_disable_request(ctrl);
> > > > -   else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
> > > > +   else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) {
> > > > +   pci_wait_port_outdpc(pdev);
> > > > pciehp_handle_presence_or_link_change(ctrl, events);
> > > > +   }
> > > > up_read(>reset_lock);
> > >
> > > This looks like a hack TBH.
[...]
> > > Why is device lock not protecting this situation?
> > > Is there a lock missing in hotplug driver?
> >
> > According to Ethan's commit message, there are two issues here:
> > One, that pciehp may remove a device even though DPC recovered the error,
> > and two, that a null pointer deref occurs.
> >
> > The latter is most certainly not a locking issue but failure of DPC
> > to hold a reference on the pci_dev.
> 
> This is what patch 3/5 proposed to fix.

Please reorder the series to fix the null pointer deref first,
i.e. move patch 3 before patch 2.  If the null pointer deref is
fixed by patch 3, do not mention it in patch 2.

Thanks,

Lukas


Re: [PATCH 2/5 V2] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC

2020-09-29 Thread Lukas Wunner
On Sun, Sep 27, 2020 at 11:27:46AM -0400, Sinan Kaya wrote:
> On 9/26/2020 11:28 PM, Ethan Zhao wrote:
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -710,8 +710,10 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> > down_read(>reset_lock);
> > if (events & DISABLE_SLOT)
> > pciehp_handle_disable_request(ctrl);
> > -   else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
> > +   else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) {
> > +   pci_wait_port_outdpc(pdev);
> > pciehp_handle_presence_or_link_change(ctrl, events);
> > +   }
> > up_read(>reset_lock);
> 
> This looks like a hack TBH.
> 
> Lukas, Keith;
> 
> What is your take on this?
> Why is device lock not protecting this situation?
> 
> Is there a lock missing in hotplug driver?

According to Ethan's commit message, there are two issues here:
One, that pciehp may remove a device even though DPC recovered the error,
and two, that a null pointer deref occurs.

The latter is most certainly not a locking issue but failure of DPC
to hold a reference on the pci_dev.

Thanks,

Lukas


Re: [RFC] pcie hotplug doesn't work with kernel 4.19

2020-09-15 Thread Lukas Wunner
On Tue, Sep 15, 2020 at 04:15:15PM +0200, Jinpu Wang wrote:
> We are testing PCIe nvme SSD hotplug, it works out of box with kernel 5.4.62,
> dmesg during the hotplug:
[...]
> But with kernel 4.19.133, pcieport core doesn't print anything, is
> there known problem with kernel 4.19 support for pcie hotplug, do we
> need to backport some fixes from newer kernel to make it work?

No known problem.  Please open a bug on bugzilla.kernel.org and attach
full dmesg output for 5.4.62 and 4.19.133, as well as lspci -vv output.
You may want to add the following to the kernel command line:

ignore_loglevel log_bug_len=10M "dyndbg=file drivers/pci/* +p"
pciehp.pciehp_debug=1


> [  683.218554] pcieport :16:00.0: pciehp: Slot(0-3): Card present
> [  683.218555] pcieport :16:00.0: pciehp: Slot(0-3): Link Up
> [  683.271702] pcieport :16:00.0: pciehp: Timeout on hotplug
> command 0x17e1 (issued 73280 msec ago)
> [  686.301874] pcieport :16:00.0: pciehp: Timeout on hotplug
> command 0x13e1 (issued 3030 msec ago)
> [  686.361894] pcieport :16:00.0: pciehp: Timeout on hotplug
> command 0x13e1 (issued 3090 msec ago)

Those timeouts look suspicious.  Perhaps the hotplug controller
claims to support Command Completed Support, but in reality does not?


> CONFIG_HOTPLUG_PCI=y
> CONFIG_HOTPLUG_PCI_ACPI=y
> CONFIG_HOTPLUG_PCI_ACPI_IBM=m
> CONFIG_HOTPLUG_PCI_CPCI=y
> CONFIG_HOTPLUG_PCI_CPCI_ZT5550=m
> CONFIG_HOTPLUG_PCI_CPCI_GENERIC=m
> CONFIG_HOTPLUG_PCI_SHPC=y
> CONFIG_HOTPLUG_PCI_PCIE=y

You may not need ACPI_IBM, CPCI, SHPC.

Thanks,

Lukas


Re: [RFC PATCH v2] PCI/portdrv: Only disable Bus Master on kexec reboot and connected PCI devices

2020-09-13 Thread Lukas Wunner
On Mon, Sep 14, 2020 at 04:29:10AM +0800, Tiezhu Yang wrote:
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -143,6 +144,28 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
>   }
>  
>   pcie_port_device_remove(dev);
> + pci_disable_device(dev);
> +}
> +
> +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> +{
> + if (pci_bridge_d3_possible(dev)) {
> + pm_runtime_forbid(>dev);
> + pm_runtime_get_noresume(>dev);
> + pm_runtime_dont_use_autosuspend(>dev);
> + }
> +
> + pcie_port_device_remove(dev);
> +
> + /*
> +  * If this is a kexec reboot, turn off Bus Master bit on the
> +  * device to tell it to not continue to do DMA. Don't touch
> +  * devices in D3cold or unknown states.
> +  * If it is not a kexec reboot, firmware will hit the PCI
> +  * devices with big hammer and stop their DMA any way.
> +  */
> + if (kexec_in_progress && (dev->current_state <= PCI_D3hot))
> + pci_disable_device(dev);

The last portion of this function is already executed afterwards by
pci_device_shutdown().  You don't need to duplicate it here:

device_shutdown()
  dev->bus->shutdown() == pci_device_shutdown()
drv->shutdown() == pcie_portdrv_shutdown()
  pci_disable_device()
pci_disable_device()

Thanks,

Lukas


Re: [PATCH 1/2] serial: 8250: Simplify with dev_err_probe()

2020-09-02 Thread Lukas Wunner
On Tue, Sep 01, 2020 at 05:30:59PM +0200, Krzysztof Kozlowski wrote:
> Common pattern of handling deferred probe can be simplified with
> dev_err_probe().  Less code and the error value gets printed.
> 
> Signed-off-by: Krzysztof Kozlowski 

Reviewed-by: Lukas Wunner 


Re: [PATCH 2/2] serial: core: Simplify with dev_err_probe()

2020-09-02 Thread Lukas Wunner
On Tue, Sep 01, 2020 at 05:31:00PM +0200, Krzysztof Kozlowski wrote:
> Common pattern of handling deferred probe can be simplified with
> dev_err_probe().  Less code and the error value gets printed.
> 
> Signed-off-by: Krzysztof Kozlowski 

Reviewed-by: Lukas Wunner 


Re: [PATCH 2/3] driver core: Use rwsem for kill_device() serialization

2020-07-31 Thread Lukas Wunner
On Fri, Jul 31, 2020 at 08:45:05AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Jul 30, 2020 at 11:56:10AM +0200, Lukas Wunner wrote:
> > On Thu, Jul 30, 2020 at 08:53:26AM +0200, Greg Kroah-Hartman wrote:
> > > On Wed, Jul 08, 2020 at 03:27:02PM +0200, Lukas Wunner wrote:
> > > > kill_device() is currently serialized with driver probing by way of the
> > > > device_lock().  We're about to serialize it with device_add() as well
> > > > to prevent addition of children below a device which is going away.
> > > 
> > > Why?  Who does this?  Shouldn't the bus that is trying to do this know
> > > this is happening?
> > 
> > AFAICS, at least spi and i2c are affected.  Any bus which
> > creates a device hierarchy with dynamic addition & removal needs
> > to make sure no new children are added after removal of the parent
> > has begun.
> 
> I thought the bus code itself had this type of serialization already...

An SPI device is inaccessible once the controller has been torn down,
yet drivers may need to reach SPI devices to unbind cleanly (e.g. to
quiesce interrupts on SPI devices).

Therefore SPI devices need to be unregistered first and the controller
last.  However with CONFIG_OF_DYNAMIC=y, an SPI device may be added
at runtime via of_spi_notify(), which does take a ref on the controller,
but otherwise runs lockless against spi_unregister_controller().

What can happen here is an SPI device gets instantiated below a
controller as it is being removed and the SPI device can't be unbound
or removed cleanly because it's inaccessible.

The bus code can't do anything about this.  It doesn't learn about the
controller going away until device_unregister() is called at the *end* of
spi_unregister_controller().

Anyway, the preliminary patch below should do the trick and I've also
cooked up something similar for i2c.  Needs to be tested still.

Thanks,

Lukas

-- >8 --

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 6626587..b6876dd 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -475,6 +475,12 @@ struct boardinfo {
  */
 static DEFINE_MUTEX(board_lock);
 
+/*
+ * Prevents addition of devices with same chip select and
+ * addition of devices below an unregistering controller.
+ */
+static DEFINE_MUTEX(spi_add_lock);
+
 /**
  * spi_alloc_device - Allocate a new SPI device
  * @ctlr: Controller to which device is connected
@@ -554,7 +560,6 @@ static int spi_dev_check(struct device *dev, void *data)
  */
 int spi_add_device(struct spi_device *spi)
 {
-   static DEFINE_MUTEX(spi_add_lock);
struct spi_controller *ctlr = spi->controller;
struct device *dev = ctlr->dev.parent;
int status;
@@ -575,6 +580,12 @@ int spi_add_device(struct spi_device *spi)
 */
mutex_lock(_add_lock);
 
+   if ((IS_ENABLED(CONFIG_OF_DYNAMIC) || IS_ENABLED(CONFIG_ACPI) ||
+IS_ENABLED(CONFIG_SPI_SLAVE)) && ctlr->unregistering) {
+   status = -ENODEV;
+   goto done;
+   }
+
status = bus_for_each_dev(_bus_type, NULL, spi, spi_dev_check);
if (status) {
dev_err(dev, "chipselect %d already in use\n",
@@ -2795,6 +2806,13 @@ void spi_unregister_controller(struct spi_controller 
*ctlr)
struct spi_controller *found;
int id = ctlr->bus_num;
 
+   /* Prevent addition of new devices, then remove existing ones */
+   if (IS_ENABLED(CONFIG_OF_DYNAMIC) || IS_ENABLED(CONFIG_ACPI) ||
+   IS_ENABLED(CONFIG_SPI_SLAVE)) {
+   mutex_lock(_add_lock);
+   ctlr->unregistering = true;
+   mutex_unlock(_add_lock);
+   }
device_for_each_child(>dev, NULL, __unregister);
 
/* First make sure that this controller was ever added */
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 99380c0..6d95515 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -451,6 +451,7 @@ static inline void spi_unregister_driver(struct spi_driver 
*sdrv)
  * @irq_flags: Interrupt enable state during PTP system timestamping
  * @fallback: fallback to pio if dma transfer return failure with
  * SPI_TRANS_FAIL_NO_START.
+ * @unregistering: Set on controller removal, prevents addition of new devices.
  *
  * Each SPI controller can communicate with one or more @spi_device
  * children.  These make a small bus, sharing MOSI, MISO and SCK signals
@@ -667,6 +668,8 @@ struct spi_controller {
 
/* Interrupt enable state during PTP system timestamping */
unsigned long   irq_flags;
+
+   unsigned intunregistering:1;
 };
 
 static inline void *spi_controller_get_devdata(struct spi_controller *ctlr)


Re: [PATCH 2/3] driver core: Use rwsem for kill_device() serialization

2020-07-30 Thread Lukas Wunner
On Thu, Jul 30, 2020 at 08:53:26AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Jul 08, 2020 at 03:27:02PM +0200, Lukas Wunner wrote:
> > kill_device() is currently serialized with driver probing by way of the
> > device_lock().  We're about to serialize it with device_add() as well
> > to prevent addition of children below a device which is going away.
> 
> Why?  Who does this?  Shouldn't the bus that is trying to do this know
> this is happening?

AFAICS, at least spi and i2c are affected.

I first thought that pci is affected as well but it seems the global
pci_lock_rescan_remove() performs the required serialization.

I've yet to take a closer look at acpi and usb.  Any bus which
creates a device hierarchy with dynamic addition & removal needs
to make sure no new children are added after removal of the parent
has begun.


> So, why are you pushing this down into the driver core, can't this be
> done in whatever crazy bus wants to do this, like is done here?

I guess it can.  Let me try to perform the locking at the bus level then.

Thanks,

Lukas


Re: [driver core] e3b1cb5c89: WARNING:possible_recursive_locking_detected

2020-07-25 Thread Lukas Wunner
On Fri, Jul 24, 2020 at 10:29:50PM +0800, kernel test robot wrote:
> commit: e3b1cb5c896ba748d8f848238c8ea1f89520bde3 ("[PATCH 3/3] driver core: 
> Avoid adding children below a dead parent")
[...]
> [1.392584] WARNING: possible recursive locking detected
> [1.393350] 5.8.0-rc3-00011-ge3b1cb5c896ba7 #1 Not tainted
> [1.393350] 
> [1.393350] swapper/0/1 is trying to acquire lock:
> [1.393350] 88841fc6ff70 (>p->dead_sem){.+.+}-{3:3}, at: 
> __device_attach+0x51/0x1a0
> [1.393350] 
> [1.393350] but task is already holding lock:
> [1.393350] 888107f42770 (>p->dead_sem){.+.+}-{3:3}, at: 
> device_add+0xf8/0x890

False positive:

__device_attach() takes a device's dead_sem whereas device_add() takes
the *parent's* dead_sem.  But lockdep thinks they're the same because
they're in the same lock class.

We would normally see the same lockdep splat for device_lock() but
commit 1704f47b50b5 silenced it by assigning device_lock() to the
novalidate class.

I could silence this lockdep splat by assigning dead_sem to the
novalidate class as well.  But I also have an idea how we could
fix it properly by introducing a per-device class for bus_types
that need it and by putting the device_lock, dead_sem etc in
separate subclasses within that per-device class.

Any preference as to which solution I should pursue?
Any thoughts on this series in general?
Does the newly introduced dead_sem evoke approval or rejection?
Anyone?

Thanks,

Lukas


Re: [PCI] 3233e41d3e: WARNING:at_drivers/pci/pci.c:#pci_reset_hotplug_slot

2020-07-23 Thread Lukas Wunner
On Thu, Jul 23, 2020 at 05:13:06PM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-9):
[...]
> commit: 3233e41d3e8ebcd44e92da47ffed97fd49b84278 ("[PATCH] PCI: pciehp: Fix 
> AB-BA deadlock between reset_lock and device_lock")
[...]
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> [0.971752] WARNING: CPU: 0 PID: 1 at drivers/pci/pci.c:4905 
> pci_reset_hotplug_slot+0x70/0x80

Thank you, trusty robot.

I botched the call to lockdep_assert_held_write(), it should have been
conditional on "if (probe)".

Happy to respin the patch, but I'd like to hear opinions on the locking
issues surrounding xen and octeon (and the patch in general).

In particular, would a solution be entertained wherein the pci_dev is
reset by the PCI core after driver unbinding, contingent on a flag which
is set by a PCI driver to indicate that the pci_dev is returned to the
core in an unclean state?

Also, why does xen require a device reset on bind?

Thanks!

Lukas


Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-17 Thread Lukas Wunner
On Fri, Jul 17, 2020 at 03:04:10PM -0400, Lyude Paul wrote:
> Isn't it possible to tell whether a PCI device is connected through
> thunderbolt or not? We could probably get away with just defaulting
> to 100ms for thunderbolt devices without DLL Link Active specified,
> and then default to the old delay value for non-thunderbolt devices.

pci_is_thunderbolt_attached()


Re: [GIT PULL] EFI fixes

2020-07-10 Thread Lukas Wunner
On Fri, Jul 10, 2020 at 02:00:34PM +0300, Kirill A. Shutemov wrote:
> On Fri, Jul 10, 2020 at 12:09:36PM +0200, Arnd Bergmann wrote:
> > I forgot why we care though -- is there any behavior of gnu11
> > that we prefer over the gnu99 behavior, or is it just going with
> > the times because it's the right thing to do? All the interesting
> > features of c11 seem to also be available as extensions in
> > gcc-4.9's gnu89, though I could not find a definite list of the
> > differences.
> 
> Last time (llist_entry_safe() thread) it came up due to local variables in
> loops feature that is not available for gnu89. Both gnu99 and gnu11 is
> fine.

Same for anonymous structs/unions.  I used to have a use case for that
in struct efi_dev_path in include/linux/efi.h, but Ard refactored it
in a gnu89-compatible way for v5.7 with db8952e7094f.

(BTW, revisiting that commit I think it should have been broken into
smaller pieces, in particular the efi_get_device_by_path() argument
and #ifdef change should have gone into a separate commit.)

Thanks,

Lukas


[PATCH 0/3] Fix races on device removal

2020-07-08 Thread Lukas Wunner
Prevent dynamic SPI device addition below a controller which is
being removed.  To do so, set the controller's "dead" flag using
kill_device() (patch [3/3]).

Serialize access to a device's "dead" flag with a newly introduced
rw_semaphore in lieu of the device_lock to avoid deadlocks occurring
with the new use case (patch [2/3]).

Add a missing check for the "dead" flag upon driver binding
(patch [1/3]).


Lukas Wunner (3):
  driver core: Avoid binding drivers to dead devices
  driver core: Use rwsem for kill_device() serialization
  driver core: Avoid adding children below a dead parent

 drivers/base/base.h  |  2 ++
 drivers/base/core.c  | 49 ++--
 drivers/base/dd.c| 12 ++-
 drivers/nvdimm/bus.c |  8 +---
 drivers/spi/spi.c|  3 +++
 5 files changed, 51 insertions(+), 23 deletions(-)

-- 
2.27.0



[PATCH 3/3] driver core: Avoid adding children below a dead parent

2020-07-08 Thread Lukas Wunner
If CONFIG_OF_DYNAMIC or CONFIG_ACPI is enabled, SPI devices may be added
below a controller at runtime by a DeviceTree overlay or DSDT patch.
But there are no precautions to prevent adding a device below a
controller that's being removed.

This seems like something that should be guarded against in the driver
core because it's not specific to SPI:  Adding a child below a parent
that's going away seems like a bad idea regardless of the bus type.

Take advantage of kill_device() which was added by commit 00289cd87676
("drivers/base: Introduce kill_device()"), call it upon removal of an
SPI controller and teach the driver core to refuse device addition below
a killed parent.  To make this race-free, device_add() needs to take the
parent's dead_sem before checking its "dead" flag and until the child
device has been added to the parent's klist_children.

Signed-off-by: Lukas Wunner 
Cc: Dan Williams 
Cc: Geert Uytterhoeven 
Cc: Pantelis Antoniou 
Cc: Alexander Duyck 
---
 drivers/base/core.c | 18 --
 drivers/spi/spi.c   |  3 +++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 057da42b1a660..1d4e39696f996 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2597,6 +2597,14 @@ int device_add(struct device *dev)
pr_debug("device: '%s': %s\n", dev_name(dev), __func__);
 
parent = get_device(dev->parent);
+   if (parent) {
+   down_read(>p->dead_sem);
+   if (parent->p->dead) {
+   error = -ENODEV;
+   goto parent_error;
+   }
+   }
+
kobj = get_device_parent(dev, parent);
if (IS_ERR(kobj)) {
error = PTR_ERR(kobj);
@@ -2679,9 +2687,11 @@ int device_add(struct device *dev)
}
 
bus_probe_device(dev);
-   if (parent)
+   if (parent) {
klist_add_tail(>p->knode_parent,
   >p->klist_children);
+   up_read(>p->dead_sem);
+   }
 
if (dev->class) {
mutex_lock(>class->p->mutex);
@@ -2722,6 +2732,8 @@ int device_add(struct device *dev)
  Error:
cleanup_glue_dir(dev, glue_dir);
 parent_error:
+   if (parent)
+   up_read(>p->dead_sem);
put_device(parent);
 name_error:
kfree(dev->p);
@@ -2785,7 +2797,9 @@ EXPORT_SYMBOL_GPL(put_device);
  * kill_device - declare device dead
  * @dev: device in question
  *
- * Declare @dev dead to prevent it from binding to a driver.
+ * Declare @dev dead to prevent it from binding to a driver and
+ * to prevent addition of children.
+ *
  * Return true if it was killed or false if it was already dead.
  */
 bool kill_device(struct device *dev)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 8158e281f3540..005eca4bae089 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2764,6 +2764,9 @@ void spi_unregister_controller(struct spi_controller 
*ctlr)
struct spi_controller *found;
int id = ctlr->bus_num;
 
+   /* Prevent addition of new children, then remove existing ones */
+   if (IS_ENABLED(CONFIG_OF_DYNAMIC) || IS_ENABLED(CONFIG_ACPI))
+   kill_device(>dev);
device_for_each_child(>dev, NULL, __unregister);
 
/* First make sure that this controller was ever added */
-- 
2.27.0



[PATCH 2/3] driver core: Use rwsem for kill_device() serialization

2020-07-08 Thread Lukas Wunner
kill_device() is currently serialized with driver probing by way of the
device_lock().  We're about to serialize it with device_add() as well
to prevent addition of children below a device which is going away.
However the parent's device_lock() cannot be taken by device_add()
lest deadlocks occur:  Addition of the parent may result in addition
of children (as is the case with SPI controllers) and device_add()
already takes the device_lock through the call to bus_probe_device() ->
device_initial_probe() -> __device_attach().

Avoid such deadlocks by introducing an rw_semaphore whose specific
purpose is to serialize kill_device() with other parts of the kernel.

Use an rw_semaphore instead of a mutex because the latter would preclude
concurrent driver probing of multiple children below the same parent.
The semaphore is acquired for writing when declaring a device dead and
otherwise only acquired for reading.  It is private to the driver core,
obviating the need to acquire a lock when calling kill_device(), as is
currently done in nvdimm's nd_device_unregister() and device_del().

An alternative approach would be to convert the device_lock() itself to
an rw_semaphore (instead of a mutex).

Signed-off-by: Lukas Wunner 
Cc: Dan Williams 
Cc: Geert Uytterhoeven 
Cc: Pantelis Antoniou 
Cc: Alexander Duyck 
---
 drivers/base/base.h  |  2 ++
 drivers/base/core.c  | 33 +++--
 drivers/base/dd.c|  8 
 drivers/nvdimm/bus.c |  8 +---
 4 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 95c22c0f90360..7e71a1d262ef6 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -79,6 +79,7 @@ struct driver_private {
  * @async_driver - pointer to device driver awaiting probe via async_probe
  * @device - pointer back to the struct device that this structure is
  * associated with.
+ * @dead_sem - semaphore taken when declaring the device @dead.
  * @dead - This device is currently either in the process of or has been
  * removed from the system. Any asynchronous events scheduled for this
  * device should exit without taking any action.
@@ -94,6 +95,7 @@ struct device_private {
struct list_head deferred_probe;
struct device_driver *async_driver;
struct device *device;
+   struct rw_semaphore dead_sem;
u8 dead:1;
 };
 #define to_device_private_parent(obj)  \
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 67d39a90b45c7..057da42b1a660 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2526,6 +2526,7 @@ static int device_private_init(struct device *dev)
klist_init(>p->klist_children, klist_children_get,
   klist_children_put);
INIT_LIST_HEAD(>p->deferred_probe);
+   init_rwsem(>p->dead_sem);
return 0;
 }
 
@@ -2780,21 +2781,27 @@ void put_device(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(put_device);
 
+/**
+ * kill_device - declare device dead
+ * @dev: device in question
+ *
+ * Declare @dev dead to prevent it from binding to a driver.
+ * Return true if it was killed or false if it was already dead.
+ */
 bool kill_device(struct device *dev)
 {
-   /*
-* Require the device lock and set the "dead" flag to guarantee that
-* the update behavior is consistent with the other bitfields near
-* it and that we cannot have an asynchronous probe routine trying
-* to run while we are tearing out the bus/class/sysfs from
-* underneath the device.
-*/
-   lockdep_assert_held(>mutex);
+   bool killed;
 
-   if (dev->p->dead)
-   return false;
-   dev->p->dead = true;
-   return true;
+   down_write(>p->dead_sem);
+   if (dev->p->dead) {
+   killed = false;
+   } else {
+   dev->p->dead = true;
+   killed = true;
+   }
+   up_write(>p->dead_sem);
+
+   return killed;
 }
 EXPORT_SYMBOL_GPL(kill_device);
 
@@ -2817,9 +2824,7 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
 
-   device_lock(dev);
kill_device(dev);
-   device_unlock(dev);
 
if (dev->fwnode && dev->fwnode->dev == dev)
dev->fwnode->dev = NULL;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 31c668651e824..9353d811cce83 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -817,6 +817,7 @@ static void __device_attach_async_helper(void *_dev, 
async_cookie_t cookie)
};
 
device_lock(dev);
+   down_read(>p->dead_sem);
 
/*
 * Check if device has already been removed or claimed. This may
@@ -838,6 +839,7 @@ static void __device_attach_async_helper(void *_dev, 
async_cookie_t cookie)
if (dev->parent)
pm_runtime_put(dev->parent);

[PATCH 0/3] Fix races on device removal

2020-07-08 Thread Lukas Wunner
Prevent dynamic SPI device addition below a controller which is
being removed.  To do so, set the controller's "dead" flag using
kill_device() (patch [3/3]).

Serialize access to a device's "dead" flag with a newly introduced
rw_semaphore in lieu of the device_lock to avoid deadlocks occurring
with the new use case (patch [2/3]).

Add a missing check for the "dead" flag upon driver binding
(patch [1/3]).


Lukas Wunner (3):
  driver core: Avoid binding drivers to dead devices
  driver core: Use rwsem for kill_device() serialization
  driver core: Avoid adding children below a dead parent

 drivers/base/base.h  |  2 ++
 drivers/base/core.c  | 47 
 drivers/base/dd.c| 12 ++-
 drivers/nvdimm/bus.c |  8 +---
 drivers/spi/spi.c|  3 +++
 5 files changed, 51 insertions(+), 21 deletions(-)

-- 
2.27.0



[PATCH 1/3] driver core: Avoid binding drivers to dead devices

2020-07-08 Thread Lukas Wunner
Commit 3451a495ef24 ("driver core: Establish order of operations for
device_add and device_del via bitflag") sought to prevent asynchronous
driver binding to a device which is being removed.  It added a
per-device "dead" flag which is checked in the following code paths:

* asynchronous binding in __driver_attach_async_helper()
*  synchronous binding in device_driver_attach()
* asynchronous binding in __device_attach_async_helper()

It did *not* check the flag upon:

*  synchronous binding in __device_attach()

However __device_attach() may also be called asynchronously from:

deferred_probe_work_func()
  bus_probe_device()
device_initial_probe()
  __device_attach()

So if the commit's intention was to check the "dead" flag in all
asynchronous code paths, then a check is also necessary in
__device_attach().  Add the missing check.

Fixes: 3451a495ef24 ("driver core: Establish order of operations for device_add 
and device_del via bitflag")
Signed-off-by: Lukas Wunner 
Cc: sta...@vger.kernel.org # v5.1+
Cc: Alexander Duyck 
---
 drivers/base/dd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9a1d940342ac4..31c668651e824 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -848,7 +848,9 @@ static int __device_attach(struct device *dev, bool 
allow_async)
int ret = 0;
 
device_lock(dev);
-   if (dev->driver) {
+   if (dev->p->dead) {
+   goto out_unlock;
+   } else if (dev->driver) {
if (device_is_bound(dev)) {
ret = 1;
goto out_unlock;
-- 
2.27.0



[PATCH] driver core: Drop mention of obsolete bus rwsem from kernel-doc

2020-07-08 Thread Lukas Wunner
15 years ago, commit 6eded061b126 ("Fix up bus code and remove use of
rwsem") removed the bus rwsem, but left over a reference to it in a
kernel-doc comment.  Drop it.

Signed-off-by: Lukas Wunner 
---
 drivers/base/dd.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9353d811cce83..cc42c91c5bf56 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -430,10 +430,9 @@ static void driver_sysfs_remove(struct device *dev)
  * Allow manual attachment of a driver to a device.
  * Caller must have already set @dev->driver.
  *
- * Note that this does not modify the bus reference count
- * nor take the bus's rwsem. Please verify those are accounted
- * for before calling this. (It is ok to call with no other effort
- * from a driver's probe() method.)
+ * Note that this does not modify the bus reference count.
+ * Please verify that is accounted for before calling this.
+ * (It is ok to call with no other effort from a driver's probe() method.)
  *
  * This function must be called with the device lock held.
  */
-- 
2.27.0



Re: [PATCH v2 0/1] Revert "serial: 8250: Fix max baud limit in generic 8250 port"

2020-07-01 Thread Lukas Wunner
On Thu, Jul 02, 2020 at 01:37:13AM +0300, Serge Semin wrote:
> 1) Add a new capability like UART_CAP_NO16DIV and take it into account
>in the serial8250_get_baud_rate() method.
>  
> I don't have a documentation for the Mediatek UART port, but it seems to me
> that that controller calculates the baud rate differently from the standard
> 8250 port. A standard 8250 port does that by the next formulae:
>   baud = uartclk / (16 * divisor).
> While it seems to me that the Mediatek port uses the formulae like:
>   baud = uartclk / divisor. (Please, correct me if I'm wrong)

8250_bcm2835aux.c seems to suffer from a similar issue and
solves it like this in the ->probe hook:

/* the HW-clock divider for bcm2835aux is 8,
 * but 8250 expects a divider of 16,
 * so we have to multiply the actual clock by 2
 * to get identical baudrates.
 */
up.port.uartclk = clk_get_rate(data->clk) * 2;


> 2) Manually call serial8250_do_set_divisor() in the custom set_termios()
>callback.
> 
> Just add the uart_update_timeout() and serial8250_do_set_divisor() methods
> invocation into the mtk8250_set_termios() function, which the original commit
> 81bb549fdf14 ("serial: 8250_mtk: support big baud rate") author should have
> done in the first place.

That sound preferable as adding new quirks into core code feels
like a case of midlayer fallacy:

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

Thanks,

Lukas


Re: [PATCH] Revert "serial: 8250: Fix max baud limit in generic 8250 port"

2020-06-30 Thread Lukas Wunner
On Tue, Jun 30, 2020 at 04:42:11PM -0700, Daniel Winkler wrote:
> This reverts commit 0eeaf62981ecc79e8395ca8caa1570eaf3a12257.

That is not an upstream commit.  You probably mean:

commit 7b668c064ec33f3d687c3a413d05e355172e6c92
Author: Serge Semin 
Date:   Thu May 7 02:31:32 2020 +0300

serial: 8250: Fix max baud limit in generic 8250 port

And you didn't cc the commit author (hereby fixed).

Thanks,

Lukas

> 
> The change regresses the QCA6174A-3 bluetooth chip, preventing
> firmware from being properly loaded. We have verified that without
> this patch, the chip works as intended.
> 
> Signed-off-by: Daniel Winkler 
> ---
> 
>  drivers/tty/serial/8250/8250_port.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_port.c 
> b/drivers/tty/serial/8250/8250_port.c
> index 1632f7d25acca..e057c65ac1580 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -2618,8 +2618,6 @@ static unsigned int serial8250_get_baud_rate(struct 
> uart_port *port,
>struct ktermios *termios,
>struct ktermios *old)
>  {
> - unsigned int tolerance = port->uartclk / 100;
> -
>   /*
>* Ask the core to calculate the divisor for us.
>* Allow 1% tolerance at the upper limit so uart clks marginally
> @@ -2628,7 +2626,7 @@ static unsigned int serial8250_get_baud_rate(struct 
> uart_port *port,
>*/
>   return uart_get_baud_rate(port, termios, old,
> port->uartclk / 16 / UART_DIV_MAX,
> -   (port->uartclk + tolerance) / 16);
> +   port->uartclk);
>  }
>  
>  void
> -- 
> 2.27.0.212.ge8ba1cc988-goog


Re: [PATCH 2/2] PCI: pciehp: Fix wrong failure check on pcie_capability_read_*()

2020-06-20 Thread Lukas Wunner
On Fri, Jun 19, 2020 at 10:12:19PM +0200, refactormys...@gmail.com wrote:
> On failure, pcie_capabiility_read_*() will set the status value,
> its last parameter to 0 and not ~0.
> This bug fix checks for the proper value.

If a config space read times out, the PCIe controller fabricates
an "all ones" response.  The code is checking for such a timeout,
not for an error.  Hence the code is fine.

Thanks,

Lukas

> 
> Signed-off-by: Bolarinwa Olayemi Saheed 
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c 
> b/drivers/pci/hotplug/pciehp_hpc.c
> index 53433b37e181..c1a67054948a 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -89,7 +89,7 @@ static int pcie_poll_cmd(struct controller *ctrl, int 
> timeout)
>  
>   do {
>   pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, _status);
> - if (slot_status == (u16) ~0) {
> + if (slot_status == (u16)0) {
>   ctrl_info(ctrl, "%s: no response from device\n",
> __func__);
>   return 0;
> @@ -165,7 +165,7 @@ static void pcie_do_write_cmd(struct controller *ctrl, 
> u16 cmd,
>   pcie_wait_cmd(ctrl);
>  
>   pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, _ctrl);
> - if (slot_ctrl == (u16) ~0) {
> + if (slot_ctrl == (u16)0) {
>   ctrl_info(ctrl, "%s: no response from device\n", __func__);
>   goto out;
>   }
> @@ -236,7 +236,7 @@ int pciehp_check_link_active(struct controller *ctrl)
>   int ret;
>  
>   ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, _status);
> - if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)~0)
> + if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)0)
>   return -ENODEV;
>  
>   ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
> @@ -440,7 +440,7 @@ int pciehp_card_present(struct controller *ctrl)
>   int ret;
>  
>   ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, _status);
> - if (ret == PCIBIOS_DEVICE_NOT_FOUND || slot_status == (u16)~0)
> + if (ret == PCIBIOS_DEVICE_NOT_FOUND || slot_status == (u16)0)
>   return -ENODEV;
>  
>   return !!(slot_status & PCI_EXP_SLTSTA_PDS);
> @@ -592,7 +592,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  
>  read_status:
>   pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, );
> - if (status == (u16) ~0) {
> + if (status == (u16)0) {
>   ctrl_info(ctrl, "%s: no response from device\n", __func__);
>   if (parent)
>   pm_runtime_put(parent);
> -- 
> 2.18.2


Re: [PATCH] serial: core: drop unnecessary gpio include

2020-06-11 Thread Lukas Wunner
[cc += Heiko]

On Wed, Jun 10, 2020 at 05:51:21PM +0200, Johan Hovold wrote:
> Drop the recently added gpio include from the serial-core header in
> favour of a forward declaration and instead include the gpio header only
> where needed.

Hm, but why?  Are there adverse effects if this is included by
?

Thanks,

Lukas

> 
> Signed-off-by: Johan Hovold 
> ---
>  drivers/tty/serial/8250/8250_port.c | 1 +
>  drivers/tty/serial/serial_core.c| 1 +
>  include/linux/serial_core.h | 2 +-
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_port.c 
> b/drivers/tty/serial/8250/8250_port.c
> index 1632f7d25acc..d64ca77d9cfa 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/drivers/tty/serial/serial_core.c 
> b/drivers/tty/serial/serial_core.c
> index 3706f31b0c37..cba19f7d9ea3 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> index 791f4844efeb..01fc4d9c9c54 100644
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -10,7 +10,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -30,6 +29,7 @@
>  struct uart_port;
>  struct serial_struct;
>  struct device;
> +struct gpio_desc;
>  
>  /*
>   * This structure describes all the operations that can be done on the
> -- 
> 2.26.2
> 


Re: [PATCH v2] spi: bcm2835: Enable shared interrupt support

2020-06-08 Thread Lukas Wunner
On Mon, Jun 08, 2020 at 12:11:11PM +0100, Robin Murphy wrote:
> And all in code that has at least one obvious inefficiency left on
> the table either way.

Care to submit a patch to overcome that inefficiency?


> This thread truly epitomises Knuth's "premature optimisation" quote... ;)

The thread came about because it can be determined at compile time
whether the interrupt is going to be shared:

On the BCM2835 (Raspberry Pi 1), CONFIG_ARCH_MULTI_V6 is set and this
SoC doesn't have multiple bcm2835-spi instances, so no shared interrupt.

The question is how to discern BCM2836/BCM2837 (Raspberry Pi 2/3), which
do not have multiple instances, and BCM2711 (Raspberry Pi 4) which does.

The Raspberry Pi Foundation compiles BCM2711 kernels with CONFIG_ARM_LPAE=y,
but Florian considered that kludgy as a discriminator and opted for
runtime-detection via the compatible string instead.  If you've got
a better idea please come forward.

Is "optimize shared IRQ support away if IS_ENABLED(CONFIG_ARCH_MULTI_V6),
else leave it in" the best we can do?

Thanks,

Lukas


Re: [PATCH 3/3] spi: bcm2835: Enable shared interrupt support

2020-06-05 Thread Lukas Wunner
On Thu, Jun 04, 2020 at 01:24:54PM -0700, Florian Fainelli wrote:
> So we do need to know for the first time we install the interrupt
> handler whether we will be in a shared situation or not, I cannot think
> of any solution other than counting the number of available DT nodes
> with the "brcm,bcm2835-spi" compatible string.

In principle it would be possible to iterate over the entire DT using
for_each_of_allnodes() and call of_irq_parse_one() on each device_node
if it's enabled and not the one we're probing.  Then check if any of that
device_node's IRQs is identical to that of the device_node we're probing.
That would give you a generic method to test for sharedness of an
interrupt.

However the solution you've found is simpler and cheaper than such a
brute-force search, hence seems perfectly valid to me.


> I appreciate that
> Lukas has spent some tremendous amount of time working on this
> controller driver and he has a sensitivity for performance.

Thanks!  Indeed I think spi-bcm2835.c is by now among the best performing
and most featureful SPI drivers in the kernel.  I've recently had a
discussion on netdev with someone testing an SPI-attached Ethernet
chip on iMX6Q and on STM32MP1 and couldn't get it to work.
With a BCM2837 no problem at all:

https://lore.kernel.org/netdev/ac0f7227-a4ae-b6cd-36ec-3bcb02b1a...@denx.de/

Lukas


Re: [PATCH v2] spi: bcm2835: Enable shared interrupt support

2020-06-05 Thread Lukas Wunner
On Thu, Jun 04, 2020 at 02:28:19PM -0700, Florian Fainelli wrote:
> The 4 SPI controller instances added in BCM2711 and BCM7211 SoCs (SPI3,
> SPI4, SPI5 and SPI6) share the same interrupt line with SPI0.
> 
> For the BCM2835 case which is deemed performance critical, we would like
> to continue using an interrupt handler which does not have the extra
> comparison on BCM2835_SPI_CS_INTR.
> 
> To support that requirement the common interrupt handling code between
> the shared and non-shared interrupt paths is split into a
> bcm2835_spi_interrupt_common() and both bcm2835_spi_interrupt() as well
> as bcm2835_spi_shared_interrupt() make use of it.
> 
> During probe, we determine if there is at least another instance of this
> SPI controller, and if there is, then we install a shared interrupt
> handler.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Lukas Wunner 


Re: [PATCH 2/3] ARM: dts: bcm2711: Update SPI nodes compatible strings

2020-06-04 Thread Lukas Wunner
On Thu, Jun 04, 2020 at 12:13:25PM +0100, Mark Brown wrote:
> On Thu, Jun 04, 2020 at 06:20:38AM +0200, Lukas Wunner wrote:
> > On Wed, Jun 03, 2020 at 08:46:54PM -0700, Florian Fainelli wrote:
> > > The BCM2711 SoC features 5 SPI controllers which all share the same
> > > interrupt line, the SPI driver needs to support interrupt sharing,
> > > therefore use the chip specific compatible string to help with that.
> 
> > You're saying above that the 5 controllers all share the interrupt
> > but below you're only changing the compatible string of 4 controllers.
> 
> > So I assume spi0 still has its own interrupt and only the additional
> > 4 controllers present on the BCM2711/BCM7211 share their interrupt?
> 
> Regardless of what's going on with the interrupts the compatible string
> should reflect the IP version so unless for some reason someone taped
> out two different versions of the IP it seems odd that the compatible
> strings would vary within a given SoC.

Hm.  I guess it may be possible to search the DT for other devices
sharing the same interrupt line and thereby determine whether
IRQF_SHARED is necessary.  The helper to perform this search could
live in drivers/of/irq.c as I imagine it might be useful in general.


Re: [PATCH 1/3] dt-bindings: spi: Document bcm2711 and bcm7211 SPI compatible

2020-06-03 Thread Lukas Wunner
On Wed, Jun 03, 2020 at 08:46:53PM -0700, Florian Fainelli wrote:
> The BCM2711 and BCM7211 chips use the BCM2835 SPI controller, but there
> are severl instances of those in the system and they all share the same
  ^^
Nit: "several"

And apparently they do not *all* share the interrupt, but only
4 controllers (out of 5, not counting the two bcm2835aux ones)
do so.

Thanks,

Lukas


Re: [PATCH 2/3] ARM: dts: bcm2711: Update SPI nodes compatible strings

2020-06-03 Thread Lukas Wunner
On Wed, Jun 03, 2020 at 08:46:54PM -0700, Florian Fainelli wrote:
> The BCM2711 SoC features 5 SPI controllers which all share the same
> interrupt line, the SPI driver needs to support interrupt sharing,
> therefore use the chip specific compatible string to help with that.

You're saying above that the 5 controllers all share the interrupt
but below you're only changing the compatible string of 4 controllers.

So I assume spi0 still has its own interrupt and only the additional
4 controllers present on the BCM2711/BCM7211 share their interrupt?

Thanks,

Lukas

> 
> Signed-off-by: Florian Fainelli 
> ---
>  arch/arm/boot/dts/bcm2711.dtsi | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/bcm2711.dtsi b/arch/arm/boot/dts/bcm2711.dtsi
> index a91cf68e3c4c..9a9ea67fbc2d 100644
> --- a/arch/arm/boot/dts/bcm2711.dtsi
> +++ b/arch/arm/boot/dts/bcm2711.dtsi
> @@ -152,7 +152,7 @@
>   };
>  
>   spi3: spi@7e204600 {
> - compatible = "brcm,bcm2835-spi";
> + compatible = "brcm,bcm2711-spi", "brcm,bcm2835-spi";
>   reg = <0x7e204600 0x0200>;
>   interrupts = ;
>   clocks = < BCM2835_CLOCK_VPU>;
> @@ -162,7 +162,7 @@
>   };
>  
>   spi4: spi@7e204800 {
> - compatible = "brcm,bcm2835-spi";
> + compatible = "brcm,bcm2711-spi", "brcm,bcm2835-spi";
>   reg = <0x7e204800 0x0200>;
>   interrupts = ;
>   clocks = < BCM2835_CLOCK_VPU>;
> @@ -172,7 +172,7 @@
>   };
>  
>   spi5: spi@7e204a00 {
> - compatible = "brcm,bcm2835-spi";
> + compatible = "brcm,bcm2711-spi", "brcm,bcm2835-spi";
>   reg = <0x7e204a00 0x0200>;
>   interrupts = ;
>   clocks = < BCM2835_CLOCK_VPU>;
> @@ -182,7 +182,7 @@
>   };
>  
>   spi6: spi@7e204c00 {
> - compatible = "brcm,bcm2835-spi";
> + compatible = "brcm,bcm2711-spi", "brcm,bcm2835-spi";
>   reg = <0x7e204c00 0x0200>;
>   interrupts = ;
>   clocks = < BCM2835_CLOCK_VPU>;
> -- 
> 2.17.1
> 


Re: [PATCH 3/3] spi: bcm2835: Enable shared interrupt support

2020-06-03 Thread Lukas Wunner
On Wed, Jun 03, 2020 at 08:46:55PM -0700, Florian Fainelli wrote:
> +static const struct of_device_id bcm2835_spi_match[] = {
> + { .compatible = "brcm,bcm2835-spi", .data = _spi_interrupt },
> + { .compatible = "brcm,bcm2711-spi", .data = _spi_sh_interrupt },
> + { .compatible = "brcm,bcm7211-spi", .data = _spi_sh_interrupt },
> + {}
> +};
> +MODULE_DEVICE_TABLE(of, bcm2835_spi_match);

Maybe I'm missing something but I think you either have to reverse the
order of the entries in this array or change patch [2/3] to drop
"brcm,bcm2835-spi" from the compatible string:

__of_match_node() iterates over the entries in the array above and
calls __of_device_is_compatible() for each of them, which returns
success if the entry matches any of the device's compatible string.

Because "brcm,bcm2835-spi" is checked first and that string is
present on the controllers with shared interrupt, they're all
deemed not to use shared interrupts.

If you opt so fix this by dropping "brcm,bcm2835-spi" from the
device's compatible strings, then you have to move patch [2/3]
behind patch [3/3].


>  static int bcm2835_spi_probe(struct platform_device *pdev)
>  {
> + irqreturn_t (*bcm2835_spi_isr_func)(int, void *);

A more succinct alternative is:

irq_handler_t bcm2835_spi_isr_func;

Otherwise this patch LGTM.

Thanks,

Lukas


Re: [PATCH] spi: bcm2835: Enable shared interrupt support

2020-05-29 Thread Lukas Wunner
On Fri, May 29, 2020 at 11:03:48AM -0700, Florian Fainelli wrote:
> On 5/29/20 10:53 AM, Lukas Wunner wrote:
> > On Fri, May 29, 2020 at 10:46:01AM -0700, Florian Fainelli wrote:
> >> On 5/29/20 10:43 AM, Lukas Wunner wrote:
> >>> Finally, it would be nice if the check would be optimized away when
> >>> compiling for pre-RasPi4 products, maybe something like:
> >>>
> >>> + if (IS_ENABLED(CONFIG_ARM_LPAE) && !(cs & BCM2835_SPI_CS_INTR))
> >>> + return IRQ_NONE;
> >>
> >> Rather than keying this off ARM_LPAE or any other option, this should be
> >> keyed off a compatible string, that way we can even conditionally pass
> >> IRQF_SHARED to the interrupt handler if we care so much about performance.
> > 
> > But a compatible string can't be checked at compile time, can it?
> 
> No, but you can have a different interrupt handler that it set at
> runtime if you want to completely eliminate this comparison.

Good idea.  In fact the IRQ handler for platforms with shared interrupts
could just be a wrapper which performs the BCM2835_SPI_CS_INTR check
then tail-calls the existing IRQ handler.  The compiler would just
inline it and everything would be fine.


> My point is that CONFIG_ARM_LPAE is just too brittle, there is nothing
> that prevents you from using a non-LPAE kernel on the Pi 4, even PCIe
> could be made to work if using super section mappings to map the PCIe
> outbound space. Even on models with over 4GB of DRAM, if you are willing
> to lose some of it, it can work.

Agreed.

Thanks,

Lukas


Re: [PATCH] spi: bcm2835: Implement shutdown callback

2020-05-29 Thread Lukas Wunner
On Fri, May 29, 2020 at 10:48:11AM -0700, Florian Fainelli wrote:
> On 5/29/20 10:47 AM, Lukas Wunner wrote:
> > On Thu, May 28, 2020 at 12:06:05PM -0700, Florian Fainelli wrote:
> >> Make sure we clear the FIFOs, stop the block, disable the clock and
> >> release the DMA channel.
> > 
> > To what end?  Why is this change necessary?  Sorry but this seems like
> > an awfully terse commit message.
> 
> To ensure clocks are disabled and to save power in low power modes used
> on 7211 for instance.

Thanks for the explanation, that's an important tidbit.  I wasn't even
aware that this SPI controller is used on SoCs beyond the Raspberry Pi
ones.  Does the BCM7211 use shared interrupts for this controller?
Does it have DMA DREQ attached?


Re: [PATCH] spi: bcm2835: Enable shared interrupt support

2020-05-29 Thread Lukas Wunner
On Fri, May 29, 2020 at 10:46:01AM -0700, Florian Fainelli wrote:
> On 5/29/20 10:43 AM, Lukas Wunner wrote:
> > On Thu, May 28, 2020 at 08:58:04PM +0200, Nicolas Saenz Julienne wrote:
> >> --- a/drivers/spi/spi-bcm2835.c
> >> +++ b/drivers/spi/spi-bcm2835.c
> >> @@ -379,6 +379,10 @@ static irqreturn_t bcm2835_spi_interrupt(int irq, 
> >> void *dev_id)
> >>if (bs->tx_len && cs & BCM2835_SPI_CS_DONE)
> >>bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
> >>  
> >> +  /* check if we got interrupt enabled */
> >> +  if (!(bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_INTR))
> >> +  return IRQ_NONE;
> >> +
> >>/* Read as many bytes as possible from FIFO */
> >>bcm2835_rd_fifo(bs);
> >>/* Write as many bytes as possible to FIFO */
[...]
> > Finally, it would be nice if the check would be optimized away when
> > compiling for pre-RasPi4 products, maybe something like:
> > 
> > +   if (IS_ENABLED(CONFIG_ARM_LPAE) && !(cs & BCM2835_SPI_CS_INTR))
> > +   return IRQ_NONE;
> 
> Rather than keying this off ARM_LPAE or any other option, this should be
> keyed off a compatible string, that way we can even conditionally pass
> IRQF_SHARED to the interrupt handler if we care so much about performance.

But a compatible string can't be checked at compile time, can it?

The point is that at the least the Foundation compiles and ships a separate
kernel for each of the three platforms BCM2835, BCM2837, BCM2711.  It's
unnecessary to check whether an interrupt was actually raised if we *know*
in advance that it's not shared (as is the case with kernels compiled for
BCM2835 and BCM2837).

Thanks,

Lukas


Re: [PATCH] spi: bcm2835: Implement shutdown callback

2020-05-29 Thread Lukas Wunner
On Thu, May 28, 2020 at 12:06:05PM -0700, Florian Fainelli wrote:
> Make sure we clear the FIFOs, stop the block, disable the clock and
> release the DMA channel.

To what end?  Why is this change necessary?  Sorry but this seems like
an awfully terse commit message.

Thanks,

Lukas

> 
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/spi/spi-bcm2835.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
> index 20d8581fdf88..237bd306c268 100644
> --- a/drivers/spi/spi-bcm2835.c
> +++ b/drivers/spi/spi-bcm2835.c
> @@ -1391,6 +1391,15 @@ static int bcm2835_spi_remove(struct platform_device 
> *pdev)
>   return 0;
>  }
>  
> +static void bcm2835_spi_shutdown(struct platform_device *pdev)
> +{
> + int ret;
> +
> + ret = bcm2835_spi_remove(pdev);
> + if (ret)
> + dev_err(>dev, "failed to shutdown\n");
> +}
> +
>  static const struct of_device_id bcm2835_spi_match[] = {
>   { .compatible = "brcm,bcm2835-spi", },
>   {}
> @@ -1404,6 +1413,7 @@ static struct platform_driver bcm2835_spi_driver = {
>   },
>   .probe  = bcm2835_spi_probe,
>   .remove = bcm2835_spi_remove,
> + .shutdown   = bcm2835_spi_shutdown,
>  };
>  module_platform_driver(bcm2835_spi_driver);
>  
> -- 
> 2.17.1
> 


Re: [PATCH] spi: bcm2835: Enable shared interrupt support

2020-05-29 Thread Lukas Wunner
On Thu, May 28, 2020 at 08:58:04PM +0200, Nicolas Saenz Julienne wrote:
> --- a/drivers/spi/spi-bcm2835.c
> +++ b/drivers/spi/spi-bcm2835.c
> @@ -379,6 +379,10 @@ static irqreturn_t bcm2835_spi_interrupt(int irq, void 
> *dev_id)
>   if (bs->tx_len && cs & BCM2835_SPI_CS_DONE)
>   bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
>  
> + /* check if we got interrupt enabled */
> + if (!(bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_INTR))
> + return IRQ_NONE;
> +
>   /* Read as many bytes as possible from FIFO */
>   bcm2835_rd_fifo(bs);
>   /* Write as many bytes as possible to FIFO */

This definitely looks wrong.  The check whether the interrupt is enabled
should be moved *before* the conditional calls to bcm2835_rd_fifo_blind()
and bcm2835_wr_fifo_blind(), i.e. to the top of the function.

Otherwise if an interrupt is raised by another SPI controller,
this function may perform read/write accesses to the FIFO and
interfere with an ongoing transfer in DMA or poll mode.

Also, instead of performing an MMIO read, just use the "cs" variable
which was assigned at the top of the function.

The code comment should explain that the check is necessary because the
interrupt may be shared with other controllers on the BCM2711.

Finally, it would be nice if the check would be optimized away when
compiling for pre-RasPi4 products, maybe something like:

+   if (IS_ENABLED(CONFIG_ARM_LPAE) && !(cs & BCM2835_SPI_CS_INTR))
+   return IRQ_NONE;

Thanks,

Lukas


Re: [PATCH v3 3/5] serial: 8250: Support separate rs485 rx-enable GPIO

2020-05-18 Thread Lukas Wunner
On Mon, May 18, 2020 at 05:22:47PM +0200, Lukas Wunner wrote:
> And for longer cables, users may have to disable it using the
> TIOCSRS485 ioctl.

Sorry, I meant *enable* here. %-/


Re: [PATCH v3 3/5] serial: 8250: Support separate rs485 rx-enable GPIO

2020-05-18 Thread Lukas Wunner
On Mon, May 18, 2020 at 06:12:41PM +0300, Andy Shevchenko wrote:
> On Sun, May 17, 2020 at 11:56:08PM +0200, Heiko Stuebner wrote:
> > From: Heiko Stuebner 
> > 
> > The RE signal is used to control the duplex mode of transmissions,
> > aka receiving data while sending in full duplex mode, while stopping
> > receiving data in half-duplex mode.
> > 
> > On a number of boards the !RE signal is tied to ground so reception
> > is always enabled except if the UART allows disabling the receiver.
> > This can be taken advantage of to implement half-duplex mode - like
> > done on 8250_bcm2835aux.
> > 
> > Another solution is to tie !RE to RTS always forcing half-duplex mode.
> > 
> > And finally there is the option to control the RE signal separately,
> > like done here by introducing a new rs485-specific gpio that can be
> > set depending on the RX_DURING_TX setting in the common em485 callbacks.
> 
> ...
> 
> > +   port->rs485_re_gpio = devm_gpiod_get_optional(dev, "rs485-rx-enable",
> > + GPIOD_OUT_HIGH);
> 
> While reviewing some other patch I realized that people are missing the
> point of these GPIO flags when pin is declared to be output.
> 
> HIGH here means "asserted" (consider active-high vs. active-low in
> general). Is that the intention here?
> 
> Lukas, same question to your patch.

Yes.  "High", i.e. asserted, means "termination enabled" in the case of
my patch and "receiver enabled" in the case of Heiko's patch.

For termination, the default is "low", i.e. termination disabled.
I talked to our hardware engineers and they said that disabling
termination by default is the safe choice:  If multiple devices
on the RS485 bus enable termination, then no communication may
be possible at all.  Whereas if termination is disabled,
communication should always work at least for short cables.
And for longer cables, users may have to disable it using the
TIOCSRS485 ioctl.

In the case of Heiko's patches, the default is "high", i.e.
the receiver is enabled by default so you're able to receive
data over the bus after opening the tty.  Once you start
transmitting, the GPIO may be switched to low for the duration
of the transmission if half-duplex mode is enabled.

Thanks,

Lukas


Re: [PATCH v3 3/5] serial: 8250: Support separate rs485 rx-enable GPIO

2020-05-18 Thread Lukas Wunner
On Mon, May 18, 2020 at 10:04:05AM +0200, Heiko Stübner wrote:
> Am Montag, 18. Mai 2020, 06:50:06 CEST schrieb Lukas Wunner:
> > On Sun, May 17, 2020 at 11:56:08PM +0200, Heiko Stuebner wrote:
> > > @@ -1457,6 +1458,7 @@ void serial8250_em485_stop_tx(struct uart_8250_port 
> > > *p)
> > >* Enable previously disabled RX interrupts.
> > >*/
> > >   if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
> > > + gpiod_set_value(port->rs485_re_gpio, 1);
> > >   serial8250_clear_and_reinit_fifos(p);
> > >  
> > >   p->ier |= UART_IER_RLSI | UART_IER_RDI;
> > 
> > The added line needs to be conditional on if (port->rs485_re_gpio)
> > because the gpiod could be NULL and gpiod_set_value() doesn't check
> > for that.
> 
> Need to look deeper at the other comment below, but gpiod_set_value does
> check for NULL ;-)

I'll be damned.  That means I should respin my patch [4/4] and
drop the NULL check. :-/

Thanks!

Lukas


Re: [PATCH v3 3/5] serial: 8250: Support separate rs485 rx-enable GPIO

2020-05-17 Thread Lukas Wunner
On Sun, May 17, 2020 at 11:56:08PM +0200, Heiko Stuebner wrote:
> @@ -1457,6 +1458,7 @@ void serial8250_em485_stop_tx(struct uart_8250_port *p)
>* Enable previously disabled RX interrupts.
>*/
>   if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
> + gpiod_set_value(port->rs485_re_gpio, 1);
>   serial8250_clear_and_reinit_fifos(p);
>  
>   p->ier |= UART_IER_RLSI | UART_IER_RDI;

The added line needs to be conditional on if (port->rs485_re_gpio)
because the gpiod could be NULL and gpiod_set_value() doesn't check
for that.


> @@ -1597,9 +1599,12 @@ static inline void __start_tx(struct uart_port *port)
>  void serial8250_em485_start_tx(struct uart_8250_port *up)
>  {
>   unsigned char mcr = serial8250_in_MCR(up);
> + struct uart_port *port = >port;
>  
> - if (!(up->port.rs485.flags & SER_RS485_RX_DURING_TX))
> + if (!(up->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
> + gpiod_set_value(port->rs485_re_gpio, 0);
>   serial8250_stop_rx(>port);
> + }

Same here.


> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -253,6 +253,7 @@ struct uart_port {
>   const struct attribute_group **tty_groups;  /* all attributes 
> (serial core use only) */
>   struct serial_rs485 rs485;
>   struct gpio_desc*rs485_term_gpio;   /* enable RS485 bus 
> termination */
> + struct gpio_desc*rs485_re_gpio; /* gpio RS485 receive 
> enable */

Nit: I'd probably document this as "enable RS485 receiver" because it's
already apparent from the variable type and name that it's a gpio,
making it unnecessary to repeat that in the code comment.  But I guess
that's a matter of personal preference.


There's something else:  You need to amend serial8250_em485_config()
to toggle the GPIO depending on whether SER_RS485_RX_DURING_TX is
set.  Right now you enable the receiver by default and then disable
it when starting to transmit if half-duplex mode is selected and
likewise re-enable it when stopping to transmit.  But user space
may write some stuff to the tty while in half-duplex mode, then
immediately issue a TIOCSRS485 ioctl to switch to full-duplex mode.
If the ->rs485_config callback is executed while transmitting is
still ongoing, then you'll not re-enable the receiver when transmitting
finally stops.  The ->rs485_config callback is invoked under the
uart port spinlock but the lock may be briefly released and later
re-acquired by the IRQ handler if the TX FIFO is full.  (Unless
I'm missing something.)

Thanks,

Lukas


Re: [PATCH v2 6/7] serial: 8250: Support separate rs485 rx-enable GPIO

2020-05-16 Thread Lukas Wunner
On Thu, Mar 26, 2020 at 12:14:21AM +0100, Heiko Stuebner wrote:
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -3356,6 +3356,18 @@ int uart_get_rs485_mode(struct uart_port *port)
>   rs485conf->flags |= SER_RS485_TERMINATE_BUS;
>   }
>  
> + port->rs485_re_gpio = devm_gpiod_get(dev, "rs485-rx-enable",
> +GPIOD_FLAGS_BIT_DIR_OUT);

I think you want to use GPIOD_OUT_HIGH here so that rs485 reception
is enabled by default and only disabled at runtime when sending
(if half-duplex mode is used).

Thanks,

Lukas


Re: [PATCH v2 5/7] dt-bindings: serial: Add binding for rs485 receiver enable GPIO

2020-05-02 Thread Lukas Wunner
On Thu, Mar 26, 2020 at 12:14:20AM +0100, Heiko Stuebner wrote:
> --- a/Documentation/devicetree/bindings/serial/rs485.yaml
> +++ b/Documentation/devicetree/bindings/serial/rs485.yaml
> @@ -44,6 +44,10 @@ properties:
> description: enables the receiving of data even while sending data.
> $ref: /schemas/types.yaml#/definitions/flag
>  
> +  rs485-rx-enable-gpios:
> +   description: GPIO to handle a separate RS485 receive enable signal
> +   $ref: /schemas/types.yaml#/definitions/flag
> +

This isn't a flag, so you need the "maxItems: 1" line instead,
as you correctly did below:

>rs485-term-gpios:
>  description: GPIO pin to enable RS485 bus termination.
>  maxItems: 1


Re: [PATCH v2 4/7] serial: 8250: Handle implementations not having TEMT interrupt using em485

2020-05-02 Thread Lukas Wunner
On Thu, Mar 26, 2020 at 12:14:19AM +0100, Heiko Stuebner wrote:
> Some 8250 ports have a TEMT interrupt but it's not a part of the 8250
> standard, instead only available on some implementations.
> 
> The current em485 implementation does not work on ports without it.
> The only chance to make it work is to loop-read on LSR register.
> 
> So add UART_CAP_TEMT to mark 8250 uarts having this interrupt,
> update all current em485 users with that capability and make
> the stop_tx function loop-read on uarts not having it.

Just to get a better understanding:  According to the Dw_apb_uart_db.pdf
databook I've found, the UART does have a "THR empty" interrupt.  So you
get an interrupt once the Transmit Holding Register (and by consequence
the FIFO) has been drained.  Then what do you need a TEMT interrupt for?
Why is the THR interrupt not sufficient?


> @@ -1529,11 +1535,22 @@ static inline void __stop_tx(struct uart_8250_port *p)
>   /*
>* To provide required timeing and allow FIFO transfer,
>* __stop_tx_rs485() must be called only when both FIFO and
> -  * shift register are empty. It is for device driver to enable
> -  * interrupt on TEMT.
> +  * shift register are empty. If 8250 port supports it,
> +  * it is for device driver to enable interrupt on TEMT.
> +  * Otherwise must loop-read until TEMT and THRE flags are set.
>*/
> - if ((lsr & BOTH_EMPTY) != BOTH_EMPTY)
> - return;
> + if (p->capabilities & UART_CAP_TEMT) {
> + if ((lsr & BOTH_EMPTY) != BOTH_EMPTY)
> + return;
> + } else {
> + int lsr;
> +
> + if (readx_poll_timeout(__get_lsr, p, lsr,
> + (lsr & BOTH_EMPTY) == BOTH_EMPTY,
> + 0, 1) < 0)
> + pr_warn("%s: timeout waiting for fifos to 
> empty\n",
> + p->port.name);
> + }

Do you actually need to check for the timeout?  How could this happen?
Only if some other part of the driver would disable the transmitter
I guess, which would be a bug.

Also, note that __stop_tx() may be called from hardirq context via
serial8250_tx_chars().  If the baudrate is low, you may spin for a
fairly long time in IRQ context.  E.g. with 9600 8N1, it takes about
1 msec for one char to transmit.

Thanks,

Lukas


Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-05-02 Thread Lukas Wunner
On Sat, May 02, 2020 at 09:11:58AM +0200, Takashi Iwai wrote:
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -673,6 +673,12 @@ static int amdgpu_dm_audio_component_bind(struct device 
> *kdev,
>   struct amdgpu_device *adev = dev->dev_private;
>   struct drm_audio_component *acomp = data;
>  
> + if (!device_link_add(hda_kdev, kdev, DL_FLAG_STATELESS |
> +  DL_FLAG_PM_RUNTIME)) {
> + DRM_ERROR("DM: cannot add device link to audio device\n");
> + return -ENOMEM;
> + }
> +

Doesn't this duplicate drivers/pci/quirks.c:quirk_gpu_hda() ?


Re: [RFC PATCH 05/22] thunderbolt: Add helper macros to iterate over switch ports

2019-10-06 Thread Lukas Wunner
On Wed, Oct 02, 2019 at 05:28:59PM +0300, Mika Westerberg wrote:
> On Wed, Oct 02, 2019 at 04:17:54PM +0200, Oliver Neukum wrote:
> > Am Dienstag, den 01.10.2019, 14:38 +0300 schrieb Mika Westerberg:
> > > @@ -1975,10 +1972,8 @@ void tb_switch_suspend(struct tb_switch *sw)
> > > if (err)
> > > return;
> > >  
> > > -   for (i = 1; i <= sw->config.max_port_number; i++) {
> > > -   if (tb_port_has_remote(>ports[i]))
> > > -   tb_switch_suspend(sw->ports[i].remote->sw);
> > > -   }
> > > +   tb_switch_for_each_remote_port(sw, i)
> > > +   tb_switch_suspend(sw->ports[i].remote->sw);
> > 
> > This macro looks a bit prone to misunderstanding.
> > I guess the function would be better if the test could be seen.
> 
> The macro does not really save too many lines so I think I can just drop
> this patch for now and keep these open-coded.

Introducing a macro is fine.

However instead of using an index "i" as iterator, I'd suggest using a
pointer to struct tb_port.  This reduces the horizontal width of the
code because you can replace the "sw->ports[i]" with just "port".

In most of the loops this also saves 1 line for an assignment:
"struct tb_port *port = >ports[i];"

In fact, I've already proposed such a macro more than a year ago
but it never got merged:

https://lore.kernel.org/patchwork/patch/983863/ 

/**
 * tb_sw_for_each_port() - iterate over ports on a switch
 * @switch: pointer to struct tb_switch over whose ports shall be iterated
 * @port: pointer to struct tb_port which shall be used as the iterator
 *
 * Excludes port 0, which is the switch itself and therefore irrelevant for
 * most iterations.
 */
#define tb_sw_for_each_port(switch, port)\
for (port = &(switch)->ports[1]; \
 port <= &(switch)->ports[(switch)->config.max_port_number]; \
 port++)

Thanks,

Lukas


Re: [PATCH 0/3] PCI: pciehp: Do not turn off slot if presence comes up after link

2019-10-02 Thread Lukas Wunner
On Wed, Oct 02, 2019 at 05:13:58PM -0500, Alex G. wrote:
> On 10/1/19 11:13 PM, Lukas Wunner wrote:
> > On Tue, Oct 01, 2019 at 05:14:16PM -0400, Stuart Hayes wrote:
> > > This patch set is based on a patch set [1] submitted many months ago by
> > > Alexandru Gagniuc, who is no longer working on it.
> > 
> > I'm not sure if it's appropriate to change the author and
> > omit Alex' Signed-off-by.
> 
> Legally Dell owns the patches. I have no objection on my end.

>From a kernel community POV, I don't think it matters (in this case)
who legally owns the copyright to the contributed code.  It's just that
we go to great lengths to provide proper attribution even for small
contributions (e.g. Tested-by).

The benefit to the community is that we know who to cc if that portion
of the code is changed again and someone knowledgable should take a look.

The benefit to contributors is that they can change jobs and their past
contributions are still visible in the git history and attributed to
their names.  By contrast, if you've worked on closed source code and
changed jobs, that work isn't visible to future employers or even yourself,
and it may happen that someone else takes credit for your past work
without you even knowing about it or being able to stop that.
(I've seen it before.)

In this case, there should be a S-o-b line for Alex preceding that
for Stuart, and the author of the commit should be Alex unless a
significant portion of the patch was changed.

Thanks,

Lukas


Re: [PATCH 0/3] PCI: pciehp: Do not turn off slot if presence comes up after link

2019-10-01 Thread Lukas Wunner
On Tue, Oct 01, 2019 at 05:14:16PM -0400, Stuart Hayes wrote:
> This patch set is based on a patch set [1] submitted many months ago by
> Alexandru Gagniuc, who is no longer working on it.
> 
> [1] https://patchwork.kernel.org/cover/10909167/
> [v3,0/4] PCI: pciehp: Do not turn off slot if presence comes up after link

If I'm not mistaken, these two are identical to Alex' patches, right?

  PCI: pciehp: Add support for disabling in-band presence
  PCI: pciehp: Wait for PDS when in-band presence is disabled

I'm not sure if it's appropriate to change the author and
omit Alex' Signed-off-by.

Otherwise I have no objections against this series.

Thanks,

Lukas


Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect

2019-09-23 Thread Lukas Wunner
On Mon, Sep 23, 2019 at 11:12:37AM +0300, Mika Westerberg wrote:
> Regarding suggestion of unbinding PCI drivers without
> pci_lock_rescan_remove() hold, I haven't looked it too closely but I
> think we need to take that lock anyway because when we are unbinding a
> hotplug driver it is supposed to remove the hierarchy below touching the
> shared structures, possibly concurrently. Unfortunately there is no
> documentation what data pci_lock_rescan_remove() actually protects so
> first one needs to understand that. I think one way to clean up this is
> to use finer grained locking (with documented lock ordering) for PCI bus
> structures that can be accessed simultaneusly by different threads. But
> that is not a simple task.

Right.  I'm (still) working on that, albeit slowly as I'm caught up with
other stuff.

Thanks,

Lukas


Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect

2019-09-22 Thread Lukas Wunner
On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote:
> If there are more than one PCIe switch with hotplug downstream ports
> hot-removing them leads to a following deadlock:

For the record, I think my comments on v1 of this patch still apply:

https://patchwork.ozlabs.org/patch/1117870/#2230798


Re: [PATCH v4 0/4] Simplify PCIe hotplug indicator control

2019-09-05 Thread Lukas Wunner
On Thu, Sep 05, 2019 at 04:01:02PM -0500, Bjorn Helgaas wrote:
>  void pciehp_set_indicators(struct controller *ctrl, int pwr, int attn)
>  {
>   u16 cmd = 0, mask = 0;
>  
> - if (PWR_LED(ctrl) && pwr > 0) {
> - cmd |= pwr;
> + if (PWR_LED(ctrl) && pwr != INDICATOR_NOOP) {
> + cmd |= (pwr & PCI_EXP_SLTCTL_PIC);
>   mask |= PCI_EXP_SLTCTL_PIC;
>   }
>  
> - if (ATTN_LED(ctrl) && attn > 0) {
> - cmd |= attn;
> + if (ATTN_LED(ctrl) && attn != INDICATOR_NOOP) {
> + cmd |= (attn & PCI_EXP_SLTCTL_AIC);
>   mask |= PCI_EXP_SLTCTL_AIC;
>   }

There's a subtle issue here:  A value of "0" is "reserved" per PCIe r4.0,
sec 7.8.10.  Denis filtered that out, with your change it's an accepted
value.  I don't think the function ever gets called with a value of "0",
so it's not a big deal.  And maybe we don't even want to filter that
value out.  Just noting anyway.

Thanks,

Lukas


Re: [PATCH 2/2] PCI: Unify pci_dev_is_disconnected() and pci_dev_is_inaccessible()

2019-09-03 Thread Lukas Wunner
On Tue, Sep 03, 2019 at 10:36:35PM -0600, Kelsey Skunberg wrote:
> Change pci_dev_is_disconnected() call inside pci_dev_is_inaccessible() to:
> 
>   pdev->error_state == pci_channel_io_perm_failure
> 
> Change remaining pci_dev_is_disconnected() calls to
> pci_dev_is_inaccessible() calls.

I don't think that's a good idea because it introduces a config space read
(for the vendor ID) in places where we don't want that.  E.g., after the
check of pdev->error_state, a regular config space read may take place and
if that returns all ones, we may already be able to determine that the
device is inaccessible, obviating the need for a vendor ID check.
Config space reads aren't for free.

Thanks,

Lukas


Re: [PATCH v3 0/4] Simplify PCIe hotplug indicator control

2019-08-27 Thread Lukas Wunner
On Tue, Aug 27, 2019 at 05:53:19PM -0500, Bjorn Helgaas wrote:
> Unrelated, but if anybody is looking at pciehp, is there value in
> having pciehp split across five files?
> 
>   drivers/pci/hotplug/pciehp.h
>   drivers/pci/hotplug/pciehp_core.c
>   drivers/pci/hotplug/pciehp_ctrl.c
>   drivers/pci/hotplug/pciehp_hpc.c
>   drivers/pci/hotplug/pciehp_pci.c
> 
> To me, it just makes things harder because when I'm browsing for
> something in pciehp and I don't know the exact name of it, I have to
> guess which file it's in, and I'm invariably wrong.
> 
> It seems like it would be much simpler if everything were in a single
> pciehp.c file.  Then we could also get rid of the header and make
> several more things static.

I wouldn't go so far as to say there's *value* in the split.

It's a historic artefact, it's been like that since pciehp was
introduced back in 2004:
https://git.kernel.org/tglx/history/c/c16b4b14d980

I was hesitant to change it so far, in particular because it makes
it harder to backport stable patches.

That said, I did think about rearranging things to reduce the number
of files and non-static declarations or to give the files better names.
I wasn't thinking of squashing everything into one file, it would
probably be quite large and not make things simpler.

The general logic is that everything touching the registers is in
pciehp_hpc.c.  You won't (or shouldn't) find register access in any
of the other files.

pciehp_ctrl.c is the control logic, reaction to events, etc.
Though portions of the control logic are also in pciehp_hpc.c,
in particular the interrupt servicing.

pciehp_core.c contains the interface to the PCI hotplug core and
the probe/remove/PM routines.

pciehp_pci.c contains bringup/teardown of the slot.

One thing that I think deserves changing is this comment at the
top of each file:

"Send feedback to , "

We now use MAINTAINERS to do this.  Kristen took over maintainership
in 2005 with commit 8cf4c19523b7 but by 2009 she had stepped down per
commit be3652b8a253.  So providing her address is no longer accurate.
And I imagine Greg is no longer as familiar with the driver as it has
changed considerably since 2004.  He'd probably have to defer to others
who have done work on it recently (such as me).

Thanks,

Lukas


Re: [PATCH v3 0/8] thunderbolt: Intel Ice Lake support

2019-08-20 Thread Lukas Wunner
On Mon, Aug 19, 2019 at 04:29:35PM +, mario.limoncie...@dell.com wrote:
> I've run into a problem when using
> a WD19TB that after unplugging it will cause the following to spew in dmesg:
> 
> [ 2198.017003] 
> [ 2198.017005] WARNING: possible recursive locking detected
> [ 2198.017008] 5.3.0-rc5+ #75 Not tainted
> [ 2198.017009] 
> [ 2198.017012] irq/122-pciehp/121 is trying to acquire lock:
> [ 2198.017015] 801d4de8 (>reset_lock){.+.+}, at: 
> pciehp_check_presence+0x1b/0x80
> [ 2198.017026] 
>but task is already holding lock:
> [ 2198.017028] 0899e2eb (>reset_lock){.+.+}, at: 
> pciehp_ist+0xaf/0x1c0

This was first reported by Theodore in April and appears to be a
false positive:

https://lore.kernel.org/linux-pci/20190402083257.kyqmirq4ovzsc...@wunner.de/

Thanks,

Lukas


Re: [PATCH v2 7/8] thunderbolt: Add support for Intel Ice Lake

2019-08-13 Thread Lukas Wunner
On Mon, Aug 12, 2019 at 03:38:46PM +0300, Mika Westerberg wrote:
> +static void icm_veto_begin(struct tb *tb)
> +{
> + struct icm *icm = tb_priv(tb);
> +
> + if (!icm->veto) {
> + icm->veto = true;
> + /* Keep the domain powered while veto is in effect */
> + pm_runtime_get(>dev);
> + }
> +}

Hm, don't you need memory barriers when accessing icm->veto?

If so, I'd suggest:

/* Keep the domain powered while veto is in effect */
if (cmpxchg(>veto, false, true))
pm_runtime_get(>dev);

You'll have to declare icm->veto unsigned int instead of bool
because thunderbolt.ko is compiled if CONFIG_COMPILE_TEST is
enabled and there are arches which do not support cmpxchg for
a size of 1 byte.

The other bools in struct icm should likewise be unsigned int
per Linus' dictum:
https://lkml.org/lkml/2017/11/21/384


> --- a/drivers/thunderbolt/nhi.c
> +++ b/drivers/thunderbolt/nhi.c
> +/* Ice Lake specific NHI operations */
> +

Again, can't this be moved to a separate file for maintainability?

Thanks,

Lukas


Re: [PATCH v2 1/4] PCI: pciehp: Add pciehp_set_indicators() to jointly set LED indicators

2019-08-12 Thread Lukas Wunner
On Mon, Aug 12, 2019 at 11:49:23AM -0700, sathyanarayanan kuppuswamy wrote:
> > On 8/11/19 12:59 PM, Denis Efremov wrote:
> > > +if ((!PWR_LED(ctrl)  || pwr  == PWR_NONE) &&
> > > +(!ATTN_LED(ctrl) || attn == ATTN_NONE))
> > > +return;
> 
> Also I think this condition needs to expand to handle the case whether pwr
> != PWR_NONE and !PWR_LED(ctrl) is true.
> 
> you need to return for case, pwr = PWR_ON, !PWR_LED(ctrl)=true ,
> !ATTN_LED(ctrl)=false, attn=on

Why should we return in that case?  We need to update the Attention
Indicator Control to On.

Thanks,

Lukas


Re: [PATCH v2 3/4] PCI: pciehp: Replace pciehp_set_attention_status()

2019-08-12 Thread Lukas Wunner
On Sun, Aug 11, 2019 at 10:59:43PM +0300, Denis Efremov wrote:
> +#define pciehp_set_attention_status(ctrl, status) \
> + pciehp_set_indicators(ctrl, PWR_NONE, (status == 0 ? ATTN_OFF : status))

Reviewed-by: Lukas Wunner 

Good catch regarding the translation of the "off" value that's needed here.


Re: [PATCH v2 1/4] PCI: pciehp: Add pciehp_set_indicators() to jointly set LED indicators

2019-08-12 Thread Lukas Wunner
On Sun, Aug 11, 2019 at 10:59:41PM +0300, Denis Efremov wrote:
> Add pciehp_set_indicators() to set power and attention indicators with a
> single register write. Thus, avoiding waiting twice for Command Complete.
> enum pciehp_indicator introduced to switch between the indicators statuses.
> 
> Signed-off-by: Denis Efremov 

Reviewed-by: Lukas Wunner 


Re: [PATCH 1/4] PCI: pciehp: Add pciehp_set_indicators() to jointly set LED indicators

2019-08-11 Thread Lukas Wunner
On Sun, Aug 11, 2019 at 06:07:55PM +0200, Lukas Wunner wrote:
>   if (!PWR_LED(ctrl)  || pwr  == PWR_NONE) &&
>   !ATTN_LED(ctrl) || attn == ATTN_NONE))
>   return;

Forgot an opening brace in two spots here, sorry.


Re: [PATCH 4/4] PCI: pciehp: Replace pciehp_green_led_{on,off,blink}()

2019-08-11 Thread Lukas Wunner
On Sun, Aug 11, 2019 at 04:29:45PM +0300, Denis Efremov wrote:
> This patch replaces pciehp_green_led_{on,off,blink}() with
> pciehp_set_indicators().
> 
> Signed-off-by: Denis Efremov 

Reviewed-by: Lukas Wunner 


Re: [PATCH 3/4] PCI: pciehp: Replace pciehp_set_attention_status()

2019-08-11 Thread Lukas Wunner
On Sun, Aug 11, 2019 at 04:29:44PM +0300, Denis Efremov wrote:
> +#define pciehp_set_attention_status(crtl, status) \
> + pciehp_set_indicators(ctrl, PWR_NONE, status)
> +

There's a typo here, s/crtl/ctrl/.

With that addressed,

Reviewed-by: Lukas Wunner 


Re: [PATCH 2/4] PCI: pciehp: Switch LED indicators with a single write

2019-08-11 Thread Lukas Wunner
On Sun, Aug 11, 2019 at 04:29:43PM +0300, Denis Efremov wrote:
> This patch replaces all consecutive switches of power and attention
> indicators with pciehp_set_indicators() call. Thus, performing only
> single write to a register.
> 
> Signed-off-by: Denis Efremov 

Reviewed-by: Lukas Wunner 


  1   2   3   4   5   6   7   8   9   10   >