from:"Pali Rohár"

Re: [PATCH v10 5/7] PCI: mediatek-gen3: Add MSI support

2021-04-20 Thread Pali Rohár

On Tuesday 20 April 2021 12:01:10 Marc Zyngier wrote:
> On Tue, 20 Apr 2021 10:44:02 +0100,
> Pali Rohár  wrote:
> > 
> > Hello!
> > 
> > On Tuesday 20 April 2021 14:17:21 Jianjun Wang wrote:
> > > +static void mtk_pcie_enable_msi(struct mtk_pcie_port *port)
> > > +{
> > > + int i;
> > > + u32 val;
> > > +
> > > + for (i = 0; i < PCIE_MSI_SET_NUM; i++) {
> > > + struct mtk_msi_set *msi_set = >msi_sets[i];
> > > +
> > > + msi_set->base = port->base + PCIE_MSI_SET_BASE_REG +
> > > + i * PCIE_MSI_SET_OFFSET;
> > > + msi_set->msg_addr = port->reg_base + PCIE_MSI_SET_BASE_REG +
> > > + i * PCIE_MSI_SET_OFFSET;
> > > +
> > > + /* Configure the MSI capture address */
> > > + writel_relaxed(lower_32_bits(msi_set->msg_addr), msi_set->base);
> > > + writel_relaxed(upper_32_bits(msi_set->msg_addr),
> > > +port->base + PCIE_MSI_SET_ADDR_HI_BASE +
> > > +i * PCIE_MSI_SET_ADDR_HI_OFFSET);
> > 
> > This looks like as setting MSI doorbell address to MSI doorbell address.
> > 
> > > +static void mtk_compose_msi_msg(struct irq_data *data, struct msi_msg 
> > > *msg)
> > > +{
> > > + struct mtk_msi_set *msi_set = irq_data_get_irq_chip_data(data);
> > > + struct mtk_pcie_port *port = data->domain->host_data;
> > > + unsigned long hwirq;
> > > +
> > > + hwirq = data->hwirq % PCIE_MSI_IRQS_PER_SET;
> > > +
> > > + msg->address_hi = upper_32_bits(msi_set->msg_addr);
> > > + msg->address_lo = lower_32_bits(msi_set->msg_addr);
> > > + msg->data = hwirq;
> > > + dev_dbg(port->dev, "msi#%#lx address_hi %#x address_lo %#x data %d\n",
> > > + hwirq, msg->address_hi, msg->address_lo, msg->data);
> > 
> > ... which is later used in compose_msi_msg().
> > 
> > Marc in some other patches for other pci controller drivers changed this
> > address to just main "port" structure. It simplified implementations and
> > also avoided need to declare additional member "msg_addr".
> > 
> > Marc, would it be possible to simplify it also for this driver and just
> > set msg_addr to virt_to_phys(port)?
> 
> Maybe. It really depends on what range the HW accepts, and the sole
> requirement is to use an address that the endpoint cannot DMA
> to. Here, the driver seems to be using something based on the port
> base address, which is good enough as far as I am concerned (the thing
> I usually object to is the allocation of memory just for the sake of
> getting a capture address).
> 
> If you want to further simplify it, you could simply use port.reg_base
> as the MSI address for all sets, as I don't think they have to be
> distinct. But someone with access to the TRM for this should go and
> check it.
> 
> I don't believe this should gate the merging od this driver though.

Of course! I'm just asking details to understand best practises and how
it works. So thanks for information!

>   M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

Re: [PATCH v10 5/7] PCI: mediatek-gen3: Add MSI support

2021-04-20 Thread Pali Rohár

Hello!

On Tuesday 20 April 2021 14:17:21 Jianjun Wang wrote:
> +static void mtk_pcie_enable_msi(struct mtk_pcie_port *port)
> +{
> + int i;
> + u32 val;
> +
> + for (i = 0; i < PCIE_MSI_SET_NUM; i++) {
> + struct mtk_msi_set *msi_set = >msi_sets[i];
> +
> + msi_set->base = port->base + PCIE_MSI_SET_BASE_REG +
> + i * PCIE_MSI_SET_OFFSET;
> + msi_set->msg_addr = port->reg_base + PCIE_MSI_SET_BASE_REG +
> + i * PCIE_MSI_SET_OFFSET;
> +
> + /* Configure the MSI capture address */
> + writel_relaxed(lower_32_bits(msi_set->msg_addr), msi_set->base);
> + writel_relaxed(upper_32_bits(msi_set->msg_addr),
> +port->base + PCIE_MSI_SET_ADDR_HI_BASE +
> +i * PCIE_MSI_SET_ADDR_HI_OFFSET);

This looks like as setting MSI doorbell address to MSI doorbell address.

> +static void mtk_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> + struct mtk_msi_set *msi_set = irq_data_get_irq_chip_data(data);
> + struct mtk_pcie_port *port = data->domain->host_data;
> + unsigned long hwirq;
> +
> + hwirq = data->hwirq % PCIE_MSI_IRQS_PER_SET;
> +
> + msg->address_hi = upper_32_bits(msi_set->msg_addr);
> + msg->address_lo = lower_32_bits(msi_set->msg_addr);
> + msg->data = hwirq;
> + dev_dbg(port->dev, "msi#%#lx address_hi %#x address_lo %#x data %d\n",
> + hwirq, msg->address_hi, msg->address_lo, msg->data);

... which is later used in compose_msi_msg().

Marc in some other patches for other pci controller drivers changed this
address to just main "port" structure. It simplified implementations and
also avoided need to declare additional member "msg_addr".

Marc, would it be possible to simplify it also for this driver and just
set msg_addr to virt_to_phys(port)?

Re: [PATCH] arm64: dts: marvell: armada-37xx: Set linux,pci-domain to zero

2021-04-17 Thread Pali Rohár

On Saturday 17 April 2021 17:19:38 Andrew Lunn wrote:
> > Currently this code is implemented in pci_bus_find_domain_nr() function.
> > IIRC domain number is 16bit integer, so plain bitmap would consume 8 kB
> > of memory. I'm not sure if it is fine or some other tree-based structure
> > for allocated domain numbers is needed.
> 
> Hi Pali
> 
> Have a look at lib/idr.c
> 
>  Andrew

Great! So number allocation is already implemented in kernel (via radix trees).

Re: [PATCH] arm64: dts: marvell: armada-37xx: Set linux,pci-domain to zero

2021-04-17 Thread Pali Rohár

On Thursday 15 April 2021 10:13:17 Rob Herring wrote:
> On Thu, Apr 15, 2021 at 3:45 AM Marek Behun  wrote:
> >
> > On Thu, 15 Apr 2021 10:36:40 +0200
> > Pali Rohár  wrote:
> >
> > > On Tuesday 13 April 2021 13:17:29 Rob Herring wrote:
> > > > On Mon, Apr 12, 2021 at 7:41 AM Pali Rohár  wrote:
> > > > >
> > > > > Since commit 526a76991b7b ("PCI: aardvark: Implement driver 'remove'
> > > > > function and allow to build it as module") PCIe controller driver for
> > > > > Armada 37xx can be dynamically loaded and unloaded at runtime. Also 
> > > > > driver
> > > > > allows dynamic binding and unbinding of PCIe controller device.
> > > > >
> > > > > Kernel PCI subsystem assigns by default dynamically allocated PCI 
> > > > > domain
> > > > > number (starting from zero) for this PCIe controller every time when 
> > > > > device
> > > > > is bound. So PCI domain changes after every unbind / bind operation.
> > > >
> > > > PCI host bridges as a module are relatively new, so seems likely a bug 
> > > > to me.
> > >
> > > Why a bug? It is there since 5.10 and it is working.
> 
> I mean historically, the PCI subsystem didn't even support host
> bridges as a module. They weren't even proper drivers and it was all
> arch specific code. Most of the host bridge drivers are still built-in
> only. This seems like a small detail that was easily overlooked.
> unbind is not a well tested path.

Ok! Just to note that during my testing I have not spotted any issue.

> > > > > Alternative way for assigning PCI domain number is to use static 
> > > > > allocated
> > > > > numbers defined in Device Tree. This option has requirement that 
> > > > > every PCI
> > > > > controller in system must have defined PCI bus number in Device Tree.
> > > >
> > > > That seems entirely pointless from a DT point of view with a single PCI 
> > > > bridge.
> > >
> > > If domain id is not specified in DT then kernel uses counter and assigns
> > > counter++. So it is not pointless if we want to have stable domain id.
> >
> > What Rob is trying to say is that
> > - the bug is that kernel assigns counter++
> > - device-tree should not be used to fix problems with how kernel does
> >   things
> > - if a device has only one PCIe controller, it is pointless to define
> >   it's pci-domain. If there were multiple controllers, then it would
> >   make sense, but there is only one
> 
> Yes. I think what we want here is a domain bitmap rather than a
> counter and we assign the lowest free bit. That could also allow for
> handling a mixture of fixed domain numbers and dynamically assigned
> ones.

Currently this code is implemented in pci_bus_find_domain_nr() function.
IIRC domain number is 16bit integer, so plain bitmap would consume 8 kB
of memory. I'm not sure if it is fine or some other tree-based structure
for allocated domain numbers is needed.

> You could create scenarios where the numbers change on you, but it
> wouldn't be any different than say plugging in USB serial adapters.
> You get the same ttyUSBx device when you re-attach unless there's been
> other ttyUSBx devices attached/detached.

This should be fine for most scenarios. Dynamically attaching /
detaching PCI domain is not such common action...

Will you implement this new feature?

Re: [”PATCH” v2 2/5] PCI: armada8k: Add link-down handle

2021-04-17 Thread Pali Rohár

On Wednesday 14 April 2021 16:20:51 bpe...@marvell.com wrote:
> From: Ben Peled 
> 
> In PCIE ISR routine caused by RST_LINK_DOWN
> we schedule work to handle the link-down procedure.
> Link-down procedure will:
> 1. Remove PCIe bus
> 2. Reset the MAC
> 3. Reconfigure link back up
> 4. Rescan PCIe bus

Hello! This looks like a part of PCIe Hot Plug procedure, which is
already handled by kernel pci hotplug code, it can detect link down
interrupt and remove PCI device from the bus. I'm not sure if it would
not be better to use existing "infrastructure" for hotplug rather then
implementing new in controller driver. But I do not know what is
"correct" way, so I hope that other people (maybe Bjorn?) would say
something about this topic.

> Signed-off-by: Ben Peled 
> ---
>  drivers/pci/controller/dwc/pcie-armada8k.c | 69 
>  1 file changed, 69 insertions(+)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-armada8k.c 
> b/drivers/pci/controller/dwc/pcie-armada8k.c
> index b2278b1..34b253c 100644
> --- a/drivers/pci/controller/dwc/pcie-armada8k.c
> +++ b/drivers/pci/controller/dwc/pcie-armada8k.c
> @@ -22,6 +22,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "pcie-designware.h"
>  
> @@ -33,6 +35,9 @@ struct armada8k_pcie {
>   struct clk *clk_reg;
>   struct phy *phy[ARMADA8K_PCIE_MAX_LANES];
>   unsigned int phy_count;
> + struct regmap *sysctrl_base;
> + u32 mac_rest_bitmask;
> + struct work_struct recover_link_work;
>  };
>  
>  #define PCIE_VENDOR_REGS_OFFSET  0x8000
> @@ -73,6 +78,8 @@ struct armada8k_pcie {
>  #define AX_USER_DOMAIN_MASK  0x3
>  #define AX_USER_DOMAIN_SHIFT 4
>  
> +#define UNIT_SOFT_RESET_CONFIG_REG   0x268
> +
>  #define to_armada8k_pcie(x)  dev_get_drvdata((x)->dev)
>  
>  static void armada8k_pcie_disable_phys(struct armada8k_pcie *pcie)
> @@ -225,6 +232,50 @@ static int armada8k_pcie_host_init(struct pcie_port *pp)
>   return 0;
>  }
>  
> +static void armada8k_pcie_recover_link(struct work_struct *ws)
> +{
> + struct armada8k_pcie *pcie = container_of(ws, struct armada8k_pcie, 
> recover_link_work);
> + struct pcie_port *pp = >pci->pp;
> + struct pci_bus *bus = pp->bridge->bus;
> + struct pci_dev *root_port;
> + int ret;
> +
> + root_port = pci_get_slot(bus, 0);
> + if (!root_port) {
> + dev_err(pcie->pci->dev, "failed to get root port\n");
> + return;
> + }
> + pci_lock_rescan_remove();
> + pci_stop_and_remove_bus_device(root_port);
> + /*
> +  * Sleep needed to make sure all pcie transactions and access
> +  * are flushed before resetting the mac
> +  */
> + msleep(100);
> +
> + /* Reset mac */
> + regmap_update_bits_base(pcie->sysctrl_base, UNIT_SOFT_RESET_CONFIG_REG,
> + pcie->mac_rest_bitmask, 0, NULL, false, true);
> + udelay(1);
> + regmap_update_bits_base(pcie->sysctrl_base, UNIT_SOFT_RESET_CONFIG_REG,
> + pcie->mac_rest_bitmask, pcie->mac_rest_bitmask,
> + NULL, false, true);
> + udelay(1);
> + ret = armada8k_pcie_host_init(pp);
> + if (ret) {
> + dev_err(pcie->pci->dev, "failed to initialize host: %d\n", ret);
> + pci_unlock_rescan_remove();
> + pci_dev_put(root_port);
> + return;
> + }
> +
> + bus = NULL;
> + while ((bus = pci_find_next_bus(bus)) != NULL)
> + pci_rescan_bus(bus);
> + pci_unlock_rescan_remove();
> + pci_dev_put(root_port);
> +}
> +
>  static irqreturn_t armada8k_pcie_irq_handler(int irq, void *arg)
>  {
>   struct armada8k_pcie *pcie = arg;
> @@ -262,6 +313,9 @@ static irqreturn_t armada8k_pcie_irq_handler(int irq, 
> void *arg)
>* initiate a link retrain. If link retrains were
>* possible, that is.
>*/
> + if (pcie->sysctrl_base && pcie->mac_rest_bitmask)
> + schedule_work(>recover_link_work);
> +
>   dev_dbg(pci->dev, "%s: link went down\n", __func__);
>   }
>  
> @@ -330,6 +384,8 @@ static int armada8k_pcie_probe(struct platform_device 
> *pdev)
>  
>   pcie->pci = pci;
>  
> + INIT_WORK(>recover_link_work, armada8k_pcie_recover_link);
> +
>   pcie->clk = devm_clk_get(dev, NULL);
>   if (IS_ERR(pcie->clk))
>   return PTR_ERR(pcie->clk);
> @@ -357,6 +413,19 @@ static int armada8k_pcie_probe(struct platform_device 
> *pdev)
>   goto fail_clkreg;
>   }
>  
> + pcie->sysctrl_base = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
> +
> "marvell,system-controller");
> + if (IS_ERR(pcie->sysctrl_base)) {
> + dev_warn(dev, "failed to find marvell,system-controller\n");
> + pcie->sysctrl_base = 0x0;
> + }
> +
> + ret =

Re: QCA6174 pcie wifi: Add pci quirks

2021-04-15 Thread Pali Rohár

Hello!

On Thursday 15 April 2021 13:01:19 Alex Williamson wrote:
> [cc +Pali]
> 
> On Thu, 15 Apr 2021 20:02:23 +0200
> Ingmar Klein  wrote:
> 
> > First thanks to you both, Alex and Bjorn!
> > I am in no way an expert on this topic, so I have to fully rely on your
> > feedback, concerning this issue.
> > 
> > If you should have any other solution approach, in form of patch-set, I
> > would be glad to test it out. Just let me know, what you think might
> > make sense.
> > I will wait for your further feedback on the issue. In the meantime I
> > have my current workaround via quirk entry.
> > 
> > By the way, my layman's question:
> > Do you think, that the following topic might also apply for the QCA6174?
> > https://www.spinics.net/lists/linux-pci/msg106395.html

I have been testing more ath cards and I'm going to send a new version
of this patch with including more PCI ids.

> > Or in other words, should a similar approach be tried for the QCA6174
> > and if yes, would it bring any benefit at all?
> > I hope you can excuse me, in case the questions should not make too much
> > sense.
> 
> If you run lspci -vvv on your device, what do LnkCap and LnkSta report
> under the express capability?  I wonder if your device even supports
> >Gen1 speeds, mine does not.
> 
> I would not expect that patch to be relevant to you based on your
> report.  I understand it to resolve an issue during link retraining to a
> higher speed on boot, not during a bus reset.  Pali can correct if I'm
> wrong.  Thanks,

These two issues are are related. Both operations (PCIe Hot Reset and
PCIe Link Retraining) cause reset of ath chips. Seems that they cause
double reset. After reset these chips reads configuration from internal
EEPROM/OTP and if another reset is triggered prior chip finishes
internal configuration read then it stops working. My testing showed
that ath10k chips completely disappear from the PCIe bus, some ath9k
chips works fine but starts reporting incorrect PCI ID (0xABCD) and some
other ath9k chips reports correct PCI ID but does not work. I had
discussion with Adrian Chadd who knows probably everything about ath9k
and confirmed me that this issue is there with ath9k and ath10k chips.

He wrote me that workaround to turn card back from this "broken" state
is to do PCIe Cold Reset of the card, which means turning power supply
off for particular PCIe slot. Such thing is not supported on many
low-end boards, so workaround cannot be applied.

I was able to recover my testing cards from this "broken" state by PCIe
Warm Reset (= reset via PERST# pin).

I have tried many other reset methods (PCIe PM reset, Link Down, PCIe
Hot Reset with bigger internal, ...) but nothing worked. So seems that
the only workaround is to do PCIe Cold Reset or PCIe Warm Reset.

I will send V2 of my patch with details and explanation.

As kernel does not have API for doing PCIe Warm Reset, I think is
another argument why kernel really needs it.

I do not have any QCA6174 card for testing, but based on the fact I
reproduced this issue with more ath9k and ath10 cards and Adrian
confirmed that above reset issue is there, I think that it affects all
AR9xxx and QCA cards handled by ath9k and ath10 drivers.

I was told that AMI BIOS was patching their BIOSes found in notebooks to
avoid triggering this issue on notebooks ath9k cards.

> Alex
> 
> > Am 15.04.2021 um 04:36 schrieb Alex Williamson:
> > > On Wed, 14 Apr 2021 16:03:50 -0500
> > > Bjorn Helgaas  wrote:
> > >  
> > >> [+cc Alex]
> > >>
> > >> On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:  
> > >>> Edit: Retry, as I did not consider, that my mail-client would make this
> > >>> party html.
> > >>>
> > >>> Dear maintainers,
> > >>> I recently encountered an issue on my Proxmox server system, that
> > >>> includes a Qualcomm QCA6174 m.2 PCIe wifi module.
> > >>> https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX
> > >>>
> > >>> On system boot and subsequent virtual machine start (with passed-through
> > >>> QCA6174), the VM would just freeze/hang, at the point where the ath10k
> > >>> driver loads.
> > >>> Quick search in the proxmox related topics, brought me to the following
> > >>> discussion, which suggested a PCI quirk entry for the QCA6174 in the 
> > >>> kernel:
> > >>> https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/
> > >>>
> > >>> I then went ahead, got the Proxmox kernel source (v5.4.106) and applied
> > >>> the attached patch.
> > >>> Effect was as hoped, that the VM hangs are now gone. System boots and
> > >>> runs as intended.
> > >>>
> > >>> Judging by the existing quirk entries for Atheros, I would think, that
> > >>> my proposed "fix" could be included in the vanilla kernel.
> > >>> As far as I saw, there is no entry yet, even in the latest kernel 
> > >>> sources.  
> > >> This would need a signed-off-by; see
> > >>

Re: [PATCH] arm64: dts: marvell: armada-37xx: Set linux,pci-domain to zero

2021-04-15 Thread Pali Rohár

On Tuesday 13 April 2021 13:17:29 Rob Herring wrote:
> On Mon, Apr 12, 2021 at 7:41 AM Pali Rohár  wrote:
> >
> > Since commit 526a76991b7b ("PCI: aardvark: Implement driver 'remove'
> > function and allow to build it as module") PCIe controller driver for
> > Armada 37xx can be dynamically loaded and unloaded at runtime. Also driver
> > allows dynamic binding and unbinding of PCIe controller device.
> >
> > Kernel PCI subsystem assigns by default dynamically allocated PCI domain
> > number (starting from zero) for this PCIe controller every time when device
> > is bound. So PCI domain changes after every unbind / bind operation.
> 
> PCI host bridges as a module are relatively new, so seems likely a bug to me.

Why a bug? It is there since 5.10 and it is working.

> > Alternative way for assigning PCI domain number is to use static allocated
> > numbers defined in Device Tree. This option has requirement that every PCI
> > controller in system must have defined PCI bus number in Device Tree.
> 
> That seems entirely pointless from a DT point of view with a single PCI 
> bridge.

If domain id is not specified in DT then kernel uses counter and assigns
counter++. So it is not pointless if we want to have stable domain id.

> > Armada 37xx has only one PCIe controller, so assign for it PCI domain 0 in
> > Device Tree.
> >
> > After this change PCI domain on Armada 37xx is always zero, even after
> > repeated unbind and bind operations.
> >
> > Signed-off-by: Pali Rohár 
> > Fixes: 526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and 
> > allow to build it as module")
> > ---
> >  arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi 
> > b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
> > index 7a2df148c6a3..f02058ef5364 100644
> > --- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
> > +++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
> > @@ -495,6 +495,7 @@
> > <0 0 0 2 _intc 1>,
> > <0 0 0 3 _intc 2>,
> > <0 0 0 4 _intc 3>;
> > +   linux,pci-domain = <0>;
> > max-link-speed = <2>;
> > phys = < 0>;
> > pcie_intc: interrupt-controller {
> > --
> > 2.20.1
> >

Re: [PATCH] PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros

2021-04-14 Thread Pali Rohár

On Wednesday 14 April 2021 15:23:09 Bjorn Helgaas wrote:
> On Tue, Apr 13, 2021 at 10:39:16AM +0200, Pali Rohár wrote:
> > On Monday 12 April 2021 14:27:40 Bjorn Helgaas wrote:
> > > On Mon, Apr 12, 2021 at 02:46:02PM +0200, Pali Rohár wrote:
> > > > Define new PCI_EXP_DEVCTL_PAYLOAD_* macros in linux/pci_regs.h header 
> > > > file
> > > > for Max Payload Size. Macros are defined in the same style as existing
> > > > macros PCI_EXP_DEVCTL_READRQ_* macros.
> > > > 
> > > > Signed-off-by: Pali Rohár 
> > > > ---
> > > >  include/uapi/linux/pci_regs.h | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/include/uapi/linux/pci_regs.h 
> > > > b/include/uapi/linux/pci_regs.h
> > > > index e709ae8235e7..8f1b15eea53e 100644
> > > > --- a/include/uapi/linux/pci_regs.h
> > > > +++ b/include/uapi/linux/pci_regs.h
> > > > @@ -504,6 +504,12 @@
> > > >  #define  PCI_EXP_DEVCTL_URRE   0x0008  /* Unsupported Request 
> > > > Reporting En. */
> > > >  #define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
> > > >  #define  PCI_EXP_DEVCTL_PAYLOAD0x00e0  /* Max_Payload_Size */
> > > > +#define  PCI_EXP_DEVCTL_PAYLOAD_128B 0x /* 128 Bytes */
> > > > +#define  PCI_EXP_DEVCTL_PAYLOAD_256B 0x0020 /* 256 Bytes */
> > > > +#define  PCI_EXP_DEVCTL_PAYLOAD_512B 0x0040 /* 512 Bytes */
> > > > +#define  PCI_EXP_DEVCTL_PAYLOAD_1024B 0x0060 /* 1024 Bytes */
> > > > +#define  PCI_EXP_DEVCTL_PAYLOAD_2048B 0x0080 /* 2048 Bytes */
> > > > +#define  PCI_EXP_DEVCTL_PAYLOAD_4096B 0x00A0 /* 4096 Bytes */
> > > 
> > > This is fine if we're going to use them, but we generally don't add
> > > definitions purely for documentation.
> > > 
> > > 5929b8a38ce0 ("PCI: Add defines for PCIe Max_Read_Request_Size") added
> > > the PCI_EXP_DEVCTL_READRQ_* definitions and we do have a few (very
> > > few) uses in drivers.
> > 
> > I'm planning to use this constant to fix pci-aardvark.c driver. Aardvark
> > changes are not ready yet, but I'm preparing them in my git tree
> > https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=pci-aardvark
> > (commit PCI: aardvark: Fix PCIe Max Payload Size setting)
> > 
> > But as this is not change in aardvark driver, I sent it separately and
> > earlier. As it would be dependency for aardvark changes.
> 
> OK, just include this in that series.

Ok, fine.

Re: [PATCH] PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros

2021-04-13 Thread Pali Rohár

On Monday 12 April 2021 14:27:40 Bjorn Helgaas wrote:
> On Mon, Apr 12, 2021 at 02:46:02PM +0200, Pali Rohár wrote:
> > Define new PCI_EXP_DEVCTL_PAYLOAD_* macros in linux/pci_regs.h header file
> > for Max Payload Size. Macros are defined in the same style as existing
> > macros PCI_EXP_DEVCTL_READRQ_* macros.
> > 
> > Signed-off-by: Pali Rohár 
> > ---
> >  include/uapi/linux/pci_regs.h | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> > index e709ae8235e7..8f1b15eea53e 100644
> > --- a/include/uapi/linux/pci_regs.h
> > +++ b/include/uapi/linux/pci_regs.h
> > @@ -504,6 +504,12 @@
> >  #define  PCI_EXP_DEVCTL_URRE   0x0008  /* Unsupported Request 
> > Reporting En. */
> >  #define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
> >  #define  PCI_EXP_DEVCTL_PAYLOAD0x00e0  /* Max_Payload_Size */
> > +#define  PCI_EXP_DEVCTL_PAYLOAD_128B 0x /* 128 Bytes */
> > +#define  PCI_EXP_DEVCTL_PAYLOAD_256B 0x0020 /* 256 Bytes */
> > +#define  PCI_EXP_DEVCTL_PAYLOAD_512B 0x0040 /* 512 Bytes */
> > +#define  PCI_EXP_DEVCTL_PAYLOAD_1024B 0x0060 /* 1024 Bytes */
> > +#define  PCI_EXP_DEVCTL_PAYLOAD_2048B 0x0080 /* 2048 Bytes */
> > +#define  PCI_EXP_DEVCTL_PAYLOAD_4096B 0x00A0 /* 4096 Bytes */
> 
> This is fine if we're going to use them, but we generally don't add
> definitions purely for documentation.
> 
> 5929b8a38ce0 ("PCI: Add defines for PCIe Max_Read_Request_Size") added
> the PCI_EXP_DEVCTL_READRQ_* definitions and we do have a few (very
> few) uses in drivers.

I'm planning to use this constant to fix pci-aardvark.c driver. Aardvark
changes are not ready yet, but I'm preparing them in my git tree
https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=pci-aardvark
(commit PCI: aardvark: Fix PCIe Max Payload Size setting)

But as this is not change in aardvark driver, I sent it separately and
earlier. As it would be dependency for aardvark changes.

> If we do need to add these, please follow the local use of lower-case
> in the hex bitmasks.  The file is a mixture, but the closest examples
> are lower-case.

Ok, therefore I will change 0x00A0 to 0x00a0.

> >  #define  PCI_EXP_DEVCTL_EXT_TAG0x0100  /* Extended Tag Field Enable */
> >  #define  PCI_EXP_DEVCTL_PHANTOM0x0200  /* Phantom Functions Enable */
> >  #define  PCI_EXP_DEVCTL_AUX_PME0x0400  /* Auxiliary Power PM Enable */
> > -- 
> > 2.20.1
> >

[PATCH v2] net: phy: marvell: fix detection of PHY on Topaz switches

2021-04-12 Thread Pali Rohár

Since commit fee2d546414d ("net: phy: marvell: mv88e6390 temperature
sensor reading"), Linux reports the temperature of Topaz hwmon as
constant -75°C.

This is because switches from the Topaz family (88E6141 / 88E6341) have
the address of the temperature sensor register different from Peridot.

This address is instead compatible with 88E1510 PHYs, as was used for
Topaz before the above mentioned commit.

Create a new mapping table between switch family and PHY ID for families
which don't have a model number. And define PHY IDs for Topaz and Peridot
families.

Create a new PHY ID and a new PHY driver for Topaz's internal PHY.
The only difference from Peridot's PHY driver is the HWMON probing
method.

Prior this change Topaz's internal PHY is detected by kernel as:

  PHY [...] driver [Marvell 88E6390] (irq=63)

And afterwards as:

  PHY [...] driver [Marvell 88E6341 Family] (irq=63)

Signed-off-by: Pali Rohár 
BugLink: https://github.com/globalscaletechnologies/linux/issues/1
Fixes: fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading")
Reviewed-by: Marek Behún 
---
Changes since v1:
* create direct mapping table between family and prod_id which avoid need
  to use of forward declaration and simplify implementation
* fill mapping table only for Topaz and Peridot families
---
 drivers/net/dsa/mv88e6xxx/chip.c | 30 +-
 drivers/net/phy/marvell.c| 32 +---
 include/linux/marvell_phy.h  |  5 +++--
 3 files changed, 45 insertions(+), 22 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 903d619e08ed..e08bf9377140 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3026,10 +3026,17 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
return err;
 }
 
+/* prod_id for switch families which do not have a PHY model number */
+static const u16 family_prod_id_table[] = {
+   [MV88E6XXX_FAMILY_6341] = MV88E6XXX_PORT_SWITCH_ID_PROD_6341,
+   [MV88E6XXX_FAMILY_6390] = MV88E6XXX_PORT_SWITCH_ID_PROD_6390,
+};
+
 static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
 {
struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
struct mv88e6xxx_chip *chip = mdio_bus->chip;
+   u16 prod_id;
u16 val;
int err;
 
@@ -3040,23 +3047,12 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int 
phy, int reg)
err = chip->info->ops->phy_read(chip, bus, phy, reg, );
mv88e6xxx_reg_unlock(chip);
 
-   if (reg == MII_PHYSID2) {
-   /* Some internal PHYs don't have a model number. */
-   if (chip->info->family != MV88E6XXX_FAMILY_6165)
-   /* Then there is the 6165 family. It gets is
-* PHYs correct. But it can also have two
-* SERDES interfaces in the PHY address
-* space. And these don't have a model
-* number. But they are not PHYs, so we don't
-* want to give them something a PHY driver
-* will recognise.
-*
-* Use the mv88e6390 family model number
-* instead, for anything which really could be
-* a PHY,
-*/
-   if (!(val & 0x3f0))
-   val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
+   /* Some internal PHYs don't have a model number. */
+   if (reg == MII_PHYSID2 && !(val & 0x3f0) &&
+   chip->info->family < ARRAY_SIZE(family_prod_id_table)) {
+   prod_id = family_prod_id_table[chip->info->family];
+   if (prod_id)
+   val |= prod_id >> 4;
}
 
return err ? err : val;
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index e26a5d663f8a..8018ddf7f316 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -3021,9 +3021,34 @@ static struct phy_driver marvell_drivers[] = {
.get_stats = marvell_get_stats,
},
{
-   .phy_id = MARVELL_PHY_ID_88E6390,
+   .phy_id = MARVELL_PHY_ID_88E6341_FAMILY,
.phy_id_mask = MARVELL_PHY_ID_MASK,
-   .name = "Marvell 88E6390",
+   .name = "Marvell 88E6341 Family",
+   /* PHY_GBIT_FEATURES */
+   .flags = PHY_POLL_CABLE_TEST,
+   .probe = m88e1510_probe,
+   .config_init = marvell_config_init,
+   .config_aneg = m88e6390_config_aneg,
+   .read_status = marvell_read_status,
+   .config_intr = marvell_config_intr,
+   .handle_interrupt = marvell_handle_interrupt,
+   .resume = genphy_resume,
+

Re: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches

2021-04-12 Thread Pali Rohár

On Monday 12 April 2021 18:12:35 Andrew Lunn wrote:
> On Mon, Apr 12, 2021 at 05:52:39PM +0200, Pali Rohár wrote:
> > On Monday 12 April 2021 17:32:33 Andrew Lunn wrote:
> > > > Anyway, now I'm looking at phy/marvell.c driver again and it supports
> > > > only 88E6341 and 88E6390 families from whole 88E63xxx range.
> > > > 
> > > > So do we need to define for now table for more than
> > > > MV88E6XXX_FAMILY_6341 and MV88E6XXX_FAMILY_6390 entries?
> > > 
> > > Probably not. I've no idea if the 6393 has an ID, so to be safe you
> > > should add that. Assuming it has a family of its own.
> > 
> > So what about just?
> > 
> > if (reg == MII_PHYSID2 && !(val & 0x3f0)) {
> > if (chip->info->family == MV88E6XXX_FAMILY_6341)
> > val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6341 >> 4;
> > else if (chip->info->family == MV88E6XXX_FAMILY_6390)
> > val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
> > }
> 
> As i said, i expect the 6393 also has no ID. And i recently found out
> Marvell have some automotive switches, 88Q5xxx which are actually
> based around the same IP and could be added to this driver. They also
> might not have an ID. I suspect this list is going to get longer, so
> having it table driven will make that simpler, less error prone.
> 
>  Andrew

Ok, I will use table but I fill it only with Topaz (6341) and Peridot
(6390) which was there before as I do not have 6393 switch for testing.

If you or anybody else has 6393 unit for testing, please extend then
table.

Re: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches

2021-04-12 Thread Pali Rohár

On Monday 12 April 2021 17:32:33 Andrew Lunn wrote:
> > Anyway, now I'm looking at phy/marvell.c driver again and it supports
> > only 88E6341 and 88E6390 families from whole 88E63xxx range.
> > 
> > So do we need to define for now table for more than
> > MV88E6XXX_FAMILY_6341 and MV88E6XXX_FAMILY_6390 entries?
> 
> Probably not. I've no idea if the 6393 has an ID, so to be safe you
> should add that. Assuming it has a family of its own.

So what about just?

if (reg == MII_PHYSID2 && !(val & 0x3f0)) {
if (chip->info->family == MV88E6XXX_FAMILY_6341)
val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6341 >> 4;
else if (chip->info->family == MV88E6XXX_FAMILY_6390)
val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
}

Re: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches

2021-04-12 Thread Pali Rohár

On Monday 12 April 2021 16:44:11 Andrew Lunn wrote:
> On Mon, Apr 12, 2021 at 03:34:47PM +0200, Pali Rohár wrote:
> > On Monday 12 April 2021 15:15:07 Andrew Lunn wrote:
> > > > +static u16 mv88e6xxx_physid_for_family(enum mv88e6xxx_family family);
> > > > +
> > > 
> > > No forward declaration please. Move the code around. It is often best
> > > to do that in a patch which just moves code, no other changes. It
> > > makes it easier to review.
> > 
> > Avoiding forward declaration would mean to move about half of source
> > code. mv88e6xxx_physid_for_family depends on mv88e6xxx_table which
> > depends on all _ops structures which depends on all lot of other
> > functions.
> 
> So this is basically what you are trying to do:
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c 
> b/drivers/net/dsa/mv88e6xxx/chip.c
> index 903d619e08ed..ef4dbcb052b7 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -3026,6 +3026,18 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
> return err;
>  }
>  
> +static const enum mv88e6xxx_model family_model_table[] = {
> +   [MV88E6XXX_FAMILY_6095] = MV88E6XXX_PORT_SWITCH_ID_PROD_6095,
> +   [MV88E6XXX_FAMILY_6097] = MV88E6XXX_PORT_SWITCH_ID_PROD_6097,
> +   [MV88E6XXX_FAMILY_6185] = MV88E6XXX_PORT_SWITCH_ID_PROD_6185,
> +   [MV88E6XXX_FAMILY_6250] = MV88E6XXX_PORT_SWITCH_ID_PROD_6250,
> +   [MV88E6XXX_FAMILY_6320] = MV88E6XXX_PORT_SWITCH_ID_PROD_6320,
> +   [MV88E6XXX_FAMILY_6341] = MV88E6XXX_PORT_SWITCH_ID_PROD_6341,
> +   [MV88E6XXX_FAMILY_6351] = MV88E6XXX_PORT_SWITCH_ID_PROD_6351,
> +   [MV88E6XXX_FAMILY_6352] = MV88E6XXX_PORT_SWITCH_ID_PROD_6352,
> +   [MV88E6XXX_FAMILY_6390] = MV88E6XXX_PORT_SWITCH_ID_PROD_6390,
> +};

Ok, no problem. I can change it in this way. I just thought that if
prod_id is already defined for every model in mv88e6xxx_table[] table I
could reuse it, instead of duplicating it...

Anyway, now I'm looking at phy/marvell.c driver again and it supports
only 88E6341 and 88E6390 families from whole 88E63xxx range.

So do we need to define for now table for more than
MV88E6XXX_FAMILY_6341 and MV88E6XXX_FAMILY_6390 entries?

> +
>  static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
>  {
> struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
> @@ -3056,7 +3068,7 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int 
> phy, int reg)
>  * a PHY,
>  */
> if (!(val & 0x3f0))
> -   val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 
> 4;
> +   val |= family_model_table[chip->info->family] 
> >> 4;
> }
> 
> and it compiles. No forward declarations needed. It is missing all the
> error checking etc, but i don't see why that should change the
> dependencies.
> 
>   Andrew

Re: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches

2021-04-12 Thread Pali Rohár

On Monday 12 April 2021 16:30:03 Andrew Lunn wrote:
> > > > +/* This table contains representative model for every family */
> > > > +static const enum mv88e6xxx_model family_model_table[] = {
> > > > +   [MV88E6XXX_FAMILY_6095] = MV88E6095,
> > > > +   [MV88E6XXX_FAMILY_6097] = MV88E6097,
> > > > +   [MV88E6XXX_FAMILY_6185] = MV88E6185,
> > > > +   [MV88E6XXX_FAMILY_6250] = MV88E6250,
> > > > +   [MV88E6XXX_FAMILY_6320] = MV88E6320,
> > > > +   [MV88E6XXX_FAMILY_6341] = MV88E6341,
> > > > +   [MV88E6XXX_FAMILY_6351] = MV88E6351,
> > > > +   [MV88E6XXX_FAMILY_6352] = MV88E6352,
> > > > +   [MV88E6XXX_FAMILY_6390] = MV88E6390,
> > > > +};
> > > 
> > > This table is wrong. MV88E6390 does not equal
> > > MV88E6XXX_PORT_SWITCH_ID_PROD_6390. MV88E6XXX_PORT_SWITCH_ID_PROD_6390
> > > was chosen because it is already an MDIO device ID, in register 2 and
> > > 3. It probably will never clash with a real Marvell PHY ID. MV88E6390
> > > is just a small integer, and there is a danger it will clash with a
> > > real PHY.
> > 
> > So... how to solve this issue? What should be in the mapping table?
> 
> You need to use MV88E6XXX_PORT_SWITCH_ID_PROD_6095,
> MV88E6XXX_PORT_SWITCH_ID_PROD_6097,
> ...
> MV88E6XXX_PORT_SWITCH_ID_PROD_6390,

But I'm using it.

First chip->info->family (enum mv88e6xxx_family; MV88E6XXX_FAMILY_6341)
is mapped to enum mv88e6xxx_model (MV88E6341) via family_model_table[]
and then enum mv88e6xxx_model (MV88E6341) is mapped to prod_num
(MV88E6XXX_PORT_SWITCH_ID_PROD_6341) via mv88e6xxx_table[].

All this is done in mv88e6xxx_physid_for_family() function.

So at the end, this function converts MV88E6XXX_FAMILY_6341 to
MV88E6XXX_PORT_SWITCH_ID_PROD_6341.

And therefore I do not see anything wrong in family_model_table[] table.

I defined family_model_table[] table to just maps enum mv88e6xxx_family
to enum mv88e6xxx_model as mv88e6xxx_table[] table already contains
mapping from enum mv88e6xxx_model to phys_id, to simplify
implementation.

Re: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches

2021-04-12 Thread Pali Rohár

On Monday 12 April 2021 15:15:07 Andrew Lunn wrote:
> > +static u16 mv88e6xxx_physid_for_family(enum mv88e6xxx_family family);
> > +
> 
> No forward declaration please. Move the code around. It is often best
> to do that in a patch which just moves code, no other changes. It
> makes it easier to review.

Avoiding forward declaration would mean to move about half of source
code. mv88e6xxx_physid_for_family depends on mv88e6xxx_table which
depends on all _ops structures which depends on all lot of other
functions.

I wanted to create a small fixup patch which can be easily backported to
stable releases which are affected by this issue.

If you do not like forward declarations, I can create a followup patch
which moves this half of code in this file to avoid forward declaration.
But having it in one patch would mean to complicate reviewing code...

> >  static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
> >  {
> > struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
> > @@ -3040,24 +3042,9 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, 
> > int phy, int reg)
> > err = chip->info->ops->phy_read(chip, bus, phy, reg, );
> > mv88e6xxx_reg_unlock(chip);
> >  
> > -   if (reg == MII_PHYSID2) {
> > -   /* Some internal PHYs don't have a model number. */
> > -   if (chip->info->family != MV88E6XXX_FAMILY_6165)
> > -   /* Then there is the 6165 family. It gets is
> > -* PHYs correct. But it can also have two
> > -* SERDES interfaces in the PHY address
> > -* space. And these don't have a model
> > -* number. But they are not PHYs, so we don't
> > -* want to give them something a PHY driver
> > -* will recognise.
> > -*
> > -* Use the mv88e6390 family model number
> > -* instead, for anything which really could be
> > -* a PHY,
> > -*/
> > -   if (!(val & 0x3f0))
> > -   val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
> > -   }
> > +   /* Some internal PHYs don't have a model number. */
> > +   if (reg == MII_PHYSID2 && !(val & 0x3f0))
> > +   val |= mv88e6xxx_physid_for_family(chip->info->family);
> >  
> > return err ? err : val;
> >  }
> > @@ -5244,6 +5231,39 @@ static const struct mv88e6xxx_info 
> > *mv88e6xxx_lookup_info(unsigned int prod_num)
> > return NULL;
> >  }
> >  
> > +/* This table contains representative model for every family */
> > +static const enum mv88e6xxx_model family_model_table[] = {
> > +   [MV88E6XXX_FAMILY_6095] = MV88E6095,
> > +   [MV88E6XXX_FAMILY_6097] = MV88E6097,
> > +   [MV88E6XXX_FAMILY_6185] = MV88E6185,
> > +   [MV88E6XXX_FAMILY_6250] = MV88E6250,
> > +   [MV88E6XXX_FAMILY_6320] = MV88E6320,
> > +   [MV88E6XXX_FAMILY_6341] = MV88E6341,
> > +   [MV88E6XXX_FAMILY_6351] = MV88E6351,
> > +   [MV88E6XXX_FAMILY_6352] = MV88E6352,
> > +   [MV88E6XXX_FAMILY_6390] = MV88E6390,
> > +};
> 
> This table is wrong. MV88E6390 does not equal
> MV88E6XXX_PORT_SWITCH_ID_PROD_6390. MV88E6XXX_PORT_SWITCH_ID_PROD_6390
> was chosen because it is already an MDIO device ID, in register 2 and
> 3. It probably will never clash with a real Marvell PHY ID. MV88E6390
> is just a small integer, and there is a danger it will clash with a
> real PHY.

So... how to solve this issue? What should be in the mapping table?

> > --- a/drivers/net/phy/marvell.c
> > +++ b/drivers/net/phy/marvell.c
> > @@ -3021,9 +3021,34 @@ static struct phy_driver marvell_drivers[] = {
> > .get_stats = marvell_get_stats,
> > },
> > {
> > -   .phy_id = MARVELL_PHY_ID_88E6390,
> > +   .phy_id = MARVELL_PHY_ID_88E6341_FAMILY,
> > .phy_id_mask = MARVELL_PHY_ID_MASK,
> > -   .name = "Marvell 88E6390",
> > +   .name = "Marvell 88E6341 Family",
> 
> You cannot just replace the MARVELL_PHY_ID_88E6390. That will break
> the 6390! You need to add the new PHY for the 88E6341.

I have not replaced anything. I just put there a new phy_id section for
88E6341. You are probably confused by 'git diff' output as quickly
looking at it, somebody can think that there is phy replacement. But
there is no replacement, I only added a new PHY. Entry for 88E6390 is
still there!

Also this is reason why I wanted to avoid big code movement. It will be
harder to read the 'git diff' output in this patch.

[PATCH] PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros

2021-04-12 Thread Pali Rohár

Define new PCI_EXP_DEVCTL_PAYLOAD_* macros in linux/pci_regs.h header file
for Max Payload Size. Macros are defined in the same style as existing
macros PCI_EXP_DEVCTL_READRQ_* macros.

Signed-off-by: Pali Rohár 
---
 include/uapi/linux/pci_regs.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index e709ae8235e7..8f1b15eea53e 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -504,6 +504,12 @@
 #define  PCI_EXP_DEVCTL_URRE   0x0008  /* Unsupported Request Reporting En. */
 #define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
 #define  PCI_EXP_DEVCTL_PAYLOAD0x00e0  /* Max_Payload_Size */
+#define  PCI_EXP_DEVCTL_PAYLOAD_128B 0x /* 128 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_256B 0x0020 /* 256 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_512B 0x0040 /* 512 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_1024B 0x0060 /* 1024 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_2048B 0x0080 /* 2048 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_4096B 0x00A0 /* 4096 Bytes */
 #define  PCI_EXP_DEVCTL_EXT_TAG0x0100  /* Extended Tag Field Enable */
 #define  PCI_EXP_DEVCTL_PHANTOM0x0200  /* Phantom Functions Enable */
 #define  PCI_EXP_DEVCTL_AUX_PME0x0400  /* Auxiliary Power PM Enable */
-- 
2.20.1

[PATCH] arm64: dts: marvell: armada-37xx: Set linux,pci-domain to zero

2021-04-12 Thread Pali Rohár

Since commit 526a76991b7b ("PCI: aardvark: Implement driver 'remove'
function and allow to build it as module") PCIe controller driver for
Armada 37xx can be dynamically loaded and unloaded at runtime. Also driver
allows dynamic binding and unbinding of PCIe controller device.

Kernel PCI subsystem assigns by default dynamically allocated PCI domain
number (starting from zero) for this PCIe controller every time when device
is bound. So PCI domain changes after every unbind / bind operation.

Alternative way for assigning PCI domain number is to use static allocated
numbers defined in Device Tree. This option has requirement that every PCI
controller in system must have defined PCI bus number in Device Tree.

Armada 37xx has only one PCIe controller, so assign for it PCI domain 0 in
Device Tree.

After this change PCI domain on Armada 37xx is always zero, even after
repeated unbind and bind operations.

Signed-off-by: Pali Rohár 
Fixes: 526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and 
allow to build it as module")
---
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi 
b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
index 7a2df148c6a3..f02058ef5364 100644
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
@@ -495,6 +495,7 @@
<0 0 0 2 _intc 1>,
<0 0 0 3 _intc 2>,
<0 0 0 4 _intc 3>;
+   linux,pci-domain = <0>;
max-link-speed = <2>;
phys = < 0>;
pcie_intc: interrupt-controller {
-- 
2.20.1

[PATCH] net: phy: marvell: fix detection of PHY on Topaz switches

2021-04-12 Thread Pali Rohár

Since commit fee2d546414d ("net: phy: marvell: mv88e6390 temperature
sensor reading"), Linux reports the temperature of Topaz hwmon as
constant -75°C.

This is because switches from the Topaz family (88E6141 / 88E6341) have
the address of the temperature sensor register different from Peridot.

This address is instead compatible with 88E1510 PHYs, as was used for
Topaz before the above mentioned commit.

Define a representative model for each switch family and add a mapping
table for constructing PHY IDs for the internal PHYs (since they don't
have a model number).

Create a new PHY ID and a new PHY driver for Topaz' internal PHY.
The only difference from Peridot's PHY driver is the HWMON probing
method.

Prior this change Topaz's internal PHY is detected by kernel as:

  PHY [...] driver [Marvell 88E6390] (irq=63)

And afterwards as:

  PHY [...] driver [Marvell 88E6341 Family] (irq=63)

Signed-off-by: Pali Rohár 
BugLink: https://github.com/globalscaletechnologies/linux/issues/1
Fixes: fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading")
Reviewed-by: Marek Behún 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 56 ++--
 drivers/net/dsa/mv88e6xxx/chip.h |  2 ++
 drivers/net/phy/marvell.c| 32 --
 include/linux/marvell_phy.h  |  5 +--
 4 files changed, 72 insertions(+), 23 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 903d619e08ed..e602c9816aee 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3026,6 +3026,8 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
return err;
 }
 
+static u16 mv88e6xxx_physid_for_family(enum mv88e6xxx_family family);
+
 static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
 {
struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
@@ -3040,24 +3042,9 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int 
phy, int reg)
err = chip->info->ops->phy_read(chip, bus, phy, reg, );
mv88e6xxx_reg_unlock(chip);
 
-   if (reg == MII_PHYSID2) {
-   /* Some internal PHYs don't have a model number. */
-   if (chip->info->family != MV88E6XXX_FAMILY_6165)
-   /* Then there is the 6165 family. It gets is
-* PHYs correct. But it can also have two
-* SERDES interfaces in the PHY address
-* space. And these don't have a model
-* number. But they are not PHYs, so we don't
-* want to give them something a PHY driver
-* will recognise.
-*
-* Use the mv88e6390 family model number
-* instead, for anything which really could be
-* a PHY,
-*/
-   if (!(val & 0x3f0))
-   val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
-   }
+   /* Some internal PHYs don't have a model number. */
+   if (reg == MII_PHYSID2 && !(val & 0x3f0))
+   val |= mv88e6xxx_physid_for_family(chip->info->family);
 
return err ? err : val;
 }
@@ -5244,6 +5231,39 @@ static const struct mv88e6xxx_info 
*mv88e6xxx_lookup_info(unsigned int prod_num)
return NULL;
 }
 
+/* This table contains representative model for every family */
+static const enum mv88e6xxx_model family_model_table[] = {
+   [MV88E6XXX_FAMILY_6095] = MV88E6095,
+   [MV88E6XXX_FAMILY_6097] = MV88E6097,
+   [MV88E6XXX_FAMILY_6185] = MV88E6185,
+   [MV88E6XXX_FAMILY_6250] = MV88E6250,
+   [MV88E6XXX_FAMILY_6320] = MV88E6320,
+   [MV88E6XXX_FAMILY_6341] = MV88E6341,
+   [MV88E6XXX_FAMILY_6351] = MV88E6351,
+   [MV88E6XXX_FAMILY_6352] = MV88E6352,
+   [MV88E6XXX_FAMILY_6390] = MV88E6390,
+};
+static_assert(ARRAY_SIZE(family_model_table) == MV88E6XXX_FAMILY_LAST);
+
+static u16 mv88e6xxx_physid_for_family(enum mv88e6xxx_family family)
+{
+   enum mv88e6xxx_model model;
+
+   /* 6165 family gets it's PHYs correct. But it can also have two SERDES
+* interfaces in the PHY address space. And these don't have a model
+* number. But they are not PHYs, so we don't want to give them
+* something a PHY driver will recognise.
+*/
+   if (family == MV88E6XXX_FAMILY_6165)
+   return 0;
+
+   model = family_model_table[family];
+   if (model == MV88E_NA)
+   return 0;
+
+   return mv88e6xxx_table[model].prod_num >> 4;
+}
+
 static int mv88e6xxx_detect(struct mv88e6xxx_chip *chip)
 {
const struct mv88e6xxx_info *info;
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index a57c8886f3ac..70c736788a68 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/m

Re: [PATCH v2] hwmon: (dell-smm) Add Dell Latitude E7440 to fan control whitelist

2021-04-11 Thread Pali Rohár

On Sunday 11 April 2021 11:57:41 Sebastian Oechsle wrote:
> Allow manual PWM control on Dell Latitude E7440.
> 
> Signed-off-by: Sebastian Oechsle 

Reviewed-by: Pali Rohár 

> 
> Changes in v2:
> - Fix spelling
> - Fix format
> ---
>  drivers/hwmon/dell-smm-hwmon.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
> index 73b9db9e3aab..2970892bed82 100644
> --- a/drivers/hwmon/dell-smm-hwmon.c
> +++ b/drivers/hwmon/dell-smm-hwmon.c
> @@ -1210,6 +1210,14 @@ static struct dmi_system_id 
> i8k_whitelist_fan_control[] __initdata = {
>   },
>   .driver_data = (void *)_fan_control_data[I8K_FAN_34A3_35A3],
>   },
> + {
> + .ident = "Dell Latitude E7440",
> + .matches = {
> + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
> + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Latitude E7440"),
> + },
> + .driver_data = (void *)_fan_control_data[I8K_FAN_34A3_35A3],
> + },
>   { }
>  };
>  
> -- 
> 2.31.1
>

Re: PCI: Race condition in pci_create_sysfs_dev_files

2021-04-07 Thread Pali Rohár

On Wednesday 07 April 2021 16:25:46 Petr Štetiar wrote:
> Pali Rohár  [2020-10-07 10:12:27]:
> 
> Hi,
> 
> [adding Koen to Cc:]
> 
> > I'm hitting these race conditions randomly on pci aardvark controller
> > driver- I prepared patch which speed up initialization of this driver,
> > but also increase probability that it hits above race conditions :-(
> 
> it seems, that I'm able to reproduce this race condition on every boot with
> 5.10 on my Freescale i.MX6Q board, see the log excerpt bellow. I don't know if
> this helps, but it's not happening on 5.4 kernel.

Hello! This is same race condition which I described in my original
report. Good to know that other people can reproduce it too!

> [0.00] Booting Linux on physical CPU 0x0
> [0.00] Linux version 5.10.27 (ynezz@ntbk) 
> (arm-openwrt-linux-muslgnueabi-gcc (OpenWrt GCC 8.4.0 r12719+30-84f4a783c698) 
> 8.4.0, GNU ld (GNU Binutils) 2.34) #0 SMP Wed Apr 7 12:52:23 2021
> [0.00] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), 
> cr=10c5387d
> [0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing 
> instruction cache
> [0.00] OF: fdt: Machine model: Toradex Apalis iMX6Q/D Module on Ixora 
> Carrier Board
> 
> ...
> 
> [2.239498] pci :00:00.0: BAR 0: assigned [mem 0x0100-0x010f]
> [2.266430] pci :00:00.0: BAR 6: assigned [mem 0x0110-0x0110 
> pref]
> #���+$HH��.274570] pci :00:00.0: Max Payload Size set to  128/ 128 
> (was  128), Max Read Rq  128
> 
>  (this serial console hiccup during PCI initialization seems quite strange as 
> well, happens always)

I'm seeing similar garbage on UART output when such thing happen. But I
suspect that this is the issue when doing serialization of more parallel
messages print by kernel. Nothing relevant to PCI.

But running 'dmesg' after full bootup can show also these lost messages.

> [2.283074] sysfs: cannot create duplicate filename 
> '/devices/platform/soc/1ffc000.pcie/pci:00/:00:00.0/config'
> [2.293884] CPU: 1 PID: 47 Comm: kworker/u8:3 Not tainted 5.10.27 #0
> [2.300165] random: fast init done
> [2.300249] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [2.310186] Workqueue: events_unbound async_run_entry_fn

And this is important information! Device is registered by some async
workqueue. When I tried to use my (private) patch which speed by
aardvark initialization by putting init code into separate worker then
probability of hitting this race condition increased to about 90%. So
same situation as yours, it was called from async workqueue.

But now with 5.12-rc kernel I'm not able to reproduce it because
something changed in kernel and kernel schedule this "racy" worker long
after race condition may happen.

So it needs perfect timing and seems that one important thing is to call
pci_bus_add_device() function from separate worker.

> [2.315507] Backtrace:
> [2.317976] [<8010cc88>] (dump_backtrace) from [<8010d134>] 
> (show_stack+0x20/0x24)
> [2.325556]  r7:813283d4 r6:6013 r5: r4:80ec845c
> [2.331236] [<8010d114>] (show_stack) from [<805971fc>] 
> (dump_stack+0xa4/0xb8)
> [2.338480] [<80597158>] (dump_stack) from [<803a7128>] 
> (sysfs_warn_dup+0x68/0x74)
> [2.346058]  r7:813283d4 r6:80af0dfc r5:812f48f0 r4:81afa000
> [2.351726] [<803a70c0>] (sysfs_warn_dup) from [<803a6c90>] 
> (sysfs_add_file_mode_ns+0x100/0x1cc)
> [2.360518]  r7:813283d4 r6:812f48f0 r5:80b4299c r4:ffef
> [2.366187] [<803a6b90>] (sysfs_add_file_mode_ns) from [<803a6fe8>] 
> (sysfs_create_bin_file+0x94/0x9c)
> [2.375411]  r6:81eb8078 r5:80b4299c r4:
> [2.380043] [<803a6f54>] (sysfs_create_bin_file) from [<805da848>] 
> (pci_create_sysfs_dev_files+0x58/0x2cc)
> [2.389701]  r6:81eb8000 r5:81eb8078 r4:81eb8000
> [2.394341] [<805da7f0>] (pci_create_sysfs_dev_files) from [<805cba98>] 
> (pci_bus_add_device+0x34/0x90)
> [2.403659]  r10:80b45d88 r9:81818810 r8:81328200 r7:813283d4 r6:8190c000 
> r5:81eb8078
> [2.411490]  r4:81eb8000
> [2.414034] [<805cba64>] (pci_bus_add_device) from [<805cbb30>] 
> (pci_bus_add_devices+0x3c/0x80)
> [2.421744] random: crng init done
> [2.422737]  r5:8190c014 r4:81eb8000
> [2.429720] [<805cbaf4>] (pci_bus_add_devices) from [<805cf898>] 
> (pci_host_probe+0x50/0xa0)
> [2.438078]  r7:813283d4 r6:8190c000 r5:8190c00c r4:
> [2.443753] [<805cf848>] (pci_host_probe) from [<805eeb20>] 
> (dw_pcie_host_init+0x1d0/0x414)
> [2.452111]  r7:813283d4 r6:81328058 r5:0200 r4:
> [2.457780] [<805ee950>] (dw_pc

Re: PCI: Race condition in pci_create_sysfs_dev_files

2021-04-07 Thread Pali Rohár

On Thursday 16 July 2020 13:04:23 Pali Rohár wrote:
> Hello Bjorn!
> 
> I see following error message in dmesg which looks like a race condition:
> 
> sysfs: cannot create duplicate filename 
> '/devices/platform/soc/d007.pcie/pci:00/:00:00.0/config'
> 
> I looked at it deeper and found out that in PCI subsystem code is race
> condition between pci_bus_add_device() and pci_sysfs_init() calls. Both
> of these functions calls pci_create_sysfs_dev_files() and calling this
> function more times for same pci device throws above error message.
> 
> There can be two different race conditions:
> 
> 1. pci_bus_add_device() called pcibios_bus_add_device() or
> pci_fixup_device() but have not called pci_create_sysfs_dev_files() yet.
> Meanwhile pci_sysfs_init() is running and pci_create_sysfs_dev_files()
> was called for newly registered device. In this case function
> pci_create_sysfs_dev_files() is called two times, ones from
> pci_bus_add_device() and once from pci_sysfs_init().
> 
> 2. pci_sysfs_init() is called. It first sets sysfs_initialized to 1
> which unblock calling pci_create_sysfs_dev_files(). Then another bus
> registers new PCI device and calls pci_bus_add_device() which calls
> pci_create_sysfs_dev_files() and registers sysfs files. Function
> pci_sysfs_init() continues execution and calls function
> pci_create_sysfs_dev_files() also for this newly registered device. So
> pci_create_sysfs_dev_files() is again called two times.
> 
> 
> I workaround both race conditions I created following hack patch. After
> applying it I'm not getting that 'sysfs: cannot create duplicate filename'
> error message anymore.

Scratch this hack patch, it contains another new race condition.

The only way how to get rid of this race condition is either to protect
whole "sysfs_initialized" variable by mutex or by completely removing
"sysfs_initialized" variable and therefore also removing function
pci_create_sysfs_dev_files(). I'm for second variant.

[PATCH] ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()

2021-04-02 Thread Pali Rohár

Function ath9k_hw_reset() is dereferencing chan structure pointer, so it
needs to be non-NULL pointer.

Function ath9k_stop() already contains code which sets ah->curchan to valid
non-NULL pointer prior calling ath9k_hw_reset() function.

Add same code pattern also into ath_reset_internal() function to prevent
kernel NULL pointer dereference in ath9k_hw_reset() function.

This change fixes kernel NULL pointer dereference in ath9k_hw_reset() which
is caused by calling ath9k_hw_reset() from ath_reset_internal() with NULL
chan structure.

[   45.334305] Unable to handle kernel NULL pointer dereference at virtual 
address 0008
[   45.344417] Mem abort info:
[   45.347301]   ESR = 0x9605
[   45.350448]   EC = 0x25: DABT (current EL), IL = 32 bits
[   45.356166]   SET = 0, FnV = 0
[   45.359350]   EA = 0, S1PTW = 0
[   45.362596] Data abort info:
[   45.365756]   ISV = 0, ISS = 0x0005
[   45.369735]   CM = 0, WnR = 0
[   45.372814] user pgtable: 4k pages, 39-bit VAs, pgdp=0685d000
[   45.379663] [0008] pgd=, 
p4d=, pud=
[   45.388856] Internal error: Oops: 9605 [#1] SMP
[   45.393897] Modules linked in: ath9k ath9k_common ath9k_hw
[   45.399574] CPU: 1 PID: 309 Comm: kworker/u4:2 Not tainted 
5.12.0-rc2-dirty #785
[   45.414746] Workqueue: phy0 ath_reset_work [ath9k]
[   45.419713] pstate: 4005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[   45.425910] pc : ath9k_hw_reset+0xc4/0x1c48 [ath9k_hw]
[   45.431234] lr : ath9k_hw_reset+0xc0/0x1c48 [ath9k_hw]
[   45.436548] sp : ffc0118dbca0
[   45.439961] x29: ffc0118dbca0 x28: 
[   45.445442] x27: ff800dee4080 x26: 
[   45.450923] x25: ff800df9b9d8 x24: 
[   45.456404] x23: ffc0115f6000 x22: ffc008d0d408
[   45.461885] x21: ff800dee5080 x20: ff800df9b9d8
[   45.467366] x19:  x18: 
[   45.472846] x17:  x16: 
[   45.478326] x15: 0010 x14: 
[   45.483807] x13: ffc0918db94f x12: ffc011498720
[   45.489289] x11: 0003 x10: ffc0114806e0
[   45.494770] x9 : ffc01014b2ec x8 : 00017fe8
[   45.500251] x7 : c000efff x6 : 0001
[   45.505733] x5 :  x4 : 
[   45.511213] x3 :  x2 : ff801fece870
[   45.516693] x1 : ffc00eded000 x0 : 003f
[   45.522174] Call trace:
[   45.524695]  ath9k_hw_reset+0xc4/0x1c48 [ath9k_hw]
[   45.529653]  ath_reset_internal+0x1a8/0x2b8 [ath9k]
[   45.534696]  ath_reset_work+0x2c/0x40 [ath9k]
[   45.539198]  process_one_work+0x210/0x480
[   45.543339]  worker_thread+0x5c/0x510
[   45.547115]  kthread+0x12c/0x130
[   45.550445]  ret_from_fork+0x10/0x1c
[   45.554138] Code: 910922c2 9117e021 95ff0398 b4000294 (b9400a61)
[   45.560430] ---[ end trace 566410ba90b50e8b ]---
[   45.565193] Kernel panic - not syncing: Oops: Fatal exception in 
interrupt
[   45.572282] SMP: stopping secondary CPUs
[   45.576331] Kernel Offset: disabled
[   45.579924] CPU features: 0x00040002,200c
[   45.584416] Memory Limit: none
[   45.587564] Rebooting in 3 seconds..

Signed-off-by: Pali Rohár 
Cc: sta...@vger.kernel.org
---
 drivers/net/wireless/ath/ath9k/main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/main.c 
b/drivers/net/wireless/ath/ath9k/main.c
index 45f6402478b5..97c3a53f9cef 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -307,6 +307,11 @@ static int ath_reset_internal(struct ath_softc *sc, struct 
ath9k_channel *hchan)
hchan = ah->curchan;
}
 
+   if (!hchan) {
+   fastcc = false;
+   hchan = ath9k_cmn_get_channel(sc->hw, ah, 
>cur_chan->chandef);
+   }
+
if (!ath_prepare_reset(sc))
fastcc = false;
 
-- 
2.20.1

Re: [PATCH] power: supply: bq27xxx: Return the value instead of -ENODATA

2021-04-01 Thread Pali Rohár

On Thursday 01 April 2021 01:51:28 Hermes Zhang wrote:
> On 3/31/21 10:02 PM, Pali Rohár wrote:
> > @@ -1655,9 +1655,6 @@ static int bq27xxx_battery_read_time(struct 
> > bq27xxx_device_info *di, u8 reg)
> > return tval;
> > }
> >  
> > -   if (tval == 65535)
> > -   return -ENODATA;
> > -
> > return tval * 60;
> > I'm not sure if this is correct change. If value 65535 is special which
> > indicates that data are not available then driver should not return
> > (converted) value 65535*60. If -ENODATA is there to indicate that data
> > are not available then -ENODATA should not be used.
> >
> > And if there is application which does not handle -ENODATA for state
> > when data are not available then it is a bug in application.
> 
> Yeah, I just have a feeling return -ENODATA for time_to_full/empty is
> not good here. Because:
> 
> 1. From chip datasheet, it mentioned return 65535 when it's not
> available (e.g. read time_to_full when discharging), but the driver
> changes behavior here.

So if chip reports that value is not available (by special value 65535)
then driver should not report that data are available with value 3932100s.

> 2. There is other case will return -ENODATA (e.g. the gauge not
> calibrated), so it will confuse application which is real failure.

In both cases, driver does not have (meaningful) value, so it reports
-ENODATA. If you think that some other error code is better for 2. then
we can discuss about it.

But -ENODATA seems a correct error code for reporting when data are not
available.

> Could we change the value in minute instead of seconds in
> bq27xxx_battery_read_time(), so that means driver do nothing but only
> pass the value from the chip?

No, according to API, driver must report value in seconds, not in
minutes, see documentation: Documentation/power/power_supply_class.rst

  All voltages, currents, charges, energies, time and temperatures in µV,
  µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
  stated. It's driver's job to convert its raw values to units in which
  this class operates.

There is no API for reporting value in minutes.

Re: [PATCH] power: supply: bq27xxx: Return the value instead of -ENODATA

2021-03-31 Thread Pali Rohár

On Wednesday 31 March 2021 21:51:41 Hermes Zhang wrote:
> From: Hermes Zhang 
> 
> It might be better to return value (e.g. 65535) instead of an error
> (e.g. No data available) for the time property.
> 
> Normally a common function will handle the read string and parse to
> integer for all the properties, but will have problem when read the
> time property because need to handle the NODATA error as non-error.
> So it will make simple for application which indicate success when
> read a number, otherwise as an error to handle.
> 
> Signed-off-by: Hermes Zhang 
> ---
>  drivers/power/supply/bq27xxx_battery.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/power/supply/bq27xxx_battery.c 
> b/drivers/power/supply/bq27xxx_battery.c
> index 4c4a7b1c64c5..b75e54aa8ada 100644
> --- a/drivers/power/supply/bq27xxx_battery.c
> +++ b/drivers/power/supply/bq27xxx_battery.c
> @@ -1655,9 +1655,6 @@ static int bq27xxx_battery_read_time(struct 
> bq27xxx_device_info *di, u8 reg)
>   return tval;
>   }
>  
> - if (tval == 65535)
> - return -ENODATA;
> -
>   return tval * 60;

I'm not sure if this is correct change. If value 65535 is special which
indicates that data are not available then driver should not return
(converted) value 65535*60. If -ENODATA is there to indicate that data
are not available then -ENODATA should not be used.

And if there is application which does not handle -ENODATA for state
when data are not available then it is a bug in application.

>  }
>  
> -- 
> 2.20.1
>

Re: [PATCH] hwmon: (dell-smm) Add Dell Latitude 7440 to fan control whitelist

2021-03-31 Thread Pali Rohár

On Tuesday 30 March 2021 19:02:55 Sebastian Oechsle wrote:
> Allow manual PWM control on Dell Latitude 7440.
> 
> Signed-off-by: Sebastian Oechsle  <mailto:setbool...@icloud.com>>

Reviewed-by: Pali Rohár 

(Btw, in commit message is small typo, model name is E7440, not 7440)

Guenter, would you take this patch?

> ---
> drivers/hwmon/dell-smm-hwmon.c | 8 
> 1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
> index 73b9db9e3aab..2970892bed82 100644
> --- a/drivers/hwmon/dell-smm-hwmon.c
> +++ b/drivers/hwmon/dell-smm-hwmon.c
> @@ -1210,6 +1210,14 @@ static struct dmi_system_id 
> i8k_whitelist_fan_control[] __initdata = {
>   },
>   .driver_data = (void *)_fan_control_data[I8K_FAN_34A3_35A3],
>   },
> + {
> + .ident = "Dell Latitude E7440",
> + .matches = {
> + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
> + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Latitude E7440"),
> + },
> + .driver_data = (void *)_fan_control_data[I8K_FAN_34A3_35A3],
> + },
>   { }
> };
> 
> --
> 2.31.1

Re: How long should be PCIe card in Warm Reset state?

2021-03-30 Thread Pali Rohár

On Tuesday 30 March 2021 16:34:47 Maciej W. Rozycki wrote:
> On Tue, 30 Mar 2021, Pali Rohár wrote:
> 
> > >  If I were to implement this stuff, for good measure I'd give it a safety 
> > > margin beyond what the spec requires and use a timeout of say 2-4s while 
> > > actively querying the status of the device.  The values given in the spec 
> > > are only the minimum requirements.
> > 
> > Are you able to also figure out what is the minimal timeout value for 
> > PCIe Warm Reset?
> > 
> > Because we are having troubles to "decode" correct minimal timeout value
> > for this PCIe Warm Reset (not Function-level reset).
> 
>  The spec does not give any exceptions AFAICT as to the timeouts required 
> between the three kinds of a Conventional Reset (Hot, Warm, or Cold) and 
> refers to them collectively as a Conventional Reset across the relevant 
> parts of the document, so clearly the same rules apply.
> 
>   Maciej

There are specified more timeouts related to Warm reset and PERST#
signal. Just they are not in Base spec, but in CEM spec. See previous
Amey's email where are described some timeouts and also links in my
first email where I put other timeouts defined in specs relevant for
PERST# signal and therefore also for Warm Reset.

Re: How long should be PCIe card in Warm Reset state?

2021-03-30 Thread Pali Rohár

On Tuesday 30 March 2021 15:04:02 Maciej W. Rozycki wrote:
> On Thu, 25 Mar 2021, David Laight wrote:
> 
> > I can't see the value in the (nice bound) copy of the PCI 2.0 spec I have.
> > But IIRC it is 100ms (it might just me 500ms).
> > While this might seem like ages it can be problematic if targets have
> > to load large FPGA images from serial EEPROMs.
> 
>  AFAICT it is 100ms for the Conventional Reset before Configuration 
> Requests are allowed to be issued in the first place, and then they are 
> allowed to fail with the Configuration Request Retry Status (CRS) status 
> until the device is ready to respond.  Then it is 1.0s before the Root 
> Complex and/or system software is allowed to consider a device broken that 
> does not return a Successful Completion status for a valid Configuration 
> Request.
> 
>  This 1.0s period is analogous to the Trhfa parameter for PCI/PCI-X buses 
> (2^25/2^27 bus clocks respectively; I don't know why the PCIe spec quotes 
> the latter value as 2^26, contrary to what the original PCI-X spec says, 
> but obviously the latter document is what sets the norm), which also has 
> to be respected for the respective bus segments in the presence of PCIe to 
> PCI/PCI-X bridges.

Hello Maciej! Thank you for information.

>  For Function-level reset the timeout is 100ms.
> 
>  This is specified in sections 6.6.1. "Conventional Reset" and 6.6.2. 
> "Function-Level Reset (FLR)" respectively in the copy of PCIe 2.0 base 
> spec I have access to; I imagine other versions may have different section 
> numbers, but will have them named similarly.
> 
>  If I were to implement this stuff, for good measure I'd give it a safety 
> margin beyond what the spec requires and use a timeout of say 2-4s while 
> actively querying the status of the device.  The values given in the spec 
> are only the minimum requirements.

Are you able to also figure out what is the minimal timeout value for PCIe Warm 
Reset?

Because we are having troubles to "decode" correct minimal timeout value
for this PCIe Warm Reset (not Function-level reset).

Re: [v8,3/7] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192

2021-03-29 Thread Pali Rohár

On Thursday 18 March 2021 13:48:07 Jianjun Wang wrote:
> On Thu, 2021-03-18 at 01:02 +0100, Pali Rohár wrote:
> > On Saturday 13 March 2021 15:43:14 Jianjun Wang wrote:
> > > On Thu, 2021-03-11 at 13:38 +0100, Pali Rohár wrote:
> > > > On Wednesday 24 February 2021 14:11:28 Jianjun Wang wrote:
> > > > > +
> > > > > + /* Check if the link is up or not */
> > > > > + err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_REG, val,
> > > > > +  !!(val & PCIE_PORT_LINKUP), 20,
> > > > > +  50 * USEC_PER_MSEC);
> > > > 
> > > > IIRC, you need to wait at least 100ms after de-asserting PERST# signal
> > > > as it is required by PCIe specs and also because experiments proved that
> > > > some Compex wifi cards (e.g. WLE900VX) are not detected if you do not
> > > > wait this minimal time.
> > > 
> > > Yes, this should be 100ms, I will fix it at next version, thanks for
> > > your review.
> > 
> > In past Bjorn suggested to use msleep(PCI_PM_D3COLD_WAIT); macro for
> > this step during reviewing aardvark driver.
> > 
> > https://lore.kernel.org/linux-pci/20190426161050.ga189...@google.com/
> > 
> > And next iteration used this PCI_PM_D3COLD_WAIT macro instead of 100:
> > 
> > https://lore.kernel.org/linux-pci/20190522213351.21366-2-r...@triplefau.lt/
> 
> Sure, I will use PCI_PM_D3COLD_WAIT macro instead in the next version.
> 
> Thanks.

Anyway, now I found out that kernel has functions for this waiting:
pcie_wait_for_link_delay() and pcie_wait_for_link()

Function is called from pci_bridge_wait_for_secondary_bus().

But in current form it is not usable for native controller drivers.

This looks like another candidate for code de-duplication or providing
"framework".


Lorenzo, as maintainer of native controller drivers, do you have some
ideas about providing "framework", common functions or something for
avoiding to implement same code patterns in every native controller
driver, which is de-facto standard PCIe codepath? Including a way how to
export PERST# reset gpio?

Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS

2021-03-29 Thread Pali Rohár

On Friday 26 March 2021 11:43:10 Don Bollinger wrote:
> > Hello Don!
> > 
> > I have read whole discussion and your EEPROM patch proposal. But for me it
> > looks like some kernel glue code for some old legacy / proprietary access
> > method which does not have any usage outside of that old code.
> 
> I don't know if 'kernel glue code' is good or bad.  It is a driver.  It
> locks access to a device so it can perform multiple accesses without
> interference.  It organizes the data on a weird device into a simple linear
> address space that can be accessed with open(), seek(), read() and write()
> calls.
> 
> As for 'old code', this code and variations of it are under active
> development by multiple Network OS vendors and multiple switch vendors, and
> in production on hundreds of thousands of switches with millions of
> SFP/QSFP/CMIS devices.  This stuff is running the biggest clouds in the
> world.
> 
> > 
> > Your code does not contain any quirks which are needed to read different
> > EEPROMs in different SFP modules. As Andrew wrote there are lot of broken
> > SFPs which needs special handling and this logic is already implemented in
> > sfp.c and sfp-bus.c kernel drivers. These drivers then export EEPROM
> > content to userspace via ethtool -m API in unified way and userspace does
> > not implement any quirks (nor does not have to deal with quirks).
> 
> As a technical matter, you handle those quirks in the code that interprets
> EEPROM data.  You have figured out what devices have what quirks, then coded
> up solutions to make them work.  Then, after all the quirk handling is done,
> you call the actual access routines (sfp_i2c_read() and sfp_i2c_write()) to
> access the module EEPROMs.  My code works the same way, except in my
> community all the interpretation of EEPROM data is done in user space.  You
> may not like that, but it is not clear why you should care how my community
> chooses to handle the data.  In this architecture, the only thing needed
> from the kernel is the equivalent of sfp_i2c_read() and sfp_i2c_write, which
> optoe provides.  The key point here is that my community wants the kernel to
> just access the data.  No interpretation, no identification, no special
> cases.
> 
> > 
> > If you try to read EEPROM "incorrectly" then SFP module with its EEPROM
> > chip (or emulation of chip) locks and is fully unusable after you unplug
> it and
> > plug it again. Kernel really should not export API to userspace which can
> > cause "damage" to SFP modules. And currently it does *not* do it.
> 
> In my community, such devices are not tolerated.  Modules which can be
> "damaged" should be thrown away.

Well, those tested / problematic modules are not damaged by real. They
just stop working until you do power off / on cycle and unplug / plug
them again.

But I agree with you. Russel wrote about those SFP modules that they
should have biohazard label :-)

The best would be if those modules disappear, but that would not happen
in near future.

These modules are already used by lot of users and users wanted support
for them. Hence we wrote what was possible.

The main issue is that we do know know any well-behaved GPON SFP module.

GPON is now very common technology for internet access, so it is not
obvious that more people are asking about support for GPON HW compatible
with Linux kernel.

> Please be clear, I am not disagreeing with your implementation.  For your
> GPON devices, you handle this in kernel code.  Cool.  Keep it that way.
> Just don't make my community implement that where it is not needed and not
> wanted.  Optoe just accesses the device.  Other layers handle interpretation
> and quirks.  Your handling is in sfp.c, mine is in user space.  Not right or
> wrong, just different.  Both work.
> 
> > 
> > I have contributed code for some GPON SFP modules, so their EEPROM can
> > be correctly read and exported to userspace via ethtool -m. So I know that
> > this is very fragile area and needs to be properly handled.
> 
> My code is in use in cloud datacenters and campus switches around the world.
> I know it needs to be properly handled.
> 
> > 
> > So I do not see any reason why can be a new optoe method in _current_
> > form useful. It does not implemented required things for handling
> different
> > EEPROM modules.
> 
> optoe would be useful in your current environment as a replacement for
> sfp_i2c_read() and sfp_i2c_write().  Those routines just access the EEPROM.
> They don't identify or interpret or implement quirk handling.  Neither does
> optoe.

Now that we know that there are SFP modules which needs special handling
without it they lock themselves on i2c bus, we really cannot export to
userspace interface which allows this locking.

I understand that you do not care for these broken GPON SFP modules, but
"generic" API which would be part of mainline kernel really cannot have
known issue that can cause some modules to lock by just simple "read"
operation.

> AND, optoe is

Re: [PATCH mvebu v3 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-03-28 Thread Pali Rohár

On Friday 12 March 2021 10:12:06 Gregory CLEMENT wrote:
> Hello Pali,
> 
> > Hello Gregory!
> >
> > Patches are the for almost two months and more people have tested them.
> > They are marked with Fixed/CC-stable tags, they should go also into
> > stable trees as they are fixing CPU scaling and instability issues.
> >
> > Are there any issues with these patches? If not, could you please merge
> > them for upcoming Linux version?
> 
> Actually I am not the maintainer of the clk and cpufreq subsystems, so
> the only thing I can apply is the device tree relative patch.
> 
> Gregory

Hello Gregory! Could you please at least review this patches, so other
maintainers could merge them?

Re: [v9,5/7] PCI: mediatek-gen3: Add MSI support

2021-03-27 Thread Pali Rohár

On Saturday 27 March 2021 19:44:30 Marc Zyngier wrote:
> On Sat, 27 Mar 2021 19:28:37 +,
> Pali Rohár  wrote:
> > 
> > On Wednesday 24 March 2021 11:05:08 Jianjun Wang wrote:
> > > +static void mtk_pcie_msi_handler(struct mtk_pcie_port *port, int set_idx)
> > > +{
> > > + struct mtk_msi_set *msi_set = >msi_sets[set_idx];
> > > + unsigned long msi_enable, msi_status;
> > > + unsigned int virq;
> > > + irq_hw_number_t bit, hwirq;
> > > +
> > > + msi_enable = readl_relaxed(msi_set->base + PCIE_MSI_SET_ENABLE_OFFSET);
> > > +
> > > + do {
> > > + msi_status = readl_relaxed(msi_set->base +
> > > +PCIE_MSI_SET_STATUS_OFFSET);
> > > + msi_status &= msi_enable;
> > > + if (!msi_status)
> > > + break;
> > > +
> > > + for_each_set_bit(bit, _status, PCIE_MSI_IRQS_PER_SET) {
> > > + hwirq = bit + set_idx * PCIE_MSI_IRQS_PER_SET;
> > > + virq = irq_find_mapping(port->msi_bottom_domain, hwirq);
> > > + generic_handle_irq(virq);
> > > + }
> > > + } while (true);
> > 
> > Hello!
> > 
> > Just a question, cannot this while-loop cause block of processing other
> > interrupts?
> 
> This is a level interrupt. You don't have much choice but to handle it
> immediately, although an alternative would be to mask it and deal with
> it in a thread. And since Linux doesn't deal with interrupt priority,
> a screaming interrupt is never a good thing.

I see. Something like "interrupt priority" (which does not exist?) would
be needed to handle it.

> > I have done tests with different HW (aardvark) but with same while(true)
> > loop logic. One XHCI PCIe controller was sending MSI interrupts too fast
> > and interrupt handler with this while(true) logic was in infinite loop.
> > During one IRQ it was calling infinite many times generic_handle_irq()
> > as HW was feeding new and new MSI hwirq into status register.
> 
> Define "too fast".

Fast - next interrupt comes prior checking if while(true)-loop should stop.

> If something in the system is able to program the
> XHCI device in such a way that it causes a screaming interrupt, that's
> the place to look for problems, and probably not in the interrupt
> handling itself, which does what it is supposed to do.
> 
> > But this is different HW, so it can have different behavior and does not
> > have to cause above issue.
> > 
> > I have just spotted same code pattern for processing MSI interrupts...
> 
> This is a common pattern that you will find in pretty much any
> interrupt handling/demuxing, and is done this way when the cost of
> taking the exception is high compared to that of handling it.

And would not help if while(true)-loop is replaced by loop with upper
limit of iterations? Or just call only one iteration?

> Which is pretty much any of the badly designed, level-driving,
> DW-inspired, sorry excuse for MSI implementations that are popular on
> low-end ARM SoCs.

Ok. So thank you for information!

> Thanks,
> 
>   M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

Re: [v9,5/7] PCI: mediatek-gen3: Add MSI support

2021-03-27 Thread Pali Rohár

On Wednesday 24 March 2021 11:05:08 Jianjun Wang wrote:
> +static void mtk_pcie_msi_handler(struct mtk_pcie_port *port, int set_idx)
> +{
> + struct mtk_msi_set *msi_set = >msi_sets[set_idx];
> + unsigned long msi_enable, msi_status;
> + unsigned int virq;
> + irq_hw_number_t bit, hwirq;
> +
> + msi_enable = readl_relaxed(msi_set->base + PCIE_MSI_SET_ENABLE_OFFSET);
> +
> + do {
> + msi_status = readl_relaxed(msi_set->base +
> +PCIE_MSI_SET_STATUS_OFFSET);
> + msi_status &= msi_enable;
> + if (!msi_status)
> + break;
> +
> + for_each_set_bit(bit, _status, PCIE_MSI_IRQS_PER_SET) {
> + hwirq = bit + set_idx * PCIE_MSI_IRQS_PER_SET;
> + virq = irq_find_mapping(port->msi_bottom_domain, hwirq);
> + generic_handle_irq(virq);
> + }
> + } while (true);

Hello!

Just a question, cannot this while-loop cause block of processing other
interrupts?

I have done tests with different HW (aardvark) but with same while(true)
loop logic. One XHCI PCIe controller was sending MSI interrupts too fast
and interrupt handler with this while(true) logic was in infinite loop.
During one IRQ it was calling infinite many times generic_handle_irq()
as HW was feeding new and new MSI hwirq into status register.

But this is different HW, so it can have different behavior and does not
have to cause above issue.

I have just spotted same code pattern for processing MSI interrupts...

> +}
> +
>  static void mtk_pcie_irq_handler(struct irq_desc *desc)
>  {
>   struct mtk_pcie_port *port = irq_desc_get_handler_data(desc);
> @@ -405,6 +673,14 @@ static void mtk_pcie_irq_handler(struct irq_desc *desc)
>   generic_handle_irq(virq);
>   }
>  
> + irq_bit = PCIE_MSI_SHIFT;
> + for_each_set_bit_from(irq_bit, , PCIE_MSI_SET_NUM +
> +   PCIE_MSI_SHIFT) {
> + mtk_pcie_msi_handler(port, irq_bit - PCIE_MSI_SHIFT);
> +
> + writel_relaxed(BIT(irq_bit), port->base + PCIE_INT_STATUS_REG);
> + }
> +
>   chained_irq_exit(irqchip, desc);
>  }
>  
> -- 
> 2.25.1
>

Re: [PATCH] PCI: Disallow retraining link for Atheros QCA98xx chips on non-Gen1 PCIe bridges

2021-03-27 Thread Pali Rohár

On Saturday 27 March 2021 15:42:13 Marek Behún wrote:
> On Sat, 27 Mar 2021 14:29:59 +0100
> Pali Rohár  wrote:
> 
> > I can change this to 'if (!ret)' if needed, no problem.
> > 
> > I use 'if (!val)' mostly for boolean and pointer variables. If
> > variable can contain more integer values then I lot of times I use
> > '=='.
> 
> Comparing integer varibales with explicit literals is sensible, but
> if a function returning integer returns 0 on success and negative value
> on error, Linux kernel has a tradition of using just
>   if (!ret)
> or 
>   if (ret)
> 
> Marek

Ok, I will change it.

Re: [PATCH] PCI: Disallow retraining link for Atheros QCA98xx chips on non-Gen1 PCIe bridges

2021-03-27 Thread Pali Rohár

Hello!

On Saturday 27 March 2021 01:14:10 Krzysztof Wilczyński wrote:
> Hi Pali,
> 
> Thank you for sending the patch over!
> 
> [...]
> > +static int pcie_change_tls_to_gen1(struct pci_dev *parent)
> 
> Just a nitpick, so feel free to ignore it.  I would just call the
> variable "dev" as we pass a pointer to a particular device, but it does
> not matter as much, so I am leaving this to you.

I called it 'parent' because it is called 'parent' also in caller
function. Link consists of two devices, so 'dev' could be ambiguous.

> [...]
> > +   if (ret == 0) {
> 
> You prefer this style over "if (!ret)"?  Just asking in the view of the
> style that seem to be preferred in the code base at the moment.

I can change this to 'if (!ret)' if needed, no problem.

I use 'if (!val)' mostly for boolean and pointer variables. If variable
can contain more integer values then I lot of times I use '=='.

> > +   /* Verify that new value was really set */
> > +   pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, );
> > +   if ((reg16 & PCI_EXP_LNKCTL2_TLS) != PCI_EXP_LNKCTL2_TLS_2_5GT)
> > +   ret = -EINVAL;
> 
> I am wondering about this verification - did you have a case where the
> device would not properly set its capability, or accept the write and do
> nothing?

I do not know any real device which is doing this thing.

But this issue can happen with kernel's PCIe emulated root bridge:
drivers/pci/pci-bridge-emul.c

Drivers which are using this emulated root bridge (and both pci-mvebu.c
and pci-aardvark.c are using it!) do not have to implement callback for
every read and every write operation of every register.

Note that both pci-mvebu.c and pci-aardvark.c currently does not
implement access to HW register PCI_EXP_LNKCTL2. So currently it is not
possible to set PCI_EXP_LNKCTL2_TLS_2_5GT. And above code correctly
fails and disallow ASPM code to retrain link.

I have some WIP patches which implement LNKCAP2, LNKCTL2 and LNKSTA2
read/write callbacks on emulated bridge for pci-mvebu.c and
pci-aardvark.c. And I have tested that with those WIP patches ASPM code
can correctly switch link to 2.5GT/s and enable ASPM.

> > +   if (ret != 0)
> 
> I think "if (ret)" would be fine to use here, unless you prefer being
> more explicit.  See my question about style above.
> 
> >  static bool pcie_retrain_link(struct pcie_link_state *link)
> >  {
> > struct pci_dev *parent = link->pdev;
> > unsigned long end_jiffies;
> > u16 reg16;
> > +   u32 reg32;
> > +
> > +   /* Check if link is capable of higher speed than 2.5 GT/s and 
> > needs quirk */
> > +   pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, );
> > +   if ((reg32 & PCI_EXP_LNKCAP_SLS) > PCI_EXP_LNKCAP_SLS_2_5GB) {
> 
> I wonder if moving this check to pcie_change_tls_to_gen1() would make
> more sense?  It would then make this function a little cleaner.  What do
> you think?

Ok, no problem. But then function needs to be renamed. Any idea how
should be this function called?

> [...]
> > +static void quirk_no_bus_reset_and_no_retrain_link(struct pci_dev *dev)
> > +{
> > +   dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET | 
> > PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1;
> > +}
> [...]
> 
> I know that the style has been changed to allow 100 characters width and
> that checkpatch.pl now also does not warn about line length, as per
> commit bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column
> warning"), but I think Bjorn still prefers 80 characters, thus this line
> above might have to be aligned.

Ok! If needed I can reformat patch to 80 characters per line.

[PATCH] PCI: Disallow retraining link for Atheros QCA98xx chips on non-Gen1 PCIe bridges

2021-03-26 Thread Pali Rohár

Atheros QCA9880 and QCA9890 chips do not behave after a bus reset and also
after retrain link when PCIe bridge is not in GEN1 mode at 2.5 GT/s speed.
The device will throw a Link Down error and config space is not accessible
again. Retrain link can be called only when using GEN1 PCIe bridge or when
PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.

This issue was reproduced with more Compex WLE900VX cards (QCA9880 based)
on Armada 385 with pci-mvebu.c driver and also on Armada 3720 with
pci-aardvark.c driver. Also this issue was reproduced with some "noname"
card with QCA9890 WiFi chip on Armada 3720. All problematic cards with
these QCA chips have PCI device id 0x003c.

Tests showed that other WiFi cards based on AR93xx (PCI device id 0x0030)
and AR9287 (PCI device id 0x002e) chips do not have these problems.

To workaround this issue, this change introduces a new PCI quirk called
PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1 for PCI device id 0x003c.

When this quirk is set then kernel disallows triggering PCI_EXP_LNKCTL_RL
bit in config space of PCIe Bridge in case PCIe Bridge is capable of higher
speed than 2.5 GT/s and higher speed is already allowed. When PCIe Bridge
has accessible LNKCTL2 register then kernel tries to force target link
speed via PCI_EXP_LNKCTL2_TLS* bits to 2.5 GT/s. After this change it is
possible to trigger PCI_EXP_LNKCTL_RL bit without causing issues on
problematic Atheros QCA98xx cards.

Currently only PCIe ASPM kernel code triggers this PCI_EXP_LNKCTL_RL bit,
so quirk check is added only into pcie/aspm.c file.

Signed-off-by: Pali Rohár 
Reported-by: Toke Høiland-Jørgensen 
Tested-by: Marek Behún 
Link: https://lore.kernel.org/linux-pci/87h7l8axqp@toke.dk/
Cc: sta...@vger.kernel.org # c80851f6ce63a ("PCI: Add PCI_EXP_LNKCTL2_TLS* 
macros")
---
 drivers/pci/pcie/aspm.c | 43 +
 drivers/pci/quirks.c| 15 +-
 include/linux/pci.h |  2 ++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ac0557a305af..ea5bdf6107f6 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -192,11 +192,54 @@ static void pcie_clkpm_cap_init(struct pcie_link_state 
*link, int blacklist)
link->clkpm_disable = blacklist ? 1 : 0;
 }
 
+static int pcie_change_tls_to_gen1(struct pci_dev *parent)
+{
+   u16 reg16;
+   u32 reg32;
+   int ret;
+
+   /* Check if link speed can be forced to 2.5 GT/s */
+   pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, );
+   if (!(reg32 & PCI_EXP_LNKCAP2_SLS_2_5GB)) {
+   pci_err(parent, "ASPM: Bridge does not support changing Link 
Speed to 2.5 GT/s\n");
+   return -EOPNOTSUPP;
+   }
+
+   /* Force link speed to 2.5 GT/s */
+   ret = pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
+PCI_EXP_LNKCTL2_TLS,
+PCI_EXP_LNKCTL2_TLS_2_5GT);
+   if (ret == 0) {
+   /* Verify that new value was really set */
+   pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, );
+   if ((reg16 & PCI_EXP_LNKCTL2_TLS) != PCI_EXP_LNKCTL2_TLS_2_5GT)
+   ret = -EINVAL;
+   }
+
+   if (ret != 0)
+   pci_err(parent, "ASPM: Changing Target Link Speed to 2.5 GT/s 
failed: %d\n", ret);
+
+   return ret;
+}
+
 static bool pcie_retrain_link(struct pcie_link_state *link)
 {
struct pci_dev *parent = link->pdev;
unsigned long end_jiffies;
u16 reg16;
+   u32 reg32;
+
+   if (link->downstream->dev_flags & 
PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1) {
+   /* Check if link is capable of higher speed than 2.5 GT/s and 
needs quirk */
+   pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, );
+   if ((reg32 & PCI_EXP_LNKCAP_SLS) > PCI_EXP_LNKCAP_SLS_2_5GB) {
+   if (pcie_change_tls_to_gen1(parent) != 0) {
+   pci_err(parent, "ASPM: Retrain Link at higher 
speed is disallowed by quirk\n");
+   return false;
+   }
+   pci_info(parent, "ASPM: Target Link Speed changed to 
2.5 GT/s due to quirk\n");
+   }
+   }
 
pcie_capability_read_word(parent, PCI_EXP_LNKCTL, );
reg16 |= PCI_EXP_LNKCTL_RL;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..8ff690c7679d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3558,6 +3558,11 @@ static void quirk_no_bus_reset(struct pci_dev *dev)
dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
 }
 
+static void quirk_no_bus_reset_and_no_retrain_link(struct pci_dev *dev)
+{
+   dev->dev_flags |= PCI_DEV_FL

Re: [v9,2/7] PCI: Export pci_pio_to_address() for module use

2021-03-24 Thread Pali Rohár

On Wednesday 24 March 2021 11:05:05 Jianjun Wang wrote:
> This interface will be used by PCI host drivers for PIO translation,
> export it to support compiling those drivers as kernel modules.
> 
> Signed-off-by: Jianjun Wang 
> ---
>  drivers/pci/pci.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 16a17215f633..12bba221c9f2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4052,6 +4052,7 @@ phys_addr_t pci_pio_to_address(unsigned long pio)
>  
>   return address;
>  }
> +EXPORT_SYMBOL(pci_pio_to_address);

Hello! I'm not sure if EXPORT_SYMBOL is correct because file has GPL-2.0
header. Should not be in this case used only EXPORT_SYMBOL_GPL? Maybe
other people would know what is correct?

>  
>  unsigned long __weak pci_address_to_pio(phys_addr_t address)
>  {
> -- 
> 2.25.1
>

Re: [PATCH 5.4 00/24] 5.4.105-rc1 review

2021-03-23 Thread Pali Rohár

On Tuesday 23 March 2021 10:20:27 Florian Fainelli wrote:
> On 3/15/2021 2:50 AM, Pali Rohár wrote:
> > On Friday 12 March 2021 12:54:18 Alexander Lobakin wrote:
> >> From: Florian Fainelli 
> >> Date: Thu, 11 Mar 2021 09:41:27 -0800
> >>
> >> Hi Florian,
> >>
> >>> On 3/11/21 9:40 AM, Greg KH wrote:
> >>>> On Thu, Mar 11, 2021 at 09:23:56AM -0800, Florian Fainelli wrote:
> >>>>> On 3/11/21 5:08 AM, Greg KH wrote:
> >>>>>> On Wed, Mar 10, 2021 at 08:19:45PM -0800, Florian Fainelli wrote:
> >>>>>>> +Alex,
> >>>>>>>
> >>>>>>> On 3/10/2021 5:24 AM, gre...@linuxfoundation.org wrote:
> >>>>>>>> From: Greg Kroah-Hartman 
> >>>>>>>>
> >>>>>>>> This is the start of the stable review cycle for the 5.4.105 release.
> >>>>>>>> There are 24 patches in this series, all will be posted as a response
> >>>>>>>> to this one.  If anyone has any issues with these being applied, 
> >>>>>>>> please
> >>>>>>>> let me know.
> >>>>>>>>
> >>>>>>>> Responses should be made by Fri, 12 Mar 2021 13:23:09 +.
> >>>>>>>> Anything received after that time might be too late.
> >>>>>>>>
> >>>>>>>> The whole patch series can be found in one patch at:
> >>>>>>>>  
> >>>>>>>> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.105-rc1.gz
> >>>>>>>> or in the git tree and branch at:
> >>>>>>>>  
> >>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> >>>>>>>>  linux-5.4.y
> >>>>>>>> and the diffstat can be found below.
> >>>>>>>>
> >>>>>>>> thanks,
> >>>>>>>>
> >>>>>>>> greg k-h
> >>>>>>>
> >>>>>>> I believe you need to drop "net: dsa: add GRO support via gro_cells" 
> >>>>>>> as
> >>>>>>> it causes the following kernel panic on a DSA-enabled platform:
> >>>>>>>
> >>>>>>> Configuring rgmii_2 interface
> >>>>>>> [   10.170527] brcm-sf2 f0b0.ethernet_switch rgmii_2: configuring
> >>>>>>> for fixed/rgmii-txid link mode
> >>>>>>> [   10.179597] 8021q: adding VLAN 0 to HW filter on device rgmii_2
> >>>>>>> [   10.185608] brcm-sf2 f0b0.ethernet_switch rgmii_2: Link is Up -
> >>>>>>> 1Gbps/Full - flow control off
> >>>>>>> [   10.198631] IPv6: ADDRCONF(NETDEV_CHANGE): rgmii_2: link becomes 
> >>>>>>> ready
> >>>>>>> Configuring sit0 interface
> >>>>>>> [   10.254346] 8<--- cut here ---
> >>>>>>> [   10.257438] Unable to handle kernel paging request at virtual 
> >>>>>>> address
> >>>>>>> d6df6190
> >>>>>>> [   10.264685] pgd = (ptrval)
> >>>>>>> [   10.267411] [d6df6190] *pgd=807003, *pmd=
> >>>>>>> [   10.272846] Internal error: Oops: 206 [#1] SMP ARM
> >>>>>>> [   10.277661] Modules linked in:
> >>>>>>> [   10.280739] CPU: 0 PID: 1886 Comm: sed Not tainted
> >>>>>>> 5.4.105-1.0pre-geff642e2af2b #4
> >>>>>>> [   10.288337] Hardware name: Broadcom STB (Flattened Device Tree)
> >>>>>>> [   10.294292] PC is at gro_cells_receive+0x90/0x11c
> >>>>>>> [   10.299020] LR is at dsa_switch_rcv+0x120/0x1d4
> >>>>>>> [   10.303562] pc : []lr : []psr: 600f0113
> >>>>>>> [   10.309841] sp : c1d33cd0  ip : 03e8  fp : c1d33ce4
> >>>>>>> [   10.315078] r10: c8901000  r9 : c8901000  r8 : c0b4a53c
> >>>>>>> [   10.320314] r7 : c2208920  r6 :   r5 :   r4 : 
> >>>>>>> 4000
> >>>>>>> [   10.326855] r3 : d6df6188  r2 : c4927000  r1 : c8adc300  r0 : 
> >>>>>>> c22069dc
> >>>>>>> [   10.98] Flags: nZCv  IRQs on  FIQs on

Re: How long should be PCIe card in Warm Reset state?

2021-03-23 Thread Pali Rohár

On Tuesday 23 March 2021 21:49:41 Amey Narkhede wrote:
> On 21/03/10 12:05PM, Pali Rohár wrote:
> > Hello!
> >
> > I would like to open a question about PCIe Warm Reset. Warm Reset of
> > PCIe card is triggered by asserting PERST# signal and in most cases
> > PERST# signal is controlled by GPIO.
> >
> > Basically every native Linux PCIe controller driver is doing this Warm
> > Reset of connected PCIe card during native driver initialization
> > procedure.
> >
> > And now the important question is: How long should be PCIe card in Warm
> > Reset state? After which timeout can be PERST# signal de-asserted by
> > Linux controller driver?
> >
> > Lorenzo and Rob already expressed concerns [1] [2] that this Warm Reset
> > timeout should not be driver specific and I agree with them.
> >
> > I have done investigation which timeout is using which native PCIe
> > driver [3] and basically every driver is using different timeout.
> >
> > I have tried to find timeouts in PCIe specifications, I was not able to
> > understand and deduce correct timeout value for Warm Reset from PCIe
> > specifications. What I have found is written in my email [4].
> >
> > Alex (as a "reset expert"), could you look at this issue?
> >
> > Or is there somebody else who understand PCIe specifications and PCIe
> > diagrams to figure out what is the minimal timeout for de-asserting
> > PERST# signal?
> >
> > There are still some issues with WiFi cards (e.g. Compex one) which
> > sometimes do not appear on PCIe bus. And based on these "reset timeout
> > differences" in Linux PCIe controller drivers, I suspect that it is not
> > (only) the problems in WiFi cards but also in Linux PCIe controller
> > drivers. In my email [3] I have written that I figured out that WLE1216
> > card needs to be in Warm Reset state for at least 10ms, otherwise card
> > is not detected.
> >
> > [1] - 
> > https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
> > [2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
> > [3] - 
> > https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
> > [4] - 
> > https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
> 
> I somehow got my hands on PCIe Gen4 spec. It says on page no 555-
> "When PERST# is provided to a component or adapter, this signal must be
> used by the component or adapter as Fundamental Reset.
> When PERST# is not provided to a component or adapter, Fundamental Reset is
> generated autonomously by the component or adapter, and the details of how
> this is done are outside the scope of this document."
> Not sure what component/adapter means in this context.
> 
> Then below it says-
> "In some cases, it may be possible for the Fundamental Reset mechanism
> to be triggered  by hardware without the removal and re-application of
> power to the component.  This is called a warm reset. This document does
> not specify a means for generating a warm reset."
> 
> Thanks,
> Amey

Hello Amey, PCIe Base document does not specify how to control PERST#
signal and how to issue Warm Reset. But it is documented in PCIe CEM,
Mini PCIe CEM and M.2 CEM documents (maybe in some other PCIe docs too).

It is needed look into more documents, "merge them in head" and then
deduce final meaning...

Re: [v8,3/7] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192

2021-03-23 Thread Pali Rohár

On Tuesday 23 March 2021 09:31:34 Jianjun Wang wrote:
> One more question, is there any chance that we can put this linkup flow
> to a more "standard" way, such as drivers provides the ops of the PERST#
> pin and let the framework to decide how to start a link training, or we
> just use macro to replace this timeout value in the future?

This is something about which I was thinking that could be useful for
pci-aardvark.c driver. But I was not sure if some other driver can
benefit from such "framework". But now I see that your driver is another
candidate which can benefit from it.

Currently there is no such "framework" in kernel and the hardest part
would be to design it.

Having this API would allow kernel to implement and export PCIe Warm
Reset (which is done via PERST# signal) and easily extend Amey's reset
patches to export also Warm Reset via sysfs.

But to implement this framework and using it for reset we first need to
answer questions which I have sent in email:
https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/

Bjorn, Alex: any opinion about PERST#?

Also see Enrico's email, where confirmed that there are platforms which
shares one PERST# signal for more endpoint cards:
https://lore.kernel.org/linux-pci/1da0fa2c-8056-9ae8-6ce4-ab6453177...@metux.net/

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-23 Thread Pali Rohár

On Thursday 18 March 2021 20:01:55 Amey Narkhede wrote:
> On 21/03/17 09:13PM, Pali Rohár wrote:
> > On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 20:40:24 +0100
> > > Pali Rohár  wrote:
> > >
> > > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > > Pali Rohár  wrote:
> > > > >
> > > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > > Pali Rohár  wrote:
> > > > > > >
> > > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > > Pali Rohár  wrote:
> > > > > > > > >
> > > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > Pali Rohár  wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and 
> > > > > > > > > > > > > secondary bus
> > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot 
> > > > > > > > > > > > > reset and
> > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > >
> > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot 
> > > > > > > > > > > > reset is just another
> > > > > > > > > > > > type of reset, which is currently implemented only for 
> > > > > > > > > > > > PCIe hot plug
> > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just 
> > > > > > > > > > > > call PCI secondary
> > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does 
> > > > > > > > > > > > not have API in
> > > > > > > > > > > > kernel and therefore drivers do not export this type of 
> > > > > > > > > > > > reset via any
> > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > >
> > > > > > > > > > > Warm reset is beyond the scope of this series, but could 
> > > > > > > > > > > be implemented
> > > > > > > > > > > in a compatible way to fit within the 
> > > > > > > > > > > pci_reset_fn_methods[] array
> > > > > > > > > > > defined here.
> > > > > > > > > >
> > > > > > > > > > Ok!
> > > > > > > > > >
> > > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > > pci_reset_function() and the per device reset attribute 
> > > > > > > > > > > is sysfs remain
> > > > > > > > > > > exactly the same as they are currently.  The bus and slot 
> > > > > > > > > > > reset
> > > > > > > > > > > methods used here are limited to devices where only a 
> > > > > > > > > > > single function is
> > > > > > > > > > > affected by the reset, therefore it is not like the patch 
> > > > > > > > > > > you proposed
> > > > > > > > > > > which performed a reset irrespective of the downstream 
> > > > > > > > > > > devices.  This
> > > > > > > > > > > series only enables selection of the existing methods.  
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > >

Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS

2021-03-20 Thread Pali Rohár

Hello Don!

I have read whole discussion and your EEPROM patch proposal. But for me
it looks like some kernel glue code for some old legacy / proprietary
access method which does not have any usage outside of that old code.

Your code does not contain any quirks which are needed to read different
EEPROMs in different SFP modules. As Andrew wrote there are lot of
broken SFPs which needs special handling and this logic is already
implemented in sfp.c and sfp-bus.c kernel drivers. These drivers then
export EEPROM content to userspace via ethtool -m API in unified way and
userspace does not implement any quirks (nor does not have to deal with
quirks).

If you try to read EEPROM "incorrectly" then SFP module with its EEPROM
chip (or emulation of chip) locks and is fully unusable after you unplug
it and plug it again. Kernel really should not export API to userspace
which can cause "damage" to SFP modules. And currently it does *not* do
it.

I have contributed code for some GPON SFP modules, so their EEPROM can
be correctly read and exported to userspace via ethtool -m. So I know
that this is very fragile area and needs to be properly handled.

So I do not see any reason why can be a new optoe method in _current_
form useful. It does not implemented required things for handling
different EEPROM modules.

I would rather suggest you to use ethtool -m IOCTL API and in case it is
not suitable for QSFP (e.g. because of paging system) then extend this
API.

There were already proposals for using netlink socket interface which is
today de-facto standard interface for new code. sysfs API for such thing
really looks like some legacy code and nowadays we have better access
methods.

If you want, I can help you with these steps and make patches to be in
acceptable state; not written in "legacy" style. As I'm working with
GPON SFP modules with different EEPROMs in them, I'm really interested
in any improvements in their EEPROM area.

Re: [v8,3/7] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192

2021-03-19 Thread Pali Rohár

On Thursday 18 March 2021 13:48:07 Jianjun Wang wrote:
> On Thu, 2021-03-18 at 01:02 +0100, Pali Rohár wrote:
> > On Saturday 13 March 2021 15:43:14 Jianjun Wang wrote:
> > > On Thu, 2021-03-11 at 13:38 +0100, Pali Rohár wrote:
> > > > On Wednesday 24 February 2021 14:11:28 Jianjun Wang wrote:
> > > > > +static int mtk_pcie_startup_port(struct mtk_pcie_port *port)
> > > > > +{
> > > > ...
> > > > > +
> > > > > + /* Delay 100ms to wait the reference clocks become stable */
> > > > > + msleep(100);
> > > > > +
> > > > > + /* De-assert PERST# signal */
> > > > > + val &= ~PCIE_PE_RSTB;
> > > > > + writel_relaxed(val, port->base + PCIE_RST_CTRL_REG);
> > > > 
> > > > Hello! This is a new driver which introduce yet another custom timeout
> > > > prior PERST# signal for PCIe card is de-asserted. Timeouts for other
> > > > drivers I collected in older email [2].
> > > > 
> > > > Please look at my email [1] about PCIe Warm Reset if you have any clue
> > > > about it. Lorenzo and Rob already expressed that this timeout should not
> > > > be driver specific. But nobody was able to "decode" and "understand"
> > > > PCIe spec yet about these timeouts.
> > > 
> > > Hi Pali,
> > > 
> > > I think this is more like a platform specific timeout, which is used to
> > > wait for the reference clocks to become stable and finish the reset flow
> > > of HW blocks.
> > > 
> > > Here is the steps to start a link training in this HW:
> > > 
> > > 1. Assert all reset signals which including the transaction layer, PIPE
> > > interface and internal bus interface;
> > > 
> > > 2. De-assert reset signals except the PERST#, this will make the
> > > physical layer active and start to output the reference clock, but the
> > > EP device remains in the reset state.
> > >Before releasing the PERST# signal, the HW blocks needs at least 10ms
> > > to finish the reset flow, and ref-clk needs about 30us to become stable.
> > > 
> > > 3. De-assert PERST# signal, wait LTSSM enter L0 state.
> > > 
> > > This 100ms timeout is reference to TPVPERL in the PCIe CEM spec. Since
> > > we are in the kernel stage, the power supply has already stabled, this
> > > timeout may not take that long.
> > 
> > I think that this is not platform specific timeout or platform specific
> > steps. This matches generic steps as defined in PCIe CEM spec, section
> > 2.2.1. Initial Power-Up (G3 to S0).
> > 
> > What is platform specific is just how to achieve these steps.
> > 
> > Am I right?
> > 
> > ...
> > 
> > TPVPERL is one of my timeout candidates as minimal required timeout for
> > Warm Reset. I have wrote it in email:
> > 
> > https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
> > 
> > But I'm not sure as specially in none diagram is described just warm
> > reset as defined in mPCIe CEM (3.2.4.3. PERST# Signal).
> > 
> > ...
> > 
> > Anyway, I would suggest to define constants for those timeouts. I guess
> > that in future we could be able to define "generic" timeout constants
> > which would not be in private driver section, but in some common header
> > file.
> 
> I agree with this, but I'm not sure if we really need that long time in
> the kernel stage, because the power supply has already stable and it's
> really impact the boot time, especially when the platform have multi
> ports and not connect any EP device, we need to wait 200ms for each port
> when system bootup.

Ports are independent. So you can initialize them in parallel, right?

If you initialize each port in separate worker then during msleep calls
kernel can schedule other kernel thread to run and so it does not
increase boot time. While pcie is sleeping kernel can do other things.
So the result is that whole boot time is not increased, just reordered.

> For this PCIe controller driver, I would like to change the timeout
> value to 10ms to comply with the HW design, and save some boot time.

In case you can connect _any_ PCIe card to your HW then you cannot
decrease or change timeouts required by PCIe specs. Otherwise there can
be a card which would not be initialized correctly.

I'm debugging driver for aardvark PCIe controller and I see that Compex
cards really needs these timeouts, otherwise link is down and card
cannot be detected.

So I guess that t

Re: [v8,3/7] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192

2021-03-17 Thread Pali Rohár

On Saturday 13 March 2021 15:43:14 Jianjun Wang wrote:
> On Thu, 2021-03-11 at 13:38 +0100, Pali Rohár wrote:
> > On Wednesday 24 February 2021 14:11:28 Jianjun Wang wrote:
> > > +static int mtk_pcie_startup_port(struct mtk_pcie_port *port)
> > > +{
> > ...
> > > +
> > > + /* Delay 100ms to wait the reference clocks become stable */
> > > + msleep(100);
> > > +
> > > + /* De-assert PERST# signal */
> > > + val &= ~PCIE_PE_RSTB;
> > > + writel_relaxed(val, port->base + PCIE_RST_CTRL_REG);
> > 
> > Hello! This is a new driver which introduce yet another custom timeout
> > prior PERST# signal for PCIe card is de-asserted. Timeouts for other
> > drivers I collected in older email [2].
> > 
> > Please look at my email [1] about PCIe Warm Reset if you have any clue
> > about it. Lorenzo and Rob already expressed that this timeout should not
> > be driver specific. But nobody was able to "decode" and "understand"
> > PCIe spec yet about these timeouts.
> 
> Hi Pali,
> 
> I think this is more like a platform specific timeout, which is used to
> wait for the reference clocks to become stable and finish the reset flow
> of HW blocks.
> 
> Here is the steps to start a link training in this HW:
> 
> 1. Assert all reset signals which including the transaction layer, PIPE
> interface and internal bus interface;
> 
> 2. De-assert reset signals except the PERST#, this will make the
> physical layer active and start to output the reference clock, but the
> EP device remains in the reset state.
>Before releasing the PERST# signal, the HW blocks needs at least 10ms
> to finish the reset flow, and ref-clk needs about 30us to become stable.
> 
> 3. De-assert PERST# signal, wait LTSSM enter L0 state.
> 
> This 100ms timeout is reference to TPVPERL in the PCIe CEM spec. Since
> we are in the kernel stage, the power supply has already stabled, this
> timeout may not take that long.

I think that this is not platform specific timeout or platform specific
steps. This matches generic steps as defined in PCIe CEM spec, section
2.2.1. Initial Power-Up (G3 to S0).

What is platform specific is just how to achieve these steps.

Am I right?

...

TPVPERL is one of my timeout candidates as minimal required timeout for
Warm Reset. I have wrote it in email:

https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/

But I'm not sure as specially in none diagram is described just warm
reset as defined in mPCIe CEM (3.2.4.3. PERST# Signal).

...

Anyway, I would suggest to define constants for those timeouts. I guess
that in future we could be able to define "generic" timeout constants
which would not be in private driver section, but in some common header
file.

> > > +
> > > + /* Check if the link is up or not */
> > > + err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_REG, val,
> > > +  !!(val & PCIE_PORT_LINKUP), 20,
> > > +  50 * USEC_PER_MSEC);
> > 
> > IIRC, you need to wait at least 100ms after de-asserting PERST# signal
> > as it is required by PCIe specs and also because experiments proved that
> > some Compex wifi cards (e.g. WLE900VX) are not detected if you do not
> > wait this minimal time.
> 
> Yes, this should be 100ms, I will fix it at next version, thanks for
> your review.

In past Bjorn suggested to use msleep(PCI_PM_D3COLD_WAIT); macro for
this step during reviewing aardvark driver.

https://lore.kernel.org/linux-pci/20190426161050.ga189...@google.com/

And next iteration used this PCI_PM_D3COLD_WAIT macro instead of 100:

https://lore.kernel.org/linux-pci/20190522213351.21366-2-r...@triplefau.lt/

> Thanks.
> > 
> > > + if (err) {
> > > + val = readl_relaxed(port->base + PCIE_LTSSM_STATUS_REG);
> > > + dev_err(port->dev, "PCIe link down, ltssm reg val: %#x\n", val);
> > > + return err;
> > > + }
> > 
> > [1] - 
> > https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
> > [2] - 
> > https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
>

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-17 Thread Pali Rohár

On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> On Wed, 17 Mar 2021 20:40:24 +0100
> Pali Rohár  wrote:
> 
> > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > Pali Rohár  wrote:
> > >   
> > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:  
> > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > Pali Rohár  wrote:
> > > > > 
> > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > Pali Rohár  wrote:
> > > > > > >   
> > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:  
> > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > Pali Rohár  wrote:
> > > > > > > > > 
> > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:   
> > > > > > > > > >  
> > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset 
> > > > > > > > > > > and
> > > > > > > > > > > warm reset respectively.  
> > > > > > > > > > 
> > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is 
> > > > > > > > > > just another
> > > > > > > > > > type of reset, which is currently implemented only for PCIe 
> > > > > > > > > > hot plug
> > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call 
> > > > > > > > > > PCI secondary
> > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not 
> > > > > > > > > > have API in
> > > > > > > > > > kernel and therefore drivers do not export this type of 
> > > > > > > > > > reset via any
> > > > > > > > > > kernel function (yet).
> > > > > > > > > 
> > > > > > > > > Warm reset is beyond the scope of this series, but could be 
> > > > > > > > > implemented
> > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] 
> > > > > > > > > array
> > > > > > > > > defined here.
> > > > > > > > 
> > > > > > > > Ok!
> > > > > > > >   
> > > > > > > > > Note that with this series the resets available through
> > > > > > > > > pci_reset_function() and the per device reset attribute is 
> > > > > > > > > sysfs remain
> > > > > > > > > exactly the same as they are currently.  The bus and slot 
> > > > > > > > > reset
> > > > > > > > > methods used here are limited to devices where only a single 
> > > > > > > > > function is
> > > > > > > > > affected by the reset, therefore it is not like the patch you 
> > > > > > > > > proposed
> > > > > > > > > which performed a reset irrespective of the downstream 
> > > > > > > > > devices.  This
> > > > > > > > > series only enables selection of the existing methods.  
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > Alex
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > But with this patch series, there is still an issue with PCI 
> > > > > > > > secondary
> > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > remove-reset-rescan procedure. As discussed in other thread, 
> > > > > > > > this reset
> > > > > > > > let device in unconfigured / broken state.  
> > > > > > > 
> > > > > > > No, the

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-17 Thread Pali Rohár

On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> On Wed, 17 Mar 2021 20:24:24 +0100
> Pali Rohár  wrote:
> 
> > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > Pali Rohár  wrote:
> > >   
> > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:  
> > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > Pali Rohár  wrote:
> > > > > 
> > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > Pali Rohár  wrote:
> > > > > > >   
> > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > warm reset respectively.
> > > > > > > > 
> > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is 
> > > > > > > > just another
> > > > > > > > type of reset, which is currently implemented only for PCIe hot 
> > > > > > > > plug
> > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI 
> > > > > > > > secondary
> > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have 
> > > > > > > > API in
> > > > > > > > kernel and therefore drivers do not export this type of reset 
> > > > > > > > via any
> > > > > > > > kernel function (yet).  
> > > > > > > 
> > > > > > > Warm reset is beyond the scope of this series, but could be 
> > > > > > > implemented
> > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > defined here.  
> > > > > > 
> > > > > > Ok!
> > > > > > 
> > > > > > > Note that with this series the resets available through
> > > > > > > pci_reset_function() and the per device reset attribute is sysfs 
> > > > > > > remain
> > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > methods used here are limited to devices where only a single 
> > > > > > > function is
> > > > > > > affected by the reset, therefore it is not like the patch you 
> > > > > > > proposed
> > > > > > > which performed a reset irrespective of the downstream devices.  
> > > > > > > This
> > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > 
> > > > > > > Alex
> > > > > > >   
> > > > > > 
> > > > > > But with this patch series, there is still an issue with PCI 
> > > > > > secondary
> > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > remove-reset-rescan procedure. As discussed in other thread, this 
> > > > > > reset
> > > > > > let device in unconfigured / broken state.
> > > > > 
> > > > > No, there's not:
> > > > > 
> > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > {
> > > > > int rc;
> > > > > 
> > > > > if (!dev->reset_fn)
> > > > > return -ENOTTY;
> > > > > 
> > > > > pci_dev_lock(dev);
> > > > > >>> pci_dev_save_and_disable(dev);
> > > > > 
> > > > > rc = __pci_reset_function_locked(dev);
> > > > > 
> > > > > >>> pci_dev_restore(dev);
> > > > > pci_dev_unlock(dev);
> > > > > 
> > > > > return rc;
> > > > > }
> > > > > 
> > > > > The remove/re-scan was discussed primarily because your patch 
> > > > > performed
> > > > > a bus reset regardless of what devices were

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-17 Thread Pali Rohár

On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> On Wed, 17 Mar 2021 20:02:06 +0100
> Pali Rohár  wrote:
> 
> > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > Pali Rohár  wrote:
> > >   
> > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:  
> > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > Pali Rohár  wrote:
> > > > > 
> > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > warm reset respectively.  
> > > > > > 
> > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just 
> > > > > > another
> > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > bridges and for PowerPC PowerNV platform and it just call PCI 
> > > > > > secondary
> > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > kernel and therefore drivers do not export this type of reset via 
> > > > > > any
> > > > > > kernel function (yet).
> > > > > 
> > > > > Warm reset is beyond the scope of this series, but could be 
> > > > > implemented
> > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > defined here.
> > > > 
> > > > Ok!
> > > >   
> > > > > Note that with this series the resets available through
> > > > > pci_reset_function() and the per device reset attribute is sysfs 
> > > > > remain
> > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > methods used here are limited to devices where only a single function 
> > > > > is
> > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > series only enables selection of the existing methods.  Thanks,
> > > > > 
> > > > > Alex
> > > > > 
> > > > 
> > > > But with this patch series, there is still an issue with PCI secondary
> > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > remove-reset-rescan procedure. As discussed in other thread, this reset
> > > > let device in unconfigured / broken state.  
> > > 
> > > No, there's not:
> > > 
> > > int pci_reset_function(struct pci_dev *dev)
> > > {
> > > int rc;
> > > 
> > > if (!dev->reset_fn)
> > > return -ENOTTY;
> > > 
> > > pci_dev_lock(dev);  
> > > >>> pci_dev_save_and_disable(dev);  
> > > 
> > > rc = __pci_reset_function_locked(dev);
> > >   
> > > >>> pci_dev_restore(dev);  
> > > pci_dev_unlock(dev);
> > > 
> > > return rc;
> > > }
> > > 
> > > The remove/re-scan was discussed primarily because your patch performed
> > > a bus reset regardless of what devices were affected by that reset and
> > > it's difficult to manage the scope where multiple devices are affected.
> > > Here, the bus and slot reset functions will fail unless the scope is
> > > limited to the single device triggering this reset.  Thanks,
> > > 
> > > Alex
> > >   
> > 
> > I was thinking a bit more about it and I'm really sure how it would
> > behave with hotplugging PCIe bridge.
> > 
> > On aardvark PCIe controller I have already tested that secondary bus
> > reset bit is triggering Hot Reset event and then also Link Down event.
> > These events are not handled by aardvark driver yet (needs to
> > implemented into kernel's emulated root bridge code).
> > 
> > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > Kernel has already code which removes PCIe device if it changes presence
> > bit (and inform via interrupt). And Link Down event triggers this
> > change.
> 
> This is the difference between slot and bus resets, the slot reset is
> implemented by the hotplug controller and disables presence detection
> around the bus reset.  Thanks,

Yes, but I'm talking about bus reset, not about slot reset.

I mean: to use bus reset via sysfs on hardware which supports slots and
hotplugging.

And if I'm reading code correctly, this combination is allowed, right?
Via these new patches it is possible to disable slot reset and enable
bus reset.

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-17 Thread Pali Rohár

On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> On Mon, 15 Mar 2021 15:52:38 +0100
> Pali Rohár  wrote:
> 
> > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > Pali Rohár  wrote:
> > >   
> > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > warm reset respectively.
> > > > 
> > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > kernel and therefore drivers do not export this type of reset via any
> > > > kernel function (yet).  
> > > 
> > > Warm reset is beyond the scope of this series, but could be implemented
> > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > defined here.  
> > 
> > Ok!
> > 
> > > Note that with this series the resets available through
> > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > exactly the same as they are currently.  The bus and slot reset
> > > methods used here are limited to devices where only a single function is
> > > affected by the reset, therefore it is not like the patch you proposed
> > > which performed a reset irrespective of the downstream devices.  This
> > > series only enables selection of the existing methods.  Thanks,
> > > 
> > > Alex
> > >   
> > 
> > But with this patch series, there is still an issue with PCI secondary
> > bus reset mechanism as exported sysfs attribute does not do that
> > remove-reset-rescan procedure. As discussed in other thread, this reset
> > let device in unconfigured / broken state.
> 
> No, there's not:
> 
> int pci_reset_function(struct pci_dev *dev)
> {
> int rc;
> 
> if (!dev->reset_fn)
> return -ENOTTY;
> 
> pci_dev_lock(dev);
> >>> pci_dev_save_and_disable(dev);
> 
> rc = __pci_reset_function_locked(dev);
> 
> >>> pci_dev_restore(dev);
> pci_dev_unlock(dev);
> 
> return rc;
> }
> 
> The remove/re-scan was discussed primarily because your patch performed
> a bus reset regardless of what devices were affected by that reset and
> it's difficult to manage the scope where multiple devices are affected.
> Here, the bus and slot reset functions will fail unless the scope is
> limited to the single device triggering this reset.  Thanks,
> 
> Alex
> 

I was thinking a bit more about it and I'm really sure how it would
behave with hotplugging PCIe bridge.

On aardvark PCIe controller I have already tested that secondary bus
reset bit is triggering Hot Reset event and then also Link Down event.
These events are not handled by aardvark driver yet (needs to
implemented into kernel's emulated root bridge code).

But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
Kernel has already code which removes PCIe device if it changes presence
bit (and inform via interrupt). And Link Down event triggers this
change.

Can somebody test these changes on some PCIe hotplug controller what
secondary bus reset via sysfs would do? Because currently it is not
exported as reset method and there can be different race conditions and
maybe error (?) if hotplug code is going to remove device on which user
triggered bus reset via sysfs.

And in my opinion this can happen also in case when only one device is
on the bus, so it perfectly matches all conditions when sysfs can use
bus reset for one device.

I can try to implement hotplug code into aardvark driver and root bridge
emulator to test how this patch would happen. But it would take some
time...

Re: [PATCHv2 18/38] dt-bindings: power: supply: n900-battery: Convert to DT schema format

2021-03-17 Thread Pali Rohár

On Wednesday 17 March 2021 14:48:44 Sebastian Reichel wrote:
> Convert the binding to DT schema format.
> 
> Cc: Pali Rohár 

Rejected-by: Pali Rohár 

> Signed-off-by: Sebastian Reichel 

Hello Sebastian! I'm really really sorry, I have nothing against you,
but personally I cannot ack change signed by company where some people
are supporting censorship, GPL violations and other similar immoral
activities against other individual developers.

> ---
>  .../power/supply/nokia,n900-battery.yaml  | 49 +++
>  .../bindings/power/supply/rx51-battery.txt| 25 --
>  2 files changed, 49 insertions(+), 25 deletions(-)
>  create mode 100644 
> Documentation/devicetree/bindings/power/supply/nokia,n900-battery.yaml
>  delete mode 100644 
> Documentation/devicetree/bindings/power/supply/rx51-battery.txt

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-15 Thread Pali Rohár

On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> On Mon, 15 Mar 2021 14:52:26 +0100
> Pali Rohár  wrote:
> 
> > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > warm reset respectively.  
> > 
> > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > type of reset, which is currently implemented only for PCIe hot plug
> > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > bus reset with some other hook. PCIe Warm Reset does not have API in
> > kernel and therefore drivers do not export this type of reset via any
> > kernel function (yet).
> 
> Warm reset is beyond the scope of this series, but could be implemented
> in a compatible way to fit within the pci_reset_fn_methods[] array
> defined here.

Ok!

> Note that with this series the resets available through
> pci_reset_function() and the per device reset attribute is sysfs remain
> exactly the same as they are currently.  The bus and slot reset
> methods used here are limited to devices where only a single function is
> affected by the reset, therefore it is not like the patch you proposed
> which performed a reset irrespective of the downstream devices.  This
> series only enables selection of the existing methods.  Thanks,
> 
> Alex
> 

But with this patch series, there is still an issue with PCI secondary
bus reset mechanism as exported sysfs attribute does not do that
remove-reset-rescan procedure. As discussed in other thread, this reset
let device in unconfigured / broken state.

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-15 Thread Pali Rohár

On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> slot reset (pci_dev_reset_slot_function) and secondary bus
> reset(pci_parent_bus_reset) which I think are hot reset and
> warm reset respectively.

No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
type of reset, which is currently implemented only for PCIe hot plug
bridges and for PowerPC PowerNV platform and it just call PCI secondary
bus reset with some other hook. PCIe Warm Reset does not have API in
kernel and therefore drivers do not export this type of reset via any
kernel function (yet).

Re: [PATCH 5.4 00/24] 5.4.105-rc1 review

2021-03-15 Thread Pali Rohár

On Friday 12 March 2021 12:54:18 Alexander Lobakin wrote:
> From: Florian Fainelli 
> Date: Thu, 11 Mar 2021 09:41:27 -0800
> 
> Hi Florian,
> 
> > On 3/11/21 9:40 AM, Greg KH wrote:
> > > On Thu, Mar 11, 2021 at 09:23:56AM -0800, Florian Fainelli wrote:
> > >> On 3/11/21 5:08 AM, Greg KH wrote:
> > >>> On Wed, Mar 10, 2021 at 08:19:45PM -0800, Florian Fainelli wrote:
> >  +Alex,
> > 
> >  On 3/10/2021 5:24 AM, gre...@linuxfoundation.org wrote:
> > > From: Greg Kroah-Hartman 
> > >
> > > This is the start of the stable review cycle for the 5.4.105 release.
> > > There are 24 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, 
> > > please
> > > let me know.
> > >
> > > Responses should be made by Fri, 12 Mar 2021 13:23:09 +.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > >   
> > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.105-rc1.gz
> > > or in the git tree and branch at:
> > >   
> > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > >  linux-5.4.y
> > > and the diffstat can be found below.
> > >
> > > thanks,
> > >
> > > greg k-h
> > 
> >  I believe you need to drop "net: dsa: add GRO support via gro_cells" as
> >  it causes the following kernel panic on a DSA-enabled platform:
> > 
> >  Configuring rgmii_2 interface
> >  [   10.170527] brcm-sf2 f0b0.ethernet_switch rgmii_2: configuring
> >  for fixed/rgmii-txid link mode
> >  [   10.179597] 8021q: adding VLAN 0 to HW filter on device rgmii_2
> >  [   10.185608] brcm-sf2 f0b0.ethernet_switch rgmii_2: Link is Up -
> >  1Gbps/Full - flow control off
> >  [   10.198631] IPv6: ADDRCONF(NETDEV_CHANGE): rgmii_2: link becomes 
> >  ready
> >  Configuring sit0 interface
> >  [   10.254346] 8<--- cut here ---
> >  [   10.257438] Unable to handle kernel paging request at virtual 
> >  address
> >  d6df6190
> >  [   10.264685] pgd = (ptrval)
> >  [   10.267411] [d6df6190] *pgd=807003, *pmd=
> >  [   10.272846] Internal error: Oops: 206 [#1] SMP ARM
> >  [   10.277661] Modules linked in:
> >  [   10.280739] CPU: 0 PID: 1886 Comm: sed Not tainted
> >  5.4.105-1.0pre-geff642e2af2b #4
> >  [   10.288337] Hardware name: Broadcom STB (Flattened Device Tree)
> >  [   10.294292] PC is at gro_cells_receive+0x90/0x11c
> >  [   10.299020] LR is at dsa_switch_rcv+0x120/0x1d4
> >  [   10.303562] pc : []lr : []psr: 600f0113
> >  [   10.309841] sp : c1d33cd0  ip : 03e8  fp : c1d33ce4
> >  [   10.315078] r10: c8901000  r9 : c8901000  r8 : c0b4a53c
> >  [   10.320314] r7 : c2208920  r6 :   r5 :   r4 : 
> >  4000
> >  [   10.326855] r3 : d6df6188  r2 : c4927000  r1 : c8adc300  r0 : 
> >  c22069dc
> >  [   10.98] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
> >  Segment user
> >  [   10.340547] Control: 30c5387d  Table: 04ac4c80  DAC: fffd
> >  [   10.346307] Process sed (pid: 1886, stack limit = 0x(ptrval))
> >  [   10.352066] Stack: (0xc1d33cd0 to 0xc1d34000)
> >  [   10.356434] 3cc0: c8adc300
> >  c4927000 c1d33d04 c1d33ce8
> >  [   10.364631] 3ce0: c0b4a65c c0a579a4 c1d33d24 c2208920 c1d33d24
> >   c1d33d5c c1d33d08
> >  [   10.372827] 3d00: c0a0b38c c0b4a548 c021e070 c2204cc8 
> >  c89015c0 04b87700 c89015c0
> >  [   10.381023] 3d20: c2208920 c1d33d24 c1d33d24 00976ec2 04b87700
> >  c8adc300 c89015c0 
> >  [   10.389218] 3d40: c1d33d74 c1d32000  c230742c c1d33dac
> >  c1d33d60 c0a0b5c0 c0a0b180
> >  [   10.397414] 3d60:  c2204cc8  c1d33d6c c1d33d6c
> >  c1d33d80 c029daf8 00976ec2
> >  [   10.405610] 3d80: 0800 c8901540 c89015c0 c8901540 
> >  0001 016c 0162
> >  [   10.413805] 3da0: c1d33dc4 c1d33db0 c0a0b7fc c0a0b3b8 
> >  c8adc300 c1d33dfc c1d33dc8
> >  [   10.422001] 3dc0: c0a0c660 c0a0b7e4 c8901540 c8adc300 c1d33dfc
> >  c1d33de0 c8901540 c8adc300
> >  [   10.430196] 3de0: 015e c8901000 0001 016c c1d33e74
> >  c1d33e00 c083df00 c0a0c4fc
> >  [   10.438391] 3e00: 012c c22b0f14 c1d33e4c c1d33e18 c0fbd9b8
> >  c0fbd9cc c0fbd9e0 c0fbd98c
> >  [   10.446586] 3e20: 0001 0040 c8901500 0001 
> >    
> >  [   10.454780] 3e40:   c02f65a0 c8901540 0001
> >  0040 c22b07e4 012c
> >  [   10.462975] 3e60: d1003000 fffb942f c1d33edc c1d33e78 c0a0c94c
> >  c083dafc d051ad80 c2204cc8
> >  [   10.471170] 3e80: c2204cf0 c1d32000 c22b40b0

Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

2021-03-14 Thread Pali Rohár

On Friday 12 March 2021 23:04:52 ameynarkhed...@gmail.com wrote:
> From: Amey Narkhede 
> 
> Add reset_methods_enabled bitmap to struct pci_dev to
> keep track of user preferred device reset mechanisms.
> Add reset_method sysfs attribute to query and set
> user preferred device reset mechanisms.
> 
> Signed-off-by: Amey Narkhede 
> ---
> Reviewed-by: Alex Williamson 
> Reviewed-by: Raphael Norwitz 
> 
>  Documentation/ABI/testing/sysfs-bus-pci | 15 ++
>  drivers/pci/pci-sysfs.c | 66 +++--
>  drivers/pci/pci.c   |  3 +-
>  include/linux/pci.h |  2 +
>  4 files changed, 82 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
> b/Documentation/ABI/testing/sysfs-bus-pci
> index 25c9c3977..ae53ecd2e 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -121,6 +121,21 @@ Description:
>   child buses, and re-discover devices removed earlier
>   from this part of the device tree.
> 
> +What:/sys/bus/pci/devices/.../reset_method
> +Date:March 2021
> +Contact: Amey Narkhede 
> +Description:
> + Some devices allow an individual function to be reset
> + without affecting other functions in the same slot.
> + For devices that have this support, a file named reset_method
> + will be present in sysfs. Reading this file will give names
> + of the device supported reset methods. Currently used methods
> + are enclosed in brackets. Writing the name of any of the device
> + supported reset method to this file will set the reset method to
> + be used when resetting the device. Writing "none" to this file
> + will disable ability to reset the device and writing "default"
> + will return to the original value.
> +

Hello Amey!

I think that this API does not work for PCIe Hot Reset (=PCI secondary
bus reset) and PCIe Warm Reset.

First reset method is bound to the bus, not device and therefore kernel
does not have to see any registered device. So there would be no
"reset_method" sysfs file, and also no "reset" sysfs file. But PCIe Hot
Reset is in most cases needed when buggy card is not registered on bus,
to trigger this reset. And with this API this is not possible.

PCIe Warm Reset is done by PERST# signal. When signal is asserted then
device is in reset state and therefore is not registered. So again
kernel does not have to see registered device.

Moreover for mPCIe form factor cards, boards can share one PERST# signal
with more PCIe cards and control this signal via GPIO. So asserting
PERST# GPIO can trigger Warm reset for more PCIe cards, not just one. It
depends on board or topology.

So... I do not think that current approach with "reset_method" sysfs
entry bound to the PCI device does not work for PCI secondary bus reset
and also cannot be used for implementing PCIe Warm Reset.

I would rather suggest to re-design and prepare a new API which would
work also with PCIe Hot Reset and PCIe Warm Reset.

This "reset" sysfs file can work only with PCI Function Level Reset or
some PM or device specific reset. But not with reset types which are
more like slot or bus orientated.

>  What:/sys/bus/pci/devices/.../reset
>  Date:July 2009
>  Contact: Michael S. Tsirkin 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 78d2c130c..3cd06d1c0 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1304,6 +1304,59 @@ static const struct bin_attribute pcie_config_attr = {
>   .write = pci_write_config,
>  };
> 
> +static ssize_t reset_method_show(struct device *dev,
> +  struct device_attribute *attr,
> +  char *buf)
> +{
> + const struct pci_reset_fn_method *reset;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + ssize_t len = 0;
> + int i;
> +
> + for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, 
> reset++) {
> + if (pdev->reset_methods_enabled & (1 << i))
> + len += sysfs_emit_at(buf, len, "[%s] ", reset->name);
> + else if (pdev->reset_methods & (1 << i))
> + len += sysfs_emit_at(buf, len, "%s ", reset->name);
> + }
> +
> + return len;
> +}
> +
> +static ssize_t reset_method_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> + const struct pci_reset_fn_method *reset = pci_reset_fn_methods;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + u8 reset_mechanism;
> + int i = 0;
> +
> + /* Writing none disables reset */
> + if (sysfs_streq(buf, "none")) {
> + reset_mechanism = 0;
> + } else if

Re: [PATCH 3/4] PCI: Remove reset_fn field from pci_dev

2021-03-14 Thread Pali Rohár

On Friday 12 March 2021 23:04:51 ameynarkhed...@gmail.com wrote:
> From: Amey Narkhede 
> 
> reset_fn field is used to indicate whether the
> device supports any reset mechanism or not.
> Deprecate use of reset_fn in favor of new
> reset_methods bitmap which can be used to keep
> track of all supported reset mechanisms of a device.

Hello Amey!

You cannot trigger PCIe Hot Reset (PCI secondary bus reset) in this
simple way from sysfs via new reset methods.

I proposed very similar functionality just few days ago:
https://lore.kernel.org/linux-pci/20210301171221.3d42a55i7h5ubqsb@pali/T/#u

And I realized that it needs more steps to be done.

At least some remove-reset-rescan procedure done atomically is required.

> Signed-off-by: Amey Narkhede 
> ---
> Reviewed-by: Alex Williamson 
> Reviewed-by: Raphael Norwitz 
> 
>  drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
>  drivers/pci/pci-sysfs.c| 6 ++
>  drivers/pci/pci.c  | 6 +++---
>  drivers/pci/probe.c| 1 -
>  drivers/pci/quirks.c   | 2 +-
>  include/linux/pci.h| 1 -
>  6 files changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
> b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> index 9b9d305c6..3e2c49e08 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> @@ -526,7 +526,7 @@ static void octeon_destroy_resources(struct octeon_device 
> *oct)
>   oct->irq_name_storage = NULL;
>   }
>   /* Soft reset the octeon device before exiting */
> - if (oct->pci_dev->reset_fn)
> + if (oct->pci_dev->reset_methods)
>   octeon_pci_flr(oct);
>   else
>   cn23xx_vf_ask_pf_to_do_flr(oct);
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index f8afd54ca..78d2c130c 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1334,7 +1334,7 @@ static int pci_create_capabilities_sysfs(struct pci_dev 
> *dev)
> 
>   pcie_vpd_create_sysfs_dev_files(dev);
> 
> - if (dev->reset_fn) {
> + if (dev->reset_methods) {
>   retval = device_create_file(>dev, _attr_reset);
>   if (retval)
>   goto error;
> @@ -1417,10 +1417,8 @@ int __must_check pci_create_sysfs_dev_files(struct 
> pci_dev *pdev)
>  static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
>  {
>   pcie_vpd_remove_sysfs_dev_files(dev);
> - if (dev->reset_fn) {
> + if (dev->reset_methods)
>   device_remove_file(>dev, _attr_reset);
> - dev->reset_fn = 0;
> - }
>  }
> 
>  /**
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 407b44e85..b7f6c6588 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5175,7 +5175,7 @@ int pci_reset_function(struct pci_dev *dev)
>  {
>   int rc;
> 
> - if (!dev->reset_fn)
> + if (!dev->reset_methods)
>   return -ENOTTY;
> 
>   pci_dev_lock(dev);
> @@ -5211,7 +5211,7 @@ int pci_reset_function_locked(struct pci_dev *dev)
>  {
>   int rc;
> 
> - if (!dev->reset_fn)
> + if (!dev->reset_methods)
>   return -ENOTTY;
> 
>   pci_dev_save_and_disable(dev);
> @@ -5234,7 +5234,7 @@ int pci_try_reset_function(struct pci_dev *dev)
>  {
>   int rc;
> 
> - if (!dev->reset_fn)
> + if (!dev->reset_methods)
>   return -ENOTTY;
> 
>   if (!pci_dev_trylock(dev))
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 01dd037bd..4764e031a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2404,7 +2404,6 @@ static void pci_init_capabilities(struct pci_dev *dev)
> 
>   pcie_report_downtraining(dev);
>   pci_init_reset_methods(dev);
> - dev->reset_fn = !!dev->reset_methods;
>  }
> 
>  /*
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 0a3df84c9..20a81b1bc 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5535,7 +5535,7 @@ static void 
> quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
> 
>   if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
>   pdev->subsystem_device != 0x222e ||
> - !pdev->reset_fn)
> + !pdev->reset_methods)
>   return;
> 
>   if (pci_enable_device_mem(pdev))
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 56d6e4750..a2f003f4e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -437,7 +437,6 @@ struct pci_dev {
>   unsigned intstate_saved:1;
>   unsigned intis_physfn:1;
>   unsigned intis_virtfn:1;
> - unsigned intreset_fn:1;
>   unsigned intis_hotplug_bridge:1;
>   unsigned intshpc_managed:1; /* SHPC owned by shpchp */
>

Re: [PATCH 2/4] PCI: Add new bitmap for keeping track of supported reset mechanisms

2021-03-14 Thread Pali Rohár

On Friday 12 March 2021 23:04:50 ameynarkhed...@gmail.com wrote:
> From: Amey Narkhede 
> 
> Introduce a new bitmap reset_methods in struct pci_dev
> to keep track of reset mechanisms supported by the
> device. Also refactor probing and reset functions
> to take advantage of calling convention of reset
> functions.
> 
> Signed-off-by: Amey Narkhede 
> ---
> Reviewed-by: Alex Williamson 
> Reviewed-by: Raphael Norwitz 
> 
>  drivers/pci/pci.c   | 106 
>  drivers/pci/pci.h   |  11 -
>  drivers/pci/probe.c |   5 +--
>  include/linux/pci.h |  10 +
>  4 files changed, 79 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4a7c084a3..407b44e85 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -40,6 +40,26 @@ const char *pci_power_names[] = {
>  };
>  EXPORT_SYMBOL_GPL(pci_power_names);
> 
> +static int pci_af_flr(struct pci_dev *dev, int probe);
> +static int pci_pm_reset(struct pci_dev *dev, int probe);
> +static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe);
> +static int pci_parent_bus_reset(struct pci_dev *dev, int probe);
> +
> +/*
> + * The ordering for functions in pci_reset_fn_methods
> + * is required for bitmap positions defined
> + * in reset_methods in struct pci_dev
> + */
> +const struct pci_reset_fn_method pci_reset_fn_methods[] = {
> + { .reset_fn = _dev_specific_reset, .name = "device_specific" },
> + { .reset_fn = _flr, .name = "flr" },
> + { .reset_fn = _af_flr, .name = "af_flr" },
> + { .reset_fn = _pm_reset, .name = "pm" },
> + { .reset_fn = _dev_reset_slot_function, .name = "slot" },
> + { .reset_fn = _parent_bus_reset, .name = "bus" },

Hello Amey! In the list of reset methods is missing PCIe Warm Reset.

Could you extend and prepare API also for PCIe Warm Reset? According to
PCI Express mini card and m.2 electromechanical specifications, PCIe
Warm Reset can be triggered by PERST# signal and more kernel drivers can
internally control PERST#. Just there is no kernel API and therefore
PCIe Warm Reset nor PERST# signal is unified.

> + { 0 },
> +};
> +
>  int isa_dma_bridge_buggy;
>  EXPORT_SYMBOL(isa_dma_bridge_buggy);
> 
> @@ -5080,71 +5100,59 @@ static void pci_dev_restore(struct pci_dev *dev)
>   */
>  int __pci_reset_function_locked(struct pci_dev *dev)
>  {
> - int rc;
> + int i, rc = -ENOTTY;
> + const struct pci_reset_fn_method *reset;
> 
>   might_sleep();
> 
> - /*
> -  * A reset method returns -ENOTTY if it doesn't support this device
> -  * and we should try the next method.
> -  *
> -  * If it returns 0 (success), we're finished.  If it returns any
> -  * other error, we're also finished: this indicates that further
> -  * reset mechanisms might be broken on the device.
> -  */
> - rc = pci_dev_specific_reset(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pcie_flr(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_af_flr(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_pm_reset(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_dev_reset_slot_function(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - return pci_parent_bus_reset(dev, 0);
> + for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, 
> reset++) {
> + if (!(dev->reset_methods & (1 << i)))
> + continue;
> +
> + /*
> +  * A reset method returns -ENOTTY if it doesn't support this 
> device
> +  * and we should try the next method.
> +  *
> +  * If it returns 0 (success), we're finished.  If it returns any
> +  * other error, we're also finished: this indicates that further
> +  * reset mechanisms might be broken on the device.
> +  */
> + rc = reset->reset_fn(dev, 0);
> + if (rc != -ENOTTY)
> + return rc;
> + }
> + return rc;
>  }
>  EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
> 
>  /**
> - * pci_probe_reset_function - check whether the device can be safely reset
> - * @dev: PCI device to reset
> + * pci_init_reset_methods - check whether device can be safely reset
> + * and store supported reset mechanisms.
> + * @dev: PCI device to check for reset mechanisms
>   *
>   * Some devices allow an individual function to be reset without affecting
>   * other functions in the same device.  The PCI device must be responsive
> - * to PCI config space in order to use this function.
> + * to reads and writes to its PCI config space in order to use this function.
>   *
> - * Returns 0 if the device function can be reset or negative if the
> - * device doesn't support resetting a single function.
> + * Stores reset mechanisms supported by device in reset_methods bitmap
> + * field of struct pci_dev

Re: RFC: sysfs node for Secondary PCI bus reset (PCIe Hot Reset)

2021-03-11 Thread Pali Rohár

On Thursday 11 March 2021 19:53:26 Vidya Sagar wrote:
> On 3/3/2021 11:26 PM, Pali Rohár wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Tuesday 02 March 2021 12:58:29 Alex Williamson wrote:
> > > On Mon, 1 Mar 2021 14:28:17 -0600
> > > Bjorn Helgaas  wrote:
> > > 
> > > > [+cc Alex, reset expert]
> > > > 
> > > > On Mon, Mar 01, 2021 at 06:12:21PM +0100, Pali Rohár wrote:
> > > > > Hello!
> > > > > 
> > > > > PCIe card can be reset via in-band Hot Reset signal which can be
> > > > > triggered by PCIe bridge via Secondary Bus Reset bit in PCI config
> > > > > space.
> > > > > 
> > > > > Kernel already exports sysfs node "reset" for triggering Functional
> > > > > Reset of particular function of PCI device. But in some cases 
> > > > > Functional
> > > > > Reset is not enough and Hot Reset is required.
> > > > > 
> > > > > Following RFC patch exports sysfs node "reset_bus" for PCI bridges 
> > > > > which
> > > > > triggers Secondary Bus Reset and therefore for PCIe bridges it resets
> > > > > connected PCIe card.
> > > > > 
> > > > > What do you think about it?
> > > > > 
> > > > > Currently there is userspace script which can trigger PCIe Hot Reset 
> > > > > by
> > > > > modifying PCI config space from userspace:
> > > > > 
> > > > > https://alexforencich.com/wiki/en/pcie/hot-reset-linux
> > > > > 
> > > > > But because kernel already provides way how to trigger Functional 
> > > > > Reset
> > > > > it could provide also way how to trigger PCIe Hot Reset.
> > > 
> > > What that script does and what this does, or what the existing reset
> > > attribute does, are very different.  The script finds the upstream
> > > bridge for a given device, removes the device (ignoring that more than
> > > one device might be affected by the bus reset), uses setpci to trigger
> > > a secondary bus reset, then rescans devices.  The below only triggers
> > > the secondary bus reset, neither saving and restoring affected device
> > > state like the existing function level reset attribute, nor removing
> > > and rescanning as the script does.  It simply leaves an entire
> > > hierarchy of PCI devices entirely un-programmed yet still has struct
> > > pci_devs attached to them for untold future misery.
> > > 
> > > In fact, for the case of a single device affected by the bus reset, as
> > > intended by the script, the existing reset attribute will already do
> > > that if the device supports no other reset mechanism.  There's actually
> > > a running LFX mentorship project that aims to allow the user to control
> > > the type of reset performed by the existing reset attribute such that a
> > > user could force the bus reset behavior over other reset methods.
> > 
> > Hello Alex? Do you have a link for this "reset" project? I'm interesting
> > in it as I'm dealing with Compex wifi cards which are causing problems.
> > 
> > For correct initialization I need to issue PCIe Warm Reset for these
> > cards (Warm Reset is done via PERST# pin which most linux controller
> > drivers controls via GPIO subsystem). And for now there is no way to
> > trigger PCIe Warm Reset for particular PCIe device from userspace. As
> > there is no userspace <--> kernel API for it.
> > 
> > > There might be some justification for an attribute that actually
> > > implements the referenced script correctly, perhaps in kernel we could
> > > avoid races with bus rescans, but simply triggering an SBR to quietly
> > > de-program all downstream devices with no state restore or device
> > > rescan is not it.  Any affected device would be unusable.  Was this
> > > tested?  Thanks,
> > 
> > I have tested my change. First I called 'remove' attribute for PCIe
> > card, then I called this 'bus_reset' on parent PCIe bridge and later I
> > called 'rescan' attribute on bridge. It correctly rested tested ath9k
> > card. So I did something similar as in above script. But I agree that
> > there are race conditions and basically lot of other calls needs to be
> > done to restore state.
> > 
> > So I see that to make it 'usable' we need to do it automatically in
> > kernel and also re

Re: [v8,3/7] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192

2021-03-11 Thread Pali Rohár

On Wednesday 24 February 2021 14:11:28 Jianjun Wang wrote:
> +static int mtk_pcie_startup_port(struct mtk_pcie_port *port)
> +{
...
> +
> + /* Delay 100ms to wait the reference clocks become stable */
> + msleep(100);
> +
> + /* De-assert PERST# signal */
> + val &= ~PCIE_PE_RSTB;
> + writel_relaxed(val, port->base + PCIE_RST_CTRL_REG);

Hello! This is a new driver which introduce yet another custom timeout
prior PERST# signal for PCIe card is de-asserted. Timeouts for other
drivers I collected in older email [2].

Please look at my email [1] about PCIe Warm Reset if you have any clue
about it. Lorenzo and Rob already expressed that this timeout should not
be driver specific. But nobody was able to "decode" and "understand"
PCIe spec yet about these timeouts.

> +
> + /* Check if the link is up or not */
> + err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_REG, val,
> +  !!(val & PCIE_PORT_LINKUP), 20,
> +  50 * USEC_PER_MSEC);

IIRC, you need to wait at least 100ms after de-asserting PERST# signal
as it is required by PCIe specs and also because experiments proved that
some Compex wifi cards (e.g. WLE900VX) are not detected if you do not
wait this minimal time.

> + if (err) {
> + val = readl_relaxed(port->base + PCIE_LTSSM_STATUS_REG);
> + dev_err(port->dev, "PCIe link down, ltssm reg val: %#x\n", val);
> + return err;
> + }

[1] - https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
[2] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/

Re: [v8,5/7] PCI: mediatek-gen3: Add MSI support

2021-03-10 Thread Pali Rohár

On Wednesday 24 February 2021 14:11:30 Jianjun Wang wrote:
> +static int mtk_msi_bottom_domain_alloc(struct irq_domain *domain,
> +unsigned int virq, unsigned int nr_irqs,
> +void *arg)
> +{
> + struct mtk_pcie_port *port = domain->host_data;
> + struct mtk_msi_set *msi_set;
> + int i, hwirq, set_idx;
> +
> + mutex_lock(>lock);
> +
> + hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> + order_base_2(nr_irqs));
> +
> + mutex_unlock(>lock);
> +
> + if (hwirq < 0)
> + return -ENOSPC;
> +
> + set_idx = hwirq / PCIE_MSI_IRQS_PER_SET;
> + msi_set = >msi_sets[set_idx];
> +
> + for (i = 0; i < nr_irqs; i++)
> + irq_domain_set_info(domain, virq + i, hwirq + i,
> + _msi_bottom_irq_chip, msi_set,
> + handle_edge_irq, NULL, NULL);
> +
> + return 0;
> +}
> +
> +static void mtk_msi_bottom_domain_free(struct irq_domain *domain,
> +unsigned int virq, unsigned int nr_irqs)
> +{
> + struct mtk_pcie_port *port = domain->host_data;
> + struct irq_data *data = irq_domain_get_irq_data(domain, virq);
> +
> + mutex_lock(>lock);
> +
> + bitmap_clear(port->msi_irq_in_use, data->hwirq, nr_irqs);

Marc, should not be there bitmap_release_region() with order_base_2()?

bitmap_release_region(port->msi_irq_in_use, data->hwirq, order_base_2(nr_irqs));

Because mtk_msi_bottom_domain_alloc() is allocating
order_base_2(nr_irqs) interrupts, not only nr_irqs.

> +
> + mutex_unlock(>lock);
> +
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +}

How long should be PCIe card in Warm Reset state?

2021-03-10 Thread Pali Rohár

Hello!

I would like to open a question about PCIe Warm Reset. Warm Reset of
PCIe card is triggered by asserting PERST# signal and in most cases
PERST# signal is controlled by GPIO.

Basically every native Linux PCIe controller driver is doing this Warm
Reset of connected PCIe card during native driver initialization
procedure.

And now the important question is: How long should be PCIe card in Warm
Reset state? After which timeout can be PERST# signal de-asserted by
Linux controller driver?

Lorenzo and Rob already expressed concerns [1] [2] that this Warm Reset
timeout should not be driver specific and I agree with them.

I have done investigation which timeout is using which native PCIe
driver [3] and basically every driver is using different timeout.

I have tried to find timeouts in PCIe specifications, I was not able to
understand and deduce correct timeout value for Warm Reset from PCIe
specifications. What I have found is written in my email [4].

Alex (as a "reset expert"), could you look at this issue?

Or is there somebody else who understand PCIe specifications and PCIe
diagrams to figure out what is the minimal timeout for de-asserting
PERST# signal?

There are still some issues with WiFi cards (e.g. Compex one) which
sometimes do not appear on PCIe bus. And based on these "reset timeout
differences" in Linux PCIe controller drivers, I suspect that it is not
(only) the problems in WiFi cards but also in Linux PCIe controller
drivers. In my email [3] I have written that I figured out that WLE1216
card needs to be in Warm Reset state for at least 10ms, otherwise card
is not detected.

[1] - https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
[2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
[3] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
[4] - https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/

Re: [PATCH AUTOSEL 5.11 16/67] net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant

2021-03-05 Thread Pali Rohár

On Thursday 04 March 2021 17:33:01 Sasha Levin wrote:
> On Thu, Feb 25, 2021 at 08:03:06PM +0100, Pali Rohár wrote:
> > On Wednesday 24 February 2021 07:49:34 Sasha Levin wrote:
> > > From: Pali Rohár 
> > > 
> > > [ Upstream commit f0b4f847673299577c29b71d3f3acd3c313d81b7 ]
> > 
> > Hello! This commit requires also commit~1 from that patch series:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=426c6cbc409cbda9ab1a9dbf15d3c2ef947eb8c1
> > 
> > Without it kernel cannot read EEPROM from Ubiquiti U-Fiber Instant
> > module and therefore the hook based on EEPROM data which is below would
> > not be applied.
> 
> Looks like that commit is already in, thanks!

Yes! Now I see that commit in 5.11 queue. So 5.11 would be OK.

But I do not see it in 5.10 queue. In 5.10 queue I see only backport of
f0b4f8476732 commit. 426c6cbc409c seems to be still missing.

Could you check it?

Re: RFC: sysfs node for Secondary PCI bus reset (PCIe Hot Reset)

2021-03-03 Thread Pali Rohár

On Tuesday 02 March 2021 12:58:29 Alex Williamson wrote:
> On Mon, 1 Mar 2021 14:28:17 -0600
> Bjorn Helgaas  wrote:
> 
> > [+cc Alex, reset expert]
> > 
> > On Mon, Mar 01, 2021 at 06:12:21PM +0100, Pali Rohár wrote:
> > > Hello!
> > > 
> > > PCIe card can be reset via in-band Hot Reset signal which can be
> > > triggered by PCIe bridge via Secondary Bus Reset bit in PCI config
> > > space.
> > > 
> > > Kernel already exports sysfs node "reset" for triggering Functional
> > > Reset of particular function of PCI device. But in some cases Functional
> > > Reset is not enough and Hot Reset is required.
> > > 
> > > Following RFC patch exports sysfs node "reset_bus" for PCI bridges which
> > > triggers Secondary Bus Reset and therefore for PCIe bridges it resets
> > > connected PCIe card.
> > > 
> > > What do you think about it?
> > > 
> > > Currently there is userspace script which can trigger PCIe Hot Reset by
> > > modifying PCI config space from userspace:
> > > 
> > > https://alexforencich.com/wiki/en/pcie/hot-reset-linux
> > > 
> > > But because kernel already provides way how to trigger Functional Reset
> > > it could provide also way how to trigger PCIe Hot Reset.
> 
> What that script does and what this does, or what the existing reset
> attribute does, are very different.  The script finds the upstream
> bridge for a given device, removes the device (ignoring that more than
> one device might be affected by the bus reset), uses setpci to trigger
> a secondary bus reset, then rescans devices.  The below only triggers
> the secondary bus reset, neither saving and restoring affected device
> state like the existing function level reset attribute, nor removing
> and rescanning as the script does.  It simply leaves an entire
> hierarchy of PCI devices entirely un-programmed yet still has struct
> pci_devs attached to them for untold future misery.
> 
> In fact, for the case of a single device affected by the bus reset, as
> intended by the script, the existing reset attribute will already do
> that if the device supports no other reset mechanism.  There's actually
> a running LFX mentorship project that aims to allow the user to control
> the type of reset performed by the existing reset attribute such that a
> user could force the bus reset behavior over other reset methods.

Hello Alex? Do you have a link for this "reset" project? I'm interesting
in it as I'm dealing with Compex wifi cards which are causing problems.

For correct initialization I need to issue PCIe Warm Reset for these
cards (Warm Reset is done via PERST# pin which most linux controller
drivers controls via GPIO subsystem). And for now there is no way to
trigger PCIe Warm Reset for particular PCIe device from userspace. As
there is no userspace <--> kernel API for it.

> There might be some justification for an attribute that actually
> implements the referenced script correctly, perhaps in kernel we could
> avoid races with bus rescans, but simply triggering an SBR to quietly
> de-program all downstream devices with no state restore or device
> rescan is not it.  Any affected device would be unusable.  Was this
> tested?  Thanks,

I have tested my change. First I called 'remove' attribute for PCIe
card, then I called this 'bus_reset' on parent PCIe bridge and later I
called 'rescan' attribute on bridge. It correctly rested tested ath9k
card. So I did something similar as in above script. But I agree that
there are race conditions and basically lot of other calls needs to be
done to restore state.

So I see that to make it 'usable' we need to do it automatically in
kernel and also rescan/restore state of PCIe devices behind bridge after
reset...

> Alex
> 
> > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > index 50fcb62d59b5..f5e11c589498 100644
> > > --- a/drivers/pci/pci-sysfs.c
> > > +++ b/drivers/pci/pci-sysfs.c
> > > @@ -1321,6 +1321,30 @@ static ssize_t reset_store(struct device *dev, 
> > > struct device_attribute *attr,
> > >  
> > >  static DEVICE_ATTR(reset, 0200, NULL, reset_store);
> > >  
> > > +static ssize_t reset_bus_store(struct device *dev, struct 
> > > device_attribute *attr,
> > > +const char *buf, size_t count)
> > > +{
> > > + struct pci_dev *pdev = to_pci_dev(dev);
> > > + unsigned long val;
> > > + ssize_t result = kstrtoul(buf, 0, );
> > > +
> > > + if (result < 0)
> > > + return result;
> > > +
> > > + if

[PATCH] PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()

2021-03-03 Thread Pali Rohár

IRQ domain alloc function should return zero on success. Non-zero value
indicates failure.

Signed-off-by: Pali Rohár 
Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
---
 drivers/pci/controller/pcie-iproc-msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pcie-iproc-msi.c 
b/drivers/pci/controller/pcie-iproc-msi.c
index 908475d27e0e..eede4e8f3f75 100644
--- a/drivers/pci/controller/pcie-iproc-msi.c
+++ b/drivers/pci/controller/pcie-iproc-msi.c
@@ -271,7 +271,7 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain 
*domain,
NULL, NULL);
}
 
-   return hwirq;
+   return 0;
 }
 
 static void iproc_msi_irq_domain_free(struct irq_domain *domain,
-- 
2.20.1

Re: [PATCH mvebu v3 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-03-01 Thread Pali Rohár

Hello Gregory!

Patches are the for almost two months and more people have tested them.
They are marked with Fixed/CC-stable tags, they should go also into
stable trees as they are fixing CPU scaling and instability issues.

Are there any issues with these patches? If not, could you please merge
them for upcoming Linux version?

On Monday 22 February 2021 20:41:48 Pali Rohár wrote:
> Hello!
> 
> This is third version of patches for Armada 37xx cpufreq driver which
> fix CPU scaling with 1 GHz base frequency.
> 
> The only change in this third version is modified patch 04/10 with fixes
> for 1.2 GHz variant of Espressobin. Minimal CPU voltage in L1 load for
> 1.2 GHz variant was increased to 1.155V.
> 
> Patches are now rebased on top of the kernel version 5.11 with all
> collected Acked-by/Tested-by lines and are available also in my git
> tree in branch a3720-cpufreq-issues:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=a3720-cpufreq-issues
> 
> If you have other Armada 3720 boards with 1.2 GHz CPU, including
> Espressobin V7, let us know if it is working fine for you.
> 
> Marek & Pali
> 
> Marek Behún (3):
>   arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node
>   cpufreq: armada-37xx: Fix setting TBG parent for load levels
>   clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM
> clock
> 
> Pali Rohár (7):
>   cpufreq: armada-37xx: Fix the AVS value for load L1
>   clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to
> 1 GHz
>   clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1
> to L0
>   cpufreq: armada-37xx: Fix driver cleanup when registration failed
>   cpufreq: armada-37xx: Fix determining base CPU frequency
>   cpufreq: armada-37xx: Remove cur_frequency variable
>   cpufreq: armada-37xx: Fix module unloading
> 
>  arch/arm64/boot/dts/marvell/armada-37xx.dtsi |   3 +-
>  drivers/clk/mvebu/armada-37xx-periph.c   |  83 +++---
>  drivers/cpufreq/armada-37xx-cpufreq.c| 111 +++
>  3 files changed, 135 insertions(+), 62 deletions(-)
> 
> -- 
> 2.20.1
>

RFC: sysfs node for Secondary PCI bus reset (PCIe Hot Reset)

2021-03-01 Thread Pali Rohár

Hello!

PCIe card can be reset via in-band Hot Reset signal which can be
triggered by PCIe bridge via Secondary Bus Reset bit in PCI config
space.

Kernel already exports sysfs node "reset" for triggering Functional
Reset of particular function of PCI device. But in some cases Functional
Reset is not enough and Hot Reset is required.

Following RFC patch exports sysfs node "reset_bus" for PCI bridges which
triggers Secondary Bus Reset and therefore for PCIe bridges it resets
connected PCIe card.

What do you think about it?

Currently there is userspace script which can trigger PCIe Hot Reset by
modifying PCI config space from userspace:

https://alexforencich.com/wiki/en/pcie/hot-reset-linux

But because kernel already provides way how to trigger Functional Reset
it could provide also way how to trigger PCIe Hot Reset.


diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 50fcb62d59b5..f5e11c589498 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1321,6 +1321,30 @@ static ssize_t reset_store(struct device *dev, struct 
device_attribute *attr,
 
 static DEVICE_ATTR(reset, 0200, NULL, reset_store);
 
+static ssize_t reset_bus_store(struct device *dev, struct device_attribute 
*attr,
+  const char *buf, size_t count)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+   unsigned long val;
+   ssize_t result = kstrtoul(buf, 0, );
+
+   if (result < 0)
+   return result;
+
+   if (val != 1)
+   return -EINVAL;
+
+   pm_runtime_get_sync(dev);
+   result = pci_bridge_secondary_bus_reset(pdev);
+   pm_runtime_put(dev);
+   if (result < 0)
+   return result;
+
+   return count;
+}
+
+static DEVICE_ATTR(reset_bus, 0200, NULL, reset_bus_store);
+
 static int pci_create_capabilities_sysfs(struct pci_dev *dev)
 {
int retval;
@@ -1332,8 +1356,15 @@ static int pci_create_capabilities_sysfs(struct pci_dev 
*dev)
if (retval)
goto error;
}
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   retval = device_create_file(>dev, _attr_reset_bus);
+   if (retval)
+   goto error_reset_bus;
+   }
return 0;
 
+error_reset_bus:
+   device_remove_file(>dev, _attr_reset);
 error:
pcie_vpd_remove_sysfs_dev_files(dev);
return retval;
@@ -1414,6 +1445,8 @@ static void pci_remove_capabilities_sysfs(struct pci_dev 
*dev)
device_remove_file(>dev, _attr_reset);
dev->reset_fn = 0;
}
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   device_remove_file(>dev, _attr_reset_bus);
 }
 
 /**

Re: [PATCH AUTOSEL 5.11 16/67] net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant

2021-02-25 Thread Pali Rohár

On Wednesday 24 February 2021 07:49:34 Sasha Levin wrote:
> From: Pali Rohár 
> 
> [ Upstream commit f0b4f847673299577c29b71d3f3acd3c313d81b7 ]

Hello! This commit requires also commit~1 from that patch series:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=426c6cbc409cbda9ab1a9dbf15d3c2ef947eb8c1

Without it kernel cannot read EEPROM from Ubiquiti U-Fiber Instant
module and therefore the hook based on EEPROM data which is below would
not be applied.

> The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical information
> stored in its EEPROM. It claims to support all transceiver types including
> 10G Ethernet. Clear all claimed modes and set only 1000baseX_Full, which is
> the only one supported.
> 
> This module has also phys_id set to SFF, and the SFP subsystem currently
> does not allow to use SFP modules detected as SFFs. Add exception for this
> module so it can be detected as supported.
> 
> This change finally allows to detect and use SFP GPON module Ubiquiti
> U-Fiber Instant on Linux system.
> 
> EEPROM content of this SFP module is (where XX is serial number):
> 
> 00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8?????.??
> 10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20UBNT
> 20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41.??)UF-INSTA
> 30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36NT  4   ??.6
> 40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX.?..UBNT
> 50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41140123  `??A
> 
> Signed-off-by: Pali Rohár 
> Signed-off-by: Jakub Kicinski 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/net/phy/sfp-bus.c | 15 +++
>  drivers/net/phy/sfp.c | 17 +++--
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
> index 20b91f5dfc6ed..4cf874fb5c5b4 100644
> --- a/drivers/net/phy/sfp-bus.c
> +++ b/drivers/net/phy/sfp-bus.c
> @@ -44,6 +44,17 @@ static void sfp_quirk_2500basex(const struct sfp_eeprom_id 
> *id,
>   phylink_set(modes, 2500baseX_Full);
>  }
>  
> +static void sfp_quirk_ubnt_uf_instant(const struct sfp_eeprom_id *id,
> +   unsigned long *modes)
> +{
> + /* Ubiquiti U-Fiber Instant module claims that support all transceiver
> +  * types including 10G Ethernet which is not truth. So clear all claimed
> +  * modes and set only one mode which module supports: 1000baseX_Full.
> +  */
> + phylink_zero(modes);
> + phylink_set(modes, 1000baseX_Full);
> +}
> +
>  static const struct sfp_quirk sfp_quirks[] = {
>   {
>   // Alcatel Lucent G-010S-P can operate at 2500base-X, but
> @@ -63,6 +74,10 @@ static const struct sfp_quirk sfp_quirks[] = {
>   .vendor = "HUAWEI",
>   .part = "MA5671A",
>   .modes = sfp_quirk_2500basex,
> + }, {
> + .vendor = "UBNT",
> + .part = "UF-INSTANT",
> + .modes = sfp_quirk_ubnt_uf_instant,
>   },
>  };
>  
> diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
> index 91d74c1a920ab..804295ad8a044 100644
> --- a/drivers/net/phy/sfp.c
> +++ b/drivers/net/phy/sfp.c
> @@ -273,8 +273,21 @@ static const struct sff_data sff_data = {
>  
>  static bool sfp_module_supported(const struct sfp_eeprom_id *id)
>  {
> - return id->base.phys_id == SFF8024_ID_SFP &&
> -id->base.phys_ext_id == SFP_PHYS_EXT_ID_SFP;
> + if (id->base.phys_id == SFF8024_ID_SFP &&
> + id->base.phys_ext_id == SFP_PHYS_EXT_ID_SFP)
> + return true;
> +
> + /* SFP GPON module Ubiquiti U-Fiber Instant has in its EEPROM stored
> +  * phys id SFF instead of SFP. Therefore mark this module explicitly
> +  * as supported based on vendor name and pn match.
> +  */
> + if (id->base.phys_id == SFF8024_ID_SFF_8472 &&
> + id->base.phys_ext_id == SFP_PHYS_EXT_ID_SFP &&
> + !memcmp(id->base.vendor_name, "UBNT", 16) &&
> + !memcmp(id->base.vendor_pn, "UF-INSTANT  ", 16))
> + return true;
> +
> + return false;
>  }
>  
>  static const struct sff_data sfp_data = {
> -- 
> 2.27.0
> 

On Wednesday 24 February 2021 07:51:28 Sasha Levin wrote:
> From: Pali Rohár 
> 
> [ Upstream commit f0b4f847673299577c29b71d3f3acd3c313d81b7 ]
> 
> The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical information
> stored in its EEPROM. It claims to support all transceiver types including
> 10G Eth

Re: [PATCH 1/3] power: supply: bq27xxx: fix sign of current_now for newer ICs

2021-02-23 Thread Pali Rohár

On Tuesday 23 February 2021 15:49:46 Andreas Kemnade wrote:
> On Tue, 23 Feb 2021 15:11:20 +0100
> Matthias Schiffer  wrote:
> 
> > Commit cd060b4d0868 ("power: supply: bq27xxx: fix polarity of current_now")
> > changed the sign of current_now for all bq27xxx variants, but on BQ28Z610
> > I'm now seeing negated values *with* that patch.
> > 
> > The GTA04/Openmoko device that was used for testing uses a BQ27000 or
> > BQ27010 IC, so I assume only the BQ27XXX_O_ZERO code path was incorrect.
> > Revert the behaviour for newer ICs.
> > 
> > Fixes: cd060b4d0868 "power: supply: bq27xxx: fix polarity of current_now"
> > Signed-off-by: Matthias Schiffer 
> > ---
> > 
> > @Andreas Kemnade: It would be great to get a confirmation that the
> > Openmoko battery indeed uses BQ27000/BQ27010 - I was having some trouble
> > finding that information.
> > 
> I can confirm that.
> here is the corresponding schematic:
> 
> http://people.openmoko.org/tony_tu/GTA02/hardware/GTA02/CT-GTA02.pdf
> 
> Regards,
> Andreas

Nokia N900 has BQ27200 connected via i2c. CCing Maemo mailing list 
maemo-le...@lists.dyne.org

[PATCH mvebu v3 10/10] cpufreq: armada-37xx: Fix module unloading

2021-02-22 Thread Pali Rohár

This driver is missing module_exit hook. Add proper driver exit function
which unregisters the platform device and cleans up the data.

Signed-off-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
---
 drivers/cpufreq/armada-37xx-cpufreq.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index 050abff18308..3fc98a3ffd91 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -86,6 +86,8 @@ static int avs_map[] = {
 };
 
 struct armada37xx_cpufreq_state {
+   struct platform_device *pdev;
+   struct device *cpu_dev;
struct regmap *regmap;
u32 nb_l0l1;
u32 nb_l2l3;
@@ -506,6 +508,9 @@ static int __init armada37xx_cpufreq_driver_init(void)
if (ret)
goto disable_dvfs;
 
+   armada37xx_cpufreq_state->cpu_dev = cpu_dev;
+   armada37xx_cpufreq_state->pdev = pdev;
+   platform_set_drvdata(pdev, dvfs);
return 0;
 
 disable_dvfs:
@@ -524,6 +529,26 @@ static int __init armada37xx_cpufreq_driver_init(void)
 /* late_initcall, to guarantee the driver is loaded after A37xx clock driver */
 late_initcall(armada37xx_cpufreq_driver_init);
 
+static void __exit armada37xx_cpufreq_driver_exit(void)
+{
+   struct platform_device *pdev = armada37xx_cpufreq_state->pdev;
+   struct armada_37xx_dvfs *dvfs = platform_get_drvdata(pdev);
+   unsigned long freq;
+   int load_lvl;
+
+   platform_device_unregister(pdev);
+
+   armada37xx_cpufreq_disable_dvfs(armada37xx_cpufreq_state->regmap);
+
+   for (load_lvl = ARMADA_37XX_DVFS_LOAD_0; load_lvl < LOAD_LEVEL_NR; 
load_lvl++) {
+   freq = dvfs->cpu_freq_max / dvfs->divider[load_lvl];
+   dev_pm_opp_remove(armada37xx_cpufreq_state->cpu_dev, freq);
+   }
+
+   kfree(armada37xx_cpufreq_state);
+}
+module_exit(armada37xx_cpufreq_driver_exit);
+
 static const struct of_device_id __maybe_unused armada37xx_cpufreq_of_match[] 
= {
{ .compatible = "marvell,armada-3700-nb-pm" },
{ },
-- 
2.20.1

[PATCH mvebu v3 08/10] cpufreq: armada-37xx: Fix determining base CPU frequency

2021-02-22 Thread Pali Rohár

When current CPU load is not L0 then loading armada-37xx-cpufreq.ko driver
fails with following error:

# modprobe armada-37xx-cpufreq
[  502.702097] Unsupported CPU frequency 250 MHz

This issue was partially fixed by commit 8db82563451f ("cpufreq:
armada-37xx: fix frequency calculation for opp"), but only for calculating
CPU frequency for opp.

Fix this also for determination of base CPU frequency.

Signed-off-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Cc: sta...@vger.kernel.org # 8db82563451f ("cpufreq: armada-37xx: fix frequency 
calculation for opp")
---
 drivers/cpufreq/armada-37xx-cpufreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index 1ab2113daef5..e4782f562e7a 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -469,7 +469,7 @@ static int __init armada37xx_cpufreq_driver_init(void)
return -EINVAL;
}
 
-   dvfs = armada_37xx_cpu_freq_info_get(cur_frequency);
+   dvfs = armada_37xx_cpu_freq_info_get(base_frequency);
if (!dvfs) {
clk_put(clk);
return -EINVAL;
-- 
2.20.1

[PATCH mvebu v3 07/10] cpufreq: armada-37xx: Fix driver cleanup when registration failed

2021-02-22 Thread Pali Rohár

Commit 8db82563451f ("cpufreq: armada-37xx: fix frequency calculation for
opp") changed calculation of frequency passed to the dev_pm_opp_add()
function call. But the code for dev_pm_opp_remove() function call was not
updated, so the driver cleanup phase does not work when registration fails.

This fixes the issue by using the same frequency in both calls.

Signed-off-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: 8db82563451f ("cpufreq: armada-37xx: fix frequency calculation for opp")
Cc: sta...@vger.kernel.org
---
 drivers/cpufreq/armada-37xx-cpufreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index c7683d447b11..1ab2113daef5 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -521,7 +521,7 @@ static int __init armada37xx_cpufreq_driver_init(void)
 remove_opp:
/* clean-up the already added opp before leaving */
while (load_lvl-- > ARMADA_37XX_DVFS_LOAD_0) {
-   freq = cur_frequency / dvfs->divider[load_lvl];
+   freq = base_frequency / dvfs->divider[load_lvl];
dev_pm_opp_remove(cpu_dev, freq);
}
 
-- 
2.20.1

[PATCH mvebu v3 06/10] clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0

2021-02-22 Thread Pali Rohár

When CPU frequency is at 250 MHz and set_rate() is called with 500 MHz (L1)
quickly followed by a call with 1 GHz (L0), the CPU does not necessarily
stay in L1 for at least 20ms as is required by Marvell errata.

This situation happens frequently with the ondemand cpufreq governor and
can be also reproduced with userspace governor. In most cases it causes CPU
to crash.

This change fixes the above issue and ensures that the CPU always stays in
L1 for at least 20ms when switching from any state to L0.

Signed-off-by: Marek Behún 
Signed-off-by: Pali Rohár 
Acked-by: Stephen Boyd 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: 61c40f35f5cd ("clk: mvebu: armada-37xx-periph: Fix switching CPU rate 
from 300Mhz to 1.2GHz")
Cc: sta...@vger.kernel.org
---
 drivers/clk/mvebu/armada-37xx-periph.c | 45 ++
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/drivers/clk/mvebu/armada-37xx-periph.c 
b/drivers/clk/mvebu/armada-37xx-periph.c
index b15e177bea7e..32ac6b6b7530 100644
--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -84,6 +84,7 @@ struct clk_pm_cpu {
void __iomem *reg_div;
u8 shift_div;
struct regmap *nb_pm_base;
+   unsigned long l1_expiration;
 };
 
 #define to_clk_double_div(_hw) container_of(_hw, struct clk_double_div, hw)
@@ -504,22 +505,52 @@ static long clk_pm_cpu_round_rate(struct clk_hw *hw, 
unsigned long rate,
  * 2. Sleep 20ms for stabling VDD voltage
  * 3. Then switch from L1 (500/600 MHz) to L0 (1000/1200 MHz).
  */
-static void clk_pm_cpu_set_rate_wa(unsigned long rate, struct regmap *base)
+static void clk_pm_cpu_set_rate_wa(struct clk_pm_cpu *pm_cpu,
+  unsigned int new_level, unsigned long rate,
+  struct regmap *base)
 {
unsigned int cur_level;
 
-   if (rate < 1000 * 1000 * 1000)
-   return;
-
regmap_read(base, ARMADA_37XX_NB_CPU_LOAD, _level);
cur_level &= ARMADA_37XX_NB_CPU_LOAD_MASK;
-   if (cur_level <= ARMADA_37XX_DVFS_LOAD_1)
+
+   if (cur_level == new_level)
+   return;
+
+   /*
+* System wants to go to L1 on its own. If we are going from L2/L3,
+* remember when 20ms will expire. If from L0, set the value so that
+* next switch to L0 won't have to wait.
+*/
+   if (new_level == ARMADA_37XX_DVFS_LOAD_1) {
+   if (cur_level == ARMADA_37XX_DVFS_LOAD_0)
+   pm_cpu->l1_expiration = jiffies;
+   else
+   pm_cpu->l1_expiration = jiffies + msecs_to_jiffies(20);
return;
+   }
+
+   /*
+* If we are setting to L2/L3, just invalidate L1 expiration time,
+* sleeping is not needed.
+*/
+   if (rate < 1000*1000*1000)
+   goto invalidate_l1_exp;
+
+   /*
+* We are going to L0 with rate >= 1GHz. Check whether we have been at
+* L1 for long enough time. If not, go to L1 for 20ms.
+*/
+   if (pm_cpu->l1_expiration && jiffies >= pm_cpu->l1_expiration)
+   goto invalidate_l1_exp;
 
regmap_update_bits(base, ARMADA_37XX_NB_CPU_LOAD,
   ARMADA_37XX_NB_CPU_LOAD_MASK,
   ARMADA_37XX_DVFS_LOAD_1);
msleep(20);
+
+invalidate_l1_exp:
+   pm_cpu->l1_expiration = 0;
 }
 
 static int clk_pm_cpu_set_rate(struct clk_hw *hw, unsigned long rate,
@@ -553,7 +584,9 @@ static int clk_pm_cpu_set_rate(struct clk_hw *hw, unsigned 
long rate,
reg = ARMADA_37XX_NB_CPU_LOAD;
mask = ARMADA_37XX_NB_CPU_LOAD_MASK;
 
-   clk_pm_cpu_set_rate_wa(rate, base);
+   /* Apply workaround when base CPU frequency is 1000 or 
1200 MHz */
+   if (parent_rate >= 1000*1000*1000)
+   clk_pm_cpu_set_rate_wa(pm_cpu, load_level, 
rate, base);
 
regmap_update_bits(base, reg, mask, load_level);
 
-- 
2.20.1

[PATCH mvebu v3 09/10] cpufreq: armada-37xx: Remove cur_frequency variable

2021-02-22 Thread Pali Rohár

Variable cur_frequency in armada37xx_cpufreq_driver_init() is unused.

Signed-off-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
---
 drivers/cpufreq/armada-37xx-cpufreq.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index e4782f562e7a..050abff18308 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -400,7 +400,7 @@ static int __init armada37xx_cpufreq_driver_init(void)
struct armada_37xx_dvfs *dvfs;
struct platform_device *pdev;
unsigned long freq;
-   unsigned int cur_frequency, base_frequency;
+   unsigned int base_frequency;
struct regmap *nb_clk_base, *nb_pm_base, *avs_base;
struct device *cpu_dev;
int load_lvl, ret;
@@ -461,14 +461,6 @@ static int __init armada37xx_cpufreq_driver_init(void)
return -EINVAL;
}
 
-   /* Get nominal (current) CPU frequency */
-   cur_frequency = clk_get_rate(clk);
-   if (!cur_frequency) {
-   dev_err(cpu_dev, "Failed to get clock rate for CPU\n");
-   clk_put(clk);
-   return -EINVAL;
-   }
-
dvfs = armada_37xx_cpu_freq_info_get(base_frequency);
if (!dvfs) {
clk_put(clk);
-- 
2.20.1

[PATCH mvebu v3 05/10] clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz

2021-02-22 Thread Pali Rohár

It was observed that the workaround introduced by commit 61c40f35f5cd
("clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to
1.2GHz") when base CPU frequency is 1.2 GHz is also required when base
CPU frequency is 1 GHz. Otherwise switching CPU frequency directly from
L2 (250 MHz) to L0 (1 GHz) causes a crash.

When base CPU frequency is just 800 MHz no crashed were observed during
switch from L2 to L0.

Signed-off-by: Pali Rohár 
Acked-by: Stephen Boyd 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: 2089dc33ea0e ("clk: mvebu: armada-37xx-periph: add DVFS support for cpu 
clocks")
Cc: sta...@vger.kernel.org # 61c40f35f5cd ("clk: mvebu: armada-37xx-periph: Fix 
switching CPU rate from 300Mhz to 1.2GHz")
---
 drivers/clk/mvebu/armada-37xx-periph.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/mvebu/armada-37xx-periph.c 
b/drivers/clk/mvebu/armada-37xx-periph.c
index 6507bd2c5f31..b15e177bea7e 100644
--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -487,8 +487,10 @@ static long clk_pm_cpu_round_rate(struct clk_hw *hw, 
unsigned long rate,
 }
 
 /*
- * Switching the CPU from the L2 or L3 frequencies (300 and 200 Mhz
- * respectively) to L0 frequency (1.2 Ghz) requires a significant
+ * Workaround when base CPU frequnecy is 1000 or 1200 MHz
+ *
+ * Switching the CPU from the L2 or L3 frequencies (250/300 or 200 MHz
+ * respectively) to L0 frequency (1/1.2 GHz) requires a significant
  * amount of time to let VDD stabilize to the appropriate
  * voltage. This amount of time is large enough that it cannot be
  * covered by the hardware countdown register. Due to this, the CPU
@@ -498,15 +500,15 @@ static long clk_pm_cpu_round_rate(struct clk_hw *hw, 
unsigned long rate,
  * To work around this problem, we prevent switching directly from the
  * L2/L3 frequencies to the L0 frequency, and instead switch to the L1
  * frequency in-between. The sequence therefore becomes:
- * 1. First switch from L2/L3(200/300MHz) to L1(600MHZ)
+ * 1. First switch from L2/L3 (200/250/300 MHz) to L1 (500/600 MHz)
  * 2. Sleep 20ms for stabling VDD voltage
- * 3. Then switch from L1(600MHZ) to L0(1200Mhz).
+ * 3. Then switch from L1 (500/600 MHz) to L0 (1000/1200 MHz).
  */
 static void clk_pm_cpu_set_rate_wa(unsigned long rate, struct regmap *base)
 {
unsigned int cur_level;
 
-   if (rate != 1200 * 1000 * 1000)
+   if (rate < 1000 * 1000 * 1000)
return;
 
regmap_read(base, ARMADA_37XX_NB_CPU_LOAD, _level);
-- 
2.20.1

[PATCH mvebu v3 03/10] clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock

2021-02-22 Thread Pali Rohár

From: Marek Behún 

Remove the .set_parent method in clk_pm_cpu_ops.

This method was supposed to be needed by the armada-37xx-cpufreq driver,
but was never actually called due to wrong assumptions in the cpufreq
driver. After this was fixed in the cpufreq driver, this method is not
needed anymore.

Signed-off-by: Marek Behún 
Acked-by: Stephen Boyd 
Tested-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: 2089dc33ea0e ("clk: mvebu: armada-37xx-periph: add DVFS support for cpu 
clocks")
---
 drivers/clk/mvebu/armada-37xx-periph.c | 28 --
 1 file changed, 28 deletions(-)

diff --git a/drivers/clk/mvebu/armada-37xx-periph.c 
b/drivers/clk/mvebu/armada-37xx-periph.c
index f5746f9ea929..6507bd2c5f31 100644
--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -440,33 +440,6 @@ static u8 clk_pm_cpu_get_parent(struct clk_hw *hw)
return val;
 }
 
-static int clk_pm_cpu_set_parent(struct clk_hw *hw, u8 index)
-{
-   struct clk_pm_cpu *pm_cpu = to_clk_pm_cpu(hw);
-   struct regmap *base = pm_cpu->nb_pm_base;
-   int load_level;
-
-   /*
-* We set the clock parent only if the DVFS is available but
-* not enabled.
-*/
-   if (IS_ERR(base) || armada_3700_pm_dvfs_is_enabled(base))
-   return -EINVAL;
-
-   /* Set the parent clock for all the load level */
-   for (load_level = 0; load_level < LOAD_LEVEL_NR; load_level++) {
-   unsigned int reg, mask,  val,
-   offset = ARMADA_37XX_NB_TBG_SEL_OFF;
-
-   armada_3700_pm_dvfs_update_regs(load_level, , );
-
-   val = index << offset;
-   mask = ARMADA_37XX_NB_TBG_SEL_MASK << offset;
-   regmap_update_bits(base, reg, mask, val);
-   }
-   return 0;
-}
-
 static unsigned long clk_pm_cpu_recalc_rate(struct clk_hw *hw,
unsigned long parent_rate)
 {
@@ -592,7 +565,6 @@ static int clk_pm_cpu_set_rate(struct clk_hw *hw, unsigned 
long rate,
 
 static const struct clk_ops clk_pm_cpu_ops = {
.get_parent = clk_pm_cpu_get_parent,
-   .set_parent = clk_pm_cpu_set_parent,
.round_rate = clk_pm_cpu_round_rate,
.set_rate = clk_pm_cpu_set_rate,
.recalc_rate = clk_pm_cpu_recalc_rate,
-- 
2.20.1

[PATCH mvebu v3 02/10] cpufreq: armada-37xx: Fix setting TBG parent for load levels

2021-02-22 Thread Pali Rohár

From: Marek Behún 

With CPU frequency determining software [1] we have discovered that
after this driver does one CPU frequency change, the base frequency of
the CPU is set to the frequency of TBG-A-P clock, instead of the TBG
that is parent to the CPU.

This can be reproduced on EspressoBIN and Turris MOX:
  cd /sys/devices/system/cpu/cpufreq/policy0
  echo powersave >scaling_governor
  echo performance >scaling_governor

Running the mhz tool before this driver is loaded reports 1000 MHz, and
after loading the driver and executing commands above the tool reports
800 MHz.

The change of TBG clock selector is supposed to happen in function
armada37xx_cpufreq_dvfs_setup. Before the function returns, it does
this:
  parent = clk_get_parent(clk);
  clk_set_parent(clk, parent);

The armada-37xx-periph clock driver has the .set_parent method
implemented correctly for this, so if the method was actually called,
this would work. But since the introduction of the common clock
framework in commit b2476490ef11 ("clk: introduce the common clock..."),
the clk_set_parent function checks whether the parent is actually
changing, and if the requested new parent is same as the old parent
(which is obviously the case for the code above), the .set_parent method
is not called at all.

This patch fixes this issue by filling the correct TBG clock selector
directly in the armada37xx_cpufreq_dvfs_setup during the filling of
other registers at the same address. But the determination of CPU TBG
index cannot be done via the common clock framework, therefore we need
to access the North Bridge Peripheral Clock registers directly in this
driver.

[1] https://github.com/wtarreau/mhz

Signed-off-by: Marek Behún 
Tested-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Cc: sta...@vger.kernel.org
---
 drivers/cpufreq/armada-37xx-cpufreq.c | 35 ++-
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index b4af4094309b..b8dc6c849579 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -25,6 +25,10 @@
 
 #include "cpufreq-dt.h"
 
+/* Clk register set */
+#define ARMADA_37XX_CLK_TBG_SEL0
+#define ARMADA_37XX_CLK_TBG_SEL_CPU_OFF22
+
 /* Power management in North Bridge register set */
 #define ARMADA_37XX_NB_L0L10x18
 #define ARMADA_37XX_NB_L2L30x1C
@@ -120,10 +124,15 @@ static struct armada_37xx_dvfs 
*armada_37xx_cpu_freq_info_get(u32 freq)
  * will be configured then the DVFS will be enabled.
  */
 static void __init armada37xx_cpufreq_dvfs_setup(struct regmap *base,
-struct clk *clk, u8 *divider)
+struct regmap *clk_base, u8 
*divider)
 {
+   u32 cpu_tbg_sel;
int load_lvl;
-   struct clk *parent;
+
+   /* Determine to which TBG clock is CPU connected */
+   regmap_read(clk_base, ARMADA_37XX_CLK_TBG_SEL, _tbg_sel);
+   cpu_tbg_sel >>= ARMADA_37XX_CLK_TBG_SEL_CPU_OFF;
+   cpu_tbg_sel &= ARMADA_37XX_NB_TBG_SEL_MASK;
 
for (load_lvl = 0; load_lvl < LOAD_LEVEL_NR; load_lvl++) {
unsigned int reg, mask, val, offset = 0;
@@ -142,6 +151,11 @@ static void __init armada37xx_cpufreq_dvfs_setup(struct 
regmap *base,
mask = (ARMADA_37XX_NB_CLK_SEL_MASK
<< ARMADA_37XX_NB_CLK_SEL_OFF);
 
+   /* Set TBG index, for all levels we use the same TBG */
+   val = cpu_tbg_sel << ARMADA_37XX_NB_TBG_SEL_OFF;
+   mask = (ARMADA_37XX_NB_TBG_SEL_MASK
+   << ARMADA_37XX_NB_TBG_SEL_OFF);
+
/*
 * Set cpu divider based on the pre-computed array in
 * order to have balanced step.
@@ -160,14 +174,6 @@ static void __init armada37xx_cpufreq_dvfs_setup(struct 
regmap *base,
 
regmap_update_bits(base, reg, mask, val);
}
-
-   /*
-* Set cpu clock source, for all the level we keep the same
-* clock source that the one already configured. For this one
-* we need to use the clock framework
-*/
-   parent = clk_get_parent(clk);
-   clk_set_parent(clk, parent);
 }
 
 /*
@@ -358,11 +364,16 @@ static int __init armada37xx_cpufreq_driver_init(void)
struct platform_device *pdev;
unsigned long freq;
unsigned int cur_frequency, base_frequency;
-   struct regmap *nb_pm_base, *avs_base;
+   struct regmap *nb_clk_base, *nb_pm_base, *avs_base;
struct device *cpu_dev;
int load_lvl, ret;
struct clk *clk, *parent;
 
+   nb_clk_base =
+   
syscon_regmap_lookup_by_compatible(&qu

[PATCH mvebu v3 04/10] cpufreq: armada-37xx: Fix the AVS value for load L1

2021-02-22 Thread Pali Rohár

The original CPU voltage value for load L1 is too low for Armada 37xx SoC
when base CPU frequency is 1000 or 1200 MHz. It leads to instabilities
where CPU gets stuck soon after dynamic voltage scaling from load L1 to L0.

Update the CPU voltage value for load L1 accordingly when base frequency is
1000 or 1200 MHz. The minimal L1 value for base CPU frequency 1000 MHz is
updated from the original 1.05V to 1.108V and for 1200 MHz is updated to
1.155V. This minimal L1 value is used only in the case when it is lower
than value for L0.

This change fixes CPU instability issues on 1 GHz and 1.2 GHz variants of
Espressobin and 1 GHz Turris Mox.

Marvell previously for 1 GHz variant of Espressobin provided a patch [1]
suitable only for their Marvell Linux kernel 4.4 fork which workarounded
this issue. Patch forced CPU voltage value to 1.108V in all loads. But
such change does not fix CPU instability issues on 1.2 GHz variants of
Armada 3720 SoC.

During testing we come to the conclusion that using 1.108V as minimal
value for L1 load makes 1 GHz variants of Espressobin and Turris Mox boards
stable. And similarly 1.155V for 1.2 GHz variant of Espressobin.

These two values 1.108V and 1.155V are documented in Armada 3700 Hardware
Specifications as typical initial CPU voltage values.

Discussion about this issue is also at the Armbian forum [2].

[1] - 
https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/dc33b62c90696afb6adc7dbcc4ebbd48bedec269
[2] - https://forum.armbian.com/topic/10429-how-to-make-espressobin-v7-stable/

Signed-off-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: 1c3528232f4b ("cpufreq: armada-37xx: Add AVS support")
Cc: sta...@vger.kernel.org
---
 drivers/cpufreq/armada-37xx-cpufreq.c | 37 +++
 1 file changed, 37 insertions(+)

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index b8dc6c849579..c7683d447b11 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -73,6 +73,8 @@
 #define LOAD_LEVEL_NR  4
 
 #define MIN_VOLT_MV 1000
+#define MIN_VOLT_MV_FOR_L1_1000MHZ 1108
+#define MIN_VOLT_MV_FOR_L1_1200MHZ 1155
 
 /*  AVS value for the corresponding voltage (in mV) */
 static int avs_map[] = {
@@ -208,6 +210,8 @@ static u32 armada_37xx_avs_val_match(int target_vm)
  * - L2 & L3 voltage should be about 150mv smaller than L0 voltage.
  * This function calculates L1 & L2 & L3 AVS values dynamically based
  * on L0 voltage and fill all AVS values to the AVS value table.
+ * When base CPU frequency is 1000 or 1200 MHz then there is additional
+ * minimal avs value for load L1.
  */
 static void __init armada37xx_cpufreq_avs_configure(struct regmap *base,
struct armada_37xx_dvfs *dvfs)
@@ -239,6 +243,19 @@ static void __init armada37xx_cpufreq_avs_configure(struct 
regmap *base,
for (load_level = 1; load_level < LOAD_LEVEL_NR; load_level++)
dvfs->avs[load_level] = avs_min;
 
+   /*
+* Set the avs values for load L0 and L1 when base CPU frequency
+* is 1000/1200 MHz to its typical initial values according to
+* the Armada 3700 Hardware Specifications.
+*/
+   if (dvfs->cpu_freq_max >= 1000*1000*1000) {
+   if (dvfs->cpu_freq_max >= 1200*1000*1000)
+   avs_min = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1200MHZ);
+   else
+   avs_min = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1000MHZ);
+   dvfs->avs[0] = dvfs->avs[1] = avs_min;
+   }
+
return;
}
 
@@ -258,6 +275,26 @@ static void __init armada37xx_cpufreq_avs_configure(struct 
regmap *base,
target_vm = avs_map[l0_vdd_min] - 150;
target_vm = target_vm > MIN_VOLT_MV ? target_vm : MIN_VOLT_MV;
dvfs->avs[2] = dvfs->avs[3] = armada_37xx_avs_val_match(target_vm);
+
+   /*
+* Fix the avs value for load L1 when base CPU frequency is 1000/1200 
MHz,
+* otherwise the CPU gets stuck when switching from load L1 to load L0.
+* Also ensure that avs value for load L1 is not higher than for L0.
+*/
+   if (dvfs->cpu_freq_max >= 1000*1000*1000) {
+   u32 avs_min_l1;
+
+   if (dvfs->cpu_freq_max >= 1200*1000*1000)
+   avs_min_l1 = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1200MHZ);
+   else
+   avs_min_l1 = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1000MHZ);
+
+   if (avs_min_l1 > dvfs->avs[0])
+   avs_min_l1 = dvfs->avs[0];
+
+   if (dvfs->avs[1] < avs_min_l1)
+

[PATCH mvebu v3 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-22 Thread Pali Rohár

Hello!

This is third version of patches for Armada 37xx cpufreq driver which
fix CPU scaling with 1 GHz base frequency.

The only change in this third version is modified patch 04/10 with fixes
for 1.2 GHz variant of Espressobin. Minimal CPU voltage in L1 load for
1.2 GHz variant was increased to 1.155V.

Patches are now rebased on top of the kernel version 5.11 with all
collected Acked-by/Tested-by lines and are available also in my git
tree in branch a3720-cpufreq-issues:

https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=a3720-cpufreq-issues

If you have other Armada 3720 boards with 1.2 GHz CPU, including
Espressobin V7, let us know if it is working fine for you.

Marek & Pali

Marek Behún (3):
  arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node
  cpufreq: armada-37xx: Fix setting TBG parent for load levels
  clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM
clock

Pali Rohár (7):
  cpufreq: armada-37xx: Fix the AVS value for load L1
  clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to
1 GHz
  clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1
to L0
  cpufreq: armada-37xx: Fix driver cleanup when registration failed
  cpufreq: armada-37xx: Fix determining base CPU frequency
  cpufreq: armada-37xx: Remove cur_frequency variable
  cpufreq: armada-37xx: Fix module unloading

 arch/arm64/boot/dts/marvell/armada-37xx.dtsi |   3 +-
 drivers/clk/mvebu/armada-37xx-periph.c   |  83 +++---
 drivers/cpufreq/armada-37xx-cpufreq.c| 111 +++
 3 files changed, 135 insertions(+), 62 deletions(-)

-- 
2.20.1

[PATCH mvebu v3 01/10] arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node

2021-02-22 Thread Pali Rohár

From: Marek Behún 

Add "syscon" compatible to the North Bridge clocks node to allow the
cpufreq driver to access these registers via syscon API.

This is needed for a fix of cpufreq driver.

Signed-off-by: Marek Behún 
Tested-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Tested-by: Anders Trier Olesen 
Tested-by: Philip Soares 
Fixes: e8d66e7927b2 ("arm64: dts: marvell: armada-37xx: add nodes...")
Cc: sta...@vger.kernel.org
---
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi 
b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
index d5b6c0a1c54a..a89e47d95eef 100644
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
@@ -156,7 +156,8 @@
};
 
nb_periph_clk: nb-periph-clk@13000 {
-   compatible = 
"marvell,armada-3700-periph-clock-nb";
+   compatible = 
"marvell,armada-3700-periph-clock-nb",
+"syscon";
reg = <0x13000 0x100>;
clocks = < 0>, < 1>, < 2>,
< 3>, <>;
-- 
2.20.1

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-22 Thread Pali Rohár

On Sunday 21 February 2021 19:17:40 nnet wrote:
> > Could you test if 1.155V voltage for L1 is stable on 1.2 GHz variant?
> 
> ++#define MIN_VOLT_MV_FOR_L1_1200MHZ 1155
> ...
> ++  if (avs_min_l1 > dvfs->avs[0])
> ++  avs_min_l1 = dvfs->avs[0];
> ++
> ++  if (dvfs->avs[1] < avs_min_l1)
> ++  dvfs->avs[1] = avs_min_l1;
> 
> This works fine. Tested with switching 600MHz to 1.2GHz under load.

Perfect! Therefore here is final version of patch 04/10 for both 1 GHz
and 1.2 GHz variants of A3720 SoC. I will resend whole patch series.
Could I add your Tested-by line to patch series?

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index b8dc6c849..c7683d447 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -73,6 +73,8 @@
 #define LOAD_LEVEL_NR  4
 
 #define MIN_VOLT_MV 1000
+#define MIN_VOLT_MV_FOR_L1_1000MHZ 1108
+#define MIN_VOLT_MV_FOR_L1_1200MHZ 1155
 
 /*  AVS value for the corresponding voltage (in mV) */
 static int avs_map[] = {
@@ -208,6 +210,8 @@ static u32 armada_37xx_avs_val_match(int target_vm)
  * - L2 & L3 voltage should be about 150mv smaller than L0 voltage.
  * This function calculates L1 & L2 & L3 AVS values dynamically based
  * on L0 voltage and fill all AVS values to the AVS value table.
+ * When base CPU frequency is 1000 or 1200 MHz then there is additional
+ * minimal avs value for load L1.
  */
 static void __init armada37xx_cpufreq_avs_configure(struct regmap *base,
struct armada_37xx_dvfs *dvfs)
@@ -239,6 +243,19 @@ static void __init armada37xx_cpufreq_avs_configure(struct 
regmap *base,
for (load_level = 1; load_level < LOAD_LEVEL_NR; load_level++)
dvfs->avs[load_level] = avs_min;
 
+   /*
+* Set the avs values for load L0 and L1 when base CPU frequency
+* is 1000/1200 MHz to its typical initial values according to
+* the Armada 3700 Hardware Specifications.
+*/
+   if (dvfs->cpu_freq_max >= 1000*1000*1000) {
+   if (dvfs->cpu_freq_max >= 1200*1000*1000)
+   avs_min = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1200MHZ);
+   else
+   avs_min = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1000MHZ);
+   dvfs->avs[0] = dvfs->avs[1] = avs_min;
+   }
+
return;
}
 
@@ -258,6 +275,26 @@ static void __init armada37xx_cpufreq_avs_configure(struct 
regmap *base,
target_vm = avs_map[l0_vdd_min] - 150;
target_vm = target_vm > MIN_VOLT_MV ? target_vm : MIN_VOLT_MV;
dvfs->avs[2] = dvfs->avs[3] = armada_37xx_avs_val_match(target_vm);
+
+   /*
+* Fix the avs value for load L1 when base CPU frequency is 1000/1200 
MHz,
+* otherwise the CPU gets stuck when switching from load L1 to load L0.
+* Also ensure that avs value for load L1 is not higher than for L0.
+*/
+   if (dvfs->cpu_freq_max >= 1000*1000*1000) {
+   u32 avs_min_l1;
+
+   if (dvfs->cpu_freq_max >= 1200*1000*1000)
+   avs_min_l1 = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1200MHZ);
+   else
+   avs_min_l1 = 
armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1000MHZ);
+
+   if (avs_min_l1 > dvfs->avs[0])
+   avs_min_l1 = dvfs->avs[0];
+
+   if (dvfs->avs[1] < avs_min_l1)
+   dvfs->avs[1] = avs_min_l1;
+   }
 }
 
 static void __init armada37xx_cpufreq_avs_setup(struct regmap *base,

Re: [PATCH] drivers: input: mouse: Change postive to positive in the file alps.c

2021-02-22 Thread Pali Rohár

On Monday 22 February 2021 00:07:40 Randy Dunlap wrote:
> On 2/21/21 11:54 PM, Bhaskar Chowdhury wrote:
> > 
> > s/postive/positive/
> > 
> > Signed-off-by: Bhaskar Chowdhury 
> 
> Acked-by: Randy Dunlap 

Reviewed-by: Pali Rohár 

> > ---
> >  drivers/input/mouse/alps.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
> > index b067bfd2699c..4a6b33bbe7ea 100644
> > --- a/drivers/input/mouse/alps.c
> > +++ b/drivers/input/mouse/alps.c
> > @@ -986,7 +986,7 @@ static void alps_get_finger_coordinate_v7(struct 
> > input_mt_pos *mt,
> > case V7_PACKET_ID_TWO:
> > mt[1].x &= ~0x000F;
> > mt[1].y |= 0x000F;
> > -   /* Detect false-postive touches where x & y report max value */
> > +   /* Detect false-positive touches where x & y report max value */
> > if (mt[1].y == 0x7ff && mt[1].x == 0xff0) {
> > mt[1].x = 0;
> > /* y gets set to 0 at the end of this function */
> > --
> 
> 
> -- 
> ~Randy
>

Re: [PATCH v3 2/3] docs: arm: marvell: drop a dead links

2021-02-21 Thread Pali Rohár

> This one has gone away.
...
> --- a/Documentation/arm/marvell.rst
> +++ b/Documentation/arm/marvell.rst
> @@ -399,7 +399,6 @@ Berlin family (Multimedia Solutions)
>- Flavors:
>   - 88DE3010, Armada 1000 (no Linux support)
>   - Core: Marvell PJ1 (ARMv5TE), Dual-core
> - - Product Brief:
> http://www.marvell.com.cn/digital-entertainment/assets/armada_1000_pb.pdf

It is still in web archive:
https://web.archive.org/web/20131103162620/http://www.marvell.com/digital-entertainment/assets/armada_1000_pb.pdf

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-19 Thread Pali Rohár

On Tuesday 16 February 2021 08:27:10 nnet wrote:
> > Therefore I'm thinking if the correct way is instead to use L1 := L0 
> > voltage value for 1/1.2 GHz mode.
> 
> This latest 04/10 works fine for me going 600MHz <-> 1.2GHz under with and 
> without load.

Ok, thanks for testing! Just to note that typical documented value for
1.2GHz mode is 1.155V, so it would be useful to know if this value could
be stable for L1 with 1.2GHz mode.

I'm thinking that for 1GHz variant it would be better to rather use
1.108V like in my original patch as this is already tested by lot of
people, nobody complained yet and it can be lower value as L0 (so there
is benefit to decrease CPU frequency when CPU is idle).

For 1.2GHz variant I still do not know. You wrote that 1.132V is
unstable, so it cannot be used for sure. Documented typical value 1.155V
is bigger, so maybe it can be stable but needs testing. And stable seems
to be L0 value... But then I do not see a benefit for downclocking CPU
from 1.2 GHz frequency in L0 to 600 MHz freq. in L1 if it use same CPU
voltage... But it is still better than unstable CPU with crashes!

Could you test if 1.155V voltage for L1 is stable on 1.2 GHz variant?

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-16 Thread Pali Rohár

On Monday 15 February 2021 21:48:08 nnet wrote:
> > Could you test following change instead of PATCH 04/10? I added here also
> > logic for 1.2 GHz variant with 1.132 V value another change is that
> > value for load L0 is not touched as it is stable.
> 
> These changes to patch 04/10 worked going 600 MHz <-> 1.2 GHz , _but_ only 
> with:
> 
> ++#define MIN_VOLT_MV_FOR_L1_1200MHZ 1213
> 
> During this latest testing I saw freezes with 1132 mV.
> 
> I've had no lockups with 1213 mV which I just used from the values for L1/L0 
> from OTP.

I still do not know what is the meaning of values stored in OTP...
And there are more non-zero bits which are not used (yet).

Anyway, with your another test it looks like that limit is not based on
fixed value but rather on current L0 value. Therefore I'm thinking if
the correct way is instead to use L1 := L0 voltage value for 1/1.2 GHz
mode. Could you try following change instead of previous and PATCH 04/10?

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
b/drivers/cpufreq/armada-37xx-cpufreq.c
index b8dc6c849..12d0ff7b1 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -208,6 +208,8 @@ static u32 armada_37xx_avs_val_match(int target_vm)
  * - L2 & L3 voltage should be about 150mv smaller than L0 voltage.
  * This function calculates L1 & L2 & L3 AVS values dynamically based
  * on L0 voltage and fill all AVS values to the AVS value table.
+ * When base CPU frequency is 1000 or 1200 MHz then there is additional
+ * minimal avs value for load L1.
  */
 static void __init armada37xx_cpufreq_avs_configure(struct regmap *base,
struct armada_37xx_dvfs *dvfs)
@@ -239,17 +241,36 @@ static void __init 
armada37xx_cpufreq_avs_configure(struct regmap *base,
for (load_level = 1; load_level < LOAD_LEVEL_NR; load_level++)
dvfs->avs[load_level] = avs_min;
 
+   /*
+* Set the avs values for load L0 and L1 when base CPU frequency
+* is 1000/1200 MHz to its typical initial values according to
+* the Armada 3700 Hardware Specifications.
+*/
+   if (dvfs->cpu_freq_max >= 1000*1000*1000) {
+   if (dvfs->cpu_freq_max >= 1200*1000*1000)
+   avs_min = armada_37xx_avs_val_match(1155);
+   else
+   avs_min = armada_37xx_avs_val_match(1108);
+   dvfs->avs[0] = dvfs->avs[1] = avs_min;
+   }
+
return;
}
 
/*
 * L1 voltage is equal to L0 voltage - 100mv and it must be
-* larger than 1000mv
+* larger than 1000mv. When base CPU frequency is 1000/1200 MHz,
+* L1 voltage must must be equal to L0 voltage, otherwise
+* the CPU gets stuck when switching from load L1 to load L0.
 */
 
-   target_vm = avs_map[l0_vdd_min] - 100;
-   target_vm = target_vm > MIN_VOLT_MV ? target_vm : MIN_VOLT_MV;
-   dvfs->avs[1] = armada_37xx_avs_val_match(target_vm);
+   if (dvfs->cpu_freq_max >= 1000*1000*1000) {
+   dvfs->avs[1] = dvfs->avs[0];
+   } else {
+   target_vm = avs_map[l0_vdd_min] - 100;
+   target_vm = target_vm > MIN_VOLT_MV ? target_vm : MIN_VOLT_MV;
+   dvfs->avs[1] = armada_37xx_avs_val_match(target_vm);
+   }
 
/*
 * L2 & L3 voltage is equal to L0 voltage - 150mv and it must

Re: [PATCH net-next 1/2] net: mvneta: Remove per-cpu queue mapping for Armada 3700

2021-02-14 Thread Pali Rohár

> According to Errata #23 "The per-CPU GbE interrupt is limited to Core
> 0", we can't use the per-cpu interrupt mechanism on the Armada 3700
> familly.
> 
> This is correctly checked for RSS configuration, but the initial queue
> mapping is still done by having the queues spread across all the CPUs in
> the system, both in the init path and in the cpu_hotplug path.

Hello Maxime!

This patch looks like a bug fix for Armada 3700 SoC. What about marking
this commit with Fixes line? E.g.:

Fixes: 2636ac3cc2b4 ("net: mvneta: Add network support for Armada 3700 SoC")

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-14 Thread Pali Rohár

On Saturday 13 February 2021 10:30:19 nnet wrote:
> On Sat, Feb 13, 2021, at 2:01 AM, Pali Rohár wrote:
> > On Thursday 11 February 2021 16:41:13 nnet wrote:
> > > On Thu, Feb 11, 2021, at 3:44 PM, Pali Rohár wrote:
> > > > On Thursday 11 February 2021 12:22:52 nnet wrote:
> > > > > On Thu, Feb 11, 2021, at 11:55 AM, Pali Rohár wrote:
> > > > > > On Wednesday 10 February 2021 11:08:59 nnet wrote:
> > > > > > > On Wed, Feb 10, 2021, at 10:03 AM, Pali Rohár wrote:
> > > > > > > > > > Hello! Could you please enable userspace governor during 
> > > > > > > > > > kernel
> > > > > > > > > > compilation?
> > > > > > > > > > 
> > > > > > > > > > CONFIG_CPU_FREQ_GOV_USERSPACE=y
> > > > > > > > > > 
> > > > > > > > > > It can be activated via command:
> > > > > > > > > > 
> > > > > > > > > > echo userspace > 
> > > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > > > > > > 
> > > > > > > > > > After that you can "force" CPU frequency to specific value, 
> > > > > > > > > > e.g.:
> > > > > > > > > > 
> > > > > > > > > > echo 100 > 
> > > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
> > > > > > > > > > 
> > > > > > > > > > I need to know which switch (from --> to freq) cause this 
> > > > > > > > > > system hang.
> > > > > > > > > > 
> > > > > > > > > > This patch series (via MIN_VOLT_MV_FOR_L0_L1_1GHZ) is 
> > > > > > > > > > fixing only
> > > > > > > > > > switching from 500 MHz to 1000 MHz on 1 GHz variant. As 
> > > > > > > > > > only this switch
> > > > > > > > > > is causing issue.
> > > > > > > > > > 
> > > > > > > > > > I have used following simple bash script to check that 
> > > > > > > > > > switching between
> > > > > > > > > > 500 MHz and 1 GHz is stable:
> > > > > > > > > > 
> > > > > > > > > > while true; do
> > > > > > > > > > echo 100 > 
> > > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > > > echo 50 > 
> > > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > > > echo 100 > 
> > > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > > > echo 50 > 
> > > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > > > done
> > > > > > > > > 
> > > > > > > > > echo userspace | tee 
> > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > > > > > while true; do
> > > > > > > > >   echo 120 | tee 
> > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > >   echo 60 | tee 
> > > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > > done
> > > > > > > > > 
> > > > > > > > > >> +#define MIN_VOLT_MV_FOR_L0_L1_1GHZ 1108
> > > > > > > > > 
> > > > > > > > > With 1108 I get a freeze within a minute. The last output to 
> > > > > > > > > stdout is 60.
> > > > > > > > > 
> > > > > > > > > With 1120 it takes a few minutes.
> > > > > > > > > 
> > > > > > > > > With any of 1225, 1155, 1132 the device doesn't freeze over 
> > > > > > > &

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-13 Thread Pali Rohár

On Thursday 11 February 2021 16:41:13 nnet wrote:
> On Thu, Feb 11, 2021, at 3:44 PM, Pali Rohár wrote:
> > On Thursday 11 February 2021 12:22:52 nnet wrote:
> > > On Thu, Feb 11, 2021, at 11:55 AM, Pali Rohár wrote:
> > > > On Wednesday 10 February 2021 11:08:59 nnet wrote:
> > > > > On Wed, Feb 10, 2021, at 10:03 AM, Pali Rohár wrote:
> > > > > > > > Hello! Could you please enable userspace governor during kernel
> > > > > > > > compilation?
> > > > > > > > 
> > > > > > > > CONFIG_CPU_FREQ_GOV_USERSPACE=y
> > > > > > > > 
> > > > > > > > It can be activated via command:
> > > > > > > > 
> > > > > > > > echo userspace > 
> > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > > > > 
> > > > > > > > After that you can "force" CPU frequency to specific value, 
> > > > > > > > e.g.:
> > > > > > > > 
> > > > > > > > echo 100 > 
> > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
> > > > > > > > 
> > > > > > > > I need to know which switch (from --> to freq) cause this 
> > > > > > > > system hang.
> > > > > > > > 
> > > > > > > > This patch series (via MIN_VOLT_MV_FOR_L0_L1_1GHZ) is fixing 
> > > > > > > > only
> > > > > > > > switching from 500 MHz to 1000 MHz on 1 GHz variant. As only 
> > > > > > > > this switch
> > > > > > > > is causing issue.
> > > > > > > > 
> > > > > > > > I have used following simple bash script to check that 
> > > > > > > > switching between
> > > > > > > > 500 MHz and 1 GHz is stable:
> > > > > > > > 
> > > > > > > > while true; do
> > > > > > > > echo 100 > 
> > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > echo 50 > 
> > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > echo 100 > 
> > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > echo 50 > 
> > > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > > done
> > > > > > > 
> > > > > > > echo userspace | tee 
> > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > > > while true; do
> > > > > > >   echo 120 | tee 
> > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > >   echo 60 | tee 
> > > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > > done
> > > > > > > 
> > > > > > > >> +#define MIN_VOLT_MV_FOR_L0_L1_1GHZ 1108
> > > > > > > 
> > > > > > > With 1108 I get a freeze within a minute. The last output to 
> > > > > > > stdout is 60.
> > > > > > > 
> > > > > > > With 1120 it takes a few minutes.
> > > > > > > 
> > > > > > > With any of 1225, 1155, 1132 the device doesn't freeze over the 
> > > > > > > full 5 minute load test.
> > > > > > > 
> > > > > > > I'm using ondemand now with the above at 1132 without issue so 
> > > > > > > far.
> > > > > > 
> > > > > > Great, thank you for testing!
> > > > > > 
> > > > > > Can you check if switching between any two lower frequencies 20
> > > > > > 30 60 is stable?
> > > > > 
> > > > > This is stable using 1132 mV for MIN_VOLT_MV_FOR_L0_L1_1GHZ:
> > > > > 
> > > > > while true; do
> > > > >   # down
> > > > >   echo 120 | tee 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/s

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-11 Thread Pali Rohár

On Thursday 11 February 2021 12:22:52 nnet wrote:
> On Thu, Feb 11, 2021, at 11:55 AM, Pali Rohár wrote:
> > On Wednesday 10 February 2021 11:08:59 nnet wrote:
> > > On Wed, Feb 10, 2021, at 10:03 AM, Pali Rohár wrote:
> > > > > > Hello! Could you please enable userspace governor during kernel
> > > > > > compilation?
> > > > > > 
> > > > > > CONFIG_CPU_FREQ_GOV_USERSPACE=y
> > > > > > 
> > > > > > It can be activated via command:
> > > > > > 
> > > > > > echo userspace > 
> > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > > 
> > > > > > After that you can "force" CPU frequency to specific value, e.g.:
> > > > > > 
> > > > > > echo 100 > 
> > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
> > > > > > 
> > > > > > I need to know which switch (from --> to freq) cause this system 
> > > > > > hang.
> > > > > > 
> > > > > > This patch series (via MIN_VOLT_MV_FOR_L0_L1_1GHZ) is fixing only
> > > > > > switching from 500 MHz to 1000 MHz on 1 GHz variant. As only this 
> > > > > > switch
> > > > > > is causing issue.
> > > > > > 
> > > > > > I have used following simple bash script to check that switching 
> > > > > > between
> > > > > > 500 MHz and 1 GHz is stable:
> > > > > > 
> > > > > > while true; do
> > > > > > echo 100 > 
> > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > echo 50 > 
> > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > echo 100 > 
> > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > echo 50 > 
> > > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > > done
> > > > > 
> > > > > echo userspace | tee 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > while true; do
> > > > >   echo 120 | tee 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > >   echo 60 | tee 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > done
> > > > > 
> > > > > >> +#define MIN_VOLT_MV_FOR_L0_L1_1GHZ 1108
> > > > > 
> > > > > With 1108 I get a freeze within a minute. The last output to stdout 
> > > > > is 60.
> > > > > 
> > > > > With 1120 it takes a few minutes.
> > > > > 
> > > > > With any of 1225, 1155, 1132 the device doesn't freeze over the full 
> > > > > 5 minute load test.
> > > > > 
> > > > > I'm using ondemand now with the above at 1132 without issue so far.
> > > > 
> > > > Great, thank you for testing!
> > > > 
> > > > Can you check if switching between any two lower frequencies 20
> > > > 30 60 is stable?
> > > 
> > > This is stable using 1132 mV for MIN_VOLT_MV_FOR_L0_L1_1GHZ:
> > > 
> > > while true; do
> > >   # down
> > >   echo 120 | tee 
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > ...
> > 
> > Hello!
> > 
> > Could you please re-run test without tee, in form as I have shown above?
> > UART is slow and printing something to console adds delay which decrease
> > probability that real issue is triggered as this is timing issue.
> 
> The test was done over SSH.

Ok! But it is still better to not print any results as it adds unwanted
delay between frequency switching.

> > Also please do tests just between two frequencies in loop as I observed
> > that switching between more decreased probability to hit issue.
> 
> > > > > echo userspace | tee 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > while true; do
> > > > >   echo 120 | tee 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > >   echo 60 | tee 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > > done
> 
> The first test ^ switched between 600 MHz and 1.2 GHz.
> 
> > The real issue for 1 GHz variant of A3720 is only when doing switch from
> > 500 MHz to 1 GHz. So could you try to do some tests also without
> > changing MIN_VOLT_MV_FOR_L0_L1_1GHZ and switching just between non-1.2
> > frequencies (to verify that on 1.2 GHz variant it is also from 600 MHz
> > to 1.2 GHz)?
> 
> With 1108 mV and switching between 600 MHz and 1.2GHz, I always saw a freeze 
> within a minute.

I mean to try switching with 1.108 V between 200 MHz and 300 MHz or
between 300 MHz and 600 MHz. To check that issue is really only with
switch from 600 MHz to 1.2 GHz.

I need to know if current settings are fine for 200, 300 and 600 MHz
frequencies and the only 600 --> 1200 needs to be fixed.

Re: [PATCH v2 05/12] arm64: dts: marvell: armada-3720-db: add comphy references

2021-02-11 Thread Pali Rohár

On Wednesday 10 February 2021 16:09:42 kos...@marvell.com wrote:
> From: Grzegorz Jaszczyk 
> 
> Adding phy description to pcie, sata and usb will allow appropriate drivers
> to configure marvell comphy-a3700 accordingly.
> 
> Signed-off-by: Grzegorz Jaszczyk 
> Signed-off-by: Konstantin Porotchkin 

Hello! This patch is not needed too as Gregory already included into his
tree alternative patch which defines SATA PHY globally into main include
file armada-37xx.dtsi:

https://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu.git/commit/?h=for-next=6ece0f7dbd558670ec72ba390379949a4d4dc5c0

And PCIe and USB 3.0 PHY definitions are already in include file.

> ---
>  arch/arm64/boot/dts/marvell/armada-3720-db.dts | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/marvell/armada-3720-db.dts 
> b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
> index 3e5789f37206..15e923f945d4 100644
> --- a/arch/arm64/boot/dts/marvell/armada-3720-db.dts
> +++ b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
> @@ -132,11 +132,15 @@
>   pinctrl-0 = <_reset_pins _clkreq_pins>;
>   reset-gpios = < 3 GPIO_ACTIVE_LOW>;
>   status = "okay";
> + /* Generic PHY, providing serdes lanes */
> + phys = < 0>;
>  };
>  
>  /* CON3 */
>   {
>   status = "okay";
> + /* Generic PHY, providing serdes lanes */
> + phys = < 0>;
>  };
>  
>   {
> @@ -217,4 +221,7 @@
>   {
>   status = "okay";
>   usb-phy = <_phy>;
> + /* Generic PHY, providing serdes lanes */
> + phys = < 0>;
> + phy-names = "usb";
>  };
> -- 
> 2.17.1
>

Re: [PATCH v2 06/12] arm64: dts: marvell: armada-3270-espressobin: add comphy references

2021-02-11 Thread Pali Rohár

On Wednesday 10 February 2021 16:09:43 kos...@marvell.com wrote:
> From: Grzegorz Jaszczyk 
> 
> Add "phys" entries pointing to COMPHYs to PCIe and USB3 nodes
> 
> Signed-off-by: Grzegorz Jaszczyk 
> Signed-off-by: Konstantin Porotchkin 

Hello! This patch is not needed and now does nothing.

> ---
>  arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi 
> b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi
> index daffe136c523..bbd955909813 100644
> --- a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi
> +++ b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dtsi
> @@ -59,6 +59,8 @@
>  /* J9 */
>   {
>   status = "okay";
> + /* Generic PHY, providing serdes lanes */
> + phys = < 0>;

In mainline kernel is PCIe PHY already provided in armada-37xx.dtsi:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/marvell/armada-37xx.dtsi#n497

>   pinctrl-names = "default";
>   pinctrl-0 = <_reset_pins _clkreq_pins>;
>   reset-gpios = < 3 GPIO_ACTIVE_LOW>;
> @@ -139,6 +141,9 @@
>  /* J7 */
>   {
>   status = "okay";
> + /* Generic PHY, providing serdes lanes */
> + phys = < 0>;
> + phy-names = "usb";

In mainline kernel is USB 3.0 PHY already provided in armada-37xx.dtsi:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/marvell/armada-37xx.dtsi#n359

>  };
>  
>  /* J8 */
> -- 
> 2.17.1
> 

So final binary espressobin DTB files are same with and without this
patch.

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-11 Thread Pali Rohár

On Wednesday 10 February 2021 11:08:59 nnet wrote:
> On Wed, Feb 10, 2021, at 10:03 AM, Pali Rohár wrote:
> > > > Hello! Could you please enable userspace governor during kernel
> > > > compilation?
> > > > 
> > > > CONFIG_CPU_FREQ_GOV_USERSPACE=y
> > > > 
> > > > It can be activated via command:
> > > > 
> > > > echo userspace > 
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > 
> > > > After that you can "force" CPU frequency to specific value, e.g.:
> > > > 
> > > > echo 100 > 
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
> > > > 
> > > > I need to know which switch (from --> to freq) cause this system hang.
> > > > 
> > > > This patch series (via MIN_VOLT_MV_FOR_L0_L1_1GHZ) is fixing only
> > > > switching from 500 MHz to 1000 MHz on 1 GHz variant. As only this switch
> > > > is causing issue.
> > > > 
> > > > I have used following simple bash script to check that switching between
> > > > 500 MHz and 1 GHz is stable:
> > > > 
> > > > while true; do
> > > > echo 100 > 
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > echo 50 > 
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > echo 100 > 
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > echo 50 > 
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > > done
> > > 
> > > echo userspace | tee 
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > while true; do
> > >   echo 120 | tee 
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > >   echo 60 | tee 
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > > done
> > > 
> > > >> +#define MIN_VOLT_MV_FOR_L0_L1_1GHZ 1108
> > > 
> > > With 1108 I get a freeze within a minute. The last output to stdout is 
> > > 60.
> > > 
> > > With 1120 it takes a few minutes.
> > > 
> > > With any of 1225, 1155, 1132 the device doesn't freeze over the full 5 
> > > minute load test.
> > > 
> > > I'm using ondemand now with the above at 1132 without issue so far.
> > 
> > Great, thank you for testing!
> > 
> > Can you check if switching between any two lower frequencies 20
> > 30 60 is stable?
> 
> This is stable using 1132 mV for MIN_VOLT_MV_FOR_L0_L1_1GHZ:
> 
> while true; do
>   # down
>   echo 120 | tee /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
...

Hello!

Could you please re-run test without tee, in form as I have shown above?
UART is slow and printing something to console adds delay which decrease
probability that real issue is triggered as this is timing issue.

Also please do tests just between two frequencies in loop as I observed
that switching between more decreased probability to hit issue.

The real issue for 1 GHz variant of A3720 is only when doing switch from
500 MHz to 1 GHz. So could you try to do some tests also without
changing MIN_VOLT_MV_FOR_L0_L1_1GHZ and switching just between non-1.2
frequencies (to verify that on 1.2 GHz variant it is also from 600 MHz
to 1.2 GHz)?

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-10 Thread Pali Rohár

On Wednesday 10 February 2021 09:34:07 nnet wrote:
> On Wed, Feb 10, 2021, at 1:23 AM, Pali Rohár wrote:
> > On Tuesday 09 February 2021 18:07:41 nnet wrote:
> > > On Tue, Feb 9, 2021, at 5:51 PM, nnet wrote:
> > > > On Tue, Feb 9, 2021, at 5:31 PM, nnet wrote:
> > > > > On Tue, Feb 9, 2021, at 3:26 PM, Marek Behún wrote:
> > > > > > On Tue, 09 Feb 2021 15:16:45 -0800
> > > > > > nnet  wrote:
> > > > > > 
> > > > > > > I've two of these and I've just swapped them (and re-pasted the 
> > > > > > > heat sinks).
> > > > > > > 
> > > > > > > The second one ran under load for awhile and now has frozen as 
> > > > > > > well.
> > > > > > > 
> > > > > > > Under a moderate load `wget -O /dev/null ` @X00Mbits 
> > > > > > > they are fine.
> > > > > > > 
> > > > > > > Under a 1 min speed test of load ~200Mbits routed WireGuard they 
> > > > > > > freeze.
> > > > > > > 
> > > > > > > They fine with both those workloads @1000_800.
> > > > > > > 
> > > > > > > Perhaps it's heat? Unfortunately I don't have any numbers on that 
> > > > > > > ATM.
> > > > > > 
> > > > > > Try disabling cpufreq in kernel completely, compile boot image at
> > > > > > 1200 MHz. If it continues freezing, then I fear we can't help you 
> > > > > > with
> > > > > > 1200 MHz :(
> > > > > 
> > > > > cat 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies 
> > > > > 20 30 60 120 
> > > > > 
> > > > > I'm not getting any freezes with 1.2GHz fixed after 20 minutes of 
> > > > > load:
> > > > > 
> > > > > echo 120 > 
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
> > > > > 
> > > > > Setting it back to min 200MHz I get a freeze within a minute:
> > > > > 
> > > > > echo 20 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
> > 
> > Hello! Could you please enable userspace governor during kernel
> > compilation?
> > 
> > CONFIG_CPU_FREQ_GOV_USERSPACE=y
> > 
> > It can be activated via command:
> > 
> > echo userspace > 
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > 
> > After that you can "force" CPU frequency to specific value, e.g.:
> > 
> > echo 100 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
> > 
> > I need to know which switch (from --> to freq) cause this system hang.
> > 
> > This patch series (via MIN_VOLT_MV_FOR_L0_L1_1GHZ) is fixing only
> > switching from 500 MHz to 1000 MHz on 1 GHz variant. As only this switch
> > is causing issue.
> > 
> > I have used following simple bash script to check that switching between
> > 500 MHz and 1 GHz is stable:
> > 
> > while true; do
> > echo 100 > 
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > echo 50 > 
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > echo 100 > 
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > echo 50 > 
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> > done
> 
> echo userspace | tee /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> while true; do
>   echo 120 | tee /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
>   echo 60 | tee /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
> done
> 
> >> +#define MIN_VOLT_MV_FOR_L0_L1_1GHZ 1108
> 
> With 1108 I get a freeze within a minute. The last output to stdout is 60.
> 
> With 1120 it takes a few minutes.
> 
> With any of 1225, 1155, 1132 the device doesn't freeze over the full 5 minute 
> load test.
> 
> I'm using ondemand now with the above at 1132 without issue so far.

Great, thank you for testing!

Can you check if switching between any two lower frequencies 20
30 60 is stable?

> >> Update the CPU voltage value for loads L0 and L1 accordingly when base
> >> frequency is 1000 or 1200 MHz. The minimal value is updated from the
> >> original 1.05V to 1.108V.
> 
> Perhaps similiar to how

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-10 Thread Pali Rohár

On Tuesday 09 February 2021 18:07:41 nnet wrote:
> On Tue, Feb 9, 2021, at 5:51 PM, nnet wrote:
> > On Tue, Feb 9, 2021, at 5:31 PM, nnet wrote:
> > > On Tue, Feb 9, 2021, at 3:26 PM, Marek Behún wrote:
> > > > On Tue, 09 Feb 2021 15:16:45 -0800
> > > > nnet  wrote:
> > > > 
> > > > > I've two of these and I've just swapped them (and re-pasted the heat 
> > > > > sinks).
> > > > > 
> > > > > The second one ran under load for awhile and now has frozen as well.
> > > > > 
> > > > > Under a moderate load `wget -O /dev/null ` @X00Mbits they 
> > > > > are fine.
> > > > > 
> > > > > Under a 1 min speed test of load ~200Mbits routed WireGuard they 
> > > > > freeze.
> > > > > 
> > > > > They fine with both those workloads @1000_800.
> > > > > 
> > > > > Perhaps it's heat? Unfortunately I don't have any numbers on that ATM.
> > > > 
> > > > Try disabling cpufreq in kernel completely, compile boot image at
> > > > 1200 MHz. If it continues freezing, then I fear we can't help you with
> > > > 1200 MHz :(
> > > 
> > > cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies 
> > > 20 30 60 120 
> > > 
> > > I'm not getting any freezes with 1.2GHz fixed after 20 minutes of load:
> > > 
> > > echo 120 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
> > > 
> > > Setting it back to min 200MHz I get a freeze within a minute:
> > > 
> > > echo 20 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq

Hello! Could you please enable userspace governor during kernel
compilation?

CONFIG_CPU_FREQ_GOV_USERSPACE=y

It can be activated via command:

echo userspace > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

After that you can "force" CPU frequency to specific value, e.g.:

echo 100 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed

I need to know which switch (from --> to freq) cause this system hang.

This patch series (via MIN_VOLT_MV_FOR_L0_L1_1GHZ) is fixing only
switching from 500 MHz to 1000 MHz on 1 GHz variant. As only this switch
is causing issue.

I have used following simple bash script to check that switching between
500 MHz and 1 GHz is stable:

while true; do
echo 100 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
echo 50 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
echo 100 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
echo 50 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed;
done

(of course on 1.2 GHz variant you need to adjust values as only
following frequencies 20 30 60 120 are supported)

> > > > Marek
> > > >
> > 
> > > +#define MIN_VOLT_MV_FOR_L0_L1_1GHZ 1108
> > 
> > Based on the below at boot time might an equivalent of the above need 
> > to be 1225 for 1.2GHz?
> > 
> > 1200_750
> > SVC REV: 5, CPU VDD voltage: 1.225V
> > 
> > 1000_800
> > SVC REV: 5, CPU VDD voltage: 1.108V

This value is printed in WTMI avs.c by following code:

shift = OTP_SVC_SPEED_1000_OFF;
(OR)
shift = OTP_SVC_SPEED_1200_OFF;

vdd_otp = ((otp_data[OTP_DATA_SVC_SPEED_ID] >> shift) +
AVS_VDD_BASE) & AVS_VDD_MASK;
regval |= (vdd_otp << HIGH_VDD_LIMIT_OFF);
regval |= (vdd_otp << LOW_VDD_LIMIT_OFF);
printf("SVC REV: %d, CPU VDD voltage: %s\n", svc_rev,
avis_dump[vdd_otp].desc);

So voltage value is read from the OTP memory.

But I do not know what this value means.

> I did this for a quick test for 1.2GHz:
> 
> +#define MIN_VOLT_MV_FOR_L0_L1_1GHZ 1225
> 
> This is working well so far. Frequency is shifting up/down with load 
> applied/stopped.

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-09 Thread Pali Rohár

On Tuesday 09 February 2021 14:52:56 nnet wrote:
> On Tue, Feb 9, 2021, at 2:42 PM, Pali Rohár wrote:
> > On Tuesday 09 February 2021 13:45:08 nnet wrote:
> > > On Tue, Feb 9, 2021, at 1:33 PM, Pali Rohár wrote:
> > > > On Tuesday 09 February 2021 13:00:26 nnet wrote:
> > > > > > If you have other Armada 3720 boards (Espressobin v5/v7, uDPU, 
> > > > > > Devel Board, ...) then it will be nice to do an additional tests 
> > > > > > and check if instability issues are finally fixed.
> > > > > 
> > > > > These patches applied to the 5.4.96 in OpenWrt (98d61b5) work fine so 
> > > > > far on an Espressobin v7 AFAICT per changing values in 
> > > > > /sys/devices/system/cpu/cpufreq/policy0.
> > > > > 
> > > > > Are these changes intended to work @1.2 GHz on the v7?
> > > > 
> > > > Hello! Do you have 1.2 GHz A3720 SoC?
> > > 
> > > Maybe (not)? ESPRESSObin_V7_1-0 on the underside.
> > 
> > Look at the package of SoC chip. On top of the package is printed
> > identifier 88F3720. On the last line should be one of the string:
> > C080, C100, C120, I080, I100 which identifies frequency
> > (080 = 800 MHz, 100 = 1 GHz, 120 = 1.2 GHz)
> > 
> > Can you check what is printed on A3720 SoC package?
> 
> Both of mine are 88F3720-A0-C120.

Interesting... it is really 1.2 GHz. I have not seen it yet. And few
weeks ago I thought that it was never available on market.

Well, if 1.2 GHz variant has same issues as 1 GHz variants then this
patch series should fix it. But you said that with this patch series is
board crashing when running at 1.2 GHz?

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-09 Thread Pali Rohár

On Tuesday 09 February 2021 13:45:08 nnet wrote:
> On Tue, Feb 9, 2021, at 1:33 PM, Pali Rohár wrote:
> > On Tuesday 09 February 2021 13:00:26 nnet wrote:
> > > > If you have other Armada 3720 boards (Espressobin v5/v7, uDPU, Devel 
> > > > Board, ...) then it will be nice to do an additional tests and check if 
> > > > instability issues are finally fixed.
> > > 
> > > These patches applied to the 5.4.96 in OpenWrt (98d61b5) work fine so far 
> > > on an Espressobin v7 AFAICT per changing values in 
> > > /sys/devices/system/cpu/cpufreq/policy0.
> > > 
> > > Are these changes intended to work @1.2 GHz on the v7?
> > 
> > Hello! Do you have 1.2 GHz A3720 SoC?
> 
> Maybe (not)? ESPRESSObin_V7_1-0 on the underside.

Look at the package of SoC chip. On top of the package is printed
identifier 88F3720. On the last line should be one of the string:
C080, C100, C120, I080, I100 which identifies frequency
(080 = 800 MHz, 100 = 1 GHz, 120 = 1.2 GHz)

Can you check what is printed on A3720 SoC package?

> BTW, with the 1200_750 firmware and the patches:
> 
> root@OpenWrt:/sys/devices/system/cpu/cpufreq/policy0# cat 
> scaling_available_frequencies 
> 20 30 60 120
> 
> Of course that could mean nothing, but thought I'd mention what I do see.

This is value set by firmware file which you have started.

Re: [PATCH mvebu v2 00/10] Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from 1000 MHz

2021-02-09 Thread Pali Rohár

On Tuesday 09 February 2021 13:00:26 nnet wrote:
> > If you have other Armada 3720 boards (Espressobin v5/v7, uDPU, Devel Board, 
> > ...) then it will be nice to do an additional tests and check if 
> > instability issues are finally fixed.
> 
> These patches applied to the 5.4.96 in OpenWrt (98d61b5) work fine so far on 
> an Espressobin v7 AFAICT per changing values in 
> /sys/devices/system/cpu/cpufreq/policy0.
> 
> Are these changes intended to work @1.2 GHz on the v7?

Hello! Do you have 1.2 GHz A3720 SoC?

Re: [PATCH 1/2] PCI: also set up legacy files only after sysfs init

2021-02-05 Thread Pali Rohár

On Friday 05 February 2021 10:59:50 Daniel Vetter wrote:
> On Thu, Feb 4, 2021 at 11:24 PM Pali Rohár  wrote:
> >
> > On Thursday 04 February 2021 15:50:19 Bjorn Helgaas wrote:
> > > [+cc Oliver, Pali, Krzysztof]
> >
> > Just to note that extending or using sysfs_initialized introduces
> > another race condition into kernel code which results in PCI fatal
> > errors. Details are in email discussion which Bjorn already sent.
> 
> Yeah I wondered why this doesn't race.

It races, but with smaller probability. I have not seen this race
condition on x86. But I was able to reproduce it with native PCIe
drivers on ARM64 (Marvell Armada 3720; pci-aardvark). In mentioned
discussion I wrote when this race condition happen. But I understand
that it is hard to simulate it.

> but since the history goes back
> to pre-git times I figured it would have been addressed somehow
> already if it indeed does race.
> -Daniel
> 
> > > s/also/Also/ in subject
> > >
> > > On Thu, Feb 04, 2021 at 05:58:30PM +0100, Daniel Vetter wrote:
> > > > We are already doing this for all the regular sysfs files on PCI
> > > > devices, but not yet on the legacy io files on the PCI buses. Thus far
> > > > now problem, but in the next patch I want to wire up iomem revoke
> > > > support. That needs the vfs up an running already to make so that
> > > > iomem_get_mapping() works.
> > >
> > > s/now problem/no problem/
> > > s/an running/and running/
> > > s/so that/sure that/ ?
> > >
> > > iomem_get_mapping() doesn't exist; I don't know what that should be.
> > >
> > > > Wire it up exactly like the existing code. Note that
> > > > pci_remove_legacy_files() doesn't need a check since the one for
> > > > pci_bus->legacy_io is sufficient.
> > >
> > > I'm not sure exactly what you mean by "the existing code."  I could
> > > probably figure it out, but it would save time to mention the existing
> > > function here.
> > >
> > > This looks like another instance where we should really apply Oliver's
> > > idea of converting these to attribute_groups [1].
> > >
> > > The cover letter mentions options discussed with Greg in [2], but I
> > > don't think the "sysfs_initialized" hack vs attribute_groups was part
> > > of that discussion.
> > >
> > > It's not absolutely a show-stopper, but it *is* a shame to extend the
> > > sysfs_initialized hack if attribute_groups could do this more cleanly
> > > and help solve more than one issue.
> > >
> > > Bjorn
> > >
> > > [1] 
> > > https://lore.kernel.org/r/caosf1chss03dbsdo4pmttmp0tceu5kscn704zewlkgxqzbf...@mail.gmail.com
> > > [2] 
> > > https://lore.kernel.org/dri-devel/cakmk7ugrddrbtj0oyzqqc0cgrqwc2f3tfju9vlfm2jjufaz...@mail.gmail.com/
> > >
> > > > Signed-off-by: Daniel Vetter 
> > > > Cc: Stephen Rothwell 
> > > > Cc: Jason Gunthorpe 
> > > > Cc: Kees Cook 
> > > > Cc: Dan Williams 
> > > > Cc: Andrew Morton 
> > > > Cc: John Hubbard 
> > > > Cc: Jérôme Glisse 
> > > > Cc: Jan Kara 
> > > > Cc: Dan Williams 
> > > > Cc: Greg Kroah-Hartman 
> > > > Cc: linux...@kvack.org
> > > > Cc: linux-arm-ker...@lists.infradead.org
> > > > Cc: linux-samsung-...@vger.kernel.org
> > > > Cc: linux-me...@vger.kernel.org
> > > > Cc: Bjorn Helgaas 
> > > > Cc: linux-...@vger.kernel.org
> > > > ---
> > > >  drivers/pci/pci-sysfs.c | 7 +++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > > index fb072f4b3176..0c45b4f7b214 100644
> > > > --- a/drivers/pci/pci-sysfs.c
> > > > +++ b/drivers/pci/pci-sysfs.c
> > > > @@ -927,6 +927,9 @@ void pci_create_legacy_files(struct pci_bus *b)
> > > >  {
> > > > int error;
> > > >
> > > > +   if (!sysfs_initialized)
> > > > +   return;
> > > > +
> > > > b->legacy_io = kcalloc(2, sizeof(struct bin_attribute),
> > > >GFP_ATOMIC);
> > > > if (!b->legacy_io)
> > > > @@ -1448,6 +1451,7 @@ void pci_remove_sysfs_dev_files(struct pci_dev 
> > > > *pdev)
> > > >  static int __init pci_sysfs_init(void)
> > > >  {
> > > > struct pci_dev *pdev = NULL;
> > > > +   struct pci_bus *pbus = NULL;
> > > > int retval;
> > > >
> > > > sysfs_initialized = 1;
> > > > @@ -1459,6 +1463,9 @@ static int __init pci_sysfs_init(void)
> > > > }
> > > > }
> > > >
> > > > +   while ((pbus = pci_find_next_bus(pbus)))
> > > > +   pci_create_legacy_files(pbus);
> > > > +
> > > > return 0;
> > > >  }
> > > >  late_initcall(pci_sysfs_init);
> > > > --
> > > > 2.30.0
> > > >
> > > >
> > > > ___
> > > > linux-arm-kernel mailing list
> > > > linux-arm-ker...@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Re: [PATCH 1/2] PCI: also set up legacy files only after sysfs init

2021-02-05 Thread Pali Rohár

On Friday 05 February 2021 11:16:00 Daniel Vetter wrote:
> On Fri, Feb 5, 2021 at 11:04 AM Pali Rohár  wrote:
> >
> > On Friday 05 February 2021 10:59:50 Daniel Vetter wrote:
> > > On Thu, Feb 4, 2021 at 11:24 PM Pali Rohár  wrote:
> > > >
> > > > On Thursday 04 February 2021 15:50:19 Bjorn Helgaas wrote:
> > > > > [+cc Oliver, Pali, Krzysztof]
> > > >
> > > > Just to note that extending or using sysfs_initialized introduces
> > > > another race condition into kernel code which results in PCI fatal
> > > > errors. Details are in email discussion which Bjorn already sent.
> > >
> > > Yeah I wondered why this doesn't race.
> >
> > It races, but with smaller probability. I have not seen this race
> > condition on x86. But I was able to reproduce it with native PCIe
> > drivers on ARM64 (Marvell Armada 3720; pci-aardvark). In mentioned
> > discussion I wrote when this race condition happen. But I understand
> > that it is hard to simulate it.
> 
> btw I looked at your patch, and isn't that just reducing the race window?

I probably have not wrote reply to that thread and only to Krzysztof on
IRC, but my "hack" really does not solve that race condition. And as you
wrote it only reduced occurrence on tested HW.

Krzysztof wrote that would look at this issue and try to solve it
properly. So I have not doing more investigation on that my "hack"
patch, race conditions are hard to catch and solve...

> I think we have a very similar problem in drm, where the
> drm_dev_register() for the overall device (which also registers all
> drm_connector) can race with the hotplug of an individual connector in
> drm_connector_register() which is hotplugged at runtime.
> 
> I went with a per-connector registered boolean + a lock to make sure
> that really only one of the two call paths can end up registering the
> connector. Part of registering connectors is setting up sysfs files,
> so I think it's exactly the same problem as here.
> 
> Cheers, Daniel
> 
> >
> > > but since the history goes back
> > > to pre-git times I figured it would have been addressed somehow
> > > already if it indeed does race.
> > > -Daniel
> > >
> > > > > s/also/Also/ in subject
> > > > >
> > > > > On Thu, Feb 04, 2021 at 05:58:30PM +0100, Daniel Vetter wrote:
> > > > > > We are already doing this for all the regular sysfs files on PCI
> > > > > > devices, but not yet on the legacy io files on the PCI buses. Thus 
> > > > > > far
> > > > > > now problem, but in the next patch I want to wire up iomem revoke
> > > > > > support. That needs the vfs up an running already to make so that
> > > > > > iomem_get_mapping() works.
> > > > >
> > > > > s/now problem/no problem/
> > > > > s/an running/and running/
> > > > > s/so that/sure that/ ?
> > > > >
> > > > > iomem_get_mapping() doesn't exist; I don't know what that should be.
> > > > >
> > > > > > Wire it up exactly like the existing code. Note that
> > > > > > pci_remove_legacy_files() doesn't need a check since the one for
> > > > > > pci_bus->legacy_io is sufficient.
> > > > >
> > > > > I'm not sure exactly what you mean by "the existing code."  I could
> > > > > probably figure it out, but it would save time to mention the existing
> > > > > function here.
> > > > >
> > > > > This looks like another instance where we should really apply Oliver's
> > > > > idea of converting these to attribute_groups [1].
> > > > >
> > > > > The cover letter mentions options discussed with Greg in [2], but I
> > > > > don't think the "sysfs_initialized" hack vs attribute_groups was part
> > > > > of that discussion.
> > > > >
> > > > > It's not absolutely a show-stopper, but it *is* a shame to extend the
> > > > > sysfs_initialized hack if attribute_groups could do this more cleanly
> > > > > and help solve more than one issue.
> > > > >
> > > > > Bjorn
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/r/caosf1chss03dbsdo4pmttmp0tceu5kscn704zewlkgxqzbf...@mail.gmail.com
> > > > > [2] 
> > > > > https://lore.kernel.org/dri-devel/cakmk7ugrddrbtj0oyzqqc0cgrqwc2f3tfju9vlfm2jjufaz...@mail

Re: [PATCH 1/2] PCI: also set up legacy files only after sysfs init

2021-02-04 Thread Pali Rohár

On Thursday 04 February 2021 15:50:19 Bjorn Helgaas wrote:
> [+cc Oliver, Pali, Krzysztof]

Just to note that extending or using sysfs_initialized introduces
another race condition into kernel code which results in PCI fatal
errors. Details are in email discussion which Bjorn already sent.

> s/also/Also/ in subject
> 
> On Thu, Feb 04, 2021 at 05:58:30PM +0100, Daniel Vetter wrote:
> > We are already doing this for all the regular sysfs files on PCI
> > devices, but not yet on the legacy io files on the PCI buses. Thus far
> > now problem, but in the next patch I want to wire up iomem revoke
> > support. That needs the vfs up an running already to make so that
> > iomem_get_mapping() works.
> 
> s/now problem/no problem/
> s/an running/and running/
> s/so that/sure that/ ?
> 
> iomem_get_mapping() doesn't exist; I don't know what that should be.
> 
> > Wire it up exactly like the existing code. Note that
> > pci_remove_legacy_files() doesn't need a check since the one for
> > pci_bus->legacy_io is sufficient.
> 
> I'm not sure exactly what you mean by "the existing code."  I could
> probably figure it out, but it would save time to mention the existing
> function here.
> 
> This looks like another instance where we should really apply Oliver's
> idea of converting these to attribute_groups [1].
> 
> The cover letter mentions options discussed with Greg in [2], but I
> don't think the "sysfs_initialized" hack vs attribute_groups was part
> of that discussion.
> 
> It's not absolutely a show-stopper, but it *is* a shame to extend the
> sysfs_initialized hack if attribute_groups could do this more cleanly
> and help solve more than one issue.
> 
> Bjorn
> 
> [1] 
> https://lore.kernel.org/r/caosf1chss03dbsdo4pmttmp0tceu5kscn704zewlkgxqzbf...@mail.gmail.com
> [2] 
> https://lore.kernel.org/dri-devel/cakmk7ugrddrbtj0oyzqqc0cgrqwc2f3tfju9vlfm2jjufaz...@mail.gmail.com/
> 
> > Signed-off-by: Daniel Vetter 
> > Cc: Stephen Rothwell 
> > Cc: Jason Gunthorpe 
> > Cc: Kees Cook 
> > Cc: Dan Williams 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: Greg Kroah-Hartman 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: Bjorn Helgaas 
> > Cc: linux-...@vger.kernel.org
> > ---
> >  drivers/pci/pci-sysfs.c | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index fb072f4b3176..0c45b4f7b214 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -927,6 +927,9 @@ void pci_create_legacy_files(struct pci_bus *b)
> >  {
> > int error;
> >  
> > +   if (!sysfs_initialized)
> > +   return;
> > +
> > b->legacy_io = kcalloc(2, sizeof(struct bin_attribute),
> >GFP_ATOMIC);
> > if (!b->legacy_io)
> > @@ -1448,6 +1451,7 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev)
> >  static int __init pci_sysfs_init(void)
> >  {
> > struct pci_dev *pdev = NULL;
> > +   struct pci_bus *pbus = NULL;
> > int retval;
> >  
> > sysfs_initialized = 1;
> > @@ -1459,6 +1463,9 @@ static int __init pci_sysfs_init(void)
> > }
> > }
> >  
> > +   while ((pbus = pci_find_next_bus(pbus)))
> > +   pci_create_legacy_files(pbus);
> > +
> > return 0;
> >  }
> >  late_initcall(pci_sysfs_init);
> > -- 
> > 2.30.0
> > 
> > 
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Re: [PATCH v2] usb: host: xhci-plat: fix support for XHCI_SKIP_PHY_INIT quirk

2021-02-01 Thread Pali Rohár

On Tuesday 26 January 2021 10:06:06 Pali Rohár wrote:
> On Tuesday 26 January 2021 04:27:37 Yoshihiro Shimoda wrote:
> > Hi Pali,
> > > > I can see the benefit in this.
> > > > In the xhci-plat case usb_create_hcd and usb_add_hcd are separate 
> > > > steps, and
> > > > we could both copy the xhci_plat_priv .quirks and run the .init_qurks 
> > > > before
> > > > adding the hcd.
> > > > I guess the current way is inherited from pci case where the earliest 
> > > > place
> > > > to do this after hcd is created is the hcd->driver->reset callback 
> > > > (which is
> > > > set to xhci_pci_setup() or xhci_plat_setup()).
> > > >
> > > > xhci-rcar.c is using the .init_quirk to load firmware, we need to check 
> > > > with
> > > > them if this change is ok. (added Yoshihiro Shimoda to cc)
> > > 
> > > Yoshihiro, is this change OK?
> > > 
> > > Can we move forward? I really need to now how to handle regression in
> > > xhci-mvebu driver. And one option is with this patch...
> > 
> > Thank you for asking me about this topic. I tested the patch, but 
> > unfortunately,
> > this patch is possible to break a rcar platform because a phy 
> > initialization is
> > needed before the firmware loading if the platform uses the phy. (Note that
> > upstream code (salvator-common.dtsi) doesn't use the phy for xhci. But,
> > if we use the phy on other board with this patch, the xhci will not work.)
> > 
> > So, I think we need to add a new function pointer for your case.
> 
> Ok, thank you for testing! I will try to come up with other solution to
> mentioned mvebu-xhci issue.

Hello! New version of this patch is in following thread, please review it:
"[PATCH v2] usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720"
https://lore.kernel.org/linux-usb/20210201150803.7305-1-p...@kernel.org/

> > Best regards,
> > Yoshihiro Shimoda
> > 
> > > > Their firmware would be loaded before phy parts are initialized, usb bus
> > > > registered, or roothub device allocated.
> > > >
> > > > Thanks
> > > > -Mathias

[PATCH v2] usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720

2021-02-01 Thread Pali Rohár

Older ATF does not provide SMC call for USB 3.0 phy power on functionality
and therefore initialization of xhci-hcd is failing when older version of
ATF is used. In this case phy_power_on() function returns -EOPNOTSUPP.

[3.108467] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try 
updating your firmware
[3.117250] phy phy-d0018300.phy.0: phy poweron failed --> -95
[3.123465] xhci-hcd: probe of d0058000.usb failed with error -95

This patch introduces a new plat_setup callback for xhci platform drivers
which is called prior calling usb_add_hcd() function. This function at its
beginning skips PHY init if hcd->skip_phy_initialization is set.

Current init_quirk callback for xhci platform drivers is called from
xhci_plat_setup() function which is called after chip reset completes.
It happens in the middle of the usb_add_hcd() function and therefore this
callback cannot be used for setting if PHY init should be skipped or not.

For Armada 3720 this patch introduce a new xhci_mvebu_a3700_plat_setup()
function configured as a xhci platform plat_setup callback. This new
function calls phy_power_on() and in case it returns -EOPNOTSUPP then
XHCI_SKIP_PHY_INIT quirk is set to instruct xhci-plat to skip PHY
initialization.

This patch fixes above failure by ignoring 'not supported' error in
xhci-hcd driver. In this case it is expected that phy is already power on.

It fixes initialization of xhci-hcd on Espressobin boards where is older
Marvell's Arm Trusted Firmware without SMC call for USB 3.0 phy power.

This is regression introduced in commit bd3d25b07342 ("arm64: dts: marvell:
armada-37xx: link USB hosts with their PHYs") where USB 3.0 phy was defined
and therefore xhci-hcd on Espressobin with older ATF started failing.

Signed-off-by: Pali Rohár 
Tested-by: Tomasz Maciej Nowak 
Fixes: bd3d25b07342 ("arm64: dts: marvell: armada-37xx: link USB hosts with 
their PHYs")
Cc:  # 5.1+: ea17a0f153af: phy: marvell: comphy: 
Convert internal SMCC firmware return codes to errno
Cc:  # 5.1+: f768e718911e: usb: host: xhci-plat: add 
priv quirk for skip PHY initialization

---
Changes in v2:
* Drop dependency on patch "usb: host: xhci-plat: fix support for 
XHCI_SKIP_PHY_INIT quirk"
* Introduce a new .plat_setup callback for xhci-plat drivers in this patch (to 
simplify backporting and review)
* Move implementation from xhci_mvebu_mbus_init_quirk() into 
xhci_mvebu_a3700_plat_setup()
---
 drivers/usb/host/xhci-mvebu.c | 42 +++
 drivers/usb/host/xhci-mvebu.h |  6 +
 drivers/usb/host/xhci-plat.c  | 20 -
 drivers/usb/host/xhci-plat.h  |  1 +
 4 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-mvebu.c b/drivers/usb/host/xhci-mvebu.c
index 60651a50770f..8ca1a235d164 100644
--- a/drivers/usb/host/xhci-mvebu.c
+++ b/drivers/usb/host/xhci-mvebu.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -74,6 +75,47 @@ int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd)
return 0;
 }
 
+int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd)
+{
+   struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+   struct device *dev = hcd->self.controller;
+   struct phy *phy;
+   int ret;
+
+   /* Old bindings miss the PHY handle */
+   phy = of_phy_get(dev->of_node, "usb3-phy");
+   if (IS_ERR(phy) && PTR_ERR(phy) == -EPROBE_DEFER)
+   return -EPROBE_DEFER;
+   else if (IS_ERR(phy))
+   goto phy_out;
+
+   ret = phy_init(phy);
+   if (ret)
+   goto phy_put;
+
+   ret = phy_set_mode(phy, PHY_MODE_USB_HOST_SS);
+   if (ret)
+   goto phy_exit;
+
+   ret = phy_power_on(phy);
+   if (ret == -EOPNOTSUPP) {
+   /* Skip initializatin of XHCI PHY when it is unsupported by 
firmware */
+   dev_warn(dev, "PHY unsupported by firmware\n");
+   xhci->quirks |= XHCI_SKIP_PHY_INIT;
+   }
+   if (ret)
+   goto phy_exit;
+
+   phy_power_off(phy);
+phy_exit:
+   phy_exit(phy);
+phy_put:
+   of_phy_put(phy);
+phy_out:
+
+   return 0;
+}
+
 int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd)
 {
struct xhci_hcd *xhci = hcd_to_xhci(hcd);
diff --git a/drivers/usb/host/xhci-mvebu.h b/drivers/usb/host/xhci-mvebu.h
index 3be021793cc8..01bf3fcb3eca 100644
--- a/drivers/usb/host/xhci-mvebu.h
+++ b/drivers/usb/host/xhci-mvebu.h
@@ -12,6 +12,7 @@ struct usb_hcd;
 
 #if IS_ENABLED(CONFIG_USB_XHCI_MVEBU)
 int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd);
+int xhci_mvebu_a3700_plat_setup(struct usb_hcd *hcd);
 int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd);
 #else
 static inline int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd)
@@ -19,6 +20,11 @@ static inline int xhci_mvebu_mbus_init_quirk(struct usb_hcd 
*hcd)
return 0;
 }
 
+static inline int xhci_mvebu_a3700_pla

Re: [PATCH] usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720

2021-01-30 Thread Pali Rohár

On Saturday 30 January 2021 17:31:41 Tomasz Maciej Nowak wrote:
> W dniu 23.12.2020 o 17:24, Pali Rohár pisze:
> > Older ATF does not provide SMC call for USB 3.0 phy power on functionality
> > and therefore initialization of xhci-hcd is failing when older version of
> > ATF is used. In this case phy_power_on() function returns -EOPNOTSUPP.
> > 
> > [3.108467] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try 
> > updating your firmware
> > [3.117250] phy phy-d0018300.phy.0: phy poweron failed --> -95
> > [3.123465] xhci-hcd: probe of d0058000.usb failed with error -95
> > 
> > This patch calls phy_power_on() in xhci_mvebu_a3700_init_quirk() function
> > and in case it returns -EOPNOTSUPP then XHCI_SKIP_PHY_INIT quirk is set to
> > instruct xhci-plat to skip PHY initialization.
> > 
> > This patch fixes above failure by ignoring 'not supported' error in
> > aardvark driver. In this case it is expected that phy is already power on.
> > 
> > It fixes initialization of xhci-hcd on Espressobin boards where is older
> > Marvell's Arm Trusted Firmware without SMC call for USB 3.0 phy power.
> > 
> > This is regression introduced in commit bd3d25b07342 ("arm64: dts: marvell:
> > armada-37xx: link USB hosts with their PHYs") where USB 3.0 phy was defined
> > and therefore xhci-hcd on Espressobin with older ATF started failing.
> > 
> > Fixes: bd3d25b07342 ("arm64: dts: marvell: armada-37xx: link USB hosts with 
> > their PHYs")
> > Signed-off-by: Pali Rohár 
> > Cc:  # 5.1+: ea17a0f153af: phy: marvell: comphy: 
> > Convert internal SMCC firmware return codes to errno
> > Cc:  # 5.1+: f768e718911e: usb: host: xhci-plat: 
> > add priv quirk for skip PHY initialization
> > 
> > ---
> > 
> > When applying this patch, please include additional line
> > 
> > Cc:  # 5.1+: : usb: host: xhci-plat: fix 
> > support for XHCI_SKIP_PHY_INIT quirk
> > 
> > with correct COMMIT_ID of mentioned patch which is available in the thread:
> 
> Hi,
> and sorry for late reply, but that might be good reminder for maintainers.
> Anyway I tested this patch in conjunction with v2 from this topic:
> > https://lore.kernel.org/lkml/20201221150903.26630-1-p...@kernel.org/T/#u
> The USB ports work flawlessly with older ATF, so:
> 
> Tested-by: Tomasz Maciej Nowak 

Thank you for testing. But as was mentioned in discussion in above
patch, I have to rework both patches. I have new changes prepared but
have not had a chance to test new changes yet. If you want I can send
you a new version of those untested patches.

> At OpenWrt we are using patch which removes Comphy assignments from nodes
> in dts, but that is sub optimal, since that "discriminates" users with
> updated ATF. I would prefer this patch instead of what we are doing now
> in OpenWrt.
> 
> Thanks.
> 
> > 
> > As mentioned patch is required for change in this patch to work. Above
> > mentioned patch is prerequisite for this patch and therefore needs to be
> > reviewed and applied prior this patch.
> > 
> > Note that same issue as in this USB 3.0 PHY patch was already resolved and
> > applied also for SATA PHY and PCIe PHY on A3720 SOC in following commits:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=45aefe3d2251e4e229d7662052739f96ad1d08d9
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c6ae0f8948a2be6bf4e8b4bbab9ca1343289b6
> > 
> > And these commits were also backported to stable kernel versions (where
> > were affected commits which broke drivers initialization).
> > ---
> >  drivers/usb/host/xhci-mvebu.c | 35 +++
> >  1 file changed, 35 insertions(+)
> > 
> > diff --git a/drivers/usb/host/xhci-mvebu.c b/drivers/usb/host/xhci-mvebu.c
> > index 60651a50770f..ec4f6d6e44cf 100644
> > --- a/drivers/usb/host/xhci-mvebu.c
> > +++ b/drivers/usb/host/xhci-mvebu.c
> > @@ -8,6 +8,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -77,9 +78,43 @@ int xhci_mvebu_mbus_init_quirk(struct usb_hcd *hcd)
> >  int xhci_mvebu_a3700_init_quirk(struct usb_hcd *hcd)
> >  {
> > struct xhci_hcd *xhci = hcd_to_xhci(hcd);
> > +   struct device *dev = hcd->self.controller;
> > +   struct phy *phy;
> > +   int ret;
> >  
> > /* Without reset on resume, the HC won't work at all */
> > xhci->quirks |= XHCI_RESET_ON_RESUME;
> >  
> > +   /* Old bindin

Re: [PATCH] hwmon: (dell-smm) Add XPS 15 L502X to fan control blacklist

2021-01-27 Thread Pali Rohár

On Monday 25 January 2021 21:21:30 Pali Rohár wrote:
> On Monday 25 January 2021 12:19:38 Guenter Roeck wrote:
> > On Mon, Jan 25, 2021 at 11:05:40AM +0100, Pali Rohár wrote:
> > > On Saturday 23 January 2021 18:46:08 Thomas Hebb wrote:
> > > > It has been reported[0] that the Dell XPS 15 L502X exhibits similar
> > > > freezing behavior to the other systems[1] on this blacklist. The issue
> > > > was exposed by a prior change of mine to automatically load
> > > > dell_smm_hwmon on a wider set of XPS models. To fix the regression, add
> > > > this model to the blacklist.
> > > > 
> > > > [0] https://bugzilla.kernel.org/show_bug.cgi?id=211081
> > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195751
> > > > 
> > > > Fixes: b8a13e5e8f37 ("hwmon: (dell-smm) Use one DMI match for all XPS 
> > > > models")
> > > > Cc: sta...@vger.kernel.org
> > > > Reported-by: Bob Hepple 
> > > > Tested-by: Bob Hepple 
> > > > Signed-off-by: Thomas Hebb 
> > > > ---
> > > > 
> > > >  drivers/hwmon/dell-smm-hwmon.c | 7 +++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/drivers/hwmon/dell-smm-hwmon.c 
> > > > b/drivers/hwmon/dell-smm-hwmon.c
> > > > index ec448f5f2dc3..73b9db9e3aab 100644
> > > > --- a/drivers/hwmon/dell-smm-hwmon.c
> > > > +++ b/drivers/hwmon/dell-smm-hwmon.c
> > > > @@ -1159,6 +1159,13 @@ static struct dmi_system_id 
> > > > i8k_blacklist_fan_support_dmi_table[] __initdata = {
> > > > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "XPS13 9333"),
> > > > },
> > > > },
> > > > +   {
> > > > +   .ident = "Dell XPS 15 L502X",
> > > > +   .matches = {
> > > > +   DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
> > > > +   DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Dell System 
> > > > XPS L502X"),
> > > 
> > > Hello! Are you sure that it is required to completely disable fan
> > > support? And not only access to fan type label for which is different
> > > blaclist i8k_blacklist_fan_type_dmi_table?
> > > 
> > 
> > I'll drop this patch from my branch. Please send a Reviewed-by: or 
> > Acked-by: tag
> > if/when I should apply it.
> 
> Of course! We will just wait for Bob test results.

Guenter, now we have all needed information, fix is really needed in
this form. So you can add my:

Reviewed-by: Pali Rohár

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 4324 matches

Mail list logo