[fix linux-pci, remove ethan.zhao (bounces)]

From: Bjorn Helgaas <[email protected]>
Date: Tue, May 21, 2019 at 3:02 PM
To: Himanshu Madhani
Cc: [email protected], Andrew Vasquez, Girish Basrur, Giridhar Malavali,
    Myron Stowe, <[email protected]>, Linux Kernel Mailing List, Quinn Tran

> [+cc Myron, Quinn, linux-pci, linux-kernel]
>
> From: Himanshu Madhani <[email protected]>
> Date: Fri, May 17, 2019 at 5:21 PM
> To: [email protected], [email protected]
> Cc: Andrew Vasquez, Girish Basrur, Giridhar Malavali
>
> > Hi Ethan,
> >
> > Our OEM partners reported to us that VPD access with the latest distros
> > was returning an I/O error for them. They indicated this is an issue
> > only with newer kernels.
> >
> > One of the distro vendors pointed to a patch posted by you as the
> > reason for the I/O error when trying to read VPD. The patch looks like
> > it blocks access to VPD by blacklisting the ISP.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d5370d1d85251e5893ab7c90a429464de2e140b
> >
> > I set up a PCIe analyzer to reproduce this in our lab and root-cause
> > it, and discovered that after reverting the patch I am able to get VPD
> > data okay with upstream 5.1.0. I used RHEL8.
> >
> > I also used "lspci" and "cat" to dump out VPD data and do not see any
> > issue.
> >
> > # lspci -vvv -s 03:00.0
> > 03:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
> > 	Subsystem: QLogic Corp. QLE2742 Dual Port 32Gb Fibre Channel to PCIe Adapter
> > 	Physical Slot: 15
> > 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 	Latency: 0, Cache Line Size: 64 bytes
> > 	Interrupt: pin A routed to IRQ 67
> > 	NUMA node: 0
> > 	Region 0: Memory at fbe05000 (64-bit, prefetchable) [size=4K]
> > 	Region 2: Memory at fbe02000 (64-bit, prefetchable) [size=8K]
> > 	Region 4: Memory at fbd00000 (64-bit, prefetchable) [size=1M]
> > 	Expansion ROM at fb540000 [disabled] [size=256K]
> > 	Capabilities: [44] Power Management version 3
> > 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> > 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > 	Capabilities: [4c] Express (v2) Endpoint, MSI 00
> > 		DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
> > 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> > 		DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
> > 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> > 			MaxPayload 256 bytes, MaxReadReq 4096 bytes
> > 		DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> > 		LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
> > 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> > 		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > 		LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > 		DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
> > 			AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> > 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> > 			AtomicOpsCtl: ReqEn-
> > 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > 			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > 			Compliance De-emphasis: -6dB
> > 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
> > 			EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> > 	Capabilities: [88] Vital Product Data
> > 		Product Name: QLogic 32Gb 2-port FC to PCIe Gen3 x8 Adapter
> > 		Read-only fields:
> > 			[PN] Part number: QLE2742
> > 			[SN] Serial number: RFD1706R22611
> > 			[EC] Engineering changes: BK3210408-05 04
> > 			[V9] Vendor specific: 010189
> > 			[RV] Reserved: checksum good, 0 byte(s) reserved
> > 		End
> > 	Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
> > 		Vector table: BAR=2 offset=00000000
> > 		PBA: BAR=2 offset=00001000
> > 	Capabilities: [9c] Vendor Specific Information: Len=0c <?>
> > 	Capabilities: [100 v1] Advanced Error Reporting
> > 		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > 		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> > 		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > 		AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> > 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> > 		HeaderLog: 00000000 00000000 00000000 00000000
> > 	Capabilities: [154 v1] Alternative Routing-ID Interpretation (ARI)
> > 		ARICap: MFVC- ACS-, Next Function: 1
> > 		ARICtl: MFVC- ACS-, Function Group: 0
> > 	Capabilities: [1c0 v1] #19
> > 	Capabilities: [1f4 v1] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
> > 	Kernel driver in use: qla2xxx
> > 	Kernel modules: qla2xxx
> >
> > # cat /sys/bus/pci/devices/0000\:03\:00.0/vpd
> > RFD1706R22611ECBK3210408-05 04V9010189RV�x
> >
> > Can you share some more insight into where you encountered the issue?
> > I am in the process of reverting this patch from the upstream kernel,
> > but wanted to reach out and find out if you still have the setup to
> > provide more context.
>
> 0d5370d1d852 ("PCI: Prevent VPD access for QLogic ISP2722") prevented
> a panic while reading VPD, so we can't simply revert it.
>
> Since you don't see a panic while reading VPD from that device, it's
> possible that a QLogic firmware change fixed the VPD format so Linux
> no longer reads the area that caused the problem. Or possibly your
> system doesn't handle the config read error the same way Ethan's HP
> DL380 does. Unfortunately we don't have an actual PCIe analyzer trace
> from Ethan's system, so we don't know exactly what happened on PCIe.
>
> I suggest that you capture the entire VPD area and hexdump it, e.g.,
> with "xxd", and look at its structure. pci_vpd_size() parses it and
> computes the valid size based on a PCI_VPD_STIN_END tag, and
> pci_vpd_read() should not read past that size.
>
> And you *do* have an analyzer trace. If new QLogic firmware fixed the
> VPD format, the trace should show that Linux read only the valid part
> of VPD, and there should be no errors in the trace. Then it might
> just be a question of tweaking the quirk so it allows VPD reads if the
> firmware is new enough.
>
> But if the trace does show config reads with errors, then it might be
> that your system just tolerates the errors while the DL380 did not.
> Then we'd have to figure out exactly what the error was and how to
> deal with it so things work on both your system and Ethan's.
>
> Bjorn
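
For anyone who wants to look at the structure of a captured VPD dump alongside the "xxd" output, here is a rough userspace sketch of the kind of tag walk described above. It is only an approximation for illustration, not the kernel's pci_vpd_size() code. The function name vpd_valid_size(), the constant names, and the dump file passed on the command line are assumptions made for this example. The tag encoding (bit 7 marks a large resource, large resources carry a 2-byte little-endian length, the small-resource end tag is item name 0x0f) follows the PCI VPD format. As a simplification, any small resource other than the end tag is treated as invalid.

```python
#!/usr/bin/env python3
"""Walk the resource tags in a raw PCI VPD dump and report the valid size.

Illustrative userspace sketch; all names here are hypothetical and this is
not the kernel implementation.
"""
import sys

LRDT_FLAG = 0x80        # bit 7 set: large resource data type
LTIN_ID_STRING = 0x02   # large resource: identifier string
LTIN_RO_DATA = 0x10     # large resource: VPD-R (read-only fields)
LTIN_RW_DATA = 0x11     # large resource: VPD-W (read-write fields)
STIN_END = 0x0F         # small resource: end tag

KNOWN_LARGE = {LTIN_ID_STRING, LTIN_RO_DATA, LTIN_RW_DATA}


def vpd_valid_size(data: bytes) -> int:
    """Return the byte count up to and including the end tag, or 0 if the
    structure is not a clean chain of known tags terminated by an end tag."""
    off = 0
    while off < len(data):
        first = data[off]
        if first & LRDT_FLAG:
            # Large resource: 1 tag byte + 2 length bytes (little endian).
            tag = first & 0x7F
            if tag not in KNOWN_LARGE or off + 3 > len(data):
                return 0
            length = data[off + 1] | (data[off + 2] << 8)
            off += 3 + length
        else:
            # Small resource: item name in bits 6..3, length in bits 2..0.
            tag = (first >> 3) & 0x0F
            length = first & 0x07
            off += 1 + length
            if tag == STIN_END:
                return off if off <= len(data) else 0
            # Simplification: reject any other small resource.
            return 0
    # Ran off the end of the dump without seeing an end tag.
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        dump = f.read()
    print(f"dump is {len(dump)} bytes, valid VPD size is {vpd_valid_size(dump)}")
```

Run it as "python3 vpd_size.py vpd.bin", where vpd.bin is whatever raw dump you captured (both file names are placeholders). A result of 0 means the walk hit an unknown tag or never found an end tag, which is the kind of malformed layout worth comparing against the analyzer trace.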

