On Fri, Sep 05, 2014 at 05:34:21PM -0600, Bjorn Helgaas wrote:
> Date: Fri, 5 Sep 2014 17:34:21 -0600
> From: Bjorn Helgaas <bhelg...@google.com>
> To: "Chen, Gong" <gong.c...@linux.intel.com>
> Cc: rdun...@infradead.org, b...@alien8.de, tony.l...@intel.com,
>  linux-...@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [RESEND RFC 5/5] PCIe, AER: Update initial value of UC error
>  mask
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On Wed, Aug 13, 2014 at 02:22:41AM -0400, Chen, Gong wrote:
> > In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register
> > is redefined and it has an explicit requirement that when writing
> > this field, a value of 1b is the only choice. So change previous
> > initial maks from 0 to 1.
> > 
> > Signed-off-by: Chen, Gong <gong.c...@linux.intel.com>
> > ---
> > NOTE: After scratching all use cases, this is the most obvious use
> > case to violate the SPEC. Most of use cases just read first and
> > then overwrite for clear purpose. Even so, such fix is obvious to
> > not compatiable with previous SPEC definition. Do we need a dirty
> > hack?
> > 
> >  arch/mips/pci/pci-octeon.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
> > index 59cccd95688b..f1bfdc201297 100644
> > --- a/arch/mips/pci/pci-octeon.c
> > +++ b/arch/mips/pci/pci-octeon.c
> > @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev)
> >                                    dconfig);
> >             /* Enable reporting of all uncorrectable errors */
> >             /* Uncorrectable Error Mask - turned on bits disable errors */
> > -           pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0);
> > +           pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1);
> 
> I see the text in the spec that says we should only write 1 to bit 0 (sec
> 7.10.3, for anybody following along).  It looks like that change was made
> between PCIe r1.0 and r1.1.  It would really be nice to have more context
> about why the change was made, because if there's hardware in the field
> that implements r1.0 behavior, this patch will change the way it works, and
> I don't know how to verify that is safe.
> 
> Does this actually change fix a problem?  If it fixes a problem that
> happens on real hardware, that's a much better reason to make a change than
> just to comply with the spec.
> 
> Sec 7.10.2 also says we should ignore the value of bit 0 in the
> Uncorrectable Error Status register, and I don't see any place where we
> follow that advice.
> 
That's why I mark this patch as RFC. As you mentioned above, these are my
concerns, too. I submit such a patch not for merging but throwing a potential
issue. As I noted above, I don't know if it is deserved to fix all affected
placed to comply with spec change. After all, no one reports such an
issue (or maybe have happened :-))

Attachment: signature.asc
Description: Digital signature



Reply via email to