On Fri, 22 Oct 2010, Chris Friesen wrote:
> On 10/22/2010 11:06 AM, Chris Friesen wrote:
> > On 10/12/2010 11:08 AM, Chris Friesen wrote:
> >
> >> On 10/08/2010 04:36 PM, Brandeburg, Jesse wrote:
> >>
> >>> seems reasonable, it should work okay. Does it fix the problem? It
> >>> seems there must be a race between when the interrupt gets re-enabled and when
> >>> the hardware clears the mask via EIAM on the next interrupt.
> >>>
> >> I'm about to give it a try. The problem can take hours to reproduce, so
> >> we won't know for a day or so whether it's really gone.
> >>
> > It looks like the attached patch makes our problem go away. I only did
> > the msix/NAPI code path, so a complete solution would need some more
> > changes.
> >
> > Where do we go from here? If this is something that occurs on other
> > boards would it make sense for the driver to provide a way to turn off
> > the automasking? (Module parameter perhaps?)

The question becomes: why haven't we been able to reproduce this, and why haven't we seen it before? I'm betting that there is something wrong with the MSI-X semantics of either your kernel or the system hardware. We are getting pretty big exposure with this driver and hardware in a lot of different environments (including other PPC/PPC64) and haven't heard reports of this yet.

Your patch itself is an okay way for the driver to run (in fact the 1.X versions of the driver ran this way for quite a while) but, like you said, it would need to be audited to make sure all code paths were taken care of. That said, the hardware does enough PCIe transactions without us adding more PCIe writes (which is why we enabled EIAM to begin with).

I also didn't see where you disabled EIAM in the patch; is that separate?

Thanks for all your work on this. Unfortunately, I think we need more evidence that it isn't just your hardware before we proceed with a general driver fix, but if you want to carry the patch yourself it does seem sufficient.
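As a rough illustration of the trade-off described above (auto-masking via EIAM versus masking each vector explicitly in software), the following C sketch models the two schemes and counts the register writes the driver issues. It is a simplified, hypothetical model, not ixgbe code: it assumes that when a vector fires with its auto-mask bit set, the hardware masks that vector without any driver write, and it ignores timing entirely. All struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-vector interrupt mask; not ixgbe code. */
struct irq_model {
	uint64_t enabled;           /* bit n set: vector n may fire (EIMS-like state) */
	uint64_t automask;          /* bit n set: hardware masks vector n when it fires (EIAM-like) */
	unsigned int driver_writes; /* register writes issued by the driver */
};

/* Hardware side: vector v asserts. Returns true if the interrupt is delivered. */
static bool hw_fire(struct irq_model *m, unsigned int v)
{
	uint64_t bit = (uint64_t)1 << v;

	if (!(m->enabled & bit))
		return false;       /* vector is masked: nothing is delivered */
	if (m->automask & bit)
		m->enabled &= ~bit; /* auto-mask: hardware masks it, no driver write */
	return true;
}

/* Driver ISR: without auto-masking it must mask the vector itself (EIMC-like write). */
static void driver_isr(struct irq_model *m, unsigned int v)
{
	uint64_t bit = (uint64_t)1 << v;

	if (!(m->automask & bit)) {
		m->enabled &= ~bit;
		m->driver_writes++;
	}
}

/* Driver poll completion: re-enable the vector (EIMS-like write). */
static void driver_poll_done(struct irq_model *m, unsigned int v)
{
	m->enabled |= (uint64_t)1 << v;
	m->driver_writes++;
}

/* Run n interrupt/poll cycles on vector v; return the driver writes issued. */
static unsigned int run_cycles(bool use_automask, unsigned int v, unsigned int n)
{
	struct irq_model m = {
		.enabled = ~(uint64_t)0,
		.automask = use_automask ? ~(uint64_t)0 : 0,
		.driver_writes = 0,
	};

	for (unsigned int i = 0; i < n; i++) {
		if (hw_fire(&m, v)) {
			driver_isr(&m, v);
			driver_poll_done(&m, v);
		}
	}
	return m.driver_writes;
}
```

Under these assumptions `run_cycles(true, v, n)` returns n (one re-enable write per interrupt) and `run_cycles(false, v, n)` returns 2n (an extra mask write per interrupt). Because the model is strictly sequential, it cannot reproduce the suspected race; it only shows where the additional PCIe write per interrupt comes from.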
Jesse

> Sorry everyone, apparently the list is removing my attachments, so I've
> included the patch inline below.
>
> Chris
>
>
> This patch converts the driver to turn off auto-masking and explicitly
> mask interrupts in software. This adds a PCI message for each interrupt
> but seems to fix the problem. NOTE: This patch only addresses MSI-X
> interrupts in the NAPI code path.
>
>
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index 706f7b8..c2e117a 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -1948,7 +1948,7 @@ static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data)
>  }
>
>  #ifdef CONFIG_IXGBE_NAPI
> - /* EIAM disabled interrupts (on this vector) for us */
> + ixgbe_irq_disable_queues(adapter, ((u64)1 << q_vector->v_idx));
>  napi_schedule(&q_vector->napi);
>  #endif
>  /*
> @@ -1999,7 +1999,7 @@ static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
>  if (!q_vector->rxr_count)
>  return IRQ_HANDLED;
>
> - /* EIAM disabled interrupts (on this vector) for us */
> + ixgbe_irq_disable_queues(adapter, ((u64)1 << q_vector->v_idx));
>  napi_schedule(&q_vector->napi);
>  #endif
>
> @@ -2054,7 +2054,7 @@ static irqreturn_t ixgbe_msix_clean_many(int irq, void *data)
>  }
>
>  /* disable interrupts on this vector only */
> - /* EIAM disabled interrupts (on this vector) for us */
> + ixgbe_irq_disable_queues(adapter, ((u64)1 << q_vector->v_idx));
>  napi_schedule(&q_vector->napi);
>  #endif
>
> @@ -3886,7 +3886,7 @@ static int ixgbe_up_complete(struct ixgbe_adapter *adapter)
>  break;
>  default:
>  case ixgbe_mac_82599EB:
> - IXGBE_WRITE_REG(hw, IXGBE_EIAM_EX(0), 0xFFFFFFFF);
> + IXGBE_WRITE_REG(hw, IXGBE_EIAM_EX(0), 0xFFFF0000);
>  IXGBE_WRITE_REG(hw, IXGBE_EIAM_EX(1), 0xFFFFFFFF);
>  break;
>  }
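The final hunk writes 0xFFFF0000 instead of 0xFFFFFFFF to IXGBE_EIAM_EX(0), clearing its low 16 bits. The helpers below are a hypothetical sketch for reasoning about such values; they assume that bit n of EIAM_EX(0) corresponds to MSI-X vector n and bit n of EIAM_EX(1) to vector 32 + n, and they also show how a 64-bit per-vector mask like `(u64)1 << q_vector->v_idx` divides into two 32-bit register words. None of these names come from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; the register layout (bit n of EIAM_EX(0) = vector n,
 * bit n of EIAM_EX(1) = vector 32 + n) is an assumption for illustration. */

/* True if vector v (0..63) has auto-masking enabled for the given register values. */
static bool vector_automasked(uint32_t eiam_ex0, uint32_t eiam_ex1, unsigned int v)
{
	if (v < 32)
		return (eiam_ex0 >> v) & 1u;
	return (eiam_ex1 >> (v - 32)) & 1u;
}

/* Split a 64-bit per-vector mask into the low and high 32-bit register words. */
static void split_vector_mask(uint64_t mask, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)(mask & 0xFFFFFFFFu);
	*hi = (uint32_t)(mask >> 32);
}

/* Count how many of the 64 vectors still auto-mask for a pair of register values. */
static unsigned int count_automasked(uint32_t eiam_ex0, uint32_t eiam_ex1)
{
	unsigned int count = 0;

	for (unsigned int v = 0; v < 64; v++)
		if (vector_automasked(eiam_ex0, eiam_ex1, v))
			count++;
	return count;
}
```

Under that assumed layout, `count_automasked(0xFFFF0000u, 0xFFFFFFFFu)` returns 48: vectors 0 to 15 lose auto-masking while vectors 16 to 63 keep it. Likewise, `split_vector_mask((uint64_t)1 << 35, &lo, &hi)` yields `lo == 0` and `hi == 8`, since vector 35 lands in bit 3 of the high word.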
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired