Re: [5/5] e1000e: Avoid receiver overrun interrupt bursts
On Tue, Sep 19, 2017 at 09:41:02PM +0200, Benjamin Poirier wrote:
> On 2017/09/19 12:38, Philip Prindeville wrote:
> > Hi.
> >
> > We’ve been running this patchset (all 5) for about as long as they’ve been
> > under review… about 2 months. And in a burn-in lab with heavy traffic.
> >
> > We’ve not seen a single link-flap in hundreds of ours of saturated traffic.
> >
> > Would love to see some resolution soon on this as we don’t want to ship a
> > release with unsanctioned patches.
> >
> > Is there an estimate on when that might be?
>
> The patches have been added to Jeff Kirsher's next-queue tree. I guess
> they will be submitted for v4.15 which might be released in early
> 2018...
> http://phb-crystal-ball.org/

And then they will be submitted to linux-stable so this long standing
regression can be fixed, right?

--
Len Sorensen
Re: [Intel-wired-lan] [PATCH 4/5] e1000e: Separate signaling for link check/link up
On Wed, Aug 02, 2017 at 02:28:07PM +0300, Neftin, Sasha wrote:
> On 7/21/2017 21:36, Benjamin Poirier wrote:
> > Lennart reported the following race condition:
> >
> > \ e1000_watchdog_task
> >     \ e1000e_has_link
> >         \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
> >             /* link is up */
> >             mac->get_link_status = false;
> >
> >                             /* interrupt */
> >                             \ e1000_msix_other
> >                             hw->mac.get_link_status = true;
> >
> >         link_active = !hw->mac.get_link_status
> >         /* link_active is false, wrongly */
> >
> > This problem arises because the single flag get_link_status is used to
> > signal two different states: link status needs checking and link status is
> > down.
> >
> > Avoid the problem by using the return value of .check_for_link to signal
> > the link status to e1000e_has_link().
> >
> > Reported-by: Lennart Sorensen <lsore...@csclub.uwaterloo.ca>
> > Signed-off-by: Benjamin Poirier <bpoir...@suse.com>
> > ---
> >  drivers/net/ethernet/intel/e1000e/mac.c    | 11 ++++++++---
> >  drivers/net/ethernet/intel/e1000e/netdev.c |  2 +-
> >  2 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
> > index b322011ec282..f457c5703d0c 100644
> > --- a/drivers/net/ethernet/intel/e1000e/mac.c
> > +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> > @@ -410,6 +410,9 @@ void e1000e_clear_hw_cntrs_base(struct e1000_hw *hw)
> >   *  Checks to see of the link status of the hardware has changed.  If a
> >   *  change in link status has been detected, then we read the PHY registers
> >   *  to get the current speed/duplex if link exists.
> > + *
> > + *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
> > + *  up).
> >   **/
> >  s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
> >  {
> > @@ -423,7 +426,7 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
> >  	 * Change or Rx Sequence Error interrupt.
> >  	 */
> >  	if (!mac->get_link_status)
> > -		return 0;
> > +		return 1;
> >  
> >  	/* First we want to see if the MII Status Register reports
> >  	 * link.  If so, then we want to get the current speed/duplex
> > @@ -461,10 +464,12 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
> >  	 * different link partner.
> >  	 */
> >  	ret_val = e1000e_config_fc_after_link_up(hw);
> > -	if (ret_val)
> > +	if (ret_val) {
> >  		e_dbg("Error configuring flow control\n");
> > +		return ret_val;
> > +	}
> >  
> > -	return ret_val;
> > +	return 1;
> >  }
> >  
> >  /**
> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> > index fc6a1db2..5a8ab1136566 100644
> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > @@ -5081,7 +5081,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter)
> >  	case e1000_media_type_copper:
> >  		if (hw->mac.get_link_status) {
> >  			ret_val = hw->mac.ops.check_for_link(hw);
> > -			link_active = !hw->mac.get_link_status;
> > +			link_active = ret_val > 0;
> >  		} else {
> >  			link_active = true;
> >  		}
>
> Hello Benjamin,
>
> Will this patch fix any serious problem with link indication? Is it
> necessary? Can we consider your patch series without 4/5 part?

Without this patch, you have the race condition that can make the
watchdog_task mistakenly think the link is down when it isn't, and then
it resets the adapter, which does make the link go down. So it is rather
catastrophic for the interface.

The other patch to the interrupt handling should make it never get hit,
but the issue does still exist if not fixed and I wouldn't rule out that
it could possibly still happen even with the other fix in place.

--
Len Sorensen
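To make the race concrete, here is a single-threaded userspace model of the two versions of e1000e_has_link(). Everything in it is hypothetical: the `model_*` names stand in for the driver functions, and the interrupt on the other CPU is simulated by the `irq_mid_check` argument. The model shows that re-reading the shared flag after the check can report "down" for a link that is up. Trusting the check's return value does not have that problem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not e1000e code. get_link_status only means
 * "link needs checking"; the check reports the link itself through its
 * return value: <0 error, 0 link down, 1 link up. */
struct model_mac {
	bool get_link_status;	/* may be set by the "interrupt" */
	bool phy_link_up;	/* what the PHY would report */
};

/* Stand-in for .check_for_link() after the patch. */
static int model_check_for_link(struct model_mac *mac)
{
	if (!mac->get_link_status)
		return 1;		/* nothing to re-check */
	if (!mac->phy_link_up)
		return 0;		/* down: leave get_link_status set */
	mac->get_link_status = false;
	return 1;
}

/* Before the patch: re-reads the shared flag after the check.
 * irq_mid_check simulates e1000_msix_other() running on another CPU. */
static bool model_has_link_old(struct model_mac *mac, bool irq_mid_check)
{
	if (!mac->get_link_status)
		return true;
	model_check_for_link(mac);
	if (irq_mid_check)
		mac->get_link_status = true;
	return !mac->get_link_status;
}

/* After the patch: trusts the value returned by the check. */
static bool model_has_link_new(struct model_mac *mac, bool irq_mid_check)
{
	int ret;

	if (!mac->get_link_status)
		return true;
	ret = model_check_for_link(mac);
	if (irq_mid_check)
		mac->get_link_status = true;
	return ret > 0;
}
```

With the link up and an interrupt landing mid-check, `model_has_link_old()` returns false (the spurious "link down" that leads to the reset), while `model_has_link_new()` returns true.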
Re: commit 16ecba59 breaks 82574L under heavy load.
On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote:
> Could you please test the following patch and let me know if it:
> 1) reduces the interrupt rate of the Other msi-x vector
> 2) avoids the link flaps
> or
> 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]"
> In this case, please paste icr values printed.

By the way, while we are fixing e1000e: I just noticed that if you are
blasting the port with traffic when it comes up, you risk getting a
transmit queue timeout, because the queue is started before the carrier
is up.

ixgbe already fixed that in cdc04dcce0598fead6029a2f95e95a4d2ea419c2.
igb has the same problem (which goes away by moving the queue start to
the watchdog after carrier_on; I just haven't got around to sending that
patch yet).

I am going to try moving the queue start to the watchdog and try it again.

The trace looked like this:

[ cut here ]
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x1f9/0x200
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: dpi_drv(PO) ccu_util(PO) ipv4_mb(PO) l2bridge_config_util(PO) l2_config_util(PO) route_config_util(PO) qos_config_util(PO) sysapp_common(PO) chantry_fwd_eng_2800_config(PO) shim_module(PO) sadb_cc(PO) ipsecXformer(PO) libeCrypto(PO) ipmatch_cc(PO) l2h_cc(PO) ndproxy_cc(PO) arpint_cc(PO) portinfo_cc(PO) chantryqos_cc(PO) redirector_cc(PO) ix_ph(PO) fpm_core_cc(PO) pulse_cc(PO) vnstt_cc(PO) vnsap_cc(PO) fm_cc(PO) rutm_cc(PO) mutm_cc(PO) ethernet_tx_cc(PO) stkdrv_cc(PO) l2bridge_cc(PO) events_util(PO) sched_cc(PO) qm_cc(PO) ipv4_cc(PO) wred_cc(PO) tc_meter_cc(PO) dscp_classifier_cc(PO) classifier_6t_cc(PO) ent586_cc(PO) dev_cc_arp(PO) chantry_fwd_eng_2800_tables(PO) ether_arp_lib(PO) rtmv4_lib(PO) lkup_lib(PO) l2tm_lib(PO) fragmentation_lib(PO) properties_lib(PO) msg_support_lib(PO) utilities_lib(PO) cci_lib(PO) rm_lib(PO) libossl(O) vip(O) productSpec_x86_dp(PO) e1000e
CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 4.9.24 #20
Hardware name: Supermicro X7SPA-HF/X7SPA-HF, BIOS 1.2a 06/23/12
 811cef1b 88007fc03e88 81037ade 88007fc03ed8 0001 0082 0001 81037b4c
Call Trace:
 [] ? dump_stack+0x46/0x5b
 [] ? __warn+0xbe/0xe0
 [] ? warn_slowpath_fmt+0x4c/0x50
 [] ? mod_timer+0xf2/0x150
 [] ? dev_watchdog+0x1f9/0x200
 [] ? dev_graft_qdisc+0x70/0x70
 [] ? call_timer_fn.isra.26+0x11/0x80
 [] ? run_timer_softirq+0x128/0x150
 [] ? __do_softirq+0xeb/0x1f0
 [] ? irq_exit+0x55/0x60
 [] ? smp_apic_timer_interrupt+0x39/0x50
 [] ? apic_timer_interrupt+0x7c/0x90
 [] ? mwait_idle+0x51/0x80
 [] ? cpu_startup_entry+0xa7/0x130
 [] ? start_kernel+0x306/0x30e
---[ end trace ee759b7a56e1110b ]---

--
Len Sorensen
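The reordering described above (start the transmit queue from the watchdog only after the carrier is turned on) can be sketched with a small userspace model. Everything here is hypothetical. The `model_*` names only echo netif_carrier_on() and netif_start_queue() and are not the kernel API. The counter stands in for frames handed to the hardware ring while the link is still down, which is presumably what later trips the NETDEV WATCHDOG timeout in the trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the names only echo netif_carrier_on()
 * and netif_start_queue(), they are not the kernel API. */
struct model_port {
	bool carrier_ok;
	bool queue_started;
	int sent_without_carrier;	/* frames handed to the ring early */
};

/* Current order: the queue is started in the open path, before link. */
static void model_open_old(struct model_port *p)
{
	p->carrier_ok = false;
	p->queue_started = true;
	p->sent_without_carrier = 0;
}

/* Proposed order: open leaves the queue stopped ... */
static void model_open_new(struct model_port *p)
{
	p->carrier_ok = false;
	p->queue_started = false;
	p->sent_without_carrier = 0;
}

/* ... and the watchdog starts it only after turning the carrier on. */
static void model_watchdog_link_up(struct model_port *p)
{
	if (!p->carrier_ok) {
		p->carrier_ok = true;
		p->queue_started = true;
	}
}

/* The stack only calls the driver's xmit while the queue is started. */
static bool model_xmit(struct model_port *p)
{
	if (!p->queue_started)
		return false;		/* frame stays queued in the stack */
	if (!p->carrier_ok)
		p->sent_without_carrier++;
	return true;
}
```

In the old order a frame can reach the ring before link is up. In the new order nothing is handed to the driver until the watchdog has seen the link.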
Re: [PATCH 4/5] e1000e: Separate signaling for link check/link up
On Fri, Jul 21, 2017 at 11:36:26AM -0700, Benjamin Poirier wrote:
> Lennart reported the following race condition:
>
> \ e1000_watchdog_task
>     \ e1000e_has_link
>         \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
>             /* link is up */
>             mac->get_link_status = false;
>
>                             /* interrupt */
>                             \ e1000_msix_other
>                             hw->mac.get_link_status = true;
>
>         link_active = !hw->mac.get_link_status
>         /* link_active is false, wrongly */
>
> This problem arises because the single flag get_link_status is used to
> signal two different states: link status needs checking and link status is
> down.
>
> Avoid the problem by using the return value of .check_for_link to signal
> the link status to e1000e_has_link().
>
> Reported-by: Lennart Sorensen <lsore...@csclub.uwaterloo.ca>
> Signed-off-by: Benjamin Poirier <bpoir...@suse.com>

This too seems potentially -stable worthy, although with patch 5, the
problem becomes much much less likely to occur.

--
Len Sorensen
Re: [PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts
On Fri, Jul 21, 2017 at 11:36:27AM -0700, Benjamin Poirier wrote:
> When e1000e_poll() is not fast enough to keep up with incoming traffic, the
> adapter (when operating in msix mode) raises the Other interrupt to signal
> Receiver Overrun.
>
> This is a double problem because 1) at the moment e1000_msix_other()
> assumes that it is only called in case of Link Status Change and 2) if the
> condition persists, the interrupt is repeatedly raised again in quick
> succession.
>
> Ideally we would configure the Other interrupt to not be raised in case of
> receiver overrun but this doesn't seem possible on this adapter. Instead,
> we handle the first part of the problem by reverting to the practice of
> reading ICR in the other interrupt handler, like before commit 16ecba59bc33
> ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
> 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
> from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
> anymore. We handle the second part of the problem by not re-enabling the
> Other interrupt right away when there is overrun. Instead, we wait until
> traffic subsides, napi polling mode is exited and interrupts are
> re-enabled.
>
> Reported-by: Lennart Sorensen <lsore...@csclub.uwaterloo.ca>
> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
> Signed-off-by: Benjamin Poirier <bpoir...@suse.com>

Any chance of this fix hitting -stable? After all, an adapter reset under
load is not nice.

--
Len Sorensen
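Here is a minimal userspace sketch of the handler logic the commit message describes. It reads ICR and requests a link check only on Link Status Change. On Receiver Overrun it leaves the Other interrupt masked until NAPI polling completes. This is not the patch's code. The `MODEL_ICR_*` values are assumptions modeled on the e1000 ICR layout, with bit 24 as the Other Causes bit mentioned in the original report. Check the driver's defines.h before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, modeled on the e1000 ICR layout; verify against
 * drivers/net/ethernet/intel/e1000e/defines.h before relying on them. */
#define MODEL_ICR_LSC   0x00000004u	/* link status change */
#define MODEL_ICR_RXO   0x00000040u	/* receiver overrun */
#define MODEL_ICR_OTHER 0x01000000u	/* bit 24, Other Causes */

struct model_adapter {
	bool get_link_status;		/* "link needs checking" */
	bool other_irq_enabled;		/* is the Other vector unmasked? */
	bool watchdog_scheduled;
};

/* Other interrupt handler: act on the ICR bits actually set. */
static void model_msix_other(struct model_adapter *a, uint32_t icr)
{
	a->other_irq_enabled = false;	/* masked while handling it */

	if (icr & MODEL_ICR_LSC) {
		a->get_link_status = true;
		a->watchdog_scheduled = true;
	}

	/* On overrun, stay masked; otherwise re-arm immediately. */
	if (!(icr & MODEL_ICR_RXO))
		a->other_irq_enabled = true;
}

/* NAPI polling finished and interrupts are re-enabled. */
static void model_napi_complete(struct model_adapter *a)
{
	a->other_irq_enabled = true;
}
```

An overrun-only interrupt no longer marks the link for re-checking and no longer re-fires in a burst. A real LSC still schedules the watchdog as before.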
Re: commit 16ecba59 breaks 82574L under heavy load.
On Fri, Jul 21, 2017 at 11:27:09AM -0400, wrote:
> On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote:
> > Could you please test the following patch and let me know if it:
> > 1) reduces the interrupt rate of the Other msi-x vector
> > 2) avoids the link flaps
> > or
> > 3) logs some dmesg warnings of the form "Other interrupt with unhandled
> > [...]"
> > In this case, please paste icr values printed.
>
> I will give it a try.

So the test looks excellent. It seems to only get interrupts when the link
state actually changes now.

> Another odd behaviour I see is that the driver will hang in
> napi_synchronize on shutdown if there is traffic at the time (at least
> I think that's the trigger, maybe the trigger is if there has been an
> overload of traffic and the backlog in napi was used).
>
> From doing some searching, this seems to be a problem that has plagued
> some people for years with this driver.
>
> I am having trouble figuring out exactly what napi_synchronize is waiting
> for and who is supposed to toggle the flag it is waiting on. The flag
> appears to work backwards from what I would have expected it to do.
> I see lots of places that can set the bit, but only napi_enable seems
> to clear it again, and I don't see how that would get called for all
> the places that potentially set the bit.

I just realized NAPI_STATE_SCHED and NAPIF_STATE_SCHED are the same thing
and I need to look at both of those.

Still, something seems odd in some corner case where napi gets stuck and
you can't close the port anymore due to napi_synchronize never being able
to finish. Some traffic pattern causes that SCHED state bit to get into
the wrong state and nothing ever clears it. I even managed to see it get
stuck so that it never passed traffic again and hung on shutdown. The napi
poll was never called again.

--
Len Sorensen
Re: commit 16ecba59 breaks 82574L under heavy load.
On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote:
> Could you please test the following patch and let me know if it:
> 1) reduces the interrupt rate of the Other msi-x vector
> 2) avoids the link flaps
> or
> 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]"
> In this case, please paste icr values printed.

I will give it a try.

Another odd behaviour I see is that the driver will hang in
napi_synchronize on shutdown if there is traffic at the time (at least
I think that's the trigger, maybe the trigger is if there has been an
overload of traffic and the backlog in napi was used).

From doing some searching, this seems to be a problem that has plagued
some people for years with this driver.

I am having trouble figuring out exactly what napi_synchronize is waiting
for and who is supposed to toggle the flag it is waiting on. The flag
appears to work backwards from what I would have expected it to do.
I see lots of places that can set the bit, but only napi_enable seems
to clear it again, and I don't see how that would get called for all
the places that potentially set the bit.

--
Len Sorensen
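Here is a simplified userspace model of the SCHED bit, to make the question concrete. As far as I understand the NAPI core:

* napi_schedule() sets the bit.
* napi_complete_done() clears it when the poll routine finishes under budget.
* napi_disable() sets it and napi_enable() clears it.
* napi_synchronize() waits while it is set.

NAPIF_STATE_SCHED is, I believe, just the bit-mask form of the NAPI_STATE_SCHED bit number. So if a poll path stops being rescheduled without ever completing, nothing clears the bit and napi_synchronize() waits forever. The `model_*` names are hypothetical. The wait is bounded so the model can report the hang instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of the NAPI SCHED bit; these mirror, but are not,
 * napi_schedule()/napi_complete_done()/napi_synchronize(). */
#define MODEL_STATE_SCHED 0x1u

struct model_napi {
	atomic_uint state;
};

/* Returns true if this call took ownership (the bit was clear). */
static bool model_napi_schedule(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_STATE_SCHED);

	return !(old & MODEL_STATE_SCHED);
}

/* The poller clears SCHED when it finishes with work left under budget. */
static void model_napi_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~MODEL_STATE_SCHED);
}

/* Waits while SCHED is set; gives up after max_spins and returns false
 * where the real napi_synchronize() would keep waiting forever. */
static bool model_napi_synchronize(struct model_napi *n, int max_spins)
{
	while (atomic_load(&n->state) & MODEL_STATE_SCHED) {
		if (max_spins-- <= 0)
			return false;
	}
	return true;
}
```

A second schedule while the bit is held does nothing. Until some path calls the completion, the synchronize side cannot make progress.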
Re: commit 16ecba59 breaks 82574L under heavy load.
On Wed, Jul 19, 2017 at 05:07:47PM -0700, Benjamin Poirier wrote:
> Are you sure about this? In my testing, while triggering the overrun
> with the msleep, I read ICR when entering e1000_msix_other() and RXO is
> consistently set.

I had thousands of calls to e1000_msix_other where the only bit set was
OTHER. I don't know if the cause is overruns, it just seems plausible.

> I'm working on a patch that uses that fact to handle the situation and
> limit the interrupt.

Excellent.

Running in MSI mode rather than MSI-X seems to not have the problem of
unexpected interrupts, but has other issues (such as losing the IRQ
affinity setting if you do ifconfig down; ifconfig up on the interface,
which does not happen in MSI-X's case). That's rather annoying, since you
can't set the affinity before bringing up the interface.

--
Len Sorensen
Re: commit 16ecba59 breaks 82574L under heavy load.
On Tue, Jul 18, 2017 at 04:14:35PM -0700, Benjamin Poirier wrote:
> Thanks for the detailed analysis.
>
> Refering to the original discussion around this patch series, it seemed like
> the IMS bit for a condition had to be set for the Other interrupt to be raised
> for that condition.
>
> https://lkml.org/lkml/2015/11/4/683
>
> In this case however, E1000_ICR_RXT0 is not set in IMS so Other shouldn't be
> raised for Receiver Overrun. Apparently something is going on...
>
> I can reproduce the spurious Other interrupts with a simple mdelay()
> With the debugging patch at the end of the mail I see stuff like this
> while blasting with udp frames:
>           <idle>-0     [086] d.h1 15338.742675: e1000_msix_other: got Other interrupt, count 15127
>            <...>-54504 [086] d.h. 15338.742724: e1000_msix_other: got Other interrupt, count 1
>            <...>-54504 [086] d.h. 15338.742774: e1000_msix_other: got Other interrupt, count 1
>            <...>-54504 [086] d.h. 15338.742824: e1000_msix_other: got Other interrupt, count 1
>           <idle>-0     [086] d.h1 15340.745123: e1000_msix_other: got Other interrupt, count 27584
>            <...>-54504 [086] d.h. 15340.745172: e1000_msix_other: got Other interrupt, count 1
>            <...>-54504 [086] d.h. 15340.745222: e1000_msix_other: got Other interrupt, count 1
>            <...>-54504 [086] d.h. 15340.745272: e1000_msix_other: got Other interrupt, count 1
>
> > hence sets the flag that (unfortunately) means both link is down and link
> > state should be checked. Since this now happens 3000 times per second,
> > the chances of it happening while the watchdog_task is checking the link
> > state becomes pretty high, and it if does happen to coincice, then the
> > watchdog_task will reset the adapter, which causes a real loss of link.
>
> Through which path does watchdog_task reset the adapter? I didn't
> reproduce that.

The other interrupt happens and sets get_link_status to true.
At some point the watchdog_task runs on some core and calls
e1000e_has_link, which then calls check_for_link to find out the current
link status. While e1000e_check_for_copper_link is checking the link
state, and after it has updated get_link_status to false to indicate the
link is up, another interrupt occurs; another core handles it and changes
get_link_status to true again. So by the time e1000e_has_link goes to
determine the return value, get_link_status has changed back again, so
now it returns link down. As a result the watchdog_task calls reset,
because we have packets in the transmit queue (we were busy forwarding
over 10 packets per second when it happened).

Running on an Atom D525, which isn't very fast and uses hyperthreading,
might have something to do with how the scheduling manages to trigger
this race condition. On a faster CPU you would very likely be done
checking the link state quickly enough that the interrupt handler rarely
gets a chance to interfere. Also, we have the IRQ affinity set so the
RX/TX of one port is handled by one CPU, the RX/TX of the other port by a
different CPU, and the Other interrupts and other tasks (like the
watchdog) are handled by the last two CPUs.

Either making the current link state its own bool and keeping its meaning
away from get_link_status, or making the interrupt handler only change
get_link_status when LSC is actually present, makes the problem go away.
Having two meanings for get_link_status (both "link state needs checking"
and "what the link state is") causes issues. After all, it is using a
bool to store 3 values: link is up; link needs checking but is up; and
link needs checking but is down. Of course the last two states are rather
quantum, in that you don't know which it is until you check.
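The first fix suggested here, keeping the link state in its own field, can be sketched as a hypothetical userspace model (these are not the driver's structures). The interrupt only records that a re-check is needed and never touches the cached link state. With that split, the interleaving described above can no longer turn "needs checking" into "link down".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical split of the overloaded flag into two fields. */
struct model_link {
	bool needs_check;	/* set by the Other interrupt */
	bool link_up;		/* last value read from the PHY */
};

/* Interrupt: ask for a re-check, leave the link state alone. */
static void model_other_irq(struct model_link *l)
{
	l->needs_check = true;
}

/* Link check: record what the PHY says and clear the request. */
static void model_check(struct model_link *l, bool phy_up)
{
	l->link_up = phy_up;
	l->needs_check = false;
}

/* What the watchdog consults before deciding to reset. */
static bool model_has_link(const struct model_link *l)
{
	return l->link_up;
}
```

An interrupt arriving right after a successful check leaves `model_has_link()` true while still flagging that another check is due. Only a check that actually reads "down" from the PHY makes it false.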
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index b3679728caac..689ad76d0d12 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -46,6 +46,8 @@
>  
>  #include "e1000.h"
>  
> +DEFINE_RATELIMIT_STATE(e1000e_ratelimit_state, 2 * HZ, 4);
> +
>  #define DRV_EXTRAVERSION "-k"
>  
>  #define DRV_VERSION "3.2.6" DRV_EXTRAVERSION
> @@ -937,6 +939,8 @@ static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
>  	bool cleaned = false;
>  	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>  
> +	mdelay(10);
> +
>  	i = rx_ring->next_to_clean;
>  	rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
>  	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
> @@ -1067,6 +1071,13 @@ static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
>  
>  	adapter->total_rx_bytes += total_rx_bytes;
>  	adapter->total_rx_packets += total_rx_packets;
> +
> +	if (__ratelimit(&e1000e_ratelimit_state)) {
> +		static unsigned int max;
> +		max = max(max, total_rx_packets);
> +		trace_printk("received %u max %u\n",
commit 16ecba59 breaks 82574L under heavy load.
Commit 16ecba59bc333d6282ee057fb02339f77a880beb has apparently broken at
least the 82574L under heavy load (as in load heavy enough to cause packet
drops). In this case, when running in MSI-X mode, the Other Causes
interrupt fires about 3000 times per second, but not due to link state
changes. Unfortunately this commit changed the driver to assume that the
Other Causes interrupt can only mean a link state change, and hence it
sets the flag that (unfortunately) means both "link is down" and "link
state should be checked". Since this now happens 3000 times per second,
the chances of it happening while the watchdog_task is checking the link
state become pretty high, and if it does happen to coincide, then the
watchdog_task will reset the adapter, which causes a real loss of link.

Reverting the commit makes everything work fine again (of course packets
are still dropped, but at least the link stays up, the adapter isn't
reset, and most packets make it through).

I tried checking what the bits in the ICR actually were under these
conditions, and it would appear that the only bit set is 24 (the Other
Causes interrupt bit). So I don't know what the real cause is, although
rx buffer overrun would be my guess, and in fact I see nothing in the
datasheet indicating that you can actually stop rx buffer overrun from
generating an interrupt.

Prior to this commit, the interrupt handler explicitly checked that the
interrupt was caused by a link state change and only then did it trigger
a recheck, which worked fine and did not cause incorrect adapter resets,
although it of course still had lots of undesired interrupts to deal with.

Of course, ideally there would be a way to make these 3000 pointless
interrupts per second not happen, but unless there is a way to determine
that, I think this commit needs reverting, since it apparently causes link
failures on actual hardware that exists.
The ports are onboard Intel 82574L on a Supermicro X7SPA-HF-D525 with
1.2a BIOS (upgrading to 1.2b to check if it makes a difference is not an
option unfortunately).

--
Len Sorensen
Re: e1000e on Thinkpad x60: gigabit not available due to "SmartSpeed"
On Thu, Sep 01, 2016 at 02:58:13PM -0700, Greg wrote:
> On Thu, 2016-09-01 at 22:14 +0200, Pavel Machek wrote:
> > Hi!
> >
> > I have trouble getting 1000mbit out of my ethernet card.
> >
> > I tried direct connection between two PCs with different cables, and
> > no luck.
> >
> > Today I tried connection to 1000mbit switch, and no luck, either. (Two
> > cables, one was cat6, both short).
> >
> > My computer sees 1000mbit being advertised by the other side, but does
> > not advertise 1000mbit, "Link Speed was downgraded by SmartSpeed".
>
> Check your cables?
>
> https://vmxp.wordpress.com/2015/01/06/1gbe-intel-nic-throttled-to-100mbit-by-smartspeed/

Of course if it isn't the cable, then it could even be a broken pin in
the port. As far as I can tell, anything that causes one of the 3rd or
4th pairs of wires to not work will degrade to 100Mbit on just the first
2 pairs of wires and give that message. Some badly implemented switches
can also cause it of course.

--
Len Sorensen
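Here is a rough model of the downshift described here. It is a simplification, not how SmartSpeed is implemented: it ignores auto-MDIX and the PHY's actual retry logic. The premise is that 100BASE-TX only uses the pairs on pins 1-2 and 3-6, while 1000BASE-T needs all four pairs. Losing either of the pairs on pins 4-5 or 7-8 therefore leaves 100Mbit as the best option. The function name and pair indexing are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* pairs_ok[0], pairs_ok[1]: pairs on pins 1-2 and 3-6 (used by 10/100).
 * pairs_ok[2], pairs_ok[3]: pairs on pins 4-5 and 7-8 (only 1000BASE-T).
 * Returns the best speed in Mbit/s, or 0 if no link is possible. */
static int model_best_speed(const bool pairs_ok[4])
{
	if (!pairs_ok[0] || !pairs_ok[1])
		return 0;	/* even 10/100 needs these two pairs */
	if (pairs_ok[2] && pairs_ok[3])
		return 1000;
	return 100;		/* the downshift the message reports */
}
```

A single bad conductor in the third or fourth pair is enough to land on 100.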
Re: CVE-2014-9900 fix is not upstream
On Tue, Aug 23, 2016 at 10:25:45PM +0100, Al Viro wrote:
> Sadly, sizeof is what we use when copying that sucker to userland. So these
> padding bits in the end would've leaked, true enough, and the case is somewhat
> weaker. And any normal architecture will have those, but then any such
> architecture will have no more trouble zeroing a 32bit value than 16bit one.

Hmm, good point.

Too bad I don't see a compiler option of "zero all padding in structs".
Certainly generating the code should not really be that different.

I see someone did request it 2 years ago:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63479

--
Len Sorensen
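Without such a compiler option, the usual defensive pattern is to clear the whole object with memset() before filling it in. The bytes copied out with sizeof() then include zeroed padding. Below is a minimal userspace sketch with the ethtool_wolinfo layout redeclared using <stdint.h> types; the `model_*` names are hypothetical. Strictly, the C standard leaves padding bytes unspecified after a member store, which is part of why an explicit compiler option would be nicer. In practice the kernel relies on this pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SOPASS_MAX 6

/* Same member layout as struct ethtool_wolinfo. */
struct model_wolinfo {
	uint32_t cmd;
	uint32_t supported;
	uint32_t wolopts;
	uint8_t sopass[SOPASS_MAX];
};

/* Zero everything, padding included, then set the members. */
static void model_fill_wolinfo(struct model_wolinfo *wol, uint32_t cmd)
{
	memset(wol, 0, sizeof(*wol));
	wol->cmd = cmd;
}

/* Count non-zero bytes after the last member, i.e. in the padding
 * that a sizeof()-sized copy to userspace would expose. */
static size_t model_dirty_tail(const struct model_wolinfo *wol)
{
	const unsigned char *p = (const unsigned char *)wol;
	size_t i, dirty = 0;

	for (i = offsetof(struct model_wolinfo, sopass) + SOPASS_MAX;
	     i < sizeof(*wol); i++)
		if (p[i] != 0)
			dirty++;
	return dirty;
}
```

Filling a structure that previously held stale bytes (say 0xAA) through `model_fill_wolinfo()` leaves no non-zero tail bytes.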
Re: CVE-2014-9900 fix is not upstream
On Tue, Aug 23, 2016 at 01:34:05PM -0700, Joe Perches wrote:
> On Tue, 2016-08-23 at 21:09 +0100, Al Viro wrote:
> > On Tue, Aug 23, 2016 at 11:24:06AM -0700, David Miller wrote:
> > ... and then we can file a bug report against the sodding compiler.  Note
> > that
> > struct ethtool_wolinfo {
> > 	__u32	cmd;
> > 	__u32	supported;
> > 	__u32	wolopts;
> > 	__u8	sopass[SOPASS_MAX]; // 6, actually
> > };
> > is not going to *have* padding.  Not on anything even remotely sane.
> > If array of 6 char as member of a struct requires 64bit alignment on some
> > architecture, I would really like some of what the designers of that ABI
> > must have been smoking.
>
> try this on x86-64
>
> $ pahole -C ethtool_wolinfo vmlinux
> struct ethtool_wolinfo {
> 	__u32          cmd;           /*     0     4 */
> 	__u32          supported;     /*     4     4 */
> 	__u32          wolopts;       /*     8     4 */
> 	__u8           sopass[6];     /*    12     6 */
>
> 	/* size: 20, cachelines: 1, members: 4 */
> 	/* padding: 2 */
> 	/* last cacheline: 20 bytes */
> };

That would be padding after the structure elements. I think what was
meant is that it won't add padding in the middle of the structure due to
alignment, i.e. it isn't doing:

struct ethtool_wolinfo {
	__u32          cmd;           /*     0     4 */
	__u32          supported;     /*     4     4 */
	__u32          wolopts;       /*     8     4 */
	<4 bytes padding here>
	__u8           sopass[6];     /*    16     6 */
};

which would have 4 bytes of padding in the middle between wolopts and
sopass.

I would not think it is the compiler's job to worry about what is after
your structure elements, since you shouldn't be going there.

--
Len Sorensen
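The distinction between padding in the middle and padding at the end can be checked with offsetof() and sizeof(). This sketch redeclares the structure with <stdint.h> types as a hypothetical `model_wolinfo`. On ABIs where a 32-bit integer is 4-byte aligned (x86-64 included), it gives no inner padding and 2 bytes of tail padding, matching the pahole output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Redeclaration of struct ethtool_wolinfo with <stdint.h> types. */
struct model_wolinfo {
	uint32_t cmd;
	uint32_t supported;
	uint32_t wolopts;
	uint8_t sopass[6];
};

/* Bytes the compiler inserted between wolopts and sopass. */
static size_t model_inner_padding(void)
{
	return offsetof(struct model_wolinfo, sopass) -
	       (offsetof(struct model_wolinfo, wolopts) + sizeof(uint32_t));
}

/* Bytes after the last member, up to sizeof(). */
static size_t model_tail_padding(void)
{
	return sizeof(struct model_wolinfo) -
	       (offsetof(struct model_wolinfo, sopass) + 6);
}
```

The tail bytes exist only so that arrays of the structure keep each element 4-byte aligned. They still leak if the whole sizeof() is copied to userspace uninitialized.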
Re: [PATCH] net: ethernet: ti: cpdma: switch to use genalloc
On Fri, Jun 24, 2016 at 07:58:32PM +0300, Grygorii Strashko wrote:
> Oh. nice :( So, seems, I'd need to send v3. Right?
> By the way, this code hasn't been introduced by this patch - I've
> just moved whole function from one place to another.

Well since it is moving I would think that was a handy time to fix the
coding style violation too, since it got noticed.

That leaves just one place in that file violating that part of the coding
style (the other is in cpdma_chan_dump). Somehow it wasn't spotted when
the code was put in back in 2010, and since they were wrapped lines, they
don't stand out quite as much visually.

--
Len Sorensen
Re: [PATCH] net: ethernet: ti: cpdma: switch to use genalloc
On Fri, Jun 24, 2016 at 11:35:15AM +0530, Mugunthan V N wrote:
> >> +static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
> >> +{
> >> +	if (!pool)
> >> +		return;
> >> +
> >> +	WARN_ON(pool->used_desc);
> >> +	if (pool->cpumap) {
> >> +		dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
> >> +				  pool->phys);
> >> +	} else {
> >> +		iounmap(pool->iomap);
> >> +	}
> >> +}
> >> +
> > single if, brackets?
>
> if() has multiple line statement, so brackets are must.

It is line wrapped; it is still one statement. And you can't argue the
else being multiple lines, although the style does require using brackets
for the else if the if required them.

Style says "Do not unnecessarily use braces where a single statement will
do." It says statement, not line. A multiline wrapped statement is still
one statement.

I may personally hate the lack of brackets, but style-wise it seems very
clear that the Linux kernel only uses brackets when required, which is
only when there is more than one statement. I prefer what you did, but
not as much as I prefer consistency.

--
Len Sorensen
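Under that reading of the style rule, the function would drop the braces on both branches, as below. This is a userspace sketch. `model_free_coherent()` and `model_iounmap()` are hypothetical stand-ins for dma_free_coherent() and iounmap() that only count calls, and the WARN_ON() is left out so it compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins that only count calls. */
static int freed, unmapped;

static void model_free_coherent(void *dev, unsigned long size, void *cpu,
				unsigned long phys)
{
	(void)dev; (void)size; (void)cpu; (void)phys;
	freed++;
}

static void model_iounmap(void *addr)
{
	(void)addr;
	unmapped++;
}

struct model_pool {
	void *dev, *cpumap, *iomap;
	unsigned long mem_size, phys;
};

/* Each branch is a single statement, even though the first one is
 * wrapped over two lines, so neither branch gets braces. */
static void model_pool_destroy(struct model_pool *pool)
{
	if (!pool)
		return;

	if (pool->cpumap)
		model_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
				    pool->phys);
	else
		model_iounmap(pool->iomap);
}
```

Had either branch needed a second statement, the rule quoted above would put braces on both the if and the else.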
Re: [PATCH] ti: Remove no longer used functions and prototypes in the files, cpsw_ale.c and cpsw_ale.h
On Fri, May 29, 2015 at 12:31:57PM -0400, Nicholas Krause wrote:
> This removes the function, cpsw_ale_flush and its prototype from the files
> cpsw_ale.c and cpsw_ale.h due to having no more callers. Finally we also
> remove the functions, cpsw_ale_set_vlan_entry, cpsw_ale_flush_ucast and
> cpsw_ale_add_ucast and their prototypes due to their only caller being
> removed with the removal of the function, cpsw_ale.c respectfully.
>
> Signed-off-by: Nicholas Krause xerofo...@gmail.com
> ---
>  drivers/net/ethernet/ti/cpsw_ale.c | 162 -
>  drivers/net/ethernet/ti/cpsw_ale.h |   3 -
>  2 files changed, 165 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
> index 6e927b4..b360dc8 100644
> --- a/drivers/net/ethernet/ti/cpsw_ale.c
> +++ b/drivers/net/ethernet/ti/cpsw_ale.c
> @@ -147,27 +147,6 @@ static int cpsw_ale_write(struct cpsw_ale *ale, int idx, u32 *ale_entry)
>  	return idx;
>  }
>  
> -static int cpsw_ale_match_addr(struct cpsw_ale *ale, u8 *addr, u16 vid)
> -{
> -	u32 ale_entry[ALE_ENTRY_WORDS];
> -	int type, idx;
> -
> -	for (idx = 0; idx < ale->params.ale_entries; idx++) {
> -		u8 entry_addr[6];
> -
> -		cpsw_ale_read(ale, idx, ale_entry);
> -		type = cpsw_ale_get_entry_type(ale_entry);
> -		if (type != ALE_TYPE_ADDR && type != ALE_TYPE_VLAN_ADDR)
> -			continue;
> -		if (cpsw_ale_get_vlan_id(ale_entry) != vid)
> -			continue;
> -		cpsw_ale_get_addr(ale_entry, entry_addr);
> -		if (ether_addr_equal(entry_addr, addr))
> -			return idx;
> -	}
> -	return -ENOENT;
> -}
> -
>  static int cpsw_ale_match_vlan(struct cpsw_ale *ale, u16 vid)
>  {
>  	u32 ale_entry[ALE_ENTRY_WORDS];
> @@ -268,147 +247,6 @@ int cpsw_ale_flush_multicast(struct cpsw_ale *ale, int port_mask, int vid)
>  }
>  EXPORT_SYMBOL_GPL(cpsw_ale_flush_multicast);
>  
> -static void cpsw_ale_flush_ucast(struct cpsw_ale *ale, u32 *ale_entry,
> -				 int port_mask)
> -{
> -	int port;
> -
> -	port = cpsw_ale_get_port_num(ale_entry);
> -	if ((BIT(port) & port_mask) == 0)
> -		return; /* ports dont intersect, not interested */
> -	cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
> -}
> -
> -int cpsw_ale_flush(struct cpsw_ale *ale, int port_mask)
> -{
> -	u32 ale_entry[ALE_ENTRY_WORDS];
> -	int ret, idx;
> -
> -	for (idx = 0; idx < ale->params.ale_entries; idx++) {
> -		cpsw_ale_read(ale, idx, ale_entry);
> -		ret = cpsw_ale_get_entry_type(ale_entry);
> -		if (ret != ALE_TYPE_ADDR && ret != ALE_TYPE_VLAN_ADDR)
> -			continue;
> -
> -		if (cpsw_ale_get_mcast(ale_entry))
> -			cpsw_ale_flush_mcast(ale, ale_entry, port_mask);
> -		else
> -			cpsw_ale_flush_ucast(ale, ale_entry, port_mask);
> -
> -		cpsw_ale_write(ale, idx, ale_entry);
> -	}
> -	return 0;
> -}
> -EXPORT_SYMBOL_GPL(cpsw_ale_flush);
> -
> -static inline void cpsw_ale_set_vlan_entry_type(u32 *ale_entry,
> -						int flags, u16 vid)
> -{
> -	if (flags & ALE_VLAN) {
> -		cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_VLAN_ADDR);
> -		cpsw_ale_set_vlan_id(ale_entry, vid);
> -	} else {
> -		cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_ADDR);
> -	}
> -}
> -
> -int cpsw_ale_add_ucast(struct cpsw_ale *ale, u8 *addr, int port,
> -		       int flags, u16 vid)
> -{
> -	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
> -	int idx;
> -
> -	cpsw_ale_set_vlan_entry_type(ale_entry, flags, vid);
> -
> -	cpsw_ale_set_addr(ale_entry, addr);
> -	cpsw_ale_set_ucast_type(ale_entry, ALE_UCAST_PERSISTANT);
> -	cpsw_ale_set_secure(ale_entry, (flags & ALE_SECURE) ? 1 : 0);
> -	cpsw_ale_set_blocked(ale_entry, (flags & ALE_BLOCKED) ? 1 : 0);
> -	cpsw_ale_set_port_num(ale_entry, port);
> -
> -	idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
> -	if (idx < 0)
> -		idx = cpsw_ale_match_free(ale);
> -	if (idx < 0)
> -		idx = cpsw_ale_find_ageable(ale);
> -	if (idx < 0)
> -		return -ENOMEM;
> -
> -	cpsw_ale_write(ale, idx, ale_entry);
> -	return 0;
> -}
> -EXPORT_SYMBOL_GPL(cpsw_ale_add_ucast);
> -
> -int cpsw_ale_del_ucast(struct cpsw_ale *ale, u8 *addr, int port,
> -		       int flags, u16 vid)
> -{
> -	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
> -	int idx;
> -
> -	idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
> -	if (idx < 0)
> -		return -ENOENT;
> -
> -	cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
> -	cpsw_ale_write(ale, idx, ale_entry);
> -	return 0;
> -}
> -EXPORT_SYMBOL_GPL(cpsw_ale_del_ucast);
> -
> -int cpsw_ale_add_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
> -		       int flags, u16 vid, int mcast_state)
> -{
> -	u32
Re: No idea about shaping trough many pc
On Thu, Jan 10, 2008 at 12:06:35PM +0300, Badalian Vyacheslav wrote:
> Hello all. I try more then 2 month resolve problem witch my shaping.
> Maybe you can help for me?
>
> Sheme:
>                   +---------------+
>               +---| Shaping PC 1  |---+
>              /    +---------------+    \
>   +-------+ /     +---------------+     \ +-------+
>   | Cisco |+------| Shaping PC N  |------+| CISCO |
>   +-------+ \     +---------------+     / +-------+
>              \    +---------------+    /
>               +---| Shaping PC 20 |---+
>                   +---------------+
>
> Network - Over 10k users. Common bandwidth to INTERNET more then 1 GBs
> All computers have BGP and turn on multipath. Cisco can't do load sharing
> by Packet (its can resolve all my problems =((( ). Only by DST IP, SRC IP,
> or +Level4.
>
> Ok. User must have speed 1mbs. Lets look variants:
> 1. Create rules to user = (1mbs/N computers). If user use N connection all
> great, but if it use 1 connection his speed = 1mbs/N - its not look good.
> All be great if cisco can PER PACKET load sharing =(
> 2. Create rules to user = 1mbs. If user use 1 connection all great, but if
> it use N connection his speed much more then needed limit =(
>
> Why i use 20 PC? Becouse 1 pc normal forward 100-150mbs... when it have
> 100% cpu usage on Sofware Interrupts...

I have managed forwarding of 600Mbps using about 15% CPU load on a 500MHz
Geode LX, using 4 100Mbit pcnet32 interfaces and a small tweak to how the
NAPI is implemented on it. Adding traffic shaping and such to the
processing would certainly increase the CPU load, but hopefully not by
much. The reason I didn't get more than 600Mbps was that the PCI bus is
now full.

> Any idea how to resolve this problem?
>
> In my dreams (feature request to netdev ;) ):
> Get PC - title: MASTER TC. All 20 PC syncronize statistic with MASTER and
> have common rules and statistic. Then i use variant 2 and will be happy...
> but its not real? =(
> Maybe have other variants?

Well, not sure about synchronizing and all that. I still think if I can
manage a 600Mbps forwarding rate using a slowpoke Geode, then a modern CPU
like a Q6600 with a number of PCIe gig ports should be able to do quite a
lot.
The tweak I did was to add a timer to the driver that I can activate whenever I finish emptying the receive queue. When the timer expires it adds the port back to the NAPI queue, and when the poll is called again it will either process whatever packets arrived during the delay, or actually unmask the IRQ and go back to IRQ mode. The delay I use is 1 jiffy, and I run with HZ=1000 and set the queues to 256 packets, since 1ms at 100Mbps can carry at most about 200 packets (64-byte worst case).

Whenever I empty the queue, I simply check how many packets I just processed. If it is greater than 0, I arm the timer to expire on the next jiffy and leave the port's IRQ masked after removing the port from NAPI polling. If it was 0, then I must have been called again after the timer expired and still had no packets to process, in which case I unmask the IRQ and don't arm the timer. I had to change HZ to 1000 since at 250 or 100 I wouldn't be able to handle the worst-case number of packets (the pcnet32 has a maximum of 512 packets in a queue).

With NAPI the normal behaviour is that whenever you empty the receive queue, you re-enable IRQs. But it doesn't take a very fast CPU to empty the queue all the time, and then you pay the overhead of masking the IRQ every time you receive packets, processing them, and then unmasking the IRQ, only to get another IRQ for the next packet within a fraction of a millisecond. Delaying the unmask until the next jiffy adds a potential lag of up to 1ms in processing packets (less on average), but the IRQ load drops dramatically and the overhead of managing the IRQ masking and running the IRQ handler goes away. On this system the CPU load dropped from 90% at 500Mbps to 15% at 600Mbps, and the interrupt rate dropped from one IRQ every couple of packets to one IRQ at the start of each burst of packets.
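A minimal user-space sketch of the deferred-unmask decision described above, plus the per-tick arithmetic behind the choice of HZ. The struct and function names are hypothetical (they are not pcnet32's), and the wire overhead of 8 bytes preamble/SFD plus a 12-byte inter-frame gap per frame is an assumption added here. Counting only the 64-byte frame gives about 195 frames per 1 ms tick at 100Mbit/s, in line with the "about 200" estimate; counting the overhead too gives about 148.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port state for the deferred-unmask scheme. */
struct port_state {
    bool irq_masked;   /* RX interrupt masked on the NIC */
    bool timer_armed;  /* one-jiffy timer pending */
};

/* Called when a poll pass has emptied the receive ring and the port has
 * been taken off the NAPI poll list. processed = packets handled. */
static void rx_ring_emptied(struct port_state *p, int processed)
{
    assert(processed >= 0);
    if (processed > 0) {
        /* Traffic is flowing: stay masked, poll again next jiffy. */
        p->irq_masked = true;
        p->timer_armed = true;
    } else {
        /* Timer-driven poll found nothing: return to IRQ mode. */
        p->irq_masked = false;
        p->timer_armed = false;
    }
}

/* Worst-case frames that can arrive during one timer tick.
 * wire_bytes: bytes on the wire per minimum-size frame (assumed). */
static long worst_case_frames_per_tick(long link_bps, int hz, int wire_bytes)
{
    assert(link_bps > 0 && hz > 0 && wire_bytes > 0);
    return link_bps / hz / (wire_bytes * 8L);
}
```

Under either count, one tick at HZ=1000 fits in a 256-entry ring, while a 4 ms tick (HZ=250, 595 frames with overhead) or a 10 ms tick (HZ=100, 1488 frames) overflows the 512-entry maximum, which is consistent with the reason given for switching to HZ=1000.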
I believe some gigabit Ethernet ports and most 10 gigabit ports can delay the IRQ until a certain number of packets have arrived, which is pretty much what I tried to emulate with my tweak, and it sure works amazingly well.

--
Len Sorensen
--
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
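For comparison, a rough model of the hardware interrupt moderation mentioned above: the controller raises one IRQ once a packet-count threshold is reached, or once the oldest pending packet has waited a maximum delay, whichever comes first. This is a generic sketch with assumed parameter names (max_frames, max_delay_us); real controllers expose moderation through their own registers and driver options.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interrupt-moderation state. */
struct coalesce {
    int max_frames;     /* fire an IRQ after this many pending packets */
    long max_delay_us;  /* ...or once the oldest one has waited this long */
    int pending;        /* packets received since the last IRQ */
    long first_rx_us;   /* arrival time of the oldest pending packet */
};

/* A packet arrives at now_us; returns true if an IRQ fires. */
static bool coalesce_rx(struct coalesce *c, long now_us)
{
    assert(c->max_frames > 0);
    if (c->pending == 0)
        c->first_rx_us = now_us;
    c->pending++;
    if (c->pending >= c->max_frames) {
        c->pending = 0;
        return true;
    }
    return false;
}

/* Periodic check; returns true if the delay limit forces an IRQ. */
static bool coalesce_tick(struct coalesce *c, long now_us)
{
    if (c->pending > 0 && now_us - c->first_rx_us >= c->max_delay_us) {
        c->pending = 0;
        return true;
    }
    return false;
}
```

With a 16-packet threshold, a 64-packet burst costs four interrupts instead of up to 64, while the delay limit bounds the latency added to a lone packet: the same trade-off as the one-jiffy timer.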
Re: [PATCH] pcnet32: fix non-napi packet reception
On Wed, Oct 17, 2007 at 05:04:01PM -0700, Don Fry wrote:
> I have no objections myself. It has been slowly moving that direction. First with the napi implementation, default off, labeled experimental. Then removing experimental and then making the default on. If any other user of the pcnet32 has objections, now is the time to speak loudly!

I have used NAPI only on the pcnet32 for quite a while now. In fact I think a few of my local patches would break if I disabled NAPI.

--
Len Sorensen
Re: iproute2: resend of patches from Debian.
On Thu, Oct 11, 2007 at 08:25:32PM +0200, Andreas Henriksson wrote:
> Patch from debian iproute package.
>
> diff -urNad iproute-20060323~/ip/iplink.c iproute-20060323/ip/iplink.c
> --- iproute-20060323~/ip/iplink.c	2006-03-22 00:57:50.0 +0100
> +++ iproute-20060323/ip/iplink.c	2006-09-08 21:07:14.0 +0200
> @@ -384,6 +384,10 @@
>  	}
>  
>  	if (newname && strcmp(dev, newname)) {
> +		if (strlen(newname) == 0) {
> +			printf("\"\" is not valid device identifier\n",dev);
> +			return -1;
> +		}
>  		if (do_changename(dev, newname) < 0)
>  			return -1;
>  		dev = newname;

Isn't that printf missing somewhere for the 'dev' argument to go?

--
Len Sorensen
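A sketch of the same check with a format string that actually consumes its argument. In C, surplus printf arguments are evaluated and then ignored, so the quoted hunk does not crash, but the device name never appears in the message. The helper name check_newname and the message wording are hypothetical; snprintf is used only so the output can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modelled on the quoted iplink.c hunk: reject an
 * empty new name and say which device was being renamed. Every argument
 * passed to snprintf has a matching %s in the format string. */
static int check_newname(const char *dev, const char *newname,
                         char *msg, size_t msglen)
{
    assert(dev != NULL && msg != NULL && msglen > 0);
    msg[0] = '\0';
    if (newname && strcmp(dev, newname) != 0 && strlen(newname) == 0) {
        snprintf(msg, msglen,
                 "\"\" is not a valid new name for device \"%s\"\n", dev);
        return -1;
    }
    return 0;
}
```

GCC's -Wformat (which enables -Wformat-extra-args) should flag the original call with a "too many arguments for format" warning.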
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Fri, Sep 21, 2007 at 11:37:52PM +0100, Denys Vlasenko wrote:
> But I compile net/* into bzImage. I like netbooting :)

Isn't it possible to netboot with an initramfs image? I am pretty sure I have seen some systems do exactly that.

--
Len Sorensen
Re: Odd behaviour of proxy_arp (although I solved part of it and think I figured out what stupid thing it is doing)
On Mon, Jul 23, 2007 at 04:36:22PM -0400, Lennart Sorensen wrote:
> I have been seeing some occasional strange behavior when using proxy_arp. I have a router running with an ADSL PPPoE link to the Internet, and an Ethernet link to a local network. It has proxy_arp enabled on the internal Ethernet port since I sometimes have ipsec tunnels running where I use proxy_arp to proxy for the IP assigned to the other end of the tunnel so that local machines can find and reach it. I run two independent subnets on the local network (one with fixed IPs for my machines here, and another with DHCP addresses for guest machines that visit occasionally just to give them Internet access). I run 10.0.0.0/8 and 192.168.254.0/24 on the local network with the router having an IP in each subnet. The strangeness that occurs is that once in a while there is a 10 second period where the system will answer all arp requests for all IPs on the local network, with it's own MAC address, which is clearly wrong since it doesn't have any of those IP addresses. It seems to happen every couple of days or so on average, although not at any specific time. One day it happened at 11:32:30 to 11:32:39, and a few days later it happened at 12:08:38 to 12:08:48. If I disable proxy_arp, it never happens at all, but then I loose the ability to do what I have proxy_arp enabled for in the first place.

It turns out the reason for the 10 seconds or so had to do with running VRRP and how I handle routes in that situation, so I fixed that. It still doesn't solve the annoyance below.

> Related to that problem, there is also the annoyance that any IP that isn't part of either of the two subnets the router belongs to, have arp requests answered by the router all the time, which it also should not be answering, since it doesn't actually have a clue what those IP addresses belong to and certainly has no idea where it should forward to to reach them. I occasionally have other random subnets in use on the network for running local test networks separate from everything else. It would be great if the kernel would keep its nose out of those subnets too.
>
> So far I have seen this behavior with 2.6.8, 2.6.16, and 2.6.18 (being the kernels I have run on this router).
>
> So have I misunderstood something about what proxy_arp is supposed to do, or is proxy_arp in the kernel simply broken, or is it perhaps mis-designed? Are there some tuning parameters that could perhaps make it actually do what one would expect it to be doing?

So I found out part of the problem. If ip forwarding is enabled (and why ever would it not be) and proxy_arp is enabled, and you have a default route set, then the kernel will answer arp requests for any IP address that it doesn't think is local to a given interface. So if the system has an IP of 10.0.0.254/8 on eth1, and an arp request arrives for 192.168.1.1, it will answer the arp request with the MAC of eth1 just because it thinks it could forward the packet through the default route.

Now the reason for having proxy_arp enabled in the first place is to allow ipsec connections to use 10.x.x.x/8 addresses for the remote client, to make them appear local. This of course does not mean I want to screw up life for people doing a small test on the local network with 192.168.x.x addresses on their own devices.

So the question is, can one make the kernel only answer arp requests for target IPs that belong to a given network interface? It looks like arp_filter or arp_ignore should do such a thing, but no matter what I set them to, it still answers all arp requests for IPs it doesn't think are local to the network, as far as I can tell. (I was doing arping for random addresses from a client on the network, although the source IP would have been considered local, so maybe that doesn't count.) I do see arp requests show up occasionally from other clients on the network which get answered by the router when they shouldn't have been, though, so I don't think the source IP has anything to do with it. Perhaps there is a mistake in the filter/ignore code making it not filter requests correctly.

Of course, overall I am finding it hard to understand when anyone would ever want to answer arp requests for any IP that shouldn't exist on the network the request comes from and which the router doesn't explicitly have an arp entry for (such as the ipsec case). I especially don't see when you would ever want the default route to be considered when determining whether the router could forward the packet and should answer the arp request.

Is it incorrect to enable proxy_arp in the case where I want to answer arp requests for IPs that should be local but which I have to tunnel to reach? Is there a better solution?

--
Len Sorensen
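A simplified model of the behaviour found above: with forwarding and proxy_arp enabled on the receiving interface, the router answers whenever the route to the target leaves by a different interface, so a default route out ppp0 makes nearly every unknown address qualify. This loosely follows the check in the kernel's arp_process() but ignores medium_id, arp_ignore/arp_filter and explicit proxy entries; the 10.9.9.9/32 route via ipsec0 in the usage is a hypothetical stand-in for a tunnel client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct route {
    uint32_t dst;      /* network address, host byte order */
    int prefix_len;    /* 0 = default route */
    const char *dev;   /* output interface */
};

static uint32_t ip4(int a, int b, int c, int d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Longest-prefix match; NULL if nothing matches (not even a default). */
static const struct route *lookup(const struct route *tbl, int n, uint32_t addr)
{
    assert(n >= 0);
    const struct route *best = NULL;
    for (int i = 0; i < n; i++) {
        /* Avoid the undefined 32-bit shift for a /0 prefix. */
        uint32_t mask = tbl[i].prefix_len == 0
                            ? 0 : 0xffffffffu << (32 - tbl[i].prefix_len);
        if ((addr & mask) == (tbl[i].dst & mask) &&
            (best == NULL || tbl[i].prefix_len > best->prefix_len))
            best = &tbl[i];
    }
    return best;
}

/* Simplified proxy_arp rule: answer if the target routes out of an
 * interface other than the one the request arrived on. */
static bool proxy_arp_replies(const struct route *tbl, int n,
                              const char *in_dev, uint32_t target,
                              bool forwarding, bool proxy_arp)
{
    if (!forwarding || !proxy_arp)
        return false;
    const struct route *rt = lookup(tbl, n, target);
    return rt != NULL && strcmp(rt->dev, in_dev) != 0;
}
```

In this model the only reason 192.168.1.1 gets an answer on eth1 is the default route via ppp0; a host route for a tunnel client gets an answer for exactly the same reason, so the rule by itself cannot tell the wanted case from the unwanted one.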
Odd behaviour of proxy_arp
I have been seeing some occasional strange behavior when using proxy_arp.

I have a router running with an ADSL PPPoE link to the Internet, and an Ethernet link to a local network. It has proxy_arp enabled on the internal Ethernet port, since I sometimes have ipsec tunnels running where I use proxy_arp to proxy for the IP assigned to the other end of the tunnel so that local machines can find and reach it.

I run two independent subnets on the local network (one with fixed IPs for my machines here, and another with DHCP addresses for guest machines that visit occasionally, just to give them Internet access). I run 10.0.0.0/8 and 192.168.254.0/24 on the local network, with the router having an IP in each subnet.

The strangeness is that once in a while there is a 10-second period where the system will answer all arp requests for all IPs on the local network with its own MAC address, which is clearly wrong since it doesn't have any of those IP addresses. It seems to happen every couple of days or so on average, although not at any specific time. One day it happened from 11:32:30 to 11:32:39, and a few days later from 12:08:38 to 12:08:48. If I disable proxy_arp it never happens at all, but then I lose the ability to do what I have proxy_arp enabled for in the first place.

Related to that problem, there is also the annoyance that any IP that isn't part of either of the two subnets the router belongs to has its arp requests answered by the router all the time, which it also should not be doing, since it doesn't actually have a clue what those IP addresses belong to and certainly has no idea where to forward packets to reach them. I occasionally have other random subnets in use on the network for running local test networks separate from everything else. It would be great if the kernel would keep its nose out of those subnets too.

So far I have seen this behavior with 2.6.8, 2.6.16, and 2.6.18 (being the kernels I have run on this router).
So have I misunderstood something about what proxy_arp is supposed to do, or is proxy_arp in the kernel simply broken, or is it perhaps misdesigned? Are there some tuning parameters that could make it actually do what one would expect it to be doing?

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Mon, May 07, 2007 at 01:45:11PM -0400, Lennart Sorensen wrote:
> Hmm, I thought I saw it on two systems already, but I should go try that again.

Hmm, still haven't figured this out. I just saw this one this morning:

BUG: soft lockup detected on CPU#0!
[c0103fc4] dump_stack+0x24/0x30
[c013d36e] softlockup_tick+0x7e/0xc0
[c011eb23] update_process_times+0x33/0x80
[c01062c9] timer_interrupt+0x39/0x80
[c013d6fd] handle_IRQ_event+0x3d/0x70
[c013da59] __do_IRQ+0xa9/0x150
[c0104e55] do_IRQ+0x25/0x60
[c010313a] common_interrupt+0x1a/0x20
[c013d6d8] handle_IRQ_event+0x18/0x70
[c013da59] __do_IRQ+0xa9/0x150
[c0104e55] do_IRQ+0x25/0x60
[c010313a] common_interrupt+0x1a/0x20
[c0119cda] __do_softirq+0x3a/0xa0
[c0119d6d] do_softirq+0x2d/0x30
[c0119fb7] irq_exit+0x37/0x40
[c0104e5a] do_IRQ+0x2a/0x60
[c010313a] common_interrupt+0x1a/0x20
[c013dcee] setup_irq+0xce/0x1e0
[c013de97] request_irq+0x97/0xb0
[d0851f9d] pcnet32_open+0x4d/0x3d0 [pcnet32]
[c023a4f9] dev_open+0x39/0x80
[c0238cea] dev_change_flags+0xfa/0x130
[c027eb9f] devinet_ioctl+0x4ff/0x6f0
[c022dab1] sock_ioctl+0xf1/0x1f0
[c017413c] do_ioctl+0x2c/0x80
[c01741e2] vfs_ioctl+0x52/0x2f0
[c01744ef] sys_ioctl+0x6f/0x80
[c0102ef7] syscall_call+0x7/0xb
[b7f41d04] 0xb7f41d04

And it is happening on multiple systems. I am starting to wonder if it is a bug in the soft lockup detection. Maybe it really isn't locked up but just momentarily appears to be. I will try turning off the soft lockup detection and see what happens.

--
Len Sorensen
Re: [PCNET32] Lock solid with netconsole
On Mon, May 28, 2007 at 05:25:51PM +0200, Emmanuel Fusté wrote:
> > Any difference if you disable the debug messages in the pcnet32 driver and you apply the patch below ?
> >
> > diff --git a/drivers/net/pcnet32.c b/drivers/net/pcnet32.c
> > index 9c171a7..be4513f 100644
> > --- a/drivers/net/pcnet32.c
> > +++ b/drivers/net/pcnet32.c
> > @@ -2556,11 +2556,12 @@ pcnet32_interrupt(int irq, void *dev_id)
> >  	unsigned long ioaddr;
> >  	u16 csr0;
> >  	int boguscnt = max_interrupt_work;
> > +	unsigned long flags;
> >  
> >  	ioaddr = dev->base_addr;
> >  	lp = netdev_priv(dev);
> >  
> > -	spin_lock(&lp->lock);
> > +	spin_lock_irqsave(&lp->lock, flags);
> >  
> >  	csr0 = lp->a.read_csr(ioaddr, CSR0);
> >  	while ((csr0 & 0x8f00) && --boguscnt >= 0) {
> > @@ -2632,7 +2633,7 @@ pcnet32_interrupt(int irq, void *dev_id)
> >  		printk(KERN_DEBUG "%s: exiting interrupt, csr0=%#4.4x.\n",
> >  		       dev->name, lp->a.read_csr(ioaddr, CSR0));
> >  
> > -	spin_unlock(&lp->lock);
> > +	spin_unlock_irqrestore(&lp->lock, flags);
> >  
> >  	return IRQ_HANDLED;
> >  }
>
> Hi,
> Tested under very high console activity and it no longer freeze.

Hmm, I have been seeing lockups too and asked about doing something almost exactly the same as this recently, but was told that it shouldn't need IRQs disabled at this point. Well, if it makes netconsole more stable, I think I will try adding it as well and see if it makes the problems go away for good (my problem only happens at random, and it can be days between occurrences).

--
Len Sorensen
Re: Questions about IPsec and Netfilter
On Thu, May 10, 2007 at 10:36:14AM -0400, Alan Stern wrote:
> I've got a few questions about the relationship between the IPsec implementation and Netfilter.
>
> Q1: At what points during packet processing do the IPsec transformations occur? In particular, which netfilter hooks do they come before and after? And likewise, which routing operations do they come before and after?

Are you using netkey or klips?

> Q2: When a packet using IPsec tunnel mode is encapsulated or de-encapsulated, does the newly-formed packet return to some earlier point in the stack for further netfilter processing or routing? What about transport mode?

As far as I can tell, the encrypted packet goes through the INPUT chain, then is decrypted and goes through either INPUT or FORWARD again, depending on the unencrypted source/destination. Well, for netkey anyhow. With klips, the packet goes through the INPUT chain, is decrypted, and then comes in on the ipsecX interface through either the INPUT or FORWARD chain.

> Q3: How can iptables rules determine whether they are dealing with a packet which has been de-encapsulated from (or encapsulated within) an IPsec wrapper?

If using netkey with 2.6.16 or newer, then the policy tag will be ipsec if the packet was decrypted from an ipsec tunnel. I recently had to upgrade to shorewall 3.x to deal with that when I went to using netkey and the 2.6.18 kernel together. With klips, the packets from an ipsec tunnel arrive on the ipsecX interface after being decrypted, so you can recognize them that way.

> Q4: Is it true that NAT-Traversal isn't implemented for transport mode?

No idea.

> In RFC 2401 (Security Architecture for the Internet Protocol), section 5 includes this text:
>
>    As mentioned in Section 4.4.1 The Security Policy Database (SPD), the SPD must be consulted during the processing of all traffic (INBOUND and OUTBOUND), including non-IPsec traffic. If no policy is found in the SPD that matches the packet (for either inbound or outbound traffic), the packet MUST be discarded.
>
> But on Linux systems, by default the SPD is normally empty (as shown by setkey -DP) and all packets are allowed to pass unhindered.
>
> Q5: Isn't this a violation of the RFC? Or is there some implicit policy entry which accepts all packets without applying any security association?
>
> Thanks for any answers. I may think up more questions later...

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Fri, May 04, 2007 at 03:02:36PM -0400, Lennart Sorensen wrote:
> Well I don't know, but something is going wrong and causing the soft lock up. I must admit I am surprised if an interrupt can occour while handling an interrupt, but then again maybe that is supposed to be allowed.

I tried building a kernel where the only change was enabling spinlock debugging. It doesn't fail, while without spinlock debugging it seemed to fail very frequently. Darn! I hate it when debugging hides the problem.

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Mon, May 07, 2007 at 04:48:37PM +0200, Frederik Deweerdt wrote:
> Can you try running on another Geode LX system, just to rule out a hardware problem on you board?

Hmm, I thought I saw it on two systems already, but I should go try that again.

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Thu, May 03, 2007 at 04:31:43PM -0400, Lennart Sorensen wrote:
> I have had this happen a few times recently and was wondering if anyone has an idea what could be going on:
>
> BUG: soft lockup detected on CPU#0!
> [c0103fc4] dump_stack+0x24/0x30
> [c013d71e] softlockup_tick+0x7e/0xc0
> [c011eb23] update_process_times+0x33/0x80
> [c01062c9] timer_interrupt+0x39/0x80
> [c013daad] handle_IRQ_event+0x3d/0x70
> [c013de09] __do_IRQ+0xa9/0x150
> [c0104e55] do_IRQ+0x25/0x60
> [c010313a] common_interrupt+0x1a/0x20
> [d084e00c] pcnet32_dwio_read_csr+0xc/0x20 [pcnet32]
> [d084e9d2] pcnet32_interrupt+0x42/0x2b0 [pcnet32]
> [c013daad] handle_IRQ_event+0x3d/0x70
> [c013de09] __do_IRQ+0xa9/0x150
> [c0104e55] do_IRQ+0x25/0x60
> [c010313a] common_interrupt+0x1a/0x20
> [c013da88] handle_IRQ_event+0x18/0x70
> [c013de09] __do_IRQ+0xa9/0x150
> [c0104e55] do_IRQ+0x25/0x60
> [c010313a] common_interrupt+0x1a/0x20
> [5791] 0x5791
>
> This is on a system running a Geode LX at 500MHz, using 2.6.18 based kernel (specifically a slightly modified debian 4.0 Etch kernel).
>
> I am really wondering where do I go looking for the cause of this. The same kernel running on a Geode SC1200 (GX1) does not appear to do this. If I knew what the error meant I would have a better idea how to debug it and fix it.

I looked at the pcnet32_interrupt function where it calls pcnet32_dwio_read_csr and saw this:

2550 /* The PCNET32 interrupt handler. */
2551 static irqreturn_t
2552 pcnet32_interrupt(int irq, void *dev_id)
2553 {
2554 	struct net_device *dev = dev_id;
2555 	struct pcnet32_private *lp;
2556 	unsigned long ioaddr;
2557 	u16 csr0;
2558 	int boguscnt = max_interrupt_work;
2559
2560 	ioaddr = dev->base_addr;
2561 	lp = netdev_priv(dev);
2562
2563 	spin_lock(&lp->lock);
2564
2565 	csr0 = lp->a.read_csr(ioaddr, CSR0);
2566 	while ((csr0 & 0x8f00) && --boguscnt >= 0) {
2567 		if (csr0 == 0xffff) {
2568 			break;	/* PCMCIA remove happened */

So I wonder: what happens if an interrupt occurs, and since one of the devices on that interrupt is the pcnet32, it grabs the port lock and goes to read CSR0, and then another interrupt occurs on the same IRQ line (I run with PREEMPT enabled, if that matters) and the pcnet32 interrupt handler is called again, but since the port is already locked it has to wait, causing the CPU to be locked up?

Should line 2563 be a spin_lock_irqsave instead, along with the appropriate unlock later?

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Fri, May 04, 2007 at 04:33:26PM +0200, Frederik Deweerdt wrote:
> On Fri, May 04, 2007 at 10:10:24AM -0400, Lennart Sorensen wrote:
> > On Thu, May 03, 2007 at 04:31:43PM -0400, Lennart Sorensen wrote:
> > [...]
> > Should line 2563 be a spin_lock_irqsave instead along with the appropriate unluck later?
>
> IIRC, when you enable lockdep, it will complain about spinlocks used in an invalid context.

What is lockdep and how do I enable it? I enabled SPINLOCK_DEBUG and am going to try that kernel now (except it hit the bug before I could even log in and install the kernel this time, so another reboot first).

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Fri, May 04, 2007 at 05:34:38PM +0200, Frederik Deweerdt wrote:
> For the what part, see Documentation/lockdep-design.txt.
> You'll enable it by with SPINLOCK_DEBUG, indeed.

Well, I hope to see it hit the BUG again soon then, to see what it has to say.

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Fri, May 04, 2007 at 11:40:09AM -0400, Lennart Sorensen wrote:
> On Fri, May 04, 2007 at 05:34:38PM +0200, Frederik Deweerdt wrote:
> > For the what part, see Documentation/lockdep-design.txt.
> > You'll enable it by with SPINLOCK_DEBUG, indeed.
>
> Well I hope to see it hit the BUG again soon then to see what it has to say.

Well, I didn't see anything for a while with SPINLOCK_DEBUG enabled (maybe I didn't wait long enough). So I tried changing it to spin_lock_irqsave, and that didn't go well. I got this as the result now:

onfiguring network interfaces...eth1: link up, 100Mbps, full-duplex
BUG: spinlock recursion on CPU#0, ifconfig/962
 lock: cf7a3304, .magic: dead4ead, .owner: ifconfig/962, .owner_cpu: 0
[c0104024] dump_stack+0x24/0x30
[c01e3947] _raw_spin_lock+0x137/0x140
[c02981ec] _spin_lock_irqsave+0x1c/0x30
[d084eb86] pcnet32_interrupt+0x216/0x290 [pcnet32]
[c013b95d] handle_IRQ_event+0x3d/0x70
[c013ba2c] __do_IRQ+0x9c/0x120
[c0105025] do_IRQ+0x25/0x60
[c010316a] common_interrupt+0x1a/0x20
[c011927a] __do_softirq+0x3a/0xa0
[c011930d] do_softirq+0x2d/0x30
[c0119557] irq_exit+0x37/0x40
[c010502a] do_IRQ+0x2a/0x60
[c010316a] common_interrupt+0x1a/0x20
[c02983c0] _spin_unlock_irqrestore+0x10/0x40
[d08517ea] pcnet32_open+0x27a/0x390 [pcnet32]
[c02343e9] dev_open+0x39/0x80
[c0232b5a] dev_change_flags+0xfa/0x130
[c0277b7f] devinet_ioctl+0x4ff/0x6f0
[c0227b24] sock_ioctl+0xf4/0x1f0
[c017027c] do_ioctl+0x2c/0x80
[c0170322] vfs_ioctl+0x52/0x2f0
[c017062f] sys_ioctl+0x6f/0x80
[c0102f27] syscall_call+0x7/0xb
[b7eebd04] 0xb7eebd04
BUG: spinlock lockup on CPU#0, ifconfig/962, cf7a3304
[c0104024] dump_stack+0x24/0x30
[c01e391f] _raw_spin_lock+0x10f/0x140
[c02981ec] _spin_lock_irqsave+0x1c/0x30
[d084eb86] pcnet32_interrupt+0x216/0x290 [pcnet32]
[c013b95d] handle_IRQ_event+0x3d/0x70
[c013ba2c] __do_IRQ+0x9c/0x120
[c0105025] do_IRQ+0x25/0x60
[c010316a] common_interrupt+0x1a/0x20
[c011927a] __do_softirq+0x3a/0xa0
[c011930d] do_softirq+0x2d/0x30
[c0119557] irq_exit+0x37/0x40
[c010502a] do_IRQ+0x2a/0x60
[c010316a] common_interrupt+0x1a/0x20
[c02983c0] _spin_unlock_irqrestore+0x10/0x40
[d08517ea] pcnet32_open+0x27a/0x390 [pcnet32]
[c02343e9] dev_open+0x39/0x80
[c0232b5a] dev_change_flags+0xfa/0x130
[c0277b7f] devinet_ioctl+0x4ff/0x6f0
[c0227b24] sock_ioctl+0xf4/0x1f0
[c017027c] do_ioctl+0x2c/0x80
[c0170322] vfs_ioctl+0x52/0x2f0
[c017062f] sys_ioctl+0x6f/0x80
[c0102f27] syscall_call+0x7/0xb
[b7eebd04] 0xb7eebd04

Obviously that wasn't so good.

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Fri, May 04, 2007 at 01:44:56PM -0400, Lennart Sorensen wrote:
> On Fri, May 04, 2007 at 11:40:09AM -0400, Lennart Sorensen wrote:
> > On Fri, May 04, 2007 at 05:34:38PM +0200, Frederik Deweerdt wrote:
> > > For the what part, see Documentation/lockdep-design.txt.
> > > You'll enable it by with SPINLOCK_DEBUG, indeed.
> >
> > Well I hope to see it hit the BUG again soon then to see what it has to say.
>
> Well I didn't see anything for a while with SPINLOCK_DEBUG enabled (maybe I didn't wait long enough). So I tried changing it to spin_lock_irqsave, and that didn't go well. I got this as the result now:
>
> [... "spinlock recursion" and "spinlock lockup" stack traces snipped ...]
>
> Obviously that wasn't so good.

Never mind. I am obviously an idiot today, placing spin_lock_irqsave in place of both spin_lock and spin_unlock. Yeah, that will work well. Now to try with spin_unlock_irqrestore, or whatever it is called.

--
Len Sorensen
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Fri, May 04, 2007 at 11:24:33AM -0700, Don Fry wrote:
> All instances of obtaining the lock in pcnet32 are done as spin_lock_irqsave except the interrupt handler itself. The interrupt mask needs to be saved everywhere else, but the interrupt handler is known not to need to save the flags. If the lock is held and the same CPU tries to get the lock again, it will wait a very long time ;-(.
>
> I believe the locking is fine for a non-preemptable kernel, but I have little experience with a preemptable kernel. When does a preemptable kernel allow interrupts to occur?

I have no idea, actually.

> Is there a bug in this particular architectures locking code?

On i386? I hope not.

> From looking at preempt-locking.txt the driver has (1) no per-cpu data, (2) 'CPU state protection' should be fine, (3) the 'lock is acquired and released by the same task'. I don't see a problem unless I am misunderstanding something.

Well, I don't know, but something is going wrong and causing the soft lockup. I must admit I am surprised if an interrupt can occur while handling an interrupt, but then again maybe that is supposed to be allowed.

--
Len Sorensen
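A toy single-CPU model (plain C, not kernel code) of the situation Don describes: if process context takes the lock without disabling local interrupts and the device then interrupts on the same CPU, the handler spins on a lock its own CPU already holds, the same-CPU re-acquisition that the "spinlock recursion" debug check reports. Taking the lock with interrupts disabled defers the handler until the unlock. All names are hypothetical, and the model ignores preemption and other CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU state. */
struct cpu {
    bool irqs_enabled;  /* local interrupt flag */
    bool lock_held;     /* the driver's (non-recursive) spinlock */
    bool irq_pending;   /* device interrupt waiting to be delivered */
    bool deadlocked;    /* handler would spin forever */
};

/* Interrupt handler: takes the lock with plain spin_lock(). */
static void irq_handler(struct cpu *c)
{
    if (c->lock_held) {
        /* This CPU already owns the lock, so spinning never ends. */
        c->deadlocked = true;
        return;
    }
    c->lock_held = true;
    /* ... service the device ... */
    c->lock_held = false;
}

/* Deliver a pending interrupt only if local interrupts are enabled. */
static void maybe_deliver_irq(struct cpu *c)
{
    if (c->irq_pending && c->irqs_enabled) {
        c->irq_pending = false;
        irq_handler(c);
    }
}

/* Process-context critical section during which the device interrupts.
 * use_irqsave: true models spin_lock_irqsave()/spin_unlock_irqrestore(),
 * false models plain spin_lock()/spin_unlock(). */
static void process_context_section(struct cpu *c, bool use_irqsave)
{
    assert(!c->lock_held);
    bool saved = c->irqs_enabled;
    if (use_irqsave)
        c->irqs_enabled = false;
    c->lock_held = true;
    c->irq_pending = true;      /* interrupt arrives mid-section */
    maybe_deliver_irq(c);
    c->lock_held = false;
    if (use_irqsave)
        c->irqs_enabled = saved;
    maybe_deliver_irq(c);       /* a deferred interrupt runs now */
}
```

With the plain lock the model ends in the deadlocked state; with the irqsave variant the interrupt is simply handled after the unlock.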
Re: [RFT] e100 driver on ARM
On Thu, Apr 26, 2007 at 09:19:34AM -0700, H. Peter Anvin wrote:
> Why wouldn't that be permitted? It, in fact, happens all the time (the host bridge withdraws the GNT# line and raises STOP#, which does a Termination With Data of the bus transfer.) This is a normal event and if you can't handle it you won't work with many host bridges at all.

Well, there must have been something else wrong then. Certainly I saw data corruption on an rtl8139. There were no problems with the same hardware using a Geode SC1200, so I have no idea. I liked the speed of the PXA255 a lot better than the slow poke SC1200.

--
Len Sorensen
Re: [patch 09/11] forcedeth: improve NAPI logic
On Thu, Apr 26, 2007 at 10:53:04AM -0400, Ayaz Abdulla wrote:
> Ok. In that case, the patch needs to be improved. The following needs to be done when NAPI is enabled:
> - remove the tx handling within the ISRs
> - mask off the tx interrupts within the ISRs that handle tx processing
> - re-enable tx interrupts within the NAPI handler
> - add tx handling within the NAPI handler (this patch covers it)

I thought a number of drivers handled tx from NAPI while receives were happening, but went to plain interrupts if no receives were happening. Maybe I misread the code (I have mainly dealt with pcnet32 so far). Certainly for gigabit I would think NAPI all the time would be much more efficient.

--
Len Sorensen
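A toy model of the arrangement in the quoted list: the ISR masks both RX and TX interrupts and schedules the poll routine, which reclaims TX completions, processes RX up to the budget, and unmasks both only once the RX queue is drained. Field and function names are hypothetical, and not charging TX reclaim against the budget is an assumption of this sketch rather than a rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NIC state. */
struct nic {
    bool rx_irq_enabled, tx_irq_enabled;
    bool poll_scheduled;
    int rx_pending;       /* received frames waiting */
    int tx_done_pending;  /* completed TX descriptors to reclaim */
};

/* ISR: no RX or TX work here; mask both sources and defer to poll. */
static void nic_isr(struct nic *n)
{
    n->rx_irq_enabled = false;
    n->tx_irq_enabled = false;
    n->poll_scheduled = true;
}

/* Poll: returns RX work done; unmasks only when the RX queue is empty. */
static int nic_poll(struct nic *n, int budget)
{
    assert(budget > 0 && n->poll_scheduled);
    n->tx_done_pending = 0;  /* reclaim all TX completions */
    int work = n->rx_pending < budget ? n->rx_pending : budget;
    n->rx_pending -= work;
    if (work < budget) {
        /* Drained: leave polling mode, re-enable RX and TX interrupts. */
        n->poll_scheduled = false;
        n->rx_irq_enabled = true;
        n->tx_irq_enabled = true;
    }
    return work;
}
```

A 100-packet backlog with a budget of 64 takes two poll passes, and both interrupt sources stay masked in between.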
Re: [RFT] e100 driver on ARM
On Mon, Apr 16, 2007 at 11:07:36AM -0400, David Acker wrote:
> Lennart Sorensen wrote:
> > Which PCI host controller are you using with the PXA255? We tried using a PXA255 based system with a PCI controller a couple of years ago and have to change to a different cpu in the end due to the PCI controller simply not being valid PCI. The PXA255 wasn't designed for PCI, and I get the impression that non of the PCI companion chips for it do a good enough job to actually add it correctly.
>
> Sorry for the delay in responding...my wife and I just had twins!
> We are using the IT8152G RISC-to-PCI companion chip.

Well, the IT8152G+PXA255 combination used on the SBC we tried a couple of years ago did not work. The PCI bus had errors and the SBC maker gave up trying to fix it. We switched to a Geode SC1200 based board instead, which works fine PCI-wise.

My suspicion (although it is only that) is that the PXA255 trying to access memory may cause interruptions in PCI bus master transfers, which is of course not permitted by the PCI spec (at least the way I read it). We tried it with RTL8139 and AMD 972 (both Ethernet) as well as a number of T1/E1 and DDS WAN cards from Sangoma. The WAN cards had the most issues with it (the drivers and hardware would get out of sync due to PCI bus problems), while the Ethernet just had occasional packet corruption.

I will certainly never consider using a PXA + ITE PCI controller combination again. Too bad, since the performance of the PXA is amazing. The PXA chips are not designed to speak to PCI, and the ITE companion chip doesn't quite do the job of pretending they were. I would expect problems if you do PCI bus master transfers and/or put any kind of traffic load on the PCI bus.

--
Len Sorensen
Re: two gateways with one NIC
On Sun, Apr 08, 2007 at 08:29:07PM +0100, W Agtail wrote: This is what I'm trying to achieve with the following iptables/iproute2 configuration on both web servers: iptables -t mangle -A PREROUTING -p tcp --dport 8088 -i eth0 -j LOG --log-prefix "fwmark 1: " iptables -t mangle -A PREROUTING -p tcp --dport 8089 -i eth0 -j LOG --log-prefix "fwmark 2: " iptables -t mangle -A PREROUTING -p tcp --dport 8088 -i eth0 -j MARK --set-mark 1 iptables -t mangle -A PREROUTING -p tcp --dport 8089 -i eth0 -j MARK --set-mark 2 You are supposed to mangle things _coming_ from ports 8088 and 8089. After all, it is the replies you are trying to affect, not the requests. So it should be --sport, not --dport. And of course outbound, not incoming on eth0. iptables -t mangle -A PREROUTING -m mark --mark 1 -j LOG --log-prefix "marked 1: " iptables -t mangle -A PREROUTING -m mark --mark 2 -j LOG --log-prefix "marked 2: " ip route add table 1 default via 10.18.35.11 dev eth0 # GW1 ip route add table 2 default via 10.18.35.21 dev eth0 # GW2 ip rule add fwmark 1 table 1 ip rule add fwmark 2 table 2 On web2, the default gw is set to gw2 and in /var/log/messages, I can see packets appear to be marked. However, for some reason, 8088 is still routing back via gw2 (default gw) rather than being routed via gw1, which I'm trying to do with the above ip rules etc. Is the above the correct syntax? or I guess I could totally be missing the plot? Many thanks for your time on this one. Hope that helps. -- Len Sorensen
Re: two gateways with one NIC
On Mon, Apr 09, 2007 at 06:13:50PM +0200, Patrick McHardy wrote: As the name suggests, POSTROUTING comes after routing, so marking packets there doesn't affect routing. Use PREROUTING for forwarded traffic and OUTPUT for locally generated traffic. I didn't even notice that had been changed. It used to say PREROUTING when it was for --dport, and all I suggested was changing --dport to --sport and changing the -o part (probably to nothing at all, really, since routing hasn't been decided yet). Yes, it absolutely has to be done in PREROUTING. -- Len Sorensen
Re: two gateways with one NIC
On Mon, Apr 09, 2007 at 06:02:23PM +0100, W Agtail wrote: Thanks Patrick for your comments too. It seems that you can't mix PREROUTING with --sport or -o. I've also changed the ip rule tables to higher numbers, so I now have: I thought you could have --sport, but NOT -o. No need for -o of course. iptables -t mangle -A PREROUTING -p tcp --dport 8088 -i eth0 -j LOG --log-prefix "fwmark 1: " iptables -t mangle -A PREROUTING -p tcp --dport 8089 -i eth0 -j LOG --log-prefix "fwmark 2: " iptables -t mangle -A PREROUTING -p tcp --dport 8088 -i eth0 -j MARK --set-mark 1 iptables -t mangle -A PREROUTING -p tcp --dport 8089 -i eth0 -j MARK --set-mark 2 iptables -t mangle -A PREROUTING -m mark --mark 1 -j LOG --log-prefix "marked 1: " iptables -t mangle -A PREROUTING -m mark --mark 2 -j LOG --log-prefix "marked 2: " The thing is that the destination port will NEVER be 8088 for the outgoing packets from apache. The source port will be. Try this: iptables -t mangle -A PREROUTING -p tcp --sport 8088 -j LOG --log-prefix "fwmark 1: " iptables -t mangle -A PREROUTING -p tcp --sport 8089 -j LOG --log-prefix "fwmark 2: " iptables -t mangle -A PREROUTING -p tcp --sport 8088 -j MARK --set-mark 1 iptables -t mangle -A PREROUTING -p tcp --sport 8089 -j MARK --set-mark 2 iptables -t mangle -A PREROUTING -m mark --mark 1 -j LOG --log-prefix "marked 1: " iptables -t mangle -A PREROUTING -m mark --mark 2 -j LOG --log-prefix "marked 2: " ip route add table 8088 default via 10.18.35.11 dev eth0 ip route add table 8089 default via 10.18.35.21 dev eth0 ip rule add fwmark 1 table 8088 ip rule add fwmark 2 table 8089 # Confirmation of syntax: iptables -t mangle --list -v -n Chain PREROUTING (policy ACCEPT 5921 packets, 403K bytes) pkts bytes target prot opt in out source destination 18 984 LOG tcp -- eth0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8088 LOG flags 0 level 4 prefix `fwmark 1: ' 0 0 LOG tcp -- eth0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8089 LOG flags 0 level 4 prefix `fwmark 2: ' 18 984 MARK tcp -- eth0 * 0.0.0.0/0 0.0.0.0/0 tcp
dpt:8088 MARK set 0x1 0 0 MARK tcp -- eth0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8089 MARK set 0x2 18 984 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 MARK match 0x1 LOG flags 0 level 4 prefix `marked 1: ' 0 0 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 MARK match 0x2 LOG flags 0 level 4 prefix `marked 2: ' ip rule list 0: from all lookup local 32764: from all fwmark 0x2 lookup 8089 32765: from all fwmark 0x1 lookup 8088 32766: from all lookup main 32767: from all lookup default ip route list table 8088; ip route list table 8089 default via 10.18.35.11 dev eth0 default via 10.18.35.21 dev eth0 This is what I see in web2's /var/log/messages: Apr 9 06:46:58 web2-fc6 kernel: fwmark 1: IN=eth0 OUT= MAC=00:0c:29:d1:08:48:00:0c:29:49:04:9f:08:00 SRC=192.168.0.241 DST=10.18.35.52 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=42359 DF PROTO=TCP SPT=33321 DPT=8088 WINDOW=5840 RES=0x00 SYN URGP=0 Apr 9 06:46:58 web2-fc6 kernel: marked 1: IN=eth0 OUT= MAC=00:0c:29:d1:08:48:00:0c:29:49:04:9f:08:00 SRC=192.168.0.241 DST=10.18.35.52 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=42359 DF PROTO=TCP SPT=33321 DPT=8088 WINDOW=5840 RES=0x00 SYN URGP=0 As you can see, packets appear to be marked.
But here's a tcpdump on gw2's eth1:
07:20:35.004205 192.168.0.241.59438 > 10.18.35.52.8088: S 221760494:221760494(0) win 5840 mss 1460,sackOK,timestamp 1320423 0,nop,wscale 6 (DF)
07:20:35.013144 10.18.35.52.8088 > 192.168.0.241.59438: S 2705868365:2705868365(0) ack 221760495 win 5792 mss 1460,sackOK,timestamp 2191014 1320423,nop,wscale 1 (DF)
07:20:35.021857 192.168.0.241.59438 > 10.18.35.52.8088: R 221760495:221760495(0) win 0 (DF)
07:20:38.069688 192.168.0.241.59438 > 10.18.35.52.8088: S 221760494:221760494(0) win 5840 mss 1460,sackOK,timestamp 1321173 0,nop,wscale 6 (DF)
07:20:38.069695 10.18.35.52.8088 > 192.168.0.241.59438: S 2706988830:2706988830(0) ack 221760495 win 5792 mss 1460,sackOK,timestamp 2192135 1321173,nop,wscale 1 (DF)
07:20:38.071232 192.168.0.241.59438 > 10.18.35.52.8088: R 221760495:221760495(0) win 0 (DF)
So, traffic is being returned via gw2, rather than gw1 :( They are marked I guess, but much too late. -- Len Sorensen
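The --dport versus --sport point can be illustrated with a toy C model. This is not netfilter code. `struct pkt`, `mark_for_reply()` and `gateway_for_mark()` are hypothetical names. The model covers two things only: which header field a rule has to match on Apache's replies, and how the resulting mark then selects one of the routing tables from this thread (marks 1/2, tables 8088/8089, gw1 10.18.35.11, gw2 10.18.35.21). It does not model which chain actually sees the reply packets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified packet header: only the TCP ports matter here. */
struct pkt {
    uint16_t sport;
    uint16_t dport;
};

/* Models "-p tcp --sport 8088 -j MARK --set-mark 1" (and 8089 -> 2).
 * Apache's replies leave FROM the service port, so the match must be on
 * the source port; the destination is the client's ephemeral port. */
static uint32_t mark_for_reply(const struct pkt *p)
{
    switch (p->sport) {
    case 8088: return 1;
    case 8089: return 2;
    default:   return 0;  /* unmarked */
    }
}

/* Models "ip rule add fwmark N table T" plus each table's single
 * "default via GW" route; unmarked packets use the main table. */
static const char *gateway_for_mark(uint32_t mark, const char *main_default_gw)
{
    assert(main_default_gw != NULL);
    switch (mark) {
    case 1:  return "10.18.35.11";  /* table 8088 -> gw1 */
    case 2:  return "10.18.35.21";  /* table 8089 -> gw2 */
    default: return main_default_gw;
    }
}
```

Consider web2, whose main-table default route points at gw2. A reply from port 8088 to the client's port 59438 (the port in the tcpdump above) is steered to gw1 in this model. A rule keyed on --dport 8088 would never match that reply.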
Re: two gateways with one NIC
On Mon, Apr 09, 2007 at 07:05:31PM +0100, W Agtail wrote: Nice one, but unfortunately still doesn't work. I'm now not seeing any marked messages in /var/log/messages and traffic still going via gw2 for port 8088. What does 'iptables -v -t mangle -L' show at the moment? Have you been flushing it between attempts to make sure you don't have conflicting rules? -- Len Sorensen
Re: two gateways with one NIC
On Mon, Apr 09, 2007 at 07:24:07PM +0100, W Agtail wrote: Yup, I've been flushing iptables each time. This is what we have atm: iptables -n -v -t mangle -L Chain PREROUTING (policy ACCEPT 12656 packets, 2518K bytes) pkts bytes target prot opt in out source destination 0 0 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:8088 LOG flags 0 level 4 prefix `fwmark 1: ' 0 0 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:8089 LOG flags 0 level 4 prefix `fwmark 2: ' 0 0 MARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:8088 MARK set 0x1 0 0 MARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:8089 MARK set 0x2 0 0 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 MARK match 0x1 LOG flags 0 level 4 prefix `marked 1: ' 0 0 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 MARK match 0x2 LOG flags 0 level 4 prefix `marked 2: ' Chain INPUT (policy ACCEPT 10664 packets, 2438K bytes) pkts bytes target prot opt in out source destination Chain FORWARD (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain OUTPUT (policy ACCEPT 6311 packets, 896K bytes) pkts bytes target prot opt in out source destination Chain POSTROUTING (policy ACCEPT 6311 packets, 896K bytes) pkts bytes target prot opt in out source destination Odd how the packet count on those mangle table entries is 0. It seems like it is never even getting there. Do you need a rule in the output chain telling it to send some packets to the mangle table? That doesn't make sense either, though. -- Len Sorensen
Re: two gateways with one NIC
On Sun, Apr 08, 2007 at 04:35:53AM +0100, W Agtail wrote: Hope you can help. I have the following setup using LVS (Linux Virtual Servers): LAN 192.168.0.0/24 - = CLIENTS | | | | LVS1 LVS2 vip1: 192.168.0.111 vip2: 192.168.0.121 eth0: 192.168.0.110 eth0: 192.168.0.120 eth1: 10.18.35.10 eth1: 10.18.35.20 gw1: 10.18.35.11 gw2: 10.18.35.21 | | | | LAN 10.18.35.0/24 - | | | | Apache WEB1 10.18.35.51:8088 WEB2 10.18.35.52:8088 Apache WEB1 10.18.35.51:8089 WEB2 10.18.35.52:8088 ### LVS ### The two LVS servers have a VIP and a GW. LVS1 & LVS2 have ip_forward set to 1. LVS1 has the following iptables: iptables -t nat -A PREROUTING -i eth0 -j DNAT --to 192.168.0.111 iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 192.168.0.111 with ipvsadm forwarding vip1:8088 to web1:8088 & web2:8088 LVS2 has the following iptables: iptables -t nat -A PREROUTING -i eth0 -j DNAT --to 192.168.0.121 iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 192.168.0.121 with ipvsadm forwarding vip1:8089 to web1:8089 & web2:8089 ### WEB ### The two Web servers have 2 virtual web servers listening on ports 8088 & 8089 and have the following iptables & iproute2 config: iptables -t mangle -A PREROUTING -p tcp --dport 8088 -i eth0 -j MARK --set-mark 1 iptables -t mangle -A PREROUTING -p tcp --dport 8089 -i eth0 -j MARK --set-mark 2 ip route add table 1 default via 10.18.35.11 dev eth0 ip route add table 2 default via 10.18.35.21 dev eth0 ip rule add fwmark 1 table 1 ip rule add fwmark 2 table 2 WEB1's default GW is set to gw1. WEB2's default GW is set to gw2. CLIENTS should be able to connect to vip1:8088 and vip2:8089 ### MY PROBLEM ### If I set WEB2's default GW to gw1, everything works as expected (as I now only have one GW). But when trying to set WEB2's default GW to gw2, things don't work. For example, if I was to run: curl vip1:8088 from a CLIENT, I would be able to connect to web1:8088 via LVS OK, but unable to connect to web2:8088 should LVS take me to web2.
It's as though the iptables/ip route settings are not working as they should. Any ideas what I'm doing wrong? Many thanks, W Agtail. Well, given I am not sure what you are trying to do, I will take a guess. I think you are trying to have redundant load balancers and multiple web servers behind those two load balancers. Here is how I would do it: LAN 192.168.0.0/24 - = CLIENTS | | | | LVS1 LVS2 vrrp: 192.168.0.110 (linked) vrrp: 192.168.0.110 (linked) eth0: 192.168.0.111 eth0: 192.168.0.112 eth1: 10.18.35.11 eth1: 10.18.35.12 vrrp: 10.18.35.10 (master) vrrp: 10.18.35.10 (slave) | | | | LAN 10.18.35.0/24 - | | | | Apache WEB1 10.18.35.51:8088 WEB2 10.18.35.52:8088 Apache WEB1 10.18.35.51:8089 WEB2 10.18.35.52:8088 So by using VRRP to have a shared virtual IP between the two load balancers, any client can connect to 192.168.0.110 and be sent through to one of the web servers. The server side interface also has a VRRP virtual IP shared between the two load balancers, which is linked to the other virtual IP, so that if the link goes down on one side of the load balancer, it will automatically drop the virtual IP on both sides to let the slave machine take over control of the IP. To the clients this should be pretty transparent since they don't need to know the IP changed, other than the momentary change in MAC address (letting vrrp play with the MAC address just causes a terrible mess in my experience, and I have had much better luck by simply changing IPs and letting the clients relearn the new MAC). keepalived's vrrp works very well (Hmm, actually I think I made some fixes to it, and I don't remember whether I sent them back upstream yet. I should check that tomorrow). You could run multiple vrrps per interface if you want to have one be the master of one IP and the other the master of another, to allow different traffic to use each load balancer by default, but with everything going through one in case of a failure. -- Len Sorensen
Re: two gateways with one NIC
On Sun, Apr 08, 2007 at 05:10:15PM +0100, W Agtail wrote: Hi, and thanks very much for your response. Your guess sounds spot on. As you've mentioned, using one sync group works quite well and gives you an active/passive LVS cluster (not sure of correct terminology here - sorry), thus all traffic goes via LVS1, leaving LVS2 not doing much unless LVS1 fails. I thought it would be a cool idea to set up two sync groups to ultimately handle several Apache instances on the two Apache servers. This way, both LVS servers would be used in a kind of active/active fashion and would be a master/slave to each other. For example, vip1 & gw1 could possibly end up on LVS2 with vip2 & gw2. The challenge, though, is in having two sync groups with two GWs. I would like all traffic coming through vip1 to be returned via gw1 and all traffic coming through vip2 to be returned via gw2. I am using keepalived (v1.1.13) with two sync groups, one with vip1 & gw1 and another with vip2 & gw2. Port 8088 will always come through vip1/gw1, load balancing to web1:8088 and web2:8088. Port 8089 will always come through vip2/gw2, load balancing to web1:8089 and web2:8089. Web1's default gw is set to gw1 and web2's default gw is set to gw2. But this is causing issues when, say, vip1:8088 gets forwarded through gw1 to web2:8088 and doesn't get back via gw2. To get round this, I need something like iproute2 on web2 to send all 8088 traffic back through gw1. You have to set up both web servers to use the same gateway. You can set up an alternate routing table and tag packets from the apache on port 8089 to use the other gateway IP instead, but any traffic handled by LVS1 _must_ be returned through LVS1. So both web servers have to have identical configuration (which is also much simpler to maintain). You can use iptables to tag packets matching the source port of 8089 and have ip route route all packets with that specific tag using an alternate routing table, which will then use the other LVS.
So if you have two VRRP groups, port 8088 returns via the regular default gateway pointing at the first group's IP, and you use tagging to send all port 8089 packets through the second VRRP IP. If an LVS fails, both VRRP groups end up on the working LVS and everything still works, but while both work, one LVS handles one port and the other handles the other port. Of course routing packets is hardly a lot of work, so it may not really be worth the bother to do anything extra with two groups. You really have to configure both web servers identically though in terms of routes. Hope this makes a little more sense to what I'm trying to achieve? Thanks again. -- Len Sorensen
Re: [RFT] e100 driver on ARM
On Thu, Mar 29, 2007 at 01:17:38AM -0400, David Acker wrote: I have a pxa255 based system with PCI added to it. The e100 would have memory corruption in its receive buffers detected by slab debugging unless I put in the patch to use the S-bit. Here is a link to the patch posting: http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc3-mm1/broken-out/git-netdev-all.patch Search for e100.c. http://www-gatago.com/linux/kernel/15457063.html - This discussion seems to hit the issue. There appears to be a race on the cache line where the EL bit and the next packet info live. In my case the hardware appeared to write to a free packet. The S-bit seems to make the hardware stop and spin on the bit, while the EL bit seems to let the hardware try to use that packet. This race would occur less often when the receive buffer chain is always refilled before the hardware can use them up. On our 400 MHz XScale, we can use up all 256 buffers if the PCI bus has another busy device on it. In our case it is an 802.11g miniPCI card and our software was routing all ethernet packets to the wireless interface and vice versa while TCP streams were running across these connections. Which PCI host controller are you using with the PXA255? We tried using a PXA255 based system with a PCI controller a couple of years ago and had to change to a different CPU in the end due to the PCI controller simply not being valid PCI. The PXA255 wasn't designed for PCI, and I get the impression that none of the PCI companion chips for it do a good enough job to actually add it correctly. -- Len Sorensen
Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
On Sat, Mar 17, 2007 at 10:08:10PM +0900, takada wrote: I tested some patterns. Just X86_OOSTORE was effective. WBINVD is needless. --- arch/i386/Kconfig.cpu~ 2007-02-05 03:44:54.0 +0900 +++ arch/i386/Kconfig.cpu 2007-02-17 21:25:52.0 +0900 @@ -322,7 +322,7 @@ config X86_USE_3DNOW config X86_OOSTORE bool - depends on (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR + depends on (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR || MGEODEGX1 default y config X86_TSC Well, that is exactly what I did for the Geode SC1200 (a GX1-based design) as well, and it certainly improved things a lot for me too. -- Len Sorensen
Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
On Thu, Mar 15, 2007 at 02:39:39PM +0900, takada wrote: Hiroshi Miura posted `Geode out-of-order store enables' patch in Jun, 2003. There is http://lkml.org/lkml/2003/6/5/57 . OOSTORE was enabled at this point in time. It seems to have disappeared somewhere. I believe the patch was rejected as 'not required' since the data sheet is not very clear on that feature. BTW, I use MediaGX with kernel 2.6.20 (and 2.6.20.3) and suspend2. When I resume the PC and use the PC Card modem, the PC hangs. However, the PC doesn't hang when I apply a WBINVD patch. I can't tell whether the problem is in suspend2's resume, in the MediaGX, or both. Many drivers lack support for resume on my PC. Which patch are you referring to? -- Len Sorensen
Re: [PATCH 1/2] pcnet32: only allocate init_block dma consistent
On Tue, Mar 06, 2007 at 07:39:21PM -0800, Michael K. Edwards wrote: On 3/6/07, Ralf Baechle [EMAIL PROTECTED] wrote: This small change btw. delivers about 3% extra performance on a very slow test system. Has this change been tested / benchmarked under VMWare? pcnet32 is the (default?) virtual device presented by VMWare Workstation, and that's probably a large fraction of its use in the field these days. But then Don probably already knows that. :-) Unless you install VMware Tools, in which case you use vmxnet instead, which of course performs better since it knows it isn't talking to real hardware. I am about to test what this patch does to the performance of our system (266MHz Geode SC1200 with 4 pcnet32's). -- Len Sorensen
Re: Re: Strange connection slowdown on pcnet32
On Mon, Feb 19, 2007 at 06:59:16PM -0500, Lennart Sorensen wrote: I am also noticing the receive error count going up, and the source is this code: if (status & 0x01) /* Only count a general error at the */ lp->stats.rx_errors++; /* end of a packet. */ It appears this means I am receiving a frame marked with End Of Packet but without Start of Packet. I have no idea how that happens, but it shouldn't be able to make the driver and MAC stop processing the receive ring. Well, the packets actually have both start and end marked, but they also have overflow marked, so it seems the CPU simply isn't keeping up (it is taking about 100% of the CPU to push through 6500KB/s). Certainly CONFIG_X86_OOSTORE makes a major difference, although I am still not sure why. Simply skipping ahead one or two receive descriptors when the current one is marked as owned by the MAC but the one a few ahead is owned by the CPU allows it to continue receiving when it happens. I really want to find out why it happens, though I am not sure how to go about doing that. -- Len Sorensen
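The skip-ahead workaround described above can be sketched as a small ring-scanning helper. This is hypothetical and not code from pcnet32. `next_rx_entry()`, `RING_SIZE` and `RX_OWN` are illustrative names. The sketch assumes a 32-entry ring, matching the RXdesc[00]..[31] dumps in this thread, and takes bit 15 (0x8000) of the 16-bit descriptor status as the "owned by the MAC" flag. It papers over the symptom rather than explaining it: the skipped descriptor is simply left behind.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 32u     /* assumed power-of-two rx ring size */
#define RX_OWN    0x8000u /* assumed "owned by the MAC" status bit */

/* Return the ring index the driver should process next, or -1 if nothing
 * is ready. If entry 'cur' still shows OWN but one of the next 'lookahead'
 * entries has already been handed back to the CPU, jump to that entry. */
static int next_rx_entry(const uint16_t status[RING_SIZE], unsigned cur,
                         unsigned lookahead)
{
    assert(lookahead < RING_SIZE);
    for (unsigned i = 0; i <= lookahead; i++) {
        unsigned idx = (cur + i) & (RING_SIZE - 1u);  /* wrap around */
        if (!(status[idx] & RX_OWN))
            return (int)idx;
    }
    return -1;
}
```

With a look-ahead of 0 this reduces to the strict check that gets stuck on entry 0. With a look-ahead of 1 or 2 it finds entry 1 once the MAC has handed that entry back, and the index wraps from 31 to 0.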
Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
On Tue, Feb 20, 2007 at 08:34:13PM +0900, takada wrote: I posted with 2.6.20 + enabled X86_OOSTORE. The clflush size line is in /proc/cpuinfo, but clflush is not in the flags line. BTW, can we use the WBINVD instruction? I tested compile only. Do you know a method to change this dynamically, without #ifdef, when running on MediaGX/GeodeGX? diff -Narup a/include/asm-i386/io.h b/include/asm-i386/io.h --- a/include/asm-i386/io.h 2007-02-20 16:23:25.0 +0900 +++ b/include/asm-i386/io.h 2007-02-20 17:07:14.0 +0900 @@ -232,7 +232,19 @@ static inline void memcpy_toio(volatile * 2. Accidentally out of order processors (PPro errata #51) */ -#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE) +#ifdef CONFIG_MGEODEGX1 + +static inline void dma_flush_cache(void) +{ + __asm__ __volatile__ ("wbinvd": : :"memory"); +} + +#define dma_cache_inv(_start,_size) dma_flush_cache() +#define dma_cache_wback(_start,_size) dma_flush_cache() +#define dma_cache_wback_inv(_start,_size) dma_flush_cache() +#define flush_write_buffers() + +#elif defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE) static inline void flush_write_buffers(void) { - Well, it is starting to look like it isn't a caching issue, but more likely an issue of the order in which writes are performed. I think the MAC might be seeing the ownership bit change before the rest of the descriptor, which shouldn't happen. With X86_OOSTORE, wmb() is called between setting the fields in the descriptor and setting the ownership bit to the MAC. I still have to investigate a bit more to find out for sure, but that could certainly explain why X86_OOSTORE makes the problem become much less frequent. It doesn't completely eliminate it though. Of course, maybe there are two different problems with the same symptoms. -- Len Sorensen
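The write ordering suspected above (fill in the descriptor, then hand it over) can be illustrated with C11 atomics. This is a userspace sketch under stated assumptions, not kernel code. `struct rx_desc`, `give_to_mac()` and `owned_by_mac()` are hypothetical. A C11 release store orders memory accesses between CPUs only, not between a CPU and a bus-mastering device. In the driver that job belongs to `wmb()` placed before the status write. What the sketch shows is the sequence: write `base` and `buf_length` first, then publish ownership last.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a pcnet32-style rx descriptor. */
struct rx_desc {
    uint32_t base;            /* DMA address of the receive buffer */
    int16_t  buf_length;      /* buffer size stored as a negative number */
    _Atomic uint16_t status;  /* bit 15 (OWN) hands the entry to the MAC */
};

#define DESC_OWN 0x8000u

/* Fill every field first, then set OWN with release semantics, so an
 * observer that acquire-loads OWN cannot see stale base/buf_length. */
static void give_to_mac(struct rx_desc *d, uint32_t dma_addr, int16_t size)
{
    assert(size > 0);
    d->base = dma_addr;
    d->buf_length = (int16_t)-size;
    atomic_store_explicit(&d->status, DESC_OWN, memory_order_release);
}

/* Consumer side: acquire-load the status before trusting other fields. */
static int owned_by_mac(struct rx_desc *d)
{
    return (atomic_load_explicit(&d->status, memory_order_acquire) & DESC_OWN) != 0;
}
```

Writing OWN before the other fields (or letting the hardware observe the stores out of order) would let the owner see a descriptor whose ownership has flipped but whose contents are stale. That is the kind of window described in the message above.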
Re: Re: Strange connection slowdown on pcnet32
On Fri, Feb 16, 2007 at 04:01:57PM -0500, Lennart Sorensen wrote:
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: pcnet32_poll: pcnet32_rx() got 16 packets
eth1: base: 0x05215812 status: 0310 next-status: 0310
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: netif_receive_skb(skb)
eth1: pcnet32_poll: pcnet32_rx() got 16 packets
eth1: base: 0x04c51812 status: 8000 next-status: 0310
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
eth1: interrupt csr0=0x6f3 new csr=0x33, csr3=0x.
eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00.
eth1: base: 0x04c51812 status: 8000 next-status: 0310
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x.
eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00.
eth1: base: 0x04c51812 status: 8000 next-status: 0310
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x.
eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00.
eth1: base: 0x04c51812 status: 8000 next-status: 0310
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
So somehow it ends up that when it reads the status of the descriptor at address 0x04c51812, it sees the status as 0x8000 (which means owned by the MAC, I believe), even though the next descriptor in the ring has a sensible status, indicating that the descriptor is ready to be handled by the driver.
Since the descriptor isn't ready, we exit without handling anything and NAPI reschedules us the next time we get an interrupt. After some random number of tries, we finally see the right status and handle the packet, along with a bunch of other packets waiting in the descriptor ring. Then we seem to hit the exact same descriptor address again, with the same problem in the status we read, and again we are stuck for a while, until finally we see the right status, another pile of packets gets handled, and we again hit the same descriptor address and get stuck. I have been poking at things with firescope to see if the MAC is actually writing to system memory or not. The entry that it gets stuck on is _always_ entry 0 in the rx_ring. There do not appear to be any exceptions to this. Here is my firescope (slightly modified for this purpose) dump of the rx_ring of eth1:
Descriptor:Address: /--base---\ /buf\ /sta\ /-message-\ /reserved-\
          :       : |         | |len| |tus| | length  | |         |
RXdesc[00]:6694000: 12 18 5f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[01]:6694010: 12 78 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[02]:6694020: 12 a0 52 06 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[03]:6694030: 12 f8 c2 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[04]:6694040: 12 70 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[05]:6694050: 12 e8 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[06]:6694060: 12 e0 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[07]:6694070: 12 e8 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[08]:6694080: 12 e0 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[09]:6694090: 12 d8 d1 05 fa f9 40 03 46 00 00 00 00 00 00 00
RXdesc[10]:66940a0: 12 d0 d1 05 fa f9 40 03 4e 00 00 00 00 00 00 00
RXdesc[11]:66940b0: 12 d8 02 05 fa f9 10 03 40 00 00 00 00 00 00 00
RXdesc[12]:66940c0: 12 d0 02 05 fa f9 40 03 46 00 00 00 00 00 00 00
RXdesc[13]:66940d0: 12 38 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[14]:66940e0: 12 30 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[15]:66940f0: 12 78 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[16]:6694100: 12 a0 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[17]:6694110: 12 b0 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[18]:6694120: 12 b8 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[19]:6694130: 12 70 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[20]:6694140: 12 f8 56 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[21]:6694150: 12 c8 29 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[22]:6694160: 12 20 03 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[23]:6694170: 12 60 4c 05 fa f9 00 80 87 05 00 00 00 00 00 00
RXdesc[24]:6694180: 12 98 53 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[25]:6694190: 12 b0 cc 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[26]:66941a0: 12 a8 3f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[27]:66941b0: 12 58 e8 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[28]:66941c0: 12 b0 4d 06 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[29]:66941d0: 12 38 ef 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[30]:66941e0: 12 98 1f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[31]:66941f0: 12 28 f1 04 fa
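Each 16-byte row in a dump like this can be decoded with the little-endian layout given in its header: base, buf len, status, message length, reserved. The C sketch below is illustrative only. `struct rx_head_view`, `parse_rx_desc()`, `status_owned_by_mac()` and `good_frame()` are hypothetical names. The bit meanings are assumptions based on the AMD PCnet receive descriptor layout as I understand it: 0x8000 is OWN, 0x0200 is STP and 0x0100 is ENP. `good_frame()` mirrors pcnet32's practice of shifting the status right by 8 and treating exactly 0x03 (STP and ENP, no error bits) as a clean single-buffer frame. The low byte carries further flags that this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of one 16-byte receive descriptor as laid out in the dump. */
struct rx_head_view {
    uint32_t base;        /* buffer DMA address */
    int16_t  buf_length;  /* negative buffer size (two's complement) */
    uint16_t status;      /* OWN/ERR/STP/ENP and other flags */
    uint32_t msg_length;  /* received frame length */
};

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static struct rx_head_view parse_rx_desc(const uint8_t raw[16])
{
    struct rx_head_view v;
    v.base = le32(raw);
    /* 0xf9fa -> -1542 on two's-complement targets */
    v.buf_length = (int16_t)le16(raw + 4);
    v.status = le16(raw + 6);
    v.msg_length = le32(raw + 8);
    return v;
}

/* Assumed: bit 15 set means the MAC still owns the descriptor. */
static int status_owned_by_mac(uint16_t status) { return (status & 0x8000u) != 0; }

/* Assumed: upper byte exactly STP|ENP (0x03) means a clean one-buffer frame. */
static int good_frame(uint16_t status) { return (status >> 8) == 0x03; }
```

Under these assumptions, row 00 decodes to status 0x8000 (still owned by the MAC). Row 01 decodes to status 0x0340: STP and ENP set, a 1542-byte buffer (stored as -1542) and a 1518-byte message.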
Re: Re: Strange connection slowdown on pcnet32
On Mon, Feb 19, 2007 at 03:11:36PM -0500, Lennart Sorensen wrote: I have been poking at things with firescope to see if the MAC is actually writing to system memory or not. The entry that it gets stuck on is _always_ entry 0 in the rx_ring. There do not appear to be any exceptions to this. Here is my firescope (slightly modified for this purpose) dump of the rx_ring of eth1:
Descriptor:Address: /--base---\ /buf\ /sta\ /-message-\ /reserved-\
          :       : |         | |len| |tus| | length  | |         |
RXdesc[00]:6694000: 12 18 5f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[01]:6694010: 12 78 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[02]:6694020: 12 a0 52 06 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[03]:6694030: 12 f8 c2 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[04]:6694040: 12 70 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[05]:6694050: 12 e8 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[06]:6694060: 12 e0 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[07]:6694070: 12 e8 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[08]:6694080: 12 e0 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[09]:6694090: 12 d8 d1 05 fa f9 40 03 46 00 00 00 00 00 00 00
RXdesc[10]:66940a0: 12 d0 d1 05 fa f9 40 03 4e 00 00 00 00 00 00 00
RXdesc[11]:66940b0: 12 d8 02 05 fa f9 10 03 40 00 00 00 00 00 00 00
RXdesc[12]:66940c0: 12 d0 02 05 fa f9 40 03 46 00 00 00 00 00 00 00
RXdesc[13]:66940d0: 12 38 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[14]:66940e0: 12 30 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[15]:66940f0: 12 78 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[16]:6694100: 12 a0 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[17]:6694110: 12 b0 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[18]:6694120: 12 b8 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[19]:6694130: 12 70 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[20]:6694140: 12 f8 56 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[21]:6694150: 12 c8 29 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[22]:6694160: 12 20 03 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[23]:6694170: 12 60 4c 05 fa f9 00 80 87 05 00 00 00 00 00 00
RXdesc[24]:6694180: 12 98 53 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[25]:6694190: 12 b0 cc 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[26]:66941a0: 12 a8 3f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[27]:66941b0: 12 58 e8 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[28]:66941c0: 12 b0 4d 06 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[29]:66941d0: 12 38 ef 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[30]:66941e0: 12 98 1f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[31]:66941f0: 12 28 f1 04 fa f9 00 80 40 00 00 00 00 00 00 00
I only ever see entry 0 with status 0080 (0x8000, which means owned by the MAC), and this is while the driver is checking entry 0 every time it tries to check for any waiting packets. Running tcpdump while pinging gives the interesting result that some packets are arriving out of order, making it seem like the driver is processing the packets out of order. Perhaps the driver is wrong to be looking at entry 0, and should be looking at entry 1, and is hence stuck until the whole receive ring has been filled again?
15:06:04.112812 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 1
15:06:05.119799 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 2
15:06:05.120159 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 2
15:06:05.127045 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 1
15:06:06.119862 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 3
15:06:07.119921 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 4
15:06:08.119994 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 5
15:06:08.426400 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 3
15:06:08.427915 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 4
15:06:08.429033 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 5
15:06:09.120053 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 6
15:06:10.120109 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 7
15:06:10.705332 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 6
15:06:10.707258 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 7
15:06:11.120175 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 8
15:06:12.120233 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 9
15:06:13.120297 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 10
15:06:14.120359 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 11
15:06:14.120737 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 11
15:06:14.127064 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 8
15:06:14.127700 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 9
15:06:14.128268 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 10
15:06:15.120426 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 12
15:06
Re: Re: Strange connection slowdown on pcnet32
On Mon, Feb 19, 2007 at 05:18:45PM -0500, Lennart Sorensen wrote:
On Mon, Feb 19, 2007 at 03:11:36PM -0500, Lennart Sorensen wrote:
I have been poking at things with firescope to see if the MAC is actually writing to system memory or not. The entry that it gets stuck on is _always_ entry 0 in the rx_ring. There do not appear to be any exceptions to this. Here is my firescope (slightly modified for this purpose) dump of the rx_ring of eth1:

Descriptor:Address: /--base---\ /buf\ /sta\ /-message-\ /reserved-\
          :       : |         | |len| |tus| | length  | |         |
RXdesc[00]:6694000: 12 18 5f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[01]:6694010: 12 78 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[02]:6694020: 12 a0 52 06 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[03]:6694030: 12 f8 c2 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[04]:6694040: 12 70 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[05]:6694050: 12 e8 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[06]:6694060: 12 e0 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[07]:6694070: 12 e8 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[08]:6694080: 12 e0 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
RXdesc[09]:6694090: 12 d8 d1 05 fa f9 40 03 46 00 00 00 00 00 00 00
RXdesc[10]:66940a0: 12 d0 d1 05 fa f9 40 03 4e 00 00 00 00 00 00 00
RXdesc[11]:66940b0: 12 d8 02 05 fa f9 10 03 40 00 00 00 00 00 00 00
RXdesc[12]:66940c0: 12 d0 02 05 fa f9 40 03 46 00 00 00 00 00 00 00
RXdesc[13]:66940d0: 12 38 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[14]:66940e0: 12 30 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[15]:66940f0: 12 78 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[16]:6694100: 12 a0 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[17]:6694110: 12 b0 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[18]:6694120: 12 b8 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[19]:6694130: 12 70 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[20]:6694140: 12 f8 56 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[21]:6694150: 12 c8 29 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[22]:6694160: 12 20 03 05 fa f9 00 80 ee 05 00 00 00 00 00 00
RXdesc[23]:6694170: 12 60 4c 05 fa f9 00 80 87 05 00 00 00 00 00 00
RXdesc[24]:6694180: 12 98 53 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[25]:6694190: 12 b0 cc 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[26]:66941a0: 12 a8 3f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[27]:66941b0: 12 58 e8 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[28]:66941c0: 12 b0 4d 06 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[29]:66941d0: 12 38 ef 04 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[30]:66941e0: 12 98 1f 05 fa f9 00 80 40 00 00 00 00 00 00 00
RXdesc[31]:66941f0: 12 28 f1 04 fa f9 00 80 40 00 00 00 00 00 00 00

I only ever see entry 0 with status 0080 (i.e. 0x8000, owned by the MAC), and this is while the driver checks entry 0 every time it looks for waiting packets. Running tcpdump while pinging gives the interesting result that some packets are arriving out of order, making it seem like the driver is processing the packets out of order. Perhaps the driver is wrong to be looking at entry 0, should be looking at entry 1, and is hence stuck until the whole receive ring has been filled again?
15:06:04.112812 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 1
15:06:05.119799 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 2
15:06:05.120159 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 2
15:06:05.127045 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 1
15:06:06.119862 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 3
15:06:07.119921 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 4
15:06:08.119994 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 5
15:06:08.426400 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 3
15:06:08.427915 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 4
15:06:08.429033 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 5
15:06:09.120053 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 6
15:06:10.120109 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 7
15:06:10.705332 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 6
15:06:10.707258 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 7
15:06:11.120175 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 8
15:06:12.120233 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 9
15:06:13.120297 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 10
15:06:14.120359 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 11
15:06:14.120737 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 11
15:06:14.127064 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 8
15:06:14.127700 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 9
15:06:14.128268
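To read rows like these, each 16-byte descriptor can be decoded field by field. The C sketch below assumes the column layout given in the firescope header (4-byte base, 2-byte buffer length, 2-byte status, 4-byte message length, 4 reserved bytes, all little-endian); `rx_desc_view`, `parse_rx_row` and `owned_by_mac` are hypothetical helpers, not pcnet32 code. Decoded this way, entry 0 has status 0x8000 (OWN still set), entry 1 has status 0x0340 with a 1518-byte message, and the buffer-length bytes `fa f9` read as -1542. That equals 2 - 1544, so it would match the `2 - PKT_BUF_SZ` store in pcnet32_rx if PKT_BUF_SZ is 1544 (an inference from the numbers, not checked against the driver source).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One receive descriptor as laid out in the firescope dump (assumed). */
struct rx_desc_view {
    uint32_t base;        /* buffer address */
    int16_t  buf_length;  /* stored as a negative two's-complement size */
    uint16_t status;      /* 0x8000 = OWN: still belongs to the MAC */
    uint32_t msg_length;  /* bytes received into the buffer */
};

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode the 16 hex bytes that follow "RXdesc[NN]:address:" in the dump. */
static struct rx_desc_view parse_rx_row(const uint8_t row[16])
{
    struct rx_desc_view d;

    assert(row != NULL);
    d.base = le32(row);
    d.buf_length = (int16_t)le16(row + 4);
    d.status = le16(row + 6);
    d.msg_length = le32(row + 8);
    return d;
}

static int owned_by_mac(const struct rx_desc_view *d)
{
    return (d->status & 0x8000) != 0;
}
```

Feeding it the bytes of RXdesc[00] and RXdesc[01] from the dump reproduces the "owned by MAC at the head, completed right behind it" picture described in the message.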
Re: Re: Strange connection slowdown on pcnet32
On Mon, Feb 19, 2007 at 05:29:20PM -0500, Lennart Sorensen wrote:
I just noticed that almost all of these problems seem to occur right at the start of transfers, while the TCP window size is still being worked out for the connection speed, and I also see the error count go up in ifconfig for the port when it happens. Is it possible for an error to get flagged in a receive descriptor without the owner bit being updated?

It seems the problem actually occurs when the receive descriptor ring is full. This seems to leave one (or sometimes more) descriptors in the ring that claim to be owned by the MAC but sit at the head of the receive ring as far as the driver is concerned. I see a note in the driver about an SP3G chipset sometimes causing this. How would one identify such descriptors and clear them out of the way? Getting stuck until the next time the MAC gets around to the descriptor and overwrites it is not good, since it causes delays and out-of-order packets.
--
Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
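One way to make "owned by the MAC but at the head of the ring" detectable is to compare the head entry with the one after it: if the MAC fills descriptors in ring order, a head that is still MAC-owned while the next entry has already been completed (OWN clear) is inconsistent. This is a hypothetical check sketched in C, not something pcnet32 does; it assumes a power-of-two ring so the index can wrap with a mask, similar to how the driver uses `lp->rx_mod_mask`. It only flags the condition; whether it is safe to skip such a descriptor is not established here. The debug line later in this thread, `status: 8000 next-status: 0340`, is the pattern it would flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_OWN 0x8000u  /* descriptor still belongs to the MAC */

/* Hypothetical consistency check on a snapshot of the status words.
 * Returns 1 when the head is MAC-owned but the following entry has
 * already been handed back to the host, 0 otherwise. */
static int rx_head_looks_stuck(const uint16_t *status, size_t ring_size,
                               size_t head)
{
    assert(ring_size > 1 && (ring_size & (ring_size - 1)) == 0);
    size_t next = (head + 1) & (ring_size - 1);  /* wrap like rx_mod_mask */
    return (status[head] & RX_OWN) && !(status[next] & RX_OWN);
}
```

An idle ring (every entry MAC-owned) is not flagged; only the "MAC-owned head, completed successor" combination is.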
Re: Re: Strange connection slowdown on pcnet32
On Mon, Feb 19, 2007 at 06:45:48PM -0500, Lennart Sorensen wrote:
It seems the problem actually occurs when the receive descriptor ring is full. This seems to leave one (or sometimes more) descriptors in the ring that claim to be owned by the MAC but sit at the head of the receive ring as far as the driver is concerned. I see a note in the driver about an SP3G chipset sometimes causing this. How would one identify such descriptors and clear them out of the way? Getting stuck until the next time the MAC gets around to the descriptor and overwrites it is not good, since it causes delays and out-of-order packets.

I am also noticing the receive error count going up, and the source is this code:

	if (status & 0x01)	/* Only count a general error at the */
		lp->stats.rx_errors++;	/* end of a packet. */

It appears this means I am receiving a frame marked with End Of Packet but without Start Of Packet. I have no idea how that happens, but it shouldn't be able to make the driver and MAC stop processing the receive ring.
--
Len Sorensen
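The quoted check looks at the high byte of the descriptor status word. Assuming the usual PCnet bit layout for that byte (OWN 0x80, ERR 0x40, FRAM 0x20, OFLO 0x10, CRC 0x08, BUFF 0x04, STP 0x02, ENP 0x01), and assuming, as in pcnet32_rx_entry, that the check sits in the branch taken when the byte is not exactly STP|ENP (0x03), the C sketch below classifies a raw 16-bit status and mirrors the rx_errors condition. `classify_rx` and `counts_as_rx_error` are illustrative names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* High byte of the receive descriptor status word (assumed layout). */
enum {
    RX_ENP  = 0x01,  /* end of packet */
    RX_STP  = 0x02,  /* start of packet */
    RX_BUFF = 0x04,  /* buffer error */
    RX_CRC  = 0x08,  /* CRC error */
    RX_OFLO = 0x10,  /* overflow */
    RX_FRAM = 0x20,  /* framing error */
    RX_ERR  = 0x40,  /* summary error bit */
    RX_OWN  = 0x80   /* descriptor owned by the MAC */
};

enum rx_kind { RX_GOOD, RX_OWNED_BY_MAC, RX_TAIL_WITHOUT_START, RX_OTHER_ERROR };

static enum rx_kind classify_rx(uint16_t status_word)
{
    unsigned status = status_word >> 8;

    if (status & RX_OWN)
        return RX_OWNED_BY_MAC;          /* not ours to look at yet */
    if (status == (RX_STP | RX_ENP))
        return RX_GOOD;                  /* whole frame in one buffer */
    if ((status & RX_ENP) && !(status & RX_STP))
        return RX_TAIL_WITHOUT_START;    /* the case described above */
    return RX_OTHER_ERROR;
}

/* Mirrors the quoted code: an errored descriptor bumps rx_errors only
 * when it carries ENP. */
static int counts_as_rx_error(uint16_t status_word)
{
    unsigned status = status_word >> 8;
    return status != (RX_STP | RX_ENP) && (status & RX_ENP);
}
```

With this reading, the 0x0310 and 0x0340 statuses seen elsewhere in the thread are ordinary complete frames, and 0x8000 is a descriptor the driver must not touch.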
Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
On Sat, Feb 17, 2007 at 11:11:13PM +0900, takada wrote:
is it mean what doesn't help with doesn't call set_cx86_reoder()? this function disable to reorder at 0x4000: to 0x:. does pcnet32 access at out of above range?

--- arch/i386/Kconfig.cpu~	2007-02-05 03:44:54.0 +0900
+++ arch/i386/Kconfig.cpu	2007-02-17 21:25:52.0 +0900
@@ -322,7 +322,7 @@ config X86_USE_3DNOW
 
 config X86_OOSTORE
 	bool
-	depends on (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR
+	depends on (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR || MGEODEGX1
 	default y
 
 config X86_TSC

Well, it turns out that enabling OOSTORE doesn't eliminate the problem, but it does make it go from occurring within seconds to occurring within many hours. I am off to investigate some more.

Does anyone know if there is any way to flush a CPU cache line, to force system memory to be reread for a given address or address range?
--
Len Sorensen
Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
On Mon, Feb 19, 2007 at 11:48:27AM -0800, Roland Dreier wrote:
Does anyone know if there is any way to flush a cache line of the cpu to force rereading system memory for a given address or address range?
There is the clflush instruction, but not all x86 CPUs support it. You need to check the CPUID flag to know for sure (/proc/cpuinfo will show a clflush flag if it is supported).

Well, I will check for that. Of course, it is still possible that it is actually the network chip screwing up somehow.
--
Len Sorensen
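Checking for the flag from a program means matching whole words on the `flags` line only. A minimal C sketch, with a hypothetical helper name: takada's MediaGXm output in this thread has a `clflush size : 32` line while its flags line contains no `clflush`, so a plain substring search over the whole file would give the wrong answer. On x86 the same information is also available directly from CPUID leaf 1, where CLFSH is bit 19 of EDX.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `flag` appears as a whole word after the colon of a
 * /proc/cpuinfo "flags" line, 0 otherwise. Whole-word matching matters:
 * a search for "mmx" must not be satisfied by "cxmmx" alone, and a
 * "clflush size" line elsewhere in the file is not a feature flag. */
static int cpuinfo_has_flag(const char *flags_line, const char *flag)
{
    const char *p = strchr(flags_line, ':');
    size_t n = strlen(flag);
    size_t len;

    assert(p != NULL && n > 0);
    for (p++; ; p += len) {
        p += strspn(p, " \t\n");       /* skip separators */
        if (*p == '\0')
            return 0;                  /* end of line: not found */
        len = strcspn(p, " \t\n");     /* length of the current word */
        if (len == n && strncmp(p, flag, n) == 0)
            return 1;
    }
}
```

In practice one would read /proc/cpuinfo line by line, pick the line that starts with `flags`, and pass it in; both flags lines posted in this thread return 0 for `clflush`.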
Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
On Tue, Feb 20, 2007 at 08:56:39AM +0900, takada wrote:
/proc/cpuinfo with MediaGXm :
processor       : 0
vendor_id       : CyrixInstead
cpu family      : 5
model           : 5
model name      : Cyrix MediaGXtm MMXtm Enhanced
stepping        : 2
cpu MHz         : 199.750
cache size      : 16 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu tsc msr cx8 cmov mmx cxmmx
bogomips        : 401.00
clflush size    : 32

Hmm, with 2.6.18 I am seeing:

processor       : 0
vendor_id       : CyrixInstead
cpu family      : 5
model           : 9
model name      : Geode(TM) Integrated Processor by National Semi
stepping        : 1
cpu MHz         : 266.648
cache size      : 16 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu tsc msr cx8 cmov mmx cxmmx
bogomips        : 534.50

Similar, but the last line isn't there. It looks like 2.6.18 doesn't actually have code to print that information, though.
--
Len Sorensen
Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
On Sat, Feb 17, 2007 at 11:11:13PM +0900, takada wrote:
is it mean what doesn't help with doesn't call set_cx86_reoder()? this function disable to reorder at 0x4000: to 0x:. does pcnet32 access at out of above range?

No, it is accessing system memory by DMA to transfer frames. Since the system has 128MB of RAM, the addresses are probably all within the first 128MB. I tried changing cyrix.c to explicitly set the serialize bit (0x8000 in PCR0) rather than explicitly clearing it as is done now. That didn't make a difference. I tried reversing the memory bypass setting, which also did nothing. Enabling CONFIG_X86_OOSTORE and recompiling, however, does make a difference.

--- arch/i386/Kconfig.cpu~	2007-02-05 03:44:54.0 +0900
+++ arch/i386/Kconfig.cpu	2007-02-17 21:25:52.0 +0900
@@ -322,7 +322,7 @@ config X86_USE_3DNOW
 
 config X86_OOSTORE
 	bool
-	depends on (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR
+	depends on (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR || MGEODEGX1
 	default y
 
 config X86_TSC

I did:

	depends on ((MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR) || MGEODEGX1

since I wasn't sure of the precedence in the Kconfig files.
--
Len Sorensen
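The extra parentheses are harmless but not required: per the Kconfig language documentation, `&&` binds more tightly than `||`, and over tristate values (n=0, m=1, y=2) `&&` yields the minimum and `||` the maximum of its operands. The C sketch below evaluates the patched `depends on` line both as Kconfig parses it and with the grouping one might worry about; the two disagree when only MGEODEGX1 is set, which is exactly the case the patch targets. The function names are illustrative.

```c
#include <assert.h>

/* Kconfig tristate values, ordered so that min/max give && and ||. */
enum tri { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tri tri_and(enum tri a, enum tri b) { return a < b ? a : b; }
static enum tri tri_or(enum tri a, enum tri b)  { return a > b ? a : b; }

/* "(MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR || MGEODEGX1" as
 * Kconfig reads it, with && binding tighter than ||:
 * ((MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR) || MGEODEGX1 */
static enum tri as_parsed(enum tri w3d, enum tri w2, enum tri wc6,
                          enum tri mtrr, enum tri geode)
{
    return tri_or(tri_and(tri_or(tri_or(w3d, w2), wc6), mtrr), geode);
}

/* The grouping one might worry about:
 * (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && (MTRR || MGEODEGX1) */
static enum tri other_grouping(enum tri w3d, enum tri w2, enum tri wc6,
                               enum tri mtrr, enum tri geode)
{
    return tri_and(tri_or(tri_or(w3d, w2), wc6), tri_or(mtrr, geode));
}
```

So the unparenthesized line in the patch already enables X86_OOSTORE for MGEODEGX1 alone.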
Re: Strange connection slowdown on pcnet32
On Thu, Feb 15, 2007 at 05:50:30PM -0500, Lennart Sorensen wrote:
I have encountered some strange behaviour with the pcnet32. I am transferring data from a server to a client, routing it through my router. The router has 2 ethernet ports, both of which are AMD 972 chips (pcnet32). The transfer has so far been either http or ftp (both see the same problem). I transfer lots of data, and after a while (I have seen anywhere from 200 to 700MB or so) the speed suddenly drops to less than 1KB/s. If I ping from the router to the server, the ping requests go out normally every second (as seen by tcpdump on the server), but on the router the replies are not seen by the kernel for multiple seconds. Sometimes I will see 3 ping replies together, sometimes 5 or even 10. The turnaround times will show 10500, 9500, 8500, ..., 500ms for the packets received in a batch. ifconfig on the router shows the receive packet counts going up in lumps, just as ping and tcpdump on the router's interface do. Doing ifconfig down and up on the port connecting to the server clears the problem, and it can then handle another pile of data before the problem reappears.

The CPU on the router is not fast enough to ensure there won't ever be dropped packets at 100Mbps. When I force the port to the server to 10Mbps I have no problems at all. Replacing the port to the server with an rtl8139 doesn't show any problems at 100Mbps, although the transfer rate drops from 6500KBps to 4000KBps compared to using the pcnet32.

Kernels used so far are 2.6.16 and 2.6.18. I have a tulip card I intend to try as well, just to see if the problem affects anything other than the pcnet32.

Does anyone have any hints as to what part of the code to look at for the changes made by doing ifconfig eth1 down; ifconfig eth1 up? Any ideas as to what could make the reception of packets suddenly get very, very slow?

On one pass where I was running tcpdump on the router, I saw a wrap of the sequence number right before the problem occurred, but that has not been the case every time as far as I can tell, so I am not sure it is related to the problem at all.

I have run some tests using 2.6.8 now, and so far it hasn't failed. Still investigating...
--
Len Sorensen
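The descending turnaround times fit a simple picture: the replies arrive on time but sit unprocessed until the whole batch is released at once. Assuming a fixed 1000 ms ping interval and a single delivery instant for the batch (both assumptions), each reply's apparent round-trip time is the delivery time minus its own send time, as this small C sketch computes (`batch_rtts` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* Apparent RTTs for `count` echo replies that were all delivered to the
 * stack at the same instant. Requests were sent every `interval_ms`, and
 * the newest one waited `last_rtt_ms`. rtt_out[0] is the oldest reply. */
static void batch_rtts(long interval_ms, long last_rtt_ms, size_t count,
                       long *rtt_out)
{
    assert(count > 0);
    for (size_t i = 0; i < count; i++) {
        /* packet i was sent (count - 1 - i) intervals before the newest */
        rtt_out[i] = last_rtt_ms + (long)(count - 1 - i) * interval_ms;
    }
}
```

With 11 replies and a 500 ms wait for the newest one, this yields 10500, 9500, ..., 500 ms, the series reported above.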
Re: Strange connection slowdown on pcnet32
On Fri, Feb 16, 2007 at 09:35:54AM -0500, Lennart Sorensen wrote: I have run some tests using 2.6.8 now, and so far it hasn't failed. Still investigating... And 5 minutes later 2.6.8 failed the same way too. Maybe I will go back to 2.4 and check. -- Len Sorensen
Re: Re: Strange connection slowdown on pcnet32
On Fri, Feb 16, 2007 at 10:21:24AM -0600, [EMAIL PROTECTED] wrote:
Are there any messages in the log about timeouts, or anything else from the driver? When it gets in this state, can you communicate with another system, and does it have the same slow behavior?

Nope, no timeouts or messages. As far as I can see, the CPU, RAM and logs show nothing unusual. Just very slow reception on the ethernet port going towards the server providing the data for the transfer. Messages do get through eventually, but very, very late (when a ping reply arrives at the port and takes 5 to 10 seconds to make it to the network stack, something isn't right, at least when there is no other traffic waiting).

I did have NAPI in the driver even in 2.6.8 (I was adding it at the time). I am now testing 2.6.8 without NAPI (so no masking/unmasking of receive interrupts takes place), and so far it has run for over an hour without failing, although that doesn't prove it won't fail, just that it has lasted longer. I think I will try compiling 2.6.18 again with NAPI disabled on the pcnet32 and see what that does. There is a chance that something in the NAPI implementation is breaking the chip's receive somehow, although I can't currently imagine what it could be or how.
--
Len Sorensen
Re: Re: Strange connection slowdown on pcnet32
On Fri, Feb 16, 2007 at 12:21:10PM -0500, Lennart Sorensen wrote:
On Fri, Feb 16, 2007 at 10:21:24AM -0600, [EMAIL PROTECTED] wrote:
Are there any messages in the log about timeouts, or anything else from the driver? When it gets in this state, can you communicate with another system, and does it have the same slow behavior?
Nope, no timeouts or messages. As far as I can see, the CPU, RAM and logs show nothing unusual. Just very slow reception on the ethernet port going towards the server providing the data for the transfer. Messages do get through eventually, but very, very late (when a ping reply arrives at the port and takes 5 to 10 seconds to make it to the network stack, something isn't right, at least when there is no other traffic waiting).
I did have NAPI in the driver even in 2.6.8 (I was adding it at the time). I am now testing 2.6.8 without NAPI (so no masking/unmasking of receive interrupts takes place), and so far it has run for over an hour without failing, although that doesn't prove it won't fail, just that it has lasted longer. I think I will try compiling 2.6.18 again with NAPI disabled on the pcnet32 and see what that does. There is a chance that something in the NAPI implementation is breaking the chip's receive somehow, although I can't currently imagine what it could be or how.

So I have determined that when the port gets stuck/slow, it is hitting this problem (in pcnet32_rx):

	while (quota > npackets && (short)le16_to_cpu(rxp->status) >= 0) {
		if (netif_msg_intr(lp))
			printk(KERN_DEBUG "%s: pcnet32_rx npackets %d\n",
			       dev->name, npackets);
		pcnet32_rx_entry(dev, lp, rxp, entry);
		npackets += 1;
		/*
		 * The docs say that the buffer length isn't touched, but Andrew
		 * Boyd of QNX reports that some revs of the 79C965 clear it.
		 */
		rxp->buf_length = le16_to_cpu(2 - PKT_BUF_SZ);
		wmb();	/* Make sure owner changes after others are visible */
		rxp->status = le16_to_cpu(0x8000);
		entry = (++lp->cur_rx) & lp->rx_mod_mask;
		rxp = &lp->rx_ring[entry];
	}

Unfortunately rxp->status reads as 0x8000 for a long time, and then eventually changes to 0x0310, at which point the receive happens. Until that happens, the poll is called about once per second, and each time it returns that 0 packets were received but that more packets are waiting.

I can't figure out why it would get a status of 0x8000, which means that the MAC hasn't changed the ownership flag on the packet yet, even though it generated a receive interrupt multiple seconds ago. Could it be some caching issue that makes the CPU not realize that the memory has in fact been changed by DMA? Is there any way to force a cache update for a memory location? The CPU is a Geode SC1200 (Geode GX1 + Companion in one). I have already seen __memcpy from system RAM to device memory get data out of order, so I have no reason to believe the CPU doesn't have more stupid bugs related to doing I/O.
--
Len Sorensen
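The stall follows from the loop's stop condition: it treats the first MAC-owned descriptor as the end of the available work, so a head descriptor whose status keeps reading 0x8000 blocks every completed entry behind it. A toy C model, assuming a hypothetical 8-entry ring and modelling only the ownership bit (no buffers, no DMA, no NAPI), reproduces both the repeated "0 packets" polls and the burst once the head finally changes. `toy_ring` and `toy_rx` are illustrative, not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u       /* hypothetical; must be a power of two */
#define RX_OWN    0x8000u  /* set: descriptor belongs to the MAC */

struct toy_ring {
    uint16_t status[RING_SIZE];
    unsigned cur_rx;       /* driver's head, like lp->cur_rx */
};

/* Simplified pcnet32_rx loop: consume host-owned entries in order, hand
 * each back to the MAC, and stop at the first MAC-owned one. */
static int toy_rx(struct toy_ring *r, int quota)
{
    int npackets = 0;
    unsigned entry = r->cur_rx & (RING_SIZE - 1);

    while (quota > npackets && !(r->status[entry] & RX_OWN)) {
        npackets++;                     /* "deliver" the frame */
        r->status[entry] = RX_OWN;      /* give the buffer back */
        entry = (++r->cur_rx) & (RING_SIZE - 1);
    }
    return npackets;
}
```

With entries 1 to 3 already completed but entry 0 still reading 0x8000, every call returns 0; once entry 0 changes to 0x0310, a single call delivers all four frames at once.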
Re: Re: Strange connection slowdown on pcnet32
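On Fri, Feb 16, 2007 at 03:23:00PM -0500, Lennart Sorensen wrote:
So I have determined that when the port gets stuck/slow, it is hitting this problem (in pcnet32_rx):

	while (quota > npackets && (short)le16_to_cpu(rxp->status) >= 0) {
		if (netif_msg_intr(lp))
			printk(KERN_DEBUG "%s: pcnet32_rx npackets %d\n",
			       dev->name, npackets);
		pcnet32_rx_entry(dev, lp, rxp, entry);
		npackets += 1;
		/*
		 * The docs say that the buffer length isn't touched, but Andrew
		 * Boyd of QNX reports that some revs of the 79C965 clear it.
		 */
		rxp->buf_length = le16_to_cpu(2 - PKT_BUF_SZ);
		wmb();	/* Make sure owner changes after others are visible */
		rxp->status = le16_to_cpu(0x8000);
		entry = (++lp->cur_rx) & lp->rx_mod_mask;
		rxp = &lp->rx_ring[entry];
	}

Unfortunately rxp->status reads as 0x8000 for a long time, and then eventually changes to 0x0310, at which point the receive happens. Until that happens, the poll is called about once per second, and each time it returns that 0 packets were received but that more packets are waiting. I can't figure out why it would get a status of 0x8000, which means that the MAC hasn't changed the ownership flag on the packet yet, even though it generated a receive interrupt multiple seconds ago. Could it be some caching issue that makes the CPU not realize that the memory has in fact been changed by DMA? Is there any way to force a cache update for a memory location? The CPU is a Geode SC1200 (Geode GX1 + Companion in one). I have already seen __memcpy from system RAM to device memory get data out of order, so I have no reason to believe the CPU doesn't have more stupid bugs related to doing I/O.

It seems that whenever it gets stuck, it is always stuck on the same descriptor. Here is my current log:
eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00.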
eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. 
eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt
Re: Re: Strange connection slowdown on pcnet32
On Fri, Feb 16, 2007 at 04:01:57PM -0500, Lennart Sorensen wrote: It seems whenever it gets stuck, it is always the same descripter it is stuck on. Here is my current log: eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00. 
eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. 
eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. eth1: base: 0x04c51812 status: 8000 next-status: 0340 eth1: pcnet32_poll: pcnet32_rx() got 0 packets eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x. eth1: exiting interrupt, csr0
MediaGX/GeodeGX1 requires X86_OOSTORE. (Was: Re: Strange connection slowdown on pcnet32)
On Fri, Feb 16, 2007 at 05:27:28PM -0500, Lennart Sorensen wrote: On Fri, Feb 16, 2007 at 04:01:57PM -0500, Lennart Sorensen wrote: It seems that whenever it gets stuck, it is always stuck on the same descriptor. Here is my current log:
eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x.
eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00.
eth1: base: 0x04c51812 status: 8000 next-status: 0340
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x.
eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00.
eth1: base: 0x04c51812 status: 8000 next-status: 0340
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x.
eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00.
eth1: base: 0x04c51812 status: 8000 next-status: 0340
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x.
eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00.
eth1: base: 0x04c51812 status: 8000 next-status: 0340
eth1: pcnet32_poll: pcnet32_rx() got 0 packets
[...]
Re: MediaGX/GeodeGX1 requires X86_OOSTORE. (Was: Re: Strange connection slowdown on pcnet32)
On Fri, Feb 16, 2007 at 05:48:24PM -0500, Lennart Sorensen wrote: Well, so far it really looks like enabling OOSTORE on the Geode SC1200/GX1 makes a difference. A bit of searching indicates that the person who originally submitted the patch enabling load/store reordering on the MediaGX/Geode thought it might need OOSTORE, but was convinced by others that it didn't. It looks like it really does need it. The failure that previously occurred within a few seconds of starting a large transfer no longer happens, and all I did was enable CONFIG_X86_OOSTORE, recompile pcnet32.ko, and load the new module on the running system. Going back to the pcnet32.ko built without OOSTORE hits the failure again within seconds, until ifconfig eth1 down/up reinitializes its descriptor ring, after which it survives another bit of transfer and then fails again. Forcing load/store serialization on the CPU doesn't help, and disabling memory bypass doesn't help. Enabling X86_OOSTORE does help. What a stupid CPU design. So far nothing has managed to fix the __memcpy_toio in the jsm driver getting data out of order when sending on an Exar PCI UART chip. Only calling memcpy with one byte at a time seems to work there. It works fine on every other CPU, of course. What else am I going to discover is wrong with this CPU? -- Len Sorensen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
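The byte-at-a-time workaround mentioned above can be sketched as follows. This is a user-space illustration only: copy_to_dev_bytewise() is a hypothetical helper, an ordinary buffer stands in for the ioremap()ed device window, and a real driver would use writeb() rather than a plain volatile store. volatile keeps the compiler from merging, widening or reordering the single-byte stores; it does not by itself control the order in which the CPU makes them visible, which is the part the OOSTORE setting appears to affect on this Geode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes to "device" memory one byte at a
 * time. In a driver, dst would be an ioremap()ed window and each store
 * would go through writeb(); here an ordinary buffer stands in for it.
 *
 * volatile only stops the compiler from merging, widening or reordering
 * these stores. It says nothing about the order in which the CPU makes
 * them visible, which is the separate problem discussed in this thread.
 */
static void copy_to_dev_bytewise(volatile uint8_t *dst, const uint8_t *src,
                                 size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));

    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];    /* exactly one 1-byte store per iteration */
}
```

Copying byte by byte trades throughput for store granularity, so it is a workaround for the ordering problem rather than a fix for it.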
Strange connection slowdown on pcnet32
I have encountered some strange behaviour with the pcnet32 driver. I am transferring data from a server to a client, routing it through my router. The router has 2 Ethernet ports, both of which are AMD 972 chips (pcnet32). The transfer has so far been either HTTP or FTP (both see the same problem). I transfer lots of data, and after a while (I have seen anywhere from 200 to 700MB or so) the speed suddenly drops to less than 1KB/s. If I ping from the router to the server, the ping requests go out normally every second (as seen by tcpdump on the server), but on the router the replies are not seen by the kernel for multiple seconds. Sometimes I will see 3 ping replies together, sometimes 5 or even 10. The round-trip times show 10500, 9500, 8500, ..., 500ms for the packets received in a batch. ifconfig on the router shows the packet receive counts arriving in lumps, just as ping does, and so does tcpdump on the router's interface. Doing ifconfig down and up on the port connecting to the server clears the problem, and it can then handle another pile of data before the problem reappears. The CPU on the router is not fast enough to guarantee that there will never be dropped packets at 100Mbps. When I force the port to the server to 10Mbps I have no problems at all. Replacing the port to the server with an rtl8139 doesn't show any problems at 100Mbps, although the transfer rate drops from 6500KBps to 4000KBps compared to using the pcnet32. Kernels used so far are 2.6.16 and 2.6.18. I have a tulip card I intend to try as well, just to see whether the problem affects anything other than the pcnet32. Does anyone have any hints as to what part of the code to look at for changes made by doing ifconfig eth1 down; ifconfig eth1 up? Any ideas as to what could make the reception of packets suddenly get very, very slow?
On one pass where I was running tcpdump on the router, I saw a wrap of the sequence number right before the problem occurred, but that has not been the case every time as far as I can tell, so I am not sure whether it is related to the problem at all. -- Len Sorensen
Re: [PATCH 04/26] rt2x00: EEPROM 93Cx6
On Wed, Dec 13, 2006 at 05:47:41PM +0100, Ivo van Doorn wrote: Do you actually need to write data to the EEPROM chip? Currently the module does not support writing to the EEPROM; this is something I could add (the original Ralink code, on which this module is based, also contains code to write to the EEPROM). I am going to use it to write the custom PCI vendor ID to the EEPROM, so yes, I intend to write to it. The code appears to have the ability to write to the EEPROM, but I haven't looked at all of it carefully yet. I don't actually need to read it back, although I intend to do so to verify the contents. -- Len Sorensen
Re: [PATCH 0/2] d80211, rt2x00: fixes
On Wed, Dec 13, 2006 at 06:00:35PM +0100, Jiri Benc wrote: John, in addition to the previous pull request, please also apply the following two fixes. What is the state of the rt2x00 driver by now? I have been playing around with an rt2500-based card, with some success, but not enough for me to switch over from wired Ethernet on my machine yet. I used to get lots of hard lockups, but with the latest CVS snapshot in Debian's rt2x00-source package it no longer seems to lock up. It also now works with WPA without using wpa_supplicant (yay, good work!). However, it very frequently pauses the transfer, and then after a while (probably 20 or 30 seconds) it starts moving data again and my transfer continues. Is this considered normal for now? My card happens to be a Linksys WMP54G version 4.0. At least pauses beat crashes. It's going the right way for a work in progress. I guess I should go read the bug tracking system and try out newer CVS versions. :) -- Len Sorensen
Re: [PATCH] eeprom_93cx6: Add write support
On Wed, Dec 13, 2006 at 07:56:50PM +0100, Ivo van Doorn wrote: This patch adds support for writing to the EEPROM; it also moves some duplicate code into separate functions. Signed-off-by: Ivo van Doorn [EMAIL PROTECTED] Thank you. I will give that a try to see if I can get it to work with the jsm driver. Too bad the serial drivers don't have any standard geteeprom/seteeprom ioctls the way ethtool does for network devices. -- Len Sorensen
Re: [PATCH 0/2] d80211, rt2x00: fixes
On Wed, Dec 13, 2006 at 12:38:43PM -0500, Dan Williams wrote: How, by private ioctls? That's just wrong; I believe you still need to go through the 4-way handshake to get the right keying information even if you use PSK, which means you still need the supplicant, right? All I did was add this to /etc/network/interfaces:
iface wlan0 inet static
	address 192.168.1.51
	network 192.168.1.0
	netmask 255.255.255.0
	gateway 192.168.1.254
	broadcast 192.168.1.255
	pre-up ifconfig wlan0 up
	pre-up iwpriv wlan0 set AuthMode=WPAPSK
	pre-up iwpriv wlan0 set EncrypType=TKIP
	pre-up iwconfig wlan0 essid USR8054
	pre-up iwpriv wlan0 set WPAPSK=My WPA passphrase...
It seems to work, although I guess I could be wrong. It was what I found in the documentation for the rt2x00 driver for doing WPA. It looks nothing like the wpa_supplicant setup I used to have with an older version of the driver. My understanding was that the rt2x00 driver and/or the d80211 stack took care of it now. -- Len Sorensen
Re: [PATCH 0/2] d80211, rt2x00: fixes
On Wed, Dec 13, 2006 at 06:56:57PM +0100, Ivo van Doorn wrote: rt2x00 uses the dscape stack exclusively, so I am not sure how he is managing WPA without wpa_supplicant with rt2x00. Lennart, are you using rt2x00 or the legacy rt2500 driver? rt2x00 with the dscape stack. -- Len Sorensen
Re: [PATCH 0/2] d80211, rt2x00: fixes
On Wed, Dec 13, 2006 at 06:49:07PM +0100, Ivo van Doorn wrote: Well, results seem to vary between users. Recently users have started reporting panics and freezes with rt2x00. I have not yet traced the problem to its source, because the panics I have received don't contain any rt2x00 or d80211 functions, but the presence of the rt2x00 module is the key factor in reproducing the crash. :( Others, however, seem to have more success with rt2x00; master mode seems to work at reasonable speed. Association in managed mode is still very short-lived: people who manage to associate get kicked from the AP quite quickly (this could be because d80211 is not sending NULL frames every once in a while). I should do some more testing and submit a report of how it is behaving. Is there anything specific worth checking if it misbehaves (so far, misbehaving means pausing network transmissions for a short period and then resuming)? But since results vary so much between users, I still describe rt2x00 as an experimental driver. Well, every once in a while I load a new version and see how it is. Eventually I hope it will work perfectly, and I can move my MythTV box into the living room with the TV using wireless, rather than having it sit in the basement next to the router machine using wired Ethernet. Running Ethernet cable through the wall when the basement is all finished just seems like too much work. I got an rt-based card because I knew it was being worked on. I know someday it will simply work, which is good enough for me. -- Len Sorensen
Re: [PATCH 0/2] d80211, rt2x00: fixes
On Wed, Dec 13, 2006 at 10:28:15PM +0100, Ivo Van Doorn wrote: That is definitely the rt2500 legacy driver and _not_ the rt2x00 driver. Yeah, I just noticed that a few minutes ago. I had been trying out both to see how they worked, and I left the old module loaded by accident. Correct, that is why those iwpriv commands are clear evidence that you are not using rt2x00 but the legacy rt2500 driver. Check which driver is loaded: rt2500 means legacy, rt2500pci means rt2x00. Yep, I am now poking at wpa_supplicant again and getting other interesting messages from it. I will try to report soon on how my system behaves with the rt2x00 driver, since apparently some of my testing was with the wrong driver. Oops. :) -- Len Sorensen
Simultanious transmits seems to cause hang on pcnet32
I am currently doing some testing on my system and managing to totally hang it (so that the watchdog has to come along and reboot it). The setup is this: I have a PLX PCI-PCI bridge with 4 79C972 chips behind it, each running 100baseTX. I am transmitting traffic from a SmartBits test system from port 1 to port 3 and back, and from port 2 to port 4 and back. I am running 500 packets/second with 60-byte packets each way. If I start the traffic on all 4 ports at the same time, I get fewer than 100 packets received back at the SmartBits on each port, and then the Linux kernel is hung, with no response to anything I have tried. The watchdog then reboots the system. If I start traffic on fewer than 4 ports and then add the remaining ports a second or so later, it runs just fine and keeps up with the traffic. I tried making all the traffic flow out eth0 (an rtl8139 port) instead of out the pcnet32 ports, and then there is no problem, so I think there is some problem when multiple ports try to start transmitting at the same time. So far it has failed with 2.6.8 and 2.6.16, and with 2.6.17's pcnet32 with the NAPI patches applied. I noticed that sometime between 2.6.4 and 2.6.8 the TxDone interrupts were removed entirely, whereas they used to be sent every once in a while. I am not sure yet whether this makes a difference. I tried increasing the ring sizes to their maximum setting of 9/9 rather than the current default of 4/5, and that didn't make any difference either. Does anyone have a suggestion for how to go about debugging this issue? So far I am very confused. I tried turning on lots of debugging in pcnet32, but that seems to slow the system down enough (printing debug messages on the serial console) that it only manages to transmit 10 packets per port per second, at which point it doesn't lock up. Reducing the test setting from 500 60-byte packets/second to 100 makes the problem disappear as well. So I am open to suggestions.
I really don't know how to go about debugging this when it makes the kernel lock up. It makes me think it is getting stuck somewhere with interrupts disabled, but I can't see anything in the transmit code where that could happen. -- Len Sorensen
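The 4/5 and 9/9 ring settings read most naturally as log2 of the descriptor count, which is how the driver's PCNET32_LOG_TX_BUFFERS and PCNET32_LOG_RX_BUFFERS constants are used. Under that assumption, a small sketch converts a setting into a number of ring entries; ring_entries() and LOG_MAX_BUFFERS are hypothetical names, not driver code.

```c
#include <assert.h>

/* Assumed upper limit for the log2 ring-size setting ("9/9" above). */
#define LOG_MAX_BUFFERS 9u

/* Hypothetical helper: a ring-size setting of n means 2^n descriptors. */
static unsigned int ring_entries(unsigned int log2_size)
{
    assert(log2_size <= LOG_MAX_BUFFERS);
    return 1u << log2_size;
}
```

So the default 4/5 setting gives 16 transmit and 32 receive descriptors, and 9/9 gives 512 of each.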
Re: Simultanious transmits seems to cause hang on pcnet32
On Tue, Jul 18, 2006 at 10:57:47AM -0700, Don Fry wrote: I don't know what a 'smartbits test system' is or how it works. Could you please briefly explain what it is and does? It is a network test system built by Spirent (www.spirentcom.com). It is mainly a layer 2 test system: you configure what you want the Ethernet packets to look like, what rate you want them sent at, and which fields to change, and by how much, on each packet sent out. We have it configured to generate packets from 192.168.1.2 to 192.168.3.2 (and vice versa), with the IP of the router containing the pcnet32 chips set as the gateway. The packets are simply Ethernet frames carrying an IPv4 header with the source and destination IP filled in, along with the other required fields and the checksum, and with the data part of the packet filled with 0s in this case. Is the rtl8139 on the same PCI bus? The 8139 is on the primary PCI bus; the 972s are behind the PCI bridge. The 8139 driver is normally not even loaded. Is there a version of the pcnet32 driver that does work? Is this a stock driver, or have you made modifications as well? I haven't found one that works yet. The only changes I have made are to initialize the PHY and set the MAC address, since we don't have an EEPROM connected to the 972s. I was thinking of trying 2.4.27 or something around there, to see if an older driver behaves differently. The LTINT or TxDone interrupt deferral code was removed in May 2004, in the 2.6.7 timeframe. Every transmit packet causes an interrupt, rather than just occasionally. Hmm, the way I read the code, it looked like setting the status to 8300 made a packet generate no interrupt, and setting it to 9300 made a packet generate an interrupt. I guess I read it backwards. That wouldn't surprise me. :) Does reducing the ring size make any difference? Or tx large/rx small, or vice versa? I don't know. I can try that.
Is there any way to see what is happening on the PCI bus where the pcnet32 devices are connected? Or to see what is happening on the master side of the PCI-to-PCI bridge? Do the chips share any interrupt lines, or do they all have dedicated IRQs? We have two interrupts for the PCI bus, IRQ 10 and 11. eth1 and eth3 share one, and eth2 and eth4 share the other. Is this an SMP or UP system? A single AMD Geode SCx200 at 266MHz. I have also considered building with PREEMPT off, to see if that makes a difference, not that there are really any user space processes doing anything on the system. -- Len Sorensen
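The 0x8300 and 0x9300 values differ in a single bit. The sketch below decodes them under an assumed PCnet transmit descriptor layout (OWN 0x8000, STP 0x0200, ENP 0x0100, and 0x1000 as the per-descriptor transmit-done interrupt request); the macro names and requests_txdone_irq() are hypothetical. Whether the chip honours that bit depends on how transmit-done interrupts are configured, which the sketch does not model, so it cannot settle which reading of the driver is right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit layout of the 16-bit transmit descriptor status word.
 * Names are hypothetical macros for this sketch, not the driver's.
 */
#define TMD_OWN    0x8000u  /* descriptor owned by the chip */
#define TMD_LTINT  0x1000u  /* assumed: request a TX-done interrupt for this packet */
#define TMD_STP    0x0200u  /* start of packet */
#define TMD_ENP    0x0100u  /* end of packet */

/* True if the status word carries the (assumed) TX-done interrupt request. */
static bool requests_txdone_irq(uint16_t status)
{
    return (status & TMD_LTINT) != 0;
}
```

Under these assumptions 0x8300 is OWN|STP|ENP for a single-buffer packet, and 0x9300 is the same plus the interrupt-request bit.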
Re: [RFT PATCH] pcnet32: NAPI support
On Wed, Jun 28, 2006 at 09:55:41AM -0700, Don Fry wrote: Yes, I saw the debug statements when creating the email and was too lazy to remove them and create a new patch. The patch needs to be broken up into functional pieces anyway, so since it has passed all of my testing as well, I will start on that... So it might make 2.6.18 then? :) I just updated the driver in my 2.6.16 kernel to the 2.6.17 version and applied your patch, and then added my own weird stuff that no one else will want. Len Sorensen
Re: [RFT PATCH] pcnet32: NAPI support
On Fri, Jun 23, 2006 at 02:32:12PM -0700, Don Fry wrote: This set of changes combines the work done by Len Sorensen and myself to add compile-time support for NAPI to the pcnet32 driver. I have tested it on ia32 and ppc64 hardware with various versions of the pcnet32 adapter. I have also made a few changes requested by Jon Mason, but the substitution of the many magic numbers in the driver is not yet done. If no one encounters any problems when testing this, I will break up the changes into proper patches and submit them next week. Well, so far this is working for me. It is a somewhat different layout of the interrupt handler, so it took me a bit of work to get the features I need patched in, but in the end I ended up with simpler code as a result, so I am quite happy with the new layout. The driver works on everything I have to try it on so far.
Signed-off-by: Don Fry [EMAIL PROTECTED]
--- linux-2.6.17/drivers/net/orig.Kconfig	2006-06-15 11:49:39.0 -0700
+++ linux-2.6.17/drivers/net/Kconfig	2006-06-22 15:44:52.0 -0700
@@ -1272,6 +1272,23 @@ config PCNET32
 	  <file:Documentation/networking/net-modules.txt>. The module will
 	  be called pcnet32.
 
+config PCNET32_NAPI
+	bool "Use RX polling (NAPI) (EXPERIMENTAL)"
+	depends on PCNET32 && EXPERIMENTAL
+	help
+	  NAPI is a new driver API designed to reduce CPU and interrupt load
+	  when the driver is receiving lots of packets from the card. It is
+	  still somewhat experimental and thus not yet enabled by default.
+
+	  If your estimated Rx load is 10kpps or more, or if the card will be
+	  deployed on potentially unfriendly networks (e.g. in a firewall),
+	  then say Y here.
+
+	  See <file:Documentation/networking/NAPI_HOWTO.txt> for more
+	  information.
+
+	  If in doubt, say N.
+
 config AMD8111_ETH
 	tristate "AMD 8111 (new PCI lance) support"
 	depends on NET_PCI && PCI
--- linux-2.6.17/drivers/net/orig.pcnet32.c	Sat Jun 17 18:49:35 2006
+++ linux-2.6.17/drivers/net/pcnet32.c	Fri Jun 23 13:13:02 2006
@@ -21,9 +21,15 @@
  *
  */
 
+#include <linux/config.h>
+
 #define DRV_NAME	"pcnet32"
-#define DRV_VERSION	"1.32"
-#define DRV_RELDATE	"18.Mar.2006"
+#ifdef CONFIG_PCNET32_NAPI
+#define DRV_VERSION	"1.33-NAPI"
+#else
+#define DRV_VERSION	"1.33"
+#endif
+#define DRV_RELDATE	"23.Jun.2006"
 #define PFX		DRV_NAME ": "
 
 static const char *const version =
@@ -58,18 +64,15 @@ static const char *const version =
  * PCI device identifiers for new style Linux PCI Device Drivers
  */
 static struct pci_device_id pcnet32_pci_tbl[] = {
-	{ PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_LANCE_HOME,
-		PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
-	{ PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_LANCE,
-		PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_LANCE_HOME), },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_LANCE), },
 
 	/*
 	 * Adapters that were sold with IBM's RS/6000 or pSeries hardware have
 	 * the incorrect vendor id.
 	 */
-	{ PCI_VENDOR_ID_TRIDENT, PCI_DEVICE_ID_AMD_LANCE,
-		PCI_ANY_ID, PCI_ANY_ID,
-		PCI_CLASS_NETWORK_ETHERNET << 8, 0x00, 0},
+	{ PCI_DEVICE(PCI_VENDOR_ID_TRIDENT, PCI_DEVICE_ID_AMD_LANCE),
+	  .class = (PCI_CLASS_NETWORK_ETHERNET << 8), .class_mask = 0x00, },
 
 	{ }	/* terminate list */
 };
@@ -277,13 +280,14 @@ struct pcnet32_private {
 	u32			phymask;
 };
 
-static void pcnet32_probe_vlbus(void);
 static int pcnet32_probe_pci(struct pci_dev *, const struct pci_device_id *);
 static int pcnet32_probe1(unsigned long, int, struct pci_dev *);
 static int pcnet32_open(struct net_device *);
 static int pcnet32_init_ring(struct net_device *);
 static int pcnet32_start_xmit(struct sk_buff *, struct net_device *);
-static int pcnet32_rx(struct net_device *);
+#ifdef CONFIG_PCNET32_NAPI
+static int pcnet32_poll(struct net_device *dev, int *budget);
+#endif
 static void pcnet32_tx_timeout(struct net_device *dev);
 static irqreturn_t pcnet32_interrupt(int, void *, struct pt_regs *);
 static int pcnet32_close(struct net_device *);
@@ -425,6 +429,235 @@ static struct pcnet32_access pcnet32_dwi
 	.reset = pcnet32_dwio_reset
 };
 
+static void pcnet32_netif_stop(struct net_device *dev)
+{
+	dev->trans_start = jiffies;
+	netif_poll_disable(dev);
+	netif_tx_disable(dev);
+}
+
+static void pcnet32_netif_start(struct net_device *dev)
+{
+	netif_wake_queue(dev);
+	netif_poll_enable(dev);
+}
+
+/*
+ * Allocate space for the new sized tx ring.
+ * Free old resources
+ * Save new resources.
+ * Any failure keeps old resources.
+ * Must be called with lp->lock held.
+ */
+static void pcnet32_realloc_tx_ring(struct net_device
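The comment at the end of the quoted patch describes an allocate-then-swap pattern: build the new ring completely, and only then release the old one, so that a failed allocation leaves the driver with its old, still-valid ring instead of null pointers. A minimal user-space sketch of that pattern follows; struct tx_ring and tx_ring_realloc() are hypothetical simplifications, calloc() stands in for the DMA-coherent allocations a driver would make, and the locking the comment requires is not modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the driver's TX ring bookkeeping. */
struct tx_ring {
    unsigned int size;   /* number of descriptors */
    unsigned int *desc;  /* descriptor ring (simplified) */
    void **skb;          /* per-descriptor buffer pointers */
};

/*
 * Allocate every new array first. Only if all allocations succeed are the
 * old arrays freed and replaced; any failure leaves *ring untouched.
 * Returns 0 on success, -1 on failure.
 */
static int tx_ring_realloc(struct tx_ring *ring, unsigned int new_size)
{
    assert(new_size > 0);

    unsigned int *new_desc = calloc(new_size, sizeof(*new_desc));
    void **new_skb = calloc(new_size, sizeof(*new_skb));

    if (new_desc == NULL || new_skb == NULL) {
        free(new_desc);  /* free(NULL) is a no-op */
        free(new_skb);
        return -1;       /* old ring is still intact */
    }

    free(ring->desc);
    free(ring->skb);
    ring->desc = new_desc;
    ring->skb = new_skb;
    ring->size = new_size;
    return 0;
}
```

The design choice is that the caller never observes a half-built ring: either every new array exists and the old ones are gone, or nothing changed.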
Re: [RFT] pcnet32 NAPI changes
On Tue, Jun 20, 2006 at 08:53:55AM -0500, Jon Mason wrote: The number of polls per received packet is very low, which removes the benefit of NAPI. A compile-time option would allow those users who know better to DTRT. Well, I know that on the slowpoke system I run on, with NAPI polling the system can process packets, get work done, and not fall over and die from handling interrupts. Without it, even 70Mbit of data on a single port floods the system with packet overruns to the point that the watchdog times out and the system reboots. So I don't know whether polling is slightly less efficient with little traffic, but it is certainly a lot more efficient and safer when there is suddenly a lot more traffic. Maybe it should be a module option, so that you can pick what you want. Heck, it could even be a per-port option. :) Yup, but the "everyone else is doing it" argument never worked with my parents. All it takes is one brave soul to determine the reasoning behind the magic numbers and convert them into #defines. Shouldn't be more than one day's work. Is this a magic number in your opinion?
	lp->a.write_csr(ioaddr, 0, 0x0002);	/* Set STRT bit */
I guess one could do
	#define CSR0_RST	0x0001
	#define CSR0_STRT	0x0002
	#define CSR0_STOP	0x0004
	etc...
and then
	lp->a.write_csr(ioaddr, 0, CSR0_STRT);	/* Set STRT bit */
Does that help? I am not sure. I think the comment next to it is plenty. Len Sorensen
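The behaviour described above (polling under load, interrupts when idle) can be shown with a toy model. Nothing here is kernel API: struct toy_nic, toy_rx_interrupt() and toy_poll() are hypothetical, and a real NAPI driver also has to handle a packet that arrives just as interrupts are being re-enabled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the NAPI receive cycle (hypothetical, not kernel API):
 *  - in interrupt mode, an RX interrupt masks further RX interrupts and
 *    schedules polling;
 *  - each poll pass processes at most `budget` packets;
 *  - once the ring is drained, polling stops and RX interrupts are
 *    re-enabled, so an idle link costs no polling at all.
 */
struct toy_nic {
    int  pending;         /* packets waiting in the RX ring */
    bool rx_irq_enabled;  /* true = interrupt mode */
    bool polling;         /* true = scheduled for polling */
    int  interrupts;      /* RX interrupts actually taken */
};

static void toy_rx_interrupt(struct toy_nic *nic)
{
    if (!nic->rx_irq_enabled)
        return;               /* masked while polling */
    nic->interrupts++;
    nic->rx_irq_enabled = false;
    nic->polling = true;
}

/* One poll pass; returns the number of packets processed. */
static int toy_poll(struct toy_nic *nic, int budget)
{
    int done = 0;

    assert(budget > 0);
    while (done < budget && nic->pending > 0) {
        nic->pending--;       /* "receive" one packet */
        done++;
    }
    if (nic->pending == 0) {  /* ring drained: back to interrupt mode */
        nic->polling = false;
        nic->rx_irq_enabled = true;
    }
    return done;
}
```

With a burst of 100 packets and a budget of 16, only one RX interrupt is taken and the ring drains in seven poll passes, after which the device is back in interrupt mode.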
Re: [RFT] pcnet32 NAPI changes
On Tue, Jun 20, 2006 at 11:05:04AM -0500, Jon Mason wrote: The point of my comment was CPU utilization. It appears that a bug is trying to be fixed by adding NAPI. This sounds a bit hackish to me, and could hide the root cause of the problem. So I'm not sure that is the best idea, but I will defer to the maintainer. No, it isn't a bug. If the hardware generates enough interrupts to keep the CPU at 100% handling them, starving user space (since interrupts of course have high priority compared to just running user code), then the watchdog daemon, which of course runs in user space, will never run, and hence the watchdog hardware times out and resets the system, as it is designed to do. There is no bug, just a problem of too many interrupts generated by the network hardware. NAPI eliminates the receive interrupts when the system is busy, solving the problem at its root cause. But your example is just one instance. Here is one without a comment:
	lp->a.write_csr(ioaddr, 4, 0x0915);
Hmm. 0x0915 = 1001 0001 0101 =
* Auto Pad Transmit (bit 11): Enables auto padding of packets.
* Missed Frame Counter Overflow Mask (bit 8): Masks out interrupts on overflow of the missed frame counter.
* Receive Collision Counter Overflow Mask (bit 4): Masks out interrupts on overflow of the receive collision counter.
* Transmit Start Mask (bit 2): Masks out interrupts on start of transmit.
So every CSR has a different meaning for all its bits. Defining each one and combining all of them could make a lot of the code really messy. Perhaps more comments in those places would be clearer. What is it doing? Is it still needed? Can it be done anywhere else? Who knows, because it is magic. The 4 can be defined as CSR0_STOP, per your example above, but what does the value 0x0915 do? No, the 4 has a different meaning in CSR4. It means stop in CSR0; in CSR4 it means Transmit Start Mask. It masks interrupts on transmit start.
I think the value is wrong, since my data sheet says bits 0 and 1 are reserved and should be written as 0. 0x0915 would write bit 0 as a 1, which violates the data sheet, of the 972 at least. My point was that there are certain parts of the code which are non-intuitive and should be commented, and there are others where a good descriptive value would be nice. Well, I agree the code could get a bit better. I did think that overall the code was rather nice, actually. Len Sorensen
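Naming the bits makes the point about 0x0915 checkable. The macro names and csr4_value_ok() below are hypothetical; the bit positions and meanings are the ones quoted in this thread from the 79C972 data sheet, including the claim that bits 1 and 0 are reserved and must be written as 0.

```c
#include <assert.h>
#include <stdint.h>

/* CSR4 bits as quoted in the thread; macro names are hypothetical. */
#define CSR4_APAD_XMT  (1u << 11)  /* auto pad transmit */
#define CSR4_MFCOM     (1u << 8)   /* mask missed frame counter overflow irq */
#define CSR4_RCVCCOM   (1u << 4)   /* mask receive collision counter overflow irq */
#define CSR4_TXSTRTM   (1u << 2)   /* mask transmit start irq */
#define CSR4_RESERVED  0x0003u     /* bits 1..0: reserved, write as 0 */

/* Nonzero if a value leaves the reserved bits clear. */
static int csr4_value_ok(uint16_t v)
{
    return (v & CSR4_RESERVED) == 0;
}
```

Spelled out this way, the four documented bits add up to 0x0914, and the extra 0x0001 in 0x0915 stands out as a write to a reserved bit.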
Re: [RFT] pcnet32 NAPI changes
On Fri, Jun 16, 2006 at 12:11:54PM -0700, Don Fry wrote: This patch is a collection of changes to pcnet32 which does the following:
- Fix section mismatch warning.
- Fix set_ringparam to correctly handle memory allocation failures.
- Fix off-by-one in get_ringparam.
- Clean up at end of loopback_test when not up.
- Add NAPI to driver, fixing set_ringparam and loopback_test to work correctly with poll.
- For multicast, do not reset the chip unless it cannot enter suspend mode, to avoid a race with poll.
The set_ringparam code is larger than I would prefer, but it will not leave null pointers around for the code to stumble over when memory allocation fails. If anyone has a better idea, please let me know. Some complexity could be avoided by allocating memory for the maximum number of tx and rx buffers at probe time. That would require 14k for the tx ring and arrays and another 14k for rx, instead of about 10k total for the default sizes. So 28k vs 10k? Why are these adjustable if it makes that little difference? Is there any advantage to making them smaller? It is NAPI-only, unlike Len Sorensen's version, which allows for compile-time selection. Some drivers are NAPI-only, others have compile options. Which is preferred? I just figured making it an option was less intrusive, although I can't imagine a good reason for not wanting to use the NAPI version at all times. I certainly intend to use it that way. I have tested these changes with a 79C971, 973, 976, and 978 on a ppc64 machine, and 970A, 972, 973, 975, and 976 on an x86 machine. I have not tested these changes with VMware or Xen. I will give it a try with our system and see how it runs. Len Sorensen
Re: [RFT] pcnet32 NAPI changes
On Mon, Jun 19, 2006 at 03:41:40PM -0500, Jon Mason wrote: I believe a compile-time option is preferred for non-gigabit drivers, given that NAPI will eat a lot of cycles for infrequent packets (especially at 10Mb). I believe there was a thread about this last year when e100 was having NAPI problems. How does NAPI eat cycles? It goes back to interrupt mode when the queue is empty, and only on an RX interrupt does it turn polling on again. It is certainly possible that there are bugs in a NAPI conversion, which I guess could be a reason to have the option to stick with the old method, although then again not having the option ensures the bugs get found sooner. A general nit: there are A LOT of magic numbers in the code, most existing prior to this patch. The driver would benefit from a little cleanup. Also, nothing to do with this patch, but I noticed it when the code was moved. A comment about why the following is necessary might be nice:
	lp->rx_ring[i].buf_length = le16_to_cpu(2 - PKT_BUF_SZ);
I suspect many drivers are in need of some cleanup. Len Sorensen
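One plausible reading of 2 - PKT_BUF_SZ, sketched below, rests on two assumptions that the thread does not state: that the descriptor's buffer length field holds the byte count as a negative two's-complement number, and that the 2 is the number of bytes reserved at the start of each receive buffer for IP header alignment. rx_buf_length_field() is a hypothetical helper, and 1544 is used only as an example value for PKT_BUF_SZ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode the usable receive buffer size as a
 * negative byte count truncated to 16 bits.
 * Assumptions (not from the thread): the length field is negative
 * two's complement, and `reserved` bytes at the head of the buffer are
 * not available to the chip.
 */
static uint16_t rx_buf_length_field(int pkt_buf_sz, int reserved)
{
    int usable = pkt_buf_sz - reserved;   /* bytes the chip may write */

    assert(usable > 0 && usable <= 4096);
    return (uint16_t)(-usable);           /* wraps modulo 2^16 */
}
```

With those inputs the field comes out as 0xF9FA, i.e. -1542 in 16 bits, which is exactly what 2 - PKT_BUF_SZ produces; for any usable length up to 4096 bytes the top four bits are all ones.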
Re: Firewall question
On Fri, Jun 09, 2006 at 05:43:24AM +0200, Andi Kleen wrote: No one out on the internet, but it would be trivial for someone outside his house. All his traffic will be on a long unsecured cable. That is why I would never bridge home Ethernet traffic onto a DSL line. Hmm, traffic sent between his machines would not go over the DSL, since the MAC address doesn't match the DSL modem (I would think so, at least). It would be a mess if the DSL modem tried to forward all traffic on an Ethernet segment (it certainly doesn't have the bandwidth). Maybe I am incorrectly assuming that the DSL modem only forwards the PPPoE traffic sent to it. I could see broadcast traffic being forwarded, although ARPs and such are generally not that interesting. Len Sorensen
Re: [PATCH] pcnet32 driver NAPI support
On Wed, Jun 07, 2006 at 03:32:45PM -0700, Don Fry wrote: One other problem I ran into. I applied the patch but it will not compile because rl_active is never defined. I have worked around it but Doh! I thought I had cleaned up all the weird code from my own version. Because the platform I work with has 4 pcnet32 ports and a slowpoke 266MHz Geode, we can't handle full traffic load, so to keep the system responsive we pause processing receives when we pass a certain number of packets per second. rl_active is part of that. I meant to remove all of it; apparently I didn't read every line of my patch carefully enough. :( Well, at least this ought to clean up my work a bit. Len Sorensen
Re: Firewall question
On Thu, Jun 08, 2006 at 11:57:12AM -0700, Alex Davis wrote: The scenario: I have a DSL modem in pass-through (bridge) mode. The Linux firewall/router has a single Ethernet card. It is running PPPoE. This gives two interfaces: eth0 and ppp0. The firewall is running iptables. There are several machines behind the firewall. Problem: I've been told that if someone whose public IP address is on the same subnet as mine were to get my MAC address, (s)he could bypass the firewall and talk directly to the machines behind it. Is this true? Well, the DSL modem only transfers whatever data the ISP end sends to it, which in your case is just PPP packets (LCC or LCP I think). No one out on the internet would be able to send Ethernet data over the DSL link, so the only way to send data to another machine on your network (the one the DSL modem is physically connected to) is if you have other machines on your local network which are also running PPPoE and listening for that traffic. So the worst thing I can see happening is that someone on your local network could potentially take over your PPPoE session, but that's about it. I just can't see anything else that could happen. I used to run exactly the setup you describe before I had to drop the DSL connection (I moved). Len Sorensen
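The argument rests on the modem acting as a filter that only carries PPPoE frames. As a toy model of that assumption, using the PPPoE EtherTypes from RFC 2516 (the macro names and modem_forwards_to_isp() are hypothetical and do not describe any real modem's firmware):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PPPoE EtherTypes from RFC 2516; macro names are this sketch's own. */
#define PPPOE_DISCOVERY 0x8863u
#define PPPOE_SESSION   0x8864u

/*
 * Hypothetical model of a bridging DSL modem that only carries PPPoE
 * frames between the LAN and the ISP. Real modems may differ; this only
 * encodes the assumption made in the thread.
 */
static bool modem_forwards_to_isp(uint16_t ethertype)
{
    return ethertype == PPPOE_DISCOVERY || ethertype == PPPOE_SESSION;
}
```

Under this model ordinary IPv4 (0x0800) and ARP (0x0806) traffic between hosts on the LAN never crosses the DSL link; only PPPoE frames do, which is why the remaining exposure is to someone on the same Ethernet segment.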
Re: [PATCH] pcnet32 driver NAPI support
On Wed, Jun 07, 2006 at 11:20:40AM -0700, Don Fry wrote:
> I am also working on a NAPI version of the pcnet32 driver for many of the same reasons, and will compare what you have with my own implementation. I probably won't be able to do much until Friday. Just a couple of comments. I am adding netdev@vger.kernel.org to the cc list, as most network driver discussion is done here rather than lkml. linux-kernel (and linux-net) should be deleted in future replies.

I must have picked the wrong place to cc.

> 2.6.17-rc6 would be the correct source to patch against. Since this is an enhancement it will not come out till 2.6.18.

I thought so. That is why I did it against both 2.6.17-rc6 and 2.6.16 (since I use it with 2.6.16).

> I would not change the driver name from pcnet32 to pcnet32napi, but I would change the version from 1.32 to 1.33NAPI or something like that.

Hmm, perhaps. I just wanted something that made it obvious in dmesg which driver I was running. I see tulip actually puts it in the version instead. I don't remember where I got the driver name change idea from.

> Some areas of concern that you may have addressed already (I have not scanned your changes yet) are what happens if the ring size is changed via ethtool without bringing down the interface, if the loopback test is run in a similar fashion, or if a tx timeout occurs.

The same thing as if it were done before enabling NAPI. From a few messages I have seen, it doesn't work right now, and it won't work any better with my changes. I have never tried changing the ring size on the fly, so I don't know. It appears that the port is stopped before the ring size change is done, although I can't really tell how it handles things if the queue is not empty when it stops the port. Does it try to handle anything left in the ring first, or does it just toss those packets? (That I would consider wrong.)
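The question of whether frames left in the ring get handled or tossed comes down to descriptor ownership. In the PCnet descriptor format, bit 15 (OWN) of the status word is set while the chip owns an entry. Entries with it clear hold received frames waiting for the host. The following hypothetical helper, which is not part of pcnet32, counts those entries starting from the driver's current position. A resize path could use it to decide whether to process the backlog before freeing the old ring. The flat status array is a simplification of the real descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

#define RX_OWN 0x8000u  /* status bit 15: set while the chip owns the descriptor */

/* Hypothetical: count consecutive host-owned (received, unprocessed) entries
 * starting at `cur`, wrapping around the ring but never counting an entry twice.
 * These are the frames a resize would drop if it freed the ring first. */
static int rx_frames_pending(const uint16_t *status, int ring_size, int cur)
{
    assert(ring_size > 0 && cur >= 0 && cur < ring_size);

    int n = 0;
    while (n < ring_size && !(status[(cur + n) % ring_size] & RX_OWN))
        n++;
    return n;
}
```

The loop stops at the first chip-owned entry, mirroring how a receive routine walks the ring, and the `n < ring_size` bound keeps it from spinning when every entry is host-owned.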
> The lp->lock MUST be held whenever accessing the CSR or BCR registers, as this is a multi-step process and has been the source of problems in the past. Even on UP systems.

Hmm, I just followed what appeared to be done in pcnet32_rx and how tulip and a few other drivers had done their NAPI conversions. It certainly works for me the way I did it; I haven't seen any lockups yet. I do see that I am not holding the lock when I acknowledge IRQs in pcnet32_poll, which pcnet32_rx doesn't need to worry about since it is called from the interrupt handler, which already holds the lock. That should be fixed then. So I can do:

    // Clear RX interrupts
    spin_lock(&lp->lock);
    lp->a.write_csr(ioaddr, 0, 0x1400);
    spin_unlock(&lp->lock);

That part seems simple enough to protect. Is this safe without holding the lock?

    } while (lp->a.read_csr(ioaddr, 0) & 0x1400);

Not sure how to wrap a lock around that one without holding the lock for way too long. Perhaps:

    spin_lock(&lp->lock);
    state = lp->a.read_csr(ioaddr, 0) & 0x1400;
    spin_unlock(&lp->lock);
    } while (state);

Does that seem more reasonable?

Len Sorensen
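The access Don describes is indirect. The driver writes a register index to the RAP port and then reads or writes the data port (RDP), so anything that runs between those two steps can redirect the access. That includes the driver's own interrupt handler, even on a UP machine. Below is a userspace model of the proposed loop under that assumption. The fake chip, the atomic_flag spinlock standing in for lp->lock, and the write-1-to-clear behaviour of the simulated CSR0 are all simplifications, not pcnet32 code. The assert() in each accessor encodes the rule that the lock must be held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_STATUS_MASK 0x1400u   /* the CSR0 mask used in the message */
#define NUM_CSRS 128

/* Simplified model: software selects a register via RAP, then accesses RDP. */
struct fake_chip {
    uint16_t rap;
    uint16_t csr[NUM_CSRS];
};

/* Userspace spinlock standing in for lp->lock. */
static atomic_flag chip_lock = ATOMIC_FLAG_INIT;
static bool lock_held;

static void chip_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&chip_lock))
        ;                                   /* spin until free */
    lock_held = true;
}

static void chip_lock_release(void)
{
    lock_held = false;
    atomic_flag_clear(&chip_lock);
}

/* Step 1 selects the register, step 2 touches it: the lock covers both. */
static uint16_t read_csr(struct fake_chip *c, uint16_t index)
{
    assert(lock_held && index < NUM_CSRS);
    c->rap = index;
    return c->csr[c->rap];
}

static void write_csr(struct fake_chip *c, uint16_t index, uint16_t val)
{
    assert(lock_held && index < NUM_CSRS);
    c->rap = index;
    if (c->rap == 0)
        c->csr[0] &= (uint16_t)~val;        /* model CSR0 status bits as write-1-to-clear */
    else
        c->csr[c->rap] = val;
}

/* The loop pattern proposed above: each CSR access gets its own short critical
 * section and the loop tests a local copy. `more` is how many times the fake
 * hardware re-raises RX status. Returns the number of passes made. */
static int poll_rx(struct fake_chip *c, int more)
{
    int passes = 0;
    uint16_t state;

    do {
        passes++;
        /* ... descriptor processing would go here, without the lock ... */

        chip_lock_acquire();
        write_csr(c, 0, RX_STATUS_MASK);    /* acknowledge RX status */
        chip_lock_release();

        if (more-- > 0)
            c->csr[0] |= 0x0400;            /* fake hardware: another frame arrived */

        chip_lock_acquire();
        state = read_csr(c, 0) & RX_STATUS_MASK;
        chip_lock_release();
    } while (state);

    return passes;
}
```

Starting with RX status pending and two re-raises, the loop makes three passes and leaves the simulated CSR0 clear. In the kernel itself, the poll routine runs in softirq context. As the message notes, the interrupt handler holds the same lock, so these critical sections would normally use spin_lock_irqsave()/spin_unlock_irqrestore() rather than plain spin_lock(). Otherwise an interrupt arriving on the same CPU while the poll path holds the lock would spin forever.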
Re: pcnet32 devices with incorrect trident vendor ID
On Thu, Jan 12, 2006 at 08:49:42PM +, Daniel Drake wrote:
> On the subject of pcnet32 and the invalid vendor ID, you may find this interesting: http://forums.gentoo.org/viewtopic-t-420013-highlight-trident.html
>
> The user saw the correct vendor ID (AMD) in 2.4, but when upgrading to 2.6, it changed to Trident. I guess this is still likely to be a hardware bug, but it demonstrates that the Linux PCI layer has something to do with it (even if it is just triggering it somehow).

Perhaps there is a significant difference in the pcnet32.c files between the two versions. I also remember that there is some PowerPC-specific code in there related to MAC address detection. There are certainly differences between the 2.4 and 2.6 versions of the driver; maybe something is broken in the newer one when run on PowerPC. I don't run Gentoo and have no idea how to get hold of the source of pcnet32.c for each of those two.

It does seem odd that only the pcnet32 device has its PCI ID change, but at the same time the driver is somehow recognizing it and loading at boot time, so the ID can't be wrong at that point. Does the ID get mangled as part of whatever makes the MAC addresses read incorrectly on your 2.6.14?

The 2.4 system shows all the cards overriding the MAC based on the PROM, which I believe is what the driver code should do on a PowerPC system. On 2.6 that appears to happen on only one of the cards. At least that device (PCI 01:01) appears to agree on what the MAC should be in both cases. Perhaps lspci being wrong is just a side effect of the real problem. Maybe the driver is broken and messing things up.

Len Sorensen
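One detail that may be relevant: AMD's PCI vendor ID is 0x1022 and Trident's is 0x1023, so the two differ only in bit 0. That fits a single-bit corruption somewhere in how the ID is read, though it does not say where. The sketch below checks this from userspace. It assumes the ID is taken from the device's sysfs `vendor` file (under /sys/bus/pci/devices/), which contains text such as `0x1022`. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PCI_VENDOR_AMD     0x1022u
#define PCI_VENDOR_TRIDENT 0x1023u

/* Parse the text of a sysfs "vendor" file, e.g. "0x1022\n".
 * Returns the ID, or -1 if the text is not a 16-bit hex number. */
static long parse_vendor(const char *s)
{
    assert(s != NULL);
    char *end;
    unsigned long v = strtoul(s, &end, 16);   /* base 16 accepts an optional 0x prefix */
    if (end == s || v > 0xffff)
        return -1;
    return (long)v;
}

/* Count the bits that differ between two 16-bit IDs. */
static int bits_differ(uint16_t a, uint16_t b)
{
    uint16_t x = a ^ b;
    int n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}
```

Comparing the value read on the affected machine against PCI_VENDOR_AMD with bits_differ() would show whether the mangling is always that one bit.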