[PATCH 0/3] cxgb3 driver update
Hi Jeff,

I'm submitting three more patches for inclusion in netdev#upstream.
These patches are built on top of the series I resent last night.
The patch numbering reflects the stacking.

Here is a brief description:
- Avoid false positives in the xgmac hang workaround.
- Properly set the CQ_ERR bit in RDMA CQ contexts.
- Update the CQ context operation timeout values.

Cheers,
Divy
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH 12/11] cxgb3 - remove false positive in xgmac workaround
From: Divy Le Ray [EMAIL PROTECTED]

Qualify the toggling of the xgmac TX enable with not receiving pause
frames: we might not be making forward progress simply because the
peer is sending lots of pause frames.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/common.h |    1 +
 drivers/net/cxgb3/xgmac.c  |    4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h
index ff867c2..3e5b0db 100644
--- a/drivers/net/cxgb3/common.h
+++ b/drivers/net/cxgb3/common.h
@@ -514,6 +514,7 @@ struct cmac {
 	u64 rx_mcnt;
 	unsigned int toggle_cnt;
 	unsigned int txen;
+	u64 rx_pause;
 	struct mac_stats stats;
 };
 
diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c
index 1d1c391..ff9e9dc 100644
--- a/drivers/net/cxgb3/xgmac.c
+++ b/drivers/net/cxgb3/xgmac.c
@@ -452,6 +452,7 @@ int t3_mac_enable(struct cmac *mac, int which)
 						A_XGM_TX_SPI4_SOP_EOP_CNT +
 						oft)));
 		mac->rx_mcnt = s->rx_frames;
+		mac->rx_pause = s->rx_pause;
 		mac->rx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
 						A_XGM_RX_SPI4_SOP_EOP_CNT +
 						oft)));
@@ -504,7 +505,7 @@ int t3b2_mac_watchdog_task(struct cmac *mac)
 	tx_xcnt = 1; /* By default tx_xcnt is making progress */
 	tx_tcnt = mac->tx_tcnt; /* If tx_mcnt is progressing ignore tx_tcnt */
 	rx_xcnt = 1; /* By default rx_xcnt is making progress */
-	if (tx_mcnt == mac->tx_mcnt) {
+	if (tx_mcnt == mac->tx_mcnt && mac->rx_pause == s->rx_pause) {
 		tx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
 						A_XGM_TX_SPI4_SOP_EOP_CNT +
 						mac->offset)));
@@ -560,6 +561,7 @@ out:
 	mac->tx_mcnt = s->tx_frames;
 	mac->rx_xcnt = rx_xcnt;
 	mac->rx_mcnt = s->rx_frames;
+	mac->rx_pause = s->rx_pause;
 	if (status == 1) {
 		t3_write_reg(adap, A_XGM_TX_CTRL + mac->offset, 0);
 		t3_read_reg(adap, A_XGM_TX_CTRL + mac->offset);  /* flush */
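The watchdog change boils down to one extra comparison. Below is a userspace sketch of the gating condition. It assumes a hypothetical struct mac_snapshot that holds only the two counters involved; the driver itself keeps them in struct cmac and struct mac_stats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two MAC statistics the patch compares;
 * the real driver keeps these in struct cmac and struct mac_stats. */
struct mac_snapshot {
	uint64_t tx_frames;
	uint64_t rx_pause;
};

/*
 * Return true when the watchdog should go on to inspect the TX SOP/EOP
 * counter, the step that may eventually toggle TX enable. Before the
 * patch only tx_frames was compared. Now a change in rx_pause also
 * counts as "the peer is legitimately holding us off".
 */
static bool tx_looks_stalled(const struct mac_snapshot *prev,
			     const struct mac_snapshot *cur)
{
	return cur->tx_frames == prev->tx_frames &&
	       cur->rx_pause == prev->rx_pause;
}
```

Because the snapshot is refreshed on every watchdog run, a flood of pause frames keeps rx_pause moving, and the toggle path is skipped.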
[PATCH 13/11] cxgb3 - Set the CQ_ERR bit in CQ contexts.
From: Divy Le Ray [EMAIL PROTECTED]

The cxgb3 driver incorrectly configures the HW CQ context for CQs
that use overflow avoidance, namely the RDMA control CQ. This results
in a bad DMA from the device to bus address 0.

The solution is to set the CQ_ERR bit in the context for these types
of CQs.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/sge_defs.h |    4 ++++
 drivers/net/cxgb3/t3_hw.c    |    3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/sge_defs.h b/drivers/net/cxgb3/sge_defs.h
index 514869e..29b6c80 100644
--- a/drivers/net/cxgb3/sge_defs.h
+++ b/drivers/net/cxgb3/sge_defs.h
@@ -106,6 +106,10 @@
 #define V_CQ_GEN(x) ((x) << S_CQ_GEN)
 #define F_CQ_GEN    V_CQ_GEN(1U)
 
+#define S_CQ_ERR    30
+#define V_CQ_ERR(x) ((x) << S_CQ_ERR)
+#define F_CQ_ERR    V_CQ_ERR(1U)
+
 #define S_CQ_OVERFLOW_MODE    31
 #define V_CQ_OVERFLOW_MODE(x) ((x) << S_CQ_OVERFLOW_MODE)
 #define F_CQ_OVERFLOW_MODE    V_CQ_OVERFLOW_MODE(1U)
diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index 538b254..9358959 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -2043,7 +2043,8 @@ int t3_sge_init_cqcntxt(struct adapter *adapter, unsigned int id, u64 base_addr,
 	base_addr >>= 32;
 	t3_write_reg(adapter, A_SG_CONTEXT_DATA2,
 		     V_CQ_BASE_HI((u32) base_addr) | V_CQ_RSPQ(rspq) |
-		     V_CQ_GEN(1) | V_CQ_OVERFLOW_MODE(ovfl_mode));
+		     V_CQ_GEN(1) | V_CQ_OVERFLOW_MODE(ovfl_mode) |
+		     V_CQ_ERR(ovfl_mode));
 	t3_write_reg(adapter, A_SG_CONTEXT_DATA3, V_CQ_CREDITS(credits) |
 		     V_CQ_CREDIT_THRES(credit_thres));
 	return t3_sge_write_context(adapter, id, F_CQ);
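The sge_defs.h macros follow the usual S_ (shift), V_ (value) and F_ (flag) convention, so the fix amounts to OR-ing one more field into the context word. Below is a userspace sketch of just the two top bits of the A_SG_CONTEXT_DATA2 word. cq_data2_flags() is a hypothetical helper, and it assumes ovfl_mode is 0 or 1, as its use as a flag suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the sge_defs.h hunk: S_* is the bit offset,
 * V_*(x) shifts a value into place, F_* is the single-bit flag. */
#define S_CQ_ERR    30
#define V_CQ_ERR(x) ((x) << S_CQ_ERR)
#define F_CQ_ERR    V_CQ_ERR(1U)

#define S_CQ_OVERFLOW_MODE    31
#define V_CQ_OVERFLOW_MODE(x) ((x) << S_CQ_OVERFLOW_MODE)
#define F_CQ_OVERFLOW_MODE    V_CQ_OVERFLOW_MODE(1U)

/* Hypothetical helper: only the two top bits of the context word are
 * modelled (base address, response queue and GEN fields are omitted).
 * ovfl_mode is unsigned so that shifting 1 into bit 31 is well defined. */
static uint32_t cq_data2_flags(unsigned int ovfl_mode)
{
	/* After the patch, CQ_ERR simply mirrors the overflow mode. */
	return V_CQ_OVERFLOW_MODE(ovfl_mode) | V_CQ_ERR(ovfl_mode);
}
```

For an overflow-avoidance CQ such as the RDMA control CQ, both bits end up set (0xC0000000). Ordinary CQs are unchanged.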
[PATCH 14/11] cxgb3 - CQ context operations time out too soon.
From: Divy Le Ray [EMAIL PROTECTED]

Currently, the driver only tries up to 5 times (5us) to get the results
of a CQ context operation. Testing has shown the chip can take as much
as 50us to return the response on SG_CONTEXT_CMD operations.
So raise the retry count to 100 to cover high loads.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/t3_hw.c |   19 +++++++++++--------
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index 9358959..8f6efdb 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -1867,6 +1867,8 @@ void t3_port_intr_clear(struct adapter *adapter, int idx)
 		phy->ops->intr_clear(phy);
 }
 
+#define SG_CONTEXT_CMD_ATTEMPTS 100
+
 /**
  * 	t3_sge_write_context - write an SGE context
  * 	@adapter: the adapter
@@ -1886,7 +1888,7 @@ static int t3_sge_write_context(struct adapter *adapter, unsigned int id,
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | type | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2072,7 +2074,7 @@ int t3_sge_enable_ecntxt(struct adapter *adapter, unsigned int id, int enable)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_EGRESS | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2096,7 +2098,7 @@ int t3_sge_disable_fl(struct adapter *adapter, unsigned int id)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_FREELIST | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2120,7 +2122,7 @@ int t3_sge_disable_rspcntxt(struct adapter *adapter, unsigned int id)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_RESPONSEQ | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2144,7 +2146,7 @@ int t3_sge_disable_cqcntxt(struct adapter *adapter, unsigned int id)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_CQ | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2169,7 +2171,7 @@ int t3_sge_cqcntxt_op(struct adapter *adapter, unsigned int id, unsigned int op,
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(op) | V_CONTEXT(id) | F_CQ);
 	if (t3_wait_op_done_val(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-				0, 5, 1, &val))
+				0, SG_CONTEXT_CMD_ATTEMPTS, 1, &val))
 		return -EIO;
 
 	if (op >= 2 && op < 7) {
@@ -2179,7 +2181,8 @@ int t3_sge_cqcntxt_op(struct adapter *adapter, unsigned int id, unsigned int op,
 		t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 			     V_CONTEXT_CMD_OPCODE(0) | F_CQ | V_CONTEXT(id));
 		if (t3_wait_op_done(adapter, A_SG_CONTEXT_CMD,
-				    F_CONTEXT_CMD_BUSY, 0, 5, 1))
+				    F_CONTEXT_CMD_BUSY, 0,
+				    SG_CONTEXT_CMD_ATTEMPTS, 1))
 			return -EIO;
 		return G_CQ_INDEX(t3_read_reg(adapter, A_SG_CONTEXT_DATA0));
 	}
@@ -2205,7 +2208,7 @@ static int t3_sge_read_context(unsigned int type, struct adapter *adapter,
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(0) | type | V_CONTEXT(id));
 	if (t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY, 0,
-			    5, 1))
+			    SG_CONTEXT_CMD_ATTEMPTS, 1))
 		return -EIO;
 	data[0] = t3_read_reg(adapter, A_SG_CONTEXT_DATA0);
 	data[1] = t3_read_reg(adapter, A_SG_CONTEXT_DATA1);
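All of these call sites share one pattern. They poll a register until the busy bit clears, and they give up after a fixed number of attempts, with a short delay between reads. Below is a userspace model of that loop, with a simulated device standing in for the hardware register. Several pieces are assumptions of this sketch: struct fake_dev, fake_reg_read() and wait_op_done() are hypothetical, the per-read delay is omitted, and -EAGAIN is simply the timeout code chosen here. With a 1us delay per read, 5 attempts cover roughly 5us, while 100 attempts cover the 50us observed in testing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define F_BUSY 0x1u	/* hypothetical stand-in for F_CONTEXT_CMD_BUSY */

/* Simulated device: reports busy until it has been polled
 * `busy_polls` times. Purely illustrative. */
struct fake_dev {
	unsigned int busy_polls;
	unsigned int polls;
};

static uint32_t fake_reg_read(struct fake_dev *dev)
{
	dev->polls++;
	return dev->polls <= dev->busy_polls ? F_BUSY : 0;
}

/*
 * Poll until (reg & mask) matches the requested polarity, giving up
 * after `attempts` reads. A driver would also sleep between reads;
 * that delay is left out of this model.
 */
static int wait_op_done(struct fake_dev *dev, uint32_t mask, int polarity,
			int attempts)
{
	while (1) {
		uint32_t val = fake_reg_read(dev);

		if (!!(val & mask) == polarity)
			return 0;
		if (--attempts == 0)
			return -EAGAIN;
	}
}
```

A device that only completes on its 50th read times out with 5 attempts but succeeds with 100.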
Re: [PATCH 0/3] cxgb3 driver update
On Wed, Aug 22, 2007 at 11:35:20PM -0700, Divy Le Ray wrote:
> Hi Jeff,
>
> I'm submitting three more patches for inclusion in netdev#upstream.
> These patches are built over the series I resent yesterday night.
> The patch numbering reflects the stacking.
>
> Here is a brief description:
> - avoid false positives in the xgmac hang workaround
> - Properly set the CQ_ERR bit in RDMA CQ contexts.
> - Update CQ context operations time out values

Speaking of cxgb3, could you explain what the hell is

static int do_term(struct t3cdev *dev, struct sk_buff *skb)
{
	unsigned int hwtid = ntohl(skb->priority) >> 8 & 0xfffff;

doing?  AFAIK, skb->priority is not net-endian...

Another odd place is

int t3_seeprom_write(struct adapter *adapter, u32 addr, u32 data)
{
	u16 val;
	int attempts = EEPROM_MAX_POLL;
	unsigned int base = adapter->params.pci.vpd_cap_addr;

	if ((addr >= EEPROMSIZE && addr != EEPROM_STAT_ADDR) || (addr & 3))
		return -EINVAL;

	pci_write_config_dword(adapter->pdev, base + PCI_VPD_DATA,
			       cpu_to_le32(data));

with callers like

int t3_seeprom_wp(struct adapter *adapter, int enable)
{
	return t3_seeprom_write(adapter, EEPROM_STAT_ADDR, enable ? 0xc : 0);

IOW, you really get little-endian values passed to pci_write_config_dword()
and it expects a host-endian as the last argument...
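The objection is about a double byte swap. A config-space write accessor takes a host-endian value and handles the bus byte order itself. Feeding it cpu_to_le32(data) therefore swaps the value a second time on a big-endian machine. On little-endian hosts cpu_to_le32() is a no-op, which hides the bug. Below is a userspace model with hypothetical helper names (bswap32, cpu_to_le32_on_be_host, config_write_to_bus, device_decode). It simulates the big-endian case by applying the swap explicitly, so it gives the same result on any host.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t bswap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* What cpu_to_le32() does on a big-endian host: a byte swap. */
static uint32_t cpu_to_le32_on_be_host(uint32_t x)
{
	return bswap32(x);
}

/* A correct host-endian config write: byte 0 on the (little-endian)
 * bus is the least significant byte of the host value. */
static void config_write_to_bus(uint32_t host_val, uint8_t bus[4])
{
	for (int i = 0; i < 4; i++)
		bus[i] = (uint8_t)(host_val >> (8 * i));
}

/* The device decodes the four bus bytes as a little-endian word. */
static uint32_t device_decode(const uint8_t bus[4])
{
	return (uint32_t)bus[0] | (uint32_t)bus[1] << 8 |
	       (uint32_t)bus[2] << 16 | (uint32_t)bus[3] << 24;
}
```

Passing 0xc straight through reaches the device as 0xc. Passing cpu_to_le32(0xc) on a big-endian host makes the device see 0x0c000000.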
Re: eHEA driver issues from net-2.6.24
On Thursday 23 August 2007 00:20, Andrew Theurer wrote:

David Miller wrote:

From: Andrew Theurer [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:03 -0500

Thanks for finally getting to test this, I thought nobody would test
this until it got merged into 2.6.24 :-/

Yes, sorry for the delay.

kernel BUG at include/linux/netdevice.h:318!
enter ? for help
[cf613e40] c03fe394 .net_rx_action+0x1b8/0x254
[cf613ef0] c0057b70 .__do_softirq+0xa8/0x164
[cf613f90] c0024438 .call_do_softirq+0x14/0x24
[c00b8ffbf9f0] c000bd30 .do_softirq+0x68/0xac
[c00b8ffbfa80] c0057cc4 .irq_exit+0x54/0x6c
[c00b8ffbfb00] c000c358 .do_IRQ+0x170/0x1ac
[c00b8ffbfb90] c0004780 hardware_interrupt_entry+0x18/0x98
--- Exception: 501 (Hardware Interrupt) at c0010bdc .cpu_idle+0x114/0x1e0
[c00b8ffbfe80] c0010bd0 .cpu_idle+0x108/0x1e0 (unreliable)
[c00b8ffbff00] c0026db0 .start_secondary+0x160/0x184
[c00b8ffbff90] c0008364 .start_secondary_prolog+0xc/0x10

I'm a little confused about whether port_napi_enable() is being called
when the device is initialized, but then again, this is all new to me
(should it be called in ehea_open?). I see it called on some reset
routines, but not on the first initialization.

This is similar to the problem that Arnaldo hit a few minutes ago in
the VIA Rhine driver.

You can only make a napi_enable() call when there has been a previous
napi_disable().

One way to fix this would be to forcefully napi_disable() on all the
per-port NAPI structs at the beginning of ehea_open(), which should set
things up to satisfy the pre-condition of the napi_enable() calls.

OK, I'll try this.

Let me fix this. I'll try to get it done today.

You'll need to audit the entire driver to make sure this invariant is
held properly.

Also, on this code, in ehea_sense_port_attr():

	/* Number of default QPs */
	if (use_mcs)
		port->num_def_qps = cb0->num_default_qps;
	else
		port->num_def_qps = 1;

When using napi, since we have multi-queue napi support now, wouldn't we
want to use all the default qps instead of 1?

I don't know how this hardware works, you tell me :-)

Heh, I don't know it well, either. Maybe Jan-Bernd can chime in.

We'd like to keep the possibility to switch back to a single queue for
now. However, we could activate multi-queue support as the default now.
I'll include this in the patch.

Thanks for your help,
-Andrew
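The invariant described in this thread fits in a few lines. What follows is a simplified userspace model, not the kernel's NAPI code. struct model_napi and the model_* functions are hypothetical. The single `sched` flag only mirrors the idea that napi_disable() takes ownership of the context and napi_enable() hands it back. The model starts from a context that has not been disabled yet, which is the situation the suggested ehea_open() fix addresses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context: only an ownership flag. */
struct model_napi {
	bool sched;	/* true while the context is disabled/owned */
};

/* Returns false where the real code would trip its sanity check:
 * enabling a context that was never disabled. */
static bool model_napi_enable(struct model_napi *n)
{
	if (!n->sched)
		return false;
	n->sched = false;
	return true;
}

/* Returns false where the real code would keep waiting forever,
 * because the context is already owned (e.g. disabled twice). */
static bool model_napi_disable(struct model_napi *n)
{
	if (n->sched)
		return false;
	n->sched = true;
	return true;
}
```

Calling napi_disable() first at open time, as suggested above, turns the failing enable into a legal one. A second disable without an enable in between is the case that would block.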
UDPv4 port allocation problem
Hello,

I noticed that the kernel may allocate to an application the same UDP
port that another application used and closed immediately before. This
means that applications that do not specify an exact port, and rely on
the kernel to allocate one for them, might see traffic originally meant
for another application.

Imagine that two applications want to resolve a name in DNS at about
the same time. The following happens:

* the first app sends out the DNS query, then closes the socket without
  waiting for an answer (e.g. it got interrupted by Ctrl+C)
* the second app opens a UDP socket, gets the same port originally
  assigned to app #1, and sends out its DNS query
* the DNS server responds, and the response goes to app #2

DNS might not be the perfect example, but you get the idea. Applications
do not expect to receive data on newly opened sockets, not to mention
the security implications.

TCP, on the other hand, increases the allocated port number for each new
socket. The same behaviour for UDP would add a certain amount of time
before a port is reused, which decreases this risk.

Is the current behaviour intended?

Regards,
Laszlo Attila Toth
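The difference between the two behaviours can be shown with a toy allocator. This is not the kernel's UDP port-selection code. The port range, struct port_table and both policies are hypothetical, chosen only to contrast "reuse the lowest free port" with the TCP-like "continue after the last port handed out" behaviour the report asks for.

```c
#include <assert.h>
#include <stdbool.h>

#define PORT_LO 1024
#define PORT_HI 1031	/* tiny hypothetical ephemeral range */
#define NPORTS (PORT_HI - PORT_LO + 1)

struct port_table {
	bool used[NPORTS];
	int cursor;	/* next candidate offset for the rotating policy */
};

/* Policy 1: always hand out the lowest free port, so a port closed a
 * moment ago is the first to be reused. Returns -1 when exhausted. */
static int alloc_lowest(struct port_table *t)
{
	for (int i = 0; i < NPORTS; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return PORT_LO + i;
		}
	}
	return -1;
}

/* Policy 2: resume searching after the last port handed out, so a
 * just-released port is reused only once the cursor wraps around. */
static int alloc_rotating(struct port_table *t)
{
	for (int n = 0; n < NPORTS; n++) {
		int i = (t->cursor + n) % NPORTS;

		if (!t->used[i]) {
			t->used[i] = true;
			t->cursor = (i + 1) % NPORTS;
			return PORT_LO + i;
		}
	}
	return -1;
}

static void release_port(struct port_table *t, int port)
{
	t->used[port - PORT_LO] = false;
}
```

Under the first policy, app #2 receives the port that app #1 just closed. Under the second policy, it receives the next port in the range.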
Re: eHEA driver issues from net-2.6.24
From: Jan-Bernd Themann [EMAIL PROTECTED]
Date: Thu, 23 Aug 2007 08:55:29 +0200

> We'd like to keep the possibility to switch back to a single queue
> for now.

Please do not do this, we already have way too much configurability
out there.

If you have the physical hardware queues enabled, use multiqueue
napi support.

If you add a knob to use or not use multi-napi, this makes life more
miserable for your users and your driver more complicated and harder
to maintain.
Re: eHEA driver issues from net-2.6.24
Hi David,

On Thursday 23 August 2007 10:17, David Miller wrote:
> From: Jan-Bernd Themann [EMAIL PROTECTED]
> Date: Thu, 23 Aug 2007 08:55:29 +0200
>
> > We'd like to keep the possibility to switch back to a single queue
> > for now.
>
> Please do not do this, we already have way too much configurability
> out there.

OK, we decided to remove the switch for kernel 2.6.24.

Regards,
Jan-Bernd
[PATCH -mm] ath5k: remove sysctl(2) support
sysctl(2) is supported but frozen, so drop the binary .ctl_name numbers
and register the ath5k tables by .procname only.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 drivers/net/wireless/ath5k_base.c |   21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

--- a/drivers/net/wireless/ath5k_base.c
+++ b/drivers/net/wireless/ath5k_base.c
@@ -2438,21 +2438,12 @@ static struct pci_driver ath_pci_drv_id = {
 	.resume		= ath_pci_resume,
 };
 
-/*
- * Static (i.e. global) sysctls.  Note that the hal sysctls
- * are located under ours by sharing the setting for DEV_ATH.
- */
-enum {
-	DEV_ATH		= 9,			/* XXX known by hal */
-};
-
 static int mincalibrate = 1;
 static int maxcalibrate = INT_MAX / 1000;
-#define	CTL_AUTO	-2	/* cannot be CTL_ANY or CTL_NONE */
 
 static ctl_table ath_static_sysctls[] = {
 #if AR_DEBUG
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "debug",
 	  .mode		= 0644,
 	  .data		= &ath_debug,
@@ -2460,28 +2451,28 @@ static ctl_table ath_static_sysctls[] = {
 	  .proc_handler	= proc_dointvec
 	},
 #endif
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "countrycode",
 	  .mode		= 0444,
 	  .data		= &countrycode,
 	  .maxlen	= sizeof(countrycode),
 	  .proc_handler	= proc_dointvec
 	},
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "outdoor",
 	  .mode		= 0444,
 	  .data		= &outdoor,
 	  .maxlen	= sizeof(outdoor),
 	  .proc_handler	= proc_dointvec
 	},
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "xchanmode",
 	  .mode		= 0444,
 	  .data		= &xchanmode,
 	  .maxlen	= sizeof(xchanmode),
 	  .proc_handler	= proc_dointvec
 	},
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "calibrate",
 	  .mode		= 0644,
 	  .data		= &ath_calinterval,
@@ -2493,7 +2484,7 @@ static ctl_table ath_static_sysctls[] = {
 	{ 0 }
 };
 static ctl_table ath_ath_table[] = {
-	{ .ctl_name	= DEV_ATH,
+	{
 	  .procname	= "ath",
 	  .mode		= 0555,
 	  .child	= ath_static_sysctls
[PATCH (take 2)] request_irq fix DEBUG_SHIRQ handling Re: 2.6.23-rc2-mm1: rtl8139 inconsistent lock state
Andrew Morton pointed out that my changelog was unusable. Sorry!
Here is a second try with the changelog and kernel version changed.

Regards,
Jarek P.

(take 2)

Subject: request_irq() - fix DEBUG_SHIRQ handling

Mariusz Kozlowski reported lockdep's warning:

=================================
[ INFO: inconsistent lock state ]
2.6.23-rc2-mm1 #7
---------------------------------
inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&tp->lock){+...}, at: [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
{in-hardirq-W} state was registered at:
  [<c0138eeb>] __lock_acquire+0x949/0x11ac
  [<c01397e7>] lock_acquire+0x99/0xb2
  [<c0452ff3>] _spin_lock+0x35/0x42
  [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
  [<c0147a5d>] handle_IRQ_event+0x28/0x59
  [<c01493ca>] handle_level_irq+0xad/0x10b
  [<c0105a13>] do_IRQ+0x93/0xd0
  [<c010441e>] common_interrupt+0x2e/0x34
...
other info that might help us debug this:
1 lock held by ifconfig/5492:
 #0:  (rtnl_mutex){--..}, at: [<c0451778>] mutex_lock+0x1c/0x1f

stack backtrace:
...
 [<c0452ff3>] _spin_lock+0x35/0x42
 [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
 [<c01480fd>] free_irq+0x11b/0x146
 [<de871d59>] rtl8139_close+0x8a/0x14a [8139too]
 [<c03bde63>] dev_close+0x57/0x74
...

This shows that a driver's irq handler was running both in hard
interrupt context and in process context with irqs enabled. The latter
happened during the free_irq() call and was possible only with
CONFIG_DEBUG_SHIRQ enabled. This was fixed by another patch.

But a similar problem is possible with request_irq(): any locks taken
from the irq handler could be vulnerable, especially with soft
interrupts. This patch fixes it by disabling local interrupts while the
handler runs. (It seems that disabling softirqs should be enough, but
that needs more checking for possible races or other special cases.)

This patch is recommended for all stable versions since 2.6.21, too.
Reported-by: Mariusz Kozlowski [EMAIL PROTECTED]
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---

diff -Nurp 2.6.23-rc3-git6-/kernel/irq/manage.c 2.6.23-rc3-git6/kernel/irq/manage.c
--- 2.6.23-rc3-git6-/kernel/irq/manage.c	2007-08-23 10:11:35.000000000 +0200
+++ 2.6.23-rc3-git6/kernel/irq/manage.c	2007-08-23 10:16:29.000000000 +0200
@@ -555,14 +555,11 @@ int request_irq(unsigned int irq, irq_ha
 		 * We do this before actually registering it, to make sure that
 		 * a 'real' IRQ doesn't run in parallel with our fake
 		 */
-		if (irqflags & IRQF_DISABLED) {
-			unsigned long flags;
+		unsigned long flags;
 
-			local_irq_save(flags);
-			handler(irq, dev_id);
-			local_irq_restore(flags);
-		} else
-			handler(irq, dev_id);
+		local_irq_save(flags);
+		handler(irq, dev_id);
+		local_irq_restore(flags);
 	}
 #endif
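The reason for disabling local interrupts around the debug call can be illustrated in userspace. In this analogy a signal plays the role of the real interrupt, and sigprocmask() plays the role of local_irq_save()/local_irq_restore(). It is only an analogy under those assumptions. fake_irq_handler(), on_sigusr1(), call_handler_masked() and install_handler() are hypothetical names, and the `in_handler` flag stands in for the lock the handler would hold.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t in_handler;	/* "lock" held by the handler */
static volatile sig_atomic_t nested;	/* a signal arrived mid-handler */
static volatile sig_atomic_t delivered;

/* Stand-in for the driver's IRQ handler called from process context;
 * raise() simulates a real interrupt arriving in the middle of it. */
static void fake_irq_handler(void)
{
	in_handler = 1;
	raise(SIGUSR1);
	in_handler = 0;
}

/* Stand-in for the hardware IRQ entering the same handler. */
static void on_sigusr1(int sig)
{
	(void)sig;
	if (in_handler)
		nested = 1;	/* with a real spinlock this would deadlock */
	delivered = 1;
}

static void install_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);
}

/* Analogue of the patched debug call: block the "interrupt" around the
 * handler, as local_irq_save()/local_irq_restore() do in the kernel. */
static void call_handler_masked(void)
{
	sigset_t block, old;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &old);
	fake_irq_handler();
	sigprocmask(SIG_SETMASK, &old, NULL);	/* pending signal lands here */
}
```

When unmasked, the simulated interrupt runs while the "lock" is held. When masked, it is held pending and delivered only after the handler has finished.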
[PATCH 1/2] E1000: Fix ifdown hang in git-2.6.24
Calling napi_disable() twice hangs ifdown of the device. e1000_down()
is the common place to call napi_disable(), so drop the extra call
from e1000_close().

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 e1000_main.c |    4 ----
 1 files changed, 4 deletions(-)

diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c
--- org/drivers/net/e1000/e1000_main.c	2007-08-23 13:32:16.000000000 +0530
+++ new/drivers/net/e1000/e1000_main.c	2007-08-23 13:32:34.000000000 +0530
@@ -1477,10 +1477,6 @@ e1000_close(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-#ifdef CONFIG_E1000_NAPI
-	napi_disable(&adapter->napi);
-#endif
-
 	WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
 	e1000_down(adapter);
 	e1000_power_down_phy(adapter);
[PATCH 2/2] [RFC] E1000: Fix hang in netdev_wait_allrefs()
After applying patch 1, I started getting waiting for count messages
when doing ifdown. I'm not sure this is the right fix, since the count
was already showing as -1 in that message, but this patch fixes the
problem.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 e1000_main.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -ruNp new/drivers/net/e1000/e1000_main.c new2/drivers/net/e1000/e1000_main.c
--- new/drivers/net/e1000/e1000_main.c	2007-08-23 13:32:34.000000000 +0530
+++ new2/drivers/net/e1000/e1000_main.c	2007-08-23 14:28:12.000000000 +0530
@@ -1219,12 +1219,13 @@ e1000_remove(struct pci_dev *pdev)
 	 * would have already happened in close and is redundant. */
 	e1000_release_hw_control(adapter);
 
-	unregister_netdev(netdev);
 #ifdef CONFIG_E1000_NAPI
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		dev_put(&adapter->polling_netdev[i]);
 #endif
 
+	unregister_netdev(netdev);
+
 	if (!e1000_check_phy_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
Re: [PATCH 8/9] define global BIT macro
On Sat, Aug 18, 2007 at 11:44:12AM +0200, Jiri Slaby wrote:

> define global BIT macro
>
> move all local BIT defines to the new globally define macro.
>
> Signed-off-by: Jiri Slaby [EMAIL PROTECTED]

Acked-by: Ralf Baechle [EMAIL PROTECTED]

for the MACE ethernet and MIPS bits.

  Ralf
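A global BIT() helper turns a bit number into a mask, replacing per-driver copies of the same one-liner. Below is a userspace version that assumes the `(1UL << (nr))` form. The MY_FLAG_* names and flag_is_set() are hypothetical.

```c
#include <assert.h>

/* Assumed form of the helper; the unsigned long constant keeps bit 31
 * well defined (1 << 31 on a 32-bit int would overflow). */
#define BIT(nr) (1UL << (nr))

/* Hypothetical register flags defined with BIT() instead of a local
 * per-driver macro. */
#define MY_FLAG_ENABLE BIT(0)
#define MY_FLAG_RESET  BIT(3)

static int flag_is_set(unsigned long reg, unsigned long flag)
{
	return (reg & flag) != 0;
}
```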
[PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses
Following Neil's comments, I have tried to correct the mistakes of my
first submission. Thank you for these comments, Neil.

This is a small part of the missing pieces of IPv6 support for the
server. It deals with the ip_map caching code.

It changes the ip_map structure to be able to store INET6 addresses.
It also adds the changes in address hashing, and mapping to test it
with INET addresses.

Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]
---

 fs/nfsd/export.c               |   10 ++-
 fs/nfsd/nfsctl.c               |   21 ++-
 include/linux/sunrpc/svcauth.h |    4 -
 include/net/ipv6.h             |   17 +
 net/sunrpc/svcauth_unix.c      |  121 -
 5 files changed, 129 insertions(+), 44 deletions(-)

diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/export.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c
--- linux-2.6.23-rc3/fs/nfsd/export.c	2007-08-23 13:18:16.000000000 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c	2007-08-23 13:51:08.000000000 +0200
@@ -35,6 +35,7 @@
 #include <linux/lockd/bind.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>
 
 #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
 
@@ -1559,6 +1560,7 @@ exp_addclient(struct nfsctl_client *ncp)
 {
 	struct auth_domain	*dom;
 	int			i, err;
+	struct in6_addr addr6;
 
 	/* First, consistency check. */
 	err = -EINVAL;
@@ -1577,9 +1579,11 @@ exp_addclient(struct nfsctl_client *ncp)
 		goto out_unlock;
 
 	/* Insert client into hashtable. */
-	for (i = 0; i < ncp->cl_naddr; i++)
-		auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+	for (i = 0; i < ncp->cl_naddr; i++) {
+		/* Mapping address */
+		ipv6_addr_map(ncp->cl_addrlist[i], addr6);
+		auth_unix_add_addr(addr6, dom);
+	}
 	auth_unix_forget_old(dom);
 	auth_domain_put(dom);
 
diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/nfsctl.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c
--- linux-2.6.23-rc3/fs/nfsd/nfsctl.c	2007-08-23 13:18:16.000000000 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c	2007-08-23 13:25:28.000000000 +0200
@@ -222,7 +222,7 @@ static ssize_t write_getfs(struct file *
 	struct auth_domain *clp;
 	int err = 0;
 	struct knfsd_fh *res;
-
+	struct in6_addr in6;
 	if (size < sizeof(*data))
 		return -EINVAL;
 	data = (struct nfsctl_fsparm*)buf;
@@ -236,7 +236,14 @@ static ssize_t write_getfs(struct file *
 	res = (struct knfsd_fh*)buf;
 
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+	/* IPv6 address mapping */
+	in6.s6_addr32[0] = 0;
+	in6.s6_addr32[1] = 0;
+	in6.s6_addr32[2] = htonl(0xffff);
+	in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;
+
+	if (!(clp = auth_unix_lookup(in6)))
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -253,6 +260,7 @@ static ssize_t write_getfd(struct file *
 {
 	struct nfsctl_fdparm *data;
 	struct sockaddr_in *sin;
+	struct in6_addr in6;
 	struct auth_domain *clp;
 	int err = 0;
 	struct knfsd_fh fh;
@@ -271,7 +279,14 @@ static ssize_t write_getfd(struct file *
 	res = buf;
 	sin = (struct sockaddr_in *)&data->gd_addr;
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+	/* IPv6 address mapping */
+	in6.s6_addr32[0] = 0;
+	in6.s6_addr32[1] = 0;
+	in6.s6_addr32[2] = htonl(0xffff);
+	in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;
+
+	if (!(clp = auth_unix_lookup(in6)))
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff -p -u -r -N linux-2.6.23-rc3/include/linux/sunrpc/svcauth.h linux-2.6.23-rc3-IPv6-ipmap-cache/include/linux/sunrpc/svcauth.h
--- linux-2.6.23-rc3/include/linux/sunrpc/svcauth.h	2007-08-23 13:18:21.000000000 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/include/linux/sunrpc/svcauth.h	2007-08-23 13:25:28.000000000 +0200
@@ -120,10 +120,10 @@ extern void	svc_auth_unregister(rpc_auth
 
 extern struct auth_domain *unix_domain_find(char *name);
 extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
+extern int auth_unix_add_addr(struct in6_addr addr, struct auth_domain *dom);
 extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
 extern struct auth_domain *auth_domain_find(char *name);
-extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
+extern struct auth_domain *auth_unix_lookup(struct in6_addr addr);
 extern int auth_unix_forget_old(struct auth_domain *dom);
 extern void svcauth_unix_purge(void);
 extern void svcauth_unix_info_release(void *);
diff -p -u -r -N linux-2.6.23-rc3/include/net/ipv6.h linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h
--- linux-2.6.23-rc3/include/net/ipv6.h	2007-08-23 13:18:23.000000000 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h	2007-08-23
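The open-coded s6_addr32[] assignments in nfsctl.c build an IPv4-mapped IPv6 address (::ffff:a.b.c.d). Below is a userspace sketch of the same mapping that uses only the s6_addr byte array from <netinet/in.h>. map_v4_to_v6() is a hypothetical helper. It is not the ipv6_addr_map() added by the patch, whose definition is cut off in this excerpt.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Build ::ffff:a.b.c.d from an IPv4 address: ten zero bytes, two 0xff
 * bytes, then the four IPv4 bytes (s_addr is already in network order). */
static void map_v4_to_v6(struct in_addr v4, struct in6_addr *v6)
{
	memset(v6->s6_addr, 0, 10);
	v6->s6_addr[10] = 0xff;
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4.s_addr, 4);
}
```

Because every IPv4 client maps to a distinct ::ffff: address, a single in6_addr-keyed cache can hold both address families.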
[PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG
The two different wireless code bases both define macros to ease
printing MAC addresses:

	printk(KERN_INFO "MAC address is " MAC_FMT "\n", MAC_ARG(addr));

This patch moves those macros to if_ether.h and uses them all over the
tree.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]

---
 drivers/net/3c505.c         |    4 +---
 drivers/net/8139cp.c        |   11 ++-
 drivers/net/82596.c         |    4 ++--
 drivers/net/a2065.c         |    4 +---
 drivers/net/acenic.c        |    6 ++
 drivers/net/ariadne.c       |    4 +---
 drivers/net/dl2k.c          |    6 ++
 drivers/net/forcedeth.c     |   11 ---
 drivers/net/hp100.c         |    5 ++---
 drivers/net/hydra.c         |    6 ++
 drivers/net/ibmlana.c       |    6 ++
 drivers/net/ioc3-eth.c      |    5 ++---
 drivers/net/lguest_net.c    |    3 +--
 drivers/net/lib82596.c      |    4 ++--
 drivers/net/macb.c          |    6 ++
 drivers/net/meth.c          |    4 +---
 drivers/net/mv643xx_eth.c   |    5 ++---
 drivers/net/mvme147.c       |    7 ++-
 drivers/net/myri_sbus.c     |    6 ++
 drivers/net/ns83820.c       |    9 +++--
 drivers/net/pasemi_mac.c    |    5 ++---
 drivers/net/ps3_gelic_net.c |    6 ++
 drivers/net/qla3xxx.c       |    6 ++
 drivers/net/rionet.c        |    5 ++---
 drivers/net/s2io.c          |   10 ++
 drivers/net/skge.c          |    6 ++
 drivers/net/sky2.c          |    6 ++
 drivers/net/tsi108_eth.c    |    6 ++
 drivers/net/zorro8390.c     |    6 ++
 include/linux/etherdevice.h |    1 +
 include/linux/if_ether.h    |    5 +
 include/net/ieee80211.h     |    5 -
 include/net/mac80211.h      |    4
 33 files changed, 62 insertions(+), 125 deletions(-)

--- netdev-2.6.orig/drivers/net/3c505.c	2007-08-22 20:33:10.921906163 +0200
+++ netdev-2.6/drivers/net/3c505.c	2007-08-22 20:40:01.011906163 +0200
@@ -1540,9 +1540,7 @@ static int __init elplus_setup(struct ne
 	 */
 	printk(KERN_INFO "%s: 3c505 at %#lx, irq %d, dma %d, ",
 	       dev->name, dev->base_addr, dev->irq, dev->dma);
-	printk("addr %02x:%02x:%02x:%02x:%02x:%02x, ",
-	       dev->dev_addr[0], dev->dev_addr[1], dev->dev_addr[2],
-	       dev->dev_addr[3], dev->dev_addr[4], dev->dev_addr[5]);
+	printk("addr " MAC_FMT ", ", MAC_ARG(dev->dev_addr));
 
 	/*
 	 * read more information from the adapter
--- netdev-2.6.orig/drivers/net/8139cp.c	2007-08-22 20:33:10.931906163 +0200
+++ netdev-2.6/drivers/net/8139cp.c	2007-08-22 20:40:01.011906163 +0200
@@ -1961,15 +1961,8 @@ static int cp_init_one (struct pci_dev *
 	if (rc)
 		goto err_out_iomap;
 
-	printk (KERN_INFO "%s: RTL-8139C+ at 0x%lx, "
-		"%02x:%02x:%02x:%02x:%02x:%02x, "
-		"IRQ %d\n",
-		dev->name,
-		dev->base_addr,
-		dev->dev_addr[0], dev->dev_addr[1],
-		dev->dev_addr[2], dev->dev_addr[3],
-		dev->dev_addr[4], dev->dev_addr[5],
-		dev->irq);
+	printk (KERN_INFO "%s: RTL-8139C+ at 0x%lx, " MAC_FMT ", IRQ %d\n",
+		dev->name, dev->base_addr, MAC_ARG(dev->dev_addr), dev->irq);
 
 	pci_set_drvdata(pdev, dev);
 
--- netdev-2.6.orig/drivers/net/82596.c	2007-08-22 20:33:10.941906163 +0200
+++ netdev-2.6/drivers/net/82596.c	2007-08-22 20:40:01.021906163 +0200
@@ -1561,8 +1561,8 @@ static void set_multicast_list(struct ne
 		for (dmi = dev->mc_list; cnt && dmi != NULL; dmi = dmi->next, cnt--, cp += 6) {
 			memcpy(cp, dmi->dmi_addr, 6);
 			if (i596_debug > 1)
-				DEB(DEB_MULTI,printk(KERN_INFO "%s: Adding address %02x:%02x:%02x:%02x:%02x:%02x\n",
-						dev->name, cp[0],cp[1],cp[2],cp[3],cp[4],cp[5]));
+				DEB(DEB_MULTI,printk(KERN_INFO "%s: Adding address " MAC_FMT "\n",
+						dev->name, MAC_ARG(cp)));
 		}
 		i596_add_cmd(dev, &cmd->cmd);
 	}
--- netdev-2.6.orig/drivers/net/a2065.c	2007-08-22 20:33:10.991906163 +0200
+++ netdev-2.6/drivers/net/a2065.c	2007-08-22 20:40:01.031906163 +0200
@@ -802,9 +802,7 @@ static int __devinit a2065_init_one(stru
 	zorro_set_drvdata(z, dev);
 
 	printk(KERN_INFO "%s: A2065 at 0x%08lx, Ethernet Address "
-	       "%02x:%02x:%02x:%02x:%02x:%02x\n", dev->name, board,
-	       dev->dev_addr[0], dev->dev_addr[1], dev->dev_addr[2],
-	       dev->dev_addr[3], dev->dev_addr[4], dev->dev_addr[5]);
+	       MAC_FMT "\n", dev->name, board, MAC_ARG(dev->dev_addr));
 
 	return 0;
 }
--- netdev-2.6.orig/drivers/net/acenic.c	2007-08-22 20:33:10.991906163 +0200
+++ netdev-2.6/drivers/net/acenic.c	2007-08-22 20:40:01.031906163 +0200
@@ -1013,10 +1013,6 @@ static int __devinit ace_init(struct net
 	writel(mac1, &regs->MacAddrHi);
 	writel(mac2, &regs->MacAddrLo);
 
-
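MAC_FMT is a string literal and MAC_ARG() expands to six comma-separated byte arguments, so the pair fits into any printf-style format through literal concatenation. Below is a userspace sketch that assumes the common definition of the two macros; the if_ether.h hunk is not part of the quoted excerpt. format_mac() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed userspace rendition of the two macros. */
#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
#define MAC_ARG(x) ((const unsigned char *)(x))[0], ((const unsigned char *)(x))[1], \
		   ((const unsigned char *)(x))[2], ((const unsigned char *)(x))[3], \
		   ((const unsigned char *)(x))[4], ((const unsigned char *)(x))[5]

/* Format a 6-byte hardware address; buf needs at least 18 bytes. */
static int format_mac(char *buf, size_t len, const unsigned char *addr)
{
	return snprintf(buf, len, MAC_FMT, MAC_ARG(addr));
}
```

In a driver, the same pair is used inline, e.g. `printk(KERN_INFO "%s: " MAC_FMT "\n", dev->name, MAC_ARG(dev->dev_addr));`. Because MAC_ARG() brings its own parentheses, any enclosing macro call still needs its closing parenthesis after it.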
Re: [RFC IPROUTE]: Add flow classifier support
David Miller wrote:
> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Wed, 22 Aug 2007 10:46:15 -0700
>
> > This patch is on hold since the netlink changes haven't made it
> > upstream yet.
>
> I don't have the kernel side in my queue either, perhaps I lost it
> or I didn't see it when it was sent out.
>
> Patrick?

I didn't send it since I wasn't completely happy with it. Not sure if I
ever finished it, I'll look into it :)
Re: [PATCH 1/1] net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()
Benjamin Thery wrote:
> From: [EMAIL PROTECTED]
> Subject: net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()
>
> This patch fixes a crash that may occur when the routine dev_mc_sync()
> deletes an address from the list it is currently going through. It
> saves the pointer to the next element before deleting the current one.
> The problem may also exist in dev_mc_unsync().
>
> Signed-off-by: Benjamin Thery [EMAIL PROTECTED]

Looks good, thanks Benjamin.

Acked-by: Patrick McHardy [EMAIL PROTECTED]
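The fix follows the standard rule for deleting from a list while walking it: fetch the successor before the current element can be unlinked and freed. Below is a userspace sketch of the pattern on a hypothetical singly linked list. struct addr_node, prune_unused() and push() are illustrative names, not the net/core code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical address list standing in for the one dev_mc_sync() walks. */
struct addr_node {
	int addr;
	int users;
	struct addr_node *next;
};

/*
 * Remove every node with users == 0. The successor is read before the
 * current node may be freed; reading cur->next after free(cur) would be
 * a use-after-free, the kind of bug the patch fixes.
 */
static void prune_unused(struct addr_node **head)
{
	struct addr_node **link = head;
	struct addr_node *cur, *next;

	for (cur = *head; cur != NULL; cur = next) {
		next = cur->next;	/* saved before any deletion */
		if (cur->users == 0) {
			*link = next;
			free(cur);
		} else {
			link = &cur->next;
		}
	}
}

static struct addr_node *push(struct addr_node *head, int addr, int users)
{
	struct addr_node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->addr = addr;
	n->users = users;
	n->next = head;
	return n;
}
```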
[PATCH] [-MM, FIX] e1000e: incorporate napi_struct changes from net-2.6.24.git
This incorporates the new napi_struct changes into e1000e. Included bugfix for ifdown hang from Krishna Kumar for e1000. Signed-off-by: Auke Kok [EMAIL PROTECTED] --- drivers/net/e1000e/e1000.h |2 ++ drivers/net/e1000e/netdev.c | 35 --- 2 files changed, 18 insertions(+), 19 deletions(-) diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h index e3cd877..ea6a9fe 100644 --- a/drivers/net/e1000e/e1000.h +++ b/drivers/net/e1000e/e1000.h @@ -196,6 +196,8 @@ struct e1000_adapter { struct e1000_ring *tx_ring /* One per active queue */ cacheline_aligned_in_smp; + struct napi_struct napi; + unsigned long tx_queue_len; unsigned int restart_queue; u32 txd_cmd; diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c index 8ebe238..0e35d0a 100644 --- a/drivers/net/e1000e/netdev.c +++ b/drivers/net/e1000e/netdev.c @@ -1149,12 +1149,12 @@ static irqreturn_t e1000_intr_msi(int irq, void *data) mod_timer(adapter-watchdog_timer, jiffies + 1); } - if (netif_rx_schedule_prep(netdev)) { + if (netif_rx_schedule_prep(netdev, adapter-napi)) { adapter-total_tx_bytes = 0; adapter-total_tx_packets = 0; adapter-total_rx_bytes = 0; adapter-total_rx_packets = 0; - __netif_rx_schedule(netdev); + __netif_rx_schedule(netdev, adapter-napi); } else { atomic_dec(adapter-irq_sem); } @@ -1212,12 +1212,12 @@ static irqreturn_t e1000_intr(int irq, void *data) mod_timer(adapter-watchdog_timer, jiffies + 1); } - if (netif_rx_schedule_prep(netdev)) { + if (netif_rx_schedule_prep(netdev, adapter-napi)) { adapter-total_tx_bytes = 0; adapter-total_tx_packets = 0; adapter-total_rx_bytes = 0; adapter-total_rx_packets = 0; - __netif_rx_schedule(netdev); + __netif_rx_schedule(netdev, adapter-napi); } else { atomic_dec(adapter-irq_sem); } @@ -1663,10 +1663,10 @@ set_itr_now: * e1000_clean - NAPI Rx polling callback * @adapter: board private structure **/ -static int e1000_clean(struct net_device *poll_dev, int *budget) +static int e1000_clean(struct napi_struct *napi, int budget) { - 
struct e1000_adapter *adapter; - int work_to_do = min(*budget, poll_dev-quota); + struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi); + struct net_device *poll_dev = adapter-netdev; int tx_cleaned = 0, work_done = 0; /* Must NOT use netdev_priv macro here. */ @@ -1685,17 +1685,15 @@ static int e1000_clean(struct net_device *poll_dev, int *budget) spin_unlock(adapter-tx_queue_lock); } - adapter-clean_rx(adapter, work_done, work_to_do); - *budget -= work_done; - poll_dev-quota -= work_done; + adapter-clean_rx(adapter, work_done, budget); /* If no Tx and not enough Rx work done, exit the polling mode */ - if ((!tx_cleaned (work_done == 0)) || + if ((tx_cleaned (work_done budget)) || !netif_running(poll_dev)) { quit_polling: if (adapter-itr_setting 3) e1000_set_itr(adapter); - netif_rx_complete(poll_dev); + netif_rx_complete(poll_dev, napi); if (test_bit(__E1000_DOWN, adapter-state)) atomic_dec(adapter-irq_sem); else @@ -1703,7 +1701,7 @@ quit_polling: return 0; } - return 1; + return work_done; } static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid) @@ -2441,7 +2439,7 @@ int e1000e_up(struct e1000_adapter *adapter) clear_bit(__E1000_DOWN, adapter-state); - netif_poll_enable(adapter-netdev); + napi_enable(adapter-napi); e1000_irq_enable(adapter); /* fire a link change interrupt to start the watchdog */ @@ -2474,7 +2472,7 @@ void e1000e_down(struct e1000_adapter *adapter) e1e_flush(); msleep(10); - netif_poll_disable(netdev); + napi_disable(adapter-napi); e1000_irq_disable(adapter); del_timer_sync(adapter-watchdog_timer); @@ -2607,7 +2605,7 @@ static int e1000_open(struct net_device *netdev) /* From here on the code is the same as e1000e_up() */ clear_bit(__E1000_DOWN, adapter-state); - netif_poll_enable(netdev); + napi_enable(adapter-napi); e1000_irq_enable(adapter); @@ -4102,8 +4100,7 @@ static int __devinit e1000_probe(struct pci_dev *pdev, e1000e_set_ethtool_ops(netdev); netdev-tx_timeout = e1000_tx_timeout; 
netdev-watchdog_timeo = 5 * HZ; - netdev-poll
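For readers following the conversion, the new napi_struct interface in this patch changes how the poll routine works:

- It takes a plain int budget.
- It returns the amount of Rx work it did.
- It leaves polling mode (netif_rx_complete()) only when it used less than its budget.

The sketch below is a userspace model of that contract, not driver code. It makes these simplifications:

- struct fake_adapter, fake_clean_rx() and fake_poll() are hypothetical names.
- The Rx ring is reduced to a counter.
- The Tx-cleaning part of the real exit condition is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct napi_struct and struct e1000_adapter. */
struct fake_napi {
    bool scheduled;                 /* models NAPI_STATE_SCHED */
};

struct fake_adapter {
    int rx_pending;                 /* packets waiting in the Rx ring */
    struct fake_napi napi;          /* embedded, as in the patch */
};

/* Same idea as the kernel's container_of(): step back from the member. */
#define fake_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Clean at most 'budget' packets, counting them in *work_done. */
static void fake_clean_rx(struct fake_adapter *ad, int *work_done, int budget)
{
    assert(budget > 0);
    while (*work_done < budget && ad->rx_pending > 0) {
        ad->rx_pending--;
        (*work_done)++;
    }
}

/* New-style poll: budget passed by value, work done returned. */
static int fake_poll(struct fake_napi *napi, int budget)
{
    struct fake_adapter *ad =
        fake_container_of(napi, struct fake_adapter, napi);
    int work_done = 0;

    fake_clean_rx(ad, &work_done, budget);

    /* Budget not used up: leave polling mode (netif_rx_complete()). */
    if (work_done < budget)
        napi->scheduled = false;

    return work_done;
}
```

A poll that returns its full budget stays scheduled and is expected to be called again. That is why the patch returns work_done instead of the old 0/1.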
Re: [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses
Hi Aurelien, Aurélien Charbon wrote: According to Neil's comments, I have tried to correct the mistakes of my first sending I have some more comments. @@ -1559,6 +1560,7 @@ exp_addclient(struct nfsctl_client *ncp) { struct auth_domain*dom; inti, err; +struct in6_addr addr6; Indentation looks wrong. diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/nfsctl.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c --- linux-2.6.23-rc3/fs/nfsd/nfsctl.c2007-08-23 13:18:16.0 +0200 +++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c2007-08-23 13:25:28.0 +0200 @@ -222,7 +222,7 @@ static ssize_t write_getfs(struct file * struct auth_domain *clp; int err = 0; struct knfsd_fh *res; - +struct in6_addr in6; Indentation. if (size sizeof(*data)) return -EINVAL; data = (struct nfsctl_fsparm*)buf; @@ -236,7 +236,14 @@ static ssize_t write_getfs(struct file * res = (struct knfsd_fh*)buf; exp_readlock(); -if (!(clp = auth_unix_lookup(sin-sin_addr))) + +/* IPv6 address mapping */ +in6.s6_addr32[0] = 0; +in6.s6_addr32[1] = 0; +in6.s6_addr32[2] = htonl(0x); +in6.s6_addr32[3] = (uint32_t)sin-sin_addr.s_addr; Why didn't you use your new ipv6_addr_map() inline here? @@ -253,6 +260,7 @@ static ssize_t write_getfd(struct file * { struct nfsctl_fdparm *data; struct sockaddr_in *sin; +struct in6_addr in6; Indentation. @@ -271,7 +279,14 @@ static ssize_t write_getfd(struct file * res = buf; sin = (struct sockaddr_in *)data-gd_addr; exp_readlock(); -if (!(clp = auth_unix_lookup(sin-sin_addr))) + +/* IPv6 address mapping */ +in6.s6_addr32[0] = 0; +in6.s6_addr32[1] = 0; +in6.s6_addr32[2] = htonl(0x); +in6.s6_addr32[3] = (uint32_t)sin-sin_addr.s_addr; Why didn't you use your new ipv6_addr_map() inline here too? 
diff -p -u -r -N linux-2.6.23-rc3/include/net/ipv6.h linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h --- linux-2.6.23-rc3/include/net/ipv6.h2007-08-23 13:18:23.0 +0200 +++ linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h2007-08-23 13:25:28.0 +0200 @@ -21,6 +21,7 @@ #include net/ndisc.h #include net/flow.h #include net/snmp.h +#include linux/in.h #define SIN6_LEN_RFC213324 @@ -167,6 +168,12 @@ DECLARE_SNMP_STAT(struct udp_mib, udplit if (is_udplite) SNMP_INC_STATS_USER(udplite_stats_in6, field); \ elseSNMP_INC_STATS_USER(udp_stats_in6, field);} while(0) +#define IS_ADDR_MAPPED(a) \ +(((uint32_t *) (a))[0] == 0\ + ((uint32_t *) (a))[1] == 0\ + (((uint32_t *) (a))[2] == 0\ +|| ((uint32_t *) (a))[2] == htonl(0x))) I need to update a patch of mine that added a v4-mapped inline, let me send that out. In the kernel you should use u32 too, is that why you needed to include linux/net.h? +/* Maps a IPv4 address into a wright IPv6 address */ +static inline int ipv6_addr_map(const struct in_addr a1, struct in6_addr a2) +{ +a2.s6_addr32[0] = 0; +a2.s6_addr32[1] = 0; +a2.s6_addr32[2] = htonl(0x); +a2.s6_addr32[3] = (uint32_t)a1.s_addr; +return 0; +} This can be void, noone ever checks the return status. Maybe change the name to ipv6_addr_v4map() too? @@ -84,7 +85,7 @@ static void svcauth_unix_domain_release( struct ip_map { struct cache_headh; charm_class[8]; /* e.g. nfsd */ -struct in_addrm_addr; +struct in6_addrm_addr; Indentation. static void ip_map_init(struct cache_head *cnew, struct cache_head *citem) { @@ -125,7 +133,7 @@ static void ip_map_init(struct cache_hea struct ip_map *item = container_of(citem, struct ip_map, h); strcpy(new-m_class, item-m_class); -new-m_addr.s_addr = item-m_addr.s_addr; +memcpy((new-m_addr), (item-m_addr), sizeof(struct in6_addr)); Use ipv6_addr_copy(). 
@@ -151,20 +159,22 @@ static void ip_map_request(struct cache_ { char text_addr[20]; struct ip_map *im = container_of(h, struct ip_map, h); -__be32 addr = im-m_addr.s_addr; - -snprintf(text_addr, 20, %u.%u.%u.%u, - ntohl(addr) 24 0xff, - ntohl(addr) 16 0xff, - ntohl(addr) 8 0xff, - ntohl(addr) 0 0xff); +if (IS_ADDR_MAPPED(im-m_addr.s6_addr32)) { +snprintf(text_addr, 20, NIPQUAD_FMT, +ntohl(im-m_addr.s6_addr32[3]) 24 0xff, +ntohl(im-m_addr.s6_addr32[3]) 16 0xff, +ntohl(im-m_addr.s6_addr32[3]) 8 0xff, +ntohl(im-m_addr.s6_addr32[3]) 0 0xff); +} else { +snprintf(text_addr, 20, NIP6_FMT, NIP6(im-m_addr)); +} You'll need more than 20 bytes to print an IPv6 address, I'd make this at least 44 to account for some fluff. Surprised you didn't crash during testing. static int ip_map_parse(struct cache_detail *cd, @@ -175,10 +185,10 @@ static int ip_map_parse(struct cache_det
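To make the buffer-size point concrete: the longest textual IPv6 form, such as ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255, is 45 characters, so a 20-byte buffer cannot hold it.

The userspace sketch below prints an ip_map-style address as a dotted quad when it is IPv4-mapped, and as IPv6 text otherwise. It uses the POSIX inet_ntop() call with an INET6_ADDRSTRLEN-sized buffer. format_ip_map_addr() and addr_is_v4mapped() are hypothetical helpers, not the kernel's NIPQUAD/NIP6 macros.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* ::ffff:a.b.c.d has this fixed 12-byte prefix. */
static bool addr_is_v4mapped(const struct in6_addr *a)
{
    static const unsigned char prefix[12] =
        { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

    return memcmp(a->s6_addr, prefix, sizeof(prefix)) == 0;
}

/* Dotted quad for mapped addresses, IPv6 text otherwise.  Returns NULL
 * (errno ENOSPC) if 'len' is too small; size buffers INET6_ADDRSTRLEN. */
static const char *format_ip_map_addr(const struct in6_addr *a,
                                      char *buf, socklen_t len)
{
    if (addr_is_v4mapped(a))
        return inet_ntop(AF_INET, &a->s6_addr[12], buf, len);
    return inet_ntop(AF_INET6, a, buf, len);
}
```

With a 20-byte buffer, inet_ntop() returns NULL for a full-length IPv6 address instead of truncating it.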
Re: [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses
Hi Aurélien- Aurélien Charbon wrote: According to Neil's comments, I have tried to correct the mistakes of my first sending Thank you for these comments Neil. This is a small part of missing pieces of IPv6 support for the server. It deals with the ip_map caching code part. It changes the ip_map structure to be able to store INET6 addresses. It adds also the changes in address hashing, and mapping to test it with INET addresses. Signed-off-by: Aurelien Charbon [EMAIL PROTECTED] --- fs/nfsd/export.c | 10 ++- fs/nfsd/nfsctl.c | 21 ++- include/linux/sunrpc/svcauth.h |4 - include/net/ipv6.h | 17 + net/sunrpc/svcauth_unix.c | 121 - 5 files changed, 129 insertions(+), 44 deletions(-) diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/export.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c --- linux-2.6.23-rc3/fs/nfsd/export.c2007-08-23 13:18:16.0 +0200 +++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c2007-08-23 13:51:08.0 +0200 @@ -35,6 +35,7 @@ #include linux/lockd/bind.h #include linux/sunrpc/msg_prot.h #include linux/sunrpc/gss_api.h +#include net/ipv6.h #define NFSDDBG_FACILITYNFSDDBG_EXPORT @@ -1559,6 +1560,7 @@ exp_addclient(struct nfsctl_client *ncp) { struct auth_domain*dom; inti, err; +struct in6_addr addr6; /* First, consistency check. */ err = -EINVAL; @@ -1577,9 +1579,11 @@ exp_addclient(struct nfsctl_client *ncp) goto out_unlock; /* Insert client into hashtable. 
*/ -for (i = 0; i ncp-cl_naddr; i++) -auth_unix_add_addr(ncp-cl_addrlist[i], dom); - +for (i = 0; i ncp-cl_naddr; i++) { +/* Mapping address */ +ipv6_addr_map(ncp-cl_addrlist[i], addr6); +auth_unix_add_addr(addr6, dom); +} auth_unix_forget_old(dom); auth_domain_put(dom); diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/nfsctl.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c --- linux-2.6.23-rc3/fs/nfsd/nfsctl.c2007-08-23 13:18:16.0 +0200 +++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c2007-08-23 13:25:28.0 +0200 @@ -222,7 +222,7 @@ static ssize_t write_getfs(struct file * struct auth_domain *clp; int err = 0; struct knfsd_fh *res; - +struct in6_addr in6; if (size sizeof(*data)) return -EINVAL; data = (struct nfsctl_fsparm*)buf; @@ -236,7 +236,14 @@ static ssize_t write_getfs(struct file * res = (struct knfsd_fh*)buf; exp_readlock(); -if (!(clp = auth_unix_lookup(sin-sin_addr))) + +/* IPv6 address mapping */ +in6.s6_addr32[0] = 0; +in6.s6_addr32[1] = 0; +in6.s6_addr32[2] = htonl(0x); +in6.s6_addr32[3] = (uint32_t)sin-sin_addr.s_addr; + +if (!(clp = auth_unix_lookup(in6))) err = -EPERM; else { err = exp_rootfh(clp, data-gd_path, res, data-gd_maxlen); @@ -253,6 +260,7 @@ static ssize_t write_getfd(struct file * { struct nfsctl_fdparm *data; struct sockaddr_in *sin; +struct in6_addr in6; struct auth_domain *clp; int err = 0; struct knfsd_fh fh; @@ -271,7 +279,14 @@ static ssize_t write_getfd(struct file * res = buf; sin = (struct sockaddr_in *)data-gd_addr; exp_readlock(); -if (!(clp = auth_unix_lookup(sin-sin_addr))) + +/* IPv6 address mapping */ +in6.s6_addr32[0] = 0; +in6.s6_addr32[1] = 0; +in6.s6_addr32[2] = htonl(0x); +in6.s6_addr32[3] = (uint32_t)sin-sin_addr.s_addr; The code canonicalizes IPv4 addresses in several places. Is there already a generic function defined somewhere to do this? If not, it might make sense to add one. 
Chuck Lever, Principal Member of Staff, Oracle Corporation (Corporate Architecture: Linux Projects Group), http://oss.oracle.com/~cel
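On Chuck's question about a generic helper, one possible shape is shown below as a userspace sketch. It folds the four open-coded s6_addr32 assignments into a single void function, since no caller checks a return value. The name map_v4_to_v6_sketch() is hypothetical. It works on the s6_addr byte array because s6_addr32 is not a portable userspace member.

```c
#include <arpa/inet.h>   /* inet_pton(), used when checking the result */
#include <netinet/in.h>  /* struct in_addr, struct in6_addr */
#include <string.h>

/* Hypothetical helper: build ::ffff:a.b.c.d from an IPv4 address.
 * void, because no caller needs a status. */
static void map_v4_to_v6_sketch(const struct in_addr *v4, struct in6_addr *v6)
{
    memset(v6->s6_addr, 0, 10);
    v6->s6_addr[10] = 0xff;
    v6->s6_addr[11] = 0xff;
    /* s_addr is already in network byte order: copy it unchanged. */
    memcpy(&v6->s6_addr[12], &v4->s_addr, 4);
}
```

In write_getfs() and write_getfd(), the open-coded block would then reduce to one call such as map_v4_to_v6_sketch(&sin->sin_addr, &in6).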
Re: [2.6.20.17 review 35/58] forcedeth bug fix: realtek phy
On the day of Thursday 23 August 2007 Greg KH hast written: On Wed, Aug 22, 2007 at 10:42:25PM +0200, Willy Tarreau wrote: On Wed, Aug 22, 2007 at 08:15:03PM +0200, Prakash Punnoor wrote: Hi, even if Greg is waiting for some special invitation (http://lkml.org/lkml/2007/8/14/229), I suggest putting this patch by Ayaz on top: http://lkml.org/lkml/2007/8/10/296 That's what I prepare first, but then noticed it's not in mainline. Perhaps Ayaz wants to give Greg the clarification he needs... :sigh: He should, as the fix is not in mainline either :-( I don't think Greg asks for specific clarification, just a plain patch with a short commit log on its own which does not include remains of older mails. Exactly, that is what I am waiting for. And also I need the change to go into mainline first, as we can not diverge with the -stable releases. Can we get that into mainline then? I haven't seen forcedeth in MAINTAINERS, so I added netdev to the cc list. bye, -- Prakash Punnoor
Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG
On Wed, 2007-08-22 at 20:46 +0200, Johannes Berg wrote: The two different wireless code bases both define macros to ease printing MAC addresses: There are also several different uses of the equivalent of printk("%02x", addr[0]); for (i = 1; i < 6; i++) printk(":%02x", addr[i]); to print an ethernet MAC address. http://www.uwsg.iu.edu/hypermail/linux/net/0602.1/0002.html As not all device MAC addresses are 6 bytes, colon separated, perhaps an appropriate ethernet/tr MAC designation is EUI48. http://standards.ieee.org/regauth/oui/tutorials/EUI48.html
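For comparison, here is the per-byte open-coded style next to a single format/argument macro pair, as a userspace sketch. The MAC_FMT/MAC_ARG definitions are modeled on what the thread describes and may not match the proposed patch character for character. u8 is typedef'd locally to stand in for the kernel type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned char u8;          /* local stand-in for the kernel type */

/* Format/argument pair in the style discussed; may differ from the patch. */
#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
#define MAC_ARG(x) ((const u8 *)(x))[0], ((const u8 *)(x))[1], \
                   ((const u8 *)(x))[2], ((const u8 *)(x))[3], \
                   ((const u8 *)(x))[4], ((const u8 *)(x))[5]

/* Open-coded style: one call per byte.  Needs len >= 18. */
static int format_mac_loop(char *buf, size_t len, const u8 *addr)
{
    int i, n;

    assert(len >= 18);
    n = snprintf(buf, len, "%02x", addr[0]);
    for (i = 1; i < 6; i++)
        n += snprintf(buf + n, len - (size_t)n, ":%02x", addr[i]);
    return n;
}

/* Macro style: a single call. */
static int format_mac_macro(char *buf, size_t len, const u8 *addr)
{
    return snprintf(buf, len, MAC_FMT, MAC_ARG(addr));
}
```

Both produce the same 17-character string. The macro form also keeps the whole address in one printk(), so concurrent messages cannot interleave with it.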
Re: [PATCH 1/2] E1000: Fix ifdown hang in git-2.6.24
Krishna Kumar wrote: Doing napi_disable twice hangs ifdown of the device. e1000_down is the common place to call napi_disable. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- e1000_main.c | 4 ---- 1 files changed, 4 deletions(-) diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c --- org/drivers/net/e1000/e1000_main.c 2007-08-23 13:32:16.0 +0530 +++ new/drivers/net/e1000/e1000_main.c 2007-08-23 13:32:34.0 +0530 @@ -1477,10 +1477,6 @@ e1000_close(struct net_device *netdev) { struct e1000_adapter *adapter = netdev_priv(netdev); -#ifdef CONFIG_E1000_NAPI - napi_disable(&adapter->napi); -#endif - WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags)); e1000_down(adapter); e1000_power_down_phy(adapter); Acked-by: Auke Kok [EMAIL PROTECTED] I pushed this change to akpm for -mm as well in e1000e... Thanks Krishna, Auke
Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG
On Thu, 2007-08-23 at 09:01 -0700, Joe Perches wrote: There are also several different uses of the equivalent of printk("%02x", addr[0]); for (i = 1; i < 6; i++) printk(":%02x", addr[i]); to print an ethernet MAC address. Hm. I didn't know that, I can go through in a later patch if desired. http://www.uwsg.iu.edu/hypermail/linux/net/0602.1/0002.html As not all device MAC addresses are 6 bytes, colon separated, perhaps an appropriate ethernet/tr MAC designation is EUI48. http://standards.ieee.org/regauth/oui/tutorials/EUI48.html Practically, however, nobody is going to even find macros named EUI48_FMT/EUI48_ARG, would they? I don't much care, but I find it rather unsatisfying that both wireless code bases define these macros. johannes
New NAPI interface: netif_rx_reschedule not working
Hi David, when trying to get our driver working with the new interface, I found the following issue, and I'm not sure how best to solve it: netif_rx_reschedule() does not work when called after netif_rx_complete(). The problem is that netif_rx_reschedule currently adds the napi struct to the poll list once more. However, net_rx_action will add it to the poll list as well (NAPI_STATE_SCHED set), so the device is scheduled twice. The next time netif_rx_complete is called, for the second schedule, it will result in BUG() because NAPI_STATE_SCHED is no longer set (it was cleared by the first netif_rx_complete()). Modifying netif_rx_reschedule to only set the NAPI_STATE_SCHED flag again, without adding the device to the poll_list, will not solve the problem entirely. After netif_rx_complete() the driver activates the IRQs again. If an IRQ is caught on a different CPU before netif_rx_reschedule is called, we will have the napi device scheduled twice again, because net_rx_action will schedule it and netif_rx_schedule will as well (adding it to poll_list). I think this is an issue that can occur even if you don't use netif_rx_reschedule. Do I understand this correctly? Thanks, Jan-Bernd
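A simplified, single-threaded model of the scheduling state may help pin down the invariant at stake. Only the caller that flips the SCHED bit from clear to set should put the instance on the poll list. A reschedule that queues unconditionally ends up with two entries.

struct napi_sketch and its functions are hypothetical and use C11 atomics. This illustrates the invariant only. It is not a proposed fix for the net_rx_action re-queue path described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_STATE_SCHED 1u       /* models NAPI_STATE_SCHED */

struct napi_sketch {
    atomic_uint state;
    int on_poll_list;               /* entries currently on the poll list */
};

/* Succeeds only for the caller that moves SCHED from clear to set. */
static bool sketch_schedule_prep(struct napi_sketch *n)
{
    unsigned old = atomic_fetch_or(&n->state, SKETCH_STATE_SCHED);
    return !(old & SKETCH_STATE_SCHED);
}

/* Guarded schedule: queue only if we won the prep step. */
static void sketch_schedule(struct napi_sketch *n)
{
    if (sketch_schedule_prep(n))
        n->on_poll_list++;
}

/* Unconditional re-queue, the pattern the mail reports as unsafe. */
static void sketch_reschedule_unguarded(struct napi_sketch *n)
{
    atomic_fetch_or(&n->state, SKETCH_STATE_SCHED);
    n->on_poll_list++;
}

/* Poll finished: drop off the list, then clear SCHED. */
static void sketch_complete(struct napi_sketch *n)
{
    n->on_poll_list--;
    atomic_fetch_and(&n->state, ~SKETCH_STATE_SCHED);
}
```

Here is the mail's scenario replayed in the model. An IRQ on another CPU schedules the instance after complete. A guarded reschedule then loses the prep race and leaves one entry. An unguarded one leaves two.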
[IPv6] Add v4mapped address inline
Add v4mapped address inline to avoid calls to ipv6_addr_type(). diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 9059e0e..c2b6c11 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -418,6 +418,12 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr)); } +static inline int ipv6_addr_v4mapped(const struct in6_addr *a) +{ + return ((a-s6_addr32[0] | a-s6_addr32[1]) == 0 + a-s6_addr32[2] == htonl(0x)); +} + /* * Prototypes exported by ipv6 */ diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 761a910..92d8119 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -249,7 +249,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, } if (ipv6_only_sock(sk) || - !(ipv6_addr_type(np-daddr) IPV6_ADDR_MAPPED)) { + !ipv6_addr_v4mapped(np-daddr)) { retv = -EADDRNOTAVAIL; break; } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 0f7defb..d5c0175 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -697,7 +697,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval, if (!cmd.tcpm_keylen) { if (!tcp_sk(sk)-md5sig_info) return -ENOENT; - if (ipv6_addr_type(sin6-sin6_addr) IPV6_ADDR_MAPPED) + if (ipv6_addr_v4mapped(sin6-sin6_addr)) return tcp_v4_md5_do_del(sk, sin6-sin6_addr.s6_addr32[3]); return tcp_v6_md5_do_del(sk, sin6-sin6_addr); } @@ -720,7 +720,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval, newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL); if (!newkey) return -ENOMEM; - if (ipv6_addr_type(sin6-sin6_addr) IPV6_ADDR_MAPPED) { + if (ipv6_addr_v4mapped(sin6-sin6_addr)) { return tcp_v4_md5_do_add(sk, sin6-sin6_addr.s6_addr32[3], newkey, cmd.tcpm_keylen); } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 4210951..3e0ca15 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -610,7 +610,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct sock 
*sk, daddr = NULL; if (daddr) { - if (ipv6_addr_type(daddr) == IPV6_ADDR_MAPPED) { + if (ipv6_addr_v4mapped(daddr)) { struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_port = sin6 ? sin6-sin6_port : inet-dport; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index f8aa23d..cd57a51 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -481,7 +481,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1, if (addr1-sa.sa_family != addr2-sa.sa_family) { if (addr1-sa.sa_family == AF_INET addr2-sa.sa_family == AF_INET6 - IPV6_ADDR_MAPPED == ipv6_addr_type(addr2-v6.sin6_addr)) { + ipv6_addr_v4mapped(addr2-v6.sin6_addr)) { if (addr2-v6.sin6_port == addr1-v4.sin_port addr2-v6.sin6_addr.s6_addr32[3] == addr1-v4.sin_addr.s_addr) @@ -489,7 +489,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1, } if (addr2-sa.sa_family == AF_INET addr1-sa.sa_family == AF_INET6 - IPV6_ADDR_MAPPED == ipv6_addr_type(addr1-v6.sin6_addr)) { + ipv6_addr_v4mapped(addr1-v6.sin6_addr)) { if (addr1-v6.sin6_port == addr2-v4.sin_port addr1-v6.sin6_addr.s6_addr32[3] == addr2-v4.sin_addr.s_addr)
[PATCH] shaper: mark for removal
Subject: shaper: mark for removal This driver has been marked obsolete for a long time and is superseded by traffic schedulers. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/Documentation/feature-removal-schedule.txt2007-08-23 09:36:24.0 -0700 +++ b/Documentation/feature-removal-schedule.txt2007-08-23 09:43:24.0 -0700 @@ -290,3 +290,12 @@ Why: All mthca hardware also supports MS Who: Roland Dreier [EMAIL PROTECTED] --- + +What: shaper network driver +When: January 2008 +Files: drivers/net/shaper.c, include/linux/if_shaper.h +Why: This driver has been marked obsolete for many years. + It was only designed to work on lower speed links and has design + flaws that lead to machine crashes. The qdisc infrastructure in + 2.4 or later kernels, provides richer features and is more robust. +Who: Stephen Hemminger [EMAIL PROTECTED]
Re: UDPv4 port allocation problem
Tóth László Attila wrote: Hello, I noticed that it is possible that the kernel allocates the same UDP _Which_ kernel - or rather which rev? There are lots of linux kernels potentially out there... port to an application that was used and closed immediately before the new application got it. This means that applications that do not specify an exact port and rely on the kernel to allocate a port for them might see traffic originally meant for another application. Imagine that two applications want to resolve a name in DNS at about the same time. The following happens: * first app sends out the DNS query then closes the socket without waiting for an answer (e.g. it got interrupted by Ctrl+C) * second app opens an UDP socket, and gets the same port, originally assigned to app#1, sends out the DNS query * DNS server responds, the response goes to app#2 DNS might not be the perfect example, but you get the idea. Applications do not expect to receive data on newly opened sockets, not to mention the security implications. Actually, all applications using UDP are required to cope with just about anything since there are no guarantees with UDP of anything other than the checksum generally protecting one from corrupt data. In the specific case of DNS, the resolver library will (damn well better) be checking the answer it gets against the query it sent. There will be a transaction ID check, and IIRC a check of the returned query against the query sent. TCP on the other hand increases the allocated port number for each new socket, the same behaviour for UDP would add certain amount of time that decreases this risk. Does it always? If you wait for the length of TIME_WAIT before issuing another bind() request does the port number still increase? 
While it might be nice to step through the anonymous port space in some fashion (I suspect the argument would be made that it should be somewhat random to preclude guessing from the outside), applications using UDP are still required to expect the unexpected wrt data arriving on their socket. rick jones Is the current behaviour intended? Regards, Laszlo Attila Toth
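Rick's point that the resolver must validate replies can be sketched as follows. The 16-bit ID is the first field of the 12-byte DNS header (RFC 1035, section 4.1.1), and the QR bit marks a response.

dns_reply_matches() is a hypothetical helper, and it makes these simplifying assumptions:

- The query carries exactly one question and nothing after it.
- The echoed question is compared byte for byte. Real resolvers compare names case-insensitively.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DNS_HDR_LEN 12              /* fixed header size, RFC 1035 4.1.1 */

/* The ID is the first 16-bit field of the header, in network order. */
static uint16_t dns_get_id(const uint8_t *msg)
{
    return (uint16_t)((msg[0] << 8) | msg[1]);
}

static bool dns_reply_matches(const uint8_t *query, size_t qlen,
                              const uint8_t *reply, size_t rlen)
{
    if (qlen < DNS_HDR_LEN || rlen < qlen)
        return false;
    if (!(reply[2] & 0x80))         /* QR bit clear: not a response */
        return false;
    if (dns_get_id(reply) != dns_get_id(query))
        return false;               /* answer to some other query */
    return memcmp(query + DNS_HDR_LEN, reply + DNS_HDR_LEN,
                  qlen - DNS_HDR_LEN) == 0;
}
```

A late answer meant for the first application's query, arriving on a reused port, fails the ID check unless the second application happened to pick the same 16-bit ID.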
Re: [PATCH] improved xfrm_audit_log() patch
On Wed, 2007-08-22 at 20:05 -0700, David Miller wrote: I would suggest, at this point, to make purpose built situation specific interfaces that pass specific objects (the ones being operated upon) to the audit layer. Let the audit layer pick out the bits it actually wants in the format it likes. For example, if we're creating a template, pass the policy and the template to the audit layer via a function called: xfrm_audit_template_add() or something like that. That function only needs two arguments. All of these call sites will rarely need more than 2 or 3 arguments in any given situation, and the on-stack audit thing will be gone too. This is the suggestion I made to you over a month ago, but you chose to do the on-stack thing. I misunderstood. My bad. For clarification, I plan on removing xfrm_audit_log() and replacing it with more specific ipsec audit interfaces. For example, when auditing the addition of a policy, either xfrm_user_audit_policy_add(xp, result, skb) or pfkey_audit_policy_add(xp, result) will get called. I need two because xfrm_user gets loginuid/secid from netlink/skb and pfkey gets it from audit_get_loginuid(). Each will set up and format the audit buffer according to what it wants. Also, for deleting, there will be pfkey_audit_policy_delete(xp, result) and xfrm_user_audit_policy_delete(xp, result, skb). You must make this cost absolutely nothing when it is not configured, and have next to no cost when not enabled at run time. And it is very doable. The new ipsec audit functions can be ifdef'd with CONFIG_AUDITSYSCALL just as xfrm_audit_log() was, so that there is no cost when audit is not configured. Let me know if this is better. Regards, Joy
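Here is a minimal sketch of how such a purpose-built hook can meet both of David's cost requirements. It costs nothing when auditing is not configured, and only a single flag test when it is configured but disabled at run time. The names pfkey_audit_policy_add_sketch(), audit_enabled_sketch, audit_records_sketch and struct xfrm_policy_sketch are hypothetical stand-ins. Only the CONFIG_AUDITSYSCALL symbol comes from the mail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct xfrm_policy. */
struct xfrm_policy_sketch {
    unsigned int index;
};

#ifdef CONFIG_AUDITSYSCALL
static bool audit_enabled_sketch;           /* run-time switch */
static unsigned long audit_records_sketch;  /* records "emitted" */

static void pfkey_audit_policy_add_sketch(const struct xfrm_policy_sketch *xp,
                                          int result)
{
    if (!audit_enabled_sketch)              /* configured but off: one test */
        return;
    /* A real hook would allocate and format an audit record here. */
    (void)xp;
    (void)result;
    audit_records_sketch++;
}
#else
/* Not configured: an empty inline the compiler drops entirely. */
static inline void pfkey_audit_policy_add_sketch(const struct xfrm_policy_sketch *xp,
                                                 int result)
{
    (void)xp;
    (void)result;
}
#endif

/* Call site: one line, no #ifdef needed here. */
static int add_policy_sketch(const struct xfrm_policy_sketch *xp)
{
    int err = 0;                            /* pretend the insert succeeded */
    pfkey_audit_policy_add_sketch(xp, err == 0);
    return err;
}
```

Because the unconfigured variant is an empty static inline, callers need no #ifdefs of their own. Each hook takes only the objects it audits, which is the two-or-three-argument shape David describes.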
Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG
On Thu, Aug 23, 2007 at 06:12:00PM +0200, Johannes Berg wrote: On Thu, 2007-08-23 at 09:01 -0700, Joe Perches wrote: There are also several different uses of the equivalent of printk("%02x", addr[0]); for (i = 1; i < 6; i++) printk(":%02x", addr[i]); to print an ethernet MAC address. Hm. I didn't know that, I can go through in a later patch if desired. http://www.uwsg.iu.edu/hypermail/linux/net/0602.1/0002.html As not all device MAC addresses are 6 bytes, colon separated, perhaps an appropriate ethernet/tr MAC designation is EUI48. http://standards.ieee.org/regauth/oui/tutorials/EUI48.html Practically, however, nobody is going to even find macros named EUI48_FMT/EUI48_ARG, would they? I don't much care, but I find it rather unsatisfying that both wireless code bases define these macros. Yeah, accommodating non-48-bit MAC addresses is a bit pedantic. I ACK the original patch, FWIW. John -- John W. Linville [EMAIL PROTECTED]
Re: [IPv6] Add v4mapped address inline
Hello. In article [EMAIL PROTECTED] (at Thu, 23 Aug 2007 12:40:54 -0400), Brian Haley [EMAIL PROTECTED] says: diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 9059e0e..c2b6c11 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -418,6 +418,12 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr)); } +static inline int ipv6_addr_v4mapped(const struct in6_addr *a) +{ + return ((a->s6_addr32[0] | a->s6_addr32[1]) == 0 && + a->s6_addr32[2] == htonl(0x0000ffff)); +} + Please put this just after ipv6_addr_any(), not after ipv6_addr_diff(). --yoshfuji
Re: [stable] [2.6.20.17 review 35/58] forcedeth bug fix: realtek phy
On Thu, Aug 23, 2007 at 05:50:41PM +0200, Prakash Punnoor wrote: On the day of Thursday 23 August 2007 Greg KH hast written: On Wed, Aug 22, 2007 at 10:42:25PM +0200, Willy Tarreau wrote: On Wed, Aug 22, 2007 at 08:15:03PM +0200, Prakash Punnoor wrote: Hi, even if Greg is waiting for some special invitation (http://lkml.org/lkml/2007/8/14/229), I suggest putting this patch by Ayaz on top: http://lkml.org/lkml/2007/8/10/296 That's what I prepare first, but then noticed it's not in mainline. Perhaps Ayaz wants to give Greg the clarification he needs... :sigh: He should, as the fix is not in mainline either :-( I don't think Greg asks for specific clarification, just a plain patch with a short commit log on its own which does not include remains of older mails. Exactly, that is what I am waiting for. And also I need the change to go into mainline first, as we can not diverge with the -stable releases. Can we get that into mainline then? I haven't seen forcedeth in MAINTAINERS, so I added netdev to the cc list. It might help if someone sends a real patch that can be applied :) thanks, greg k-h
Re: [IPv6] Add v4mapped address inline
YOSHIFUJI Hideaki / wrote: Please put this just after ipv6_addr_any(), not after ipv6_addr_diff(). Ok, updated patch attached. -Brian Add v4mapped address inline to avoid calls to ipv6_addr_type(). Signed-off-by: Brian Haley [EMAIL PROTECTED] diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 9059e0e..37bdb25 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -377,6 +377,12 @@ static inline int ipv6_addr_any(const struct in6_addr *a) a-s6_addr32[2] | a-s6_addr32[3] ) == 0); } +static inline int ipv6_addr_v4mapped(const struct in6_addr *a) +{ + return ((a-s6_addr32[0] | a-s6_addr32[1]) == 0 + a-s6_addr32[2] == htonl(0x)); +} + /* * find the first different bit between two addresses * length of address must be a multiple of 32bits diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 761a910..92d8119 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -249,7 +249,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, } if (ipv6_only_sock(sk) || - !(ipv6_addr_type(np-daddr) IPV6_ADDR_MAPPED)) { + !ipv6_addr_v4mapped(np-daddr)) { retv = -EADDRNOTAVAIL; break; } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 0f7defb..d5c0175 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -697,7 +697,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval, if (!cmd.tcpm_keylen) { if (!tcp_sk(sk)-md5sig_info) return -ENOENT; - if (ipv6_addr_type(sin6-sin6_addr) IPV6_ADDR_MAPPED) + if (ipv6_addr_v4mapped(sin6-sin6_addr)) return tcp_v4_md5_do_del(sk, sin6-sin6_addr.s6_addr32[3]); return tcp_v6_md5_do_del(sk, sin6-sin6_addr); } @@ -720,7 +720,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval, newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL); if (!newkey) return -ENOMEM; - if (ipv6_addr_type(sin6-sin6_addr) IPV6_ADDR_MAPPED) { + if (ipv6_addr_v4mapped(sin6-sin6_addr)) { return tcp_v4_md5_do_add(sk, sin6-sin6_addr.s6_addr32[3], newkey, 
cmd.tcpm_keylen); } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 4210951..3e0ca15 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -610,7 +610,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, daddr = NULL; if (daddr) { - if (ipv6_addr_type(daddr) == IPV6_ADDR_MAPPED) { + if (ipv6_addr_v4mapped(daddr)) { struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_port = sin6 ? sin6-sin6_port : inet-dport; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index f8aa23d..cd57a51 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -481,7 +481,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1, if (addr1-sa.sa_family != addr2-sa.sa_family) { if (addr1-sa.sa_family == AF_INET addr2-sa.sa_family == AF_INET6 - IPV6_ADDR_MAPPED == ipv6_addr_type(addr2-v6.sin6_addr)) { + ipv6_addr_v4mapped(addr2-v6.sin6_addr)) { if (addr2-v6.sin6_port == addr1-v4.sin_port addr2-v6.sin6_addr.s6_addr32[3] == addr1-v4.sin_addr.s_addr) @@ -489,7 +489,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1, } if (addr2-sa.sa_family == AF_INET addr1-sa.sa_family == AF_INET6 - IPV6_ADDR_MAPPED == ipv6_addr_type(addr1-v6.sin6_addr)) { + ipv6_addr_v4mapped(addr1-v6.sin6_addr)) { if (addr1-v6.sin6_port == addr2-v4.sin_port addr1-v6.sin6_addr.s6_addr32[3] == addr2-v4.sin_addr.s_addr)
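Here is a userspace re-creation of the new inline, for checking its behaviour outside the kernel. It reads the address as four 32-bit words with memcpy() instead of the kernel's s6_addr32 member. ipv6_addr_v4mapped_sketch() is a hypothetical name. Unlike the IS_ADDR_MAPPED macro proposed in the NFS patch, it rejects the IPv4-compatible form ::a.b.c.d, where the third word is zero.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* True for ::ffff:a.b.c.d only. */
static int ipv6_addr_v4mapped_sketch(const struct in6_addr *a)
{
    uint32_t w[4];

    memcpy(w, a->s6_addr, sizeof(w));   /* words stay in network order */
    return (w[0] | w[1]) == 0 && w[2] == htonl(0x0000ffff);
}
```

OR-ing the first two words and comparing once against zero is the same trick the patch uses. It avoids the general classification work done by ipv6_addr_type().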
[PATCH] udp: randomize port selection
This patch causes UDP port allocation to be randomized like TCP. The earlier code would always choose the same port (i.e. the first empty list). Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/net/ipv4/udp.c 2007-08-23 09:44:22.0 -0700 +++ b/net/ipv4/udp.c 2007-08-23 11:29:02.0 -0700 @@ -113,9 +113,8 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta struct hlist_head udp_hash[UDP_HTABLE_SIZE]; DEFINE_RWLOCK(udp_hash_lock); -static int udp_port_rover; - -static inline int __udp_lib_lport_inuse(__u16 num, struct hlist_head udptable[]) +static inline int __udp_lib_lport_inuse(__u16 num, + const struct hlist_head udptable[]) { struct sock *sk; struct hlist_node *node; @@ -132,11 +131,10 @@ static inline int __udp_lib_lport_inuse( * @sk: socket struct in question * @snum: port number to look up * @udptable: hash list table, must be of UDP_HTABLE_SIZE - * @port_rover: pointer to record of last unallocated port * @saddr_comp: AF-dependent comparison of bound local IP addresses */ int __udp_lib_get_port(struct sock *sk, unsigned short snum, - struct hlist_head udptable[], int *port_rover, + struct hlist_head udptable[], int (*saddr_comp)(const struct sock *sk1, const struct sock *sk2 )) { @@ -146,49 +144,56 @@ int __udp_lib_get_port(struct sock *sk, int error = 1; write_lock_bh(&udp_hash_lock); - if (snum == 0) { - int best_size_so_far, best, result, i; - if (*port_rover > sysctl_local_port_range[1] || - *port_rover < sysctl_local_port_range[0]) - *port_rover = sysctl_local_port_range[0]; - best_size_so_far = 32767; - best = result = *port_rover; - for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) { - int size; - - head = &udptable[result & (UDP_HTABLE_SIZE - 1)]; - if (hlist_empty(head)) { - if (result > sysctl_local_port_range[1]) - result = sysctl_local_port_range[0] + - ((result - sysctl_local_port_range[0]) & - (UDP_HTABLE_SIZE - 1)); + if (!snum) { + int i; + int low = sysctl_local_port_range[0]; + int high = sysctl_local_port_range[1]; + unsigned rover, best, best_size_so_far; + 
best_size_so_far = UINT_MAX; + best = rover = net_random() % (high - low) + low; + + /* 1st pass: look for empty (or shortest) hash chain */ + for (i = 0; i < UDP_HTABLE_SIZE; i++) { + int size = 0; + + head = &udptable[rover & (UDP_HTABLE_SIZE - 1)]; + if (hlist_empty(head)) goto gotit; - } - size = 0; + sk_for_each(sk2, node, head) { if (++size >= best_size_so_far) goto next; } best_size_so_far = size; - best = result; + best = rover; next: - ; + /* fold back if end of range */ + if (++rover > high) + rover = low + ((rover - low) + & (UDP_HTABLE_SIZE - 1)); + + } - result = best; - for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE; - i++, result += UDP_HTABLE_SIZE) { - if (result > sysctl_local_port_range[1]) - result = sysctl_local_port_range[0] - + ((result - sysctl_local_port_range[0]) & - (UDP_HTABLE_SIZE - 1)); - if (! __udp_lib_lport_inuse(result, udptable)) - break; + + /* 2nd pass: find hole in shortest hash chain */ + rover = best; + for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE; i++) { + if (! __udp_lib_lport_inuse(rover, udptable)) + goto gotit; + rover += UDP_HTABLE_SIZE; + if (rover > high) + rover = low + ((rover - low) + & (UDP_HTABLE_SIZE - 1)); } - if (i >= (1 << 16) / UDP_HTABLE_SIZE) - goto fail; + + + /* All ports in use! */ + goto fail; +
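The two-pass search in the UDP patch can be modelled outside the kernel. This is a user-space sketch, not kernel code: the bound-port table is a plain byte array, `HTABLE_SIZE` (128) is an assumed stand-in for `UDP_HTABLE_SIZE`, and the caller passes in the random value that the patch takes from `net_random()`. The hypothetical `fold()` helper shows why the wrap-around is written as `low + ((rover - low) & (HTABLE_SIZE - 1))`: it brings the port back into `[low, high]` while keeping it in the same hash bucket.

```c
#include <assert.h>
#include <limits.h>

#define HTABLE_SIZE 128u   /* assumed stand-in for UDP_HTABLE_SIZE */
#define NPORTS      65536u

static_assert((HTABLE_SIZE & (HTABLE_SIZE - 1)) == 0,
	      "HTABLE_SIZE must be a power of two for the masking below");

/* Number of bound ports hashing to `bucket` (toy hash chain length). */
static unsigned chain_len(const unsigned char used[], unsigned bucket)
{
	unsigned n = 0;

	for (unsigned p = bucket; p < NPORTS; p += HTABLE_SIZE)
		n += used[p] != 0;
	return n;
}

/* Bring rover back into [low, high] without changing its hash bucket. */
static unsigned fold(unsigned rover, unsigned low, unsigned high)
{
	if (rover > high)
		rover = low + ((rover - low) & (HTABLE_SIZE - 1));
	return rover;
}

/* Returns a free port in [low, high], or -1 if every candidate is taken.
 * `rnd` plays the role of net_random(). Assumes high < NPORTS and
 * high - low >= HTABLE_SIZE. */
int pick_port(const unsigned char used[], unsigned low, unsigned high,
	      unsigned rnd)
{
	unsigned rover = rnd % (high - low) + low;
	unsigned best = rover, best_size = UINT_MAX;

	/* 1st pass: look for an empty (or the shortest) hash chain */
	for (unsigned i = 0; i < HTABLE_SIZE; i++) {
		unsigned size = chain_len(used, rover & (HTABLE_SIZE - 1));

		if (size == 0)
			return (int)rover;
		if (size < best_size) {
			best_size = size;
			best = rover;
		}
		rover = fold(rover + 1, low, high);
	}

	/* 2nd pass: look for a free port within the shortest chain */
	rover = best;
	for (unsigned i = 0; i < NPORTS / HTABLE_SIZE; i++) {
		if (!used[rover])
			return (int)rover;
		rover = fold(rover + HTABLE_SIZE, low, high);
	}
	return -1; /* all ports in use */
}
```

Because the starting rover is random, two allocations against an empty table generally land on different ports, whereas the old rover-based code kept returning the first empty chain.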
Re: [PATCH] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes
On Thu, Aug 23, 2007 at 10:31:03AM +1000, Stephen Rothwell wrote: On Wed, 22 Aug 2007 09:12:48 -0500 Olof Johansson [EMAIL PROTECTED] wrote: -static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg) +static inline unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg) ^^ For static functions in C files, we tend not to bother marking them inline any more as the compiler does a pretty good job these days. Yeah, sloppy coding on my part. It was still there from when I explicitly added noinline during debugging, and I forgot to take it out altogether. - pci_read_config_dword(mac->iob_pdev, reg, &val); + val = in_le32(mac->iob_regs+reg); + return val; Why not just return in_le32(mac->iob_regs+reg); ? And similarly below? Residual from debugging as well: I had debug hooks showing what was read/written that I took out, but didn't fix up the surrounding code. Refreshed patch posted separately. Thanks for the feedback. -Olof
[PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes
Move away from using the pci config access functions for simple register access. Our device has all of the registers in the config space (hey, from the hardware point of view it looks reasonable :-), so we need to somehow get to it. Newer firmwares have it in the device tree such that we can just get it and ioremap it there (in case it ever moves in future products). For now, provide a hardcoded fallback for older firmwares. Signed-off-by: Olof Johansson [EMAIL PROTECTED] --- Updated: Removed explicit inlines, cleaned up read functions, fixed grammar. Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -83,44 +83,35 @@ static struct pasdma_status *dma_status; static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg) { - unsigned int val; - - pci_read_config_dword(mac-iob_pdev, reg, val); - return val; + return in_le32(mac-iob_regs+reg); } static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg, unsigned int val) { - pci_write_config_dword(mac-iob_pdev, reg, val); + out_le32(mac-iob_regs+reg, val); } static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg) { - unsigned int val; - - pci_read_config_dword(mac-pdev, reg, val); - return val; + return in_le32(mac-regs+reg); } static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg, unsigned int val) { - pci_write_config_dword(mac-pdev, reg, val); + out_le32(mac-regs+reg, val); } static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg) { - unsigned int val; - - pci_read_config_dword(mac-dma_pdev, reg, val); - return val; + return in_le32(mac-dma_regs+reg); } static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg, unsigned int val) { - pci_write_config_dword(mac-dma_pdev, reg, val); + out_le32(mac-dma_regs+reg, val); } static int pasemi_get_mac_addr(struct pasemi_mac *mac) @@ -585,7 +576,6 @@ static int pasemi_mac_clean_tx(struct pa } mac-tx-next_to_clean += 
count; spin_unlock_irqrestore(mac-tx-lock, flags); - netif_wake_queue(mac-netdev); return count; @@ -1076,6 +1066,73 @@ static int pasemi_mac_poll(struct net_de } } +static void __iomem * __devinit map_onedev(struct pci_dev *p, int index) +{ + struct device_node *dn; + void __iomem *ret; + + dn = pci_device_to_OF_node(p); + if (!dn) + goto fallback; + + ret = of_iomap(dn, index); + if (!ret) + goto fallback; + + return ret; +fallback: + /* This is hardcoded and ugly, but we have some firmware versions +* that don't provide the register space in the device tree. Luckily +* they are at well-known locations so we can just do the math here. +*/ + return ioremap(0xe000 + (p-devfn 12), 0x2000); +} + +static int __devinit pasemi_mac_map_regs(struct pasemi_mac *mac) +{ + struct resource res; + struct device_node *dn; + int err; + + mac-dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL); + if (!mac-dma_pdev) { + dev_err(mac-pdev-dev, Can't find DMA Controller\n); + return -ENODEV; + } + + mac-iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL); + if (!mac-iob_pdev) { + dev_err(mac-pdev-dev, Can't find I/O Bridge\n); + return -ENODEV; + } + + mac-regs = map_onedev(mac-pdev, 0); + mac-dma_regs = map_onedev(mac-dma_pdev, 0); + mac-iob_regs = map_onedev(mac-iob_pdev, 0); + + if (!mac-regs || !mac-dma_regs || !mac-iob_regs) { + dev_err(mac-pdev-dev, Can't map registers\n); + return -ENODEV; + } + + /* The dma status structure is located in the I/O bridge, and +* is cache coherent. 
+*/ + if (!dma_status) { + dn = pci_device_to_OF_node(mac-iob_pdev); + if (dn) + err = of_address_to_resource(dn, 1, res); + if (!dn || err) { + /* Fallback for old firmware */ + res.start = 0xfd80; + res.end = res.start + 0x1000; + } + dma_status = __ioremap(res.start, res.end-res.start, 0); + } + + return 0; +} + static int __devinit pasemi_mac_probe(struct pci_dev *pdev, const struct pci_device_id *ent) { @@ -1104,21 +1161,6 @@ pasemi_mac_probe(struct pci_dev *pdev, c mac-pdev = pdev; mac-netdev = dev; - mac-dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL); - - if (!mac-dma_pdev) { -
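The v2 pasemi_mac patch swaps the config-space accessors for `in_le32()`/`out_le32()` on an ioremapped window. As a rough, portable illustration of what a little-endian 32-bit register access means, here is a user-space model over a plain byte buffer. The `model_*` names are hypothetical, and the real PowerPC accessors also handle I/O ordering, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit value at byte offset `reg` of a window. */
static uint32_t model_read_le32(const volatile uint8_t *base, unsigned reg)
{
	const volatile uint8_t *p = base + reg;

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Store `val` at byte offset `reg`, least significant byte first. */
static void model_write_le32(volatile uint8_t *base, unsigned reg,
			     uint32_t val)
{
	volatile uint8_t *p = base + reg;

	p[0] = val & 0xff;
	p[1] = (val >> 8) & 0xff;
	p[2] = (val >> 16) & 0xff;
	p[3] = val >> 24;
}
```

With the conversion folded into the accessor, a wrapper like `read_iob_reg()` can simply return the accessor's result, which is the cleanup Stephen Rothwell asked for.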
[PATCH] fix realtek phy id in forcedeth
Hi Greg, On Thu, Aug 23, 2007 at 09:55:13AM -0700, Greg KH wrote: It might help if someone sends a real patch that can be applied :) This is getting really silly now :-) We're all wasting more time wondering who will send the patch than posting it. I've lost, I got fed up first, so here it is. Please apply to mainline then stable. Thanks, Willy -- From a0e2922b99eedd9863232368ea2afe072c52783e Mon Sep 17 00:00:00 2001 From: Willy Tarreau [EMAIL PROTECTED] Date: Thu, 23 Aug 2007 21:35:41 +0200 Subject: [PATCH] fix realtek phy id in forcedeth As noticed by Chuck Ebbert, commit c5e3ae8823693b260ce1f217adca8add1bc0b3de introduced a copy-paste typo, as realtek phy is 0x732 and not 0x1c1. Obvious fix below suggested by Ayaz Abdulla. Signed-off-by: Willy Tarreau [EMAIL PROTECTED] Cc: Ayaz Abdulla [EMAIL PROTECTED] Cc: Chuck Ebbert [EMAIL PROTECTED] --- drivers/net/forcedeth.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c index 10f4e3b..1938d6d 100644 --- a/drivers/net/forcedeth.c +++ b/drivers/net/forcedeth.c @@ -552,7 +552,7 @@ union ring_type { #define PHY_OUI_MARVELL 0x5043 #define PHY_OUI_CICADA 0x03f1 #define PHY_OUI_VITESSE 0x01c1 -#define PHY_OUI_REALTEK 0x01c1 +#define PHY_OUI_REALTEK 0x0732 #define PHYID1_OUI_MASK 0x03ff #define PHYID1_OUI_SHFT 6 #define PHYID2_OUI_MASK 0xfc00 -- 1.5.2.5
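To see why the duplicated 0x01c1 mattered, it helps to look at how an OUI value like 0x0732 is built from the two MII PHY identifier registers. The sketch below assumes the driver combines PHYSID1/PHYSID2 with the masks shown in the diff context, and that `PHYID2_OUI_SHFT` is 10 (that define falls outside the quoted hunk). The register values in the test are illustrative: 0x001c/0xc912 is assumed to be an RTL8211-family reading, and 0x0007/0x0400 was picked only because it decodes to 0x01c1.

```c
#include <assert.h>
#include <stdint.h>

/* MASK/SHFT values for PHYID1 and the PHYID2 mask come from the diff
 * context; the PHYID2 shift of 10 is an assumption. */
#define PHYID1_OUI_MASK 0x03ff
#define PHYID1_OUI_SHFT 6
#define PHYID2_OUI_MASK 0xfc00
#define PHYID2_OUI_SHFT 10

/* Combine the two identifier registers into the value that the
 * PHY_OUI_* constants are compared against. */
static uint32_t phy_oui(uint16_t id1, uint16_t id2)
{
	uint32_t hi = (uint32_t)(id1 & PHYID1_OUI_MASK) << PHYID1_OUI_SHFT;
	uint32_t lo = (uint32_t)(id2 & PHYID2_OUI_MASK) >> PHYID2_OUI_SHFT;

	return hi | lo;
}
```

With `PHY_OUI_REALTEK` wrongly set to the Vitesse value, a PHY decoding to 0x0732 never matched the Realtek constant, while a Vitesse PHY matched both.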
Re: [PATCH 2.6.23] cxgb3 - Fix dev->priv usage
This patch doesn't seem to have gone in yet. Steve. David Miller wrote: From: Divy Le Ray [EMAIL PROTECTED] Date: Mon, 13 Aug 2007 12:33:04 -0700 From: Divy Le Ray [EMAIL PROTECTED] cxgb3 used netdev_priv() and dev->priv for different purposes. In 2.6.23, netdev_priv() == dev->priv, so cxgb3 needs a fix. This patch is a partial backport of Dave Miller's changes in the net-2.6.24 git branch. Signed-off-by: Divy Le Ray [EMAIL PROTECTED] Thank you for doing this backport.
Re: [PATCH] improved xfrm_audit_log() patch
From: Joy Latten [EMAIL PROTECTED] Date: Thu, 23 Aug 2007 12:15:10 -0500 For example, when auditing the addition of a policy, either xfrm_user_audit_policy_add(xp, result, skb) or pfkey_audit_policy_add(xp, result) will get called. I need two because xfrm_user gets loginuid/secid from netlink/skb and pfkey gets it from audit_get_loginuid(). Each will set up and format the audit buffer according to what it needs. Also, for deleting, there will be pfkey_audit_policy_delete(xp, result) and xfrm_user_audit_policy_delete(xp, result, skb). This sounds great. How cheap is the "auditing enabled" test? Perhaps it can even be inlined into the xfrm audit hooks.
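David's suggestion amounts to a cheap inline guard in front of an out-of-line logging path. A minimal sketch under that assumption: every name below is hypothetical (none is the real xfrm or audit API), and a counter stands in for building an audit record so that you can see when the slow path runs.

```c
#include <assert.h>

/* Hypothetical global "auditing enabled" flag and a call counter. */
static int model_audit_enabled;
static unsigned model_records_logged;

/* Out-of-line slow path: real code would allocate and format an audit
 * record here; this sketch only counts calls. */
static void model_audit_policy_add_slow(int result)
{
	(void)result;
	model_records_logged++;
}

/* Inline hook: with auditing off, the call site typically costs just a
 * load of the flag and a branch. */
static inline void model_audit_policy_add(int result)
{
	if (!model_audit_enabled)
		return;
	model_audit_policy_add_slow(result);
}
```

Keeping the formatting code out of line means both the pfkey and the xfrm_user variants could share the same inexpensive check.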
Re: [PATCH] [-MM, FIX] e1000e: incorporate napi_struct changes from net-2.6.24.git
From: Auke Kok [EMAIL PROTECTED] Date: Thu, 23 Aug 2007 07:59:11 -0700 This incorporates the new napi_struct changes into e1000e. Included bugfix for ifdown hang from Krishna Kumar for e1000. Signed-off-by: Auke Kok [EMAIL PROTECTED] Acked-by: David S. Miller [EMAIL PROTECTED]
Re: [PATCH -mm] ath5k: remove sysctl(2) support
Alexey Dobriyan wrote: sysctl(2) is supported but frozen. I've posted a similar patch yesterday: http://marc.info/?l=linux-mm-commits&m=118782442602108&w=2 Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED] --- drivers/net/wireless/ath5k_base.c | 21 ++--- 1 file changed, 6 insertions(+), 15 deletions(-) --- a/drivers/net/wireless/ath5k_base.c +++ b/drivers/net/wireless/ath5k_base.c @@ -2438,21 +2438,12 @@ static struct pci_driver ath_pci_drv_id = { .resume = ath_pci_resume, }; -/* - * Static (i.e. global) sysctls. Note that the hal sysctls - * are located under ours by sharing the setting for DEV_ATH. - */ -enum { - DEV_ATH = 9, /* XXX known by hal */ -}; - static int mincalibrate = 1; static int maxcalibrate = INT_MAX / 1000; -#define CTL_AUTO -2 /* cannot be CTL_ANY or CTL_NONE */ static ctl_table ath_static_sysctls[] = { #if AR_DEBUG - { .ctl_name = CTL_AUTO, + { .procname = "debug", .mode = 0644, .data = &ath_debug, @@ -2460,28 +2451,28 @@ static ctl_table ath_static_sysctls[] = { .proc_handler = proc_dointvec }, #endif - { .ctl_name = CTL_AUTO, + { .procname = "countrycode", .mode = 0444, .data = &countrycode, .maxlen = sizeof(countrycode), .proc_handler = proc_dointvec }, - { .ctl_name = CTL_AUTO, + { .procname = "outdoor", .mode = 0444, .data = &outdoor, .maxlen = sizeof(outdoor), .proc_handler = proc_dointvec }, - { .ctl_name = CTL_AUTO, + { .procname = "xchanmode", .mode = 0444, .data = &xchanmode, .maxlen = sizeof(xchanmode), .proc_handler = proc_dointvec }, - { .ctl_name = CTL_AUTO, + { .procname = "calibrate", .mode = 0644, .data = &ath_calinterval, @@ -2493,7 +2484,7 @@ static ctl_table ath_static_sysctls[] = { { 0 } }; static ctl_table ath_ath_table[] = { - { .ctl_name = DEV_ATH, + { .procname = "ath", .mode = 0555, .child = ath_static_sysctls Anyway thanks!
-- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University
Re: [RFC 1/1] Net: add ath5k wireless driver
On Sun, Aug 12, 2007 at 05:33:16PM +0200, Jiri Slaby wrote: add ath5k wireless driver Signed-off-by: Jiri Slaby [EMAIL PROTECTED] Review still pending, but I went ahead and added this on the 'ath5k' branch of wireless-dev. It is available on 'everything' as well. Thanks, John -- John W. Linville [EMAIL PROTECTED]
[PATCH RFC] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts with the host stack.
Roland/All, Here is the first swipe at keeping iwarp connections on their own IP addresses to avoid conflicts with the host stack. - this is a request for comments - it is not yet fully tested (I tested a prototype of the initial concept) - it still needs serialization/locking - it stays in our RDMA sandbox ;-) For background reading (if you dare), see: http://www.mail-archive.com/[EMAIL PROTECTED]/msg05162.html and http://www.mail-archive.com/netdev@vger.kernel.org/msg44312.html Also: I'm on vacation starting tomorrow until Tuesday 9/4. I'll address comments when I return... Steve. --- iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts with the host stack. Design: The sysadmin creates, for iwarp use only, alias interfaces of the form devname:iw*, where devname is the native interface name (e.g. eth0) for the iwarp netdev device. The alias label can be anything starting with "iw". The "iw" immediately after the ':' is the key used by the iwarp driver. E.g.:
ifconfig eth0 192.168.70.123 up
ifconfig eth0:iw1 192.168.71.123 up
ifconfig eth0:iw2 192.168.72.123 up
In the above example, 192.168.70/24 is for TCP traffic, while 192.168.71/24 and 192.168.72/24 are for iWARP/RDMA use. The rdma-only interface must be on its own subnet. This allows routing all rdma traffic onto this interface. The iWARP driver must translate all listens on address 0.0.0.0 to the set of rdma-only IP addresses. This prevents incoming connects to the TCP IP addresses from going up the rdma stack. Implementation Details: - The iwarp driver registers for inetaddr events via register_inetaddr_notifier(). This allows tracking the iwarp-only addresses/subnets as they get added and deleted. The iwarp driver maintains a list of the current iwarp-only addresses. - The iwarp driver builds the list of iwarp-only addresses for its devices at module insert time. This is needed because the inetaddr notifier callbacks don't replay address-add events when someone registers.
So the driver must build the initial list at module load time. - When a listen is done on address 0.0.0.0, then the iwarp driver must translate that into a set of listens on the iwarp-only addresses. - When a new iwarp-only address is added or removed, the iwarp driver must traverse the set of listening endpoints and update them accordingly. This allows an application to bind to 0.0.0.0 prior to the iwarp-only interfaces being configured. It also allows changing the iwarp-only set of addresses and getting the expected behavior for apps already bound to 0.0.0.0. Signed-off-by: Steve Wise [EMAIL PROTECTED] --- drivers/infiniband/hw/cxgb3/iwch.c| 116 + drivers/infiniband/hw/cxgb3/iwch.h| 10 + drivers/infiniband/hw/cxgb3/iwch_cm.c | 229 ++--- drivers/infiniband/hw/cxgb3/iwch_cm.h | 11 +- 4 files changed, 318 insertions(+), 48 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c index 0315c9d..da57b77 100644 --- a/drivers/infiniband/hw/cxgb3/iwch.c +++ b/drivers/infiniband/hw/cxgb3/iwch.c @@ -63,6 +63,115 @@ struct cxgb3_client t3c_client = { static LIST_HEAD(dev_list); static DEFINE_MUTEX(dev_mutex); +static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa) +{ + struct iwch_addrlist *addr; + + addr = kmalloc(sizeof *addr, GFP_KERNEL); + if (!addr) { + printk(KERN_ERR MOD %s - failed to alloc memory!\n, + __FUNCTION__); + return; + } + addr-ifa = ifa; + list_add_tail(addr-entry, rnicp-addrlist); +} + +static void remove_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa) +{ + struct iwch_addrlist *addr, *tmp; + + list_for_each_entry_safe(addr, tmp, rnicp-addrlist, entry) { + if (addr-ifa == ifa) { + list_del_init(addr-entry); + kfree(addr); + return; + } + } +} + +static int netdev_is_ours(struct iwch_dev *rnicp, struct net_device *netdev) +{ + int i; + + for (i = 0; i rnicp-rdev.port_info.nports; i++) + if (netdev == rnicp-rdev.port_info.lldevs[i]) + return 1; + return 0; +} + +static inline int 
is_iwarp_label(char *label) +{ + char *colon; + + colon = strchr(label, ':'); + if (colon !strncmp(colon+1, iw, 2)) + return 1; + return 0; +} + +static int nb_callback(struct notifier_block *self, unsigned long event, + void *ctx) +{ + struct in_ifaddr *ifa = ctx; + struct iwch_dev *rnicp = container_of(self, struct iwch_dev, nb); + + printk(KERN_INFO %s rnicp %p event %lx\n, __FUNCTION__, rnicp, event); + + switch (event) { + case NETDEV_UP: + if (netdev_is_ours(rnicp, ifa-ifa_dev-dev) + is_iwarp_label(ifa-ifa_label)) { +
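The design's central rule, that a listen on 0.0.0.0 becomes a set of listens on the iwarp-only addresses, can be sketched in user space. `is_iwarp_label()` below follows the check in the RFC patch. `struct model_ifaddr` and `expand_listen()` are hypothetical stand-ins, and passing a specific address through unchanged is an assumption of this sketch, since the RFC does not say how non-wildcard listens are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same test as is_iwarp_label() in the patch: the part of the label
 * after the ':' starts with "iw", e.g. "eth0:iw1". */
static int is_iwarp_label(const char *label)
{
	const char *colon = strchr(label, ':');

	return colon && !strncmp(colon + 1, "iw", 2);
}

/* Hypothetical stand-in for an in_ifaddr: label plus IPv4 address in
 * host byte order. */
struct model_ifaddr {
	const char *label;
	uint32_t addr;
};

/* Expand one listen request into the addresses actually listened on.
 * 0.0.0.0 becomes every iwarp-only address; any other address is passed
 * through unchanged. `out` needs room for at least n entries (and at
 * least one). Returns the number of entries written. */
static size_t expand_listen(uint32_t listen_addr,
			    const struct model_ifaddr *ifas, size_t n,
			    uint32_t *out)
{
	size_t k = 0;

	if (listen_addr != 0) {
		out[k++] = listen_addr;
		return k;
	}
	for (size_t i = 0; i < n; i++)
		if (is_iwarp_label(ifas[i].label))
			out[k++] = ifas[i].addr;
	return k;
}
```

Using the addresses from the ifconfig example, a wildcard listen would expand to 192.168.71.123 and 192.168.72.123 and never touch the TCP address 192.168.70.123. The notifier callback would simply rerun this expansion for the listening endpoints whenever an iw alias is added or removed.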
Re: [PATCH 0/3] cxgb3 driver update
Hi Al, Speaking of cxgb3, could you explain what the hell is static int do_term(struct t3cdev *dev, struct sk_buff *skb) { unsigned int hwtid = ntohl(skb->priority) >> 8 & 0xf; doing? AFAIK, skb->priority is not net-endian... The RDMA connection id is saved in the skb's priority field for TERM messages because it is not in the CPL message that comes up from the hardware. Yet the RDMA driver needs it, so sge.c::process_responses() overloads the skb's priority and csum with these values. Another odd place is int t3_seeprom_write(struct adapter *adapter, u32 addr, u32 data) { u16 val; int attempts = EEPROM_MAX_POLL; unsigned int base = adapter->params.pci.vpd_cap_addr; if ((addr >= EEPROMSIZE && addr != EEPROM_STAT_ADDR) || (addr & 3)) return -EINVAL; pci_write_config_dword(adapter->pdev, base + PCI_VPD_DATA, cpu_to_le32(data)); with callers like int t3_seeprom_wp(struct adapter *adapter, int enable) { return t3_seeprom_write(adapter, EEPROM_STAT_ADDR, enable ? 0xc : 0); IOW, you really get little-endian values passed to pci_write_config_dword(), and it expects a host-endian value as the last argument... It looks like a bug. Thanks for spotting this. Cheers, Divy
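Al's point about `t3_seeprom_write()` is that `pci_write_config_dword()` already takes a host-endian value, so wrapping the argument in `cpu_to_le32()` adds a second conversion. On a little-endian host that conversion is a no-op, which is how the bug can go unnoticed; on a big-endian host the device sees swapped bytes. The sketch below models this with hypothetical `model_*` helpers and an explicit flag that selects the host byte order. It does not call any kernel API.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t model_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Model of cpu_to_le32(): identity on a little-endian CPU, byte swap on
 * a big-endian one. */
static uint32_t model_cpu_to_le32(uint32_t v, int big_endian_host)
{
	return big_endian_host ? model_swab32(v) : v;
}

/* Model of a host-endian accessor such as pci_write_config_dword(): it
 * does any byte-order conversion itself, so the device observes exactly
 * the numeric value it was handed. */
static uint32_t model_config_write_observed(uint32_t host_val)
{
	return host_val;
}
```

In other words, the fix is simply to pass `data` directly and let the accessor deal with byte order.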
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
On Wed, 2007-22-08 at 13:21 -0700, David Miller wrote: From: Rick Jones [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 10:09:37 -0700 Should it be any more or less worrisome than small packet performance (e.g. the TCP_RR stuff I posted recently) being rather worse with TSO enabled than with it disabled? That, like any such thing shown by the batching changes, is a bug to fix. Possibly a bug - but you really should turn off TSO if you are doing huge interactive transactions (which is fair because there is a clear demarcation). The litmus test is the same as for any change that is supposed to improve net performance - it has to demonstrate that it is not intrusive and that it consistently improves performance. The standard metrics are {throughput, cpu-utilization, latency}, i.e. as long as one improves and the others remain unchanged, it would make sense. Yes, I am religious about batching after all the invested sweat (and I continue to work on it hoping to demystify it) - the theory makes a lot of sense. cheers, jamal
[PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage
From: Divy Le Ray [EMAIL PROTECTED] cxgb3 used netdev_priv() and dev-priv for different purposes. In 2.6.23, netdev_priv() == dev-priv, cxgb3 needs a fix. This patch is a partial backport of Dave Miller's changes in the net-2.6.24 git branch. Without this fix, cxgb3 crashes on 2.6.23. Signed-off-by: Divy Le Ray [EMAIL PROTECTED] --- drivers/net/cxgb3/adapter.h | 10 +++ drivers/net/cxgb3/cxgb3_main.c| 126 + drivers/net/cxgb3/cxgb3_offload.c |6 +- drivers/net/cxgb3/sge.c | 23 --- drivers/net/cxgb3/t3cdev.h|3 - 5 files changed, 100 insertions(+), 68 deletions(-) diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h index ab72563..c1dc344 100644 --- a/drivers/net/cxgb3/adapter.h +++ b/drivers/net/cxgb3/adapter.h @@ -50,7 +50,9 @@ typedef irqreturn_t(*intr_handler_t) (int, void *); struct vlan_group; +struct adapter; struct port_info { + struct adapter *adapter; struct vlan_group *vlan_grp; const struct port_type_info *port_type; u8 port_id; @@ -246,6 +248,14 @@ static inline void t3_write_reg(struct adapter *adapter, u32 reg_addr, u32 val) writel(val, adapter-regs + reg_addr); } +/* Get the t3cdev associated with a net_device */ +static inline struct t3cdev *dev2t3cdev(struct net_device *dev) +{ + const struct port_info *pi = netdev_priv(dev); + + return (struct t3cdev *)pi-adapter; +} + static inline struct port_info *adap2pinfo(struct adapter *adap, int idx) { return netdev_priv(adap-port[idx]); diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c index dc5d269..f3bf128 100644 --- a/drivers/net/cxgb3/cxgb3_main.c +++ b/drivers/net/cxgb3/cxgb3_main.c @@ -358,11 +358,14 @@ static int init_dummy_netdevs(struct adapter *adap) for (j = 0; j pi-nqsets - 1; j++) { if (!adap-dummy_netdev[dummy_idx]) { - nd = alloc_netdev(0, , ether_setup); + struct port_info *p; + + nd = alloc_netdev(sizeof(*p), , ether_setup); if (!nd) goto free_all; - nd-priv = adap; + p = netdev_priv(nd); + p-adapter = adap; nd-weight = 64; 
set_bit(__LINK_STATE_START, nd-state); adap-dummy_netdev[dummy_idx] = nd; @@ -482,7 +485,8 @@ static ssize_t attr_store(struct device *d, struct device_attribute *attr, #define CXGB3_SHOW(name, val_expr) \ static ssize_t format_##name(struct net_device *dev, char *buf) \ { \ - struct adapter *adap = dev-priv; \ + struct port_info *pi = netdev_priv(dev); \ + struct adapter *adap = pi-adapter; \ return sprintf(buf, %u\n, val_expr); \ } \ static ssize_t show_##name(struct device *d, struct device_attribute *attr, \ @@ -493,7 +497,8 @@ static ssize_t show_##name(struct device *d, struct device_attribute *attr, \ static ssize_t set_nfilters(struct net_device *dev, unsigned int val) { - struct adapter *adap = dev-priv; + struct port_info *pi = netdev_priv(dev); + struct adapter *adap = pi-adapter; int min_tids = is_offload(adap) ? MC5_MIN_TIDS : 0; if (adap-flags FULL_INIT_DONE) @@ -515,7 +520,8 @@ static ssize_t store_nfilters(struct device *d, struct device_attribute *attr, static ssize_t set_nservers(struct net_device *dev, unsigned int val) { - struct adapter *adap = dev-priv; + struct port_info *pi = netdev_priv(dev); + struct adapter *adap = pi-adapter; if (adap-flags FULL_INIT_DONE) return -EBUSY; @@ -556,9 +562,10 @@ static struct attribute_group cxgb3_attr_group = {.attrs = cxgb3_attrs }; static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr, char *buf, int sched) { - ssize_t len; + struct port_info *pi = netdev_priv(to_net_dev(d)); + struct adapter *adap = pi-adapter; unsigned int v, addr, bpt, cpt; - struct adapter *adap = to_net_dev(d)-priv; + ssize_t len; addr = A_TP_TX_MOD_Q1_Q0_RATE_LIMIT - sched / 2; rtnl_lock(); @@ -581,10 +588,11 @@ static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr, static ssize_t tm_attr_store(struct device *d, struct device_attribute *attr, const char *buf, size_t len, int sched) { + struct port_info *pi = netdev_priv(to_net_dev(d)); + struct adapter *adap = pi-adapter; + unsigned int 
val; char *endp; ssize_t ret; - unsigned int val; - struct adapter *adap = to_net_dev(d)-priv; if (!capable(CAP_NET_ADMIN)) return -EPERM; @@ -858,8
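The cxgb3 fix depends on the layout behind `alloc_netdev()`/`netdev_priv()`: the private area sits right after the device structure, so once `netdev_priv(dev) == dev->priv` the driver can no longer keep one object in `dev->priv` and a different one in the private area. The patch therefore stores a back-pointer to the adapter inside `struct port_info`. Here is a simplified user-space model of that layout. All `model_*` names and the 32-byte alignment are assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_ALIGN 32 /* assumed alignment of the private area */

struct model_netdev { char name[16]; };
struct model_adapter { int nports; };
struct model_port_info {
	struct model_adapter *adapter; /* back-pointer, as the patch adds */
	int port_id;
};

/* Size of the device struct rounded up so the private area is aligned. */
static size_t model_devsize(void)
{
	return (sizeof(struct model_netdev) + MODEL_ALIGN - 1) &
	       ~(size_t)(MODEL_ALIGN - 1);
}

/* One allocation holds the device struct followed by priv_size bytes. */
static struct model_netdev *model_alloc_netdev(size_t priv_size)
{
	return calloc(1, model_devsize() + priv_size);
}

static void *model_netdev_priv(struct model_netdev *dev)
{
	return (char *)dev + model_devsize();
}

/* Counterpart of the patch's dev2t3cdev(): reach the adapter through
 * the port_info stored in the private area. */
static struct model_adapter *model_dev2adapter(struct model_netdev *dev)
{
	const struct model_port_info *pi = model_netdev_priv(dev);

	return pi->adapter;
}
```

This is why the dummy netdevs in the patch now allocate `sizeof(*p)` bytes of private space and set `p->adapter`, rather than storing the adapter in `nd->priv`.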
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
On Thu, 2007-23-08 at 18:04 -0400, jamal wrote: The litmus test is the same as any change that is supposed to improve net performance - it has to demonstrate it is not intrusive and that it improves (consistently) performance. The standard metrics are {throughput, cpu-utilization, latency} i.e as long as one improves and others remain zero, it would make sense. Yes, i am religious for batching after all the invested sweat (and i continue to work on it hoping to demystify) - the theory makes a lot of sense. Before someone jumps and strangles me ;-> By "litmus test" I meant as applied to batching. [TSO already passed - IIRC, it has been demonstrated not to add much to throughput (it can't improve much over closeness to wire speed) but to improve CPU utilization]. cheers, jamal
[patch 02/28] NET: Share correct feature code between bridging and bonding
-stable review patch. If anyone has any objections, please let us know. -- [NET]: Share correct feature code between bridging and bonding http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the bonding driver may produce bogus combinations of the checksum flags and SG/TSO. For example, if you bond devices with NETIF_F_HW_CSUM and NETIF_F_IP_CSUM you'll end up with a bonding device that has neither flag set. If both have TSO then this produces an illegal combination. The bridge device on the other hand has the correct code to deal with this. In fact, the same code can be used for both. So this patch moves that logic into net/core/dev.c and uses it for both bonding and bridging. In the process I've made small adjustments such as only setting GSO_ROBUST if at least one constituent device supports it. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Acked-by: David S. Miller [EMAIL PROTECTED] Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 30 +- include/linux/netdevice.h |2 ++ net/bridge/br_device.c |3 ++- net/bridge/br_if.c | 28 net/core/dev.c | 38 ++ 5 files changed, 55 insertions(+), 46 deletions(-) --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1233,43 +1233,31 @@ int bond_sethwaddr(struct net_device *bo return 0; } -#define BOND_INTERSECT_FEATURES \ - (NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_TSO | NETIF_F_UFO) +#define BOND_VLAN_FEATURES \ + (NETIF_F_VLAN_CHALLENGED | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX | \ +NETIF_F_HW_VLAN_FILTER) /* * Compute the common dev-feature set available to all slaves. Some - * feature bits are managed elsewhere, so preserve feature bits set on - * master device that are not part of the examined set. + * feature bits are managed elsewhere, so preserve those feature bits + * on the master device. 
*/ static int bond_compute_features(struct bonding *bond) { - unsigned long features = BOND_INTERSECT_FEATURES; struct slave *slave; struct net_device *bond_dev = bond-dev; + unsigned long features = bond_dev-features ~BOND_VLAN_FEATURES; unsigned short max_hard_header_len = ETH_HLEN; int i; bond_for_each_slave(bond, slave, i) { - features = (slave-dev-features BOND_INTERSECT_FEATURES); + features = netdev_compute_features(features, + slave-dev-features); if (slave-dev-hard_header_len max_hard_header_len) max_hard_header_len = slave-dev-hard_header_len; } - if ((features NETIF_F_SG) - !(features NETIF_F_ALL_CSUM)) - features = ~NETIF_F_SG; - - /* -* features will include NETIF_F_TSO (NETIF_F_UFO) iff all -* slave devices support NETIF_F_TSO (NETIF_F_UFO), which -* implies that all slaves also support scatter-gather -* (NETIF_F_SG), which implies that features also includes -* NETIF_F_SG. So no need to check whether we have an -* illegal combination of NETIF_F_{TSO,UFO} and -* !NETIF_F_SG -*/ - - features |= (bond_dev-features ~BOND_INTERSECT_FEATURES); + features |= (bond_dev-features BOND_VLAN_FEATURES); bond_dev-features = features; bond_dev-hard_header_len = max_hard_header_len; --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1032,6 +1032,8 @@ extern void dev_seq_stop(struct seq_file extern void linkwatch_run_queue(void); +extern int netdev_compute_features(unsigned long all, unsigned long one); + static inline int net_gso_ok(int features, int gso_type) { int feature = gso_type NETIF_F_GSO_SHIFT; --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -179,5 +179,6 @@ void br_dev_setup(struct net_device *dev dev-priv_flags = IFF_EBRIDGE; dev-features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | - NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_GSO_ROBUST; + NETIF_F_GSO_SOFTWARE | NETIF_F_NO_CSUM | + NETIF_F_GSO_ROBUST | NETIF_F_LLTX; } --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -360,35 +360,15 @@ int br_min_mtu(const struct 
net_bridge * void br_features_recompute(struct net_bridge *br) { struct net_bridge_port *p; - unsigned long features, checksum; + unsigned long features; - checksum = br-feature_mask NETIF_F_ALL_CSUM ? NETIF_F_NO_CSUM : 0; - features = br-feature_mask ~NETIF_F_ALL_CSUM; + features = br-feature_mask; list_for_each_entry(p, br-port_list, list) { - unsigned long feature = p-dev-features; - - if (checksum NETIF_F_NO_CSUM
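The bug report is easier to follow with a concrete model. The old bonding code ANDed the slaves' flags, so HW_CSUM|SG|TSO combined with IP_CSUM|SG|TSO lost every checksum flag; the removed check then dropped SG but left TSO set. Below is a simplified model of a "downgrade instead of intersect" rule. It is not the kernel's `netdev_compute_features()`: the flag values are hypothetical, and it assumes the capability ordering NO_CSUM > HW_CSUM > IP_CSUM.

```c
#include <assert.h>

/* Hypothetical flag values for this model only; NOT the kernel's
 * NETIF_F_* bit assignments. */
#define F_SG       0x01u
#define F_IP_CSUM  0x02u  /* can checksum IPv4 TCP/UDP only */
#define F_HW_CSUM  0x04u  /* can checksum any protocol */
#define F_NO_CSUM  0x08u  /* needs no checksum at all (e.g. loopback) */
#define F_TSO      0x10u
#define F_ALL_CSUM (F_IP_CSUM | F_HW_CSUM | F_NO_CSUM)

/* Assumed ordering: NO_CSUM > HW_CSUM > IP_CSUM > none. */
static int csum_rank(unsigned f)
{
	if (f & F_NO_CSUM)
		return 3;
	if (f & F_HW_CSUM)
		return 2;
	if (f & F_IP_CSUM)
		return 1;
	return 0;
}

static const unsigned rank_flag[] = { 0, F_IP_CSUM, F_HW_CSUM, F_NO_CSUM };

/* Fold one more port's features into the running set `all`: keep the
 * weakest checksum capability instead of intersecting the raw bits,
 * then drop SG without checksumming and TSO without SG. */
static unsigned compute_features(unsigned all, unsigned one)
{
	int ra = csum_rank(all), ro = csum_rank(one);
	unsigned csum = rank_flag[ra < ro ? ra : ro];

	all = (all & one & ~F_ALL_CSUM) | csum;
	if (!(all & F_ALL_CSUM))
		all &= ~F_SG;
	if (!(all & F_SG))
		all &= ~F_TSO;
	return all;
}
```

Starting from the most permissive set (for example `F_NO_CSUM | F_SG | F_TSO`) and folding in each slave in turn yields a common set that is always a legal combination, which is the property the patch shares between bonding and bridging.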
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: jamal [EMAIL PROTECTED] Date: Thu, 23 Aug 2007 18:04:10 -0400 Possibly a bug - but you really should turn off TSO if you are doing huge interactive transactions (which is fair because there is a clear demarcation). I don't see how this can matter. TSO only ever does anything if you accumulate more than one MSS worth of data. And when that does happen, all it does is take what's in the send queue and send as much as possible at once. The packets are already built in big chunks, so there is no extra work to do. The card is going to send them back to back, and just as fast as in the non-TSO case. It doesn't change application scheduling, and it absolutely does not penalize small sends by the application unless we have a bug somewhere. So I see no reason to disable TSO other than hardware implementation deficiencies. And the drivers I am familiar with do make smart default TSO enabling decisions based upon how well the chip does TSO.
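David's claim that TSO only does anything once more than one MSS worth of data has accumulated comes down to simple arithmetic: the NIC cuts a send into ceil(len / mss) segments, so any send of at most one MSS goes out as a single segment whether or not TSO is enabled. A small sketch follows. The MSS of 1448 in the test is only an example value, typical of Ethernet with TCP timestamps.

```c
#include <assert.h>

/* Wire segments produced when `len` payload bytes are handed to a
 * TSO-capable NIC with the given MSS. len <= mss gives 1: there is
 * nothing to split. */
static unsigned tso_segments(unsigned len, unsigned mss)
{
	if (len == 0)
		return 0;
	return (len + mss - 1) / mss;
}
```

Small interactive sends therefore fall on the single-segment path, which supports the view that any TSO penalty for such traffic points to a bug rather than to an inherent cost.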
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
jamal wrote: [TSO already passed - iirc, it has been demostranted to really not add much to throughput (cant improve much over closeness to wire speed) but improve CPU utilization]. In the one gig space sure, but in the 10 Gig space, TSO on/off does make a difference for throughput. rick jones
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
On Thu, 2007-23-08 at 15:35 -0700, Rick Jones wrote: jamal wrote: [TSO already passed - iirc, it has been demostranted to really not add much to throughput (cant improve much over closeness to wire speed) but improve CPU utilization]. In the one gig space sure, but in the 10 Gig space, TSO on/off does make a difference for throughput. I am still so 1GigE ;-> I stand corrected again ;-> cheers, jamal
Re: BUG: unable to handle kernel NULL pointer dereference - linux-2.6.22
[Adding netdev to CC]

On 21/08/07, poison [EMAIL PROTECTED] wrote:
Hello, after running a few instances of bittorrent-curses on 2.6.22 - 2.6.22.3 it takes about 15 min to 2 hrs for my system to hang. 2.6.21.7 is definitely fine, 2.6.21 probably (ran for 4 hrs without hanging). If I'm lucky the Oops below makes it to my syslog (unfortunately SysRq-{p,s,i} doesn't work when it hangs, and I can't ssh into it either):

Aug 18 19:47:41 draco kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address
Aug 18 19:47:41 draco kernel: printing eip:
Aug 18 19:47:41 draco kernel: c038fcba
Aug 18 19:47:41 draco kernel: *pdpt = 33830001
Aug 18 19:47:41 draco kernel: *pde =
Aug 18 19:47:41 draco kernel: Oops: 0002 [#1]
Aug 18 19:47:41 draco kernel: SMP
Aug 18 19:47:41 draco kernel: Modules linked in: snd_hda_intel snd_emu10k1 cls_u32 sch_sfq sch_htb snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss rfcomm hidp l2cap nfsd exportfs lockd sunrpc coretemp hwmon eeprom snd_rawmidi snd_ac97_codec hci_usb ac97_bus snd_seq_device snd_util_mem snd_pcm bluetooth snd_hwdep snd_timer snd snd_page_alloc i2c_i801 emu10k1_gp gameport i2c_core sg
Aug 18 19:47:41 draco kernel: CPU:    0
Aug 18 19:47:41 draco kernel: EIP:    0060:[c038fcba]    Not tainted VLI
Aug 18 19:47:41 draco kernel: EFLAGS: 00210202   (2.6.22.2poison #14)
Aug 18 19:47:41 draco kernel: EIP is at tcp_sendmsg+0x40a/0xb70
Aug 18 19:47:41 draco kernel: eax:    ebx: ec5b807c   ecx: c04b43a0   edx: ec5b807c
Aug 18 19:47:41 draco kernel: esi: ec5b8000   edi: 0100   ebp: ec524180   esp: f3a11d30
Aug 18 19:47:41 draco kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Aug 18 19:47:41 draco kernel: Process bittorrent-curs (pid: 3974, ti=f3a1 task=f3a0e000 task.ti=f3a1)
Aug 18 19:47:41 draco kernel: Stack: ebe562f5 000b f3a11d94 ec5b807c
Aug 18 19:47:41 draco kernel:        0001 00100100 f3a11f40 0040 0200 0200 04b6
Aug 18 19:47:41 draco kernel:        08604707 00200200 f3e5c798 eeaa4b40 f3a0e000 01f5 00100100
Aug 18 19:47:41 draco kernel: Call Trace:
Aug 18 19:47:41 draco kernel:  [c03ac267] inet_sendmsg+0x37/0x70
Aug 18 19:47:41 draco kernel:  [c03511ef] sock_sendmsg+0xbf/0xf0
Aug 18 19:47:41 draco kernel:  [c012fe60] autoremove_wake_function+0x0/0x50
Aug 18 19:47:41 draco kernel:  [c01188f0] default_wake_function+0x0/0x10
Aug 18 19:47:41 draco last message repeated 3 times
Aug 18 19:47:41 draco kernel:  [c015589d] find_extend_vma+0x1d/0x70
Aug 18 19:47:41 draco kernel:  [c03515cf] sys_sendto+0x12f/0x180
Aug 18 19:47:41 draco kernel:  [c0139dfc] futex_wake+0xac/0xd0
Aug 18 19:47:41 draco kernel:  [c013a4dd] do_futex+0x6bd/0xbd0
Aug 18 19:47:41 draco kernel:  [c0351653] sys_send+0x33/0x40
Aug 18 19:47:41 draco kernel:  [c03525c2] sys_socketcall+0x142/0x280
Aug 18 19:47:41 draco kernel:  [c0205d20] copy_to_user+0x30/0x60
Aug 18 19:47:41 draco kernel:  [c0102a92] syscall_call+0x7/0xb
Aug 18 19:47:41 draco kernel:  ===
Aug 18 19:47:41 draco kernel: Code: 85 fb 06 00 00 80 ca 10 8b 83 94 00 00 00 88 53 68 f0 81 00 00 00 01 00 8b 44 24 18 ff 40 08 8b 54 24 18 8b 42 04 89 13 89 43 04 89 18 89 5a 04 8b 8e 2c 01 00 00 85 c9 0f 84 19 06 00 00 8b 83
Aug 18 19:47:41 draco kernel: EIP: [c038fcba] tcp_sendmsg+0x40a/0xb70 SS:ESP 0068:f3a11d30
Aug 18 19:47:51 draco kernel:
Aug 18 19:47:51 draco kernel: Pid: 3812, comm:                X
Aug 18 19:47:51 draco kernel: EIP: 0060:[c014a4c2] CPU: 0
Aug 18 19:47:51 draco kernel: EIP is at __get_free_pages+0x22/0x40
Aug 18 19:47:51 draco kernel: EFLAGS: 3246    Not tainted  (2.6.22.2poison #14)
Aug 18 19:47:51 draco kernel: EAX: 00d0 EBX: 00d0 ECX: c0496b40 EDX:
Aug 18 19:47:51 draco kernel: ESI: EDI: f5ba1be4 EBP: f49a4d80 DS: 007b ES: 007b FS: 00d8
Aug 18 19:47:51 draco kernel: CR0: 8005003b CR2: b7384000 CR3: 37165000 CR4: 06f0
Aug 18 19:47:51 draco kernel:  [c01734b6] __pollwait+0xa6/0x100
Aug 18 19:47:51 draco kernel:  [c03c9597] unix_poll+0x17/0xa0
Aug 18 19:47:51 draco kernel:  [c03500bc] sock_poll+0xc/0x10
Aug 18 19:47:51 draco kernel:  [c0172bec] do_select+0x25c/0x490
Aug 18 19:47:51 draco kernel:  [c0173410] __pollwait+0x0/0x100
Aug 18 19:47:51 draco kernel:  [c01188f0] default_wake_function+0x0/0x10
Aug 18 19:47:51 draco last message repeated 19 times
Aug 18 19:47:51 draco kernel:  [c0172fe8] core_sys_select+0x1c8/0x2f0
Aug 18 19:47:51 draco kernel:  [c0166a30] do_readv_writev+0x120/0x190
Aug 18 19:47:51 draco kernel:  [c03503c0] sock_aio_write+0x0/0x110
Aug 18 19:47:51 draco kernel:  [c017355d] sys_select+0x4d/0x1b0
Aug 18 19:47:51 draco kernel:  [c0166adc] vfs_writev+0x3c/0x50
Aug 18 19:47:51 draco kernel:  [c0166f97] sys_writev+0x47/0x80
Aug 18 19:47:51 draco kernel:  [c0102a92] syscall_call+0x7/0xb
Aug 18 19:47:51
2.6.22.5 forcedeth timeout hang
100% reproducible hang on xmit timeout. Just do a make -j4 modules on an NFS-mounted kernel source. Attached is the messages log.

berkley
--
E. F. Berkley Shands, MSc
Exegy Inc.
349 Marshall Road, Suite 100, St. Louis, MO 63119
Direct: (314) 218-3600 X450  Cell: (314) 303-2546
Office: (314) 218-3600  Fax: (314) 218-3601

Aug 23 18:34:55 crash kernel: [30819.690155] NETDEV WATCHDOG: eth1: transmit timed out
Aug 23 18:34:55 crash kernel: [30819.690162] eth1: Got tx_timeout. irq: 0036
Aug 23 18:34:55 crash kernel: [30819.690164] eth1: Ring at 16e086000
Aug 23 18:34:55 crash kernel: [30819.690166] eth1: Dumping tx registers
Aug 23 18:34:55 crash kernel: [30819.690171]   0: 0036 00ff 0003 024e03ca
Aug 23 18:34:55 crash kernel: [30819.690176]  20: 06255300 ff701365
Aug 23 18:34:55 crash kernel: [30819.690181]  40: 0420e20e a855 2e20
Aug 23 18:34:55 crash kernel: [30819.690186]  60:
Aug 23 18:34:55 crash kernel: [30819.690192]  80: 003b0f3c 0001 0004 007f0020 061c 0001 0020 7f87
Aug 23 18:34:55 crash kernel: [30819.690197]  a0: 0014050f 0016 5781e000 020a 0001 a800cccd fcf5
Aug 23 18:34:55 crash kernel: [30819.690203]  c0: 1002 0001 0001 0001 0001 0001 0001 0001
Aug 23 18:34:55 crash kernel: [30819.690207]  e0: 0001 0001 0001 0001 0001 0001 0001 0001
Aug 23 18:34:55 crash kernel: [30819.690213] 100: 6e086800 6e086000 007f00ff 8000 00010032 002c 6e0874c0
Aug 23 18:34:55 crash kernel: [30819.690220] 120: 6e086360 1ca37240 a000ffeb 6e0874cc 6e08636c 0fe08000
Aug 23 18:34:55 crash kernel: [30819.690225] 140: 00304120 80002600 0001 0001
Aug 23 18:34:55 crash kernel: [30819.690229] 160:
Aug 23 18:34:55 crash kernel: [30819.690235] 180: 0016 0008 0194796d 8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690241] 1a0: 0016 0008 0194796d 8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690246] 1c0: 0016 0008 0194796d 8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690252] 1e0: 0016 0008 0194796d 8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690257] 200:
Aug 23 18:34:55 crash kernel: [30819.690261] 220:
Aug 23 18:34:55 crash kernel: [30819.690266] 240:
Aug 23 18:34:55 crash kernel: [30819.690271] 260: fe020001 0100 7e020001 0100
Aug 23 18:34:55 crash kernel: [30819.690276] 280:
Aug 23 18:34:55 crash kernel: [30819.690280] 2a0:
Aug 23 18:34:55 crash kernel: [30819.690285] 2c0: 0001 0001 0001
Aug 23 18:34:55 crash kernel: [30819.690287] eth1: Dumping tx ring
Aug 23 18:34:55 crash kernel: [30819.690292] 000: 8fd00892 2052 // 88115c92 2052 // 875ae892 2052 // 8a660492 2052
Aug 23 18:34:55 crash kernel: [30819.690298] 004: 0001 61fdb492 2052 // 8bf3f892 2052 // 8daa7092 2052 // 8fa29892 2052
Aug 23 18:34:55 crash kernel: [30819.690304] 008: 0001 0d558892 2052 // 8e0bf892 2052 // 8fd00492 2052 // 8d160092 2052
Aug 23 18:34:55 crash kernel: [30819.690310] 00c: 0001 27698092 2052 // 7fc6cc92 2052 // 8d03ec92 2052 // 88085492 2052
Aug 23 18:34:55 crash kernel: [30819.690317] 010: 850ee492 2052 // 8bba8c92 2052 // 0001 56108492 2052 //
[PATCH 14/30] net: Kill some unneeded allocation return value casts in libertas
kmalloc() and friends return void*, no need to cast it.

Signed-off-by: Jesper Juhl [EMAIL PROTECTED]
---
 drivers/net/wireless/libertas/debugfs.c |    2 +-
 drivers/net/wireless/libertas/ethtool.c |    3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/libertas/debugfs.c b/drivers/net/wireless/libertas/debugfs.c
index 715cbda..6ade63e 100644
--- a/drivers/net/wireless/libertas/debugfs.c
+++ b/drivers/net/wireless/libertas/debugfs.c
@@ -1839,7 +1839,7 @@ static ssize_t wlan_debugfs_write(struct file *f, const char __user *buf,
 	char *p2;
 	struct debug_data *d = (struct debug_data *)f->private_data;
 
-	pdata = (char *)kmalloc(cnt, GFP_KERNEL);
+	pdata = kmalloc(cnt, GFP_KERNEL);
 	if (pdata == NULL)
 		return 0;
 
diff --git a/drivers/net/wireless/libertas/ethtool.c b/drivers/net/wireless/libertas/ethtool.c
index 96f1974..7dad493 100644
--- a/drivers/net/wireless/libertas/ethtool.c
+++ b/drivers/net/wireless/libertas/ethtool.c
@@ -60,8 +60,7 @@ static int libertas_ethtool_get_eeprom(struct net_device *dev,
 //	mutex_lock(priv->mutex);
-	adapter->prdeeprom =
-		(char *)kmalloc(eeprom->len+sizeof(regctrl), GFP_KERNEL);
+	adapter->prdeeprom = kmalloc(eeprom->len+sizeof(regctrl), GFP_KERNEL);
 	if (!adapter->prdeeprom)
 		return -ENOMEM;
 	memcpy(adapter->prdeeprom, regctrl, sizeof(regctrl));
-- 
1.5.2.2
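The reason the cast is redundant is a C language rule: a `void *` converts implicitly to any object pointer type on assignment (this is not true in C++). A minimal user-space sketch, assuming `malloc()` as a stand-in for `kmalloc()`; `dup_bytes` is a hypothetical helper, not libertas code.

```c
#include <stdlib.h>
#include <string.h>

/* malloc() returns void *, which C converts to char * implicitly,
 * so no (char *) cast is needed, exactly as with kmalloc(). */
static char *dup_bytes(const char *src, size_t cnt)
{
    char *pdata = malloc(cnt);   /* no cast required */

    if (pdata == NULL)
        return NULL;
    memcpy(pdata, src, cnt);
    return pdata;
}
```

Dropping the cast also means the compiler will still warn if the allocator's prototype is missing, which an explicit cast could hide in older C.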
[PATCH 16/30] net: Avoid pointless allocation casts in BSD compression module
The general kernel memory allocation functions return void pointers and there is no need to cast their return values.

Signed-off-by: Jesper Juhl [EMAIL PROTECTED]
---
 drivers/net/bsd_comp.c |    6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bsd_comp.c b/drivers/net/bsd_comp.c
index 202d4a4..88edb98 100644
--- a/drivers/net/bsd_comp.c
+++ b/drivers/net/bsd_comp.c
@@ -406,8 +406,7 @@ static void *bsd_alloc (unsigned char *options, int opt_len, int decomp)
      * Allocate space for the dictionary. This may be more than one page in
      * length.
      */
-    db->dict = (struct bsd_dict *) vmalloc (hsize *
-					    sizeof (struct bsd_dict));
+    db->dict = vmalloc(hsize * sizeof(struct bsd_dict));
     if (!db->dict)
       {
 	bsd_free (db);
@@ -426,8 +425,7 @@ static void *bsd_alloc (unsigned char *options, int opt_len, int decomp)
      */
     else
       {
-	db->lens = (unsigned short *) vmalloc ((maxmaxcode + 1) *
-					       sizeof (db->lens[0]));
+	db->lens = vmalloc((maxmaxcode + 1) * sizeof(db->lens[0]));
 	if (!db->lens)
 	  {
 	    bsd_free (db);
-- 
1.5.2.2
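The second hunk also shows a useful companion idiom: sizing the allocation by the element (`sizeof(db->lens[0])`) rather than by a spelled-out type, so the size stays right if the field's type ever changes. A minimal sketch, assuming `malloc()` in place of `vmalloc()`; `struct dict_state` and `alloc_lens` are hypothetical stand-ins, not bsd_comp code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the bsd_comp dictionary state. */
struct dict_state {
    unsigned short *lens;
};

/* No cast on the allocator result, and the element size is taken from
 * the pointer itself, so changing the type of `lens` needs no edit here. */
static int alloc_lens(struct dict_state *db, size_t maxmaxcode)
{
    db->lens = malloc((maxmaxcode + 1) * sizeof(db->lens[0]));
    return db->lens != NULL ? 0 : -1;
}
```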
Problem with implementation of TCP_DEFER_ACCEPT?
I'd welcome the views of those familiar with TCP_DEFER_ACCEPT on a recent issue I've worked on, where connections between a Juniper DX (aka redline) load-balancer and an Apache 2.2 cluster caused random connection failures. Today, after 2 weeks debugging the issue, we confirmed the problem was related to TCP_DEFER_ACCEPT.

Part of the issue is caused by Juniper's implementation of persistent connections, but there remains a question as to whether the Linux kernel is correctly handling handshakes when a listening socket has TCP_DEFER_ACCEPT enabled. Upon reflection, and after having worked with the RFCs these past few weeks, I'm finding myself doubting the kernel's TCP_DEFER_ACCEPT implementation. Also, I'm unable to locate an RFC or other specification for TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER - can you point me to one?

The complete background and observations of the original problem and the workaround are available here: https://bugs.launchpad.net/ubuntu/+bug/134274

My specific concerns are explained in the following comments, for which I'd appreciate your views.

An RFC 793 standard TCP handshake requires three packets:

client  -- SYN -->                server  LISTENING
client  <-- SYN ACK --            server  SYN_RECEIVED
client  -- ACK -->                server  ESTABLISHED
client  -- PSH ACK + data -->     server

TCP_DEFER_ACCEPT is designed to increase performance by reducing the number of TCP packets exchanged before the client can pass data:

client  -- SYN -->                server  LISTENING
client  <-- SYN ACK --            server  SYN_RECEIVED
client  -- PSH ACK + data -->     server  ESTABLISHED

At present, with TCP_DEFER_ACCEPT the kernel treats the RFC 793 handshake as invalid, dropping the ACK from the client without replying, so the client doesn't know the server has in fact set its internal ACKed flag. If the client doesn't send a packet containing data before the SYN_ACK time-outs finally expire, the connection will be dropped.
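How long a client has to send data before the half-open connection is dropped follows from the SYN-ACK backoff. A minimal sketch, assuming a pure doubling timer that starts at 3 s, which is what the trace in this report shows (3, 6, 12, 24, 48 and 96 s, i.e. five duplicate SYN-ACKs); `synack_lifetime` is a hypothetical helper, not a kernel function.

```c
/* Total time a half-open connection survives when the SYN-ACK timer starts
 * at `initial` seconds and doubles after each of `retries` retransmissions.
 * Counts the original SYN-ACK's timeout plus one timeout per retry. */
static unsigned int synack_lifetime(unsigned int initial, unsigned int retries)
{
    unsigned int total = 0, timeout = initial;
    unsigned int i;

    for (i = 0; i <= retries; i++) {
        total += timeout;
        timeout *= 2;
    }
    return total;
}
```

With the values from the trace, `synack_lifetime(3, 5)` gives 3 + 6 + 12 + 24 + 48 + 96 = 189 s, so a strictly RFC 793 client sits in limbo for a little over three minutes before the server gives up.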
For a client obeying RFC 793 what we see is:

client  -- SYN -->                server  LISTENING
client  <-- SYN ACK --            server  SYN_RECEIVED (time-out 3s)
                                  server: inet_rsk(req)->acked = 1
client  -- ACK -->                server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 6s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 12s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 24s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 48s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 96s)
client  -- ACK (DUP) -->          server  (discarded)
                                  server: half-open socket closed.

With each client ACK being dropped by the kernel's TCP_DEFER_ACCEPT mechanism, the handshake eventually fails after the 'SYN ACK' retries and time-outs expire.

There is a case for arguing the kernel should be operating in an enhanced handshaking mode when TCP_DEFER_ACCEPT is enabled, not an alternative mode, and therefore should accept *both* RFC 793 and TCP_DEFER_ACCEPT. I've been unable to find a specification or RFC for implementing TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER to give me firm guidance.

It seems incorrect to penalise a client that is trying to complete the handshake according to the RFC 793 specification, especially as the client has no way of knowing ahead of time whether or not the server is operating deferred accept.

---
net/ipv4/tcp_minisocks.c::tcp_check_req() implements the TCP_DEFER_ACCEPT check:

	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
		inet_rsk(req)->acked = 1;
		return NULL;
	}

Thanks,
TJ.
Ubuntu ACPI Kernel Team
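The "bare ACK" test in `tcp_check_req()` works because the handshake-completing ACK carries no payload, so its end sequence number is exactly the client's ISN plus one (the SYN consumed one sequence number). The sketch below models only that comparison; `struct seg_model` and the helper names are hypothetical, and the end_seq formula (seq plus SYN, FIN and payload length) is a simplified assumption about how `TCP_SKB_CB(skb)->end_seq` is filled in.

```c
#include <stdint.h>

/* Simplified model of the segment fields the check relies on. */
struct seg_model {
    uint32_t seq;   /* first sequence number of the segment */
    uint32_t len;   /* payload bytes */
    int syn, fin;   /* flags that each consume one sequence number */
};

static uint32_t seg_end_seq(const struct seg_model *s)
{
    return s->seq + (uint32_t)s->syn + (uint32_t)s->fin + s->len;
}

/* Mirrors the comparison in tcp_check_req(): end_seq == rcv_isn + 1 means
 * the segment carries no data, i.e. it is the bare third-handshake ACK.
 * Unsigned arithmetic makes the test correct across sequence wraparound. */
static int is_bare_ack(const struct seg_model *s, uint32_t rcv_isn)
{
    return seg_end_seq(s) == rcv_isn + 1;
}
```

A data-bearing `PSH ACK` with the same starting sequence number fails the test and completes the connection, which is exactly the asymmetry TJ is describing.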
[PATCH 2/3] [IPROUTE2] ip: xfrm: Fix policy and state flags.
o Support policy flag with string format. Note that kernel defines only one name "localok" for the flag and it has not had any effect currently.
o Support state flag value XFRM_STATE_NOPMTUDISC.
o Fix to show detailed flags value when -s option is used.
o Fix minor typo.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 ip/ipxfrm.c      |   18 +---
 ip/xfrm.h        |    1 +
 ip/xfrm_policy.c |   55 -
 ip/xfrm_state.c  |    6 +++-
 4 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index d9b0e3b..359a2d2 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -745,12 +745,13 @@ void xfrm_state_info_print(struct xfrm_usersa_info *xsinfo,
 		fprintf(fp, "flag ");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_NOECN, "noecn");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_DECAP_DSCP, "decap-dscp");
+		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_NOPMTUDISC, "nopmtudisc");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_WILDRECV, "wildrecv");
 		if (flags)
 			fprintf(fp, "%x", flags);
-		if (show_stats > 0)
-			fprintf(fp, "(0x%s)", strxf_mask8(flags));
 	}
+	if (show_stats > 0)
+		fprintf(fp, "(0x%s)", strxf_mask8(xsinfo->flags));
 	fprintf(fp, "%s", _SL_);
 
 	xfrm_xfrma_print(tb, xsinfo->family, fp, buf);
@@ -845,10 +846,19 @@ void xfrm_policy_info_print(struct xfrm_userpolicy_info *xpinfo,
 	}
 	fprintf(fp, " ");
 
-	if (show_stats > 0) {
+	if (show_stats > 0)
 		fprintf(fp, "share %s ", strxf_share(xpinfo->share));
-		fprintf(fp, "flag 0x%s", strxf_mask8(xpinfo->flags));
+
+	if (show_stats > 0 || xpinfo->flags) {
+		__u8 flags = xpinfo->flags;
+
+		fprintf(fp, "flag ");
+		XFRM_FLAG_PRINT(fp, flags, XFRM_POLICY_LOCALOK, "localok");
+		if (flags)
+			fprintf(fp, "%x", flags);
 	}
+	if (show_stats > 0)
+		fprintf(fp, "(0x%s)", strxf_mask8(xpinfo->flags));
 	fprintf(fp, "%s", _SL_);
 
 	if (show_stats > 0)
diff --git a/ip/xfrm.h b/ip/xfrm.h
index 71345b9..335c2a5 100644
--- a/ip/xfrm.h
+++ b/ip/xfrm.h
@@ -98,6 +98,7 @@ struct xfrm_filter {
 	__u32 index_mask;
 	__u8 action_mask;
 	__u32 priority_mask;
+	__u8 policy_flags_mask;
 
 	__u8 ptype;
 	__u8 ptype_mask;
diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index f4488ac..419ca67 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -54,10 +54,10 @@ static void usage(void) __attribute__((noreturn));
 static void usage(void)
 {
 	fprintf(stderr, "Usage: ip xfrm policy { add | update } dir DIR SELECTOR [ index INDEX ] [ ptype PTYPE ]\n");
-	fprintf(stderr, "[ action ACTION ] [ priority PRIORITY ] [ LIMIT-LIST ] [ TMPL-LIST ]\n");
+	fprintf(stderr, "[ action ACTION ] [ priority PRIORITY ] [ flag FLAG-LIST ] [ LIMIT-LIST ] [ TMPL-LIST ]\n");
 	fprintf(stderr, "Usage: ip xfrm policy { delete | get } dir DIR [ SELECTOR | index INDEX ] [ ptype PTYPE ]\n");
 	fprintf(stderr, "Usage: ip xfrm policy { deleteall | list } [ dir DIR ] [ SELECTOR ]\n");
-	fprintf(stderr, "[ index INDEX ] [ action ACTION ] [ priority PRIORITY ]\n");
+	fprintf(stderr, "[ index INDEX ] [ action ACTION ] [ priority PRIORITY ] [ flag FLAG-LIST ]\n");
 	fprintf(stderr, "Usage: ip xfrm policy flush [ ptype PTYPE ]\n");
 	fprintf(stderr, "Usage: ip xfrm count\n");
 	fprintf(stderr, "PTYPE := [ main | sub ](default=main)\n");
@@ -74,6 +74,9 @@ static void usage(void)
 
 	//fprintf(stderr, "PRIORITY - priority value(default=0)\n");
 
+	fprintf(stderr, "FLAG-LIST := [ FLAG-LIST ] FLAG\n");
+	fprintf(stderr, "FLAG := [ localok ]\n");
+
 	fprintf(stderr, "LIMIT-LIST := [ LIMIT-LIST ] | [ limit LIMIT ]\n");
 	fprintf(stderr, "LIMIT := [ [time-soft|time-hard|time-use-soft|time-use-hard] SECONDS ] |\n");
 	fprintf(stderr, "[ [byte-soft|byte-hard] SIZE ] | [ [packet-soft|packet-hard] NUMBER ]\n");
@@ -135,6 +138,39 @@ static int xfrm_policy_ptype_parse(__u8 *ptype, int *argcp, char ***argvp)
 	return 0;
 }
 
+static int xfrm_policy_flag_parse(__u8 *flags, int *argcp, char ***argvp)
+{
+	int argc = *argcp;
+	char **argv = *argvp;
+	int len = strlen(*argv);
+
+	if (len > 2 && strncmp(*argv, "0x", 2) == 0) {
+		__u8 val = 0;
+
+		if (get_u8(&val, *argv, 16))
+			invarg("\"FLAG\" is invalid", *argv);
+		*flags = val;
+	} else {
+		while (1) {
+			if (strcmp(*argv, "localok") == 0)
+				*flags |= XFRM_POLICY_LOCALOK;
+			else {
+				PREV_ARG(); /* back track */
+				break;
+
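The new `FLAG-LIST` grammar accepts either one hex value (`0x..`) or a sequence of flag names. The following is a stand-alone user-space sketch of that grammar, not the iproute2 function: `parse_policy_flags` and `POLICY_FLAG_LOCALOK` are hypothetical names, the constant is assumed to mirror `XFRM_POLICY_LOCALOK` (1 in `linux/xfrm.h`), and `strtoul()` replaces iproute2's `get_u8()`.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed to mirror XFRM_POLICY_LOCALOK from linux/xfrm.h. */
#define POLICY_FLAG_LOCALOK 0x01

/* Parse "0xNN" or a list of flag names into *flags.
 * Returns the number of arguments consumed, or -1 on a bad hex value. */
static int parse_policy_flags(int argc, char **argv, unsigned char *flags)
{
    int i;

    if (argc > 0 && strlen(argv[0]) > 2 && strncmp(argv[0], "0x", 2) == 0) {
        char *end;
        unsigned long val = strtoul(argv[0], &end, 16);

        if (*end != '\0' || val > 0xff)
            return -1;
        *flags = (unsigned char)val;
        return 1;
    }
    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "localok") == 0)
            *flags |= POLICY_FLAG_LOCALOK;
        else
            break;   /* the real parser backs up with PREV_ARG() here */
    }
    return i;
}
```

Returning the consumed count plays the role of `PREV_ARG()` in the patch: the first unrecognised word is left for the caller's option loop.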
[PATCH 1/3] [IPROUTE2] ip: xfrm: Clean-up for internal mask to filter.
Remove unused or redundant usage for xfrm_filter.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 ip/xfrm_policy.c |   17 -
 ip/xfrm_state.c  |    2 --
 2 files changed, 0 insertions(+), 19 deletions(-)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index c1086f1..f4488ac 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -222,16 +222,10 @@ static int xfrm_policy_modify(int cmd, unsigned flags, int argc, char **argv)
 			NEXT_ARG();
 			xfrm_policy_dir_parse(&req.xpinfo.dir, &argc, &argv);
-
-			filter.dir_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "index") == 0) {
 			NEXT_ARG();
 			if (get_u32(&req.xpinfo.index, *argv, 0))
 				invarg("\"INDEX\" is invalid", *argv);
-
-			filter.index_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "ptype") == 0) {
 			if (ptypep)
 				duparg("ptype", *argv);
@@ -239,9 +233,6 @@ static int xfrm_policy_modify(int cmd, unsigned flags, int argc, char **argv)
 			NEXT_ARG();
 			xfrm_policy_ptype_parse(&upt.type, &argc, &argv);
-
-			filter.dir_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "action") == 0) {
 			NEXT_ARG();
 			if (strcmp(*argv, "allow") == 0)
@@ -250,16 +241,10 @@ static int xfrm_policy_modify(int cmd, unsigned flags, int argc, char **argv)
 				req.xpinfo.action = XFRM_POLICY_BLOCK;
 			else
 				invarg("\"action\" value is invalid\n", *argv);
-
-			filter.action_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "priority") == 0) {
 			NEXT_ARG();
 			if (get_u32(&req.xpinfo.priority, *argv, 0))
 				invarg("\"PRIORITY\" is invalid", *argv);
-
-			filter.priority_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "limit") == 0) {
 			NEXT_ARG();
 			xfrm_lifetime_cfg_parse(&req.xpinfo.lft, &argc, &argv);
@@ -888,8 +873,6 @@ static int xfrm_policy_flush(int argc, char **argv)
 			NEXT_ARG();
 			xfrm_policy_ptype_parse(&upt.type, &argc, &argv);
-
-			filter.dir_mask = XFRM_FILTER_MASK_FULL;
 		} else
 			invarg("unknown", *argv);
 
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 54e1330..2b68f49 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -216,8 +216,6 @@ static int xfrm_state_flag_parse(__u8 *flags, int *argcp, char ***argvp)
 		}
 	}
 
-	filter.state_flags_mask = XFRM_FILTER_MASK_FULL;
-
 	*argcp = argc;
 	*argvp = argv;
 
-- 
1.4.4.2
[PATCH 3/3] [IPROUTE2] ip: xfrm: Fix flush message.
Fix xfrm state or policy flush message.
And minor updates are included:
o Use static buffer to show unknown value as string.
o Show policy type (ptype) only when kernel specified it.
o Clean-up xfrm_monitor.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 ip/ipxfrm.c       |   48 +
 ip/xfrm.h         |    1 +
 ip/xfrm_monitor.c |  122 +---
 ip/xfrm_state.c   |    1 -
 4 files changed, 117 insertions(+), 55 deletions(-)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index 359a2d2..80dbb52 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -114,6 +114,7 @@ struct typeent {
 static const struct typeent xfrmproto_types[]= {
 	{ "esp", IPPROTO_ESP }, { "ah", IPPROTO_AH }, { "comp", IPPROTO_COMP },
 	{ "route2", IPPROTO_ROUTING }, { "hao", IPPROTO_DSTOPTS },
+	{ "ipsec-any", IPSEC_PROTO_ANY },
 	{ NULL, -1 }
 };
 
@@ -135,6 +136,7 @@ int xfrm_xfrmproto_getbyname(char *name)
 
 const char *strxf_xfrmproto(__u8 proto)
 {
+	static char str[16];
 	int i;
 
 	for (i = 0; ; i++) {
@@ -146,7 +148,8 @@ const char *strxf_xfrmproto(__u8 proto)
 			return t->t_name;
 	}
 
-	return NULL;
+	sprintf(str, "%u", proto);
+	return str;
 }
 
 static const struct typeent algo_types[]= {
@@ -172,6 +175,7 @@ int xfrm_algotype_getbyname(char *name)
 
 const char *strxf_algotype(int type)
 {
+	static char str[32];
 	int i;
 
 	for (i = 0; ; i++) {
@@ -183,7 +187,8 @@ const char *strxf_algotype(int type)
 			return t->t_name;
 	}
 
-	return NULL;
+	sprintf(str, "%d", type);
+	return str;
 }
 
 const char *strxf_mask8(__u8 mask)
@@ -251,6 +256,25 @@ const char *strxf_proto(__u8 proto)
 	return p;
 }
 
+const char *strxf_ptype(__u8 ptype)
+{
+	static char str[16];
+
+	switch (ptype) {
+	case XFRM_POLICY_TYPE_MAIN:
+		strcpy(str, "main");
+		break;
+	case XFRM_POLICY_TYPE_SUB:
+		strcpy(str, "sub");
+		break;
+	default:
+		sprintf(str, "%u", ptype);
+		break;
+	}
+
+	return str;
+}
+
 void xfrm_id_info_print(xfrm_address_t *saddr, struct xfrm_id *id,
 			__u8 mode, __u32 reqid, __u16 family, int force_spi,
 			FILE *fp, const char *prefix, const char *title)
@@ -776,7 +800,6 @@ void xfrm_policy_info_print(struct xfrm_userpolicy_info *xpinfo,
 			    const char *title)
 {
 	char buf[STRBUF_SIZE];
-	__u8 ptype = XFRM_POLICY_TYPE_MAIN;
 
 	memset(buf, '\0', sizeof(buf));
 
@@ -821,31 +844,18 @@ void xfrm_policy_info_print(struct xfrm_userpolicy_info *xpinfo,
 	fprintf(fp, "index %u ", xpinfo->index);
 	fprintf(fp, "priority %u ", xpinfo->priority);
 
-	fprintf(fp, "ptype ");
-
 	if (tb[XFRMA_POLICY_TYPE]) {
 		struct xfrm_userpolicy_type *upt;
 
+		fprintf(fp, "ptype ");
+
 		if (RTA_PAYLOAD(tb[XFRMA_POLICY_TYPE]) < sizeof(*upt))
 			fprintf(fp, "(ERROR truncated)");
 
 		upt = (struct xfrm_userpolicy_type *)RTA_DATA(tb[XFRMA_POLICY_TYPE]);
-		ptype = upt->type;
+		fprintf(fp, "%s ", strxf_ptype(upt->type));
 	}
 
-	switch (ptype) {
-	case XFRM_POLICY_TYPE_MAIN:
-		fprintf(fp, "main");
-		break;
-	case XFRM_POLICY_TYPE_SUB:
-		fprintf(fp, "sub");
-		break;
-	default:
-		fprintf(fp, "%u", ptype);
-		break;
-	}
-	fprintf(fp, " ");
-
 	if (show_stats > 0)
 		fprintf(fp, "share %s ", strxf_share(xpinfo->share));
 
diff --git a/ip/xfrm.h b/ip/xfrm.h
index 335c2a5..930bb3f 100644
--- a/ip/xfrm.h
+++ b/ip/xfrm.h
@@ -127,6 +127,7 @@ const char *strxf_mask8(__u8 mask);
 const char *strxf_mask32(__u32 mask);
 const char *strxf_share(__u8 share);
 const char *strxf_proto(__u8 proto);
+const char *strxf_ptype(__u8 ptype);
 void xfrm_id_info_print(xfrm_address_t *saddr, struct xfrm_id *id,
 			__u8 mode, __u32 reqid, __u16 family, int force_spi,
 			FILE *fp, const char *prefix, const char *title);
diff --git a/ip/xfrm_monitor.c b/ip/xfrm_monitor.c
index bdbf4a6..dc12fca 100644
--- a/ip/xfrm_monitor.c
+++ b/ip/xfrm_monitor.c
@@ -50,12 +50,6 @@ static int xfrm_acquire_print(const struct sockaddr_nl *who,
 	struct rtattr * tb[XFRMA_MAX+1];
 	__u16 family;
 
-	if (n->nlmsg_type != XFRM_MSG_ACQUIRE) {
-		fprintf(stderr, "Not an acquire: %08x %08x %08x\n",
-			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
-		return 0;
-	}
-
 	len -= NLMSG_LENGTH(sizeof(*xacq));
 	if (len < 0) {
 		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
@@ -108,6 +102,74 @@ static int xfrm_acquire_print(const struct sockaddr_nl *who,
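The "static buffer" change replaces a `NULL` return for unknown values with their numeric form, so callers can print the result unconditionally. A stand-alone sketch of that lookup-with-fallback pattern; `struct name_ent`, `ptype_names` and `name_or_number` are hypothetical, and the values 0 and 1 are assumed to match `XFRM_POLICY_TYPE_MAIN` and `XFRM_POLICY_TYPE_SUB`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table in the style of ipxfrm.c's struct typeent. */
struct name_ent {
    const char *name;
    int value;
};

static const struct name_ent ptype_names[] = {
    { "main", 0 }, { "sub", 1 }, { NULL, -1 }
};

/* Return the name for `value`, or its decimal form in a static buffer.
 * The buffer is reused on every call, so each result must be consumed
 * (e.g. printed) before calling again. */
static const char *name_or_number(const struct name_ent *tab, unsigned int value)
{
    static char str[16];
    int i;

    for (i = 0; tab[i].name != NULL; i++)
        if ((unsigned int)tab[i].value == value)
            return tab[i].name;
    snprintf(str, sizeof(str), "%u", value);
    return str;
}
```

The trade-off of a static buffer is that the function is not reentrant; that is acceptable in a single-threaded command-line tool such as `ip`.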
[PATCH 0/3] [IPROUTE2] ip command updates
Hello,

Here are some updates for the ip command. They are mostly minor fixes, not changes related to the new 2.6.23 features. Please apply if it is not too late for the next release.

--
Masahide NAKAMURA
Re: [PATCH 4/9] s2io, rename BIT macro
Jiri Slaby wrote:
s2io, rename BIT macro

BIT macro will be global definition of (1<<x)

Signed-off-by: Jiri Slaby [EMAIL PROTECTED]
---
[snip]
 		cnt++;
 		if (cnt == 5)
diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index 92983ee..448f899 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -14,7 +14,7 @@
 #define _S2IO_H
 
 #define TBD 0
-#define BIT(loc)		(0x8000000000000000ULL >> (loc))
+#define s2BIT(loc)		(0x8000000000000000ULL >> (loc))
 #define vBIT(val, loc, sz)	(((u64)val) << (64-loc-sz))
 #define INV(d)  ((d&0xff)<<24) | (((d>>8)&0xff)<<16) | (((d>>16)&0xff)<<8)| ((d>>24)&0xff)

Sorry for the late response, but would it not be better/easier to use BIT() instead (or a global #define LLBIT(nr) (1ULL << (nr))) and just recalculate the values?

Richard Knutsson
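"Recalculating the values" is mechanical because s2io numbers bits from the most significant end (bit 0 is the MSB), while a generic `1ULL << nr` macro counts from the least significant end. A sketch of the mapping; `LLBIT` follows Richard's suggestion and `s2bit_to_llbit_pos` is a hypothetical helper.

```c
/* s2io style: bit 0 is the most significant bit of a 64-bit register. */
#define s2BIT(loc)  (0x8000000000000000ULL >> (loc))
/* Generic style along the lines suggested above: bit 0 is the LSB. */
#define LLBIT(nr)   (1ULL << (nr))

/* Converting an s2io bit position to a generic one mirrors it. */
static int s2bit_to_llbit_pos(int loc)
{
    return 63 - loc;
}
```

So every `s2BIT(n)` in the driver would become `LLBIT(63 - n)`, which is the per-constant rewrite the rename avoids.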
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
On Thu, 23 Aug 2007, Rick Jones wrote:
jamal wrote:
[TSO already passed - iirc, it has been demonstrated to really not add much to throughput (can't improve much over closeness to wire speed) but improve CPU utilization].
In the one gig space sure, but in the 10 Gig space, TSO on/off does make a difference for throughput.

Not too much.

TSO enabled:

[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11813.4375 MB /  10.00 sec = 9906.1644 Mbps 99 %TX 80 %RX

TSO disabled:

[EMAIL PROTECTED] ~]# ethtool -K eth2 tso off
[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11818.2500 MB /  10.00 sec = 9910.0176 Mbps 100 %TX 78 %RX

Pretty negligible difference, it seems. This is with a 2.6.20.7 kernel, Myricom 10-GigE NICs, and 9000-byte jumbo frames, in a LAN environment.

For grins, I also did a couple of tests with an MSS of 1460 to emulate a standard 1500-byte Ethernet MTU.
TSO enabled:

[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
5102.8503 MB /  10.06 sec = 4253.9124 Mbps 39 %TX 99 %RX

TSO disabled:

[EMAIL PROTECTED] ~]# ethtool -K eth2 tso off
[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
5399.5625 MB /  10.00 sec = 4527.9070 Mbps 99 %TX 76 %RX

Here you can see there is a major difference in the TX CPU utilization (99 % with TSO disabled versus only 39 % with TSO enabled), although the TSO-disabled case was able to squeeze out a little extra performance from its extra CPU utilization. Interestingly, with TSO enabled, the receiver actually consumed more CPU than with TSO disabled, so I guess the receiver CPU saturation in that case (99 %) was what restricted its performance somewhat (this was consistent across a few test runs).

-Bill
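The MSS 1460 numbers can be reduced to two ratios: how much extra throughput TSO-off bought, and how much throughput each percent of sender CPU delivered. A small sketch using only the figures reported above; "Mbps per %TX" is an ad hoc ratio chosen for illustration, not a standard benchmark metric, and the function names are hypothetical.

```c
/* Percentage change from `base` to `other`. */
static double pct_gain(double base, double other)
{
    return (other - base) / base * 100.0;
}

/* Throughput delivered per percent of transmit-side CPU. */
static double mbps_per_tx_pct(double mbps, double tx_pct)
{
    return mbps / tx_pct;
}
```

With the reported values, `pct_gain(4253.9124, 4527.9070)` is about 6.4 %, while `mbps_per_tx_pct` drops from roughly 109 (TSO on, 39 %TX) to roughly 46 (TSO off, 99 %TX): about 6 % more throughput for about 2.5 times the sender CPU.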
[no subject]
subscribe netdev
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
On Thu, 23 Aug 2007 18:38:22 -0400 jamal [EMAIL PROTECTED] wrote:
On Thu, 2007-23-08 at 15:30 -0700, David Miller wrote:
From: jamal [EMAIL PROTECTED]
Date: Thu, 23 Aug 2007 18:04:10 -0400
Possibly a bug - but you really should turn off TSO if you are doing huge interactive transactions (which is fair because there is a clear demarcation).
I don't see how this can matter. TSO only ever does anything if you accumulate more than one MSS worth of data.
I stand corrected then.
cheers,
jamal

For most normal Internet TCP connections, you will see only 2 or 3 packets per TSO because of ACK clocking. If you turn off delayed ACK on the receiver it will be even less.

A current hot topic of research is reducing the number of ACKs to make TCP work better over asymmetric links like 3G.

-- 
Stephen Hemminger [EMAIL PROTECTED]
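One way to see where "2 or 3 packets per TSO" comes from is a deliberately simplified ACK-clocking model: each arriving ACK lets the sender release the segments it acknowledges plus whatever the window grew by. The sketch assumes cwnd grows by one segment per ACK in slow start and rounds congestion-avoidance growth down to zero; real stacks differ (e.g. appropriate byte counting), and `segs_per_ack_burst` is a hypothetical name.

```c
/* Segments the sender may release when one ACK arrives, under a simplified
 * model: the ACKed segments leave the network, and slow start adds one more
 * segment of window. In congestion avoidance cwnd grows by only about
 * 1/cwnd per ACK, which this sketch rounds down to zero. */
static unsigned int segs_per_ack_burst(unsigned int segs_acked, int slow_start)
{
    return segs_acked + (slow_start ? 1u : 0u);
}
```

With delayed ACK (one ACK per two segments) the model gives bursts of 3 in slow start and 2 afterwards; acking every segment shrinks them to 2 and 1, matching the point that turning delayed ACK off makes TSO bursts even smaller.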
Re: [PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes
On Thu, 23 Aug 2007 13:13:10 -0500 Olof Johansson [EMAIL PROTECTED] wrote:
 out:
-	pci_dev_put(mac->iob_pdev);
-out_put_dma_pdev:
-	pci_dev_put(mac->dma_pdev);
-out_free_netdev:
+	if (mac->iob_pdev)
+		pci_dev_put(mac->iob_pdev);
+	if (mac->dma_pdev)
+		pci_dev_put(mac->dma_pdev);

It is not documented as such (as far as I can see), but pci_dev_put is safe to call with NULL. And there are other places in the kernel that explicitly use that fact.

-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
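The point generalises: when a release function treats NULL as a no-op (as `free()` does, and as Stephen notes `pci_dev_put()` does), error paths can release everything unconditionally and the `if (ptr)` guards become noise. A user-space sketch; `struct ref_obj`, `obj_put` and `release_both` are hypothetical stand-ins, not PCI core code.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct pci_dev. */
struct ref_obj {
    int refcnt;
};

/* NULL-tolerant put: returns 1 if the object was freed, 0 otherwise. */
static int obj_put(struct ref_obj *obj)
{
    if (obj == NULL)
        return 0;
    if (--obj->refcnt == 0) {
        free(obj);
        return 1;
    }
    return 0;
}

/* Error-path cleanup with no NULL checks at the call sites. */
static void release_both(struct ref_obj *a, struct ref_obj *b)
{
    obj_put(a);
    obj_put(b);
}
```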
Re: Problem with implementation of TCP_DEFER_ACCEPT?
TJ wrote:
client  -- SYN -->                server  LISTENING
client  <-- SYN ACK --            server  SYN_RECEIVED (time-out 3s)
                                  server: inet_rsk(req)->acked = 1
client  -- ACK -->                server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 6s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 12s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 24s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 48s)
client  -- ACK (DUP) -->          server  (discarded)
client  <-- SYN ACK (DUP) --      server  (time-out 96s)
client  -- ACK (DUP) -->          server  (discarded)
                                  server: half-open socket closed.

With each client ACK being dropped by the kernel's TCP_DEFER_ACCEPT mechanism, the handshake eventually fails after the 'SYN ACK' retries and time-outs expire.

There is a case for arguing the kernel should be operating in an enhanced handshaking mode when TCP_DEFER_ACCEPT is enabled, not an alternative mode, and therefore should accept *both* RFC 793 and TCP_DEFER_ACCEPT. I've been unable to find a specification or RFC for implementing TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER to give me firm guidance.

It seems incorrect to penalise a client that is trying to complete the handshake according to the RFC 793 specification, especially as the client has no way of knowing ahead of time whether or not the server is operating deferred accept.

Interesting problem. TCP_DEFER_ACCEPT does not conform to any standard I'm aware of. (In fact, I'd say it's in violation of RFC 793.) The implementation does exactly what it claims, though -- it allows a listener to be awakened only when data arrives on the socket.

I think a more useful spec might have been "allows a listener to be awakened only when data arrives on the socket, unless the specified timeout has expired." Once the timeout expires, it should process the embryonic connection as if TCP_DEFER_ACCEPT is not set.
Unfortunately, I don't think we can retroactively change this definition, as an application might depend on data being available and do a non-blocking read() after the accept(), expecting data to be there. Is this worth trying to fix?

Also, a listen socket with a backlog and TCP_DEFER_ACCEPT will have reqs sit in the backlog for the full defer timeout, even if they've received data, which is not really the right thing to do.

I've attached a patch implementing this suggestion (compile tested only; I think I got the logic right, but it's late ;). It's kind of ugly, and uses up a bit in struct inet_request_sock. Maybe it can be done better...

  -John

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 62daf21..f9f64a5 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -72,7 +72,8 @@ struct inet_request_sock {
 				sack_ok	   : 1,
 				wscale_ok  : 1,
 				ecn_ok	   : 1,
-				acked	   : 1;
+				acked	   : 1,
+				deferred   : 1;
 	struct ip_options	*opt;
 };
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 185c7ec..cad2490 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -978,6 +978,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
 	ireq->snd_wscale = rx_opt->snd_wscale;
 	ireq->wscale_ok = rx_opt->wscale_ok;
 	ireq->acked = 0;
+	ireq->deferred = 0;
 	ireq->ecn_ok = 0;
 	ireq->rmt_port = tcp_hdr(skb)->source;
 }
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index fbe7714..1207fb8 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -444,9 +444,6 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		}
 	}
 
-	if (queue->rskq_defer_accept)
-		max_retries = queue->rskq_defer_accept;
-
 	budget = 2 * (lopt->nr_table_entries / (timeout / interval));
 	i = lopt->clock_hand;
 
@@ -455,7 +452,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
 				if ((req->retrans < thresh ||
-				     (inet_rsk(req)->acked && req->retrans < max_retries))
+				     (inet_rsk(req)->acked && req->retrans < max_retries) ||
+				     (inet_rsk(req)->deferred && req->retrans <
+				      queue->rskq_defer_accept + max_retries))
 				    && !req->rsk_ops->rtx_syn_ack(parent, req, NULL)) {
 					unsigned long timeo;
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index a12b08f..c4867f3 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -637,8 +637,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 	/*
Re: 2.6.22.5 forcedeth timeout hang
On Thu, Aug 23, 2007 at 06:48:23PM -0500, Mr. Berkley Shands wrote:
100% reproducible hang on xmit timeout. Just do a make -j4 modules on an nfs mounted kernel source.

Most likely you also had the problem with 2.6.22.2 (maybe you have not tested this one, though). There were bug fixes for forcedeth introduced in this version, one of them being buggy. The patch below fixes it. Can you please give it a try? If it does not fix the problem, please try 2.6.22.1, which does not include those changes. I'm interested because I have those changes pending for 2.6.20.17 too.

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 10f4e3b..1938d6d 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -552,7 +552,7 @@ union ring_type {
 #define PHY_OUI_MARVELL	0x5043
 #define PHY_OUI_CICADA	0x03f1
 #define PHY_OUI_VITESSE	0x01c1
-#define PHY_OUI_REALTEK	0x01c1
+#define PHY_OUI_REALTEK	0x0732
 #define PHYID1_OUI_MASK	0x03ff
 #define PHYID1_OUI_SHFT	6
 #define PHYID2_OUI_MASK	0xfc00

Thanks,
Willy
Re: [PATCH (take 2)] request_irq fix DEBUG_SHIRQ handling Re: 2.6.23-rc2-mm1: rtl8139 inconsistent lock state
On Thu, Aug 23, 2007 at 10:44:30AM +0200, Jarek Poplawski wrote:
Andrew Morton pointed out that my changelog was unusable. Sorry! Here is a second try with the changelog and kernel version changed.
...
(take 2)
Subject: request_irq() - fix DEBUG_SHIRQ handling
...
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
---
diff -Nurp 2.6.23-rc3-git6-/kernel/irq/manage.c 2.6.23-rc3-git6/kernel/irq/manage.c
--- 2.6.23-rc3-git6-/kernel/irq/manage.c	2007-08-23 10:11:35.0 +0200
+++ 2.6.23-rc3-git6/kernel/irq/manage.c	2007-08-23 10:16:29.0 +0200

So, this time I f-ed the diff part: it's not exactly against 2.6.23-rc3-git6. But it's Andrew to blame: he should've known that some old, slow chips can't do science and poetry at the same time. Sorry (for him)! Anyway, apart from an offset, it should be OK...

Regards,
Jarek P.