date:20070823

[PATCH 0/3] cxgb3 driver update

2007-08-23 Thread Divy Le Ray


Hi Jeff,

I'm submitting three more patches for inclusion in netdev#upstream.
These patches are built over the series I resent last night.
The patch numbering reflects the stacking.

Here is a brief description:
-   avoid false positives in the xgmac hang workaround
-   Properly set the CQ_ERR bit in RDMA CQ contexts.
-   Update CQ context operations time out values

Cheers,
Divy


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/11] cxgb3 - remove false positive in xgmac workaround

2007-08-23 Thread Divy Le Ray

From: Divy Le Ray [EMAIL PROTECTED]

Qualify the toggling of xgmac TX enable with the absence of incoming
pause frames: we might not be making forward progress simply because
the peer is sending lots of pause frames.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/common.h |1 +
 drivers/net/cxgb3/xgmac.c  |4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h
index ff867c2..3e5b0db 100644
--- a/drivers/net/cxgb3/common.h
+++ b/drivers/net/cxgb3/common.h
@@ -514,6 +514,7 @@ struct cmac {
 	u64 rx_mcnt;
 	unsigned int toggle_cnt;
 	unsigned int txen;
+	u64 rx_pause;
 	struct mac_stats stats;
 };
 
diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c
index 1d1c391..ff9e9dc 100644
--- a/drivers/net/cxgb3/xgmac.c
+++ b/drivers/net/cxgb3/xgmac.c
@@ -452,6 +452,7 @@ int t3_mac_enable(struct cmac *mac, int which)
 					 A_XGM_TX_SPI4_SOP_EOP_CNT +
 					 oft)));
 		mac->rx_mcnt = s->rx_frames;
+		mac->rx_pause = s->rx_pause;
 		mac->rx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
 						A_XGM_RX_SPI4_SOP_EOP_CNT +
 						oft)));
@@ -504,7 +505,7 @@ int t3b2_mac_watchdog_task(struct cmac *mac)
 	tx_xcnt = 1; /* By default tx_xcnt is making progress */
 	tx_tcnt = mac->tx_tcnt; /* If tx_mcnt is progressing ignore tx_tcnt */
 	rx_xcnt = 1; /* By default rx_xcnt is making progress */
-	if (tx_mcnt == mac->tx_mcnt) {
+	if (tx_mcnt == mac->tx_mcnt && mac->rx_pause == s->rx_pause) {
 		tx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
 						A_XGM_TX_SPI4_SOP_EOP_CNT +
 						mac->offset)));
@@ -560,6 +561,7 @@ out:
 	mac->tx_mcnt = s->tx_frames;
 	mac->rx_xcnt = rx_xcnt;
 	mac->rx_mcnt = s->rx_frames;
+	mac->rx_pause = s->rx_pause;
 	if (status == 1) {
 		t3_write_reg(adap, A_XGM_TX_CTRL + mac->offset, 0);
 		t3_read_reg(adap, A_XGM_TX_CTRL + mac->offset);  /* flush */

[PATCH 13/11] cxgb3 - Set the CQ_ERR bit in CQ contexts.

2007-08-23 Thread Divy Le Ray

From: Divy Le Ray [EMAIL PROTECTED]

The cxgb3 driver is incorrectly configuring the HW CQ context for CQs
that use overflow avoidance, namely the RDMA control CQ. This results
in a bad DMA from the device to bus address 0. The solution is to set
the CQ_ERR bit in the context for these types of CQs.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/sge_defs.h |4 
 drivers/net/cxgb3/t3_hw.c|3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/sge_defs.h b/drivers/net/cxgb3/sge_defs.h
index 514869e..29b6c80 100644
--- a/drivers/net/cxgb3/sge_defs.h
+++ b/drivers/net/cxgb3/sge_defs.h
@@ -106,6 +106,10 @@
 #define V_CQ_GEN(x) ((x) << S_CQ_GEN)
 #define F_CQ_GEN    V_CQ_GEN(1U)
 
+#define S_CQ_ERR    30
+#define V_CQ_ERR(x) ((x) << S_CQ_ERR)
+#define F_CQ_ERR    V_CQ_ERR(1U)
+
 #define S_CQ_OVERFLOW_MODE    31
 #define V_CQ_OVERFLOW_MODE(x) ((x) << S_CQ_OVERFLOW_MODE)
 #define F_CQ_OVERFLOW_MODE    V_CQ_OVERFLOW_MODE(1U)
diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index 538b254..9358959 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -2043,7 +2043,8 @@ int t3_sge_init_cqcntxt(struct adapter *adapter, unsigned int id, u64 base_addr,
 	base_addr >>= 32;
 	t3_write_reg(adapter, A_SG_CONTEXT_DATA2,
 		     V_CQ_BASE_HI((u32) base_addr) | V_CQ_RSPQ(rspq) |
-		     V_CQ_GEN(1) | V_CQ_OVERFLOW_MODE(ovfl_mode));
+		     V_CQ_GEN(1) | V_CQ_OVERFLOW_MODE(ovfl_mode) |
+		     V_CQ_ERR(ovfl_mode));
 	t3_write_reg(adapter, A_SG_CONTEXT_DATA3, V_CQ_CREDITS(credits) |
 		     V_CQ_CREDIT_THRES(credit_thres));
 	return t3_sge_write_context(adapter, id, F_CQ);

[PATCH 14/11] cxgb3 - CQ context operations time out too soon.

2007-08-23 Thread Divy Le Ray

From: Divy Le Ray [EMAIL PROTECTED]

Currently, the driver only tries up to 5 times (5us) to get the results
of a CQ context operation.  Testing has shown the chip can take as much
as 50us to return the response on SG_CONTEXT_CMD operations.  So we up
the retry count to 100 to cover high loads.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/t3_hw.c |   19 +++
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index 9358959..8f6efdb 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -1867,6 +1867,8 @@ void t3_port_intr_clear(struct adapter *adapter, int idx)
 	phy->ops->intr_clear(phy);
 }
 
+#define SG_CONTEXT_CMD_ATTEMPTS 100
+
 /**
  * t3_sge_write_context - write an SGE context
  * @adapter: the adapter
@@ -1886,7 +1888,7 @@ static int t3_sge_write_context(struct adapter *adapter, unsigned int id,
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | type | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2072,7 +2074,7 @@ int t3_sge_enable_ecntxt(struct adapter *adapter, unsigned int id, int enable)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_EGRESS | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2096,7 +2098,7 @@ int t3_sge_disable_fl(struct adapter *adapter, unsigned int id)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_FREELIST | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2120,7 +2122,7 @@ int t3_sge_disable_rspcntxt(struct adapter *adapter, unsigned int id)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_RESPONSEQ | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2144,7 +2146,7 @@ int t3_sge_disable_cqcntxt(struct adapter *adapter, unsigned int id)
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(1) | F_CQ | V_CONTEXT(id));
 	return t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-			       0, 5, 1);
+			       0, SG_CONTEXT_CMD_ATTEMPTS, 1);
 }
 
 /**
@@ -2169,7 +2171,7 @@ int t3_sge_cqcntxt_op(struct adapter *adapter, unsigned int id, unsigned int op,
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD, V_CONTEXT_CMD_OPCODE(op) |
 		     V_CONTEXT(id) | F_CQ);
 	if (t3_wait_op_done_val(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY,
-				0, 5, 1, &val))
+				0, SG_CONTEXT_CMD_ATTEMPTS, 1, &val))
 		return -EIO;
 
 	if (op >= 2 && op < 7) {
@@ -2179,7 +2181,8 @@ int t3_sge_cqcntxt_op(struct adapter *adapter, unsigned int id, unsigned int op,
 		t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 			     V_CONTEXT_CMD_OPCODE(0) | F_CQ | V_CONTEXT(id));
 		if (t3_wait_op_done(adapter, A_SG_CONTEXT_CMD,
-				    F_CONTEXT_CMD_BUSY, 0, 5, 1))
+				    F_CONTEXT_CMD_BUSY, 0,
+				    SG_CONTEXT_CMD_ATTEMPTS, 1))
 			return -EIO;
 		return G_CQ_INDEX(t3_read_reg(adapter, A_SG_CONTEXT_DATA0));
 	}
@@ -2205,7 +2208,7 @@ static int t3_sge_read_context(unsigned int type, struct adapter *adapter,
 	t3_write_reg(adapter, A_SG_CONTEXT_CMD,
 		     V_CONTEXT_CMD_OPCODE(0) | type | V_CONTEXT(id));
 	if (t3_wait_op_done(adapter, A_SG_CONTEXT_CMD, F_CONTEXT_CMD_BUSY, 0,
-			    5, 1))
+			    SG_CONTEXT_CMD_ATTEMPTS, 1))
 		return -EIO;
 	data[0] = t3_read_reg(adapter, A_SG_CONTEXT_DATA0);
 	data[1] = t3_read_reg(adapter, A_SG_CONTEXT_DATA1);

Re: [PATCH 0/3] cxgb3 driver update

2007-08-23 Thread Al Viro

On Wed, Aug 22, 2007 at 11:35:20PM -0700, Divy Le Ray wrote:
> Hi Jeff,
>
> I'm submitting three more patches for inclusion in netdev#upstream.
> These patches are built over the series I resent yesterday night.
> The patch numbering reflects the stacking.
>
> Here is a brief description:
> -   avoid false positives in the xgmac hang workaround
> -   Properly set the CQ_ERR bit in RDMA CQ contexts.
> -   Update CQ context operations time out values

Speaking of cxgb3, could you explain what the hell is
static int do_term(struct t3cdev *dev, struct sk_buff *skb)
{
	unsigned int hwtid = ntohl(skb->priority) >> 8 & 0xfffff;
doing?  AFAIK, skb->priority is not net-endian...

Another odd place is
int t3_seeprom_write(struct adapter *adapter, u32 addr, u32 data)
{
	u16 val;
	int attempts = EEPROM_MAX_POLL;
	unsigned int base = adapter->params.pci.vpd_cap_addr;

	if ((addr >= EEPROMSIZE && addr != EEPROM_STAT_ADDR) || (addr & 3))
		return -EINVAL;

	pci_write_config_dword(adapter->pdev, base + PCI_VPD_DATA,
			       cpu_to_le32(data));
with callers like
int t3_seeprom_wp(struct adapter *adapter, int enable)
{
	return t3_seeprom_write(adapter, EEPROM_STAT_ADDR, enable ? 0xc : 0);

IOW, you really get little-endian values passed to pci_write_config_dword()
and it expects a host-endian as the last argument...

Re: eHEA driver issues from net-2.6.24

2007-08-23 Thread Jan-Bernd Themann

On Thursday 23 August 2007 00:20, Andrew Theurer wrote:
 David Miller wrote:
  From: Andrew Theurer [EMAIL PROTECTED]
  Date: Wed, 22 Aug 2007 16:55:03 -0500

  Thanks for finally getting to test this, I thought nobody
  would test this until it got merged into 2.6.24 :-/

Yes, sorry for the delay. 

  kernel BUG at include/linux/netdevice.h:318!
  enter ? for help
  [cf613e40] c03fe394 .net_rx_action+0x1b8/0x254
  [cf613ef0] c0057b70 .__do_softirq+0xa8/0x164
  [cf613f90] c0024438 .call_do_softirq+0x14/0x24
  [c00b8ffbf9f0] c000bd30 .do_softirq+0x68/0xac
  [c00b8ffbfa80] c0057cc4 .irq_exit+0x54/0x6c
  [c00b8ffbfb00] c000c358 .do_IRQ+0x170/0x1ac
  [c00b8ffbfb90] c0004780 hardware_interrupt_entry+0x18/0x98
  --- Exception: 501 (Hardware Interrupt) at c0010bdc 
  .cpu_idle+0x114/0x1e0
  [c00b8ffbfe80] c0010bd0 .cpu_idle+0x108/0x1e0 (unreliable)
  [c00b8ffbff00] c0026db0 .start_secondary+0x160/0x184
  [c00b8ffbff90] c0008364 .start_secondary_prolog+0xc/0x10

  I'm a little confused if the port_napi_enable() is being called when the 
  device is initialized, but then again, this is all new to me (should it 
  be called in ehea_open?).  I see it called on some reset routines, but 
  not on the first initialization.

  This is similar to the problem that Arnaldo hit a few minutes
  ago in the VIA Rhine driver.

  You can only make a napi_enable() call when there has been
  a previous napi_disable().

  One way to fix this would be to forcefully napi_disable() on
  all the per-port NAPI structs at the beginning of ehea_open(),
  which should set things up to satisfy the pre-condition of the
  napi_enable() calls.

 OK, I'll try this.

Let me fix this. I'll try to get it done today.

  You'll need to audit the entire driver to make sure this invariant
  is held properly.

  Also, on this code, in ehea_sense_port_attr()

  /* Number of default QPs */
  if (use_mcs)
  	port->num_def_qps = cb0->num_default_qps;
  else
  	port->num_def_qps = 1;

  When using napi, since we have multi-queue napi support now, wouldn't we 
  want to use all the default qps instead of 1?

  I don't know how this hardware works, you tell me :-)

 Heh, I don't know it well, either. Maybe Jan Bernd can chime in.

We'd like to keep the possibility to switch back to a single queue for now.
However, we could enable multi-queue support by default now.
I'll include this in the patch.

 Thanks for your help,

 -Andrew


UDPv4 port allocation problem

2007-08-23 Thread Tóth László Attila

Hello,

I noticed that the kernel may allocate to a new application the same UDP
port that another application used and closed immediately before. This
means that applications that do not specify an exact port, and rely on the
kernel to allocate one for them, might see traffic originally meant for
another application.

Imagine that two applications want to resolve a name in DNS at about the
same time. The following happens:
 * first app sends out the DNS query then closes the socket without
waiting for an answer (e.g. it got interrupted by Ctrl+C)
 * second app opens a UDP socket, gets the same port originally
assigned to app#1, and sends out its DNS query
 * DNS server responds, the response goes to app#2

DNS might not be the perfect example, but you get the idea. 
Applications do not expect to receive data on newly opened sockets, not
to mention the security implications.

TCP, on the other hand, increments the allocated port number for each new
socket. The same behaviour for UDP would add some time before a released
port is handed out again, which decreases this risk.

Is the current behaviour intended?

Regards,
Laszlo Attila Toth

Re: eHEA driver issues from net-2.6.24

2007-08-23 Thread David Miller

From: Jan-Bernd Themann [EMAIL PROTECTED]
Date: Thu, 23 Aug 2007 08:55:29 +0200

> We'd like to keep the possibility to switch back to a single queue
> for now.

Please do not do this, we already have way too much configurability
out there.

If you have the physical hardware queues enabled, use multiqueue napi
support.

If you add a knob to use or not use multi-napi, this makes life
more miserable for your users and your driver more complicated
and harder to maintain.

Re: eHEA driver issues from net-2.6.24

2007-08-23 Thread Jan-Bernd Themann

Hi David,

On Thursday 23 August 2007 10:17, David Miller wrote:
> From: Jan-Bernd Themann [EMAIL PROTECTED]
> Date: Thu, 23 Aug 2007 08:55:29 +0200
>
>> We'd like to keep the possibility to switch back to a single queue
>> for now.
>
> Please do not do this, we already have way too much configurability
> out there.

ok, we decided to remove the switch for kernel 2.6.24

Regards,
Jan-Bernd

[PATCH -mm] ath5k: remove sysctl(2) support

2007-08-23 Thread Alexey Dobriyan

sysctl(2) is supported but frozen.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 drivers/net/wireless/ath5k_base.c |   21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

--- a/drivers/net/wireless/ath5k_base.c
+++ b/drivers/net/wireless/ath5k_base.c
@@ -2438,21 +2438,12 @@ static struct pci_driver ath_pci_drv_id = {
 	.resume		= ath_pci_resume,
 };
 
-/*
- * Static (i.e. global) sysctls.  Note that the hal sysctls
- * are located under ours by sharing the setting for DEV_ATH.
- */
-enum {
-	DEV_ATH		= 9,	/* XXX known by hal */
-};
-
 static int mincalibrate = 1;
 static int maxcalibrate = INT_MAX / 1000;
-#define	CTL_AUTO	-2	/* cannot be CTL_ANY or CTL_NONE */
 
 static ctl_table ath_static_sysctls[] = {
 #if AR_DEBUG
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "debug",
 	  .mode		= 0644,
 	  .data		= &ath_debug,
@@ -2460,28 +2451,28 @@ static ctl_table ath_static_sysctls[] = {
 	  .proc_handler	= proc_dointvec
 	},
 #endif
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "countrycode",
 	  .mode		= 0444,
 	  .data		= &countrycode,
 	  .maxlen	= sizeof(countrycode),
 	  .proc_handler	= proc_dointvec
 	},
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "outdoor",
 	  .mode		= 0444,
 	  .data		= &outdoor,
 	  .maxlen	= sizeof(outdoor),
 	  .proc_handler	= proc_dointvec
 	},
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "xchanmode",
 	  .mode		= 0444,
 	  .data		= &xchanmode,
 	  .maxlen	= sizeof(xchanmode),
 	  .proc_handler	= proc_dointvec
 	},
-	{ .ctl_name	= CTL_AUTO,
+	{
 	  .procname	= "calibrate",
 	  .mode		= 0644,
 	  .data		= &ath_calinterval,
@@ -2493,7 +2484,7 @@ static ctl_table ath_static_sysctls[] = {
 	{ 0 }
 };
 static ctl_table ath_ath_table[] = {
-	{ .ctl_name	= DEV_ATH,
+	{
 	  .procname	= "ath",
 	  .mode		= 0555,
 	  .child	= ath_static_sysctls


[PATCH (take 2)] request_irq fix DEBUG_SHIRQ handling Re: 2.6.23-rc2-mm1: rtl8139 inconsistent lock state

2007-08-23 Thread Jarek Poplawski

Andrew Morton pointed out that my changelog was unusable. Sorry!
Here is a second try with the changelog and kernel version changed.

Regards,
Jarek P.

(take 2)

Subject: request_irq() - fix DEBUG_SHIRQ handling

Mariusz Kozlowski reported lockdep's warning:

 =================================
 [ INFO: inconsistent lock state ]
 2.6.23-rc2-mm1 #7
 ---------------------------------
 inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
 ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (&tp->lock){+...}, at: [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
 {in-hardirq-W} state was registered at:
   [<c0138eeb>] __lock_acquire+0x949/0x11ac
   [<c01397e7>] lock_acquire+0x99/0xb2
   [<c0452ff3>] _spin_lock+0x35/0x42
   [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
   [<c0147a5d>] handle_IRQ_event+0x28/0x59
   [<c01493ca>] handle_level_irq+0xad/0x10b
   [<c0105a13>] do_IRQ+0x93/0xd0
   [<c010441e>] common_interrupt+0x2e/0x34
...
 other info that might help us debug this:
 1 lock held by ifconfig/5492:
  #0:  (rtnl_mutex){--..}, at: [<c0451778>] mutex_lock+0x1c/0x1f

 stack backtrace:
...
  [<c0452ff3>] _spin_lock+0x35/0x42
  [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
  [<c01480fd>] free_irq+0x11b/0x146
  [<de871d59>] rtl8139_close+0x8a/0x14a [8139too]
  [<c03bde63>] dev_close+0x57/0x74
...

This shows that a driver's irq handler was running both in hard-interrupt
context and in process context with irqs enabled. The latter happened during
the free_irq() call and was possible only with CONFIG_DEBUG_SHIRQ enabled.
This was fixed by another patch.

But a similar problem is possible with request_irq(): any locks taken from
the irq handler could be vulnerable, especially with soft interrupts. This
patch fixes it by disabling local interrupts while the handler runs. (It
seems disabling softirqs should be enough, but that needs more checking
for possible races or other special cases.)

This patch is also recommended for all stable versions since 2.6.21.

Reported-by: Mariusz Kozlowski [EMAIL PROTECTED]
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---

diff -Nurp 2.6.23-rc3-git6-/kernel/irq/manage.c 2.6.23-rc3-git6/kernel/irq/manage.c
--- 2.6.23-rc3-git6-/kernel/irq/manage.c	2007-08-23 10:11:35.0 +0200
+++ 2.6.23-rc3-git6/kernel/irq/manage.c	2007-08-23 10:16:29.0 +0200
@@ -555,14 +555,11 @@ int request_irq(unsigned int irq, irq_ha
 * We do this before actually registering it, to make sure that
 * a 'real' IRQ doesn't run in parallel with our fake
 */
-   if (irqflags & IRQF_DISABLED) {
-   unsigned long flags;
+   unsigned long flags;
 
-   local_irq_save(flags);
-   handler(irq, dev_id);
-   local_irq_restore(flags);
-   } else
-   handler(irq, dev_id);
+   local_irq_save(flags);
+   handler(irq, dev_id);
+   local_irq_restore(flags);
}
 #endif
 

[PATCH 1/2] E1000: Fix ifdown hang in git-2.6.24

2007-08-23 Thread Krishna Kumar

Calling napi_disable() twice hangs ifdown of the device. Since e1000_down()
is the common place to call napi_disable(), remove the extra call from
e1000_close().

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 e1000_main.c |4 
 1 files changed, 4 deletions(-)

diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c
--- org/drivers/net/e1000/e1000_main.c  2007-08-23 13:32:16.0 +0530
+++ new/drivers/net/e1000/e1000_main.c  2007-08-23 13:32:34.0 +0530
@@ -1477,10 +1477,6 @@ e1000_close(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-#ifdef CONFIG_E1000_NAPI
-	napi_disable(&adapter->napi);
-#endif
-
 	WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
 	e1000_down(adapter);
 	e1000_power_down_phy(adapter);

[PATCH 2/2] [RFC] E1000: Fix hang in netdev_wait_allrefs()

2007-08-23 Thread Krishna Kumar

After applying patch 1, I started getting "waiting for count" messages
when doing ifdown. I'm not sure this is the right fix, since the count was
already shown as -1 in that message, but this patch fixes the problem.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 e1000_main.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -ruNp new/drivers/net/e1000/e1000_main.c new2/drivers/net/e1000/e1000_main.c
--- new/drivers/net/e1000/e1000_main.c  2007-08-23 13:32:34.0 +0530
+++ new2/drivers/net/e1000/e1000_main.c 2007-08-23 14:28:12.0 +0530
@@ -1219,12 +1219,13 @@ e1000_remove(struct pci_dev *pdev)
 	 * would have already happened in close and is redundant. */
 	e1000_release_hw_control(adapter);
 
-	unregister_netdev(netdev);
 #ifdef CONFIG_E1000_NAPI
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		dev_put(&adapter->polling_netdev[i]);
 #endif
 
+	unregister_netdev(netdev);
+
 	if (!e1000_check_phy_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
 

Re: [PATCH 8/9] define global BIT macro

2007-08-23 Thread Ralf Baechle

On Sat, Aug 18, 2007 at 11:44:12AM +0200, Jiri Slaby wrote:

> define global BIT macro
>
> move all local BIT defines to the new globally define macro.
>
> Signed-off-by: Jiri Slaby [EMAIL PROTECTED]

Acked-by: Ralf Baechle [EMAIL PROTECTED]

for the MACE ethernet and MIPS bits.

  Ralf

[PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses

2007-08-23 Thread Aurélien Charbon


Following Neil's comments, I have tried to correct the mistakes in my first
submission. Thank you for these comments, Neil.

This is one of the missing pieces of IPv6 support for the server.
It deals with the ip_map caching code.

It changes the ip_map structure to be able to store INET6 addresses.
It also adds the changes to address hashing, and the address mapping
needed to test it with INET addresses.

Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]
---

fs/nfsd/export.c   |   10 ++-
fs/nfsd/nfsctl.c   |   21 ++-
include/linux/sunrpc/svcauth.h |4 -
include/net/ipv6.h |   17 +
net/sunrpc/svcauth_unix.c  |  121 
-

5 files changed, 129 insertions(+), 44 deletions(-)


diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/export.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c
--- linux-2.6.23-rc3/fs/nfsd/export.c	2007-08-23 13:18:16.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c	2007-08-23 13:51:08.0 +0200
@@ -35,6 +35,7 @@
 #include <linux/lockd/bind.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>
 
 #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
 
@@ -1559,6 +1560,7 @@ exp_addclient(struct nfsctl_client *ncp)
 {
 	struct auth_domain	*dom;
 	int			i, err;
+	struct in6_addr addr6;
 
 	/* First, consistency check. */
 	err = -EINVAL;
@@ -1577,9 +1579,11 @@ exp_addclient(struct nfsctl_client *ncp)
 		goto out_unlock;
 
 	/* Insert client into hashtable. */
-	for (i = 0; i < ncp->cl_naddr; i++)
-		auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+	for (i = 0; i < ncp->cl_naddr; i++) {
+		/* Mapping address */
+		ipv6_addr_map(ncp->cl_addrlist[i], addr6);
+		auth_unix_add_addr(addr6, dom);
+	}
 	auth_unix_forget_old(dom);
 	auth_domain_put(dom);
 
diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/nfsctl.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c
--- linux-2.6.23-rc3/fs/nfsd/nfsctl.c	2007-08-23 13:18:16.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c	2007-08-23 13:25:28.0 +0200
@@ -222,7 +222,7 @@ static ssize_t write_getfs(struct file *
 	struct auth_domain *clp;
 	int err = 0;
 	struct knfsd_fh *res;
-
+	struct in6_addr in6;
 	if (size < sizeof(*data))
 		return -EINVAL;
 	data = (struct nfsctl_fsparm*)buf;
@@ -236,7 +236,14 @@ static ssize_t write_getfs(struct file *
 	res = (struct knfsd_fh*)buf;
 
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+	/* IPv6 address mapping */
+	in6.s6_addr32[0] = 0;
+	in6.s6_addr32[1] = 0;
+	in6.s6_addr32[2] = htonl(0xffff);
+	in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;
+
+	if (!(clp = auth_unix_lookup(in6)))
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -253,6 +260,7 @@ static ssize_t write_getfd(struct file *
 {
 	struct nfsctl_fdparm *data;
 	struct sockaddr_in *sin;
+	struct in6_addr in6;
 	struct auth_domain *clp;
 	int err = 0;
 	struct knfsd_fh fh;
@@ -271,7 +279,14 @@ static ssize_t write_getfd(struct file *
 	res = buf;
 	sin = (struct sockaddr_in *)&data->gd_addr;
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+	/* IPv6 address mapping */
+	in6.s6_addr32[0] = 0;
+	in6.s6_addr32[1] = 0;
+	in6.s6_addr32[2] = htonl(0xffff);
+	in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;
+
+	if (!(clp = auth_unix_lookup(in6)))
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff -p -u -r -N linux-2.6.23-rc3/include/linux/sunrpc/svcauth.h linux-2.6.23-rc3-IPv6-ipmap-cache/include/linux/sunrpc/svcauth.h
--- linux-2.6.23-rc3/include/linux/sunrpc/svcauth.h	2007-08-23 13:18:21.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/include/linux/sunrpc/svcauth.h	2007-08-23 13:25:28.0 +0200
@@ -120,10 +120,10 @@ extern void	svc_auth_unregister(rpc_auth
 
 extern struct auth_domain *unix_domain_find(char *name);
 extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
+extern int auth_unix_add_addr(struct in6_addr addr, struct auth_domain *dom);
 extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
 
 extern struct auth_domain *auth_domain_find(char *name);
-extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
+extern struct auth_domain *auth_unix_lookup(struct in6_addr addr);
 extern int auth_unix_forget_old(struct auth_domain *dom);
 extern void svcauth_unix_purge(void);
 extern void svcauth_unix_info_release(void *);
diff -p -u -r -N linux-2.6.23-rc3/include/net/ipv6.h linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h
--- linux-2.6.23-rc3/include/net/ipv6.h	2007-08-23 13:18:23.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h	2007-08-23

[PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG

2007-08-23 Thread Johannes Berg

The two different wireless code bases both define macros to ease
printing MAC addresses:

printk(KERN_INFO "MAC address is " MAC_FMT "\n", MAC_ARG(addr));

This patch moves those macros to if_ether.h and uses them all over the
tree.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]

---
 drivers/net/3c505.c |4 +---
 drivers/net/8139cp.c|   11 ++-
 drivers/net/82596.c |4 ++--
 drivers/net/a2065.c |4 +---
 drivers/net/acenic.c|6 ++
 drivers/net/ariadne.c   |4 +---
 drivers/net/dl2k.c  |6 ++
 drivers/net/forcedeth.c |   11 ---
 drivers/net/hp100.c |5 ++---
 drivers/net/hydra.c |6 ++
 drivers/net/ibmlana.c   |6 ++
 drivers/net/ioc3-eth.c  |5 ++---
 drivers/net/lguest_net.c|3 +--
 drivers/net/lib82596.c  |4 ++--
 drivers/net/macb.c  |6 ++
 drivers/net/meth.c  |4 +---
 drivers/net/mv643xx_eth.c   |5 ++---
 drivers/net/mvme147.c   |7 ++-
 drivers/net/myri_sbus.c |6 ++
 drivers/net/ns83820.c   |9 +++--
 drivers/net/pasemi_mac.c|5 ++---
 drivers/net/ps3_gelic_net.c |6 ++
 drivers/net/qla3xxx.c   |6 ++
 drivers/net/rionet.c|5 ++---
 drivers/net/s2io.c  |   10 ++
 drivers/net/skge.c  |6 ++
 drivers/net/sky2.c  |6 ++
 drivers/net/tsi108_eth.c|6 ++
 drivers/net/zorro8390.c |6 ++
 include/linux/etherdevice.h |1 +
 include/linux/if_ether.h|5 +
 include/net/ieee80211.h |5 -
 include/net/mac80211.h  |4 
 33 files changed, 62 insertions(+), 125 deletions(-)

--- netdev-2.6.orig/drivers/net/3c505.c 2007-08-22 20:33:10.921906163 +0200
+++ netdev-2.6/drivers/net/3c505.c  2007-08-22 20:40:01.011906163 +0200
@@ -1540,9 +1540,7 @@ static int __init elplus_setup(struct ne
 */
 	printk(KERN_INFO "%s: 3c505 at %#lx, irq %d, dma %d, ",
 	       dev->name, dev->base_addr, dev->irq, dev->dma);
-	printk("addr %02x:%02x:%02x:%02x:%02x:%02x, ",
-	       dev->dev_addr[0], dev->dev_addr[1], dev->dev_addr[2],
-	       dev->dev_addr[3], dev->dev_addr[4], dev->dev_addr[5]);
+	printk("addr " MAC_FMT ", ", MAC_ARG(dev->dev_addr));
 
/*
 * read more information from the adapter
--- netdev-2.6.orig/drivers/net/8139cp.c	2007-08-22 20:33:10.931906163 +0200
+++ netdev-2.6/drivers/net/8139cp.c 2007-08-22 20:40:01.011906163 +0200
@@ -1961,15 +1961,8 @@ static int cp_init_one (struct pci_dev *
if (rc)
goto err_out_iomap;
 
-	printk (KERN_INFO "%s: RTL-8139C+ at 0x%lx, "
-		"%02x:%02x:%02x:%02x:%02x:%02x, "
-		"IRQ %d\n",
-		dev->name,
-		dev->base_addr,
-		dev->dev_addr[0], dev->dev_addr[1],
-		dev->dev_addr[2], dev->dev_addr[3],
-		dev->dev_addr[4], dev->dev_addr[5],
-		dev->irq);
+	printk (KERN_INFO "%s: RTL-8139C+ at 0x%lx, " MAC_FMT ", IRQ %d\n",
+		dev->name, dev->base_addr, MAC_ARG(dev->dev_addr), dev->irq);
 
pci_set_drvdata(pdev, dev);
 
--- netdev-2.6.orig/drivers/net/82596.c 2007-08-22 20:33:10.941906163 +0200
+++ netdev-2.6/drivers/net/82596.c  2007-08-22 20:40:01.021906163 +0200
@@ -1561,8 +1561,8 @@ static void set_multicast_list(struct ne
 		for (dmi = dev->mc_list; cnt && dmi != NULL; dmi = dmi->next, cnt--, cp += 6) {
 			memcpy(cp, dmi->dmi_addr, 6);
 			if (i596_debug > 1)
-				DEB(DEB_MULTI,printk(KERN_INFO "%s: Adding address %02x:%02x:%02x:%02x:%02x:%02x\n",
-						dev->name, cp[0],cp[1],cp[2],cp[3],cp[4],cp[5]));
+				DEB(DEB_MULTI,printk(KERN_INFO "%s: Adding address " MAC_FMT "\n",
+						dev->name, MAC_ARG(cp));
 		}
 		i596_add_cmd(dev, &cmd->cmd);
 	}
--- netdev-2.6.orig/drivers/net/a2065.c 2007-08-22 20:33:10.991906163 +0200
+++ netdev-2.6/drivers/net/a2065.c  2007-08-22 20:40:01.031906163 +0200
@@ -802,9 +802,7 @@ static int __devinit a2065_init_one(stru
zorro_set_drvdata(z, dev);
 
 	printk(KERN_INFO "%s: A2065 at 0x%08lx, Ethernet Address "
-	       "%02x:%02x:%02x:%02x:%02x:%02x\n", dev->name, board,
-	       dev->dev_addr[0], dev->dev_addr[1], dev->dev_addr[2],
-	       dev->dev_addr[3], dev->dev_addr[4], dev->dev_addr[5]);
+	       MAC_FMT "\n", dev->name, board, MAC_ARG(dev->dev_addr));
 
return 0;
 }
--- netdev-2.6.orig/drivers/net/acenic.c	2007-08-22 20:33:10.991906163 +0200
+++ netdev-2.6/drivers/net/acenic.c 2007-08-22 20:40:01.031906163 +0200
@@ -1013,10 +1013,6 @@ static int __devinit ace_init(struct net
 	writel(mac1, &regs->MacAddrHi);
 	writel(mac2, &regs->MacAddrLo);
 
-

Re: [RFC IPROUTE]: Add flow classifier support

2007-08-23 Thread Patrick McHardy

David Miller wrote:
 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Wed, 22 Aug 2007 10:46:15 -0700

This patch is on hold since the netlink changes haven't made it upstream yet.

 I don't have the kernel side in my queue either, perhaps
 I lost it or I didn't see it when it was sent out.

 Patrick?

I didn't send it since I wasn't completely happy with it. Not sure
if I ever finished it, I'll look into it :)

Re: [PATCH 1/1] net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()

2007-08-23 Thread Patrick McHardy

Benjamin Thery wrote:
 From: [EMAIL PROTECTED]
 Subject: net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()

 This patch fixes a crash that may occur when the routine dev_mc_sync()
 deletes an address from the list it is currently going through. It 
 saves the pointer to the next element before deleting the current one.
 The problem may also exist in dev_mc_unsync().

 Signed-off-by: Benjamin Thery [EMAIL PROTECTED]

Looks good, thanks Benjamin.

Acked-by: Patrick McHardy [EMAIL PROTECTED]
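The fix follows the usual rule for deleting from a singly linked list while walking it: read the next pointer before the current node can be freed. A minimal userspace sketch of that pattern, assuming a hypothetical `struct mc_addr` with a reference count in place of the kernel's multicast list entries (this is not the code of dev_mc_sync() itself):

```c
#include <stdlib.h>

/* Hypothetical stand-in for a multicast list entry; not the kernel's struct. */
struct mc_addr {
	struct mc_addr *next;
	int users;		/* entry is deleted when this drops to zero */
};

/* Drop one reference from every entry, unlinking and freeing entries that
 * reach zero.  'next' is saved before the current node may be freed, so the
 * walk never touches freed memory.  Returns the number of entries freed. */
static int mc_list_put_all(struct mc_addr **head)
{
	struct mc_addr **link = head;	/* pointer to the link that points at cur */
	struct mc_addr *cur, *next;
	int freed = 0;

	for (cur = *head; cur != NULL; cur = next) {
		next = cur->next;	/* save before a possible free() */
		if (--cur->users == 0) {
			*link = next;	/* unlink cur */
			free(cur);
			freed++;
		} else {
			link = &cur->next;
		}
	}
	return freed;
}
```

Writing `cur = cur->next` in the loop header instead would read freed memory right after `free(cur)`, which is the kind of crash described above.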


[PATCH] [-MM, FIX] e1000e: incorporate napi_struct changes from net-2.6.24.git

2007-08-23 Thread Auke Kok

This incorporates the new napi_struct changes into e1000e. It also
includes the ifdown hang bugfix that Krishna Kumar made for e1000.

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000e/e1000.h  |2 ++
 drivers/net/e1000e/netdev.c |   35 ---
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index e3cd877..ea6a9fe 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -196,6 +196,8 @@ struct e1000_adapter {
struct e1000_ring *tx_ring /* One per active queue */
cacheline_aligned_in_smp;
 
+   struct napi_struct napi;
+
unsigned long tx_queue_len;
unsigned int restart_queue;
u32 txd_cmd;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 8ebe238..0e35d0a 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1149,12 +1149,12 @@ static irqreturn_t e1000_intr_msi(int irq, void *data)
mod_timer(&adapter->watchdog_timer, jiffies + 1);
}
 
-   if (netif_rx_schedule_prep(netdev)) {
+   if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
adapter->total_tx_bytes = 0;
adapter->total_tx_packets = 0;
adapter->total_rx_bytes = 0;
adapter->total_rx_packets = 0;
-   __netif_rx_schedule(netdev);
+   __netif_rx_schedule(netdev, &adapter->napi);
} else {
atomic_dec(&adapter->irq_sem);
}
@@ -1212,12 +1212,12 @@ static irqreturn_t e1000_intr(int irq, void *data)
mod_timer(&adapter->watchdog_timer, jiffies + 1);
}
 
-   if (netif_rx_schedule_prep(netdev)) {
+   if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
adapter->total_tx_bytes = 0;
adapter->total_tx_packets = 0;
adapter->total_rx_bytes = 0;
adapter->total_rx_packets = 0;
-   __netif_rx_schedule(netdev);
+   __netif_rx_schedule(netdev, &adapter->napi);
} else {
atomic_dec(&adapter->irq_sem);
}
@@ -1663,10 +1663,10 @@ set_itr_now:
  * e1000_clean - NAPI Rx polling callback
  * @adapter: board private structure
  **/
-static int e1000_clean(struct net_device *poll_dev, int *budget)
+static int e1000_clean(struct napi_struct *napi, int budget)
 {
-   struct e1000_adapter *adapter;
-   int work_to_do = min(*budget, poll_dev->quota);
+   struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
+   struct net_device *poll_dev = adapter->netdev;
int tx_cleaned = 0, work_done = 0;
 
/* Must NOT use netdev_priv macro here. */
@@ -1685,17 +1685,15 @@ static int e1000_clean(struct net_device *poll_dev, int *budget)
spin_unlock(&adapter->tx_queue_lock);
}
 
-   adapter->clean_rx(adapter, &work_done, work_to_do);
-   *budget -= work_done;
-   poll_dev->quota -= work_done;
+   adapter->clean_rx(adapter, &work_done, budget);
 
/* If no Tx and not enough Rx work done, exit the polling mode */
-   if ((!tx_cleaned && (work_done == 0)) ||
+   if ((tx_cleaned && (work_done < budget)) ||
  !netif_running(poll_dev)) {
 quit_polling:
if (adapter->itr_setting & 3)
e1000_set_itr(adapter);
-   netif_rx_complete(poll_dev);
+   netif_rx_complete(poll_dev, napi);
if (test_bit(__E1000_DOWN, &adapter->state))
atomic_dec(&adapter->irq_sem);
else
@@ -1703,7 +1701,7 @@ quit_polling:
return 0;
}
 
-   return 1;
+   return work_done;
 }
 
 static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
@@ -2441,7 +2439,7 @@ int e1000e_up(struct e1000_adapter *adapter)
 
clear_bit(__E1000_DOWN, &adapter->state);
 
-   netif_poll_enable(adapter->netdev);
+   napi_enable(&adapter->napi);
e1000_irq_enable(adapter);
 
/* fire a link change interrupt to start the watchdog */
@@ -2474,7 +2472,7 @@ void e1000e_down(struct e1000_adapter *adapter)
e1e_flush();
msleep(10);
 
-   netif_poll_disable(netdev);
+   napi_disable(adapter-napi);
e1000_irq_disable(adapter);
 
del_timer_sync(adapter-watchdog_timer);
@@ -2607,7 +2605,7 @@ static int e1000_open(struct net_device *netdev)
/* From here on the code is the same as e1000e_up() */
clear_bit(__E1000_DOWN, &adapter->state);
 
-   netif_poll_enable(netdev);
+   napi_enable(&adapter->napi);
 
e1000_irq_enable(adapter);
 
@@ -4102,8 +4100,7 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
e1000e_set_ethtool_ops(netdev);
netdev->tx_timeout  = e1000_tx_timeout;
netdev->watchdog_timeo  = 5 * HZ;
-   netdev->poll
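The e1000e hunks above switch from the old `poll(netdev, &budget)` callback, which decremented `*budget` and `poll_dev->quota` and returned 1 to stay on the poll list, to a `poll(napi, budget)` callback that finds its adapter through container_of() and returns the work it did. A minimal userspace model of that contract follows; `struct fake_adapter`, `struct napi_sketch` and the `scheduled` flag are hypothetical stand-ins, `container_of` is redefined locally, and the real driver additionally folds TX cleanup into its exit condition:

```c
#include <stddef.h>

/* Local re-definition of the kernel's container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct napi_sketch {			/* stand-in for struct napi_struct */
	int scheduled;			/* 1 while on the poll list */
};

struct fake_adapter {			/* stand-in for struct e1000_adapter */
	int rx_pending;			/* packets waiting in the RX ring */
	struct napi_sketch napi;	/* embedded, as in the e1000.h hunk */
};

/* New-style poll callback: process at most 'budget' packets and return how
 * many were processed.  Polling is completed only when under budget. */
static int fake_clean(struct napi_sketch *napi, int budget)
{
	struct fake_adapter *adapter = container_of(napi, struct fake_adapter, napi);
	int work_done = 0;

	while (adapter->rx_pending > 0 && work_done < budget) {
		adapter->rx_pending--;
		work_done++;
	}

	if (work_done < budget)
		napi->scheduled = 0;	/* where netif_rx_complete() would go */

	return work_done;
}
```

Returning the full budget without completing keeps the napi struct scheduled so the core polls again; returning less than the budget after completing hands control back to interrupts.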

Re: [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses

2007-08-23 Thread Brian Haley


Hi Aurelien,

Aurélien Charbon wrote:
According to Neil's comments, I have tried to correct the mistakes of my 
first submission.


I have some more comments.


@@ -1559,6 +1560,7 @@ exp_addclient(struct nfsctl_client *ncp)
{
struct auth_domain *dom;
int i, err;
+struct in6_addr addr6;


Indentation looks wrong.

diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/nfsctl.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c
--- linux-2.6.23-rc3/fs/nfsd/nfsctl.c 2007-08-23 13:18:16.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c 2007-08-23 13:25:28.0 +0200

@@ -222,7 +222,7 @@ static ssize_t write_getfs(struct file *
struct auth_domain *clp;
int err = 0;
struct knfsd_fh *res;
-
+struct in6_addr in6;


Indentation.


if (size < sizeof(*data))
return -EINVAL;
data = (struct nfsctl_fsparm*)buf;
@@ -236,7 +236,14 @@ static ssize_t write_getfs(struct file *
res = (struct knfsd_fh*)buf;

exp_readlock();
-if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+/* IPv6 address mapping */
+in6.s6_addr32[0] = 0;
+in6.s6_addr32[1] = 0;
+in6.s6_addr32[2] = htonl(0x0000ffff);
+in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;


Why didn't you use your new ipv6_addr_map() inline here?


@@ -253,6 +260,7 @@ static ssize_t write_getfd(struct file *
{
struct nfsctl_fdparm *data;
struct sockaddr_in *sin;
+struct in6_addr in6;


Indentation.


@@ -271,7 +279,14 @@ static ssize_t write_getfd(struct file *
res = buf;
sin = (struct sockaddr_in *)&data->gd_addr;
exp_readlock();
-if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+/* IPv6 address mapping */
+in6.s6_addr32[0] = 0;
+in6.s6_addr32[1] = 0;
+in6.s6_addr32[2] = htonl(0x0000ffff);
+in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;


Why didn't you use your new ipv6_addr_map() inline here too?

diff -p -u -r -N linux-2.6.23-rc3/include/net/ipv6.h linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h
--- linux-2.6.23-rc3/include/net/ipv6.h 2007-08-23 13:18:23.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/include/net/ipv6.h 2007-08-23 13:25:28.0 +0200

@@ -21,6 +21,7 @@
#include <net/ndisc.h>
#include <net/flow.h>
#include <net/snmp.h>
+#include <linux/in.h>

#define SIN6_LEN_RFC2133 24

@@ -167,6 +168,12 @@ DECLARE_SNMP_STAT(struct udp_mib, udplit
if (is_udplite) SNMP_INC_STATS_USER(udplite_stats_in6, field); \
else SNMP_INC_STATS_USER(udp_stats_in6, field);} while(0)

+#define IS_ADDR_MAPPED(a) \
+(((uint32_t *) (a))[0] == 0\
+&& ((uint32_t *) (a))[1] == 0\
+&& (((uint32_t *) (a))[2] == 0\
+|| ((uint32_t *) (a))[2] == htonl(0x0000ffff)))


I need to update a patch of mine that added a v4-mapped inline, let me 
send that out.  In the kernel you should use u32 too, is that why you 
needed to include linux/net.h?



+/* Maps a IPv4 address into a wright IPv6 address */
+static inline int ipv6_addr_map(const struct in_addr a1, struct in6_addr a2)
+{
+a2.s6_addr32[0] = 0;
+a2.s6_addr32[1] = 0;
+a2.s6_addr32[2] = htonl(0x0000ffff);
+a2.s6_addr32[3] = (uint32_t)a1.s_addr;
+return 0;
+}


This can be void; no one ever checks the return status.  Maybe change the 
name to ipv6_addr_v4map() too?
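Taking both suggestions together, the helper would return void and write through a pointer. As quoted, `ipv6_addr_map()` receives `struct in6_addr a2` by value, so its assignments would land in a local copy and never reach the caller. Below is a userspace sketch with the hypothetical name `v4map_sketch()`; it fills the `::ffff:a.b.c.d` layout byte by byte and uses the POSIX `IN6_IS_ADDR_V4MAPPED()` macro only to check the result. In the kernel, the existing `ipv6_addr_set()` inline from include/net/ipv6.h could probably do the same job, e.g. `ipv6_addr_set(&in6, 0, 0, htonl(0x0000ffff), sin->sin_addr.s_addr)`.

```c
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>	/* inet_pton()/inet_ntop(), for trying it out */

/* Hypothetical helper: store the IPv4-mapped address ::ffff:a.b.c.d for
 * *v4 in *v6.  Returns nothing, since there is no failure case to report. */
static void v4map_sketch(const struct in_addr *v4, struct in6_addr *v6)
{
	memset(v6->s6_addr, 0, 10);			/* bytes 0-9: zero */
	v6->s6_addr[10] = 0xff;				/* bytes 10-11: ffff */
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4->s_addr, 4);	/* bytes 12-15: IPv4, network order */
}
```

For example, mapping 192.0.2.1 and printing the result with `inet_ntop(AF_INET6, ...)` gives `::ffff:192.0.2.1`.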



@@ -84,7 +85,7 @@ static void svcauth_unix_domain_release(
struct ip_map {
struct cache_head h;
char m_class[8]; /* e.g. "nfsd" */
-struct in_addr m_addr;
+struct in6_addr m_addr;


Indentation.


static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
{
@@ -125,7 +133,7 @@ static void ip_map_init(struct cache_hea
struct ip_map *item = container_of(citem, struct ip_map, h);

strcpy(new->m_class, item->m_class);
-new->m_addr.s_addr = item->m_addr.s_addr;
+memcpy(&(new->m_addr), &(item->m_addr), sizeof(struct in6_addr));


Use ipv6_addr_copy().


@@ -151,20 +159,22 @@ static void ip_map_request(struct cache_
{
char text_addr[20];
struct ip_map *im = container_of(h, struct ip_map, h);
-__be32 addr = im->m_addr.s_addr;
-
-snprintf(text_addr, 20, "%u.%u.%u.%u",
- ntohl(addr) >> 24 & 0xff,
- ntohl(addr) >> 16 & 0xff,
- ntohl(addr) >>  8 & 0xff,
- ntohl(addr) >>  0 & 0xff);

+if (IS_ADDR_MAPPED(im->m_addr.s6_addr32)) {
+snprintf(text_addr, 20, NIPQUAD_FMT,
+ntohl(im->m_addr.s6_addr32[3]) >> 24 & 0xff,
+ntohl(im->m_addr.s6_addr32[3]) >> 16 & 0xff,
+ntohl(im->m_addr.s6_addr32[3]) >>  8 & 0xff,
+ntohl(im->m_addr.s6_addr32[3]) >>  0 & 0xff);
+} else {
+snprintf(text_addr, 20, NIP6_FMT, NIP6(im->m_addr));
+}


You'll need more than 20 bytes to print an IPv6 address, I'd make this 
at least 44 to account for some fluff.  Surprised you didn't crash 
during testing.
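The arithmetic behind that remark: assuming NIP6_FMT prints eight zero-padded `%04x` groups separated by colons, the text is 8 * 4 + 7 = 39 characters, so the buffer needs at least 40 bytes including the terminating NUL. The userspace sketch below uses a local `NIP6_FMT_SKETCH` copy of that assumed format. Since snprintf() bounds its writes, the visible symptom with a 20-byte buffer would be a truncated address string. For reference, userspace's `INET6_ADDRSTRLEN` is 46, enough for the longest mixed form `ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255` plus the NUL.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Local copy of the assumed fixed-width NIP6_FMT: eight %04x groups. */
#define NIP6_FMT_SKETCH "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x"

/* Format eight host-order 16-bit groups into buf.  Returns the length the
 * complete text needs (without the NUL), which is what snprintf() reports
 * even when buf is too small and the output is truncated. */
static int format_ip6_groups(char *buf, size_t len, const uint16_t g[8])
{
	return snprintf(buf, len, NIP6_FMT_SKETCH,
			(unsigned)g[0], (unsigned)g[1], (unsigned)g[2],
			(unsigned)g[3], (unsigned)g[4], (unsigned)g[5],
			(unsigned)g[6], (unsigned)g[7]);
}
```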



static int ip_map_parse(struct cache_detail *cd,
@@ -175,10 +185,10 @@ static int ip_map_parse(struct cache_det

Re: [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses

2007-08-23 Thread Chuck Lever


Hi Aurélien-

Aurélien Charbon wrote:

According to Neil's comments, I have tried to correct the mistakes of my first
submission.
Thank you for these comments, Neil.

This is a small part of the missing IPv6 support for the server.
It deals with the ip_map caching code.

It changes the ip_map structure to be able to store INET6 addresses.
It also adds the changes to address hashing, and the mapping needed to test it
with INET addresses.

Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]
---

 fs/nfsd/export.c   |   10 ++-
 fs/nfsd/nfsctl.c   |   21 ++-
 include/linux/sunrpc/svcauth.h |4 -
 include/net/ipv6.h |   17 +
 net/sunrpc/svcauth_unix.c  |  121 
-

 5 files changed, 129 insertions(+), 44 deletions(-)


diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/export.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c
--- linux-2.6.23-rc3/fs/nfsd/export.c 2007-08-23 13:18:16.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/export.c 2007-08-23 13:51:08.0 +0200
@@ -35,6 +35,7 @@
 #include <linux/lockd/bind.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>
 
 #define NFSDDBG_FACILITY NFSDDBG_EXPORT
 
@@ -1559,6 +1560,7 @@ exp_addclient(struct nfsctl_client *ncp)

 {
 struct auth_domain *dom;
 int i, err;
+struct in6_addr addr6;
 
 /* First, consistency check. */

 err = -EINVAL;
@@ -1577,9 +1579,11 @@ exp_addclient(struct nfsctl_client *ncp)
 goto out_unlock;
 
 /* Insert client into hashtable. */

-for (i = 0; i < ncp->cl_naddr; i++)
-auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+for (i = 0; i < ncp->cl_naddr; i++) {
+/* Mapping address */
+ipv6_addr_map(ncp->cl_addrlist[i], addr6);
+auth_unix_add_addr(addr6, dom);
+}
 auth_unix_forget_old(dom);
 auth_domain_put(dom);
 
diff -p -u -r -N linux-2.6.23-rc3/fs/nfsd/nfsctl.c linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c
--- linux-2.6.23-rc3/fs/nfsd/nfsctl.c 2007-08-23 13:18:16.0 +0200
+++ linux-2.6.23-rc3-IPv6-ipmap-cache/fs/nfsd/nfsctl.c 2007-08-23 13:25:28.0 +0200
@@ -222,7 +222,7 @@ static ssize_t write_getfs(struct file *
 struct auth_domain *clp;
 int err = 0;
 struct knfsd_fh *res;
-
+struct in6_addr in6;
 if (size < sizeof(*data))
 return -EINVAL;
 data = (struct nfsctl_fsparm*)buf;
@@ -236,7 +236,14 @@ static ssize_t write_getfs(struct file *
 res = (struct knfsd_fh*)buf;
 
 exp_readlock();
-if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+/* IPv6 address mapping */
+in6.s6_addr32[0] = 0;
+in6.s6_addr32[1] = 0;
+in6.s6_addr32[2] = htonl(0x0000ffff);
+in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;
+
+if (!(clp = auth_unix_lookup(in6)))
 err = -EPERM;
 else {
 err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -253,6 +260,7 @@ static ssize_t write_getfd(struct file *
 {
 struct nfsctl_fdparm *data;
 struct sockaddr_in *sin;
+struct in6_addr in6;
 struct auth_domain *clp;
 int err = 0;
 struct knfsd_fh fh;
@@ -271,7 +279,14 @@ static ssize_t write_getfd(struct file *
 res = buf;
 sin = (struct sockaddr_in *)&data->gd_addr;
 exp_readlock();
-if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+/* IPv6 address mapping */
+in6.s6_addr32[0] = 0;
+in6.s6_addr32[1] = 0;
+in6.s6_addr32[2] = htonl(0x0000ffff);
+in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;


The code canonicalizes IPv4 addresses in several places.  Is there 
already a generic function defined somewhere to do this?  If not, it 
might make sense to add one.

Re: [2.6.20.17 review 35/58] forcedeth bug fix: realtek phy

2007-08-23 Thread Prakash Punnoor

On the day of Thursday 23 August 2007 Greg KH hast written:
 On Wed, Aug 22, 2007 at 10:42:25PM +0200, Willy Tarreau wrote:
  On Wed, Aug 22, 2007 at 08:15:03PM +0200, Prakash Punnoor wrote:
   Hi,
  
   even if Greg is waiting for some special invitation
   (http://lkml.org/lkml/2007/8/14/229), I suggest putting this patch by
   Ayaz on top:
  
   http://lkml.org/lkml/2007/8/10/296
 
  That's what I prepared first, but then I noticed it's not in mainline.
 
   Perhaps Ayaz wants to give Greg the clarification he needs... :sigh:
 
  He should, as the fix is not in mainline either :-(
  I don't think Greg asks for specific clarification, just a plain patch
  with a short commit log on its own which does not include remains of
  older mails.

 Exactly, that is what I am waiting for.

 And also I need the change to go into mainline first, as we can not
 diverge with the -stable releases.

Can we get that into mainline then? I haven't seen forcedeth in MAINTAINERS, 
so I added netdev to the cc list.

bye,
-- 
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V



Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG

2007-08-23 Thread Joe Perches

On Wed, 2007-08-22 at 20:46 +0200, Johannes Berg wrote:
 The two different wireless code bases both define macros to ease
 printing MAC addresses:

There are also several different uses of the equivalent of

printk("%02x",addr[0]);
for (i=1; i<6; i++)
printk(":%02x",addr[i]);

to print an ethernet MAC address.

http://www.uwsg.iu.edu/hypermail/linux/net/0602.1/0002.html

As not all device MAC addresses are 6 bytes, colon separated,
perhaps an appropriate ethernet/tr MAC designation is EUI48.

http://standards.ieee.org/regauth/oui/tutorials/EUI48.html
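For comparison with the byte-at-a-time loop above, a format-string macro pair lets the whole address go through one printk() call. The definitions below are a sketch modelled on the kind of helper the wireless stacks define; the `_SKETCH` names are hypothetical, and snprintf() stands in for printk() so the example runs in userspace. Like the loop, it hard-codes six colon-separated bytes, which is exactly the EUI-48 assumption discussed here.

```c
#include <stdio.h>

/* Six colon-separated, zero-padded hex bytes (EUI-48 style). */
#define MAC_FMT_SKETCH "%02x:%02x:%02x:%02x:%02x:%02x"
#define MAC_ARG_SKETCH(x) \
	((const unsigned char *)(x))[0], ((const unsigned char *)(x))[1], \
	((const unsigned char *)(x))[2], ((const unsigned char *)(x))[3], \
	((const unsigned char *)(x))[4], ((const unsigned char *)(x))[5]

/* Single call, using the macro pair. */
static int format_mac(char *buf, size_t len, const unsigned char *addr)
{
	return snprintf(buf, len, MAC_FMT_SKETCH, MAC_ARG_SKETCH(addr));
}

/* Loop style, as in the open-coded printk() sequences.
 * Needs 18 bytes: 17 characters plus the NUL. */
static int format_mac_loop(char *buf, size_t len, const unsigned char *addr)
{
	int n;

	if (len < 18)
		return -1;
	n = snprintf(buf, len, "%02x", addr[0]);
	for (int i = 1; i < 6; i++)
		n += snprintf(buf + n, len - n, ":%02x", addr[i]);
	return n;
}
```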


Re: [PATCH 1/2] E1000: Fix ifdown hang in git-2.6.24

2007-08-23 Thread Kok, Auke


Krishna Kumar wrote:

Calling napi_disable() twice hangs ifdown of the device. e1000_down() is the
common place to call napi_disable().

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 e1000_main.c |4 
 1 files changed, 4 deletions(-)

diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c
--- org/drivers/net/e1000/e1000_main.c  2007-08-23 13:32:16.0 +0530
+++ new/drivers/net/e1000/e1000_main.c  2007-08-23 13:32:34.0 +0530
@@ -1477,10 +1477,6 @@ e1000_close(struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
 
-#ifdef CONFIG_E1000_NAPI
-   napi_disable(&adapter->napi);
-#endif
-
WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
e1000_down(adapter);
e1000_power_down_phy(adapter);


Acked-by: Auke Kok [EMAIL PROTECTED]

I pushed this change to akpm for -mm as well in e1000e...

Thanks Krishna,

Auke
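Why a second napi_disable() hangs can be seen from a simplified model of the NAPI state bit. As far as I can tell from the net-2.6.24 helpers, napi_disable() keeps retrying until it can set NAPI_STATE_SCHED itself, and napi_enable() clears that bit again, so a second disable with no enable in between never gets the bit. The sketch below is userspace only: C11 `atomic_flag` stands in for the state bit, the names are hypothetical, and the retry limit exists purely so the example can report the would-be hang instead of spinning forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct napi_model {
	atomic_flag sched;	/* stand-in for NAPI_STATE_SCHED */
};

/* Models napi_disable(): take ownership of the SCHED bit, retrying while
 * someone else (a running poll, or an earlier disable) holds it.  The real
 * helper retries indefinitely; 'max_tries' bounds the loop here. */
static bool napi_disable_model(struct napi_model *n, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (!atomic_flag_test_and_set(&n->sched))
			return true;	/* bit was clear: now disabled */
	}
	return false;			/* would spin forever in the kernel */
}

/* Models napi_enable(): release the SCHED bit. */
static void napi_enable_model(struct napi_model *n)
{
	atomic_flag_clear(&n->sched);
}
```

With napi_disable() moved into e1000_down() only, close() and down() no longer race for the bit twice.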

Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG

2007-08-23 Thread Johannes Berg

On Thu, 2007-08-23 at 09:01 -0700, Joe Perches wrote:
 There are also several different uses of the equivalent of
 
   printk("%02x",addr[0]);
   for (i=1; i<6; i++)
   printk(":%02x",addr[i]);
 
 to print an ethernet MAC address.

Hm. I didn't know that, I can go through in a later patch if desired.

 http://www.uwsg.iu.edu/hypermail/linux/net/0602.1/0002.html
 
 As not all device MAC addresses are 6 bytes, colon separated,
 perhaps an appropriate ethernet/tr MAC designation is EUI48.
 
 http://standards.ieee.org/regauth/oui/tutorials/EUI48.html

Practically, however, nobody is going to even find macros named
EUI48_FMT/EUI48_ARG, would they? I don't much care, but I find it rather
unsatisfying that both wireless code bases define these macros.

johannes



New NAPI interface: netif_rx_reschedule not working

2007-08-23 Thread Jan-Bernd Themann

Hi David,

While trying to get our driver working with the new interface,
I found the following issue, and I'm not sure how best to solve it:

netif_rx_reschedule() does not work when called after netif_rx_complete().
The problem is that netif_rx_reschedule currently adds the napi struct once
more to the poll list. However, net_rx_action will add it to the poll list
as well (NAPI_STATE_SCHED set), so the device is scheduled twice.
The next time netif_rx_complete is called for the second schedule,
it hits BUG() because NAPI_STATE_SCHED is no longer set
(it was cleared by the first netif_rx_complete()).

Modifying netif_rx_reschedule to only set the NAPI_STATE_SCHED flag again
without adding the device to the poll_list will not solve the problem entirely.

After netif_rx_complete() the driver re-enables its IRQs. If
an IRQ is caught on a different CPU before netif_rx_reschedule is called,
the napi device will be scheduled twice again, because both
net_rx_action and netif_rx_schedule will add it to the poll_list.

I think this issue can occur even if you don't use
netif_rx_reschedule. Do I understand this correctly?
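The question comes down to one invariant: a napi struct should sit on the poll list only while its SCHED bit is set, and only the caller that moves the bit from clear to set should add it; any add not guarded by that transition can put it on the list twice, which is the situation described above. A minimal single-threaded model of the rule follows. The names are hypothetical, C11 atomics stand in for the kernel bit operations, and it does not try to model net_rx_action() itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a napi struct; not the kernel's struct napi_struct. */
struct napi_sched_model {
	atomic_bool sched;	/* stand-in for NAPI_STATE_SCHED */
	int on_list;		/* times it sits on the poll list (demo only) */
};

/* Like netif_rx_schedule_prep(): true only for the caller that sets the bit. */
static bool model_schedule_prep(struct napi_sched_model *n)
{
	return !atomic_exchange(&n->sched, true);
}

/* Guarded scheduling: only the winner of the bit adds to the poll list. */
static void model_schedule(struct napi_sched_model *n)
{
	if (model_schedule_prep(n))
		n->on_list++;
}

/* Like netif_rx_complete(): leave the poll list, then release the bit. */
static void model_complete(struct napi_sched_model *n)
{
	n->on_list--;
	atomic_store(&n->sched, false);
}
```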

Thanks,
Jan-Bernd


[IPv6] Add v4mapped address inline

2007-08-23 Thread Brian Haley


Add v4mapped address inline to avoid calls to ipv6_addr_type().
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 9059e0e..c2b6c11 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -418,6 +418,12 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
 	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
 }
 
+static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] | a->s6_addr32[1]) == 0 &&
+		 a->s6_addr32[2] == htonl(0x0000ffff));
+}
+
 /*
  *	Prototypes exported by ipv6
  */
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 761a910..92d8119 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -249,7 +249,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 			}
 
 			if (ipv6_only_sock(sk) ||
-			!(ipv6_addr_type(&np->daddr) & IPV6_ADDR_MAPPED)) {
+			!ipv6_addr_v4mapped(&np->daddr)) {
 retv = -EADDRNOTAVAIL;
 break;
 			}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0f7defb..d5c0175 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -697,7 +697,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 	if (!cmd.tcpm_keylen) {
 		if (!tcp_sk(sk)->md5sig_info)
 			return -ENOENT;
-		if (ipv6_addr_type(&sin6->sin6_addr) & IPV6_ADDR_MAPPED)
+		if (ipv6_addr_v4mapped(&sin6->sin6_addr))
 			return tcp_v4_md5_do_del(sk, sin6->sin6_addr.s6_addr32[3]);
 		return tcp_v6_md5_do_del(sk, &sin6->sin6_addr);
 	}
@@ -720,7 +720,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);
 	if (!newkey)
 		return -ENOMEM;
-	if (ipv6_addr_type(&sin6->sin6_addr) & IPV6_ADDR_MAPPED) {
+	if (ipv6_addr_v4mapped(&sin6->sin6_addr)) {
 		return tcp_v4_md5_do_add(sk, sin6->sin6_addr.s6_addr32[3],
 	 newkey, cmd.tcpm_keylen);
 	}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 4210951..3e0ca15 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -610,7 +610,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 		daddr = NULL;
 
 	if (daddr) {
-		if (ipv6_addr_type(daddr) == IPV6_ADDR_MAPPED) {
+		if (ipv6_addr_v4mapped(daddr)) {
 			struct sockaddr_in sin;
 			sin.sin_family = AF_INET;
 			sin.sin_port = sin6 ? sin6->sin6_port : inet->dport;
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f8aa23d..cd57a51 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -481,7 +481,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
 	if (addr1->sa.sa_family != addr2->sa.sa_family) {
 		if (addr1->sa.sa_family == AF_INET &&
 		addr2->sa.sa_family == AF_INET6 &&
-		IPV6_ADDR_MAPPED == ipv6_addr_type(&addr2->v6.sin6_addr)) {
+		ipv6_addr_v4mapped(&addr2->v6.sin6_addr)) {
 			if (addr2->v6.sin6_port == addr1->v4.sin_port &&
 			addr2->v6.sin6_addr.s6_addr32[3] ==
 			addr1->v4.sin_addr.s_addr)
@@ -489,7 +489,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
 		}
 		if (addr2->sa.sa_family == AF_INET &&
 		addr1->sa.sa_family == AF_INET6 &&
-		IPV6_ADDR_MAPPED == ipv6_addr_type(&addr1->v6.sin6_addr)) {
+		ipv6_addr_v4mapped(&addr1->v6.sin6_addr)) {
 			if (addr1->v6.sin6_port == addr2->v4.sin_port &&
 			addr1->v6.sin6_addr.s6_addr32[3] ==
 			addr2->v4.sin_addr.s_addr)

[PATCH] shaper: mark for removal

2007-08-23 Thread Stephen Hemminger

Subject: shaper: mark for removal

This driver has been marked obsolete for a long time and
is superseded by traffic schedulers.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/Documentation/feature-removal-schedule.txt 2007-08-23 09:36:24.0 -0700
+++ b/Documentation/feature-removal-schedule.txt 2007-08-23 09:43:24.0 -0700
@@ -290,3 +290,12 @@ Why:   All mthca hardware also supports MS
 Who:   Roland Dreier [EMAIL PROTECTED]
 
 ---
+
+What:  shaper network driver
+When:  January 2008
+Files: drivers/net/shaper.c, include/linux/if_shaper.h
+Why:   This driver has been marked obsolete for many years.
+   It was only designed to work on lower speed links and has design
+   flaws that lead to machine crashes. The qdisc infrastructure in
+   2.4 or later kernels, provides richer features and is more robust.
+Who:   Stephen Hemminger [EMAIL PROTECTED]

Re: UDPv4 port allocation problem

2007-08-23 Thread Rick Jones


Tóth László Attila wrote:

Hello,

I noticed that it is possible that the kernel allocates the same UDP


_Which_ kernel - or rather which rev?  There are lots of linux kernels 
potentially out there...



port to an application that was used and closed immediately before the
new application got it. This means that applications that do not specify
an exact port and rely on the  kernel to allocate a port for them might
see traffic originally meant for another application.

Imagine that two applications want to resolve a name in DNS at about the
same time. The following happens:
 * first app sends out the DNS query then closes the socket without
waiting for an answer (e.g. it got interrupted by Ctrl+C)
 * second app opens an UDP socket, and gets the same port, originally
assigned to app#1, sends out the DNS query
 * DNS server responds, the response goes to app#2

DNS might not be the perfect example, but you get the idea. 
Applications do not expect to receive data on newly opened sockets, not
to mention the security implications.


Actually, all applications using UDP are required to cope with just about 
anything since there are no guarantees with UDP of anything other than the 
checksum generally protecting one from corrupt data.


In the specific case of DNS, the resolver library will (damn well better) be 
checking the answer it gets against the query it sent.  There will be a 
transaction ID check, and IIRC a check of the returned query against the query sent.
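As a rough illustration of those checks: reject a datagram unless its 16-bit transaction ID matches, its QR bit marks it as a response, and it echoes the question that was sent. This is a sketch under simplifying assumptions, not any particular resolver's code: the query is assumed to be exactly the 12-byte header followed by one question, and the echoed question is compared as raw bytes, whereas a real resolver would parse it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DNS_HDR_LEN 12

/* Returns true if 'resp' plausibly answers 'query' (header + one question). */
static bool dns_response_matches(const unsigned char *query, size_t qlen,
				 const unsigned char *resp, size_t rlen)
{
	if (qlen < DNS_HDR_LEN || rlen < qlen)
		return false;
	if (query[0] != resp[0] || query[1] != resp[1])
		return false;			/* transaction ID differs */
	if (!(resp[2] & 0x80))
		return false;			/* QR bit clear: not a response */
	if (query[4] != resp[4] || query[5] != resp[5])
		return false;			/* QDCOUNT differs */
	/* The question section is echoed right after the header. */
	return memcmp(query + DNS_HDR_LEN, resp + DNS_HDR_LEN,
		      qlen - DNS_HDR_LEN) == 0;
}
```

A stale answer that lands on a reused port would normally fail the ID check, which is why the resolver, not the port allocator, is the real line of defence here.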



TCP, on the other hand, increases the allocated port number for each new
socket; the same behaviour for UDP would add a certain amount of time that
decreases this risk.


Does it always?  If you wait for the length of TIME_WAIT before issuing another 
bind() request does the port number still increase?



While it might be nice to step through the anonymous port space in some fashion 
(I suspect the argument would be made that it should be somewhat random to 
preclude guessing from the outside), applications using UDP are still required 
to expect the unexpected wrt data arriving on their socket.
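One way to step through the anonymous port space with a random element is to start at a random offset in the ephemeral range and probe forward, wrapping around, until a free port is found. A sketch only: the range, the `in_use` table and the caller-supplied `start` value (which a real implementation would draw from a random source) are assumptions for illustration, not how any particular kernel allocates UDP ports.

```c
#include <stdbool.h>

#define PORT_LOW  32768u		/* assumed ephemeral range, inclusive */
#define PORT_HIGH 61000u
#define PORT_SPAN (PORT_HIGH - PORT_LOW + 1u)

/* Return the first free port at or after PORT_LOW + (start % PORT_SPAN),
 * wrapping around the range, or 0 if every port is taken. */
static unsigned int pick_port(const bool in_use[PORT_SPAN], unsigned int start)
{
	unsigned int base = start % PORT_SPAN;

	for (unsigned int i = 0; i < PORT_SPAN; i++) {
		unsigned int off = (base + i) % PORT_SPAN;

		if (!in_use[off])
			return PORT_LOW + off;
	}
	return 0;
}
```

Compared with strictly sequential stepping, a random starting offset makes the next port harder to predict from outside, while the wrap-around probe still finds a free port whenever one exists. It does not remove the need for applications to validate what arrives on their sockets.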


rick jones



Is the current behaviour intended?

Regards,
Laszlo Attila Toth

Re: [PATCH] improved xfrm_audit_log() patch

2007-08-23 Thread Joy Latten

On Wed, 2007-08-22 at 20:05 -0700, David Miller wrote:
 I would suggest, at this point, to make purpose built situation
 specific interfaces that pass specific objects (the ones being
 operated upon) to the audit layer.
 
 Let the audit layer pick out the bits it actually wants in the
 format it likes.
 
 For example, if we're creating a template, pass the policy and
 the template to the audit layer via a function called:
 
 xfrm_audit_template_add()
 
 or something like that.  That function only needs two arguments.
 
 All of these call sites will rarely need more than 2 or 3 arguments in
 any given situation, and the on-stack audit thing will be gone too.
 
 This is the suggestion I made to you over a month ago, but you chose
 to do the on-stack thing.
 
I misunderstood. My bad.

For clarification, I plan on removing xfrm_audit_log() and replacing it
with more specific ipsec audit interfaces. 

For example, when auditing the addition of a policy, either
xfrm_user_audit_policy_add(xp, result, skb) or
pfkey_audit_policy_add(xp, result) will get called. 
I need two because xfrm_user gets loginuid/secid from netlink/skb
and pfkey gets it from audit_get_loginuid(). 
Each will setup and format audit buffer according
to what they want.

Also, for deleting, there will be pfkey_audit_policy_delete(xp, result)
and xfrm_user_audit_policy_delete(xp, result, skb).

 You must make this cost absolutely nothing when it is either
 not configured, and have next to no cost when not enabled at
 run time.  And it is very doable.

The new ipsec audit functions can be ifdef'd with CONFIG_AUDITSYSCALL
just as xfrm_audit_log() was so that there is no cost when 
audit is not configured. 
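The shape described here, one purpose-built function per event that compiles to an empty inline when auditing is not configured and returns early when it is disabled at run time, might look roughly like this. Everything below is hypothetical: `CONFIG_AUDITSYSCALL` is defined locally to select the real path, `audit_enabled_sketch` stands in for the run-time switch, `struct xfrm_policy_sketch` for the policy, and the record text is made up.

```c
#include <stdio.h>

#define CONFIG_AUDITSYSCALL 1	/* hypothetical: delete to build the no-op stub */

struct xfrm_policy_sketch {		/* hypothetical stand-in for the policy */
	unsigned int index;
};

static int audit_enabled_sketch;	/* hypothetical run-time switch */
static char last_record[64];		/* where a record would be emitted */

#ifdef CONFIG_AUDITSYSCALL
/* One purpose-built hook per event: it receives the objects being
 * operated on and formats the record itself. */
static void xfrm_audit_policy_add_sketch(const struct xfrm_policy_sketch *xp,
					 int result)
{
	if (!audit_enabled_sketch)
		return;			/* configured but disabled: a single test */
	snprintf(last_record, sizeof last_record,
		 "op=SPD-add index=%u res=%d", xp->index, result);
}
#else
/* Not configured: an empty inline the compiler removes entirely. */
static inline void xfrm_audit_policy_add_sketch(const struct xfrm_policy_sketch *xp,
						int result)
{
	(void)xp;
	(void)result;
}
#endif
```

Call sites then pass only the objects involved (here the policy and the result), with no on-stack audit structure.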

Let me know if this is better.

Regards,
Joy

Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG

2007-08-23 Thread John W. Linville

On Thu, Aug 23, 2007 at 06:12:00PM +0200, Johannes Berg wrote:
 On Thu, 2007-08-23 at 09:01 -0700, Joe Perches wrote:
  There are also several different uses of the equivalent of
  
  printk("%02x",addr[0]);
  for (i=1; i<6; i++)
  printk(":%02x",addr[i]);
  
  to print an ethernet MAC address.
 
 Hm. I didn't know that, I can go through in a later patch if desired.
 
  http://www.uwsg.iu.edu/hypermail/linux/net/0602.1/0002.html
  
  As not all device MAC addresses are 6 bytes, colon separated,
  perhaps an appropriate ethernet/tr MAC designation is EUI48.
  
  http://standards.ieee.org/regauth/oui/tutorials/EUI48.html
 
 Practically, however, nobody is going to even find macros named
 EUI48_FMT/EUI48_ARG, would they? I don't much care, but I find it rather
 unsatisfying that both wireless code bases define these macros.

Yeah, accommodating non-48-bit MAC addresses is a bit pedantic.

I ACK the original patch, FWIW.

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [IPv6] Add v4mapped address inline

2007-08-23 Thread YOSHIFUJI Hideaki / 吉藤英明

Hello.

In article [EMAIL PROTECTED] (at Thu, 23 Aug 2007 12:40:54 -0400), Brian Haley [EMAIL PROTECTED] says:

 diff --git a/include/net/ipv6.h b/include/net/ipv6.h
 index 9059e0e..c2b6c11 100644
 --- a/include/net/ipv6.h
 +++ b/include/net/ipv6.h
 @@ -418,6 +418,12 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
   return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
  }
  
 +static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
 +{
 + return ((a->s6_addr32[0] | a->s6_addr32[1]) == 0 &&
 +  a->s6_addr32[2] == htonl(0x0000ffff));
 +}
 +

Please put this just after ipv6_addr_any(), not after
ipv6_addr_diff().

--yoshfuji

Re: [stable] [2.6.20.17 review 35/58] forcedeth bug fix: realtek phy

2007-08-23 Thread Greg KH

On Thu, Aug 23, 2007 at 05:50:41PM +0200, Prakash Punnoor wrote:
 On the day of Thursday 23 August 2007 Greg KH hast written:
  On Wed, Aug 22, 2007 at 10:42:25PM +0200, Willy Tarreau wrote:
   On Wed, Aug 22, 2007 at 08:15:03PM +0200, Prakash Punnoor wrote:
Hi,
   
even if Greg is waiting for some special invitation
(http://lkml.org/lkml/2007/8/14/229), I suggest putting this patch by
Ayaz on top:
   
http://lkml.org/lkml/2007/8/10/296
  
   That's what I prepared first, but then I noticed it's not in mainline.
  
Perhaps Ayaz wants to give Greg the clarification he needs... :sigh:
  
   He should, as the fix is not in mainline either :-(
   I don't think Greg asks for specific clarification, just a plain patch
   with a short commit log on its own which does not include remains of
   older mails.
 
  Exactly, that is what I am waiting for.
 
  And also I need the change to go into mainline first, as we can not
  diverge with the -stable releases.
 
 Can we get that into mainline then? I haven't seen forcedeth in MAINTAINERS, 
 so I added netdev to the cc list.

It might help if someone sends a real patch that can be applied :)

thanks,

greg k-h

Re: [IPv6] Add v4mapped address inline

2007-08-23 Thread Brian Haley


YOSHIFUJI Hideaki wrote:

Please put this just after ipv6_addr_any(), not after
ipv6_addr_diff().


Ok, updated patch attached.

-Brian


Add v4mapped address inline to avoid calls to ipv6_addr_type().
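For readers outside the kernel, a minimal userspace sketch of the same test follows. It assumes POSIX headers, and `addr_v4mapped()` is a hypothetical name. Instead of the kernel's `s6_addr32` accessor it copies the 16 address bytes into 32-bit words with memcpy(). An IPv4-mapped address has the form ::ffff:a.b.c.d, so the first two words must be zero and the third must equal 0x0000ffff in network byte order.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Userspace analogue of ipv6_addr_v4mapped(): true for ::ffff:a.b.c.d. */
static int addr_v4mapped(const struct in6_addr *a)
{
	uint32_t w[4];

	/* Copy the raw bytes; the words stay in network byte order. */
	memcpy(w, a->s6_addr, sizeof(w));
	return (w[0] | w[1]) == 0 && w[2] == htonl(0x0000ffff);
}
```

Because `w[2]` is compared against `htonl()` of the constant, the check works the same on little- and big-endian hosts.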

Signed-off-by: Brian Haley [EMAIL PROTECTED]
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 9059e0e..37bdb25 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -377,6 +377,12 @@ static inline int ipv6_addr_any(const struct in6_addr *a)
 		 a->s6_addr32[2] | a->s6_addr32[3] ) == 0); 
 }
 
+static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] | a->s6_addr32[1]) == 0 &&
+		 a->s6_addr32[2] == htonl(0x0000ffff));
+}
+
 /*
  * find the first different bit between two addresses
  * length of address must be a multiple of 32bits
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 761a910..92d8119 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -249,7 +249,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 			}
 
 			if (ipv6_only_sock(sk) ||
-			!(ipv6_addr_type(&np->daddr) & IPV6_ADDR_MAPPED)) {
+			!ipv6_addr_v4mapped(&np->daddr)) {
 retv = -EADDRNOTAVAIL;
 break;
 			}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0f7defb..d5c0175 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -697,7 +697,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 	if (!cmd.tcpm_keylen) {
 		if (!tcp_sk(sk)->md5sig_info)
 			return -ENOENT;
-		if (ipv6_addr_type(&sin6->sin6_addr) & IPV6_ADDR_MAPPED)
+		if (ipv6_addr_v4mapped(&sin6->sin6_addr))
 			return tcp_v4_md5_do_del(sk, sin6->sin6_addr.s6_addr32[3]);
 		return tcp_v6_md5_do_del(sk, &sin6->sin6_addr);
 	}
@@ -720,7 +720,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);
 	if (!newkey)
 		return -ENOMEM;
-	if (ipv6_addr_type(&sin6->sin6_addr) & IPV6_ADDR_MAPPED) {
+	if (ipv6_addr_v4mapped(&sin6->sin6_addr)) {
 		return tcp_v4_md5_do_add(sk, sin6->sin6_addr.s6_addr32[3],
 	 newkey, cmd.tcpm_keylen);
 	}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 4210951..3e0ca15 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -610,7 +610,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 		daddr = NULL;
 
 	if (daddr) {
-		if (ipv6_addr_type(daddr) == IPV6_ADDR_MAPPED) {
+		if (ipv6_addr_v4mapped(daddr)) {
 			struct sockaddr_in sin;
 			sin.sin_family = AF_INET;
 			sin.sin_port = sin6 ? sin6->sin6_port : inet->dport;
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f8aa23d..cd57a51 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -481,7 +481,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
 	if (addr1->sa.sa_family != addr2->sa.sa_family) {
 		if (addr1->sa.sa_family == AF_INET &&
 		addr2->sa.sa_family == AF_INET6 &&
-		IPV6_ADDR_MAPPED == ipv6_addr_type(&addr2->v6.sin6_addr)) {
+		ipv6_addr_v4mapped(&addr2->v6.sin6_addr)) {
 			if (addr2->v6.sin6_port == addr1->v4.sin_port &&
 			addr2->v6.sin6_addr.s6_addr32[3] ==
 			addr1->v4.sin_addr.s_addr)
@@ -489,7 +489,7 @@ static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
 		}
 		if (addr2->sa.sa_family == AF_INET &&
 		addr1->sa.sa_family == AF_INET6 &&
-		IPV6_ADDR_MAPPED == ipv6_addr_type(&addr1->v6.sin6_addr)) {
+		ipv6_addr_v4mapped(&addr1->v6.sin6_addr)) {
 			if (addr1->v6.sin6_port == addr2->v4.sin_port &&
 			addr1->v6.sin6_addr.s6_addr32[3] ==
 			addr2->v4.sin_addr.s_addr)

[PATCH] udp: randomize port selection

2007-08-23 Thread Stephen Hemminger

This patch causes UDP port allocation to be randomized like TCP.
The earlier code would always choose the same port (i.e. the first empty list).
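The two-pass search the patch implements can be sketched in userspace roughly as follows. This is an illustration under stated assumptions, not the kernel code. A plain array of chain lengths stands in for the socket hash chains, a boolean array stands in for __udp_lib_lport_inuse(), and a `seed` argument replaces net_random(). The names `port_table`, `fold` and `pick_port` are hypothetical, and the sketch assumes the port range is much wider than the table.

```c
#include <stdbool.h>

#define HTABLE_SIZE 128  /* power of two, playing the role of UDP_HTABLE_SIZE */

/* Hypothetical stand-in for the socket hash table:
 * chain_len[h] = sockets hashed to bucket h (port & (HTABLE_SIZE - 1)),
 * in_use[p]    = true if local port p is already bound. */
struct port_table {
	unsigned chain_len[HTABLE_SIZE];
	bool in_use[65536];
};

/* Fold a port that ran past the top of the range back into it while
 * keeping the same hash bucket (same expression as the patch). */
static unsigned fold(unsigned rover, unsigned low)
{
	return low + ((rover - low) & (HTABLE_SIZE - 1));
}

/* Pick a free port in [low, high]; returns -1 if none is found.
 * 'seed' replaces net_random() so the sketch is deterministic. */
static int pick_port(const struct port_table *t, unsigned low,
		     unsigned high, unsigned seed)
{
	unsigned rover, best, best_size = ~0u;
	int i;

	best = rover = seed % (high - low) + low;

	/* 1st pass: an empty bucket means rover is free; otherwise
	 * remember a port in the shortest chain seen so far. */
	for (i = 0; i < HTABLE_SIZE; i++) {
		unsigned size = t->chain_len[rover & (HTABLE_SIZE - 1)];

		if (size == 0)
			return (int)rover;
		if (size < best_size) {
			best_size = size;
			best = rover;
		}
		if (++rover > high)
			rover = fold(rover, low);
	}

	/* 2nd pass: step through the ports sharing the best bucket. */
	rover = best;
	for (i = 0; i < 65536 / HTABLE_SIZE; i++) {
		if (!t->in_use[rover])
			return (int)rover;
		rover += HTABLE_SIZE;
		if (rover > high)
			rover = fold(rover, low);
	}
	return -1;	/* all ports in use */
}
```

Because the starting point is random, two unrelated binds on an idle system no longer land on the same port. The second pass only visits ports that hash to the least-loaded bucket.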

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/net/ipv4/udp.c	2007-08-23 09:44:22.0 -0700
+++ b/net/ipv4/udp.c	2007-08-23 11:29:02.0 -0700
@@ -113,9 +113,8 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta
 struct hlist_head udp_hash[UDP_HTABLE_SIZE];
 DEFINE_RWLOCK(udp_hash_lock);
 
-static int udp_port_rover;
-
-static inline int __udp_lib_lport_inuse(__u16 num, struct hlist_head udptable[])
+static inline int __udp_lib_lport_inuse(__u16 num,
+   const struct hlist_head udptable[])
 {
struct sock *sk;
struct hlist_node *node;
@@ -132,11 +131,10 @@ static inline int __udp_lib_lport_inuse(
  *  @sk:  socket struct in question
  *  @snum:port number to look up
  *  @udptable:hash list table, must be of UDP_HTABLE_SIZE
- *  @port_rover:  pointer to record of last unallocated port
  *  @saddr_comp:  AF-dependent comparison of bound local IP addresses
  */
 int __udp_lib_get_port(struct sock *sk, unsigned short snum,
-  struct hlist_head udptable[], int *port_rover,
+  struct hlist_head udptable[],
   int (*saddr_comp)(const struct sock *sk1,
 const struct sock *sk2 ))
 {
@@ -146,49 +144,56 @@ int __udp_lib_get_port(struct sock *sk, 
interror = 1;
 
write_lock_bh(&udp_hash_lock);
-   if (snum == 0) {
-   int best_size_so_far, best, result, i;
 
-   if (*port_rover > sysctl_local_port_range[1] ||
-   *port_rover < sysctl_local_port_range[0])
-   *port_rover = sysctl_local_port_range[0];
-   best_size_so_far = 32767;
-   best = result = *port_rover;
-   for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
-   int size;
-
-   head = &udptable[result & (UDP_HTABLE_SIZE - 1)];
-   if (hlist_empty(head)) {
-   if (result > sysctl_local_port_range[1])
-   result = sysctl_local_port_range[0] +
-   ((result - sysctl_local_port_range[0]) &
-(UDP_HTABLE_SIZE - 1));
+   if (!snum) {
+   int i;
+   int low = sysctl_local_port_range[0];
+   int high = sysctl_local_port_range[1];
+   unsigned rover, best, best_size_so_far;
+
+   best_size_so_far = UINT_MAX;
+   best = rover = net_random() % (high - low) + low;
+
+   /* 1st pass: look for empty (or shortest) hash chain */
+   for (i = 0; i < UDP_HTABLE_SIZE; i++) {
+   int size = 0;
+
+   head = &udptable[rover & (UDP_HTABLE_SIZE - 1)];
+   if (hlist_empty(head))
goto gotit;
-   }
-   size = 0;
+
sk_for_each(sk2, node, head) {
if (++size >= best_size_so_far)
goto next;
}
best_size_so_far = size;
-   best = result;
+   best = rover;
next:
-   ;
+   /* fold back if end of range */
+   if (++rover > high)
+   rover = low + ((rover - low) &
+   (UDP_HTABLE_SIZE - 1));
+
+
}
-   result = best;
-   for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE;
-i++, result += UDP_HTABLE_SIZE) {
-   if (result > sysctl_local_port_range[1])
-   result = sysctl_local_port_range[0]
-   + ((result - sysctl_local_port_range[0]) &
-  (UDP_HTABLE_SIZE - 1));
-   if (! __udp_lib_lport_inuse(result, udptable))
-   break;
+
+   /* 2nd pass: find hole in shortest hash chain */
+   rover = best;
+   for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE; i++) {
+   if (! __udp_lib_lport_inuse(rover, udptable))
+   goto gotit;
+   rover += UDP_HTABLE_SIZE;
+   if (rover > high)
+   rover = low + ((rover - low) &
+   (UDP_HTABLE_SIZE - 1));
}
-   if (i >= (1 << 16) / UDP_HTABLE_SIZE)
-   goto fail;
+
+
+   /* All ports in use! */
+   goto fail;
+

Re: [PATCH] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-23 Thread Olof Johansson

On Thu, Aug 23, 2007 at 10:31:03AM +1000, Stephen Rothwell wrote:
 On Wed, 22 Aug 2007 09:12:48 -0500 Olof Johansson [EMAIL PROTECTED] wrote:
 
  -static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
  +static inline unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
   ^^
 For static functions in C files, we tend not to bother marking them
 inline any more as the compiler does a pretty good job these days.

Yeah, sloppy coding on my part. It was still there from when I
explicitly added noinline during debugging, and I forgot to take it out
altogether.

  -   pci_read_config_dword(mac->iob_pdev, reg, &val);
  +   val = in_le32(mac->iob_regs+reg);
  +
  return val;
 
 Why not just return in_le32(mac->iob_regs+reg); ?
 And similarly below?

Residual from debugging as well, I had debug hooks showing what was
read/written that I took out, but didn't fix up the surrounding stuff.


Refreshed patch posted separately. Thanks for the feedback.

-Olof

[PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-23 Thread Olof Johansson

Move away from using the pci config access functions for simple register
access.  Our device has all of the registers in the config space (hey,
from the hardware point of view it looks reasonable :-), so we need to
somehow get to it. Newer firmwares have it in the device tree such that
we can just get it and ioremap it there (in case it ever moves in future
products). For now, provide a hardcoded fallback for older firmwares.


Signed-off-by: Olof Johansson [EMAIL PROTECTED]


---

Updated: Removed explicit inlines, cleaned up read functions, fixed
grammar.


Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -83,44 +83,35 @@ static struct pasdma_status *dma_status;
 
 static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
 {
-   unsigned int val;
-
-   pci_read_config_dword(mac->iob_pdev, reg, &val);
-   return val;
+   return in_le32(mac->iob_regs+reg);
 }
 
 static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac->iob_pdev, reg, val);
+   out_le32(mac->iob_regs+reg, val);
 }
 
 static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
 {
-   unsigned int val;
-
-   pci_read_config_dword(mac->pdev, reg, &val);
-   return val;
+   return in_le32(mac->regs+reg);
 }
 
 static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac->pdev, reg, val);
+   out_le32(mac->regs+reg, val);
 }
 
 static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
 {
-   unsigned int val;
-
-   pci_read_config_dword(mac->dma_pdev, reg, &val);
-   return val;
+   return in_le32(mac->dma_regs+reg);
 }
 
 static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac->dma_pdev, reg, val);
+   out_le32(mac->dma_regs+reg, val);
 }
 
 static int pasemi_get_mac_addr(struct pasemi_mac *mac)
@@ -585,7 +576,6 @@ static int pasemi_mac_clean_tx(struct pa
}
mac->tx->next_to_clean += count;
spin_unlock_irqrestore(&mac->tx->lock, flags);
-
netif_wake_queue(mac->netdev);
 
return count;
@@ -1076,6 +1066,73 @@ static int pasemi_mac_poll(struct net_de
}
 }
 
+static void __iomem * __devinit map_onedev(struct pci_dev *p, int index)
+{
+   struct device_node *dn;
+   void __iomem *ret;
+
+   dn = pci_device_to_OF_node(p);
+   if (!dn)
+   goto fallback;
+
+   ret = of_iomap(dn, index);
+   if (!ret)
+   goto fallback;
+
+   return ret;
+fallback:
+   /* This is hardcoded and ugly, but we have some firmware versions
+* that don't provide the register space in the device tree. Luckily
+* they are at well-known locations so we can just do the math here.
+*/
+   return ioremap(0xe000 + (p->devfn << 12), 0x2000);
+}
+
+static int __devinit pasemi_mac_map_regs(struct pasemi_mac *mac)
+{
+   struct resource res;
+   struct device_node *dn;
+   int err;
+
+   mac->dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL);
+   if (!mac->dma_pdev) {
+   dev_err(&mac->pdev->dev, "Can't find DMA Controller\n");
+   return -ENODEV;
+   }
+
+   mac->iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL);
+   if (!mac->iob_pdev) {
+   dev_err(&mac->pdev->dev, "Can't find I/O Bridge\n");
+   return -ENODEV;
+   }
+
+   mac->regs = map_onedev(mac->pdev, 0);
+   mac->dma_regs = map_onedev(mac->dma_pdev, 0);
+   mac->iob_regs = map_onedev(mac->iob_pdev, 0);
+
+   if (!mac->regs || !mac->dma_regs || !mac->iob_regs) {
+   dev_err(&mac->pdev->dev, "Can't map registers\n");
+   return -ENODEV;
+   }
+
+   /* The dma status structure is located in the I/O bridge, and
+* is cache coherent.
+*/
+   if (!dma_status) {
+   dn = pci_device_to_OF_node(mac->iob_pdev);
+   if (dn)
+   err = of_address_to_resource(dn, 1, &res);
+   if (!dn || err) {
+   /* Fallback for old firmware */
+   res.start = 0xfd80;
+   res.end = res.start + 0x1000;
+   }
+   dma_status = __ioremap(res.start, res.end-res.start, 0);
+   }
+
+   return 0;
+}
+
 static int __devinit
 pasemi_mac_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
@@ -1104,21 +1161,6 @@ pasemi_mac_probe(struct pci_dev *pdev, c
 
mac->pdev = pdev;
mac->netdev = dev;
-   mac->dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL);
-
-   if (!mac->dma_pdev) {
-

[PATCH] fix realtek phy id in forcedeth

2007-08-23 Thread Willy Tarreau

Hi Greg,

On Thu, Aug 23, 2007 at 09:55:13AM -0700, Greg KH wrote:
 It might help if someone sends a real patch that can be applied :)

This is getting really silly now :-) We're all wasting more time
wondering who will send the patch than posting it. I've lost, I got
fed up first, so here it is. Please apply to mainline then stable.

Thanks,
Willy

--

From a0e2922b99eedd9863232368ea2afe072c52783e Mon Sep 17 00:00:00 2001
From: Willy Tarreau [EMAIL PROTECTED]
Date: Thu, 23 Aug 2007 21:35:41 +0200
Subject: [PATCH] fix realtek phy id in forcedeth

As noticed by Chuck Ebbert, commit c5e3ae8823693b260ce1f217adca8add1bc0b3de
introduced a copy-paste typo, as realtek phy is 0x732 and not 0x1c1. Obvious
fix below suggested by Ayaz Abdulla.

Signed-off-by: Willy Tarreau [EMAIL PROTECTED]
Cc: Ayaz Abdulla [EMAIL PROTECTED]
Cc: Chuck Ebbert [EMAIL PROTECTED]
---
 drivers/net/forcedeth.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 10f4e3b..1938d6d 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -552,7 +552,7 @@ union ring_type {
 #define PHY_OUI_MARVELL	0x5043
 #define PHY_OUI_CICADA	0x03f1
 #define PHY_OUI_VITESSE	0x01c1
-#define PHY_OUI_REALTEK	0x01c1
+#define PHY_OUI_REALTEK	0x0732
 #define PHYID1_OUI_MASK	0x03ff
 #define PHYID1_OUI_SHFT	6
 #define PHYID2_OUI_MASK	0xfc00
-- 
1.5.2.5


Re: [PATCH 2.6.23] cxgb3 - Fix dev->priv usage

2007-08-23 Thread Steve Wise

This patch doesn't seem to have gone in yet

Steve.

David Miller wrote:

From: Divy Le Ray [EMAIL PROTECTED]
Date: Mon, 13 Aug 2007 12:33:04 -0700

From: Divy Le Ray [EMAIL PROTECTED]

cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, so cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes in the 
net-2.6.24 git branch. 

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]

Thank you for doing this backport.


Re: [PATCH] improved xfrm_audit_log() patch

2007-08-23 Thread David Miller

From: Joy Latten [EMAIL PROTECTED]
Date: Thu, 23 Aug 2007 12:15:10 -0500

 For example, when auditing the addition of a policy, either
 xfrm_user_audit_policy_add(xp, result, skb) or
 pfkey_audit_policy_add(xp, result) will get called. 
 I need two because xfrm_user gets loginuid/secid from netlink/skb
 and pfkey gets it from audit_get_loginuid(). 
 Each will set up and format the audit buffer according
 to what it needs.

 Also, for deleting, there will be pfkey_audit_policy_delete(xp, result)
 and xfrm_user_audit_policy_delete(xp, result, skb).

This sounds great.

How cheap is the auditing enabled test?  Perhaps it can
be even inlined into the xfrm audit hooks.

Re: [PATCH] [-MM, FIX] e1000e: incorporate napi_struct changes from net-2.6.24.git

2007-08-23 Thread David Miller

From: Auke Kok [EMAIL PROTECTED]
Date: Thu, 23 Aug 2007 07:59:11 -0700

 This incorporates the new napi_struct changes into e1000e. Included
 bugfix for ifdown hang from Krishna Kumar for e1000.

 Signed-off-by: Auke Kok [EMAIL PROTECTED]

Acked-by: David S. Miller [EMAIL PROTECTED]

Re: [PATCH -mm] ath5k: remove sysctl(2) support

2007-08-23 Thread Jiri Slaby

Alexey Dobriyan wrote:
 sysctl(2) is supported but frozen.

I've posted similar patch yesterday:
http://marc.info/?l=linux-mm-commits&m=118782442602108&w=2

 Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
 ---
 
  drivers/net/wireless/ath5k_base.c |   21 ++---
  1 file changed, 6 insertions(+), 15 deletions(-)
 
 --- a/drivers/net/wireless/ath5k_base.c
 +++ b/drivers/net/wireless/ath5k_base.c
 @@ -2438,21 +2438,12 @@ static struct pci_driver ath_pci_drv_id = {
   .resume = ath_pci_resume,
  };
  
 -/*
 - * Static (i.e. global) sysctls.  Note that the hal sysctls
 - * are located under ours by sharing the setting for DEV_ATH.
 - */
 -enum {
 - DEV_ATH = 9,/* XXX known by hal */
 -};
 -
  static int mincalibrate = 1;
  static int maxcalibrate = INT_MAX / 1000;
 -#define	CTL_AUTO	-2	/* cannot be CTL_ANY or CTL_NONE */
  
  static ctl_table ath_static_sysctls[] = {
  #if AR_DEBUG
 - { .ctl_name = CTL_AUTO,
 + {
 .procname = "debug",
 .mode = 0644,
 .data = &ath_debug,
 @@ -2460,28 +2451,28 @@ static ctl_table ath_static_sysctls[] = {
 .proc_handler = proc_dointvec
   },
  #endif
 - { .ctl_name = CTL_AUTO,
 + {
 .procname = "countrycode",
 .mode = 0444,
 .data = &countrycode,
 .maxlen   = sizeof(countrycode),
 .proc_handler = proc_dointvec
   },
 - { .ctl_name = CTL_AUTO,
 + {
 .procname = "outdoor",
 .mode = 0444,
 .data = &outdoor,
 .maxlen   = sizeof(outdoor),
 .proc_handler = proc_dointvec
   },
 - { .ctl_name = CTL_AUTO,
 + {
 .procname = "xchanmode",
 .mode = 0444,
 .data = &xchanmode,
 .maxlen   = sizeof(xchanmode),
 .proc_handler = proc_dointvec
   },
 - { .ctl_name = CTL_AUTO,
 + {
 .procname = "calibrate",
 .mode = 0644,
 .data = &ath_calinterval,
 @@ -2493,7 +2484,7 @@ static ctl_table ath_static_sysctls[] = {
   { 0 }
  };
  static ctl_table ath_ath_table[] = {
 - { .ctl_name = DEV_ATH,
 + {
 .procname = "ath",
 .mode = 0555,
 .child= ath_static_sysctls

Anyway thanks!

-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University

Re: [RFC 1/1] Net: add ath5k wireless driver

2007-08-23 Thread John W. Linville

On Sun, Aug 12, 2007 at 05:33:16PM +0200, Jiri Slaby wrote:
 add ath5k wireless driver
 
 Signed-off-by: Jiri Slaby [EMAIL PROTECTED]

Review still pending, but I went ahead and added this on the 'ath5k'
branch of wireless-dev.  It is available on 'everything' as well.

Thanks,

John
-- 
John W. Linville
[EMAIL PROTECTED]

[PATCH RFC] iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts with the host stack.

2007-08-23 Thread Steve Wise

Roland/All,

Here is the first swipe at keeping iwarp connections on their own ip
addresses to avoid conflicts with the host stack.

- this is a request for comments

- it is not yet tested fully (tested a prototype of the initial concept)

- still needs serialization/locking 

- stays in our RDMA sandbox ;-)


For background reading (if you dare), see:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg05162.html

and

http://www.mail-archive.com/netdev@vger.kernel.org/msg44312.html


Also: I'm on vacation starting tomorrow until Tuesday 9/4.  I'll address
comments when I return...


Steve.

---

iw_cxgb3: Support iwarp-only interfaces to avoid 4-tuple conflicts with
the host stack.

Design:

The sysadmin creates iwarp-only alias interfaces of the form
devname:iw*, where devname is the native interface name (e.g. eth0) of the
iwarp netdev device.  The alias label can be anything starting with "iw";
the "iw" immediately after the ':' is the key used by the iwarp driver.

EG:
ifconfig eth0 192.168.70.123 up
ifconfig eth0:iw1 192.168.71.123 up
ifconfig eth0:iw2 192.168.72.123 up

In the above example, 192.168.70/24 is for TCP traffic, while
192.168.71/24 and 192.168.72/24 are for iWARP/RDMA use.

The rdma-only interface must be on its own subnet. This allows routing
all rdma traffic onto this interface.

The iWARP driver must translate all listens on address 0.0.0.0 to the
set of rdma-only IP addresses.  This prevents incoming connections to the
TCP IP addresses from going up the rdma stack.

Implementation Details:

- The iwarp driver registers for inetaddr events via
register_inetaddr_notifier().  This allows tracking the iwarp-only
addresses/subnets as they get added and deleted.  The iwarp driver
maintains a list of the current iwarp-only addresses.

- The iwarp driver builds the list of iwarp-only addresses for its devices
at module insert time.  This is needed because the inetaddr notifier
callbacks don't replay address-add events when someone registers.
So the driver must build the initial list at module load time.

- When a listen is done on address 0.0.0.0, then the iwarp driver must
translate that into a set of listens on the iwarp-only addresses.

- When a new iwarp-only address is added or removed, the iwarp driver
must traverse the set of listening endpoints and update them accordingly.
This allows an application to bind to 0.0.0.0 prior to the iwarp-only
interfaces being configured.  It also allows changing the iwarp-only set
of addresses and getting the expected behavior for apps already bound
to 0.0.0.0.
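The label rule and the wildcard-listen translation described above can be sketched in userspace as follows. This is a sketch under stated assumptions. `struct ifaddr_entry` and `expand_listen()` are hypothetical, and addresses are plain host-order integers. The label test mirrors the patch's is_iwarp_label(). Passing a specific (non-wildcard) address through unchanged is an assumption, since the design only specifies what happens to 0.0.0.0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface-address record. */
struct ifaddr_entry {
	const char *label;   /* e.g. "eth0" or "eth0:iw1" */
	uint32_t addr;       /* IPv4 address, host byte order for simplicity */
};

/* Same rule as the patch: text after the first ':' starts with "iw". */
static int is_iwarp_label(const char *label)
{
	const char *colon = strchr(label, ':');

	return colon && !strncmp(colon + 1, "iw", 2);
}

/* Expand a listen on 'addr'. A wildcard (0.0.0.0) listen becomes one
 * listen per iwarp-only address; a specific address is kept as-is
 * (assumption). Returns the number of addresses written to 'out'. */
static size_t expand_listen(uint32_t addr, const struct ifaddr_entry *ifs,
			    size_t n, uint32_t *out, size_t max)
{
	size_t i, cnt = 0;

	if (addr != 0) {
		if (max > 0)
			out[cnt++] = addr;
		return cnt;
	}
	for (i = 0; i < n && cnt < max; i++)
		if (is_iwarp_label(ifs[i].label))
			out[cnt++] = ifs[i].addr;
	return cnt;
}
```

With the eth0/eth0:iw1/eth0:iw2 example above, a wildcard listen expands to 192.168.71.123 and 192.168.72.123 only, so connects to 192.168.70.123 stay with the host TCP stack.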

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch.c|  116 +
 drivers/infiniband/hw/cxgb3/iwch.h|   10 +
 drivers/infiniband/hw/cxgb3/iwch_cm.c |  229 ++---
 drivers/infiniband/hw/cxgb3/iwch_cm.h |   11 +-
 4 files changed, 318 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index 0315c9d..da57b77 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -63,6 +63,115 @@ struct cxgb3_client t3c_client = {
 static LIST_HEAD(dev_list);
 static DEFINE_MUTEX(dev_mutex);
 
+static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+   struct iwch_addrlist *addr;
+
+   addr = kmalloc(sizeof *addr, GFP_KERNEL);
+   if (!addr) {
+   printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+  __FUNCTION__);
+   return;
+   }
+   addr->ifa = ifa;
+   list_add_tail(&addr->entry, &rnicp->addrlist);
+}
+
+static void remove_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+   struct iwch_addrlist *addr, *tmp;
+
+   list_for_each_entry_safe(addr, tmp, &rnicp->addrlist, entry) {
+   if (addr->ifa == ifa) {
+   list_del_init(&addr->entry);
+   kfree(addr);
+   return;
+   }
+   }
+}
+
+static int netdev_is_ours(struct iwch_dev *rnicp, struct net_device *netdev)
+{
+   int i;
+
+   for (i = 0; i < rnicp->rdev.port_info.nports; i++)
+   if (netdev == rnicp->rdev.port_info.lldevs[i])
+   return 1;
+   return 0;
+}
+
+static inline int is_iwarp_label(char *label)
+{
+   char *colon;
+
+   colon = strchr(label, ':');
+   if (colon && !strncmp(colon+1, "iw", 2))
+   return 1;
+   return 0;
+}
+
+static int nb_callback(struct notifier_block *self, unsigned long event,
+  void *ctx)
+{
+   struct in_ifaddr *ifa = ctx;
+   struct iwch_dev *rnicp = container_of(self, struct iwch_dev, nb);
+
+   printk(KERN_INFO "%s rnicp %p event %lx\n", __FUNCTION__, rnicp, event);
+
+   switch (event) {
+   case NETDEV_UP:
+   if (netdev_is_ours(rnicp, ifa->ifa_dev->dev) &&
+   is_iwarp_label(ifa->ifa_label)) {
+

Re: [PATCH 0/3] cxgb3 driver update

2007-08-23 Thread Divy Le Ray


Hi Al,


Speaking of cxgb3, could you explain what the hell is
static int do_term(struct t3cdev *dev, struct sk_buff *skb)
{
unsigned int hwtid = ntohl(skb->priority) >> 8 & 0xf;
doing?  AFAIK, skb->priority is not net-endian...



The RDMA connection id is saved in the skb's priority field for TERM
messages because it is not in the CPL message that comes up from the
hardware. Yet the RDMA driver needs it, so sge.c::process_responses()
overloads the skb's priority and csum with these values.




Another odd place is
int t3_seeprom_write(struct adapter *adapter, u32 addr, u32 data)
{  
u16 val;

int attempts = EEPROM_MAX_POLL;
unsigned int base = adapter->params.pci.vpd_cap_addr;

if ((addr >= EEPROMSIZE && addr != EEPROM_STAT_ADDR) || (addr & 3))
return -EINVAL;


pci_write_config_dword(adapter->pdev, base + PCI_VPD_DATA,
   cpu_to_le32(data));
with callers like
int t3_seeprom_wp(struct adapter *adapter, int enable)
{
return t3_seeprom_write(adapter, EEPROM_STAT_ADDR, enable ? 0xc : 0);


IOW, you really get little-endian values passed to pci_write_config_dword(),
and it expects a host-endian value as the last argument...



It looks like a bug. Thanks for spotting this.
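Al's point can be illustrated with a small userspace model. All names here are hypothetical stand-ins, not the kernel API. The config-space accessor already emits little-endian bytes from a host-order value, so an extra cpu_to_le32() is a no-op on little-endian hosts, which hides the bug there, but byte-swaps the value on big-endian hosts. The sketch uses the write-protect value 0xc from t3_seeprom_wp() and assumes the fix is simply to pass `data` unconverted.

```c
#include <stdint.h>

/* Unconditional 32-bit byte swap. */
static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Stand-in for cpu_to_le32() on a simulated CPU: a no-op on a
 * little-endian host, a byte swap on a big-endian one. */
static uint32_t sim_cpu_to_le32(uint32_t v, int host_is_be)
{
	return host_is_be ? bswap32(v) : v;
}

/* Stand-in for pci_write_config_dword(): it takes a host-order value
 * and emits the little-endian byte sequence seen on the bus, so the
 * caller must not convert the value first. */
static void sim_config_write(uint32_t host_val, uint8_t bytes[4])
{
	bytes[0] = host_val & 0xff;
	bytes[1] = (host_val >> 8) & 0xff;
	bytes[2] = (host_val >> 16) & 0xff;
	bytes[3] = (host_val >> 24) & 0xff;
}
```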

Cheers,
Divy


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-23 Thread jamal

On Wed, 2007-22-08 at 13:21 -0700, David Miller wrote:
 From: Rick Jones [EMAIL PROTECTED]
 Date: Wed, 22 Aug 2007 10:09:37 -0700

  Should it be any more or less worrisome than small packet
  performance (eg the TCP_RR stuff I posted recently) being rather
  worse with TSO enabled than with it disabled?

 That, like any such thing shown by the batching changes, is a bug
 to fix.

Possibly a bug - but you really should turn off TSO if you are doing
huge interactive transactions (which is fair because there is a clear
demarcation).
The litmus test is the same as any change that is supposed to improve
net performance - it has to demonstrate it is not intrusive and that it
improves (consistently) performance. The standard metrics are
{throughput, cpu-utilization, latency} i.e as long as one improves and
others remain zero, it would make sense. Yes, i am religious for
batching after all the invested sweat (and i continue to work on it
hoping to demystify) - the theory makes a lot of sense.

cheers,
jamal


[PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage

2007-08-23 Thread Divy Le Ray

From: Divy Le Ray [EMAIL PROTECTED]

cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, so cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes in the 
net-2.6.24 git branch. 

Without this fix, cxgb3 crashes on 2.6.23.
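The pattern the patch moves to can be sketched with heavily simplified, hypothetical userspace structures (`sketch_alloc_netdev()` and `sketch_netdev_priv()` only loosely model alloc_netdev() and netdev_priv()). Once dev->priv and netdev_priv(dev) name the same memory, storing the adapter pointer in dev->priv would overwrite the port_info. Instead, port_info carries an `adapter` back-pointer, and helpers such as dev2t3cdev() reach the adapter through it.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct adapter { int id; };

struct net_device {
	void *priv;   /* points at the private area that follows the struct */
};

struct port_info {
	struct adapter *adapter;   /* back-pointer added by the patch */
	int port_id;
};

/* Allocate a device with 'priv_size' bytes of private data appended,
 * loosely modelling alloc_netdev(sizeof_priv, ...). */
static struct net_device *sketch_alloc_netdev(size_t priv_size)
{
	struct net_device *dev = calloc(1, sizeof(*dev) + priv_size);

	if (dev)
		dev->priv = dev + 1;
	return dev;
}

/* In this model, as in 2.6.23, netdev_priv() and dev->priv coincide. */
static void *sketch_netdev_priv(struct net_device *dev)
{
	return dev->priv;
}

/* Analogue of dev2t3cdev(): go through port_info, never dev->priv. */
static struct adapter *dev2adapter(struct net_device *dev)
{
	const struct port_info *pi = sketch_netdev_priv(dev);

	return pi->adapter;
}
```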

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/adapter.h   |   10 +++
 drivers/net/cxgb3/cxgb3_main.c|  126 +
 drivers/net/cxgb3/cxgb3_offload.c |6 +-
 drivers/net/cxgb3/sge.c   |   23 ---
 drivers/net/cxgb3/t3cdev.h|3 -
 5 files changed, 100 insertions(+), 68 deletions(-)

diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h
index ab72563..c1dc344 100644
--- a/drivers/net/cxgb3/adapter.h
+++ b/drivers/net/cxgb3/adapter.h
@@ -50,7 +50,9 @@ typedef irqreturn_t(*intr_handler_t) (int, void *);
 
 struct vlan_group;
 
+struct adapter;
 struct port_info {
+   struct adapter *adapter;
struct vlan_group *vlan_grp;
const struct port_type_info *port_type;
u8 port_id;
@@ -246,6 +248,14 @@ static inline void t3_write_reg(struct adapter *adapter, u32 reg_addr, u32 val)
writel(val, adapter->regs + reg_addr);
 }
 
+/* Get the t3cdev associated with a net_device */
+static inline struct t3cdev *dev2t3cdev(struct net_device *dev)
+{
+   const struct port_info *pi = netdev_priv(dev);
+
+   return (struct t3cdev *)pi->adapter;
+}
+
 static inline struct port_info *adap2pinfo(struct adapter *adap, int idx)
 {
return netdev_priv(adap->port[idx]);
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index dc5d269..f3bf128 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -358,11 +358,14 @@ static int init_dummy_netdevs(struct adapter *adap)
 
for (j = 0; j < pi->nqsets - 1; j++) {
if (!adap->dummy_netdev[dummy_idx]) {
-   nd = alloc_netdev(0, "", ether_setup);
+   struct port_info *p;
+
+   nd = alloc_netdev(sizeof(*p), "", ether_setup);
if (!nd)
goto free_all;
 
-   nd->priv = adap;
+   p = netdev_priv(nd);
+   p->adapter = adap;
nd->weight = 64;
set_bit(__LINK_STATE_START, &nd->state);
adap->dummy_netdev[dummy_idx] = nd;
@@ -482,7 +485,8 @@ static ssize_t attr_store(struct device *d, struct device_attribute *attr,
 #define CXGB3_SHOW(name, val_expr) \
 static ssize_t format_##name(struct net_device *dev, char *buf) \
 { \
-   struct adapter *adap = dev->priv; \
+   struct port_info *pi = netdev_priv(dev); \
+   struct adapter *adap = pi->adapter; \
return sprintf(buf, "%u\n", val_expr); \
 } \
 static ssize_t show_##name(struct device *d, struct device_attribute *attr, \
@@ -493,7 +497,8 @@ static ssize_t show_##name(struct device *d, struct device_attribute *attr, \
 
 static ssize_t set_nfilters(struct net_device *dev, unsigned int val)
 {
-   struct adapter *adap = dev->priv;
+   struct port_info *pi = netdev_priv(dev);
+   struct adapter *adap = pi->adapter;
int min_tids = is_offload(adap) ? MC5_MIN_TIDS : 0;
 
if (adap->flags & FULL_INIT_DONE)
@@ -515,7 +520,8 @@ static ssize_t store_nfilters(struct device *d, struct device_attribute *attr,
 
 static ssize_t set_nservers(struct net_device *dev, unsigned int val)
 {
-   struct adapter *adap = dev->priv;
+   struct port_info *pi = netdev_priv(dev);
+   struct adapter *adap = pi->adapter;
 
if (adap->flags & FULL_INIT_DONE)
return -EBUSY;
@@ -556,9 +562,10 @@ static struct attribute_group cxgb3_attr_group = {.attrs = cxgb3_attrs };
 static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr,
char *buf, int sched)
 {
-   ssize_t len;
+   struct port_info *pi = netdev_priv(to_net_dev(d));
+   struct adapter *adap = pi->adapter;
unsigned int v, addr, bpt, cpt;
-   struct adapter *adap = to_net_dev(d)->priv;
+   ssize_t len;
 
addr = A_TP_TX_MOD_Q1_Q0_RATE_LIMIT - sched / 2;
rtnl_lock();
@@ -581,10 +588,11 @@ static ssize_t tm_attr_show(struct device *d, struct device_attribute *attr,
 static ssize_t tm_attr_store(struct device *d, struct device_attribute *attr,
 const char *buf, size_t len, int sched)
 {
+   struct port_info *pi = netdev_priv(to_net_dev(d));
+   struct adapter *adap = pi->adapter;
+   unsigned int val;
char *endp;
ssize_t ret;
-   unsigned int val;
-   struct adapter *adap = to_net_dev(d)->priv;
 
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -858,8

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-23 Thread jamal

On Thu, 2007-23-08 at 18:04 -0400, jamal wrote:

> The litmus test is the same as any change that is supposed to improve
> net performance - it has to demonstrate it is not intrusive and that it
> improves (consistently) performance. The standard metrics are
> {throughput, cpu-utilization, latency}, i.e. as long as one improves and
> the others remain unchanged, it would make sense. Yes, I am religious about
> batching after all the invested sweat (and I continue to work on it,
> hoping to demystify it) - the theory makes a lot of sense.

Before someone jumps and strangles me ;-> By litmus test I meant as
applied to batching. [TSO already passed - iirc, it has been
demonstrated to really not add much to throughput (can't improve much
over closeness to wire speed) but improve CPU utilization].

cheers,
jamal



[patch 02/28] NET: Share correct feature code between bridging and bonding

2007-08-23 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

[NET]: Share correct feature code between bridging and bonding

http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the
bonding driver may produce bogus combinations of the checksum
flags and SG/TSO.

For example, if you bond devices with NETIF_F_HW_CSUM and
NETIF_F_IP_CSUM you'll end up with a bonding device that
has neither flag set.  If both have TSO then this produces
an illegal combination.

The bridge device on the other hand has the correct code to
deal with this.

In fact, the same code can be used for both.  So this patch
moves that logic into net/core/dev.c and uses it for both
bonding and bridging.

In the process I've made small adjustments such as only
setting GSO_ROBUST if at least one constituent device
supports it.
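To see why a plain bitwise AND goes wrong, here is a small standalone model of both approaches. The flag names and bit values are hypothetical stand-ins for the NETIF_F_* constants, and combine_features() is a simplified sketch of checksum-aware combining in the spirit of the bridge logic this patch moves into net/core/dev.c, not the actual netdev_compute_features() code: a device lacking the aggregate's checksum capability steps it down (NO_CSUM, then HW_CSUM, then IP_CSUM) instead of clearing it, and SG and TSO are dropped when their prerequisites are gone.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; the kernel's NETIF_F_* differ. */
#define F_SG       0x01UL	/* scatter-gather */
#define F_IP_CSUM  0x02UL	/* can checksum IPv4 TCP/UDP only */
#define F_HW_CSUM  0x04UL	/* can checksum any protocol */
#define F_NO_CSUM  0x08UL	/* no checksum needed at all */
#define F_TSO      0x10UL	/* TCP segmentation offload */
#define F_ALL_CSUM (F_IP_CSUM | F_HW_CSUM | F_NO_CSUM)

/* Naive intersection, the kind of masking the old bonding code did. */
static unsigned long naive_intersect(unsigned long all, unsigned long one)
{
	return all & one;
}

/* Fold one more device's features `one` into the aggregate `all`. */
static unsigned long combine_features(unsigned long all, unsigned long one)
{
	/* Step the checksum capability down instead of losing it. */
	if ((all & F_NO_CSUM) && !(one & F_NO_CSUM))
		all = (all & ~F_NO_CSUM) | F_HW_CSUM;
	if ((all & F_HW_CSUM) && !(one & F_HW_CSUM))
		all = (all & ~F_HW_CSUM) | F_IP_CSUM;

	all &= one;

	/* SG needs some checksum offload; TSO needs SG. */
	if (!(all & F_ALL_CSUM))
		all &= ~F_SG;
	if (!(all & F_SG))
		all &= ~F_TSO;
	return all;
}
```

With device A = SG|HW_CSUM|TSO and device B = SG|IP_CSUM|TSO, the naive AND leaves SG|TSO with no checksum flag (the illegal combination from the bug report), while combine_features() keeps SG|IP_CSUM|TSO.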

Signed-off-by: Herbert Xu [EMAIL PROTECTED]
Acked-by: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED]

---
 drivers/net/bonding/bond_main.c |   30 +-
 include/linux/netdevice.h       |    2 ++
 net/bridge/br_device.c          |    3 ++-
 net/bridge/br_if.c              |   28 
 net/core/dev.c                  |   38 ++
 5 files changed, 55 insertions(+), 46 deletions(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1233,43 +1233,31 @@ int bond_sethwaddr(struct net_device *bo
 	return 0;
 }
 
-#define BOND_INTERSECT_FEATURES \
-	(NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_TSO | NETIF_F_UFO)
+#define BOND_VLAN_FEATURES \
+	(NETIF_F_VLAN_CHALLENGED | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX | \
+	 NETIF_F_HW_VLAN_FILTER)
 
 /* 
  * Compute the common dev->feature set available to all slaves.  Some
- * feature bits are managed elsewhere, so preserve feature bits set on
- * master device that are not part of the examined set.
+ * feature bits are managed elsewhere, so preserve those feature bits
+ * on the master device.
  */
 static int bond_compute_features(struct bonding *bond)
 {
-	unsigned long features = BOND_INTERSECT_FEATURES;
 	struct slave *slave;
 	struct net_device *bond_dev = bond->dev;
+	unsigned long features = bond_dev->features & ~BOND_VLAN_FEATURES;
 	unsigned short max_hard_header_len = ETH_HLEN;
 	int i;
 
 	bond_for_each_slave(bond, slave, i) {
-		features &= (slave->dev->features & BOND_INTERSECT_FEATURES);
+		features = netdev_compute_features(features,
+						   slave->dev->features);
 		if (slave->dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = slave->dev->hard_header_len;
 	}
 
-	if ((features & NETIF_F_SG) && 
-	    !(features & NETIF_F_ALL_CSUM))
-		features &= ~NETIF_F_SG;
-
-	/* 
-	 * features will include NETIF_F_TSO (NETIF_F_UFO) iff all 
-	 * slave devices support NETIF_F_TSO (NETIF_F_UFO), which 
-	 * implies that all slaves also support scatter-gather 
-	 * (NETIF_F_SG), which implies that features also includes 
-	 * NETIF_F_SG. So no need to check whether we have an  
-	 * illegal combination of NETIF_F_{TSO,UFO} and 
-	 * !NETIF_F_SG 
-	 */
-
-	features |= (bond_dev->features & ~BOND_INTERSECT_FEATURES);
+	features |= (bond_dev->features & BOND_VLAN_FEATURES);
 	bond_dev->features = features;
 	bond_dev->hard_header_len = max_hard_header_len;
 
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1032,6 +1032,8 @@ extern void dev_seq_stop(struct seq_file
 
 extern void linkwatch_run_queue(void);
 
+extern int netdev_compute_features(unsigned long all, unsigned long one);
+
 static inline int net_gso_ok(int features, int gso_type)
 {
 	int feature = gso_type << NETIF_F_GSO_SHIFT;
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -179,5 +179,6 @@ void br_dev_setup(struct net_device *dev
 	dev->priv_flags = IFF_EBRIDGE;
 
 	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA |
-			NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_GSO_ROBUST;
+			NETIF_F_GSO_SOFTWARE | NETIF_F_NO_CSUM |
+			NETIF_F_GSO_ROBUST | NETIF_F_LLTX;
 }
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -360,35 +360,15 @@ int br_min_mtu(const struct net_bridge *
 void br_features_recompute(struct net_bridge *br)
 {
struct net_bridge_port *p;
-   unsigned long features, checksum;
+   unsigned long features;
 
-	checksum = br->feature_mask & NETIF_F_ALL_CSUM ? NETIF_F_NO_CSUM : 0;
-	features = br->feature_mask & ~NETIF_F_ALL_CSUM;
+	features = br->feature_mask;
 
 	list_for_each_entry(p, &br->port_list, list) {
-		unsigned long feature = p->dev->features;
-
-		if (checksum & NETIF_F_NO_CSUM
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-23 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Thu, 23 Aug 2007 18:04:10 -0400

> Possibly a bug - but you really should turn off TSO if you are doing
> huge interactive transactions (which is fair because there is a clear
> demarcation).

I don't see how this can matter.

TSO only ever does anything if you accumulate more than one MSS
worth of data.

And when that does happen, all it does is take whats in the send queue
and send as much as possible at once.  The packets are already built
in big chunks, so there is no extra work to do.

The card is going to send the things back to back and as fast as
in the non-TSO case as well.

It doesn't change application scheduling, and it absolutely does not
penalize small sends by the application unless we have a bug
somewhere.

So I see no reason to disable TSO for any reason other than hardware
implementation deficiencies.  And for the drivers I am familiar with
they do make smart default TSO enabling decisions based upon how well
the chip does TSO.
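To make the "more than one MSS" point concrete, the number of wire segments a send turns into is just a ceiling division. A minimal sketch with a hypothetical helper; the 1448-byte MSS in the checks is an assumed typical value (a 1500-byte MTU minus 20 bytes of IPv4 header, 20 of TCP header and 12 of timestamp option), not a figure from this thread.

```c
#include <assert.h>

/*
 * Number of MSS-sized segments that `len` payload bytes occupy on the
 * wire: ceil(len / mss). With TSO the NIC does this cutting; without it
 * the stack builds each segment itself. Either way the count is the same.
 */
static unsigned int wire_segments(unsigned int len, unsigned int mss)
{
	assert(mss > 0);
	return (len + mss - 1) / mss;
}
```

A 100-byte interactive write yields a single segment, so TSO has nothing to split; a 64000-byte write yields 45 segments whether the NIC or the stack does the cutting.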


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-23 Thread Rick Jones


jamal wrote:

> [TSO already passed - iirc, it has been
> demonstrated to really not add much to throughput (can't improve much
> over closeness to wire speed) but improve CPU utilization].


In the one gig space sure, but in the 10 Gig space, TSO on/off does make a 
difference for throughput.


rick jones

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-23 Thread jamal

On Thu, 2007-23-08 at 15:35 -0700, Rick Jones wrote:
> jamal wrote:
> > [TSO already passed - iirc, it has been
> > demonstrated to really not add much to throughput (can't improve much
> > over closeness to wire speed) but improve CPU utilization].
> 
> In the one gig space sure, but in the 10 Gig space, TSO on/off does make a 
> difference for throughput.

I am still so 1GigE ;-> I stand corrected again ;->

cheers,
jamal



Re: BUG: unable to handle kernel NULL pointer dereference - linux-2.6.22

2007-08-23 Thread Michal Piotrowski

[Adding netdev to CC]

On 21/08/07, poison [EMAIL PROTECTED] wrote:
 Hello,
 after running a few instances of bittorrent-curses on 2.6.22 - 2.6.22.3 it
 takes about 15min to 2hrs for my system to hang. 2.6.21.7 is definitely fine,
 2.6.21 probably (ran for 4hrs without hanging).
 If I'm lucky the Oops below makes it to my syslog (unfortunately SysRq-{p,s,i}
 doesn't work when it hangs, neither can I ssh into it):

 Aug 18 19:47:41 draco kernel: BUG: unable to handle kernel NULL pointer
 dereference at virtual address 
 Aug 18 19:47:41 draco kernel:  printing eip:
 Aug 18 19:47:41 draco kernel: c038fcba
 Aug 18 19:47:41 draco kernel: *pdpt = 33830001
 Aug 18 19:47:41 draco kernel: *pde = 
 Aug 18 19:47:41 draco kernel: Oops: 0002 [#1]
 Aug 18 19:47:41 draco kernel: SMP
 Aug 18 19:47:41 draco kernel: Modules linked in: snd_hda_intel snd_emu10k1
 cls_u32 sch_sfq sch_htb snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
 snd_pcm_oss snd_mixer_oss rfcomm hidp l2cap nfsd exportfs lockd sunrpc
 coretemp hwmon eeprom snd_rawmidi snd_ac97_codec hci_usb ac97_bus
 snd_seq_device snd_util_mem snd_pcm bluetooth snd_hwdep snd_timer snd
 snd_page_alloc i2c_i801 emu10k1_gp gameport i2c_core sg
 Aug 18 19:47:41 draco kernel: CPU:0
 Aug 18 19:47:41 draco kernel: EIP:0060:[c038fcba]Not tainted VLI
 Aug 18 19:47:41 draco kernel: EFLAGS: 00210202   (2.6.22.2poison #14)
 Aug 18 19:47:41 draco kernel: EIP is at tcp_sendmsg+0x40a/0xb70
 Aug 18 19:47:41 draco kernel: eax:    ebx: ec5b807c   ecx: c04b43a0
 edx: ec5b807c
 Aug 18 19:47:41 draco kernel: esi: ec5b8000   edi: 0100   ebp: ec524180
 esp: f3a11d30
 Aug 18 19:47:41 draco kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss:
 0068
 Aug 18 19:47:41 draco kernel: Process bittorrent-curs (pid: 3974, ti=f3a1
 task=f3a0e000 task.ti=f3a1)
 Aug 18 19:47:41 draco kernel: Stack:  ebe562f5 000b 
 f3a11d94  ec5b807c 
 Aug 18 19:47:41 draco kernel:0001 00100100 f3a11f40 
 0040 0200 0200 04b6
 Aug 18 19:47:41 draco kernel:08604707 00200200 f3e5c798 eeaa4b40
  f3a0e000 01f5 00100100
 Aug 18 19:47:41 draco kernel: Call Trace:
 Aug 18 19:47:41 draco kernel:  [c03ac267] inet_sendmsg+0x37/0x70
 Aug 18 19:47:41 draco kernel:  [c03511ef] sock_sendmsg+0xbf/0xf0
 Aug 18 19:47:41 draco kernel:  [c012fe60] autoremove_wake_function+0x0/0x50
 Aug 18 19:47:41 draco kernel:  [c01188f0] default_wake_function+0x0/0x10
 Aug 18 19:47:41 draco last message repeated 3 times
 Aug 18 19:47:41 draco kernel:  [c015589d] find_extend_vma+0x1d/0x70
 Aug 18 19:47:41 draco kernel:  [c03515cf] sys_sendto+0x12f/0x180
 Aug 18 19:47:41 draco kernel:  [c0139dfc] futex_wake+0xac/0xd0
 Aug 18 19:47:41 draco kernel:  [c013a4dd] do_futex+0x6bd/0xbd0
 Aug 18 19:47:41 draco kernel:  [c0351653] sys_send+0x33/0x40
 Aug 18 19:47:41 draco kernel:  [c03525c2] sys_socketcall+0x142/0x280
 Aug 18 19:47:41 draco kernel:  [c0205d20] copy_to_user+0x30/0x60
 Aug 18 19:47:41 draco kernel:  [c0102a92] syscall_call+0x7/0xb
 Aug 18 19:47:41 draco kernel:  ===
 Aug 18 19:47:41 draco kernel: Code: 85 fb 06 00 00 80 ca 10 8b 83 94 00 00 00
 88 53 68 f0 81 00 00 00 01 00 8b 44 24 18 ff 40 08 8b 54 24 18 8b 42 04 89 13
 89 43 04 89 18 89 5a 04 8b 8e 2c 01 00 00 85 c9 0f 84 19 06 00 00 8b 83
 Aug 18 19:47:41 draco kernel: EIP: [c038fcba] tcp_sendmsg+0x40a/0xb70 SS:ESP
 0068:f3a11d30
 Aug 18 19:47:51 draco kernel:
 Aug 18 19:47:51 draco kernel: Pid: 3812, comm:X
 Aug 18 19:47:51 draco kernel: EIP: 0060:[c014a4c2] CPU: 0
 Aug 18 19:47:51 draco kernel: EIP is at __get_free_pages+0x22/0x40
 Aug 18 19:47:51 draco kernel:  EFLAGS: 3246Not tainted
 (2.6.22.2poison #14)
 Aug 18 19:47:51 draco kernel: EAX: 00d0 EBX: 00d0 ECX: c0496b40 EDX:
 
 Aug 18 19:47:51 draco kernel: ESI:  EDI: f5ba1be4 EBP: f49a4d80 DS:
 007b ES: 007b FS: 00d8
 Aug 18 19:47:51 draco kernel: CR0: 8005003b CR2: b7384000 CR3: 37165000 CR4:
 06f0
 Aug 18 19:47:51 draco kernel:  [c01734b6] __pollwait+0xa6/0x100
 Aug 18 19:47:51 draco kernel:  [c03c9597] unix_poll+0x17/0xa0
 Aug 18 19:47:51 draco kernel:  [c03500bc] sock_poll+0xc/0x10
 Aug 18 19:47:51 draco kernel:  [c0172bec] do_select+0x25c/0x490
 Aug 18 19:47:51 draco kernel:  [c0173410] __pollwait+0x0/0x100
 Aug 18 19:47:51 draco kernel:  [c01188f0] default_wake_function+0x0/0x10
 Aug 18 19:47:51 draco last message repeated 19 times
 Aug 18 19:47:51 draco kernel:  [c0172fe8] core_sys_select+0x1c8/0x2f0
 Aug 18 19:47:51 draco kernel:  [c0166a30] do_readv_writev+0x120/0x190
 Aug 18 19:47:51 draco kernel:  [c03503c0] sock_aio_write+0x0/0x110
 Aug 18 19:47:51 draco kernel:  [c017355d] sys_select+0x4d/0x1b0
 Aug 18 19:47:51 draco kernel:  [c0166adc] vfs_writev+0x3c/0x50
 Aug 18 19:47:51 draco kernel:  [c0166f97] sys_writev+0x47/0x80
 Aug 18 19:47:51 draco kernel:  [c0102a92] syscall_call+0x7/0xb
 Aug 18 19:47:51

2.6.22.5 forcedeth timeout hang

2007-08-23 Thread Mr. Berkley Shands


100% reproducible hang on xmit timeout.
Just run "make -j4 modules" on an NFS-mounted kernel source tree.

The messages log is attached.

berkley
--

// E. F. Berkley Shands, MSc//

** Exegy Inc.**

349 Marshall Road, Suite 100

St. Louis , MO  63119

Direct:  (314) 218-3600 X450

Cell:  (314) 303-2546

Office:  (314) 218-3600

Fax:  (314) 218-3601



Aug 23 18:34:55 crash kernel: [30819.690155] NETDEV WATCHDOG: eth1: transmit 
timed out
Aug 23 18:34:55 crash kernel: [30819.690162] eth1: Got tx_timeout. irq: 0036
Aug 23 18:34:55 crash kernel: [30819.690164] eth1: Ring at 16e086000
Aug 23 18:34:55 crash kernel: [30819.690166] eth1: Dumping tx registers
Aug 23 18:34:55 crash kernel: [30819.690171]   0: 0036 00ff 0003 
024e03ca    
Aug 23 18:34:55 crash kernel: [30819.690176]  20: 06255300 ff701365  
    
Aug 23 18:34:55 crash kernel: [30819.690181]  40: 0420e20e a855 2e20 
    
Aug 23 18:34:55 crash kernel: [30819.690186]  60:    
    
Aug 23 18:34:55 crash kernel: [30819.690192]  80: 003b0f3c 0001 0004 
007f0020 061c 0001 0020 7f87
Aug 23 18:34:55 crash kernel: [30819.690197]  a0: 0014050f 0016 5781e000 
020a 0001  a800cccd fcf5
Aug 23 18:34:55 crash kernel: [30819.690203]  c0: 1002 0001 0001 
0001 0001 0001 0001 0001
Aug 23 18:34:55 crash kernel: [30819.690207]  e0: 0001 0001 0001 
0001 0001 0001 0001 0001
Aug 23 18:34:55 crash kernel: [30819.690213] 100: 6e086800 6e086000 007f00ff 
8000 00010032  002c 6e0874c0
Aug 23 18:34:55 crash kernel: [30819.690220] 120: 6e086360 1ca37240 a000ffeb 
  6e0874cc 6e08636c 0fe08000
Aug 23 18:34:55 crash kernel: [30819.690225] 140: 00304120 80002600 0001 
0001    
Aug 23 18:34:55 crash kernel: [30819.690229] 160:    
    
Aug 23 18:34:55 crash kernel: [30819.690235] 180: 0016 0008 0194796d 
8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690241] 1a0: 0016 0008 0194796d 
8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690246] 1c0: 0016 0008 0194796d 
8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690252] 1e0: 0016 0008 0194796d 
8103 002a 3800 0194000f 0003
Aug 23 18:34:55 crash kernel: [30819.690257] 200:    
    
Aug 23 18:34:55 crash kernel: [30819.690261] 220:    
    
Aug 23 18:34:55 crash kernel: [30819.690266] 240:    
    
Aug 23 18:34:55 crash kernel: [30819.690271] 260:   fe020001 
0100   7e020001 0100
Aug 23 18:34:55 crash kernel: [30819.690276] 280:    
    
Aug 23 18:34:55 crash kernel: [30819.690280] 2a0:    
    
Aug 23 18:34:55 crash kernel: [30819.690285] 2c0:    
  0001 0001 0001
Aug 23 18:34:55 crash kernel: [30819.690287] eth1: Dumping tx ring
Aug 23 18:34:55 crash kernel: [30819.690292] 000:  8fd00892 2052 // 
 88115c92 2052 //  875ae892 2052 //  8a660492 
2052
Aug 23 18:34:55 crash kernel: [30819.690298] 004: 0001 61fdb492 2052 // 
 8bf3f892 2052 //  8daa7092 2052 //  8fa29892 
2052
Aug 23 18:34:55 crash kernel: [30819.690304] 008: 0001 0d558892 2052 // 
 8e0bf892 2052 //  8fd00492 2052 //  8d160092 
2052
Aug 23 18:34:55 crash kernel: [30819.690310] 00c: 0001 27698092 2052 // 
 7fc6cc92 2052 //  8d03ec92 2052 //  88085492 
2052
Aug 23 18:34:55 crash kernel: [30819.690317] 010:  850ee492 2052 // 
 8bba8c92 2052 // 0001 56108492 2052 //

[PATCH 14/30] net: Kill some unneeded allocation return value casts in libertas

2007-08-23 Thread Jesper Juhl

kmalloc() and friends return void*, no need to cast it.
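The same rule holds in userspace C, since malloc() also returns void *, which converts implicitly to any object pointer type. A small sketch using malloc() in place of kmalloc(), with a hypothetical struct and helper; writing sizeof(*d) rather than repeating the type name also keeps the allocation size tied to the pointer's type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct debug_entry {		/* hypothetical example struct */
	char name[32];
	size_t len;
};

static struct debug_entry *make_entry(const char *name)
{
	struct debug_entry *d;

	assert(name != NULL);
	/* malloc() returns void *, which converts implicitly: no cast needed */
	d = malloc(sizeof(*d));
	if (d == NULL)
		return NULL;
	strncpy(d->name, name, sizeof(d->name) - 1);
	d->name[sizeof(d->name) - 1] = '\0';	/* strncpy may not terminate */
	d->len = strlen(d->name);
	return d;
}
```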

Signed-off-by: Jesper Juhl [EMAIL PROTECTED]
---
 drivers/net/wireless/libertas/debugfs.c |    2 +-
 drivers/net/wireless/libertas/ethtool.c |    3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/libertas/debugfs.c b/drivers/net/wireless/libertas/debugfs.c
index 715cbda..6ade63e 100644
--- a/drivers/net/wireless/libertas/debugfs.c
+++ b/drivers/net/wireless/libertas/debugfs.c
@@ -1839,7 +1839,7 @@ static ssize_t wlan_debugfs_write(struct file *f, const char __user *buf,
 	char *p2;
 	struct debug_data *d = (struct debug_data *)f->private_data;
 
-	pdata = (char *)kmalloc(cnt, GFP_KERNEL);
+	pdata = kmalloc(cnt, GFP_KERNEL);
 	if (pdata == NULL)
 		return 0;
 
diff --git a/drivers/net/wireless/libertas/ethtool.c b/drivers/net/wireless/libertas/ethtool.c
index 96f1974..7dad493 100644
--- a/drivers/net/wireless/libertas/ethtool.c
+++ b/drivers/net/wireless/libertas/ethtool.c
@@ -60,8 +60,7 @@ static int libertas_ethtool_get_eeprom(struct net_device *dev,
 
 //	mutex_lock(&priv->mutex);
 
-	adapter->prdeeprom =
-		    (char *)kmalloc(eeprom->len+sizeof(regctrl), GFP_KERNEL);
+	adapter->prdeeprom = kmalloc(eeprom->len+sizeof(regctrl), GFP_KERNEL);
 	if (!adapter->prdeeprom)
 		return -ENOMEM;
 	memcpy(adapter->prdeeprom, &regctrl, sizeof(regctrl));
-- 
1.5.2.2


[PATCH 16/30] net: Avoid pointless allocation casts in BSD compression module

2007-08-23 Thread Jesper Juhl

The general kernel memory allocation functions return void pointers
and there is no need to cast their return values.

Signed-off-by: Jesper Juhl [EMAIL PROTECTED]
---
 drivers/net/bsd_comp.c |    6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bsd_comp.c b/drivers/net/bsd_comp.c
index 202d4a4..88edb98 100644
--- a/drivers/net/bsd_comp.c
+++ b/drivers/net/bsd_comp.c
@@ -406,8 +406,7 @@ static void *bsd_alloc (unsigned char *options, int opt_len, int decomp)
  * Allocate space for the dictionary. This may be more than one page in
  * length.
  */
-db->dict = (struct bsd_dict *) vmalloc (hsize *
-   sizeof (struct bsd_dict));
+db->dict = vmalloc(hsize * sizeof(struct bsd_dict));
 if (!db->dict)
   {
bsd_free (db);
@@ -426,8 +425,7 @@ static void *bsd_alloc (unsigned char *options, int opt_len, int decomp)
  */
 else
   {
-db->lens = (unsigned short *) vmalloc ((maxmaxcode + 1) *
-  sizeof (db->lens[0]));
+db->lens = vmalloc((maxmaxcode + 1) * sizeof(db->lens[0]));
if (!db->lens)
  {
bsd_free (db);
-- 
1.5.2.2


Problem with implementation of TCP_DEFER_ACCEPT?

2007-08-23 Thread TJ

I'd welcome the views of those familiar with TCP_DEFER_ACCEPT on a
recent issue I've worked on, where connections between a Juniper DX (aka
Redline) load balancer and an Apache 2.2 cluster failed at random.

Today, after two weeks of debugging, we confirmed the problem was
related to TCP_DEFER_ACCEPT. Part of the issue is caused by Juniper's
implementation of persistent connections, but there remains a question
as to whether the Linux kernel correctly handles handshakes when a
listening socket has TCP_DEFER_ACCEPT enabled.

Upon reflection, and after having worked with the RFCs these past few
weeks, I find myself doubting the kernel's TCP_DEFER_ACCEPT
implementation.

Also, I'm unable to locate an RFC or other specification for
TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER - can you point me to one?

The complete background and observations of the original problem and the
workaround are available here:

https://bugs.launchpad.net/ubuntu/+bug/134274

My specific concerns are explained in the following comments, for which
I'd appreciate your views. 



An RFC 793 standard TCP handshake requires three packets:

client SYN            -->  server LISTENING
client <--  SYN ACK        server SYN_RECEIVED
client ACK            -->  server ESTABLISHED

client PSH ACK + data -->  server

TCP_DEFER_ACCEPT is designed to increase performance by reducing the
number of TCP packets exchanged before the client can pass data:

client SYN            -->  server LISTENING
client <--  SYN ACK        server SYN_RECEIVED

client PSH ACK + data -->  server ESTABLISHED
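For anyone reproducing this, a server opts into the deferred-accept behaviour shown above by setting TCP_DEFER_ACCEPT on its listening socket. A minimal Linux userspace sketch; listen_deferred() is a hypothetical helper, and the loopback address, backlog of 16 and timeout passed in are arbitrary illustrative choices, not values from the setup described here. The kernel converts the timeout in seconds into a count of SYN-ACK retransmits, so reading the option back may return a rounded value.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a loopback TCP listener with TCP_DEFER_ACCEPT set (Linux only).
 * `secs` is roughly how long to keep waiting for the client's first data
 * segment. Returns the listening fd, or -1 on error.
 */
static int listen_deferred(unsigned short port, int secs)
{
	struct sockaddr_in addr = { 0 };
	int fd;

	assert(secs >= 0);
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT, &secs, sizeof(secs)) < 0)
		goto fail;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);	/* 0: let the kernel pick a port */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```

accept() on such a socket should only return once the client has sent data (or, depending on kernel version, once the defer period runs out).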

At present, with TCP_DEFER_ACCEPT enabled, the kernel treats the RFC 793
handshake as invalid: it drops the client's ACK without replying, so the
client doesn't know the server has in fact set its internal acked flag.

If the client doesn't send a packet containing data before the SYN-ACK
time-outs finally expire, the connection is dropped.

For a client obeying RFC 793 what we see is:

client SYN                 -->  server LISTENING
client <--  SYN ACK             server SYN_RECEIVED (time-out 3s)

client ACK                 -->  server (discarded; sets inet_rsk(req)->acked = 1)

client <--  SYN ACK (DUP)       server (time-out 6s)
client ACK (DUP)           -->  server (discarded)

client <--  SYN ACK (DUP)       server (time-out 12s)
client ACK (DUP)           -->  server (discarded)

client <--  SYN ACK (DUP)       server (time-out 24s)
client ACK (DUP)           -->  server (discarded)

client <--  SYN ACK (DUP)       server (time-out 48s)
client ACK (DUP)           -->  server (discarded)

client <--  SYN ACK (DUP)       server (time-out 96s)
client ACK (DUP)           -->  server (discarded)

server: half-open socket closed.

Because the kernel's TCP_DEFER_ACCEPT mechanism drops every client ACK,
the handshake eventually fails once the SYN-ACK retries and time-outs
expire.
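Adding up the waits in the trace gives the total time the half-open request lingers: 3 + 6 + 12 + 24 + 48 + 96 = 189 seconds. The helper below reproduces that sum. It assumes, as the trace shows, a 3 s initial timeout that doubles after each of five SYN-ACK retransmissions (the retransmit count on a given system follows its net.ipv4.tcp_synack_retries setting); the function name is hypothetical.

```c
#include <assert.h>

/*
 * Seconds from the first SYN-ACK until the half-open request is dropped,
 * given an initial retransmit timeout of `initial_rto` seconds that doubles
 * after every retransmission, and `retries` retransmissions.
 */
static unsigned int synack_giveup_secs(unsigned int initial_rto,
				       unsigned int retries)
{
	unsigned int total = 0;
	unsigned int rto = initial_rto;

	assert(retries < 32);	/* keep the doubling from overflowing */
	/* one wait for the original SYN-ACK plus one per retransmission */
	for (unsigned int i = 0; i <= retries; i++) {
		total += rto;
		rto *= 2;
	}
	return total;
}
```

So a client that never sends data is cut off a little over three minutes after its SYN, even though it completed the RFC 793 handshake in the first few milliseconds.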

There is a case for arguing the kernel should be operating in an
enhanced handshaking mode when TCP_DEFER_ACCEPT is enabled, not an
alternative mode, and therefore should accept *both* RFC 793 and
TCP_DEFER_ACCEPT. I've been unable to find a specification or RFC for
implementing TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER to give me firm
guidance.

It seems incorrect to penalise a client that is trying to complete the
handshake according to the RFC 793 specification, especially as the
client has no way of knowing ahead of time whether or not the server is
operating deferred accept.

---

net/ipv4/tcp_minisocks.c::tcp_check_req() implements the
TCP_DEFER_ACCEPT check:

	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
		inet_rsk(req)->acked = 1;
		return NULL;
	}
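The test in that excerpt relies on sequence-number accounting: the client's SYN consumes one sequence number, so a segment whose end_seq equals rcv_isn + 1 carries no payload. A minimal model of just that comparison, with a hypothetical helper name; it ignores everything else tcp_check_req() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * end_seq is seq + payload length (+1 for SYN or FIN). The first client
 * data byte is numbered rcv_isn + 1, so end_seq == rcv_isn + 1 means the
 * segment carries no data: the bare handshake ACK that gets dropped.
 * uint32_t arithmetic wraps modulo 2^32, just like TCP sequence numbers.
 */
static bool is_bare_handshake_ack(uint32_t end_seq, uint32_t rcv_isn)
{
	return end_seq == (uint32_t)(rcv_isn + 1);
}
```

A data-bearing first segment has a larger end_seq, so it falls through to normal connection setup.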



Thanks

TJ.
Ubuntu ACPI Kernel Team



[PATCH 2/3] [IPROUTE2] ip: xfrm: Fix policy and state flags.

2007-08-23 Thread Masahide NAKAMURA

o Support the policy flag in string format.
  Note that the kernel defines only one flag name, "localok", and it
  currently has no effect.
o Support the state flag value XFRM_STATE_NOPMTUDISC.
o Show the detailed flags value when the -s option is used.
o Fix a minor typo.
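The new "flag FLAG-LIST" syntax accepts either a raw hex value or a run of flag names. The standalone sketch below models that behaviour; parse_policy_flags() and EX_POLICY_LOCALOK are hypothetical names, the bit value assumes XFRM_POLICY_LOCALOK is 1 as in linux/xfrm.h, and iproute2's own xfrm_policy_flag_parse() uses its get_u8()/PREV_ARG() helpers rather than this code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to match XFRM_POLICY_LOCALOK (value 1) in <linux/xfrm.h>. */
#define EX_POLICY_LOCALOK 0x1

/*
 * A single hex value ("0x01") replaces *flags; otherwise each known name
 * is ORed into it. Returns how many arguments were consumed (the first
 * unknown word is left for the caller), or -1 for a malformed hex value.
 */
static int parse_policy_flags(char **argv, int argc, unsigned char *flags)
{
	int used = 0;

	assert(flags != NULL);
	if (argc > 0 && strlen(argv[0]) > 2 && strncmp(argv[0], "0x", 2) == 0) {
		char *end;
		unsigned long v = strtoul(argv[0], &end, 16);

		if (*end != '\0' || v > 0xff)
			return -1;
		*flags = (unsigned char)v;
		return 1;
	}
	while (used < argc && strcmp(argv[used], "localok") == 0) {
		*flags |= EX_POLICY_LOCALOK;
		used++;
	}
	return used;
}
```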

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 ip/ipxfrm.c  |   18 +---
 ip/xfrm.h        |    1 +
 ip/xfrm_policy.c |   55 -
 ip/xfrm_state.c  |6 +++-
 4 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index d9b0e3b..359a2d2 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -745,12 +745,13 @@ void xfrm_state_info_print(struct xfrm_usersa_info *xsinfo,
 		fprintf(fp, "flag ");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_NOECN, "noecn");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_DECAP_DSCP, "decap-dscp");
+		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_NOPMTUDISC, "nopmtudisc");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_WILDRECV, "wildrecv");
 		if (flags)
 			fprintf(fp, "%x", flags);
-		if (show_stats > 0)
-			fprintf(fp, " (0x%s)", strxf_mask8(flags));
 	}
+	if (show_stats > 0)
+		fprintf(fp, " (0x%s)", strxf_mask8(xsinfo->flags));
 	fprintf(fp, "%s", _SL_);
 
 	xfrm_xfrma_print(tb, xsinfo->family, fp, buf);
@@ -845,10 +846,19 @@ void xfrm_policy_info_print(struct xfrm_userpolicy_info *xpinfo,
 	}
 	fprintf(fp, " ");
 
-	if (show_stats > 0) {
+	if (show_stats > 0)
 		fprintf(fp, "share %s ", strxf_share(xpinfo->share));
-		fprintf(fp, "flag 0x%s", strxf_mask8(xpinfo->flags));
+
+	if (show_stats > 0 || xpinfo->flags) {
+		__u8 flags = xpinfo->flags;
+
+		fprintf(fp, "flag ");
+		XFRM_FLAG_PRINT(fp, flags, XFRM_POLICY_LOCALOK, "localok");
+		if (flags)
+			fprintf(fp, "%x", flags);
 	}
+	if (show_stats > 0)
+		fprintf(fp, " (0x%s)", strxf_mask8(xpinfo->flags));
 	fprintf(fp, "%s", _SL_);
 
 	if (show_stats > 0)
diff --git a/ip/xfrm.h b/ip/xfrm.h
index 71345b9..335c2a5 100644
--- a/ip/xfrm.h
+++ b/ip/xfrm.h
@@ -98,6 +98,7 @@ struct xfrm_filter {
__u32 index_mask;
__u8 action_mask;
__u32 priority_mask;
+   __u8 policy_flags_mask;
 
__u8 ptype;
__u8 ptype_mask;
diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index f4488ac..419ca67 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -54,10 +54,10 @@ static void usage(void) __attribute__((noreturn));
 static void usage(void)
 {
 	fprintf(stderr, "Usage: ip xfrm policy { add | update } dir DIR SELECTOR [ index INDEX ] [ ptype PTYPE ]\n");
-	fprintf(stderr, "[ action ACTION ] [ priority PRIORITY ] [ LIMIT-LIST ] [ TMPL-LIST ]\n");
+	fprintf(stderr, "[ action ACTION ] [ priority PRIORITY ] [ flag FLAG-LIST ] [ LIMIT-LIST ] [ TMPL-LIST ]\n");
 	fprintf(stderr, "Usage: ip xfrm policy { delete | get } dir DIR [ SELECTOR | index INDEX ] [ ptype PTYPE ]\n");
 	fprintf(stderr, "Usage: ip xfrm policy { deleteall | list } [ dir DIR ] [ SELECTOR ]\n");
-	fprintf(stderr, "[ index INDEX ] [ action ACTION ] [ priority PRIORITY ]\n");
+	fprintf(stderr, "[ index INDEX ] [ action ACTION ] [ priority PRIORITY ]  [ flag FLAG-LIST ]\n");
 	fprintf(stderr, "Usage: ip xfrm policy flush [ ptype PTYPE ]\n");
 	fprintf(stderr, "Usage: ip xfrm count\n");
 	fprintf(stderr, "PTYPE := [ main | sub ](default=main)\n");
@@ -74,6 +74,9 @@ static void usage(void)
 
 	//fprintf(stderr, "PRIORITY - priority value(default=0)\n");
 
+	fprintf(stderr, "FLAG-LIST := [ FLAG-LIST ] FLAG\n");
+	fprintf(stderr, "FLAG := [ localok ]\n");
+
 	fprintf(stderr, "LIMIT-LIST := [ LIMIT-LIST ] | [ limit LIMIT ]\n");
 	fprintf(stderr, "LIMIT := [ [time-soft|time-hard|time-use-soft|time-use-hard] SECONDS ] |\n");
 	fprintf(stderr, "[ [byte-soft|byte-hard] SIZE ] | [ [packet-soft|packet-hard] NUMBER ]\n");
@@ -135,6 +138,39 @@ static int xfrm_policy_ptype_parse(__u8 *ptype, int *argcp, char ***argvp)
 	return 0;
 }
 
+static int xfrm_policy_flag_parse(__u8 *flags, int *argcp, char ***argvp)
+{
+	int argc = *argcp;
+	char **argv = *argvp;
+	int len = strlen(*argv);
+
+	if (len > 2 && strncmp(*argv, "0x", 2) == 0) {
+		__u8 val = 0;
+
+		if (get_u8(&val, *argv, 16))
+			invarg("\"FLAG\" is invalid", *argv);
+		*flags = val;
+	} else {
+		while (1) {
+			if (strcmp(*argv, "localok") == 0)
+				*flags |= XFRM_POLICY_LOCALOK;
+			else {
+				PREV_ARG(); /* back track */
+				break;
+

[PATCH 1/3] [IPROUTE2] ip: xfrm: Clean-up for internal mask to filter.

2007-08-23 Thread Masahide NAKAMURA

Remove unused or redundant usage for xfrm_filter.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 ip/xfrm_policy.c |   17 -
 ip/xfrm_state.c  |    2 --
 2 files changed, 0 insertions(+), 19 deletions(-)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index c1086f1..f4488ac 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -222,16 +222,10 @@ static int xfrm_policy_modify(int cmd, unsigned flags, int argc, char **argv)
 
 			NEXT_ARG();
 			xfrm_policy_dir_parse(&req.xpinfo.dir, &argc, &argv);
-
-			filter.dir_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "index") == 0) {
 			NEXT_ARG();
 			if (get_u32(&req.xpinfo.index, *argv, 0))
 				invarg("\"INDEX\" is invalid", *argv);
-
-			filter.index_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "ptype") == 0) {
 			if (ptypep)
 				duparg("ptype", *argv);
@@ -239,9 +233,6 @@ static int xfrm_policy_modify(int cmd, unsigned flags, int argc, char **argv)
 
 			NEXT_ARG();
 			xfrm_policy_ptype_parse(&upt.type, &argc, &argv);
-
-			filter.dir_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "action") == 0) {
 			NEXT_ARG();
 			if (strcmp(*argv, "allow") == 0)
@@ -250,16 +241,10 @@ static int xfrm_policy_modify(int cmd, unsigned flags, int argc, char **argv)
 				req.xpinfo.action = XFRM_POLICY_BLOCK;
 			else
 				invarg("\"action\" value is invalid\n", *argv);
-
-			filter.action_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "priority") == 0) {
 			NEXT_ARG();
 			if (get_u32(&req.xpinfo.priority, *argv, 0))
 				invarg("\"PRIORITY\" is invalid", *argv);
-
-			filter.priority_mask = XFRM_FILTER_MASK_FULL;
-
 		} else if (strcmp(*argv, "limit") == 0) {
 			NEXT_ARG();
 			xfrm_lifetime_cfg_parse(&req.xpinfo.lft, &argc, &argv);
@@ -888,8 +873,6 @@ static int xfrm_policy_flush(int argc, char **argv)
 
 			NEXT_ARG();
 			xfrm_policy_ptype_parse(&upt.type, &argc, &argv);
-
-			filter.dir_mask = XFRM_FILTER_MASK_FULL;
 		} else
 			invarg("unknown", *argv);
 
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 54e1330..2b68f49 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -216,8 +216,6 @@ static int xfrm_state_flag_parse(__u8 *flags, int *argcp, char ***argvp)
 		}
 	}
 
-	filter.state_flags_mask = XFRM_FILTER_MASK_FULL;
-
 	*argcp = argc;
 	*argvp = argv;
 
-- 
1.4.4.2


[PATCH 3/3] [IPROUTE2] ip: xfrm: Fix flush message.

2007-08-23 Thread Masahide NAKAMURA

Fix the xfrm state and policy flush messages.
This patch also includes some minor updates:
o Use a static buffer to show an unknown value as a string.
o Show the policy type (ptype) only when the kernel specifies it.
o Clean up xfrm_monitor.
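The first item addresses helpers such as strxf_xfrmproto() that used to return NULL for values missing from their lookup tables; passing NULL to printf("%s") is undefined behaviour. The pattern the patch adopts, falling back to the number rendered into a static buffer, can be sketched standalone; proto_to_str() and its two-entry table are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

struct name_ent {
	const char *name;
	int value;
};

/* Hypothetical two-entry table; a NULL name terminates it. */
static const struct name_ent proto_names[] = {
	{ "esp", IPPROTO_ESP }, { "ah", IPPROTO_AH },
	{ NULL, -1 }
};

/*
 * Symbolic name for `proto`, or its decimal value written into a static
 * buffer. Never returns NULL, so the result is always safe for "%s".
 */
static const char *proto_to_str(unsigned char proto)
{
	static char str[16];
	const struct name_ent *t;

	for (t = proto_names; t->name != NULL; t++)
		if (t->value == proto)
			return t->name;
	snprintf(str, sizeof(str), "%u", (unsigned int)proto);
	return str;
}
```

The trade-off is that the static buffer is shared: the function is not reentrant, and a second call with an unknown value overwrites the first result.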

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 ip/ipxfrm.c   |   48 +
 ip/xfrm.h         |    1 +
 ip/xfrm_monitor.c |  122 +---
 ip/xfrm_state.c   |    1 -
 4 files changed, 117 insertions(+), 55 deletions(-)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index 359a2d2..80dbb52 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -114,6 +114,7 @@ struct typeent {
 static const struct typeent xfrmproto_types[]= {
 	{ "esp", IPPROTO_ESP }, { "ah", IPPROTO_AH }, { "comp", IPPROTO_COMP },
 	{ "route2", IPPROTO_ROUTING }, { "hao", IPPROTO_DSTOPTS },
+	{ "ipsec-any", IPSEC_PROTO_ANY },
 	{ NULL, -1 }
 };
 
@@ -135,6 +136,7 @@ int xfrm_xfrmproto_getbyname(char *name)
 
 const char *strxf_xfrmproto(__u8 proto)
 {
+	static char str[16];
 	int i;
 
 	for (i = 0; ; i++) {
@@ -146,7 +148,8 @@ const char *strxf_xfrmproto(__u8 proto)
 			return t->t_name;
 	}
 
-	return NULL;
+	sprintf(str, "%u", proto);
+	return str;
 }
 
 static const struct typeent algo_types[]= {
@@ -172,6 +175,7 @@ int xfrm_algotype_getbyname(char *name)
 
 const char *strxf_algotype(int type)
 {
+	static char str[32];
 	int i;
 
 	for (i = 0; ; i++) {
@@ -183,7 +187,8 @@ const char *strxf_algotype(int type)
 			return t->t_name;
 	}
 
-	return NULL;
+	sprintf(str, "%d", type);
+	return str;
 }
 
 const char *strxf_mask8(__u8 mask)
@@ -251,6 +256,25 @@ const char *strxf_proto(__u8 proto)
 	return p;
 }
 
+const char *strxf_ptype(__u8 ptype)
+{
+	static char str[16];
+
+	switch (ptype) {
+	case XFRM_POLICY_TYPE_MAIN:
+		strcpy(str, "main");
+		break;
+	case XFRM_POLICY_TYPE_SUB:
+		strcpy(str, "sub");
+		break;
+	default:
+		sprintf(str, "%u", ptype);
+		break;
+	}
+
+	return str;
+}
+
 void xfrm_id_info_print(xfrm_address_t *saddr, struct xfrm_id *id,
__u8 mode, __u32 reqid, __u16 family, int force_spi,
FILE *fp, const char *prefix, const char *title)
@@ -776,7 +800,6 @@ void xfrm_policy_info_print(struct xfrm_userpolicy_info *xpinfo,
const char *title)
 {
char buf[STRBUF_SIZE];
-   __u8 ptype = XFRM_POLICY_TYPE_MAIN;
 
memset(buf, '\0', sizeof(buf));
 
@@ -821,31 +844,18 @@ void xfrm_policy_info_print(struct xfrm_userpolicy_info *xpinfo,
fprintf(fp, "index %u ", xpinfo->index);
fprintf(fp, "priority %u ", xpinfo->priority);
 
-   fprintf(fp, "ptype ");
-
if (tb[XFRMA_POLICY_TYPE]) {
struct xfrm_userpolicy_type *upt;
 
+   fprintf(fp, "ptype ");
+
if (RTA_PAYLOAD(tb[XFRMA_POLICY_TYPE]) < sizeof(*upt))
fprintf(fp, "(ERROR truncated)");
 
upt = (struct xfrm_userpolicy_type *)RTA_DATA(tb[XFRMA_POLICY_TYPE]);
-   ptype = upt->type;
+   fprintf(fp, "%s ", strxf_ptype(upt->type));
}
 
-   switch (ptype) {
-   case XFRM_POLICY_TYPE_MAIN:
-   fprintf(fp, "main");
-   break;
-   case XFRM_POLICY_TYPE_SUB:
-   fprintf(fp, "sub");
-   break;
-   default:
-   fprintf(fp, "%u", ptype);
-   break;
-   }
-   fprintf(fp, " ");
-
if (show_stats > 0)
fprintf(fp, "share %s ", strxf_share(xpinfo->share));
 
diff --git a/ip/xfrm.h b/ip/xfrm.h
index 335c2a5..930bb3f 100644
--- a/ip/xfrm.h
+++ b/ip/xfrm.h
@@ -127,6 +127,7 @@ const char *strxf_mask8(__u8 mask);
 const char *strxf_mask32(__u32 mask);
 const char *strxf_share(__u8 share);
 const char *strxf_proto(__u8 proto);
+const char *strxf_ptype(__u8 ptype);
 void xfrm_id_info_print(xfrm_address_t *saddr, struct xfrm_id *id,
__u8 mode, __u32 reqid, __u16 family, int force_spi,
FILE *fp, const char *prefix, const char *title);
diff --git a/ip/xfrm_monitor.c b/ip/xfrm_monitor.c
index bdbf4a6..dc12fca 100644
--- a/ip/xfrm_monitor.c
+++ b/ip/xfrm_monitor.c
@@ -50,12 +50,6 @@ static int xfrm_acquire_print(const struct sockaddr_nl *who,
struct rtattr * tb[XFRMA_MAX+1];
__u16 family;
 
-   if (n->nlmsg_type != XFRM_MSG_ACQUIRE) {
-   fprintf(stderr, "Not an acquire: %08x %08x %08x\n",
-   n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
-   return 0;
-   }
-
len -= NLMSG_LENGTH(sizeof(*xacq));
if (len < 0) {
fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
@@ -108,6 +102,74 @@ static int xfrm_acquire_print(const struct sockaddr_nl *who,

[PATCH 0/3] [IPROUTE2] ip command updates

2007-08-23 Thread Masahide NAKAMURA

Hello,

These are updates for the ip command. They are mostly minor fixes
and do not involve the new 2.6.23 features.

Please apply if it is not too late for the next release.

-- 
Masahide NAKAMURA

Re: [PATCH 4/9] s2io, rename BIT macro

2007-08-23 Thread Richard Knutsson


Jiri Slaby wrote:

s2io, rename BIT macro

BIT macro will be global definition of (1<<x)

Signed-off-by: Jiri Slaby [EMAIL PROTECTED]

---
  

[snip]

cnt++;
if (cnt == 5)
diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index 92983ee..448f899 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -14,7 +14,7 @@
 #define _S2IO_H
 
 #define TBD 0

-#define BIT(loc)   (0x8000ULL >> (loc))
+#define s2BIT(loc) (0x8000ULL >> (loc))
 #define vBIT(val, loc, sz) (((u64)val) << (64-loc-sz))
 #define INV(d)  ((d&0xff)<<24) | (((d>>8)&0xff)<<16) | (((d>>16)&0xff)<<8)| ((d>>24)&0xff)
 
  
Sorry for the late response, but would it not be better/easier to use 
BIT() instead (or a global #define LLBIT(nr)  (1ULL << (nr))) and just
recalculate the values?
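The two conventions differ only in bit order. s2io's BIT(loc) shifts a top-bit constant right, so it counts from the most significant bit of a 64-bit register. The suggested LLBIT(nr) counts from the least significant bit. Recalculating therefore means replacing each position loc with 63 - loc. Below is a minimal sketch. It assumes the full s2io constant is 0x8000000000000000ULL, meaning only the top bit is set; the archive shows it shortened. MSB0_BIT() and msb0_to_lsb0() are hypothetical names.

```c
#include <assert.h>

/* MSB-first numbering: position 0 is the top bit of a 64-bit register
 * (assumed to match s2io's BIT(); the name here is hypothetical). */
#define MSB0_BIT(loc)	(0x8000000000000000ULL >> (loc))

/* LSB-first numbering, as in the LLBIT() suggested above. */
#define LLBIT(nr)	(1ULL << (nr))

/* Convert an MSB-first bit position to the equivalent LSB-first one. */
static inline unsigned int msb0_to_lsb0(unsigned int loc)
{
	return 63u - loc;
}

/* Position 0 in one order is position 63 in the other, and vice versa. */
static_assert(MSB0_BIT(0) == LLBIT(63), "top bit");
static_assert(MSB0_BIT(63) == LLBIT(0), "bottom bit");
```

Renaming keeps every existing s2io register definition valid as written. Switching to an LSB-first macro means recomputing each position as 63 - loc, which is the extra work the question above weighs.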


Richard Knutsson


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-23 Thread Bill Fink

On Thu, 23 Aug 2007, Rick Jones wrote:

> jamal wrote:
> > [TSO already passed - iirc, it has been
> > demonstrated to really not add much to throughput (can't improve much
> > over closeness to wire speed) but improve CPU utilization].
>
> In the one gig space sure, but in the 10 Gig space, TSO on/off does make a
> difference for throughput.

Not too much.

TSO enabled:

[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11813.4375 MB /  10.00 sec = 9906.1644 Mbps 99 %TX 80 %RX

TSO disabled:

[EMAIL PROTECTED] ~]# ethtool -K eth2 tso off
[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11818.2500 MB /  10.00 sec = 9910.0176 Mbps 100 %TX 78 %RX

Pretty negligible difference it seems.

This is with a 2.6.20.7 kernel, Myricom 10-GigE NICs, and 9000 byte
jumbo frames, in a LAN environment.

For grins, I also did a couple of tests with an MSS of 1460 to
emulate a standard 1500 byte Ethernet MTU.

TSO enabled:

[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
 5102.8503 MB /  10.06 sec = 4253.9124 Mbps 39 %TX 99 %RX

TSO disabled:

[EMAIL PROTECTED] ~]# ethtool -K eth2 tso off
[EMAIL PROTECTED] ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
 5399.5625 MB /  10.00 sec = 4527.9070 Mbps 99 %TX 76 %RX

Here you can see there is a major difference in the TX CPU utilization
(99 % with TSO disabled versus only 39 % with TSO enabled), although
the TSO disabled case was able to squeeze out a little extra performance
from its extra CPU utilization.  Interestingly, with TSO enabled, the
receiver actually consumed more CPU than with TSO disabled, so I guess
the receiver CPU saturation in that case (99 %) was what restricted
its performance somewhat (this was consistent across a few test runs).
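To put the MSS 1460 results on a common scale, divide throughput by transmit-side CPU utilization. Below is a small sketch of that arithmetic using the figures reported above. The helper and variable names are ours.

```c
#include <assert.h>

/* Throughput delivered per percentage point of CPU: a crude efficiency metric. */
static double mbps_per_cpu_pct(double mbps, double cpu_pct)
{
	assert(cpu_pct > 0.0);
	return mbps / cpu_pct;
}

/* MSS 1460 nuttcp results quoted above (throughput in Mbps, TX CPU in %). */
static const double tso_on_mbps = 4253.9124, tso_on_tx_pct = 39.0;
static const double tso_off_mbps = 4527.9070, tso_off_tx_pct = 99.0;
```

This gives roughly 109 Mbps per TX CPU percent with TSO on, versus about 46 with TSO off. In other words, TSO moved about 2.4 times as much data per unit of sender CPU, while the receiver (99 % vs 76 %) became the limit.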

-Bill

[no subject]

2007-08-23 Thread Eugene Teo

subscribe netdev

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-23 Thread Stephen Hemminger

On Thu, 23 Aug 2007 18:38:22 -0400
jamal [EMAIL PROTECTED] wrote:

> On Thu, 2007-23-08 at 15:30 -0700, David Miller wrote:
> > From: jamal [EMAIL PROTECTED]
> > Date: Thu, 23 Aug 2007 18:04:10 -0400
> >
> > > Possibly a bug - but you really should turn off TSO if you are doing
> > > huge interactive transactions (which is fair because there is a clear
> > > demarcation).
> >
> > I don't see how this can matter.
> >
> > TSO only ever does anything if you accumulate more than one MSS
> > worth of data.
>
> I stand corrected then.
>
> cheers,
> jamal

For most normal Internet TCP connections, you will see only 2 or 3 packets per
TSO because of ACK clocking. If you turn off delayed ACK on the receiver, it
will be even fewer.
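The arithmetic behind this is straightforward. A TSO send is cut into ceil(bytes / MSS) segments, so it only helps when more than one MSS is queued. Under ACK clocking, each ACK frees roughly the data it acknowledges. Below is a simplified model. It assumes that delayed ACK acknowledges two segments per ACK and that congestion-window growth adds at most one more segment per ACK. Both helpers are hypothetical and are not kernel code.

```c
#include <assert.h>

/* Number of wire segments a TSO send of 'bytes' is cut into.
 * If 'bytes' <= 'mss' the result is 1: there is nothing for TSO to split. */
static unsigned int tso_segments(unsigned int bytes, unsigned int mss)
{
	assert(mss > 0);
	return (bytes + mss - 1) / mss;
}

/* Simplified ACK-clocking model (an assumption, not kernel behaviour):
 * an ACK releases the segments it acknowledges plus any window growth. */
static unsigned int bytes_released_per_ack(unsigned int segs_acked,
					   unsigned int growth_segs,
					   unsigned int mss)
{
	return (segs_acked + growth_segs) * mss;
}
```

With delayed ACK (2 segments acknowledged), the model gives 2 or 3 segments per TSO send. With one ACK per segment, it gives 1 or 2, which matches the description above.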

A current hot topic of research is reducing the number of ACKs to make TCP
work better over asymmetric links like 3G.

-- 
Stephen Hemminger [EMAIL PROTECTED]

Re: [PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-23 Thread Stephen Rothwell

On Thu, 23 Aug 2007 13:13:10 -0500 Olof Johansson [EMAIL PROTECTED] wrote:

>  out:
> -   pci_dev_put(mac->iob_pdev);
> -out_put_dma_pdev:
> -   pci_dev_put(mac->dma_pdev);
> -out_free_netdev:
> +   if (mac->iob_pdev)
> +       pci_dev_put(mac->iob_pdev);
> +   if (mac->dma_pdev)
> +       pci_dev_put(mac->dma_pdev);

It is not documented as such (as far as I can see), but pci_dev_put is
safe to call with NULL. And there are other places in the kernel that
explicitly use that fact.
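Because pci_dev_put() tolerates NULL, the `if` guards added in the quoted hunk are redundant. The underlying pattern is a release function that ignores NULL, so error paths can call it unconditionally. Below is a minimal sketch of that pattern. `struct demo_dev`, demo_dev_put() and demo_cleanup() are hypothetical stand-ins, not the PCI API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct pci_dev. */
struct demo_dev {
	int refcnt;
};

/* Drop a reference; a NULL pointer is silently ignored. */
static void demo_dev_put(struct demo_dev *dev)
{
	if (dev)
		dev->refcnt--;
}

/* Error-path cleanup with no per-pointer NULL checks at the call site. */
static void demo_cleanup(struct demo_dev *iob, struct demo_dev *dma)
{
	demo_dev_put(iob);
	demo_dev_put(dma);
}
```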

-- 
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: Problem with implementation of TCP_DEFER_ACCEPT?

2007-08-23 Thread John Heffner


TJ wrote:

> client SYN -> server LISTENING
> client <- SYN ACK server SYN_RECEIVED (time-out 3s)
>  server: inet_rsk(req)->acked = 1
>
> client ACK -> server (discarded)
>
> client <- SYN ACK (DUP) server (time-out 6s)
> client ACK (DUP) -> server (discarded)
>
> client <- SYN ACK (DUP) server (time-out 12s)
> client ACK (DUP) -> server (discarded)
>
> client <- SYN ACK (DUP) server (time-out 24s)
> client ACK (DUP) -> server (discarded)
>
> client <- SYN ACK (DUP) server (time-out 48s)
> client ACK (DUP) -> server (discarded)
>
> client <- SYN ACK (DUP) server (time-out 96s)
> client ACK (DUP) -> server (discarded)
>
> server: half-open socket closed.

> With each client ACK being dropped by the kernel's TCP_DEFER_ACCEPT
> mechanism, the handshake eventually fails once the 'SYN ACK' retries and
> time-outs expire.
>
> There is a case for arguing the kernel should be operating in an
> enhanced handshaking mode when TCP_DEFER_ACCEPT is enabled, not an
> alternative mode, and therefore should accept *both* RFC 793 and
> TCP_DEFER_ACCEPT. I've been unable to find a specification or RFC for
> implementing TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER to give me firm
> guidance.
>
> It seems incorrect to penalise a client that is trying to complete the
> handshake according to the RFC 793 specification, especially as the
> client has no way of knowing ahead of time whether or not the server is
> operating deferred accept.
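The time-outs in the trace above double after every SYN-ACK retransmission, starting from 3 s. Summing them shows how long the half-open request lingers before it is dropped. Below is a minimal sketch of that arithmetic. synack_lifetime_s() is a hypothetical helper. The 3 s start and the five retransmissions are read off the trace, not taken from kernel constants.

```c
#include <assert.h>

/*
 * Total lifetime of a half-open request when the SYN-ACK is sent once and
 * then retransmitted 'retries' times, with the timeout doubling each time
 * (exponential backoff starting at 'initial_s' seconds).
 */
static unsigned int synack_lifetime_s(unsigned int initial_s,
				      unsigned int retries)
{
	unsigned int total = 0, timeout = initial_s, i;

	for (i = 0; i <= retries; i++) {
		total += timeout;
		timeout *= 2;
	}
	return total;
}
```

With the values from the trace, synack_lifetime_s(3, 5) returns 3 + 6 + 12 + 24 + 48 + 96 = 189 seconds. Every ACK the client sends during that window is discarded.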


Interesting problem.  TCP_DEFER_ACCEPT does not conform to any standard
I'm aware of.  (In fact, I'd say it's in violation of RFC 793.)  The
implementation does exactly what it claims, though -- it "allows a
listener to be awakened only when data arrives on the socket."


I think a more useful spec might have been "allows a listener to be
awakened only when data arrives on the socket, unless the specified
timeout has expired."  Once the timeout expires, it should process the
embryonic connection as if TCP_DEFER_ACCEPT is not set.  Unfortunately,
I don't think we can retroactively change this definition, as an
application might depend on data being available and do a non-blocking
read() after the accept(), expecting data to be there.  Is this worth
trying to fix?


Also, a listen socket with a backlog and TCP_DEFER_ACCEPT will have reqs 
sit in the backlog for the full defer timeout, even if they've received 
data, which is not really the right thing to do.


I've attached a patch implementing this suggestion (compile tested only 
-- I think I got the logic right but it's late ;).  Kind of ugly, and 
uses up a bit in struct inet_request_sock.  Maybe can be done better...
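The core of the change is the retransmit test in inet_csk_reqsk_queue_prune(). Below it is restated as a stand-alone predicate. keep_retransmitting() is a hypothetical helper. The real code additionally requires rtx_syn_ack() to succeed, and the hunk that sets `deferred` is cut off in this archive. Under the patch, a request marked deferred gets rskq_defer_accept extra retransmissions on top of max_retries before it is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative restatement of the patched condition. Parameter names
 * mirror the fields used in the diff; this is not kernel code.
 */
static bool keep_retransmitting(unsigned int retrans, unsigned int thresh,
				unsigned int max_retries, bool acked,
				bool deferred, unsigned int defer_accept)
{
	return retrans < thresh ||
	       (acked && retrans < max_retries) ||
	       (deferred && retrans < defer_accept + max_retries);
}
```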


  -John
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 62daf21..f9f64a5 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -72,7 +72,8 @@ struct inet_request_sock {
sack_ok: 1,
wscale_ok  : 1,
ecn_ok : 1,
-   acked  : 1;
+   acked  : 1,
+   deferred   : 1;
struct ip_options   *opt;
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 185c7ec..cad2490 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -978,6 +978,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
ireq->snd_wscale = rx_opt->snd_wscale;
ireq->wscale_ok = rx_opt->wscale_ok;
ireq->acked = 0;
+   ireq->deferred = 0;
ireq->ecn_ok = 0;
ireq->rmt_port = tcp_hdr(skb)->source;
 }
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index fbe7714..1207fb8 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -444,9 +444,6 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
}
}
 
-   if (queue->rskq_defer_accept)
-   max_retries = queue->rskq_defer_accept;
-
budget = 2 * (lopt->nr_table_entries / (timeout / interval));
i = lopt->clock_hand;
 
@@ -455,7 +452,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
while ((req = *reqp) != NULL) {
if (time_after_eq(now, req->expires)) {
if ((req->retrans < thresh ||
-(inet_rsk(req)->acked && req->retrans < max_retries))
+(inet_rsk(req)->acked && req->retrans < max_retries) ||
+(inet_rsk(req)->deferred && req->retrans <
+ queue->rskq_defer_accept + max_retries))
 && !req->rsk_ops->rtx_syn_ack(parent, req, NULL)) {
unsigned long timeo;
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index a12b08f..c4867f3 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -637,8 +637,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 
/*

Re: 2.6.22.5 forcedeth timeout hang

2007-08-23 Thread Willy Tarreau

On Thu, Aug 23, 2007 at 06:48:23PM -0500, Mr. Berkley Shands wrote:
> 100% reproducible hang on xmit timeout.
> Just do a make -j4 modules on an nfs mounted kernel source.

Most likely you also had the problem with 2.6.22.2 (maybe you have not
tested this one, though). There were bug fixes for forcedeth introduced
in this version, one of them being buggy. The patch below fixes it. Can
you please give it a try? If it does not fix the problem, please try
2.6.22.1 which does not include those changes. I'm interested because
I have those changes pending for 2.6.20.17 too.

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 10f4e3b..1938d6d 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -552,7 +552,7 @@ union ring_type {
 #define PHY_OUI_MARVELL    0x5043
 #define PHY_OUI_CICADA     0x03f1
 #define PHY_OUI_VITESSE    0x01c1
-#define PHY_OUI_REALTEK    0x01c1
+#define PHY_OUI_REALTEK    0x0732
 #define PHYID1_OUI_MASK    0x03ff
 #define PHYID1_OUI_SHFT    6
 #define PHYID2_OUI_MASK    0xfc00
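Before the fix, PHY_OUI_REALTEK had the same value as PHY_OUI_VITESSE (0x01c1), so any comparison against the Realtek OUI also matched a Vitesse PHY. A compile-time check can catch this kind of copy-and-paste duplicate. Below is a sketch using the corrected values. The static assertion and wants_realtek_quirks() are illustrations only and are not part of forcedeth.

```c
#include <assert.h>

/* Values taken from the corrected hunk above. */
#define PHY_OUI_VITESSE	0x01c1
#define PHY_OUI_REALTEK	0x0732

/* Fails to compile if two vendor OUIs are accidentally given the same value. */
static_assert(PHY_OUI_REALTEK != PHY_OUI_VITESSE,
	      "PHY OUI constants must be distinct");

/* Hypothetical stand-in for a vendor-specific branch in the driver. */
static int wants_realtek_quirks(unsigned int phy_oui)
{
	return phy_oui == PHY_OUI_REALTEK;
}
```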

Thanks,
Willy


Re: [PATCH (take 2)] request_irq fix DEBUG_SHIRQ handling Re: 2.6.23-rc2-mm1: rtl8139 inconsistent lock state

2007-08-23 Thread Jarek Poplawski

On Thu, Aug 23, 2007 at 10:44:30AM +0200, Jarek Poplawski wrote:
> Andrew Morton pointed out that my changelog was unusable. Sorry!
> Here is a second try with the changelog and kernel version changed.
...
> (take 2)
>
> Subject: request_irq() - fix DEBUG_SHIRQ handling
...
> Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
>
> ---
>
> diff -Nurp 2.6.23-rc3-git6-/kernel/irq/manage.c 2.6.23-rc3-git6/kernel/irq/manage.c
> --- 2.6.23-rc3-git6-/kernel/irq/manage.c  2007-08-23 10:11:35.0 +0200
> +++ 2.6.23-rc3-git6/kernel/irq/manage.c   2007-08-23 10:16:29.0 +0200

So, this time I f-ed the diff part: it's not exactly against 2.6.23-rc-git6.
But, it's Andrew to blame: he should've known that some old & slow chips
can't do science and poetry at the same time. Sorry (for him)!

Anyway, beside an offset, should be OK...

Regards,
Jarek P.
