Re: Please pull 'upstream-jgarzik' branch of wireless-2.6

2007-08-24 Thread Jeff Garzik

John W. Linville wrote:

A few items intended for 2.6.24.

Individual patches here:


http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream-jgarzik/

Thanks!

John

---

The following changes since commit 39d3520c92cf7a28c07229ca00cc35a1e8026c77:
  Linus Torvalds (1):
      Linux 2.6.23-rc3

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-jgarzik

Daniel Drake (2):
  zd1211rw: Add ID for Sitecom WL-162
  zd1211rw: Add ID for ZyXEL M-202 XtremeMIMO

Mariusz Kozlowski (1):
  drivers/net/wireless/wl3501_cs.c: remove redundant memset

Ulrich Kunitz (2):
  zd1211rw: removed noisy debug messages
  zd1211rw: consistent handling of ZD1211 specific rates

 drivers/net/wireless/wl3501_cs.c |1 -
 drivers/net/wireless/zd1211rw/zd_chip.c  |   69 +++---
 drivers/net/wireless/zd1211rw/zd_ieee80211.h |   43 +++
 drivers/net/wireless/zd1211rw/zd_mac.c   |   99 +++---
 drivers/net/wireless/zd1211rw/zd_mac.h   |   65 +++--
 drivers/net/wireless/zd1211rw/zd_usb.c   |2 +
 6 files changed, 128 insertions(+), 151 deletions(-)


pulled


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [IPV6] IPSEC: Omit redirect for tunnelled packet.

2007-08-24 Thread David Miller
From: Masahide NAKAMURA <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 19:08:55 +0900

> An IPv6 IPsec tunnel gateway incorrectly sends a redirect to the
> router or sender when the network device on which the IPsec tunnelled
> packet arrived is the same as the one on which the decapsulated packet
> is sent.
> 
> With this patch, the redirect is omitted when the forwarded
> skbuff carries a secpath, since such an skbuff should be assumed to be
> a packet decapsulated from an IPsec tunnel by the gateway itself.
> 
> It may be a rare case for an IPsec security gateway; however,
> it is not rare when the gateway is a MIPv6 Home Agent, since
> the other tunnel end-point is a Mobile Node, which changes
> its attached network.
> 
> Signed-off-by: Masahide NAKAMURA <[EMAIL PROTECTED]>

Patch applied, thanks.


Re: [PATCH 2/2] [IPV4] IPSEC: Omit redirect for tunnelled packet.

2007-08-24 Thread David Miller
From: Masahide NAKAMURA <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 19:09:01 +0900

> An IPv4 IPsec tunnel gateway incorrectly sends a redirect to the
> sender, if it is an on-link host, when the network device on which the
> IPsec tunnelled packet arrived is the same as the one on which the
> decapsulated packet is sent.
> 
> With this patch, the redirect is omitted when the forwarded
> skbuff carries a secpath, since such an skbuff should be assumed to be
> a packet decapsulated from an IPsec tunnel by the gateway itself.
> 
> Request for comments:
> Alternatively, we could change net/ipv4/route.c (__mkroute_input)
> to set the RTCF_DOREDIRECT flag only when the skbuff has no
> secpath. That would perform better than this patch, because the
> IPv4 redirect decision is made in the routing slow path. However,
> care must be taken over changes between the SAD (XFRM states) and
> the routing table. In other words, when the IPv4 SAD changes, does
> the related routing entry go back to its slow path? If not, it is
> reasonable to apply this patch.
> 
> Signed-off-by: Masahide NAKAMURA <[EMAIL PROTECTED]>

Also applied, thank you!


Re: [PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage

2007-08-24 Thread Jeff Garzik

Jeff Garzik wrote:

Divy Le Ray wrote:

From: Divy Le Ray <[EMAIL PROTECTED]>

cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, so cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes in the
net-2.6.24 git branch.

Without this fix, cxgb3 crashes on 2.6.23.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/adapter.h   |   10 +++
 drivers/net/cxgb3/cxgb3_main.c|  126 +
 drivers/net/cxgb3/cxgb3_offload.c |6 +-
 drivers/net/cxgb3/sge.c   |   23 ---
 drivers/net/cxgb3/t3cdev.h|3 -
 5 files changed, 100 insertions(+), 68 deletions(-)



applied


I take that back.  Rejected -- it breaks infiniband build.



Re: [PATCH] [IPV6] XFRM: Fix connected socket to use transformation.

2007-08-24 Thread David Miller
From: Masahide NAKAMURA <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 19:08:38 +0900

> When XFRM policy and state become ready after a TCP connection has
> started, the traffic should be transformed immediately; however, on
> IPv6 TCP it is not.
> 
> This depends on the dst cache replacement policy for connected sockets.
> It seems that the replacement is always done for IPv4, whereas in the
> IPv6 case it is done only when the routing cookie has changed.
> 
> This patch fixes that, so that a non-transformation dst can be replaced
> by a transformation one.
> This behavior is required by MIPv6 and improves IPv6 IPsec.
> 
> Signed-off-by: Noriaki TAKAMIYA <[EMAIL PROTECTED]>
> Signed-off-by: Masahide NAKAMURA <[EMAIL PROTECTED]>

Applied to net-2.6.24, thank you!


Re: [PATCH] [XFRM] : Fix pointer copy size for encap_tmpl and coaddr.

2007-08-24 Thread David Miller
From: Masahide NAKAMURA <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 19:05:20 +0900

> This is a minor fix for the sizeof argument used with kmemdup().
> 
> Signed-off-by: Masahide NAKAMURA <[EMAIL PROTECTED]>

Patch applied, thank you!
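
For illustration, the class of bug described above (passing the size of a pointer where the size of the pointed-to object is meant) can be sketched in userspace. This is only a sketch under stated assumptions: mem_dup() is a hypothetical stand-in for the kernel's kmemdup(), and struct encap_tmpl_sketch is an invented layout, not the real XFRM structure.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for an encapsulation template; the real XFRM
 * structure is laid out differently. */
struct encap_tmpl_sketch {
	unsigned short type;
	unsigned short sport;
	unsigned short dport;
	unsigned char addr[16];
};

/* Hypothetical userspace analogue of kmemdup(): allocate len bytes and
 * copy them from src. Returns NULL if the allocation fails. */
static void *mem_dup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* sizeof(*orig) is the size of the structure. Writing sizeof(orig)
 * instead would allocate and copy only pointer-size bytes. */
static struct encap_tmpl_sketch *clone_tmpl(const struct encap_tmpl_sketch *orig)
{
	return mem_dup(orig, sizeof(*orig));
}
```

The mistake is easy to miss in review because both spellings compile cleanly; only the amount of memory copied differs.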


Re: [PATCH 16/30] net: Avoid pointless allocation casts in BSD compression module

2007-08-24 Thread David Miller
From: Jesper Juhl <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 02:06:58 +0200

> The general kernel memory allocation functions return void pointers
> and there is no need to cast their return values.
> 
> Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>

Applied.
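
For illustration, the point about void pointers holds in plain C as well as in the kernel. A minimal userspace sketch, using malloc()/calloc() in place of the kernel allocators; struct comp_state_sketch and both helper names are invented for this example and are not the BSD compression module's real data structures.

```c
#include <stdlib.h>

/* Illustrative state block, loosely modelled on a compressor's
 * dictionary; not the real bsd_comp layout. */
struct comp_state_sketch {
	unsigned int hsize;
	unsigned short *lens;
};

static void comp_state_free(struct comp_state_sketch *db)
{
	if (db) {
		free(db->lens);
		free(db);
	}
}

static struct comp_state_sketch *comp_state_alloc(unsigned int hsize)
{
	/* void * converts implicitly to any object pointer type in C,
	 * so neither allocation below needs a cast. */
	struct comp_state_sketch *db = malloc(sizeof(*db));

	if (!db)
		return NULL;
	db->hsize = hsize;
	db->lens = calloc(hsize, sizeof(*db->lens));
	if (!db->lens) {
		comp_state_free(db);
		return NULL;
	}
	return db;
}
```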


Re: [IPv6] Add v4mapped address inline

2007-08-24 Thread David Miller
From: Brian Haley <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 14:14:35 -0400

> YOSHIFUJI Hideaki wrote:
> > Please put this just after ipv6_addr_any(), not after
> > ipv6_addr_diff().
> 
> Ok, updated patch attached.
> 
> Add v4mapped address inline to avoid calls to ipv6_addr_type().
> 
> Signed-off-by: Brian Haley <[EMAIL PROTECTED]>

Applied, thanks Brian.
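
For illustration, a v4-mapped test can be written as a direct comparison against the ::ffff:0:0/96 prefix rather than a call to a general address classifier. The patch itself is not shown in this thread, so the following is a userspace sketch only; addr_is_v4mapped() is a hypothetical name, not the kernel inline.

```c
#include <netinet/in.h>
#include <string.h>

/* Return non-zero if a is an IPv4-mapped IPv6 address, i.e. its first
 * 80 bits are zero and the next 16 bits are all ones (::ffff:a.b.c.d). */
static inline int addr_is_v4mapped(const struct in6_addr *a)
{
	static const unsigned char prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(a->s6_addr, prefix, sizeof(prefix)) == 0;
}
```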


Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG

2007-08-24 Thread David Miller
From: "John W. Linville" <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 13:08:30 -0400

> On Thu, Aug 23, 2007 at 06:12:00PM +0200, Johannes Berg wrote:
> > On Thu, 2007-08-23 at 09:01 -0700, Joe Perches wrote:
> > > There are also several different uses of the equivalent of
> > > 
> > >   printk("%02x",addr[0])
> > >   for (i=1; i<6; i++)
> > >   printk(":%02x",addr[i]);
> > > 
> > > to print an ethernet MAC address.
> > 
> > Hm. I didn't know that, I can go through in a later patch if desired.
> > 
> > > http://www.uwsg.iu.edu/hypermail/linux/net/0602.1/0002.html
> > > 
> > > As not all device MAC addresses are 6 bytes, colon separated,
> > > perhaps an appropriate ethernet/tr MAC designation is EUI48.
> > > 
> > > http://standards.ieee.org/regauth/oui/tutorials/EUI48.html
> > 
> > Practically, however, nobody is going to even find macros named
> > EUI48_FMT/EUI48_ARG, would they? I don't much care, but I find it rather
> > unsatisfying that both wireless code bases define these macros.
> 
> Yeah, accommodating non-48-bit MAC addresses is a bit pedantic.
> 
> I ACK the original patch, FWIW.

I like the patch too, applied to net-2.6.24, thanks everyone.
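
For illustration, a format/argument macro pair of the kind discussed above replaces the printk loop with a single call. The patch's actual definitions are not quoted in this thread, so the definitions below are an assumption, and format_mac() is a hypothetical helper added to make the sketch self-contained in userspace.

```c
#include <stdio.h>

/* Assumed definitions: one conversion per byte, colon separated. */
#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
#define MAC_ARG(x) ((const unsigned char *)(x))[0], \
		   ((const unsigned char *)(x))[1], \
		   ((const unsigned char *)(x))[2], \
		   ((const unsigned char *)(x))[3], \
		   ((const unsigned char *)(x))[4], \
		   ((const unsigned char *)(x))[5]

/* Format a 6-byte address into buf; returns what snprintf() returns. */
static int format_mac(char *buf, size_t len, const unsigned char *addr)
{
	return snprintf(buf, len, MAC_FMT, MAC_ARG(addr));
}
```

Note that the macro evaluates its argument six times, so it should only be given side-effect-free expressions.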


Re: [PATCH 1/1] net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()

2007-08-24 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 16:48:28 +0200

> Benjamin Thery wrote:
> > From: [EMAIL PROTECTED]
> > Subject: net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()
> > 
> > This patch fixes a crash that may occur when the routine dev_mc_sync()
> > deletes an address from the list it is currently going through. It 
> > saves the pointer to the next element before deleting the current one.
> > The problem may also exist in dev_mc_unsync().
> > 
> > Signed-off-by: Benjamin Thery <[EMAIL PROTECTED]>
> 
> Looks good, thanks Benjamin.
> 
> Acked-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied, thanks everyone.
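
For illustration, the technique the patch describes (saving the pointer to the next element before deleting the current one) looks like this on a plain singly linked list. This is a userspace sketch; struct mc_entry_sketch, mc_push() and purge_synced() are invented names and do not reflect the real dev_mc_sync() code.

```c
#include <stdlib.h>

/* Illustrative list node; not the kernel's multicast list structure. */
struct mc_entry_sketch {
	struct mc_entry_sketch *next;
	int addr;
	int synced;
};

/* Prepend a node and return the new head (old head if allocation fails). */
static struct mc_entry_sketch *mc_push(struct mc_entry_sketch *head,
				       int addr, int synced)
{
	struct mc_entry_sketch *e = malloc(sizeof(*e));

	if (!e)
		return head;
	e->addr = addr;
	e->synced = synced;
	e->next = head;
	return e;
}

/* Free every entry whose synced flag is set and return how many were
 * removed. cur->next is read before cur can be freed, so the walk
 * never touches freed memory. */
static int purge_synced(struct mc_entry_sketch **head)
{
	struct mc_entry_sketch **pp = head;
	struct mc_entry_sketch *cur, *next;
	int removed = 0;

	for (cur = *head; cur != NULL; cur = next) {
		next = cur->next;	/* saved before any deletion */
		if (cur->synced) {
			*pp = next;	/* unlink cur */
			free(cur);
			removed++;
		} else {
			pp = &cur->next;
		}
	}
	return removed;
}
```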


Re: [PATCH] shaper: mark for removal

2007-08-24 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 09:44:30 -0700

> Subject: shaper: mark for removal
> 
> This driver has been marked obsolete for a long time and
> is superseded by traffic schedulers.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.


Re: [PATCH] udp: randomize port selection

2007-08-24 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 11:32:26 -0700

> This patch causes UDP port allocation to be randomized like TCP.
> The earlier code would always choose the same port (i.e. the first
> empty list).
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied to net-2.6.24, thanks Stephen.
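
For illustration, the general idea of randomized port selection is to start the search at a random offset within the range and wrap around, instead of always scanning from the same position. The sketch below is not the kernel's UDP code (which works on hash chains); pick_port() and its in_use array are hypothetical, and the random value is passed in as a parameter so the behaviour is reproducible.

```c
#include <stdbool.h>

/* Pick a free port in [low, high]. in_use[i] describes port low + i.
 * rnd is a caller-supplied random number that selects the starting
 * offset. Returns the chosen port, or -1 if every port is taken. */
static int pick_port(const bool *in_use, int low, int high, unsigned int rnd)
{
	int range, start, i;

	if (high < low)
		return -1;
	range = high - low + 1;
	start = (int)(rnd % (unsigned int)range);

	for (i = 0; i < range; i++) {
		int port = low + (start + i) % range;

		if (!in_use[port - low])
			return port;
	}
	return -1;
}
```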


Re: [PATCH 2/2] [RFC] E1000: Fix hang in netdev_wait_allrefs()

2007-08-24 Thread David Miller
From: Krishna Kumar <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 14:34:31 +0530

> After applying patch1, I started getting "waiting for count" messages when
> doing ifdown. Not sure if this is the right fix since the count was already
> showing as -1 in that message, but this patch fixes the problem.
> 
> Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>

I've applied this because it fixes the problem and there have
been no objections coming along with better fixes :)

If this is bogus we can revert it and put in a more proper fix.


Re: [PATCH 1/2] E1000: Fix ifdown hang in git-2.6.24

2007-08-24 Thread David Miller
From: Krishna Kumar <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 14:34:18 +0530

> Doing napi_disable twice hangs "ifdown" of the device. e1000_down is the
> common place to call napi_disable.
> 
> Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>

Applied, thanks a lot.


Re: [PATCH] IOAT: ioatdma needs to play nice in a multi-dma-client world

2007-08-24 Thread David Miller
From: Shannon Nelson <[EMAIL PROTECTED]>
Date: Wed, 22 Aug 2007 17:12:18 -0700

> Now that the DMA engine has a multi-client interface, fix the ioatdma
> driver to play along.  At the same time, remove a couple of unnecessary
> reads and writes.
> 
> Signed-off-by: Shannon Nelson <[EMAIL PROTECTED]>

Applied, thanks Shannon.


Re: [PATCH net-2.6.24] [NET] Cleanup: DIV_ROUND_UP

2007-08-24 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Wed, 22 Aug 2007 13:48:14 +0300 (EEST)

> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied, thanks Ilpo.
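
For illustration, DIV_ROUND_UP replaces the open-coded "(n + d - 1) / d" idiom for integer division that rounds up. The macro body below matches the usual kernel definition; segs_needed() is a hypothetical caller added only to show a typical use.

```c
/* Integer division that rounds up instead of truncating. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical example: how many segments of at most mss bytes are
 * needed to carry len bytes. mss must be non-zero. */
static unsigned int segs_needed(unsigned int len, unsigned int mss)
{
	return DIV_ROUND_UP(len, mss);
}
```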


Re: ppp dependency on slhc

2007-08-24 Thread David Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Tue, 21 Aug 2007 01:30:57 -0700

> ERROR: "slhc_init" [drivers/net/ppp_generic.ko] undefined!
> ERROR: "slhc_free" [drivers/net/ppp_generic.ko] undefined!
> ERROR: "slhc_uncompress" [drivers/net/ppp_generic.ko] undefined!
> ERROR: "slhc_compress" [drivers/net/ppp_generic.ko] undefined!
> ERROR: "slhc_toss" [drivers/net/ppp_generic.ko] undefined!
> ERROR: "slhc_remember" [drivers/net/ppp_generic.ko] undefined!
> 
> yet another reminder that select doesn't work ;)

Indeed :-)

However it is a good example of the kind of cases select was
made for, nobody should have to know about SLHC in order to
get PPP offered in the config.


Re: [PATCH 5/5] [TCP] MIB: Add counters for discarded SACK blocks

2007-08-24 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 20 Aug 2007 16:16:33 +0300

> In the DSACK case, some events are not extraordinary, such as a
> DSACK generated by packet duplication. They can easily arrive below
> snd_una when undo_marker is not set (TCP being in CA_Open);
> counting such DSACKs among SACK discards would likely just
> mislead if they occur in a scenario where there are other
> problems as well. Similarly, excessively delayed packets could
> cause "normal" DSACKs. Therefore, separate counters are
> allocated for DSACK events.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Also applied, thanks a lot!


Re: [PATCH 4/5] [TCP]: Discard fuzzy SACK blocks

2007-08-24 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 20 Aug 2007 16:16:32 +0300

> SACK processing code has been a sort of Russian roulette, as no
> validation of SACK blocks was previously attempted. Besides, it
> is not very clear what all the kinds of broken SACK blocks really
> mean (e.g., one that has start and end sequence numbers
> reversed). So now close the roulette once and for all.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Thanks a lot for coding this up, I like it a lot, applied.

I have some minor worries about the D-SACK lower bound, but
it's probably OK and I'm just being paranoid :-)
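
For illustration, a much simplified plausibility check for a SACK block can be written with wraparound-safe sequence comparisons. This is a sketch only and not the logic of the applied patch: it ignores D-SACK blocks and undo_marker entirely (the cases the thread is worried about), and the function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparison of 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/* Simplified check: the block must be non-empty and not reversed, and
 * must lie within [snd_una, snd_nxt]. D-SACK handling is omitted. */
static bool sack_block_plausible(uint32_t start, uint32_t end,
				 uint32_t snd_una, uint32_t snd_nxt)
{
	if (!seq_before(start, end))
		return false;	/* reversed or empty block */
	if (seq_after(end, snd_nxt))
		return false;	/* covers data never sent */
	if (seq_before(start, snd_una))
		return false;	/* below the cumulative ACK */
	return true;
}
```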


Re: [PATCH 3/5] [TCP]: Rename tcp_ack_packets_out -> tcp_rearm_rto

2007-08-24 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 20 Aug 2007 16:16:31 +0300

> The only thing that tiny function does is rearm the RTO (if
> necessary), so name it accordingly.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied, thanks.


Re: [PATCH 2/5] [TCP]: tcp_packets_out_inc to tcp_output.c (no callers elsewhere)

2007-08-24 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 20 Aug 2007 16:16:30 +0300

> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied.


Re: [PATCH 1/5] [TCP]: Remove unnecessary wrapper tcp_packets_out_dec

2007-08-24 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 20 Aug 2007 16:16:29 +0300

> Makes the caller side more obvious; there's no need to have
> a wrapper for this one-liner!
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied, thanks.


Re: [PATCH] net/802: indentation cleanup

2007-08-24 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 22:39:40 -0700 (PDT)

> From: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Fri, 17 Aug 2007 18:53:11 -0700
> 
> > Run the 802 related protocols through Lindent (and hand cleanup)
> > to fix indentation and whitespace style issues.
> 
> Applied to net-2.6.24, thanks.

Actually reverted.

Nothing in the world makes me more furious than a "coding
style" change that wasn't even compile tested.

net/802/tr.c: In function 'tr_add_rif_info':
net/802/tr.c:400: error: expected identifier before '!' token

Stephen I see you do things like this, forget sign offs,
and many other things that all say in big huge letters
"sloppy".

Please shape up and test your changes no matter how trivial.

Thanks.


Re: [PATCH] net/802: indentation cleanup

2007-08-24 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 18:53:11 -0700

> Run the 802 related protocols through Lindent (and hand cleanup)
> to fix indentation and whitespace style issues.

Applied to net-2.6.24, thanks.


Re: [PATCH] atm: replace DPRINTK() with pr_debug

2007-08-24 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 18:31:31 -0700

> Get rid of using DPRINTK macro in ATM and use pr_debug (in kernel.h).
> Using the standard macro is cleaner and forces code to check for bad arguments
> and formatting.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied to net-2.6.24, thanks Stephen.


Re: [PATCH] ethernet: optimize memcpy and memset

2007-08-24 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 16:29:50 -0700

> The ethernet header management only needs to handle a fixed
> size address (6 bytes). If the memcpy/memset are changed to
> be passed a constant length, then the compiler can optimize for
> this case (and, if it is smart, eliminate string instructions).
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied.
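
For illustration, passing a compile-time constant length lets the compiler expand a small copy inline rather than emit a generic call. The sketch below is a userspace approximation: ETH_ALEN is the usual 6-byte Ethernet address length, while struct eth_hdr_sketch and fill_eth_header() are invented for this example and are not the kernel's header-building code.

```c
#include <string.h>

#define ETH_ALEN 6	/* bytes in an Ethernet address */

/* Illustrative header layout for this sketch. */
struct eth_hdr_sketch {
	unsigned char h_dest[ETH_ALEN];
	unsigned char h_source[ETH_ALEN];
	unsigned short h_proto;
};

/* Fill in a header. Every length is the constant ETH_ALEN rather than
 * a run-time variable, so the compiler knows the copy size. A NULL
 * daddr leaves the destination zeroed. */
static void fill_eth_header(struct eth_hdr_sketch *eth,
			    const unsigned char *daddr,
			    const unsigned char *saddr,
			    unsigned short proto)
{
	memcpy(eth->h_source, saddr, ETH_ALEN);
	if (daddr)
		memcpy(eth->h_dest, daddr, ETH_ALEN);
	else
		memset(eth->h_dest, 0, ETH_ALEN);
	eth->h_proto = proto;
}
```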


Re: [kj] is_power_of_2 in net/core/neighbour.c

2007-08-24 Thread David Miller
From: vignesh babu <[EMAIL PROTECTED]>
Date: Mon, 13 Aug 2007 18:33:47 +0530

> Replace the n & (n - 1) power-of-2 check with is_power_of_2(n).
> 
> Signed-off-by: vignesh babu <[EMAIL PROTECTED]>

Patch applied, thanks.
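
For illustration, a named helper makes the intent explicit and also handles zero, which the bare "n & (n - 1)" test treats as a power of two. The body below follows the usual kernel definition, reproduced here as a standalone userspace sketch.

```c
#include <stdbool.h>

/* True if n is a power of two. Zero is explicitly excluded, because
 * 0 & (0 - 1) is also 0. */
static inline bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```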
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [NET]: fix multicast list when cloning sockets

2007-08-24 Thread David Miller
From: Flavio Leitner <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 15:29:40 -0300

> On Tue, Jul 31, 2007 at 12:00:41AM -0300, Arnaldo Carvalho de Melo wrote:
> > On 7/30/07, David Miller <[EMAIL PROTECTED]> wrote:
> > > Allowing non-datagram sockets to end up with a non-NULL inet->mc_list
> > > in the first place is a bug.
> > >
> > > Multicast subscriptions cannot even be used with TCP and DCCP, which
> > > are the only two users of these connection oriented socket functions.
> > >
> > > The first thing that TCP and DCCP do, in fact, for input packet
> > > processing is drop the packet if it is not unicast.
> > >
> > > Therefore the fix really is for the inet layer to reject multicast
> > > subscription requests on sockets for which that absolutely does not
> > > make sense.  There is no reason these functions in
> > > inet_connection_sock.c should need to be mindful of multicast
> > > state. :-)
> > 
> > Well, we can add a BUG_ON there then 8)
> > 
> > Flavio, take a look at  do_ip_setsockopt in net/ipv4/ip_sockglue.c, in
> > the IP_{ADD,DROP}_MEMBERSHIP labels.
> > 
> > Don't forget IPV6 (net/ipv6/ipv6_sockglue.c)
> 
> yes, right. What about the one below?
> 
> [NET]: Fix IP_ADD/DROP_MEMBERSHIP to handle only connectionless
> 
> Fix IP[V6]_ADD_MEMBERSHIP and IP[V6]_DROP_MEMBERSHIP to
> return -EPROTO for connection oriented sockets.
> 
> Signed-off-by: Flavio Leitner <[EMAIL PROTECTED]>

This looks great, patch applied.

Thanks!
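
For illustration, the behaviour described above (refusing multicast membership on connection-oriented sockets) amounts to an early check before any multicast state is touched. The patch itself is not shown in this thread; struct sock_sketch and add_membership() below are invented for this sketch and are not the real do_ip_setsockopt() code.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative socket state for this sketch only. */
struct sock_sketch {
	bool connection_oriented;	/* e.g. TCP or DCCP */
	int mc_count;			/* number of group memberships */
};

/* Reject the request up front for connection-oriented sockets so that
 * they never acquire multicast state. Returns 0 or a negative errno. */
static int add_membership(struct sock_sketch *sk)
{
	if (sk->connection_oriented)
		return -EPROTO;
	sk->mc_count++;
	return 0;
}
```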


Re: [PATCH 1/2] myri10ge: use pcie_get/set_readrq

2007-08-24 Thread Jeff Garzik

Brice Goglin wrote:

Based on a patch from Peter Oruba, convert myri10ge to use pcie_get_readrq()
and pcie_set_readrq() instead of our own PCI calls and arithmetic.

These driver changes incorporate the proposed PCI-X / PCI-Express read byte
count interface.  Reading and setting those values doesn't take place
"manually", instead wrapping functions are called to allow quirks for some
PCI bridges.

Signed-off-by: Brice Goglin <[EMAIL PROTECTED]>
Signed-off-by: Peter Oruba <[EMAIL PROTECTED]>
Based on work by Stephen Hemminger <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 drivers/net/myri10ge/myri10ge.c |   32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)


applied 1-2




Re: [PATCH 1/4] ehea: fix interface to DLPAR tools

2007-08-24 Thread Jeff Garzik

Jan-Bernd Themann wrote:

Userspace DLPAR tool expects decimal numbers to be written to
and read from sysfs entries.

Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>


applied 1-3




Re: [NET] sgiseeq: Fix return type of sgiseeq_remove

2007-08-24 Thread Jeff Garzik

Ralf Baechle wrote:

The driver remove method needs to return an int not void.  This was just
never noticed because usually this driver is not being built as a module.

Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>


applied




Re: [PATCH] phy layer: fix genphy_setup_forced (don't reset)

2007-08-24 Thread Jeff Garzik

Domen Puncer wrote:

Writing BMCR_RESET bit will reset MII_BMCR to default values. This is
clearly not what we want.


Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>

---
 drivers/net/phy/phy_device.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: work-powerpc.git/drivers/net/phy/phy_device.c
===
--- work-powerpc.git.orig/drivers/net/phy/phy_device.c
+++ work-powerpc.git/drivers/net/phy/phy_device.c
@@ -364,7 +364,7 @@ EXPORT_SYMBOL(genphy_config_advert);
  */
 int genphy_setup_forced(struct phy_device *phydev)
 {
-   int ctl = BMCR_RESET;
+   int ctl = 0;
 
 	phydev->pause = phydev->asym_pause = 0;


applied
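
For illustration, the effect of the one-line change can be seen by computing the control word on its own. The BMCR_* values below are the standard MII control register bits; forced_bmcr() is a hypothetical helper written for this sketch, not the body of genphy_setup_forced().

```c
/* Standard MII basic mode control register bits. */
#define BMCR_SPEED1000	0x0040
#define BMCR_FULLDPLX	0x0100
#define BMCR_SPEED100	0x2000
#define BMCR_RESET	0x8000

/* Build the BMCR value for a forced speed/duplex setting. Starting
 * from 0 rather than BMCR_RESET means writing the result does not also
 * reset the PHY and discard the settings just chosen. */
static int forced_bmcr(int speed, int full_duplex)
{
	int ctl = 0;

	if (speed == 1000)
		ctl |= BMCR_SPEED1000;
	else if (speed == 100)
		ctl |= BMCR_SPEED100;
	if (full_duplex)
		ctl |= BMCR_FULLDPLX;
	return ctl;
}
```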




Re: [PATCH] DM9000: fix interface hang under load

2007-08-24 Thread Jeff Garzik

Florian Westphal wrote:

When transferring data at full speed, the DM9000 network interface
sometimes stops sending/receiving data. Worse, ksoftirqd consumes
100% cpu and the net tx watchdog never triggers.
Fix by spin_lock_irqsave() in dm9000_start_xmit() to prevent the
interrupt handler from interfering.

Signed-off-by: Florian Westphal <[EMAIL PROTECTED]>
---
 Actually the comments ('Disable all interrupts', iow(db, DM9000_IMR,
 IMR_PAR) etc.) give the impression that the interrupt handler cannot run
 during dm9000_start_xmit(); however, this isn't correct (perhaps the
 chipset has some weird timing issues?).
 The interface lockup usually occurs between 30 and 360 seconds after
 starting to transmit data (netcat /dev/zero) at full speed; with this
 patch applied I haven't been able to reproduce hangs yet (ran for > 2h).
 FTR: This is a dm9000 on XScale-PXA255 rev 6 (ARMv5TE)/Compulab CM-x255,
 i.e. a module not supported by the vanilla kernel. Tested on (patched)
 2.6.18.

 dm9000.c |   25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)


applied




Re: [METH] Don't use GFP_DMA for zone allocation.

2007-08-24 Thread Jeff Garzik

Ralf Baechle wrote:

IP32 doesn't even have a ZONE_DMA so no point in using GFP_DMA in any
IP32-specific device driver.

Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>


applied




Re: [PATCH] ucc_geth: kill unused include

2007-08-24 Thread Jeff Garzik

Kumar Gala wrote:

The ucc_geth_mii code is based on the gianfar_mii code that used to include
ocp.h.  ucc never needed this and it causes issues when we want to kill
arch/ppc includes from arch/powerpc.

Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>
---

Jeff, if you have an issue with this for 2.6.23, I'd prefer to push this via
the powerpc.git trees in 2.6.24 as part of a larger cleanup.  Let me know
one way or the other.

- k

 drivers/net/ucc_geth_mii.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index 6c257b8..df884f0 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c
@@ -32,7 +32,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 


Feel free to push via PPC git




Re: [PATCH 4/4] ehea: show physical port state

2007-08-24 Thread Jeff Garzik

Jan-Bernd Themann wrote:

Introduces a module parameter to decide whether the physical
port link state is propagated to the network stack or not.
It makes sense not to take the physical port state into account
on machines with more logical partitions that communicate
with each other. This is always possible no matter what the physical
port state is. Thus eHEA can be considered as a switch there.

Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>

---
 drivers/net/ehea/ehea.h  |5 -
 drivers/net/ehea/ehea_main.c |   14 +-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..8d58be5 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include 
 
 #define DRV_NAME	"ehea"

-#define DRV_VERSION"EHEA_0073"
+#define DRV_VERSION"EHEA_0074"
 
 /* eHEA capability flags */

 #define DLPAR_PORT_ADD_REM 1
@@ -402,6 +402,8 @@ struct ehea_mc_list {
 
 #define EHEA_PORT_UP 1

 #define EHEA_PORT_DOWN 0
+#define EHEA_PHY_LINK_UP 1
+#define EHEA_PHY_LINK_DOWN 0
 #define EHEA_MAX_PORT_RES 16
 struct ehea_port {
struct ehea_adapter *adapter;/* adapter that owns this port */
@@ -427,6 +429,7 @@ struct ehea_port {
u32 msg_enable;
u32 sig_comp_iv;
u32 state;
+   u8 phy_link;
u8 full_duplex;
u8 autoneg;
u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index db57474..1804c99 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -53,17 +53,21 @@ static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
 static int num_tx_qps = EHEA_NUM_TX_QP;
+static int show_phys_link = 0;
 
 module_param(msg_level, int, 0);

 module_param(rq1_entries, int, 0);
 module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
+module_param(show_phys_link, int, 0);
 module_param(use_mcs, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, "Number of TX-QPS");

 MODULE_PARM_DESC(msg_level, "msg_level");
+MODULE_PARM_DESC(show_phys_link, "Show link state of external port"
+"1:yes, 0: no.  Default = 0 ");
 MODULE_PARM_DESC(rq3_entries, "Number of entries for Receive Queue 3 "
 "[2^x - 1], x = [6..14]. Default = "
 __MODULE_STRING(EHEA_DEF_ENTRIES_RQ3) ")");
@@ -814,7 +818,9 @@ int ehea_set_portspeed(struct ehea_port *port, u32 port_speed)
ehea_error("Failed setting port speed");
}
}
-   netif_carrier_on(port->netdev);
+   if (!show_phys_link || (port->phy_link == EHEA_PHY_LINK_UP))
+   netif_carrier_on(port->netdev);
+
kfree(cb4);
 out:
return ret;
@@ -869,13 +875,19 @@ static void ehea_parse_eqe(struct ehea_adapter *adapter, u64 eqe)
}
 
 		if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PORT_UP, eqe)) {

+   port->phy_link = EHEA_PHY_LINK_UP;
if (netif_msg_link(port))
ehea_info("%s: Physical port up",
  port->netdev->name);
+   if (show_phys_link)
+   netif_carrier_on(port->netdev);
} else {
+   port->phy_link = EHEA_PHY_LINK_DOWN;
if (netif_msg_link(port))
ehea_info("%s: Physical port down",
  port->netdev->name);
+   if (show_phys_link)
+   netif_carrier_off(port->netdev);


I think it's misnamed, calling it "show_xxx", because this (as the
change description notes) controls propagation of carrier to the network
stack.


Jeff





Re: [PATCH 1/2] [DM9000] Added support for big-endian hosts

2007-08-24 Thread Jeff Garzik

Laurent Pinchart wrote:

This patch splits the receive status into 8-bit-wide fields and converts the
packet length from little endian to CPU byte order.

Signed-off-by: Laurent Pinchart <[EMAIL PROTECTED]>
---
 drivers/net/dm9000.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
index c3de81b..a424810 100644
--- a/drivers/net/dm9000.c
+++ b/drivers/net/dm9000.c
@@ -894,7 +894,8 @@ dm9000_timer(unsigned long data)
 }
 
 struct dm9000_rxhdr {

-   u16 RxStatus;
+   u8  RxPktReady;
+   u8  RxStatus;
u16 RxLen;
 } __attribute__((__packed__));


why does this not need endian conversions as well?

Jeff
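
As a rough illustration of the endianness point, assuming the 4-byte header
layout from the hunk above (two status bytes followed by a 16-bit length that
the changelog describes as little endian): single-byte fields have no byte
order, so only the length needs converting. The names below (struct rxhdr,
rxhdr_parse) are made up for this userspace sketch; in the driver itself the
conversion would normally go through le16_to_cpu().

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the header layout in the patch:
 * two single-byte fields followed by a 16-bit little-endian length. */
struct rxhdr {
    uint8_t  pkt_ready;
    uint8_t  status;
    uint16_t len;       /* held in CPU byte order after parsing */
};

/* Decode the four raw header bytes in the order the chip delivers them. */
static struct rxhdr rxhdr_parse(const uint8_t raw[4])
{
    struct rxhdr h;

    h.pkt_ready = raw[0];
    h.status = raw[1];
    /* Low byte first: the same result le16_to_cpu() gives on any host. */
    h.len = (uint16_t)(raw[2] | ((uint16_t)raw[3] << 8));
    return h;
}
```

Assembling the length byte by byte gives the same answer on big-endian and
little-endian hosts, which is the property the patch is after.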





Re: [PATCH] fix realtek phy id in forcedeth

2007-08-24 Thread Jeff Garzik

Willy Tarreau wrote:

Hi Greg,

On Thu, Aug 23, 2007 at 09:55:13AM -0700, Greg KH wrote:

It might help if someone sends a real patch that can be applied :)


This is getting really silly now :-) We're all wasting more time
wondering who will send the patch than posting it. I've lost, I got
fed up first, so here it is. Please apply to mainline then stable.

Thanks,
Willy

--


From a0e2922b99eedd9863232368ea2afe072c52783e Mon Sep 17 00:00:00 2001

From: Willy Tarreau <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 21:35:41 +0200
Subject: [PATCH] fix realtek phy id in forcedeth

As noticed by Chuck Ebbert, commit c5e3ae8823693b260ce1f217adca8add1bc0b3de
introduced a copy-paste typo, as realtek phy is 0x732 and not 0x1c1. Obvious
fix below suggested by Ayaz Abdulla.

Signed-off-by: Willy Tarreau <[EMAIL PROTECTED]>
Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
Cc: Chuck Ebbert <[EMAIL PROTECTED]>
---
 drivers/net/forcedeth.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


applied




Re: [PATCH 2.6.23 RESEND] cxgb3 - Fix dev->priv usage

2007-08-24 Thread Jeff Garzik

Divy Le Ray wrote:

From: Divy Le Ray <[EMAIL PROTECTED]>

cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes in the 
net-2.6.24 git branch. 


Without this fix, cxgb3 crashes on 2.6.23.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/adapter.h   |   10 +++
 drivers/net/cxgb3/cxgb3_main.c|  126 +
 drivers/net/cxgb3/cxgb3_offload.c |6 +-
 drivers/net/cxgb3/sge.c   |   23 ---
 drivers/net/cxgb3/t3cdev.h|3 -
 5 files changed, 100 insertions(+), 68 deletions(-)



applied




Re: [PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-24 Thread Stephen Rothwell
On Fri, 24 Aug 2007 13:11:04 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote:
>
> On Fri, Aug 24, 2007 at 02:05:31PM +1000, Stephen Rothwell wrote:
> > 
> > It is not documented as such (as far as I can see), but pci_dev_put is
> > safe to call with NULL. And there are other places in the kernel that
> > explicitly use that fact.
> 
> Some places check, others do not. I'll leave it be for now but might take
> care of it during some future cleanup. Thanks for pointing it out though.

No worries.

-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: [PATCH] via-velocity: more cleanup

2007-08-24 Thread Al Viro
On Fri, Aug 24, 2007 at 02:40:45PM -0700, Stephen Hemminger wrote:
> +static void mac_set_vlan_cam(struct mac_regs __iomem * regs, int idx,
> +  const u8 *addr)

ITYM const u16 *, if not an outright u16.  These casts (one below and
ones in callers) really should die.

> + writew(*((u16 *) addr), &regs->MARCAM[0]);
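
A minimal userspace sketch of what a properly typed variant could look like,
assuming the VLAN ID is simply passed as a 16-bit value. Everything here
(struct fake_regs, fake_writew, set_vlan_cam) is a hypothetical stand-in, not
the driver's code; fake_writew only imitates a little-endian 16-bit register
write.

```c
#include <stdint.h>

/* Stand-in for the device's MARCAM window; purely illustrative. */
struct fake_regs {
    uint8_t marcam[8];
};

/* Userspace stand-in for writew(): store a 16-bit value little-endian. */
static void fake_writew(uint16_t val, uint8_t *reg)
{
    reg[0] = (uint8_t)(val & 0xff);
    reg[1] = (uint8_t)(val >> 8);
}

/* The VLAN ID arrives by value, so callers need no (u8 *) cast and the
 * function needs no (u16 *) cast to get it back. */
static void set_vlan_cam(struct fake_regs *regs, uint16_t vid)
{
    fake_writew(vid, &regs->marcam[0]);
}
```

With the value typed as 16 bits end to end, the compiler can check every
caller, and there is no pointer cast left that could hide an alignment or
byte-order mistake.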


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-24 Thread John Heffner

Bill Fink wrote:

Here you can see there is a major difference in the TX CPU utilization
(99 % with TSO disabled versus only 39 % with TSO enabled), although
the TSO disabled case was able to squeeze out a little extra performance
from its extra CPU utilization.  Interestingly, with TSO enabled, the
receiver actually consumed more CPU than with TSO disabled, so I guess
the receiver CPU saturation in that case (99 %) was what restricted
its performance somewhat (this was consistent across a few test runs).



One possibility is that I think the receive-side processing tends to do 
better when receiving into an empty queue.  When the (non-TSO) sender is 
the flow's bottleneck, this is going to be the case.  But when you 
switch to TSO, the receiver becomes the bottleneck and you're always 
going to have to put the packets at the back of the receive queue.  This 
might help account for the reason why you have both lower throughput and 
higher CPU utilization -- there's a point of instability right where the 
receiver becomes the bottleneck and you end up pushing it over to the 
bad side. :)


Just a theory.  I'm honestly surprised this effect would be so 
significant.  What do the numbers from netstat -s look like in the two 
cases?


  -John


[PATCH] Prefix each line of multiline printk(KERN_ "foo\nbar") with KERN_

2007-08-24 Thread Joe Perches
Corrected printk calls with multiple output lines which
did not correctly preface each line with KERN_

Fixed uses of some single lines with too many KERN_
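
The pattern relies on the compiler joining adjacent string literals, so each
output line can carry its own level marker inside one format string. A small
userspace sketch, assuming a stand-in macro LVL_ERR defined as "<3>" (the
value KERN_ERR had in 2.6 kernels); count_prefixed_lines is a hypothetical
helper used only to check the result.

```c
#include <string.h>

/* Userspace stand-in for the kernel's KERN_ERR marker (assumption). */
#define LVL_ERR "<3>"

/* Adjacent literals are concatenated at compile time, so both output
 * lines of this single message start with the level prefix. */
static const char lockup_msg[] =
    LVL_ERR "\n"
    LVL_ERR "Interrupt lockup detected - "
    "disabling all expansion card interrupts\n";

/* Count the lines in msg that begin with the given prefix. */
static int count_prefixed_lines(const char *msg, const char *prefix)
{
    size_t plen = strlen(prefix);
    const char *p = msg;
    int n = 0;

    while (*p) {
        if (strncmp(p, prefix, plen) == 0)
            n++;
        p = strchr(p, '\n');    /* jump to the end of this line */
        if (!p)
            break;
        p++;                    /* step to the start of the next line */
    }
    return n;
}
```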

Please pull from:
git://repo.or.cz/linux-2.6/trivial-mods.git pr_newlines

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

 arch/arm/kernel/ecard.c  |3 ++-
 arch/blackfin/kernel/dualcore_test.c |3 ++-
 arch/blackfin/kernel/traps.c |4 +++-
 arch/h8300/kernel/setup.c|4 +++-
 arch/i386/kernel/io_apic.c   |3 ++-
 arch/m68knommu/kernel/setup.c|4 +++-
 arch/m68knommu/kernel/traps.c|5 +++--
 arch/m68knommu/mm/init.c |9 ++---
 arch/m68knommu/platform/68328/config.c   |3 ++-
 arch/m68knommu/platform/68360/config.c   |3 ++-
 arch/m68knommu/platform/68EZ328/config.c |3 ++-
 arch/mips/vr41xx/common/pmu.c|9 ++---
 arch/parisc/kernel/traps.c   |3 ++-
 arch/parisc/math-emu/driver.c|5 +++--
 arch/v850/kernel/setup.c |6 --
 arch/x86_64/kernel/io_apic.c |3 ++-
 arch/x86_64/kernel/mpparse.c |3 ++-
 drivers/acpi/acpi_memhotplug.c   |3 ++-
 drivers/char/dtlk.c  |3 ++-
 drivers/char/tpm/tpm_bios.c  |2 +-
 drivers/ide/ide-cd.c |3 ++-
 drivers/input/serio/hil_mlc.c|2 +-
 drivers/message/fusion/mptlan.c  |3 ++-
 drivers/mtd/maps/cdb89712.c  |5 -
 drivers/net/cs89x0.c |2 +-
 drivers/net/dgrs.c   |3 ++-
 drivers/net/wireless/arlan-main.c|2 +-
 drivers/net/wireless/arlan-proc.c|   19 ++-
 drivers/parisc/led.c |3 ++-
 drivers/scsi/aha152x.c   |   16 +++-
 drivers/scsi/dpt_i2o.c   |3 ++-
 drivers/scsi/mac_scsi.c  |3 ++-
 drivers/scsi/megaraid.c  |3 ++-
 drivers/scsi/megaraid/megaraid_sas.c |   25 -
 drivers/scsi/osst.c  |3 ++-
 drivers/scsi/zalon.c |2 +-
 drivers/video/savage/savagefb_driver.c   |   21 -
 fs/dlm/dlm_internal.h|9 +
 fs/freevxfs/vxfs_bmap.c  |8 ++--
 fs/jffs2/wbuf.c  |3 ++-
 mm/slub.c|   18 --
 41 files changed, 152 insertions(+), 85 deletions(-)

diff --git a/arch/arm/kernel/ecard.c b/arch/arm/kernel/ecard.c
index f56d48c..6402ad2 100644
--- a/arch/arm/kernel/ecard.c
+++ b/arch/arm/kernel/ecard.c
@@ -547,7 +547,8 @@ static void ecard_check_lockup(struct irq_desc *desc)
if (last == jiffies) {
lockup += 1;
if (lockup > 100) {
-   printk(KERN_ERR "\nInterrupt lockup detected - "
+   printk(KERN_ERR "\n"
+  KERN_ERR "Interrupt lockup detected - "
   "disabling all expansion card interrupts\n");
 
desc->chip->mask(IRQ_EXPANSIONCARD);
diff --git a/arch/blackfin/kernel/dualcore_test.c b/arch/blackfin/kernel/dualcore_test.c
index 0fcba74..3c94199 100644
--- a/arch/blackfin/kernel/dualcore_test.c
+++ b/arch/blackfin/kernel/dualcore_test.c
@@ -35,7 +35,8 @@ static int *testarg = (int *)0xfeb0;
 static int test_init(void)
 {
*testarg = 1;
-   printk(KERN_INFO "Dual core test module inserted: set testarg = [%d]\n @ [%p]\n",
+   printk(KERN_INFO "Dual core test module inserted: set testarg = [%d]\n"
+  KERN_INFO "@ [%p]\n",
   *testarg, testarg);
return 0;
 }
diff --git a/arch/blackfin/kernel/traps.c b/arch/blackfin/kernel/traps.c
index 792a841..9255012 100644
--- a/arch/blackfin/kernel/traps.c
+++ b/arch/blackfin/kernel/traps.c
@@ -351,7 +351,9 @@ asmlinkage void trap_c(struct pt_regs *fp)
info.si_code = ILL_CPLB_MULHIT;
 #ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
sig = SIGSEGV;
-   printk(KERN_EMERG "\n\nJump to address 0 - 0x0fff\n");
+   printk(KERN_EMERG "\n"
+  KERN_EMERG "\n"
+  KERN_EMERG "Jump to address 0 - 0x0fff\n");
 #else
sig = SIGILL;
printk(KERN_EMERG EXC_0x2D);
diff --git a/arch/h8300/kernel/setup.c b/arch/h8300/kernel/setup.c
index b2e86d0..cb45404 100644
--- a/arch/h8300/kernel/setup.c
+++ b/arch/h8300/kernel/setup.c
@@ -127,7 +127,9 @@ void __init setup_arch(char **cmdline_p)
register_console((struct console *)&gdb_console);
 #endif
 
-   printk(KERN_INFO "\r\n\nuClinux " CPU "\n");
+   printk(KERN_INFO "\r\n"
+  KERN_INFO "\n"
+  KERN_INFO "uClinux " CPU "\n");
printk(KERN_INFO "Target Hardwa

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-24 Thread Herbert Xu
On Fri, Aug 24, 2007 at 02:25:03PM -0700, David Miller wrote:
>
> My hunch is that even if in the non-TSO case the TX packets were all
> back to back in the cards TX ring, TSO still spits them out faster on
> the wire.

If this is the case then we should see an improvement by
disabling TSO and enabling GSO.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread akepner
On Fri, Aug 24, 2007 at 02:47:11PM -0700, David Miller wrote:

> 
> Someone should reference that thread _now_ before this discussion goes
> too far and we repeat a lot of information ..

Here's part of the thread:
http://marc.info/?t=11159530601&r=1&w=2

Also, Jamal's paper may be of interest - Google for "when napi comes
to town".

-- 
Arthur



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 02:44:36PM -0700, David Miller wrote:
> From: David Stevens <[EMAIL PROTECTED]>
> Date: Fri, 24 Aug 2007 09:50:58 -0700
> 
> > Problem is if it increases rapidly, you may drop packets
> > before you notice that the ring is full in the current estimated
> > interval.
> 
> This is one of many reasons why hardware interrupt mitigation
> is really needed for this.

When turning off interrupts, don't turn them *all* off.
Leave the queue-full interrupt always on.

--linas


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: James Chapman <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 18:16:45 +0100

> Does hardware interrupt mitigation really interact well with NAPI?

It interacts quite excellently.

There was a long saga about this with tg3 and huge SGI numa
systems with large costs for interrupt processing, and the
fix was to do a minimal amount of interrupt mitigation and
this basically cleared up all the problems.

Someone should reference that thread _now_ before this discussion goes
too far and we repeat a lot of information and people like myself have
to stay up all night correcting the misinformation and
misunderstandings that are basically guaranteed for this topic :)


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: David Stevens <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 09:50:58 -0700

> Problem is if it increases rapidly, you may drop packets
> before you notice that the ring is full in the current estimated
> interval.

This is one of many reasons why hardware interrupt mitigation
is really needed for this.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: [EMAIL PROTECTED] (Linas Vepstas)
Date: Fri, 24 Aug 2007 11:45:41 -0500

> In the end, I just let it be, and let the system work as a
> busy-beaver, with the high interrupt rate. Is this a wise thing to
> do?

The tradeoff is always going to be latency vs. throughput.

A sane default should defer enough to catch multiple packets coming in
at something close to line rate, but not so much that latency unduly
suffers.


[PATCH] via-velocity: more cleanup

2007-08-24 Thread Stephen Hemminger
Per Al's suggestion, get rid of the stupid stuff:
Remove cam_type switch,
And deinline things that aren't important for speed.
And make big macro and inline.
And remove some dead/unused code.
And use const char * for chip name.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- a/drivers/net/via-velocity.c2007-08-24 13:49:17.0 -0700
+++ b/drivers/net/via-velocity.c2007-08-24 14:39:14.0 -0700
@@ -85,6 +85,163 @@
 static int velocity_nics = 0;
 static int msglevel = MSG_LEVEL_INFO;
 
+/**
+ * mac_get_cam_mask-   Read a CAM mask
+ * @regs: register block for this velocity
+ * @mask: buffer to store mask
+ *
+ * Fetch the mask bits of the selected CAM and store them into the
+ * provided mask buffer.
+ */
+
+static void mac_get_cam_mask(struct mac_regs __iomem * regs, u8 * mask)
+{
+   int i;
+
+   /* Select CAM mask */
+   BYTE_REG_BITS_SET(CAMCR_PS_CAM_MASK, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+
+   writeb(0, &regs->CAMADDR);
+
+   /* read mask */
+   for (i = 0; i < 8; i++)
+       *mask++ = readb(&(regs->MARCAM[i]));
+
+   /* disable CAMEN */
+   writeb(0, &regs->CAMADDR);
+
+   /* Select mar */
+   BYTE_REG_BITS_SET(CAMCR_PS_MAR, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+
+}
+
+
+/**
+ * mac_set_cam_mask-   Set a CAM mask
+ * @regs: register block for this velocity
+ * @mask: CAM mask to load
+ *
+ * Store a new mask into a CAM
+ */
+
+static void mac_set_cam_mask(struct mac_regs __iomem * regs, u8 * mask)
+{
+   int i;
+   /* Select CAM mask */
+   BYTE_REG_BITS_SET(CAMCR_PS_CAM_MASK, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+
+   writeb(CAMADDR_CAMEN, &regs->CAMADDR);
+
+   for (i = 0; i < 8; i++) {
+       writeb(*mask++, &(regs->MARCAM[i]));
+   }
+   /* disable CAMEN */
+   writeb(0, &regs->CAMADDR);
+
+   /* Select mar */
+   BYTE_REG_BITS_SET(CAMCR_PS_MAR, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+}
+
+static void mac_set_vlan_cam_mask(struct mac_regs __iomem * regs, u8 * mask)
+{
+   int i;
+   /* Select CAM mask */
+   BYTE_REG_BITS_SET(CAMCR_PS_CAM_MASK, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+
+   writeb(CAMADDR_CAMEN | CAMADDR_VCAMSL, &regs->CAMADDR);
+
+   for (i = 0; i < 8; i++) {
+       writeb(*mask++, &(regs->MARCAM[i]));
+   }
+   /* disable CAMEN */
+   writeb(0, &regs->CAMADDR);
+
+   /* Select mar */
+   BYTE_REG_BITS_SET(CAMCR_PS_MAR, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+}
+
+/**
+ * mac_set_cam -   set CAM data
+ * @regs: register block of this velocity
+ * @idx: Cam index
+ * @addr: 2 or 6 bytes of CAM data
+ *
+ * Load an address or vlan tag into a CAM
+ */
+
+static void mac_set_cam(struct mac_regs __iomem * regs, int idx, const u8 *addr)
+{
+   int i;
+
+   /* Select CAM mask */
+   BYTE_REG_BITS_SET(CAMCR_PS_CAM_DATA, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+
+   idx &= (64 - 1);
+
+   writeb(CAMADDR_CAMEN | idx, &regs->CAMADDR);
+
+   for (i = 0; i < 6; i++) {
+       writeb(*addr++, &(regs->MARCAM[i]));
+   }
+   BYTE_REG_BITS_ON(CAMCR_CAMWR, &regs->CAMCR);
+
+   udelay(10);
+
+   writeb(0, &regs->CAMADDR);
+
+   /* Select mar */
+   BYTE_REG_BITS_SET(CAMCR_PS_MAR, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+}
+
+static void mac_set_vlan_cam(struct mac_regs __iomem * regs, int idx,
+const u8 *addr)
+{
+
+   /* Select CAM mask */
+   BYTE_REG_BITS_SET(CAMCR_PS_CAM_DATA, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+
+   idx &= (64 - 1);
+
+   writeb(CAMADDR_CAMEN | CAMADDR_VCAMSL | idx, &regs->CAMADDR);
+   writew(*((u16 *) addr), &regs->MARCAM[0]);
+
+   BYTE_REG_BITS_ON(CAMCR_CAMWR, &regs->CAMCR);
+
+   udelay(10);
+
+   writeb(0, &regs->CAMADDR);
+
+   /* Select mar */
+   BYTE_REG_BITS_SET(CAMCR_PS_MAR, CAMCR_PS1 | CAMCR_PS0, &regs->CAMCR);
+}
+
+
+/**
+ * mac_wol_reset   -   reset WOL after exiting low power
+ * @regs: register block of this velocity
+ *
+ * Called after we drop out of wake on lan mode in order to
+ * reset the Wake on lan features. This function doesn't restore
+ * the rest of the logic from the result of sleep/wakeup
+ */
+
+static void mac_wol_reset(struct mac_regs __iomem * regs)
+{
+
+   /* Turn off SWPTAG right after leaving power mode */
+   BYTE_REG_BITS_OFF(STICKHW_SWPTAG, &regs->STICKHW);
+   /* clear sticky bits */
+   BYTE_REG_BITS_OFF((STICKHW_DS1 | STICKHW_DS0), &regs->STICKHW);
+
+   BYTE_REG_BITS_OFF(CHIPGCR_FCGMII, &regs->CHIPGCR);
+   BYTE_REG_BITS_OFF(CHIPGCR_FCMODE, &regs->CHIPGCR);
+   /* disable force PME-enable */
+   writeb(WOLCFG_PMEOVR, &regs->WOLCFGClr);
+   /* disable power-event config bit */
+   writew(0x, &regs->WOLCRClr);
+   /* clear power status */
+   writew(0x, &regs->WOLSRClr);
+}
 
 static int velocity_mii_ioctl(struct net_device *dev, struct ifreq *if

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: Jan-Bernd Themann <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 15:59:16 +0200

> 1) The current implementation of netif_rx_schedule, netif_rx_complete
>    and the net_rx_action have the following problem: netif_rx_schedule
>    sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the poll_list.
>    net_rx_action checks NAPI_STATE_SCHED, if set it will add the device
>    to the poll_list again (as well). netif_rx_complete clears the
>    NAPI_STATE_SCHED.
>    If an interrupt handler calls netif_rx_schedule on CPU 2
>    after netif_rx_complete has been called on CPU 1 (and the poll function 
>    has not returned yet), the NAPI instance will be added twice to the 
>    poll_list (by netif_rx_schedule and net_rx_action). Problems occur when 
>    netif_rx_complete is called twice for the device (BUG() called)

Indeed, this is the "who should manage the list" problem.
Probably the answer is that whoever transitions the NAPI_STATE_SCHED
bit from cleared to set should do the list addition.
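
A minimal userspace sketch of that ownership rule, assuming C11 atomics as a
stand-in for the kernel's test_and_set_bit(). The names struct poll_ctx,
poll_schedule and poll_complete are hypothetical, and the counter merely
stands in for the real list addition.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI instance; not the kernel structure. */
struct poll_ctx {
    atomic_flag sched;      /* plays the role of NAPI_STATE_SCHED */
    int list_adds;          /* how often it was put on the poll list */
};

/* Returns true only for the caller that flipped the bit from clear to
 * set; that caller, and nobody else, performs the list addition. */
static bool poll_schedule(struct poll_ctx *ctx)
{
    if (atomic_flag_test_and_set(&ctx->sched))
        return false;       /* already scheduled by someone else */
    ctx->list_adds++;       /* stands in for adding to the poll list */
    return true;
}

/* Clearing the bit hands ownership back for the next schedule call. */
static void poll_complete(struct poll_ctx *ctx)
{
    atomic_flag_clear(&ctx->sched);
}
```

Because the test and the set are one atomic step, two CPUs racing to
schedule the same instance cannot both see the bit clear, so a second list
addition cannot come from the schedule path.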

Patches welcome :-)

> 3) On modern systems the incoming packets are processed very fast. Especially
>    on SMP systems when we use multiple queues we process only a few packets
>    per napi poll cycle. So NAPI does not work very well here and the
>    interrupt rate is still high. What we need would be some sort of timer
>    polling mode which will schedule a device after a certain amount of time
>    for high load situations. With high precision timers this could work well.
>    Current usual timers are too slow. A finer granularity would be needed to
>    keep the latency down (and queue length moderate).

This is why minimal levels of HW interrupt mitigation should be enabled
in your chip.  If it does not support this, you will indeed need to look
into using high resolution timers or other schemes to alleviate this.

I do not think it deserves a generic core networking helper facility,
the chips that can't mitigate interrupts are few and obscure.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 11:11:56PM +0200, Jan-Bernd Themann wrote:
> (when they are available for
> POWER in our case). 

hrtimer worked fine on the powerpc cell arch last summer.
I assume they work on p5 and p6 too, no ??

> I tried to implement something with "normal" timers, but the result
> was everything but great. The timers seem to be far too slow.
> I'm not sure if it helps to increase it from 1000HZ to 2500HZ
> or more.

Heh. Do the math. Even on 1gigabit cards, that's not enough:

(1gigabit/sec) x (byte/8 bits) x (packet/1500bytes) x (sec/1000 jiffy) 

is 83 packets a jiffy (for big packets, even more for small packets, 
and more again for 10 gigabit cards). So polling once per jiffy is a 
latency disaster.
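
The same arithmetic as a small C helper, for trying other link speeds and HZ
values. This is only a sketch of the estimate above: packets_per_jiffy is a
made-up name, and it ignores Ethernet framing overhead just as the
back-of-the-envelope figure does.

```c
#include <stdint.h>

/* Full-size packets arriving per timer tick at a given line rate.
 * link_bps:  link speed in bits per second
 * pkt_bytes: packet size in bytes (1500 in the estimate above)
 * hz:        timer ticks per second */
static uint64_t packets_per_jiffy(uint64_t link_bps, uint64_t pkt_bytes,
                                  uint64_t hz)
{
    /* Integer division, so the result is rounded down at each step. */
    return link_bps / 8 / pkt_bytes / hz;
}
```

With 1 gigabit, 1500-byte packets and HZ=1000 this gives 83; raising HZ to
2500 still leaves 33 packets per tick, and a 10 gigabit link at HZ=1000
gives 833.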

--linas  



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: Jan-Bernd Themann <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 15:59:16 +0200

>    It would be nice if it is possible to schedule queues to other CPU's, or
>    at least to use interrupts to put the queue to another cpu (not nice for 
>    as you never know which one you will hit). 
>    I'm not sure how bad the tradeoff would be.

Once the per-cpu NAPI poll queues start needing locks, much of the
gain will be lost.  This is strictly what we want to avoid.

We need real facilities for IRQ distribution policies.  With that none
of this is an issue.

This is also a platform specific problem with IRQ behavior, the IRQ
distribution scheme you mention would never occur on sparc64 for
example.  We use a fixed round-robin distribution of interrupts to
CPUS there, they don't move.

Each scheme has its advantages, but you want a different scheme here
than what is implemented and the fix is therefore not in the
networking :-)

Furthermore, most cards that will be using multi-queue will be
using hashes on the packet headers to choose the MSI-X interrupt
and thus the cpu to be targeted.  Those cards will want fixed
instead of dynamic interrupt to cpu distribution schemes as well,
so your problem is not unique and they'll need the same fix as
you do.
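
To make the header-hash idea concrete, a rough userspace sketch: a flow's
addresses and ports are hashed and the result picks a queue, so all packets
of one flow land on the same queue (and, with a fixed interrupt mapping, the
same cpu). The struct, the mixing function and the names flow_hash and
flow_to_queue are illustrative assumptions only; this is not the hash any
particular card implements.

```c
#include <stdint.h>

/* Hypothetical flow key; real hardware hashes selected header fields. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Simple illustrative mixing function, not any NIC's actual hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr ^ (k->daddr * 2654435761u);

    h ^= ((uint32_t)k->sport << 16) | k->dport;
    h ^= h >> 16;
    return h * 2246822519u;
}

/* Map a flow to one of nqueues receive queues (nqueues must be > 0). */
static unsigned int flow_to_queue(const struct flow_key *k,
                                  unsigned int nqueues)
{
    return flow_hash(k) % nqueues;
}
```

The mapping depends only on the packet headers, which is why a dynamic
interrupt-to-cpu scheme underneath it would defeat the purpose.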


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-24 Thread David Miller
From: jamal <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 08:14:16 -0400

> Seems the receive side of the sender is also consuming a lot more cpu
> i suspect because receiver is generating a lot more ACKs with TSO.

I've seen this behavior before on a low cpu powered receiver and the
issue is that batching too much actually hurts a receiver.

If the data packets were better spaced out, the receive would handle
the load better.

This is the thing the TOE guys keep talking about overcoming with
their packet pacing algorithms in their on-card TOE stack.

My hunch is that even if in the non-TSO case the TX packets were all
back to back in the cards TX ring, TSO still spits them out faster on
the wire.


Re: [Devel] [PATCH 1/1] Dynamically allocate the loopback device

2007-08-24 Thread Denis V. Lunev
no, and this is important. Loopback is initialized in fs_initcall which
is called sufficiently before module_init.

I have checked the code and do not see initialization order mistakes
right now. But, from now on, maintainers should pay attention to this
unfortunate consequence :(

Regards,
Den

Stephen Hemminger wrote:
> On Fri, 24 Aug 2007 19:55:47 +0400
> "Denis V. Lunev" <[EMAIL PROTECTED]> wrote:
> 
>> [EMAIL PROTECTED] wrote:
>>> From: Daniel Lezcano <[EMAIL PROTECTED]>
>>>
>>> Doing this makes loopback.c a better example of how to do a
>>> simple network device, and it removes the special case
>>> single static allocation of a struct net_device, hopefully
>>> making maintenance easier.
>>>
>>> Applies against net-2.6.24
>>>
>>> Tested on i386, x86_64
>>> Compiled on ia64, sparc
>> I think that a small note, that initialization order is changed will be
>> good to record. After this, loopback MUST be allocated before any other
>> networking subsystem initialization. And this is an important change.
>>
>> Regards,
>> Den
> 
> Yes, this code would break when other drivers are directly linked
> in. 
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.linux-foundation.org/mailman/listinfo/containers
> 
> ___
> Devel mailing list
> [EMAIL PROTECTED]
> https://openvz.org/mailman/listinfo/devel
> 



Re: [PATCH] via-velocity: use standard VLAN interface (resend)

2007-08-24 Thread Al Viro
On Fri, Aug 24, 2007 at 01:56:49PM -0700, Stephen Hemminger wrote:

>  static void velocity_init_cam_filter(struct velocity_info *vptr)
>  {
>   struct mac_regs __iomem * regs = vptr->mac_regs;
> + unsigned short vid;
  
> - mac_set_cam(regs, 0, (u8 *) & (vptr->options.vid), 
> VELOCITY_VLAN_ID_CAM);
> + mac_set_cam(regs, 0, (u8 *) &vid,
> + VELOCITY_VLAN_ID_CAM);

This mac_set_cam() dreck should be split in two properly typed functions.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann

Linas Vepstas wrote:

On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
  

Linas Vepstas <[EMAIL PROTECTED]> wrote:


On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
  

3) On modern systems the incoming packets are processed very fast. Especially
on SMP systems when we use multiple queues we process only a few packets
per napi poll cycle. So NAPI does not work very well here and the interrupt
rate is still high.


worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.
  

Possible solution / possible brainfart:

Introduce a timer, but don't start to use it to combine packets unless you
receive n packets within the timeframe. If you receive less than m packets
within one timeframe, stop using the timer. The system should now have a
decent response time when the network is idle, and when the network is
busy, nobody will complain about the latency.-)



Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate.
Let's say it's 10 per millisecond ("typical" for a gigabit eth running
flat-out).  If we could poll the driver at a rate of 10-20 per
millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
then we could potentially service the card without ever having to enable
interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an
interrupt-driven scheme (to keep latency down).

The main problem here is that, even for HZ=1000 machines, this amounts
to 10-20 polls per jiffy.  Which, if implemented in kernel, requires
using the high-resolution timers. And, umm, don't the HR timers require
a cpu timer interrupt to make them go? So it's not clear that this is much
of a win.
  

That is indeed a good question. At least for 10G eHEA we see
that the average number of packets/poll cycle is very low.
With high precision timers we could control the poll interval
better and thus make sure we get enough packets on the queue in
high load situations to benefit from LRO while keeping the
latency moderate. When the traffic load is low we could just
stick to plain NAPI. I don't know how expensive hp timers are,
we probably just have to test it (when they are available for
POWER in our case). However, having more packets
per poll run would make LRO more efficient and thus the total
CPU utilization would decrease.

I guess on most systems there are not many different network
cards working in parallel. So if the driver could set the poll
interval for its devices, it could be well optimized depending
on the NICs characteristics.

Maybe it would be good enough to have a timer that schedules
the device for NAPI (and thus triggers SoftIRQs, which will
trigger NAPI). Whether this timer would be used via a generic
interface or would be implemented as a proprietary solution
would depend on whether other drivers want / need this feature
as well. Drivers / NICs that work fine with plain NAPI don't
have to use timer :-)

I tried to implement something with "normal" timers, but the result
was everything but great. The timers seem to be far too slow.
I'm not sure if it helps to increase it from 1000HZ to 2500HZ
or more.

Regards,
Jan-Bernd
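
The n/m threshold idea quoted earlier in this message could be sketched
roughly as below. This is a userspace illustration under stated assumptions:
struct rx_mode, rx_mode_update and the threshold values are hypothetical, and
the per-window packet count is assumed to be supplied by the driver.

```c
#include <stdbool.h>

/* Hypothetical per-device state for the two-threshold scheme. */
struct rx_mode {
    bool timer_polling;     /* false: plain interrupt-driven NAPI */
    unsigned int hi;        /* "n": packets per window to start polling */
    unsigned int lo;        /* "m": packets per window to stop polling */
};

/* Called once per time window with the packets seen in that window.
 * Returns true if the next window should use timer polling. */
static bool rx_mode_update(struct rx_mode *m, unsigned int pkts)
{
    if (!m->timer_polling && pkts >= m->hi)
        m->timer_polling = true;    /* busy enough: switch to the timer */
    else if (m->timer_polling && pkts < m->lo)
        m->timer_polling = false;   /* quiet again: back to interrupts */
    return m->timer_polling;
}
```

Keeping lo below hi gives hysteresis, so a load hovering around one
threshold does not flip the mode on every window.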



[PATCH] via-velocity: use standard VLAN interface (resend)

2007-08-24 Thread Stephen Hemminger
The via-velocity is using a non-standard VLAN interface configured
via module parameters (yuck).

Replace with the standard acceleration interface.
It solves a number of problems with being able to handle multiple
vlans, and dynamically reconfigure.

This is compile tested only, don't have this board.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


---
 drivers/net/via-velocity.c |   71 +++--
 drivers/net/via-velocity.h |3 +
 2 files changed, 45 insertions(+), 29 deletions(-)

--- a/drivers/net/via-velocity.c2007-08-18 07:50:10.0 -0700
+++ b/drivers/net/via-velocity.c2007-08-24 13:49:17.0 -0700
@@ -72,6 +72,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -111,15 +112,6 @@ VELOCITY_PARAM(RxDescriptors, "Number of
 #define TX_DESC_DEF 64
 VELOCITY_PARAM(TxDescriptors, "Number of transmit descriptors");
 
-#define VLAN_ID_MIN 0
-#define VLAN_ID_MAX 4095
-#define VLAN_ID_DEF 0
-/* VID_setting[] is used for setting the VID of NIC.
-   0: default VID.
-   1-4094: other VIDs.
-*/
-VELOCITY_PARAM(VID_setting, "802.1Q VLAN ID");
-
 #define RX_THRESH_MIN   0
 #define RX_THRESH_MAX   3
 #define RX_THRESH_DEF   0
@@ -147,13 +139,6 @@ VELOCITY_PARAM(rx_thresh, "Receive fifo 
 */
 VELOCITY_PARAM(DMA_length, "DMA length");
 
-#define TAGGING_DEF 0
-/* enable_tagging[] is used for enabling 802.1Q VID tagging.
-   0: disable VID seeting(default).
-   1: enable VID setting.
-*/
-VELOCITY_PARAM(enable_tagging, "Enable 802.1Q tagging");
-
 #define IP_ALIG_DEF 0
 /* IP_byte_align[] is used for IP header DWORD byte aligned
0: indicate the IP header won't be DWORD byte aligned.(Default) .
@@ -442,8 +427,7 @@ static void __devinit velocity_get_optio
velocity_set_int_opt(&opts->DMA_length, DMA_length[index], 
DMA_LENGTH_MIN, DMA_LENGTH_MAX, DMA_LENGTH_DEF, "DMA_length", devname);
velocity_set_int_opt(&opts->numrx, RxDescriptors[index], RX_DESC_MIN, 
RX_DESC_MAX, RX_DESC_DEF, "RxDescriptors", devname);
velocity_set_int_opt(&opts->numtx, TxDescriptors[index], TX_DESC_MIN, 
TX_DESC_MAX, TX_DESC_DEF, "TxDescriptors", devname);
-   velocity_set_int_opt(&opts->vid, VID_setting[index], VLAN_ID_MIN, 
VLAN_ID_MAX, VLAN_ID_DEF, "VID_setting", devname);
-   velocity_set_bool_opt(&opts->flags, enable_tagging[index], TAGGING_DEF, 
VELOCITY_FLAGS_TAGGING, "enable_tagging", devname);
+
velocity_set_bool_opt(&opts->flags, txcsum_offload[index], TX_CSUM_DEF, 
VELOCITY_FLAGS_TX_CSUM, "txcsum_offload", devname);
velocity_set_int_opt(&opts->flow_cntl, flow_control[index], 
FLOW_CNTL_MIN, FLOW_CNTL_MAX, FLOW_CNTL_DEF, "flow_control", devname);
velocity_set_bool_opt(&opts->flags, IP_byte_align[index], IP_ALIG_DEF, 
VELOCITY_FLAGS_IP_ALIGN, "IP_byte_align", devname);
@@ -465,6 +449,7 @@ static void __devinit velocity_get_optio
 static void velocity_init_cam_filter(struct velocity_info *vptr)
 {
struct mac_regs __iomem * regs = vptr->mac_regs;
+   unsigned short vid;
 
/* Turn on MCFG_PQEN, turn off MCFG_RTGOPT */
WORD_REG_BITS_SET(MCFG_PQEN, MCFG_RTGOPT, &regs->MCFG);
@@ -477,13 +462,19 @@ static void velocity_init_cam_filter(str
mac_set_cam_mask(regs, vptr->mCAMmask, VELOCITY_MULTICAST_CAM);
 
/* Enable first VCAM */
-   if (vptr->flags & VELOCITY_FLAGS_TAGGING) {
-   /* If Tagging option is enabled and VLAN ID is not zero, then
-  turn on MCFG_RTGOPT also */
-   if (vptr->options.vid != 0)
-   WORD_REG_BITS_ON(MCFG_RTGOPT, &regs->MCFG);
+   if (vptr->vlgrp) {
+   for (vid = 0; vid < VLAN_VID_MASK; vid++) {
+   if (vlan_group_get_device(vptr->vlgrp, vid)) {
+   /* If Tagging option is enabled and
+  VLAN ID is not zero, then
+  turn on MCFG_RTGOPT also */
+   if (vid != 0)
+   WORD_REG_BITS_ON(MCFG_RTGOPT, 
&regs->MCFG);
 
-   mac_set_cam(regs, 0, (u8 *) & (vptr->options.vid), 
VELOCITY_VLAN_ID_CAM);
+   mac_set_cam(regs, 0, (u8 *) &vid,
+   VELOCITY_VLAN_ID_CAM);
+   }
+   }
vptr->vCAMmask[0] |= 1;
mac_set_cam_mask(regs, vptr->vCAMmask, VELOCITY_VLAN_ID_CAM);
} else {
@@ -494,6 +485,26 @@ static void velocity_init_cam_filter(str
}
 }
 
+static void velocity_vlan_rx_add_vid(struct net_device *dev, unsigned short 
vid)
+{
+   struct velocity_info *vptr = netdev_priv(dev);
+
+spin_lock_irq(&vptr->lock);
+   velocity_init_cam_filter(vptr);
+spin_unlock_irq(&vptr->lock);
+}
+
+static void velocity_vlan_rx_kill_vid(struct net_device *dev, unsigned short 
vid)
+{
+   struct velocity_info

[PATCH 2/3] net: wrap hard_header_parse

2007-08-24 Thread Stephen Hemminger
Wrap the hard_header_parse function to simplify next step
of header_ops conversion.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- a/include/linux/netdevice.h 2007-08-23 21:25:57.0 -0700
+++ b/include/linux/netdevice.h 2007-08-23 22:25:35.0 -0700
@@ -639,7 +639,7 @@ struct net_device
void(*vlan_rx_kill_vid)(struct net_device *dev,
unsigned short vid);
 
-   int (*hard_header_parse)(struct sk_buff *skb,
+   int (*hard_header_parse)(const struct sk_buff *skb,
 unsigned char *haddr);
int (*neigh_setup)(struct net_device *dev, struct 
neigh_parms *);
 #ifdef CONFIG_NETPOLL
@@ -787,6 +787,16 @@ static inline int dev_hard_header(struct
return dev->hard_header(skb, dev, type, daddr, saddr, len);
 }
 
+static inline int dev_parse_header(const struct sk_buff *skb,
+  unsigned char *haddr)
+{
+   const struct net_device *dev = skb->dev;
+
+   if (!dev->hard_header_parse)
+   return 0;
+   return dev->hard_header_parse(skb, haddr);
+}
+
 typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int 
len);
 extern int register_gifconf(unsigned int family, gifconf_func_t * 
gifconf);
 static inline int unregister_gifconf(unsigned int family)
--- a/net/netfilter/nfnetlink_log.c 2007-08-23 09:44:22.0 -0700
+++ b/net/netfilter/nfnetlink_log.c 2007-08-23 21:43:32.0 -0700
@@ -480,12 +480,13 @@ __build_packet_message(struct nfulnl_ins
NFA_PUT(inst->skb, NFULA_MARK, sizeof(tmp_uint), &tmp_uint);
}
 
-   if (indev && skb->dev && skb->dev->hard_header_parse) {
+   if (indev && skb->dev) {
struct nfulnl_msg_packet_hw phw;
-   int len = skb->dev->hard_header_parse((struct sk_buff *)skb,
-   phw.hw_addr);
-   phw.hw_addrlen = htons(len);
-   NFA_PUT(inst->skb, NFULA_HWADDR, sizeof(phw), &phw);
+   int len = dev_parse_header(skb, phw.hw_addr);
+   if (len > 0) {
+   phw.hw_addrlen = htons(len);
+   NFA_PUT(inst->skb, NFULA_HWADDR, sizeof(phw), &phw);
+   }
}
 
if (skb->tstamp.tv64) {
--- a/net/netfilter/nfnetlink_queue.c   2007-08-23 09:44:22.0 -0700
+++ b/net/netfilter/nfnetlink_queue.c   2007-08-23 21:33:50.0 -0700
@@ -485,14 +485,13 @@ nfqnl_build_packet_message(struct nfqnl_
NFA_PUT(skb, NFQA_MARK, sizeof(u_int32_t), &tmp_uint);
}
 
-   if (indev && entskb->dev
-   && entskb->dev->hard_header_parse) {
+   if (indev && entskb->dev) {
struct nfqnl_msg_packet_hw phw;
-
-   int len = entskb->dev->hard_header_parse(entskb,
-  phw.hw_addr);
-   phw.hw_addrlen = htons(len);
-   NFA_PUT(skb, NFQA_HWADDR, sizeof(phw), &phw);
+   int len = dev_parse_header(entskb, phw.hw_addr);
+   if (len) {
+   phw.hw_addrlen = htons(len);
+   NFA_PUT(skb, NFQA_HWADDR, sizeof(phw), &phw);
+   }
}
 
if (entskb->tstamp.tv64) {
--- a/net/packet/af_packet.c2007-08-23 21:25:57.0 -0700
+++ b/net/packet/af_packet.c2007-08-23 22:25:19.0 -0700
@@ -512,10 +512,8 @@ static int packet_rcv(struct sk_buff *sk
sll->sll_ifindex = orig_dev->ifindex;
else
sll->sll_ifindex = dev->ifindex;
-   sll->sll_halen = 0;
 
-   if (dev->hard_header_parse)
-   sll->sll_halen = dev->hard_header_parse(skb, sll->sll_addr);
+   sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
 
PACKET_SKB_CB(skb)->origlen = skb->len;
 
@@ -649,9 +647,7 @@ static int tpacket_rcv(struct sk_buff *s
h->tp_usec = tv.tv_usec;
 
sll = (struct sockaddr_ll*)((u8*)h + TPACKET_ALIGN(sizeof(*h)));
-   sll->sll_halen = 0;
-   if (dev->hard_header_parse)
-   sll->sll_halen = dev->hard_header_parse(skb, sll->sll_addr);
+   sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
sll->sll_family = AF_PACKET;
sll->sll_hatype = dev->type;
sll->sll_protocol = skb->protocol;
--- a/net/ethernet/eth.c2007-08-23 21:25:57.0 -0700
+++ b/net/ethernet/eth.c2007-08-23 22:25:19.0 -0700
@@ -207,9 +207,9 @@ EXPORT_SYMBOL(eth_type_trans);
  * @skb: packet to extract header from
  * @haddr: destination buffer
  */
-static int eth_header_parse(struct sk_buff *skb, unsigned char *haddr)
+static int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr)
 {
-   struct ethhdr *eth = eth_hdr(skb);
+   const struct

[PATCH 1/3] net: wrap netdevice hardware header creation

2007-08-24 Thread Stephen Hemminger
Add inline for common usage of hardware header creation, and
fix bug in IPV6 mcast where the assumption about negative return value
was wrong.

A negative return from hard_header means not enough space was available
(i.e. -N bytes).

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
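
The wrapper pattern described above can be modelled outside the kernel. The following is a minimal userspace sketch, assuming the return convention stated in the description (0 when there is no callback, a positive header length on success, -N when N bytes were needed but not available). The names fake_dev, fake_dev_hard_header and fixed_header are hypothetical stand-ins, not kernel symbols.

```c
#include <stddef.h>

/* Userspace model only; every name here is a hypothetical stand-in. */
struct fake_dev {
	/* Returns the header length on success, or -N when N bytes were
	 * needed but not available. May be NULL for header-less devices. */
	int (*hard_header)(struct fake_dev *dev, unsigned char *buf,
			   size_t room, unsigned int len);
	unsigned int hlen;	/* fixed header size used by fixed_header() */
};

/* Wrapper: a missing callback is reported as "no header", i.e. 0. */
static inline int fake_dev_hard_header(struct fake_dev *dev,
				       unsigned char *buf,
				       size_t room, unsigned int len)
{
	if (!dev->hard_header)
		return 0;
	return dev->hard_header(dev, buf, room, len);
}

/* Example callback that writes a zeroed header of dev->hlen bytes. */
static int fixed_header(struct fake_dev *dev, unsigned char *buf,
			size_t room, unsigned int len)
{
	size_t i;

	(void)len;	/* payload length is unused in this model */
	if (room < dev->hlen)
		return -(int)dev->hlen;	/* not enough space: -N bytes */
	for (i = 0; i < dev->hlen; i++)
		buf[i] = 0;
	return (int)dev->hlen;
}
```

With this convention a caller has to test for "< 0" to detect failure; treating any non-zero return as an error would misread a successful header build.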


--- a/include/linux/netdevice.h 2007-08-23 09:44:19.0 -0700
+++ b/include/linux/netdevice.h 2007-08-24 12:47:11.0 -0700
@@ -778,6 +778,15 @@ extern int dev_restart(struct net_devic
 extern int netpoll_trap(void);
 #endif
 
+static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
+ unsigned short type,
+ void *daddr, void *saddr, unsigned len)
+{
+   if (!dev->hard_header)
+   return 0;
+   return dev->hard_header(skb, dev, type, daddr, saddr, len);
+}
+
 typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int 
len);
 extern int register_gifconf(unsigned int family, gifconf_func_t * 
gifconf);
 static inline int unregister_gifconf(unsigned int family)
--- a/net/ipv4/arp.c2007-08-23 09:44:22.0 -0700
+++ b/net/ipv4/arp.c2007-08-24 12:47:11.0 -0700
@@ -590,8 +590,7 @@ struct sk_buff *arp_create(int type, int
/*
 *  Fill the device header for the ARP frame
 */
-   if (dev->hard_header &&
-   dev->hard_header(skb,dev,ptype,dest_hw,src_hw,skb->len) < 0)
+   if (dev_hard_header(skb, dev, ptype, dest_hw, src_hw, skb->len) < 0)
goto out;
 
/*
--- a/net/core/neighbour.c  2007-08-23 09:44:22.0 -0700
+++ b/net/core/neighbour.c  2007-08-24 12:47:11.0 -0700
@@ -1123,9 +1123,8 @@ int neigh_compat_output(struct sk_buff *
 
__skb_pull(skb, skb_network_offset(skb));
 
-   if (dev->hard_header &&
-   dev->hard_header(skb, dev, ntohs(skb->protocol), NULL, NULL,
-skb->len) < 0 &&
+   if (dev_hard_header(skb, dev, ntohs(skb->protocol), NULL, NULL,
+   skb->len) < 0 &&
dev->rebuild_header(skb))
return 0;
 
@@ -1152,13 +1151,13 @@ int neigh_resolve_output(struct sk_buff 
write_lock_bh(&neigh->lock);
if (!dst->hh)
neigh_hh_init(neigh, dst, dst->ops->protocol);
-   err = dev->hard_header(skb, dev, ntohs(skb->protocol),
-  neigh->ha, NULL, skb->len);
+   err = dev_hard_header(skb, dev, ntohs(skb->protocol),
+ neigh->ha, NULL, skb->len);
write_unlock_bh(&neigh->lock);
} else {
read_lock_bh(&neigh->lock);
-   err = dev->hard_header(skb, dev, ntohs(skb->protocol),
-  neigh->ha, NULL, skb->len);
+   err = dev_hard_header(skb, dev, ntohs(skb->protocol),
+ neigh->ha, NULL, skb->len);
read_unlock_bh(&neigh->lock);
}
if (err >= 0)
@@ -1189,8 +1188,8 @@ int neigh_connected_output(struct sk_buf
__skb_pull(skb, skb_network_offset(skb));
 
read_lock_bh(&neigh->lock);
-   err = dev->hard_header(skb, dev, ntohs(skb->protocol),
-  neigh->ha, NULL, skb->len);
+   err = dev_hard_header(skb, dev, ntohs(skb->protocol),
+ neigh->ha, NULL, skb->len);
read_unlock_bh(&neigh->lock);
if (err >= 0)
err = neigh->ops->queue_xmit(skb);
--- a/net/8021q/vlan_dev.c  2007-08-23 09:44:21.0 -0700
+++ b/net/8021q/vlan_dev.c  2007-08-24 12:47:11.0 -0700
@@ -419,21 +419,19 @@ int vlan_dev_hard_header(struct sk_buff 
 
if (build_vlan_header) {
/* Now make the underlying real hard header */
-   rc = dev->hard_header(skb, dev, ETH_P_8021Q, daddr, saddr, len 
+ VLAN_HLEN);
-
-   if (rc > 0) {
+   rc = dev_hard_header(skb, dev, ETH_P_8021Q, daddr, saddr,
+len + VLAN_HLEN);
+   if (rc > 0)
rc += VLAN_HLEN;
-   } else if (rc < 0) {
+   else if (rc < 0)
rc -= VLAN_HLEN;
-   }
-   } else {
+   } else
/* If here, then we'll just make a normal looking ethernet 
frame,
 * but, the hard_start_xmit method will insert the tag (it has 
to
 * be able to do this for bridged and other skbs that don't come
 * down the protocol stack in an orderly manner.
 */
-   rc = dev->hard_header(skb, dev, type, daddr, saddr, len);
-   }
+   rc = dev_

[PATCH 0/3] move hardware header functions out of netdevice

2007-08-24 Thread Stephen Hemminger
The following patch series starts the process of moving function
pointers out of the network device structure. This saves space and
separates code from data.

The first step is moving the functions dealing with hardware
headers.

Patches are against the current net-2.6.24 tree. Basic functional
testing was done on the ethernet part, but not on all the other protocols affected.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
> Linas Vepstas <[EMAIL PROTECTED]> wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> >> 3) On modern systems the incoming packets are processed very fast. 
> >> Especially
> >> on SMP systems when we use multiple queues we process only a few packets
> >> per napi poll cycle. So NAPI does not work very well here and the interrupt
> >> rate is still high.
> > 
> > worst-case network ping-pong app: send one
> > packet, wait for reply, send one packet, etc.
> 
> Possible solution / possible brainfart:
> 
> Introduce a timer, but don't start to use it to combine packets unless you
> receive n packets within the timeframe. If you receive less than m packets
> within one timeframe, stop using the timer. The system should now have a
> decent response time when the network is idle, and when the network is
> busy, nobody will complain about the latency.-)

Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate.
Let's say it's 10 per millisecond ("typical" for a gigabit eth running
flat-out).  If we could poll the driver at a rate of 10-20 per
millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
then we could potentially service the card without ever having to enable 
interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an
interrupt-driven scheme (to keep latency down).

The main problem here is that, even for HZ=1000 machines, this amounts 
to 10-20 polls per jiffy.  Which, if implemented in kernel, requires 
using the high-resolution timers. And, umm, don't the HR timers require
a cpu timer interrupt to make them go? So it's not clear that this is much
of a win.
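
The running-average idea and the polls-per-jiffy arithmetic above can be sketched in a few lines. This is a userspace model under stated assumptions: samples are taken once per millisecond, the average is an exponentially weighted one with weight 1/8 (my choice, not something proposed in this thread), and all names (rate_est, rate_update, rate_wants_polling, polls_per_jiffy) are hypothetical.

```c
/* Userspace model only; all names here are hypothetical. */
struct rate_est {
	unsigned int avg_x8;	/* running average of packets per ms, scaled by 8 */
};

/* Fold one per-millisecond sample in: avg = 7/8 * avg + 1/8 * sample. */
static void rate_update(struct rate_est *e, unsigned int pkts_this_ms)
{
	e->avg_x8 = e->avg_x8 - (e->avg_x8 >> 3) + pkts_this_ms;
}

/* Current average in packets per millisecond. */
static unsigned int rate_get(const struct rate_est *e)
{
	return e->avg_x8 >> 3;
}

/* Poll while the average is at or above the threshold, else use interrupts. */
static int rate_wants_polling(const struct rate_est *e, unsigned int thresh)
{
	return rate_get(e) >= thresh;
}

/* Number of polls that fall into one jiffy for a given poll rate and HZ. */
static unsigned int polls_per_jiffy(unsigned int polls_per_ms, unsigned int hz)
{
	return polls_per_ms * 1000u / hz;
}
```

For HZ=1000, polls_per_jiffy(10, 1000) and polls_per_jiffy(20, 1000) give the 10-20 polls per jiffy mentioned above, which is why ordinary jiffy-based timers are too coarse for this scheme.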

The eHEA is a 10 gigabit device, so it can expect 80-100 packets per
millisecond for large packets, and even more, say 1K packets per
millisec, for small packets. (Even the spec for my 1Gb spidernet card
claims its internal rate is 1M packets/sec.) 

Another possibility is to set HZ to 5000 or 2 or something humongous
... after all cpu's are now faster! But, since this might be wasteful,
maybe we could make HZ be dynamically variable: have high HZ rates when
there's lots of network/disk activity, and low HZ rates when not. That
means a non-constant jiffy.

If all drivers used interrupt mitigation, then the variable high-frequency
jiffy could take their place, and be more "fair" to everyone.
Most drivers would be polled most of the time when they're busy, and 
only use interrupts when they're not.
 
--linas


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Chris Snook

Denys Vlasenko wrote:
> On Friday 24 August 2007 18:06, Christoph Lameter wrote:
> > On Fri, 24 Aug 2007, Satyam Sharma wrote:
> > > But if people do seem to have a mixed / confused notion of atomicity
> > > and barriers, and if there's consensus, then as I'd said earlier, I
> > > have no issues in going with the consensus (eg. having API variants).
> > > Linus would be more difficult to convince, however, I suspect :-)
> >
> > The confusion may be the result of us having barrier semantics in
> > atomic_read. If we take that out then we may avoid future confusions.
>
> I think better name may help. Nuke atomic_read() altogether.
>
> n = atomic_value(x);// doesnt hint as strongly at reading as "atomic_read"
> n = atomic_fetch(x);// yes, we _do_ touch RAM
> n = atomic_read_uncached(x); // or this
>
> How does that sound?


atomic_value() vs. atomic_fetch() should be rather unambiguous. 
atomic_read_uncached() begs the question of precisely which cache we are 
avoiding, and could itself cause confusion.


So, if I were writing atomic.h from scratch, knowing what I know now, I think 
I'd use atomic_value() and atomic_fetch().  The problem is that there are a lot 
of existing users of atomic_read(), and we can't write a script to correctly 
guess their intent.  I'm not sure auditing all uses of atomic_read() is really 
worth the comparatively minuscule benefits.


We could play it safe and convert them all to atomic_fetch(), or we could 
acknowledge that changing the semantics 8 months ago was not at all disastrous, 
and make them all atomic_value(), allowing people to use atomic_fetch() where 
they really care.


-- Chris


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 18:06, Christoph Lameter wrote:
> On Fri, 24 Aug 2007, Satyam Sharma wrote:
> > But if people do seem to have a mixed / confused notion of atomicity
> > and barriers, and if there's consensus, then as I'd said earlier, I
> > have no issues in going with the consensus (eg. having API variants).
> > Linus would be more difficult to convince, however, I suspect :-)
>
> The confusion may be the result of us having barrier semantics in
> atomic_read. If we take that out then we may avoid future confusions.

I think better name may help. Nuke atomic_read() altogether.

n = atomic_value(x);// doesnt hint as strongly at reading as "atomic_read"
n = atomic_fetch(x);// yes, we _do_ touch RAM
n = atomic_read_uncached(x); // or this

How does that sound?
--
vda


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 18:15, Christoph Lameter wrote:
> On Fri, 24 Aug 2007, Denys Vlasenko wrote:
> > On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> > > Satyam Sharma writes:
> > > In the kernel we use atomic variables in precisely those situations
> > > where a variable is potentially accessed concurrently by multiple
> > > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > > timely fashion.  That is what they are for.  Therefore the compiler
> > > must not cache values of atomic variables in registers; each
> > > atomic_read must result in a load and each atomic_set must result in a
> > > store.  Anything else will just lead to subtle bugs.
> >
> > Amen.
>
> A "timely" fashion? One cannot rely on something like that when coding.
> The visibility of updates is insured by barriers and not by some fuzzy
> notion of "timeliness".

But here you do have some notion of time:

while (atomic_read(&x))
continue;

"continue when other CPU(s) decrement it down to zero".
If "read" includes an insn which accesses RAM, you will
see "new" value sometime after other CPU decrements it.
"Sometime after" is on the order of nanoseconds here.
It is a valid concept of time, right?

The whole confusion is about whether atomic_read implies
"read from RAM" or not. I am in a camp which thinks it does.
You are in an opposite one.

We just need a less ambiguous name.

What about this:

/**
 * atomic_read - read atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically reads the value of @v.
 * No compiler barrier implied.
 */
#define atomic_read(v)  ((v)->counter)

+/**
+ * atomic_read_uncached - read atomic variable from memory
+ * @v: pointer of type atomic_t
+ *
+ * Atomically reads the value of @v. This is guaranteed to emit an insn
+ * which accesses memory, atomically. No ordering guarantees!
+ */
+#define atomic_read_uncached(v)  asm_or_volatile_ptr_magic(v)

I was thinking of s/atomic_read/atomic_get/ too, but it implies "taking"
atomic a-la get_cpu()...
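
The asm_or_volatile_ptr_magic() placeholder above is left open. One possible way to fill it in, shown here only as a userspace sketch with hypothetical names (my_atomic_t, my_atomic_read, my_atomic_read_volatile), is a read through a volatile-qualified pointer. That forces the compiler to emit a load on every evaluation, but implies no ordering with respect to other memory accesses and no CPU-level barrier.

```c
/* Userspace sketch; names are hypothetical, not the kernel's atomic API. */
typedef struct { int counter; } my_atomic_t;

/* Plain read: the compiler is free to keep the value in a register,
 * so a loop around it may never observe a change made elsewhere. */
#define my_atomic_read(v)		((v)->counter)

/* Volatile read: every evaluation must perform a load from memory.
 * No ordering guarantees are implied. */
#define my_atomic_read_volatile(v)	(*(volatile int *)&(v)->counter)

static void my_atomic_set(my_atomic_t *v, int i)
{
	v->counter = i;
}

/* Spin until the counter reads zero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting. */
static int wait_for_zero(my_atomic_t *v, int max_spins)
{
	int spins = 0;

	while (my_atomic_read_volatile(v) && spins < max_spins)
		spins++;
	return spins;
}
```

With the plain my_atomic_read() in the loop condition, an optimizing compiler could hoist the load out of the loop; the volatile variant rules that out, which is the behaviour the "read from RAM" camp expects.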
--
vda


Re: [ANNOUNCE] iproute2-2.6.23-rc3

2007-08-24 Thread Stephen Hemminger
On Fri, 24 Aug 2007 12:10:44 +0200
Jarek Poplawski <[EMAIL PROTECTED]> wrote:

> On 22-08-2007 20:08, Stephen Hemminger wrote:
> > There have been a lot of changes for 2.6.23, so here is a test release
> > of iproute2 that should capture all the submitted patches
> > 
> > 
> > http://developer.osdl.org/shemminger/iproute2/download/iproute2-2.6.23-rc3.tar.gz
> 
> But... isn't it forged, btw?!

No, I just didn't sign a temporary testing version.  A final version
will be out after 2.6.23

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-24 Thread Olof Johansson
On Fri, Aug 24, 2007 at 02:05:31PM +1000, Stephen Rothwell wrote:
> On Thu, 23 Aug 2007 13:13:10 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote:
> >
> >  out:
> > -   pci_dev_put(mac->iob_pdev);
> > -out_put_dma_pdev:
> > -   pci_dev_put(mac->dma_pdev);
> > -out_free_netdev:
> > +   if (mac->iob_pdev)
> > +   pci_dev_put(mac->iob_pdev);
> > +   if (mac->dma_pdev)
> > +   pci_dev_put(mac->dma_pdev);
> 
> It is not documented as such (as far as I can see), but pci_dev_put is
> safe to call with NULL. And there are other places in the kernel that
> explicitly use that fact.

Some places check, others do not. I'll leave it be for now but might take
care of it during some future cleanup. Thanks for pointing it out though.
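
For reference, the convention Stephen describes (a "put" routine that tolerates NULL so error paths need no checks) looks roughly like this. This is a hypothetical userspace model, not the implementation of pci_dev_put(); ref_obj and ref_obj_put are invented names.

```c
#include <stddef.h>

/* Hypothetical reference-counted object, for illustration only. */
struct ref_obj {
	int refcount;
};

/* Drop one reference. Safe to call with NULL, so callers on an error
 * path do not need their own "if (obj)" guard. */
static void ref_obj_put(struct ref_obj *obj)
{
	if (!obj)
		return;
	obj->refcount--;
}
```

With a put routine of this shape, a single cleanup label can release every pointer unconditionally, whether or not it was ever acquired.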


-Olof


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Bodo Eggert
Linas Vepstas <[EMAIL PROTECTED]> wrote:
> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:

>> 3) On modern systems the incoming packets are processed very fast. Especially
>> on SMP systems when we use multiple queues we process only a few packets
>> per napi poll cycle. So NAPI does not work very well here and the interrupt
>> rate is still high.
> 
> I saw this too, on a system that is "modern" but not terribly fast, and
> only slightly (2-way) smp. (the spidernet)
> 
> I experimented with various solutions, none were terribly exciting.  The
> thing that killed all of them was a crazy test case that someone sprung on
> me:  They had written a worst-case network ping-pong app: send one
> packet, wait for reply, send one packet, etc.
> 
> If I waited (indefinitely) for a second packet to show up, the test case
> completely stalled (since no second packet would ever arrive).  And if I
> introduced a timer to wait for a second packet, then I just increased
> the latency in the response to the first packet, and this was noticed,
> and folks complained.

Possible solution / possible brainfart:

Introduce a timer, but don't start to use it to combine packets unless you
receive n packets within the timeframe. If you receive less than m packets
within one timeframe, stop using the timer. The system should now have a
decent response time when the network is idle, and when the network is
busy, nobody will complain about the latency.-)
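
The two-threshold scheme above amounts to a small hysteresis state machine. A minimal userspace sketch follows, assuming the packet count is evaluated once per timeframe; the names rx_mode, rx_state and rx_end_of_timeframe are hypothetical, and start_thresh and stop_thresh play the roles of n and m.

```c
/* Userspace model only; all names are hypothetical. */
enum rx_mode {
	RX_IRQ,		/* plain interrupt-driven reception */
	RX_TIMER	/* timer in use to combine packets */
};

struct rx_state {
	enum rx_mode mode;
	unsigned int start_thresh;	/* n: packets per timeframe to start the timer */
	unsigned int stop_thresh;	/* m: below this many packets, stop the timer */
};

/* Call once at the end of each timeframe with the packets seen in it.
 * Returns the mode to use for the next timeframe. */
static enum rx_mode rx_end_of_timeframe(struct rx_state *s, unsigned int pkts)
{
	if (s->mode == RX_IRQ && pkts >= s->start_thresh)
		s->mode = RX_TIMER;
	else if (s->mode == RX_TIMER && pkts < s->stop_thresh)
		s->mode = RX_IRQ;
	return s->mode;
}
```

With n larger than m, traffic between the two thresholds leaves the mode unchanged, so a ping-pong workload of one packet per timeframe never enters timer mode and keeps its low latency.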
-- 
Funny quotes:
22. When everything's going your way, you're in the wrong lane and going
the wrong way.
Eat this, spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-24 Thread Rick Jones

Bill Fink wrote:
> On Thu, 23 Aug 2007, Rick Jones wrote:
>
> > jamal wrote:
> > > [TSO already passed - iirc, it has been
> > > demonstrated to really not add much to throughput (cant improve much
> > > over closeness to wire speed) but improve CPU utilization].
> >
> > In the one gig space sure, but in the 10 Gig space, TSO on/off does make a
> > difference for throughput.
>
> Not too much.
>
> TSO enabled:
>
> [EMAIL PROTECTED] ~]# ethtool -k eth2
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: on
>
> [EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
> 11813.4375 MB /  10.00 sec = 9906.1644 Mbps 99 %TX 80 %RX
>
> TSO disabled:
>
> [EMAIL PROTECTED] ~]# ethtool -K eth2 tso off
> [EMAIL PROTECTED] ~]# ethtool -k eth2
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: off
>
> [EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
> 11818.2500 MB /  10.00 sec = 9910.0176 Mbps 100 %TX 78 %RX
>
> Pretty negligible difference it seems.

Leaves one wondering how often more than one segment was sent to the card in the
9000 byte case :)

rick jones

rick jones


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann

James Chapman wrote:
> Stephen Hemminger wrote:
> > On Fri, 24 Aug 2007 17:47:15 +0200
> > Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
> > > > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > > > > ...
> > > > > 3) On modern systems the incoming packets are processed very fast.
> > > > >    Especially on SMP systems when we use multiple queues we process
> > > > >    only a few packets per napi poll cycle. So NAPI does not work very
> > > > >    well here and the interrupt rate is still high. What we need would
> > > > >    be some sort of timer polling mode which will schedule a device
> > > > >    after a certain amount of time for high load situations. With high
> > > > >    precision timers this could work well. Current usual timers are
> > > > >    too slow. A finer granularity would be needed to keep the latency
> > > > >    down (and queue length moderate).
> > > >
> > > > We found the same on ia64-sn systems with tg3 a couple of years
> > > > ago. Using simple interrupt coalescing ("don't interrupt until
> > > > you've received N packets or M usecs have elapsed") worked
> > > > reasonably well in practice. If your h/w supports that (and I'd
> > > > guess it does, since it's such a simple thing), you might try it.
> > >
> > > I don't see how this should work. Our latest machines are fast enough
> > > that they simply empty the queue during the first poll iteration (in
> > > most cases). Even if you wait until X packets have been received, it
> > > does not help for the next poll cycle. The average number of packets
> > > we process per poll queue is low. So a timer would be preferable that
> > > periodically polls the queue, without the need of generating a HW
> > > interrupt. This would allow us to wait until a reasonable amount of
> > > packets have been received in the meantime to keep the poll overhead
> > > low. This would also be useful in combination with LRO.
> >
> > You need hardware support for deferred interrupts. Most devices have
> > it (e1000, sky2, tg3) and it interacts well with NAPI. It is not a
> > generic thing you want done by the stack, you want the hardware to
> > hold off interrupts until X packets or Y usecs have expired.
>
> Does hardware interrupt mitigation really interact well with NAPI? In
> my experience, holding off interrupts for X packets or Y usecs does
> more harm than good; such hardware features are useful only when the
> OS has no NAPI-like mechanism.
>
> When tuning NAPI drivers for packets/sec performance (which is a good
> indicator of driver performance), I make sure that the driver stays in
> NAPI polled mode while it has any rx or tx work to do. If the CPU is
> fast enough that all work is always completed on each poll, I have the
> driver stay in polled mode until dev->poll() is called N times with no
> work being done. This keeps interrupts disabled for reasonable traffic
> levels, while minimizing packet processing latency. No need for
> hardware interrupt mitigation.

Yes, that was one idea as well. But the problem with that is that
net_rx_action will call the same poll function over and over again in a
row if there are no further network devices. With this approach you
always poll just a very few packets each time. This does not work well
with LRO, as there are no packets to aggregate...
So it would make more sense to wait for a certain time before trying it
again.

Second problem: after jiffies has been incremented by one in
net_rx_action (after some poll rounds), net_rx_action will quit and
return control to the softIRQ handler. The poll function is called again
as the softIRQ handler thinks there is more work to be done. So even
then we do not wait... After some rounds in the softIRQ handler, we
finally wait some time.
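
For readers following the thread, the "stay in polled mode until N empty polls" scheme under discussion can be modelled as below. This is a userspace sketch with hypothetical names (poll_state, poll_finished); it shows only the bookkeeping, not the dev->poll() interface itself, and it does nothing about the back-to-back polling objection raised above.

```c
/* Userspace model only; all names are hypothetical. */
struct poll_state {
	unsigned int idle_polls;	/* consecutive polls that found no work */
	unsigned int idle_limit;	/* N: empty polls tolerated before giving up */
};

/* Call at the end of each poll with the amount of work it completed.
 * Returns 1 to stay in polled mode, 0 to re-enable interrupts. */
static int poll_finished(struct poll_state *s, unsigned int work_done)
{
	if (work_done) {
		s->idle_polls = 0;	/* traffic seen: restart the idle count */
		return 1;
	}
	if (++s->idle_polls >= s->idle_limit) {
		s->idle_polls = 0;
		return 0;		/* N empty polls in a row: back to IRQs */
	}
	return 1;
}
```

Nothing in this bookkeeping inserts a delay between polls, so on a fast CPU each poll still sees only a few packets; a timer-driven variant would need a wait between invocations.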




> > The parameters for controlling it are already in ethtool, the issue
> > is finding a good default set of values for a wide range of
> > applications and architectures. Maybe some heuristic based on
> > processor speed would be a good starting point. The dynamic irq
> > moderation stuff is not widely used because it is too hard to get right.
>
> I agree. It would be nice to find a way for the typical user to derive
> best values for these knobs for his/her particular system. Perhaps a
> tool using pktgen and network device phy internal loopback could be
> developed?







Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-24 Thread Bill Fink
On Fri, 24 Aug 2007, jamal wrote:

> On Thu, 2007-23-08 at 23:18 -0400, Bill Fink wrote:
> 
> [..]
> > Here you can see there is a major difference in the TX CPU utilization
> > (99 % with TSO disabled versus only 39 % with TSO enabled), although
> > the TSO disabled case was able to squeeze out a little extra performance
> > from its extra CPU utilization.  
> 
> Good stuff. What kind of machine? SMP?

Tyan Thunder K8WE S2895ANRF motherboard with Nvidia nForce
Professional 2200+2050 chipset, 2 AMD Opteron 254 2.8 GHz CPUs,
4 GB PC3200 ECC REG-DDR 400 memory, and 2 PCI-Express x16 slots
(2 buses).

It is SMP but both the NIC interrupts and nuttcp are bound to
CPU 0.  And all other non-kernel system processes are bound to
CPU 1.

> Seems the receive side of the sender is also consuming a lot more cpu
> i suspect because receiver is generating a lot more ACKs with TSO.

Odd.  I just reran the TCP CUBIC "-M1460" tests, and with TSO enabled
on the transmitter, there were about 153709 eth2 interrupts on the
receiver, while with TSO disabled there was actually a somewhat higher
number (164988) of receiver side eth2 interrupts, although the receive
side CPU utilization was actually lower in that case.

On the transmit side (different test run), the TSO enabled case had
about 161773 eth2 interrupts whereas the TSO disabled case had about
165179 eth2 interrupts.

> Does the choice of the tcp congestion control algorithm affect results?
> it would be interesting to see both MTUs with either TCP BIC vs good old
> reno on sender (probably without changing what the receiver does). BIC
> seems to be the default lately.

These tests were with the default TCP CUBIC (with initial_ssthresh
set to 0).

With TCP BIC (and initial_ssthresh set to 0):

TSO enabled:

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11751.3750 MB /  10.00 sec = 9853.9839 Mbps 100 %TX 83 %RX

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
 4999.3321 MB /  10.06 sec = 4167.7872 Mbps 38 %TX 100 %RX

TSO disabled:

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11818.1875 MB /  10.00 sec = 9910.0682 Mbps 99 %TX 81 %RX

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
 5502.6250 MB /  10.00 sec = 4614.3297 Mbps 100 %TX 84 %RX

And with TCP Reno:

TSO enabled:

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11782.6250 MB /  10.00 sec = 9880.2613 Mbps 100 %TX 77 %RX

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
 5024.6649 MB /  10.06 sec = 4191.6574 Mbps 38 %TX 99 %RX

TSO disabled:

[EMAIL PROTECTED] ~]# nuttcp -w10m 192.168.88.16
11818.2500 MB /  10.00 sec = 9910.0860 Mbps 99 %TX 77 %RX

[EMAIL PROTECTED] ~]# nuttcp -M1460 -w10m 192.168.88.16
 5284. MB /  10.00 sec = 4430.9604 Mbps 99 %TX 79 %RX

Very similar results to the original TCP CUBIC tests.

> > Interestingly, with TSO enabled, the
> > receiver actually consumed more CPU than with TSO disabled, 
> 
> I would suspect the fact that a lot more packets making it into the
> receiver for TSO contributes.
> 
> > so I guess
> > the receiver CPU saturation in that case (99 %) was what restricted
> > its performance somewhat (this was consistent across a few test runs).
> 
> Unfortunately the receiver plays a big role in such tests - if it is
> bottlenecked then you are not really testing the limits of the
> transmitter. 

It might be interesting to see what effect the LRO changes would have
on this.  Once they are in a stable released kernel, I might try that
out, or maybe even before if I get some spare time (but that's in very
short supply right now).

-Thanks

-Bill


[PATCH] isdn capi driver broken on 64 bit.

2007-08-24 Thread Stephen Hemminger
The following driver API is broken on any architecture with 64-bit addresses
because of a cast that loses the high bits.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- a/drivers/isdn/capi/capidrv.c   2007-06-25 09:03:12.0 -0700
+++ b/drivers/isdn/capi/capidrv.c   2007-08-24 11:06:46.0 -0700
@@ -1855,6 +1855,9 @@ static int if_sendbuf(int id, int channe
return 0;
}
datahandle = nccip->datahandle;
+
+   /* This won't work on 64 bit! */
+   BUILD_BUG_ON(sizeof(skb->data) > sizeof(u32));
capi_fill_DATA_B3_REQ(&sendcmsg, global.ap.applid, card->msgid++,
  nccip->ncci,  /* adr */
  (u32) skb->data,  /* Data */
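
To make the truncation concrete, here is a minimal user-space sketch (not the capidrv code). The helper names truncate_to_u32() and round_trips() are made up for illustration, and a uint64_t stands in for a 64-bit kernel address:

```c
#include <stdint.h>

/* Hypothetical helper: what a (u32) cast does to a 64-bit address value. */
static uint32_t truncate_to_u32(uint64_t addr)
{
	return (uint32_t)addr;	/* keeps only the low 32 bits */
}

/* Returns 1 if the address survives a round trip through 32 bits. */
static int round_trips(uint64_t addr)
{
	return (uint64_t)truncate_to_u32(addr) == addr;
}
```

An address below 4 GB round-trips; anything with high bits set does not, which is the case the BUILD_BUG_ON() in the patch is meant to flag at compile time.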


Re: [Devel] [PATCH 1/1] Dynamically allocate the loopback device

2007-08-24 Thread Stephen Hemminger
On Fri, 24 Aug 2007 19:55:47 +0400
"Denis V. Lunev" <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
> > From: Daniel Lezcano <[EMAIL PROTECTED]>
> > 
> > Doing this makes loopback.c a better example of how to do a
> > simple network device, and it removes the special case
> > single static allocation of a struct net_device, hopefully
> > making maintenance easier.
> > 
> > Applies against net-2.6.24
> > 
> > Tested on i386, x86_64
> > Compiled on ia64, sparc
> 
> I think that a small note, that initialization order is changed will be
> good to record. After this, loopback MUST be allocated before any other
> networking subsystem initialization. And this is an important change.
> 
> Regards,
> Den

Yes, this code would break when other drivers are directly linked
in. 


Re: [3/4] 2.6.23-rc3: known regressions v3

2007-08-24 Thread Stephen Hemminger
O
> Subject : New wake ups from sky2
> References  : http://lkml.org/lkml/2007/7/20/386
> Last known good : ?
> Submitter   : Thomas Meyer <[EMAIL PROTECTED]>
> Caused-By   : Stephen Hemminger <[EMAIL PROTECTED]>
>   commit eb35cf60e462491249166182e3e755d3d5d91a28
> Handled-By  : Stephen Hemminger <[EMAIL PROTECTED]>
> Status  : unknown
> 
>

Fix posted to netdev (sky2 1.17 series), but Jeff hasn't 
applied it.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Shirley Ma
> Just to be clear, in the previous email I posted on this thread, I
> described a worst-case network ping-pong test case (send a packet, wait
> for reply), and found out that a deferred interrupt scheme just damaged
> the performance of the test case. 

When splitting the rx and tx handlers, I found some performance gain by 
using a deferred interrupt scheme for tx but not rx in the IPoIB driver.

Shirley


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Linus Torvalds


On Fri, 24 Aug 2007, Denys Vlasenko wrote:
> 
> So you are ok with compiler propagating n1 to n2 here:
> 
> n1 += atomic_read(x);
> other_variable++;
> n2 += atomic_read(x);
> 
> without accessing x second time. What's the point? Any sane coder
> will say that explicitly anyway:

No.

This is a common mistake, and it's total crap.

Any "sane coder" will often use inline functions, macros, etc helpers to 
do certain abstract things. Those things may contain "atomic_read()" 
calls.

The biggest reason for compilers doing CSE is exactly the fact that many 
opportunities for CSE simply *are*not*visible* on a source code level. 

That is as true of things like atomic_read() as it is of things like shared 
offsets inside structure member accesses. No difference what-so-ever.

Yes, we have, traditionally, tried to make it *easy* for the compiler to 
generate good code. So when we can, and when we look at performance for 
some really hot path, we *will* write the source code so that the compiler 
doesn't even have the option to screw it up, and that includes things like 
doing CSE at a source code level so that we don't see the compiler 
re-doing accesses unnecessarily.

And I'm not saying we shouldn't do that. But "performance" is not an 
either-or kind of situation, and we should:

 - spend the time at a source code level: make it reasonably easy for the 
   compiler to generate good code, and use the right algorithms at a 
   higher level (and order structures etc so that they have good cache 
   behaviour).

 - .. *and* expect the compiler to handle the cases we didn't do by hand
   pretty well anyway. In particular, quite often, abstraction levels at a 
   software level mean that we give compilers "stupid" code, because some 
   function may have a certain high-level abstraction rule, but then on a 
   particular architecture it's actually a no-op, and the compiler should 
   get to "untangle" our stupid code and generate good end results.

 - .. *and* expect the hardware to be sane and do a good job even when the 
   compiler didn't generate perfect code or there were unlucky cache miss
   patterns etc.

and if we do all of that, we'll get good performance. But you really do 
want all three levels. It's not enough to be good at any one level (or 
even any two).
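
A rough user-space sketch of the "CSE is not visible at the source level" point follows. All names here (struct counter, is_busy(), is_shared() and so on) are made up for illustration; this is not kernel code. Each helper hides one read, so the repeated load only shows up after inlining:

```c
struct counter { int value; };

/* Plain read: the compiler may merge repeated loads (CSE). */
static inline int plain_read(const struct counter *c)
{
	return c->value;
}

/* Volatile read: every call must perform its own load. */
static inline int forced_read(const struct counter *c)
{
	return *(const volatile int *)&c->value;
}

/* Two helpers that each hide a read; the duplication is invisible here. */
static inline int is_busy(const struct counter *c)
{
	return plain_read(c) > 0;
}

static inline int is_shared(const struct counter *c)
{
	return plain_read(c) > 1;
}

static int classify(const struct counter *c)
{
	/* After inlining there are two loads of c->value; one may be reused. */
	return is_busy(c) + is_shared(c);
}

static int classify_forced(const struct counter *c)
{
	/* Two volatile loads; the compiler may not drop either of them. */
	return (forced_read(c) > 0) + (forced_read(c) > 1);
}
```

Both variants return the same value in a single-threaded run; the difference is only in how many loads the compiler is allowed to emit.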

Linus


Re: [2/2] 2.6.23-rc3: known regressions with patches v3

2007-08-24 Thread Michal Piotrowski
Hi all,

Here is a list of some known regressions in 2.6.23-rc3
with patches available.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name               Regressions fixed since 21-Jun-2007
Adrian Bunk        9
Andi Kleen         5
Linus Torvalds     5
Andrew Morton      4
Al Viro            3
Alan Stern         3
Cornelia Huck      3
Jens Axboe         3
Tejun Heo          3



MTD

Subject : error: implicit declaration of function 'cfi_interleave'
References  : http://lkml.org/lkml/2007/8/6/272
Last known good : ?
Submitter   : Ingo Molnar <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : David Woodhouse <[EMAIL PROTECTED]>
Patch   : http://lkml.org/lkml/2007/8/9/586
Status  : patch available



Networking

Subject : BUG: when using 'brctl stp'
References  : http://lkml.org/lkml/2007/8/10/441
Last known good : 2.6.23-rc1
Submitter   : Daniel K. <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : Stephen Hemminger <[EMAIL PROTECTED]>
Status  : fix applied by David Miller

Subject : sky2 boot crash in sky2_mac_intr
References  : http://lkml.org/lkml/2007/7/24/91
Last known good : ?
Submitter   : Florian Lohoff <[EMAIL PROTECTED]>
Caused-By   : 
Handled-By  : Stephen Hemminger <[EMAIL PROTECTED]>
Patch   : http://marc.info/?l=linux-netdev&m=118651402523966&w=2
Status  : patch available



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: [3/4] 2.6.23-rc3: known regressions v3

2007-08-24 Thread Michal Piotrowski
Hi all,

Here is a list of some known regressions in 2.6.23-rc3.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name               Regressions fixed since 21-Jun-2007
Adrian Bunk        9
Andi Kleen         5
Linus Torvalds     5
Andrew Morton      4
Al Viro            3
Alan Stern         3
Cornelia Huck      3
Jens Axboe         3
Tejun Heo          3



Networking

Subject : NETDEV WATCHDOG: eth0: transmit timed out
References  : http://lkml.org/lkml/2007/8/13/737
Last known good : ?
Submitter   : Karl Meyer <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : Francois Romieu <[EMAIL PROTECTED]>
Status  : problem is being debugged

Subject : Weird network problems with 2.6.23-rc2
References  : http://lkml.org/lkml/2007/8/11/40
Last known good : ?
Submitter   : Shish <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : ?
Status  : unknown

Subject : New wake ups from sky2
References  : http://lkml.org/lkml/2007/7/20/386
Last known good : ?
Submitter   : Thomas Meyer <[EMAIL PROTECTED]>
Caused-By   : Stephen Hemminger <[EMAIL PROTECTED]>
  commit eb35cf60e462491249166182e3e755d3d5d91a28
Handled-By  : Stephen Hemminger <[EMAIL PROTECTED]>
Status  : unknown



Power management

Subject : 2.6.23-rc2 swsusp, suddenly increased uptime
References  : http://lkml.org/lkml/2007/8/12/249
Last known good : ?
Submitter   : Thomas Voegtle <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : Rafael J. Wysocki <[EMAIL PROTECTED]>
Status  : problem is being debugged

Subject : resume from ram much slower
References  : http://lkml.org/lkml/2007/8/10/275
Last known good : 2.6.23-rc1 ?
Submitter   : Arkadiusz Miskiewicz <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : Rafael J. Wysocki <[EMAIL PROTECTED]>
Status  : problem is being debugged



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Linus Torvalds


On Fri, 24 Aug 2007, Denys Vlasenko wrote:
>
> > No, you don't use "x.counter++". But you *do* use
> >
> > if (atomic_read(&x) <= 1)
> >
> > and loading into a register is stupid and pointless, when you could just
> > do it as a regular memory-operand to the cmp instruction.
> 
> It doesn't mean that (volatile int*) cast is bad, it means that current gcc
> is bad (or "not good enough"). IOW: instead of avoiding volatile cast,
> it's better to fix the compiler.

I would agree that fixing the compiler in this case would be a good thing, 
even quite regardless of any "atomic_read()" discussion.

I just have a strong suspicion that "volatile" performance is so low down 
the list of any C compiler person's interest, that it's never going to 
happen. And quite frankly, I cannot blame the gcc guys for it.

That's especially as "volatile" really isn't a very good feature of the C 
language, and is likely to get *less* interesting rather than more (as 
user space starts to be more and more threaded, "volatile" gets less and 
less useful).

[ Ie, currently, I think you can validly use "volatile" in a "sigatomic_t" 
  kind of way, where there is a single thread, but with asynchronous 
  events. In that kind of situation, I think it's probably useful. But 
  once you get multiple threads, it gets pointless.

  Sure: you could use "volatile" together with something like Dekker's or 
  Peterson's algorithm that doesn't depend on cache coherency (that's 
  basically what the C "volatile" keyword approximates: not atomic 
  accesses, but *uncached* accesses!) But let's face it, that's way past 
  insane. ]
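
The single-thread-plus-asynchronous-event case can be sketched in portable C roughly as below. This is only an illustration of the "sig_atomic_t" style of use; the names got_signal, on_signal() and signal_flag_demo() are made up, and SIGINT is raised synchronously just to keep the example self-contained:

```c
#include <signal.h>

/* Written by the handler, read by the main flow of the same thread. */
static volatile sig_atomic_t got_signal;

static void on_signal(int signo)
{
	(void)signo;
	got_signal = 1;
}

/* Installs the handler, raises SIGINT in this thread, and reports the flag. */
static int signal_flag_demo(void)
{
	got_signal = 0;
	if (signal(SIGINT, on_signal) == SIG_ERR)
		return -1;
	raise(SIGINT);
	while (!got_signal)
		;	/* volatile forces a fresh load on each iteration */
	return (int)got_signal;
}
```

Nothing here makes the flag safe to share between threads; it only keeps the compiler from caching the value across the handler.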

So I wouldn't expect "volatile" to ever really generate better code. It 
might happen as a side effect of other improvements (eg, I might hope that 
the SSA work would eventually lead to gcc having a much better defined 
model of valid optimizations, and maybe better code generation for 
volatile accesses falls out cleanly from that), but in the end, it's such 
an ugly special case in C, and so seldom used, that I wouldn't depend on 
it.

> Linus, in all honesty gcc has many more cases of suboptimal code,
> case of "volatile" is just one of many.

Well, the thing is, quite often, many of those "suboptimal code" 
generations fall into two distinct classes:

 - complex C code. I can't really blame the compiler too much for this. 
   Some things are *hard* to optimize, and for various scalability 
   reasons, you often end up having limits in the compiler where it 
   doesn't even _try_ doing certain optimizations if you have excessive 
   complexity.

 - bad register allocation. Register allocation really is hard, and 
   sometimes gcc just does the "obviously wrong" thing, and you end up 
   having totally unnecessary spills.

> Off the top of my head:

Yes, "unsigned long long" with x86 has always generated atrocious code. In 
fact, I would say that historically it was really *really* bad. These 
days, gcc actually does a pretty good job, but I'm not surprised that it's 
still quite possible to find cases where it did some optimization (in this 
case, apparently noticing that "shift by >= 32 bits" causes the low 
register to be pointless) and then missed *another* optimization (better 
register use) because that optimization had been done *before* the first 
optimization was done.
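
As a hypothetical illustration of the "shift by >= 32 bits" case (the code from the quoted mail is not reproduced here, and shr_ge32() is a made-up name), this is what the shift looks like when written out on 32-bit halves. The low half of the input never contributes to the result, which is why a register holding it is wasted:

```c
#include <assert.h>
#include <stdint.h>

/* x >> n for 32 <= n < 64, computed from the high 32-bit half only. */
static uint64_t shr_ge32(uint64_t x, unsigned n)
{
	uint32_t hi = (uint32_t)(x >> 32);	/* the only half that matters */

	assert(n >= 32 && n < 64);
	return (uint64_t)(hi >> (n - 32));
}
```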

That's a *classic* example of compiler code generation issues, and quite 
frankly, I think that's very different from the issue of "volatile".

Quite frankly, I'd like there to be more competition in the open source 
compiler game, and that might cause some upheavals, but on the whole, gcc 
actually does a pretty damn good job. 

Linus


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread James Chapman

Stephen Hemminger wrote:

> On Fri, 24 Aug 2007 17:47:15 +0200
> Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
> > > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > > > ...
> > > > 3) On modern systems the incoming packets are processed very fast. Especially
> > > >    on SMP systems when we use multiple queues we process only a few packets
> > > >    per napi poll cycle. So NAPI does not work very well here and the interrupt
> > > >    rate is still high. What we need would be some sort of timer polling mode
> > > >    which will schedule a device after a certain amount of time for high load
> > > >    situations. With high precision timers this could work well. Current
> > > >    usual timers are too slow. A finer granularity would be needed to keep the
> > > >    latency down (and queue length moderate).
> > >
> > > We found the same on ia64-sn systems with tg3 a couple of years
> > > ago. Using simple interrupt coalescing ("don't interrupt until
> > > you've received N packets or M usecs have elapsed") worked
> > > reasonably well in practice. If your h/w supports that (and I'd
> > > guess it does, since it's such a simple thing), you might try
> > > it.
> >
> > I don't see how this should work. Our latest machines are fast enough that they
> > simply empty the queue during the first poll iteration (in most cases).
> > Even if you wait until X packets have been received, it does not help for
> > the next poll cycle. The average number of packets we process per poll queue
> > is low. So a timer would be preferable that periodically polls the
> > queue, without the need of generating a HW interrupt. This would allow us
> > to wait until a reasonable amount of packets have been received in the meantime
> > to keep the poll overhead low. This would also be useful in combination
> > with LRO.
>
> You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have expired.


Does hardware interrupt mitigation really interact well with NAPI? In my 
experience, holding off interrupts for X packets or Y usecs does more 
harm than good; such hardware features are useful only when the OS has 
no NAPI-like mechanism.


When tuning NAPI drivers for packets/sec performance (which is a good 
indicator of driver performance), I make sure that the driver stays in 
NAPI polled mode while it has any rx or tx work to do. If the CPU is 
fast enough that all work is always completed on each poll, I have the 
driver stay in polled mode until dev->poll() is called N times with no 
work being done. This keeps interrupts disabled for reasonable traffic 
levels, while minimizing packet processing latency. No need for hardware 
interrupt mitigation.
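
The "stay in polled mode until N polls find no work" rule can be modelled in a few lines. This is a simplified, hypothetical model rather than code from any real driver; struct poll_state, poll_once() and the field names are assumptions made for the sketch:

```c
/* Hypothetical model, not a real driver: idle_limit plays the role of N. */
struct poll_state {
	int idle_polls;	/* consecutive polls that found no work */
	int idle_limit;	/* N: leave polled mode after this many idle polls */
	int polling;	/* 1 while interrupts stay disabled */
};

/*
 * One poll pass; work_done is the number of packets handled in this pass.
 * Returns 1 if the driver stays in polled mode, 0 if it would re-enable
 * interrupts.
 */
static int poll_once(struct poll_state *s, int work_done)
{
	if (work_done > 0)
		s->idle_polls = 0;	/* any work resets the idle count */
	else if (++s->idle_polls >= s->idle_limit)
		s->polling = 0;		/* N idle polls in a row: give up */
	return s->polling;
}
```

With idle_limit set to 3, a burst followed by two empty polls keeps the device polled, and only a third consecutive empty poll hands control back to interrupts.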



> The parameters for controlling it are already in ethtool, the issue is finding a good
> default set of values for a wide range of applications and architectures. Maybe some
> heuristic based on processor speed would be a good starting point. The dynamic irq
> moderation stuff is not widely used because it is too hard to get right.


I agree. It would be nice to find a way for the typical user to derive 
best values for these knobs for his/her particular system. Perhaps a 
tool using pktgen and network device phy internal loopback could be 
developed?


--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Christoph Lameter
On Fri, 24 Aug 2007, Denys Vlasenko wrote:

> On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> > Satyam Sharma writes:
> > In the kernel we use atomic variables in precisely those situations
> > where a variable is potentially accessed concurrently by multiple
> > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > timely fashion.  That is what they are for.  Therefore the compiler
> > must not cache values of atomic variables in registers; each
> > atomic_read must result in a load and each atomic_set must result in a
> > store.  Anything else will just lead to subtle bugs.
> 
> Amen.

A "timely" fashion? One cannot rely on something like that when coding. 
The visibility of updates is ensured by barriers and not by some fuzzy 
notion of "timeliness".
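
A user-space analogue of "visibility is ensured by barriers" might look like the sketch below. It uses C11 release/acquire operations as a stand-in for kernel barriers such as smp_wmb()/smp_rmb(); the names payload, ready, publish() and try_consume() are made up for illustration:

```c
#include <stdatomic.h>

static int payload;		/* ordinary data */
static atomic_int ready;	/* flag that publishes the data */

static void publish(int value)
{
	payload = value;
	/* release: the payload store is ordered before the flag store */
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Returns 1 and fills *out if the flag was seen, 0 otherwise. */
static int try_consume(int *out)
{
	/* acquire: if the flag is seen, the payload store is visible too */
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return 0;
	*out = payload;
	return 1;
}
```

The ordering guarantee comes from the release/acquire pair, not from how soon the reader happens to look.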


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Rick Jones

> Just to be clear, in the previous email I posted on this thread, I
> described a worst-case network ping-pong test case (send a packet, wait
> for reply), and found out that a deferred interrupt scheme just damaged
> the performance of the test case.  Since the folks who came up with the
> test case were adamant, I turned off the deferred interrupts.
> While deferred interrupts are an "obvious" solution, I decided that
> they weren't a good solution. (And I have no other solution to offer).


Sounds exactly like the default netperf TCP_RR test and any number of other 
benchmarks.  The "send a request, wait for reply, send next request, etc etc 
etc" is a rather common application behaviour after all.


rick jones


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Christoph Lameter
On Fri, 24 Aug 2007, Satyam Sharma wrote:

> But if people do seem to have a mixed / confused notion of atomicity
> and barriers, and if there's consensus, then as I'd said earlier, I
> have no issues in going with the consensus (eg. having API variants).
> Linus would be more difficult to convince, however, I suspect :-)

The confusion may be the result of us having barrier semantics in 
atomic_read. If we take that out then we may avoid future confusions.



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
> 
> You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have expired.

Just to be clear, in the previous email I posted on this thread, I
described a worst-case network ping-pong test case (send a packet, wait
for reply), and found out that a deferred interrupt scheme just damaged
the performance of the test case.  Since the folks who came up with the
test case were adamant, I turned off the deferred interrupts.  
While deferred interrupts are an "obvious" solution, I decided that 
they weren't a good solution. (And I have no other solution to offer).
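
The "X packets or Y usecs" rule, and why it hurts a one-packet-at-a-time exchange, can be sketched as a small model. This is a hypothetical illustration, not any device's actual logic; struct coalesce, should_interrupt() and lone_packet_delay() are names invented here:

```c
/* Hypothetical model of "interrupt after max_pkts packets or max_usecs". */
struct coalesce {
	unsigned max_pkts;	/* X */
	unsigned max_usecs;	/* Y */
};

/* pending: packets waiting; waited_usecs: time since the first one arrived. */
static int should_interrupt(const struct coalesce *c,
			    unsigned pending, unsigned waited_usecs)
{
	if (pending == 0)
		return 0;
	return pending >= c->max_pkts || waited_usecs >= c->max_usecs;
}

/* Extra delay seen by a lone packet, as in a ping-pong exchange. */
static unsigned lone_packet_delay(const struct coalesce *c)
{
	/* A second packet never arrives, so only the timer can fire. */
	return c->max_pkts > 1 ? c->max_usecs : 0;
}
```

In the ping-pong case every request waits out the full Y usecs before the interrupt fires, which is the latency cost described above.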

--linas



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Stevens
Stephen Hemminger <[EMAIL PROTECTED]> wrote on 08/24/2007 08:52:03 AM:

> 
> You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have expired.

For generic hardware that doesn't support it, couldn't you use an estimator
and adjust the timer dynamically in software based on sampled values? Switch
to per-packet interrupts when the receive rate is low...
Actually, that's how I thought NAPI worked before I found out otherwise (ie,
before I looked :-)).

The hardware-accelerated one is essentially siloing as done by ancient serial
devices on UNIX systems. If you had a tunable for a target count, and an estimator
for the time interval, then switch to per-packet when the estimator exceeds a tunable
max threshold (and also, I suppose, if you near overflowing the ring on the min
timer granularity), you get almost all of it, right?
Problem is if it increases rapidly, you may drop packets before you notice
that the ring is full in the current estimated interval.
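
One way to read the estimator idea is sketched below. It is only a guess at the shape of such a scheme: the struct, the 1/8 smoothing weight and the function names are all assumptions, and the ring-overflow check mentioned above is left out:

```c
#include <stdint.h>

struct rx_estimator {
	uint32_t avg_gap_us;	/* smoothed inter-packet gap, microseconds */
	uint32_t target_count;	/* packets wanted per poll (tunable) */
	uint32_t max_wait_us;	/* above this, use per-packet interrupts */
};

/* Fold one sampled inter-packet gap into the average (weight 1/8). */
static void estimator_sample(struct rx_estimator *e, uint32_t gap_us)
{
	e->avg_gap_us = (uint32_t)(((uint64_t)e->avg_gap_us * 7 + gap_us) / 8);
}

/* Timer interval to collect target_count packets, or 0 for per-packet mode. */
static uint32_t estimator_interval(const struct rx_estimator *e)
{
	uint64_t wait = (uint64_t)e->avg_gap_us * e->target_count;

	return wait > e->max_wait_us ? 0 : (uint32_t)wait;
}
```

When traffic thins out, the estimated interval grows past max_wait_us and the model falls back to per-packet interrupts; it reacts only as fast as the average moves, which is the rapid-increase problem noted above.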

 +-DLS




Re: [Devel] [PATCH 1/1] Dynamically allocate the loopback device

2007-08-24 Thread Daniel Lezcano

Denis V. Lunev wrote:

> [EMAIL PROTECTED] wrote:
> > From: Daniel Lezcano <[EMAIL PROTECTED]>
> >
> > Doing this makes loopback.c a better example of how to do a
> > simple network device, and it removes the special case
> > single static allocation of a struct net_device, hopefully
> > making maintenance easier.
> >
> > Applies against net-2.6.24
> >
> > Tested on i386, x86_64
> > Compiled on ia64, sparc
>
> I think that a small note, that initialization order is changed will be
> good to record. After this, loopback MUST be allocated before any other
> networking subsystem initialization. And this is an important change.
>
> Regards,
> Den



Thanks, Denis, for pointing that out.

-- Daniel



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> 3) On modern systems the incoming packets are processed very fast. Especially
>    on SMP systems when we use multiple queues we process only a few packets
>    per napi poll cycle. So NAPI does not work very well here and the interrupt
>    rate is still high.

I saw this too, on a system that is "modern" but not terribly fast, and
only slightly (2-way) smp. (the spidernet)

I experimented with various solutions, none were terribly exciting.  The
thing that killed all of them was a crazy test case that someone sprung on
me:  They had written a worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.  

If I waited (indefinitely) for a second packet to show up, the test case 
completely stalled (since no second packet would ever arrive).  And if I 
introduced a timer to wait for a second packet, then I just increased 
the latency in the response to the first packet, and this was noticed, 
and folks complained.  

In the end, I just let it be, and let the system work as a busy-beaver, 
with the high interrupt rate. Is this a wise thing to do?  I was
thinking that, if the system is under heavy load, then the interrupt
rate would fall, since (for less pathological network loads) more 
packets would queue up before the poll was serviced.  But I did not
actually measure the interrupt rate under heavy load ... 

--linas


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-24 Thread Rick Jones


> A current hot topic of research is reducing the number of ACK's to make TCP
> work better over asymmetric links like 3G.


Oy.  People running Solaris and HP-UX have been "researching" ACK reductions 
since 1997 if not earlier.


rick jones


RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Luck, Tony
>>  static inline void wait_for_init_deassert(atomic_t *deassert)
>>  {
>> -while (!atomic_read(deassert));
>> +while (!atomic_read(deassert))
>> +cpu_relax();
>>  return;
>>  }
>
> For less-than-brilliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.

Not just P4 ... there are other threaded cpus where it is useful to
let the core know that this is a busy loop so it would be a good thing
to let other threads have priority.

Even on a non-threaded cpu the cpu_relax() could be useful in the
future to hint to the cpu that it could drop into a less
power-hungry state.

But I agree with your main point that the loop without the cpu_relax()
looks like it ought to work because atomic_read() ought to actually
go out and read memory each time around the loop.
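
For reference, a user-space analogue of the patched loop could look like this. It is a sketch under assumptions: relax() is a made-up stand-in for the kernel's cpu_relax() (an x86 "pause" where available, otherwise just a compiler barrier), wait_for_flag() is a hypothetical name, and the spin limit exists only so the example always terminates:

```c
#include <stdatomic.h>

/* Stand-in for cpu_relax(); the choices below are user-space assumptions. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define relax() __asm__ __volatile__("pause" ::: "memory")
#elif defined(__GNUC__)
#define relax() __asm__ __volatile__("" ::: "memory")
#else
#define relax() ((void)0)
#endif

/* Spin until *flag is nonzero or max_spins runs out; 1 if the flag was seen. */
static int wait_for_flag(atomic_int *flag, long max_spins)
{
	while (!atomic_load_explicit(flag, memory_order_acquire)) {
		if (max_spins-- <= 0)
			return 0;
		relax();	/* tell the core this is a busy-wait */
	}
	return 1;
}
```

Here the atomic load re-reads memory on every pass by definition, so the hint is purely about being polite to the core and its sibling threads.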

-Tony


Re: Problem with implementation of TCP_DEFER_ACCEPT?

2007-08-24 Thread John Heffner

TJ wrote:

> Right now Juniper are claiming the issue that brought this to the
> surface (the bug linked to in my original post) is a problem with the
> implementation of TCP_DEFER_ACCEPT.
>
> My position so far is that the Juniper DX OS is not following the HTTP
> standard because it doesn't send a request with the connection, and as I
> read the end of section 1.4 of RFC2616, an HTTP connection should be
> accompanied by a request.
>
> Can anyone confirm my interpretation or provide references to firm it
> up, or refute it?


You can think of TCP_DEFER_ACCEPT as an implicit application close() 
after a certain timeout, when not receiving a request.  All HTTP servers 
do this anyway (though I think technically they're supposed to send a 
408 Request Timeout error; it seems many do not).  It's a very valid 
question for Juniper as to why their box is failing to fill requests 
when its back-end connection has gone away, instead of re-establishing 
the connection and filling the request.
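
For readers who have not used the option, setting it looks roughly like this on Linux. The wrapper name enable_defer_accept() is made up for the sketch; the setsockopt() call and the TCP_DEFER_ACCEPT option itself are the real interface, with the value given in seconds:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Linux-specific. Asks the kernel not to wake the listener in accept()
 * until data has arrived on the new connection, giving up after roughly
 * timeout_secs. Returns 0 on success, -1 on error (errno is set).
 */
static int enable_defer_accept(int listen_fd, int timeout_secs)
{
	return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
			  &timeout_secs, sizeof(timeout_secs));
}
```

A server would typically call this on its listening socket before listen(); a client that connects and then stays silent never reaches the application.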


  -John


Re: [PATCH 14/30] net: Kill some unneeded allocation return value casts in libertas

2007-08-24 Thread Dan Williams
On Fri, 2007-08-24 at 02:03 +0200, Jesper Juhl wrote:
> kmalloc() and friends return void*, no need to cast it.

Applied to libertas-2.6 'for-linville' branch, thanks.

Dan

> Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
> ---
>  drivers/net/wireless/libertas/debugfs.c |2 +-
>  drivers/net/wireless/libertas/ethtool.c |3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/wireless/libertas/debugfs.c b/drivers/net/wireless/libertas/debugfs.c
> index 715cbda..6ade63e 100644
> --- a/drivers/net/wireless/libertas/debugfs.c
> +++ b/drivers/net/wireless/libertas/debugfs.c
> @@ -1839,7 +1839,7 @@ static ssize_t wlan_debugfs_write(struct file *f, const char __user *buf,
>   char *p2;
>   struct debug_data *d = (struct debug_data *)f->private_data;
>  
> - pdata = (char *)kmalloc(cnt, GFP_KERNEL);
> + pdata = kmalloc(cnt, GFP_KERNEL);
>   if (pdata == NULL)
>   return 0;
>  
> diff --git a/drivers/net/wireless/libertas/ethtool.c b/drivers/net/wireless/libertas/ethtool.c
> index 96f1974..7dad493 100644
> --- a/drivers/net/wireless/libertas/ethtool.c
> +++ b/drivers/net/wireless/libertas/ethtool.c
> @@ -60,8 +60,7 @@ static int libertas_ethtool_get_eeprom(struct net_device *dev,
>  
>  //  mutex_lock(&priv->mutex);
>  
> - adapter->prdeeprom =
> - (char *)kmalloc(eeprom->len+sizeof(regctrl), GFP_KERNEL);
> + adapter->prdeeprom = kmalloc(eeprom->len+sizeof(regctrl), GFP_KERNEL);
>   if (!adapter->prdeeprom)
>   return -ENOMEM;
>   memcpy(adapter->prdeeprom, &regctrl, sizeof(regctrl));
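
The reason the casts are unnecessary is that C converts void * to any object pointer type implicitly. A small user-space sketch, with malloc() standing in for kmalloc() and copy_bytes() as a made-up helper name:

```c
#include <stdlib.h>
#include <string.h>

/* malloc() stands in for kmalloc() here; both return void *. */
static char *copy_bytes(const char *src, size_t len)
{
	char *dst = malloc(len);	/* no (char *) cast needed in C */

	if (!dst)
		return NULL;
	memcpy(dst, src, len);
	return dst;
}
```

The cast would only be required if the file were compiled as C++, which kernel code is not.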



Re: [Devel] [PATCH 1/1] Dynamically allocate the loopback device

2007-08-24 Thread Denis V. Lunev
[EMAIL PROTECTED] wrote:
> From: Daniel Lezcano <[EMAIL PROTECTED]>
> 
> Doing this makes loopback.c a better example of how to do a
> simple network device, and it removes the special case
> single static allocation of a struct net_device, hopefully
> making maintenance easier.
> 
> Applies against net-2.6.24
> 
> Tested on i386, x86_64
> Compiled on ia64, sparc

I think that a small note, that initialization order is changed will be
good to record. After this, loopback MUST be allocated before any other
networking subsystem initialization. And this is an important change.

Regards,
Den


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Stephen Hemminger
On Fri, 24 Aug 2007 17:47:15 +0200
Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > > ...
> > > 3) On modern systems the incoming packets are processed very fast.
> > >    Especially on SMP systems when we use multiple queues we process
> > >    only a few packets per napi poll cycle. So NAPI does not work very
> > >    well here and the interrupt rate is still high. What we need would
> > >    be some sort of timer polling mode which will schedule a device
> > >    after a certain amount of time for high load situations. With high
> > >    precision timers this could work well. Current usual timers are
> > >    too slow. A finer granularity would be needed to keep the latency
> > >    down (and queue length moderate).
> > > 
> > 
> > We found the same on ia64-sn systems with tg3 a couple of years 
> > ago. Using simple interrupt coalescing ("don't interrupt until 
> > you've received N packets or M usecs have elapsed") worked 
> > reasonably well in practice. If your h/w supports that (and I'd 
> > guess it does, since it's such a simple thing), you might try 
> > it.
> > 
> 
> I don't see how this should work. Our latest machines are fast enough that
> they simply empty the queue during the first poll iteration (in most cases).
> Even if you wait until X packets have been received, it does not help for
> the next poll cycle. The average number of packets we process per poll queue
> is low. So a timer would be preferable that periodically polls the
> queue, without the need of generating a HW interrupt. This would allow us
> to wait until a reasonable amount of packets have been received in the
> meantime to keep the poll overhead low. This would also be useful in
> combination with LRO.
> 

You need hardware support for deferred interrupts. Most devices have it
(e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic thing
you want done by the stack, you want the hardware to hold off interrupts until
X packets or Y usecs have expired.

The parameters for controlling it are already in ethtool, the issue is finding
a good default set of values for a wide range of applications and
architectures. Maybe some heuristic based on processor speed would be a good
starting point. The dynamic irq moderation stuff is not widely used because it
is too hard to get right.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Hi,

On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > ...
> > 3) On modern systems the incoming packets are processed very fast.
> >    Especially on SMP systems when we use multiple queues we process
> >    only a few packets per napi poll cycle. So NAPI does not work very
> >    well here and the interrupt rate is still high. What we need would
> >    be some sort of timer polling mode which will schedule a device
> >    after a certain amount of time for high load situations. With high
> >    precision timers this could work well. Current usual timers are
> >    too slow. A finer granularity would be needed to keep the latency
> >    down (and queue length moderate).
> > 
> 
> We found the same on ia64-sn systems with tg3 a couple of years 
> ago. Using simple interrupt coalescing ("don't interrupt until 
> you've received N packets or M usecs have elapsed") worked 
> reasonably well in practice. If your h/w supports that (and I'd 
> guess it does, since it's such a simple thing), you might try 
> it.
> 

I don't see how this should work. Our latest machines are fast enough that they
simply empty the queue during the first poll iteration (in most cases).
Even if you wait until X packets have been received, it does not help for
the next poll cycle. The average number of packets we process per poll queue
is low. So a timer would be preferable that periodically polls the 
queue, without the need of generating a HW interrupt. This would allow us
to wait until a reasonable amount of packets have been received in the meantime
to keep the poll overhead low. This would also be useful in combination
with LRO.
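
The trade-off behind the timer idea can be put into numbers with a small back-of-the-envelope helper. This is only a sketch: the function name packets_per_poll and the sample rates below are assumptions, not measurements from any driver.

```c
#include <stdint.h>

/*
 * Expected number of packets waiting when a timer-driven poll fires,
 * given an arrival rate in packets per second and a poll interval in
 * microseconds. Both inputs are illustrative assumptions.
 */
static uint64_t packets_per_poll(uint64_t packets_per_sec,
                                 uint64_t interval_usecs)
{
    return packets_per_sec * interval_usecs / 1000000u;
}
```

For example, at an assumed 1,000,000 packets per second a 50 usec timer would find about 50 packets per poll. A longer interval amortizes the poll overhead over more packets, at the cost of added latency and a longer queue.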

Regards,
Jan-Bernd


[PATCH 1/1] Dynamically allocate the loopback device

2007-08-24 Thread dlezcano
From: Daniel Lezcano <[EMAIL PROTECTED]>

Doing this makes loopback.c a better example of how to do a
simple network device, and it removes the special case
single static allocation of a struct net_device, hopefully
making maintenance easier.

Applies against net-2.6.24

Tested on i386, x86_64
Compiled on ia64, sparc

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
Acked-By: Kirill Korotaev <[EMAIL PROTECTED]>
Acked-by: Benjamin Thery <[EMAIL PROTECTED]>
---
 drivers/net/loopback.c   |   63 +++---
 include/linux/netdevice.h|2 +-
 net/core/dst.c   |8 ++--
 net/decnet/dn_dev.c  |4 +-
 net/decnet/dn_route.c|   14 
 net/ipv4/devinet.c   |6 ++--
 net/ipv4/ipconfig.c  |6 ++--
 net/ipv4/ipvs/ip_vs_core.c   |2 +-
 net/ipv4/route.c |   18 +-
 net/ipv4/xfrm4_policy.c  |2 +-
 net/ipv6/addrconf.c  |   15 +---
 net/ipv6/ip6_input.c |2 +-
 net/ipv6/netfilter/ip6t_REJECT.c |2 +-
 net/ipv6/route.c |   15 +++-
 net/ipv6/xfrm6_policy.c  |2 +-
 net/xfrm/xfrm_policy.c   |4 +-
 16 files changed, 89 insertions(+), 76 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 5106c23..3642aff 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -199,44 +199,57 @@ static const struct ethtool_ops loopback_ethtool_ops = {
.get_rx_csum= always_on,
 };
 
-/*
- * The loopback device is special. There is only one instance and
- * it is statically allocated. Don't do this for other devices.
- */
-struct net_device loopback_dev = {
-   .name   = "lo",
-   .get_stats  = &get_stats,
-   .mtu= (16 * 1024) + 20 + 20 + 12,
-   .hard_start_xmit= loopback_xmit,
-   .hard_header= eth_header,
-   .hard_header_cache  = eth_header_cache,
-   .header_cache_update= eth_header_cache_update,
-   .hard_header_len= ETH_HLEN, /* 14   */
-   .addr_len   = ETH_ALEN, /* 6*/
-   .tx_queue_len   = 0,
-   .type   = ARPHRD_LOOPBACK,  /* 0x0001*/
-   .rebuild_header = eth_rebuild_header,
-   .flags  = IFF_LOOPBACK,
-   .features   = NETIF_F_SG | NETIF_F_FRAGLIST
+static void loopback_setup(struct net_device *dev)
+{
+   dev->get_stats  = &get_stats;
+   dev->mtu= (16 * 1024) + 20 + 20 + 12;
+   dev->hard_start_xmit= loopback_xmit;
+   dev->hard_header= eth_header;
+   dev->hard_header_cache  = eth_header_cache;
+   dev->header_cache_update = eth_header_cache_update;
+   dev->hard_header_len= ETH_HLEN; /* 14   */
+   dev->addr_len   = ETH_ALEN; /* 6*/
+   dev->tx_queue_len   = 0;
+   dev->type   = ARPHRD_LOOPBACK;  /* 0x0001*/
+   dev->rebuild_header = eth_rebuild_header;
+   dev->flags  = IFF_LOOPBACK;
+   dev->features   = NETIF_F_SG | NETIF_F_FRAGLIST
 #ifdef LOOPBACK_TSO
  | NETIF_F_TSO
 #endif
  | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA
- | NETIF_F_LLTX,
-   .ethtool_ops= &loopback_ethtool_ops,
-};
+ | NETIF_F_LLTX;
+   dev->ethtool_ops= &loopback_ethtool_ops;
+}
 
 /* Setup and register the loopback device. */
 static int __init loopback_init(void)
 {
-   int err = register_netdev(&loopback_dev);
+   struct net_device *dev;
+   int err;
+   
+   err = -ENOMEM;
+   dev = alloc_netdev(0, "lo", loopback_setup);
+   if (!dev)
+   goto out;
+
+   err = register_netdev(dev);
+   if (err)
+   goto out_free_netdev;
 
+   err = 0;
+   loopback_dev = dev;
+
+out:
if (err)
panic("loopback: Failed to register netdevice: %d\n", err);
-
return err;
+out_free_netdev:
+   free_netdev(dev);
+   goto out;
 };
 
-module_init(loopback_init);
+fs_initcall(loopback_init);
 
+struct net_device *loopback_dev;
 EXPORT_SYMBOL(loopback_dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8d12f02..7cd0641 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -680,7 +680,7 @@ struct packet_type {
 #include 
 #include 
 
-extern struct net_device   loopback_dev;   /* The loopback */
+extern struct net_device   *loopback_dev;  /* The loopback */
 extern struct list_head    dev_base_head;  /* All devices */
 extern rwlock_t            dev_base_lock;

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread akepner
On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> ...
> 3) On modern systems the incoming packets are processed very fast.
>    Especially on SMP systems when we use multiple queues we process
>    only a few packets per napi poll cycle. So NAPI does not work very
>    well here and the interrupt rate is still high. What we need would
>    be some sort of timer polling mode which will schedule a device
>    after a certain amount of time for high load situations. With high
>    precision timers this could work well. Current usual timers are
>    too slow. A finer granularity would be needed to keep the latency
>    down (and queue length moderate).
> 

We found the same on ia64-sn systems with tg3 a couple of years 
ago. Using simple interrupt coalescing ("don't interrupt until 
you've received N packets or M usecs have elapsed") worked 
reasonably well in practice. If your h/w supports that (and I'd 
guess it does, since it's such a simple thing), you might try 
it.

-- 
Arthur



Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 13:12, Kenn Humborg wrote:
> > On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> > >  static inline void wait_for_init_deassert(atomic_t *deassert)
> > >  {
> > > - while (!atomic_read(deassert));
> > > + while (!atomic_read(deassert))
> > > + cpu_relax();
> > >   return;
> > >  }
> >
> > For less-than-brilliant people like me, it's totally non-obvious that
> > cpu_relax() is needed for correctness here, not just to make P4 happy.
> >
> > IOW: "atomic_read" name quite unambiguously means "I will read
> > this variable from main memory". Which is not true and creates
> > potential for confusion and bugs.
>
> To me, "atomic_read" means a read which is synchronized with other
> changes to the variable (using the atomic_XXX functions) in such
> a way that I will always only see the "before" or "after"
> state of the variable - never an intermediate state while a
> modification is happening.  It doesn't imply that I have to
> see the "after" state immediately after another thread modifies
> it.

So you are OK with the compiler propagating n1 to n2 here:

n1 += atomic_read(x);
other_variable++;
n2 += atomic_read(x);

without accessing x a second time. What's the point? Any sane coder
would say that explicitly anyway:

tmp = atomic_read(x);
n1 += tmp;
other_variable++;
n2 += tmp;

if only for the sake of code readability. The first snippet
definitely hints that it reads RAM twice, and it's actively *bad*
for code readability when in fact that is not the case!

Locking, compiler and CPU barriers are complicated enough already,
please don't make them even harder to understand.
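
The difference under discussion can be sketched in userspace C as follows. The type model_atomic_t and both read macros are hypothetical stand-ins, not the kernel's definitions. With the plain variant the compiler is allowed to reuse the first loaded value. With the volatile variant every use must emit its own load.

```c
/* Hypothetical userspace stand-in for the kernel's atomic_t. */
typedef struct { int counter; } model_atomic_t;

/* Plain read: the compiler may legally reuse an earlier loaded value. */
#define model_atomic_read_plain(v)    ((v)->counter)

/* Volatile read: every use must emit its own load from memory. */
#define model_atomic_read_volatile(v) (*(volatile int *)&(v)->counter)

/* Mirrors the n1/n2 snippet above, using the volatile variant. */
static int sum_two_reads(model_atomic_t *x, int *other_variable)
{
    int n1 = 0, n2 = 0;

    n1 += model_atomic_read_volatile(x);
    (*other_variable)++;
    n2 += model_atomic_read_volatile(x);  /* a second load, not a reused n1 */
    return n1 + n2;
}
```

In a single-threaded run both variants return the same result. The difference only shows up in the generated code, and in what another CPU's update can make visible between the two reads.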
--
vda


RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Hi,

when I tried to get the eHEA driver working with the new interface,
the following issues came up.

1) The current implementation of netif_rx_schedule, netif_rx_complete
   and net_rx_action has the following problem: netif_rx_schedule
   sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the poll_list.
   net_rx_action checks NAPI_STATE_SCHED and, if it is set, adds the device
   to the poll_list again (as well). netif_rx_complete clears
   NAPI_STATE_SCHED.
   If an interrupt handler calls netif_rx_schedule on CPU 2
   after netif_rx_complete has been called on CPU 1 (and the poll function
   has not returned yet), the NAPI instance will be added twice to the
   poll_list (by netif_rx_schedule and net_rx_action). Problems occur when
   netif_rx_complete is called twice for the device (BUG() is called).

2) If an ethernet chip supports multiple receive queues, the queues are
   currently all processed on the CPU where the interrupt comes in. This
   is because netif_rx_schedule will always add the rx queue to that CPU's
   napi poll_list. The result under heavy pressure is that all queues will
   gather on the weakest CPU (with the highest CPU load) after some time, as
   they will stay there until the entire queue is emptied. On SMP systems
   this behaviour is not desired. It should also work well without interrupt
   pinning.
   It would be nice if it were possible to schedule queues to other CPUs, or
   at least to use interrupts to move the queue to another CPU (not nice,
   as you never know which one you will hit).
   I'm not sure how bad the tradeoff would be.

3) On modern systems the incoming packets are processed very fast. Especially
   on SMP systems when we use multiple queues we process only a few packets
   per napi poll cycle. So NAPI does not work very well here and the interrupt 
   rate is still high. What we need would be some sort of timer polling mode 
   which will schedule a device after a certain amount of time for high load 
   situations. With high precision timers this could work well. Current
   usual timers are too slow. A finer granularity would be needed to keep the
   latency down (and queue length moderate).
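
The interleaving described in item 1 can be replayed with a tiny single-threaded model. This is a sketch that follows the description in this message, not the actual kernel code. The struct model_napi and all model_* function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one NAPI instance; not the kernel structure. */
struct model_napi {
    bool sched;         /* stands in for NAPI_STATE_SCHED */
    int  list_entries;  /* how many times it sits on a poll_list */
};

/* Models netif_rx_schedule(): set the flag once and queue the instance. */
static void model_rx_schedule(struct model_napi *n)
{
    if (!n->sched) {
        n->sched = true;
        n->list_entries++;
    }
}

/* Models netif_rx_complete(): clear the flag and leave the list. */
static void model_rx_complete(struct model_napi *n)
{
    n->sched = false;
    n->list_entries--;
}

/* Models net_rx_action() after poll returns, as described in item 1. */
static void model_rx_action_after_poll(struct model_napi *n)
{
    if (n->sched)
        n->list_entries++;
}

/* Replays the interleaving from item 1 and returns the final list count. */
static int replay_double_add(void)
{
    struct model_napi n = { false, 0 };

    model_rx_schedule(&n);           /* interrupt on CPU 1 */
    model_rx_complete(&n);           /* poll on CPU 1 finishes its work */
    model_rx_schedule(&n);           /* interrupt on CPU 2, poll not returned */
    model_rx_action_after_poll(&n);  /* CPU 1 sees the flag set, adds again */
    return n.list_entries;
}
```

In this model the instance ends up on the list twice, which is the state that later leads to netif_rx_complete being called twice.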

What do you think?

Thanks,
Jan-Bernd


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Satyam Sharma
Hi Denys,


On Fri, 24 Aug 2007, Denys Vlasenko wrote:

> On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> >
> >  static inline void wait_for_init_deassert(atomic_t *deassert)
> >  {
> > -   while (!atomic_read(deassert));
> > +   while (!atomic_read(deassert))
> > +   cpu_relax();
> > return;
> >  }
> 
> For less-than-brilliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.

This thread has been round-and-round with exactly the same discussions
:-) I had proposed a few such variants to make a compiler barrier implicit
in atomic_{read,set} myself, but frankly, at least personally speaking
(now that I know better), I'm not so much in favour of implicit barriers
(compiler, memory or both) in atomic_{read,set}.

This might sound like an about-turn if you read my own postings to Nick
Piggin from a week back, but I do agree with most of his opinions on the
matter now -- separation of barriers from atomic ops is actually good,
beneficial to certain code that knows what it's doing, explicit usage
of barriers stands out more clearly (most people here who deal with it
do know cpu_relax() is an explicit compiler barrier) compared to an
implicit usage in an atomic_read() or such variant ...


> IOW: "atomic_read" name quite unambiguously means "I will read
> this variable from main memory". Which is not true and creates
> potential for confusion and bugs.

I'd have to disagree here -- atomic ops are all about _atomicity_ of
memory accesses, not _making_ them happen (or visible to other CPUs)
_then and there_ itself. The latter are the job of barriers.

The behaviour (and expectations) are quite comprehensively covered in
atomic_ops.txt -- let alone atomic_{read,set}, even atomic_{inc,dec}
are permitted by archs' implementations to _not_ have any memory
barriers, for that matter. [It is unrelated that on x86 making them
SMP-safe requires the use of the LOCK prefix that also happens to be
an implicit memory barrier.]

An argument was also made about consistency of atomic_{read,set} w.r.t.
the other atomic ops -- but clearly, they are all already consistent!
All of them are atomic :-) The fact that atomic_{read,set} do _not_
require any inline asm or LOCK prefix whereas the others do, has to do
with the fact that unlike all others, atomic_{read,set} are not RMW ops
and hence guaranteed to be atomic just as they are in plain & simple C.
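
A userspace sketch of that distinction, assuming GCC or Clang: the type sketch_atomic_t and the sketch_* names are hypothetical, and the __sync_fetch_and_add builtin stands in for the inline asm a real architecture port would use.

```c
/* Hypothetical userspace stand-in for atomic_t. */
typedef struct { int counter; } sketch_atomic_t;

/* Read is a single aligned word load: plain C is already atomic here. */
static int sketch_atomic_read(const sketch_atomic_t *v)
{
    return v->counter;
}

/* Set is a single aligned word store: plain C is already atomic here. */
static void sketch_atomic_set(sketch_atomic_t *v, int i)
{
    v->counter = i;
}

/* Increment is read-modify-write, so it needs an atomic primitive. */
static void sketch_atomic_inc(sketch_atomic_t *v)
{
    __sync_fetch_and_add(&v->counter, 1);  /* GCC/Clang builtin */
}
```

Note that neither the read nor the set here says anything about when the access happens relative to other code. That is the job of barriers, which is the separation argued for above.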

But if people do seem to have a mixed / confused notion of atomicity
and barriers, and if there's consensus, then as I'd said earlier, I
have no issues in going with the consensus (eg. having API variants).
Linus would be more difficult to convince, however, I suspect :-)


Satyam


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> Satyam Sharma writes:
> In the kernel we use atomic variables in precisely those situations
> where a variable is potentially accessed concurrently by multiple
> CPUs, and where each CPU needs to see updates done by other CPUs in a
> timely fashion.  That is what they are for.  Therefore the compiler
> must not cache values of atomic variables in registers; each
> atomic_read must result in a load and each atomic_set must result in a
> store.  Anything else will just lead to subtle bugs.

Amen.
--
vda

