Re: NETDEV-BCM43XX BUG - Failure to associate AP with latest devscape git pull..
Robert Martin [EMAIL PROTECTED] writes:

> I seem to be having a problem associating with my AP--everything appears fine and I can bring my wireless adapter up (the LED lights up correctly), and I don't see complaints about firmware/IRQs in the dmesg output. I am able to see operating APs with an iwlist wlan0 scan, but I am unable to connect to the AP, with or without WEP encryption enabled (tried none, hex and ascii; nothing worked). This is with the latest wireless-dev pull and a vanilla 2.6.19 kernel with irqpoll option enabled, otherwise it misbehaves. I would be happy to give more information if requested, or try different kernel/firmware options.

I started playing with wireless-dev's bcm43xx-d80211 recently and noticed that I had to set the frequency manually. When I did that, setting the ap manually caused the card to associate immediately.

Here's the sequence of commands I run. (The driver seems stable on my hardware, so I've only had to do this twice in the last week, and then only for reasons unrelated to the driver.)

    ip link set up wlan0
    iwlist wlan0 scan
    iwconfig wlan0 essid $essid key $wep_key
    iwconfig wlan0    (note frequencies differ)
    iwconfig wlan0 freq $freq_from_iwlist_scan
    iwconfig wlan0 ap $ap_mac_address
    dhclient wlan0

--
Paul Collins
Wellington, New Zealand

Dag vijandelijk luchtschip de huismeester is dood

-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NETDEV-BCM43XX BUG - Failure to associate AP with latest devscape git pull..
On Sun, 2006-12-03 at 16:21 -0500, Robert Martin wrote:

> I would be happy to give more information if requested, or try different kernel/firmware options.

Has it ever worked? I'm curious because the PCI-E chips are 4318s which are known to not work properly due to missing/wrong TX power calibration.

johannes
Re: Kernel header changes break glibc build
* David Woodhouse [EMAIL PROTECTED] 2006-12-03 12:25
> Thomas, this is in response to your changes in
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1823730fbc89fadde72a7bb3b7bdf03cc7b8835c;hp=47f68512d2685431f1781830dfcbab31bda87644
> in which you create linux/if_addr.h and require that it's included directly rather than being part of (or even included from) linux/rtnetlink.h.
>
> Was there a good reason for changing that user-visible header? Is there a reason not to include if_addr.h from rtnetlink.h as Joseph's patch does?

Userspace is not supposed to directly include kernel headers; instead it has to make local copies and compile against them. Binary compatibility is always guaranteed, but during development within a stable tree it's wrong to assume that headers never change.

I do not agree with the change to include if_addr.h in rtnetlink.h. The point is to move bits apart and have multiple small header files, each defining a specific rtnetlink family, which are a lot easier to maintain for both kernel and userspace than one giant rtnetlink.h for everything.

> I suspect that if the IF{L,}A_{PAYLOAD,RTA} macros aren't used in the kernel then the best answer is for glibc to define those for itself.

Right, if they did it right they would only have noticed when they updated the kernel headers to some newer version, and would only have had to move the bits to some compat header.
Re: watchdog timeout panic in e1000 driver
Hi,

> Doesn't this just mean that we need a spinlock or some other kind of semaphore around acquiring, using, and releasing this resource? We keep going around and around about this but I'm pretty sure spinlocks are meant to be able to solve exactly this issue.
>
> The problem is going to get considerably more nasty if we need to hold a spinlock with interrupts disabled for a significant amount of time, at which point a semaphore of some kind with a spinlock around it would seem to be more useful.

Even if spin_lock() were used to protect this resource, it is still possible for an interrupt to kick in and call e1000_watchdog. In that case, e1000_get_software_semaphore() will be called from within the interrupt handler and the problem will still occur.

In order to solve this problem, interrupts should be disabled while the resource is held (for example, with spin_lock_irqsave). Then the interrupt handler can't run while the process is holding the resource, so the problem doesn't occur.

> I'll work with Auke to see if we can come up with another try.

Do you have any updates about your test code? Does the fix I previously proposed have problems? If it does, I'd like to help investigate another fix to solve this problem.

--
Kenzo Iwami ([EMAIL PROTECTED])
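The difference between the two locking styles can be tried out in userspace. The following is only an analogy, not driver code: a POSIX signal stands in for the interrupt, a flag stands in for the hardware semaphore, and blocking the signal plays the role of spin_lock_irqsave(). All names (resource_busy, fake_irq_handler, resource_acquire, resource_release) are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins: resource_busy models the hardware semaphore,
 * SIGUSR1 models the interrupt that leads into the watchdog path. */
static volatile sig_atomic_t resource_busy;
static volatile sig_atomic_t handler_runs;
static volatile sig_atomic_t handler_saw_busy;

static void fake_irq_handler(int signo)
{
	(void)signo;
	handler_runs++;
	if (resource_busy)
		handler_saw_busy = 1;	/* the real driver would spin here forever */
}

/* Install the fake interrupt handler and make sure the signal is unblocked.
 * Returns 0 on success. */
static int setup_fake_irq(void)
{
	struct sigaction sa;
	sigset_t unblock;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fake_irq_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;
	sigemptyset(&unblock);
	sigaddset(&unblock, SIGUSR1);
	return sigprocmask(SIG_UNBLOCK, &unblock, NULL);
}

/* Analogue of spin_lock_irqsave(): mask the "interrupt", save the old mask. */
static void resource_acquire(sigset_t *saved)
{
	sigset_t block;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, saved);
	assert(!resource_busy);
	resource_busy = 1;
}

/* Analogue of spin_unlock_irqrestore(): drop the resource, then restore the
 * mask; a signal that arrived in the meantime is delivered only now. */
static void resource_release(const sigset_t *saved)
{
	resource_busy = 0;
	sigprocmask(SIG_SETMASK, saved, NULL);
}
```

If raise(SIGUSR1) is called between resource_acquire() and resource_release() in a single-threaded program, the handler is deferred until the release, so it never observes the resource as busy. Without the masking step it would run immediately and find the resource held, which is the situation described above.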
Re: [RFC][GENETLINK] move command capabilities to flags
* jamal [EMAIL PROTECTED] 2006-12-02 06:56
> Dave,
>
> If there is no objections on this approach, please apply this patch. Against net-2.6.20
>
> cheers,
> jamal
>
> This patch moves command capabilities to command flags. Other than being cleaner, saves several bytes.

Is it worth saving these 12 bytes by breaking compatibility? If you really want to do it, remove the obsoleted attribute types, I don't like dead bodies lying around :-)
Re: [RFC][GENETLINK] introduce command names
* jamal [EMAIL PROTECTED] 2006-12-02 07:11
> [GENETLINK] introduce command names
>
> Introduce optional command names. While command names can be put in user space by the author of the command, this alleviates things for the discovery process without requiring any user space code written. In a recent tutorial that i gave, the desire for this feature was the highest.

I assume you're planning to export this to userspace at some point?

What's the real advantage besides that when listing available operations we can output names instead of numbers? Userspace should be aware of operation numbers when using it.

I'm all for this if the direction is to move towards having some form of scriptable genetlink tool which can be used to communicate with simple genetlink families.

I guess the desire was the highest because you sold it as such :-)
[2.4 PATCH] netfilter broken and unused macro removal
Hello,

This patch removes a broken and unused macro from the netfilter code.

Signed-off-by: Mariusz Kozlowski [EMAIL PROTECTED]

 net/ipv4/netfilter/ip_nat_standalone.c | 6 --
 1 file changed, 6 deletions(-)

--- linux-2.4.34-pre6-a/net/ipv4/netfilter/ip_nat_standalone.c	2005-11-16 20:12:54.0 +0100
+++ linux-2.4.34-pre6-b/net/ipv4/netfilter/ip_nat_standalone.c	2006-12-01 12:10:42.0 +0100
@@ -43,12 +43,6 @@
 #define DEBUGP(format, args...)
 #endif
 
-#define HOOKNAME(hooknum) ((hooknum) == NF_IP_POST_ROUTING ? "POST_ROUTING" \
-			   : ((hooknum) == NF_IP_PRE_ROUTING ? "PRE_ROUTING" \
-			      : ((hooknum) == NF_IP_LOCAL_OUT ? "LOCAL_OUT" \
-				 : ((hooknum) == NF_IP_LOCAL_IN ? "LOCAL_IN" \
-				    : "*ERROR*")))
-
 static inline int call_expect(struct ip_conntrack *master,
 			      struct sk_buff **pskb,
 			      unsigned int hooknum,

--
Regards,
Mariusz Kozlowski
Re: Network virtualization/isolation
Hi Jamal,

thanks for taking the time to read the document.

The objective of the document was not to argue that one approach is better than the other. I wanted to show the pros and cons of each approach and to point out that the two approaches are complementary.

Currently, some resources have been moved to namespace-relative access: the IPC and utsname namespaces are in the 2.6.19 kernel. The work on the pid namespace is still in progress. The idea is to use a clone approach relying on the unshare_ns syscall. The syscall is called with a set of flags for pids, ipcs, utsname, network ... You can then unshare only the network and have an application in its own network environment.

With an l3 approach, as with l2, you can run an apache server in an unshared network environment. Better, you can run several apache servers in several network namespaces without modifying the server's network configuration.

Some of us consider l2 perfectly suited to some kinds of containers, such as system containers and application containers running big servers, but find l2 too much (a hammer to crush a beetle) for simple network requirements such as network migration, jails, or containers which do not need that level of virtualization. For example, you want to create thousands of containers for a cluster of HPC jobs, just to have migration for these jobs. Does it make sense to use an l2 approach?

Dmitry Mishin and I thought about an l2/l3 solution, and we think we found a way to have both at runtime. Roughly, it is an l3 based on bind filtering and socket isolation, very similar to what vserver provides. I did a prototype, and it works well for IPv4/unicast.

So, considering we have an l2 isolation/virtualization, with an l3 relying on a subset of the l2 network isolation resources: is that an acceptable solution?

-- Daniel
[PATCH 0/1] small myri10ge fix for 2.6.20
Hi Jeff,

The following patch is a small fix for myri10ge, please apply it for 2.6.20.

I am not sending any other fixes/updates since they depend heavily on the physical page skb conversion that I sent a couple of times in the past without getting any answer.

Thanks,
Brice
[PATCH 1/1] myri10ge: write as 2 32-byte blocks in myri10ge_submit_8rx
In the myri10ge_submit_8rx() routine, write the 64-byte request block as two 32-byte blocks so that it is handled by the hardware PIO write handler if write-combining is enabled.

Signed-off-by: Brice Goglin [EMAIL PROTECTED]
---
 drivers/net/myri10ge/myri10ge.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6.19/drivers/net/myri10ge/myri10ge.c
===
--- linux-2.6.19.orig/drivers/net/myri10ge/myri10ge.c	2006-11-29 22:57:37.0 +0100
+++ linux-2.6.19/drivers/net/myri10ge/myri10ge.c	2006-12-04 11:22:48.0 +0100
@@ -790,7 +790,9 @@
 
 	low = src->addr_low;
 	src->addr_low = DMA_32BIT_MASK;
-	myri10ge_pio_copy(dst, src, 8 * sizeof(*src));
+	myri10ge_pio_copy(dst, src, 4 * sizeof(*src));
+	mb();
+	myri10ge_pio_copy(dst + 4, src + 4, 4 * sizeof(*src));
 	mb();
 	src->addr_low = low;
 	__raw_writel(low, &dst->addr_low);
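The size arithmetic behind the patch can be checked with a small userspace sketch. It assumes a descriptor of two 32-bit words (8 bytes), which is what the "8 * sizeof(*src)" = 64 bytes in the description implies. struct recv_desc and submit_8rx_in_two_bursts() are hypothetical names; memcpy() stands in for the driver's PIO copy helper and atomic_thread_fence() for the kernel's mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of one receive descriptor: two 32-bit words, 8 bytes. */
struct recv_desc {
	uint32_t addr_high;
	uint32_t addr_low;
};

static_assert(8 * sizeof(struct recv_desc) == 64, "full request is 64 bytes");
static_assert(4 * sizeof(struct recv_desc) == 32, "each burst is 32 bytes");

/* Copy eight descriptors as two bursts of four, with a full fence after each.
 * The fence and memcpy() are userspace stand-ins, not the driver's calls. */
static void submit_8rx_in_two_bursts(struct recv_desc *dst,
				     const struct recv_desc *src)
{
	memcpy(dst, src, 4 * sizeof(*src));
	atomic_thread_fence(memory_order_seq_cst);
	memcpy(dst + 4, src + 4, 4 * sizeof(*src));	/* dst + 4 is 32 bytes in */
	atomic_thread_fence(memory_order_seq_cst);
}
```

Note that "dst + 4" advances by four descriptors, not four bytes, so the second burst starts exactly at byte offset 32.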
Re: [PATCH 2.6.18] declance: Support the I/O ASIC LANCE w/o TURBOchannel
On Fri, 1 Dec 2006, Andrew Morton wrote:

> > can you (or Andrew) please resend your patches against 2.6.19?
>
> I have them all (I think) queued up. Will send once I've done a round of build-testing.

Thanks a lot, Andrew.

Maciej
Re: Network virtualization/isolation
jamal [EMAIL PROTECTED] writes:

> I have removed the Re: just to add some freshness to the discussion
>
> So i read quickly the rest of the discussions. I was almost surprised to find that i agree with Eric on a lot of opinions (we also agree that vindaloo is good for you i guess) ;->
>
> The two issues that stood out for me (in addition to what i already said below):
>
> 1) the solution must ease the migration of containers; i didnt see anything about migrating them to another host across a network, but i assume that this is a given.

It is mostly a given. It is a goal for some of us and not for others. Containers are a necessary first step to getting migration and checkpoint/restart assistance from the kernel.

> 2) the socket level bind/accept filtering with multiple IPs. From reading what Herbert has, it seems they have figured a clever way to optimize this path albeit some challenges (special casing for raw filters) etc.
>
> I am wondering if one was to use the two level muxing of the socket layer, how much more performance improvement the above scheme provides for #2?

I don't follow this question.

> Consider the case of L2 where by the time the packet hits the socket layer on incoming, the VE is already known; in such a case, the lookup would be very cheap. The advantage being you get rid of the special casing altogether. I dont see any issues with binds per multiple IPs etc using such a technique.
>
> For the case of #1 above, wouldnt it be also easier if the tables for netdevices, PIDs etc were per VE (using the 2 level mux)?

Generally yes. s/VE/namespace/. There is a case with hash tables where it seems saner to add an additional entry, because it is hard to dynamically allocate a hash table (they need something larger than a single page allocation).

But for everything else, yes, it makes things much easier if you have a per namespace data structure.

A practical question is whether we can replace hash tables with some variant of trie or radix-tree and not take a performance hit. Given the better scaling of trees to different workload sizes, if we can use them so much the better. Especially because a per namespace split gives us a lot of good properties.

> In any case, folks, i hope i am not treading on anyones toes; i know each one of you has implemented and has users and i am trying to be as neutral as i can (but clearly biased ;->).

Well, we rather expect to bash heads until we can come up with something we all can agree on with the people who more regularly have to maintain the code. The discussions so far have largely been warm ups to actually doing something.

Getting feedback from people who regularly work with the networking stack is appreciated.

Eric
Re: [PATCH][XFRM] Optimize policy dumping
jamal wrote:
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 64d3938..c8a98ca 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -860,57 +860,70 @@ EXPORT_SYMBOL(xfrm_policy_flush);
>  int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
>  		     void *data)
>  {
> -	struct xfrm_policy *pol;
>  	struct hlist_node *entry;
> -	int dir, count, error;
> +	int dir = 0, last_dir = 0, count = 0, error = -ENOENT;
> +	struct xfrm_policy *pol = NULL, *send_pol = NULL, *last_pol = NULL;
>  
>  	read_lock_bh(&xfrm_policy_lock);
> -	count = 0;
> +
>  	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
>  		struct hlist_head *table = xfrm_policy_bydst[dir].table;
>  		int i;
>  
>  		hlist_for_each_entry(pol, entry,
>  				     &xfrm_policy_inexact[dir], bydst) {
> -			if (pol->type == type)
> -				count++;
> -		}
> -		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
> -			hlist_for_each_entry(pol, entry, table + i, bydst) {
> -				if (pol->type == type)
> -					count++;
> +			if (count && send_pol && send_pol != last_pol) {
> +				error = func(send_pol, dir % XFRM_POLICY_MAX, count, data);
> +				if (error)
> +					goto out;
> +				send_pol = NULL;
>  			}
> -		}
> -	}
> -
> -	if (count == 0) {
> -		error = -ENOENT;
> -		goto out;
> -	}
> -
> -	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
> -		struct hlist_head *table = xfrm_policy_bydst[dir].table;
> -		int i;
> -		hlist_for_each_entry(pol, entry,
> -				     &xfrm_policy_inexact[dir], bydst) {
>  			if (pol->type != type)
>  				continue;
> -			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
> -			if (error)
> -				goto out;
> +
> +			if (!count) {
> +				last_pol = send_pol = pol;
> +			} else {
> +				send_pol = last_pol;
> +				last_pol = pol;
> +			}
> +
> +			last_dir = dir;
> +			count++;
>  		}
> +
>  		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
>  			hlist_for_each_entry(pol, entry, table + i, bydst) {
> +				if (count && send_pol && send_pol != last_pol) {
> +					error = func(send_pol, dir % XFRM_POLICY_MAX, count, data);
> +					send_pol = NULL;
> +					if (error)
> +						goto out;
> +				}
>  				if (pol->type != type)
>  					continue;
> -				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
> -				if (error)
> -					goto out;
> +				if (!count) {
> +					last_pol = send_pol = pol;
> +				} else {
> +					send_pol = last_pol;
> +					last_pol = pol;
> +				}
> +				last_dir = dir;
> +				count++;
>  			}
>  		}
>  	}
> -	error = 0;
> +
> +	if (send_pol && send_pol != last_pol) {
> +		error = func(send_pol, last_dir % XFRM_POLICY_MAX, count, data);
> +	}
> +
> +	if (count) {
> +		BUG_TRAP(last_pol == NULL);
> +		error = func(send_pol, last_dir % XFRM_POLICY_MAX, 0, data);
> +	}
> +
>  out:
>  	read_unlock_bh(&xfrm_policy_lock);
>  	return error;

A few cases that will behave incorrectly:

- two policies in xfrm_policy_inexact with the same direction: after the first iteration we have last_pol = send_pol = first policy and no messages sent, after the second iteration we have send_pol = first policy, last_pol = second policy and still no messages sent. Since send_pol && send_pol != last_pol, the second to last block will send send_pol with last_dir, since count > 0 the last block will send send_pol again. So we get two times the first policy and zero times the second one.

- same case as above, but policies in opposite directions. The first policy will again be sent twice, but with last_dir, which is the direction of the second policy.

- three policies in xfrm_policy_inexact, two with similar direction, one with opposite direction.
Re: d80211-drivers updated (zd1211rw-d80211 synced with zd1211rw)
On Mon, Dec 04, 2006 at 02:50:39AM -0500, Michael Wu wrote:

> Other (d80211) wireless drivers are welcome to send patches this way if they do not have their own git tree for Linville to pull.

Please don't do this. It adds to my pain for reviewing patches.

I don't mind (and maybe even like) pulling w/ git from primary driver authors. But, I would prefer not to add extra layers of git between me and the patch authors.

Similarly, I would prefer for Ulrich or Daniel to maintain the zd1211rw git tree unless you (i.e. Michael) are going to be one of the primary authors going forward.

While I'm complaining :-), I would probably prefer it if you had adm8211 and p54 in separate git trees (or at least on separate branches) as well. That way, if there is a problem in a p54 patch series, I can still pull adm8211 (or vice versa).

It is not my intent to scold (so please don't feel scolded). It just is counter-productive to prematurely consolidate merging duties.

Thanks,

John
--
John W. Linville [EMAIL PROTECTED]
[PATCH 2.6.19] AT91RM9200 Ethernet update 2
This patch adds NetPoll / NetConsole support to the Atmel AT91RM9200 Ethernet driver.

Original patch from Bill Gatliff.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

diff -urN linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c linux-2.6.19-final/drivers/net/arm/at91_ether.c
--- linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c	Mon Dec 4 14:27:21 2006
+++ linux-2.6.19-final/drivers/net/arm/at91_ether.c	Mon Dec 4 14:33:35 2006
@@ -925,6 +925,17 @@
 	return IRQ_HANDLED;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void at91ether_poll_controller(struct net_device *dev)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	at91ether_interrupt(dev->irq, dev, NULL);
+	local_irq_restore(flags);
+}
+#endif
+
 /*
  * Initialize the ethernet interface
  */
@@ -974,6 +985,9 @@
 	dev->set_mac_address = set_mac_address;
 	dev->ethtool_ops = &at91ether_ethtool_ops;
 	dev->do_ioctl = at91ether_ioctl;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = at91ether_poll_controller;
+#endif
 
 	SET_NETDEV_DEV(dev, &pdev->dev);
Re: [RFC][GENETLINK] introduce command names
On Mon, 2006-04-12 at 10:28 +0100, Thomas Graf wrote:
> * jamal [EMAIL PROTECTED] 2006-12-02 07:11
> > [GENETLINK] introduce command names
> >
> > Introduce optional command names. While command names can be put in user space by the author of the command, this alleviates things for the discovery process without requiring any user space code written. In a recent tutorial that i gave, the desire for this feature was the highest.
>
> I assume you're planning to export this to userspace at some point?

Right.

> What's the real advantage besides that when listing available operations we can output names instead of numbers?

Just makes the discovery more knowledgeable. Theres a hidden meaning in that i would like if possible to create as much of user space as possible without the user having a single line written. Heres how i output the discovered families at the moment without the patch.

-
[EMAIL PROTECTED]:~/git-trees/iproute2/nov22/genl$ ./genl ctrl ls
Added Family Name: nlctrl ID: 0x10 Version: 0x1 header size: 0 max attribs: 6
commands supported:
	#1: ID-0x3 flags-0x0
	Capabilities: has policy; can doit; can dumpit
Added Family Name: TASKSTATS ID: 0x11 Version: 0x1 header size: 0 max attribs: 4
commands supported:
	#1: ID-0x1 flags-0x0
	Capabilities: has policy; can doit;
[EMAIL PROTECTED]:~/git-trees/iproute2/nov22/genl$
---

It would be a lot more human friendly to put better readability in the commands.

> Userspace should be aware of operation numbers when using it.
>
> I'm all for this if the direction is to move towards having some form of scriptable genetlink tool which can be used to communicate with simple genetlink families.

That is the real agenda actually. To be honest i dont know how realistic it would be. But one of the next things is to output the command policies.

> I guess the desire was the highest because you sold it as such :-)

Theres some truth to that ;-> But i didnt start it ;->, after two people asking why they couldnt tell the command name, it connected to me i also need it for this other reason.

cheers,
jamal
Re: [RFC][GENETLINK] move command capabilities to flags
On Mon, 2006-04-12 at 10:20 +0100, Thomas Graf wrote:
> * jamal [EMAIL PROTECTED] 2006-12-02 06:56
> > Dave,
> >
> > If there is no objections on this approach, please apply this patch. Against net-2.6.20
> >
> > cheers,
> > jamal
> >
> > This patch moves command capabilities to command flags. Other than being cleaner, saves several bytes.
>
> Is it worth saving these 12 bytes by breaking compatibility?

The bytes saved are one aspect; the other is the cleanliness. Transferring a boolean in that many bits is a little overkill. I think it is better to fix it now than later. I know you mentioned libnl uses it. But that is something you can change on your side. I dont know of any other app that uses it.

> If you really want to do it, remove the obsoleted attribute types, I don't like dead bodies lying around :-)

I could resend the patch getting rid of those definitions.

cheers,
jamal
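The "12 bytes" figure can be reproduced with a little arithmetic, assuming each of the three capabilities (has policy, can doit, can dumpit) was previously sent as its own payload-less attribute, so that each costs one 4-byte attribute header. The sketch below is only an illustration: struct attr_hdr copies the layout of the netlink attribute header, and the CAP_* bit values are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the netlink attribute header (struct nlattr): 4 bytes. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

static_assert(sizeof(struct attr_hdr) == 4, "attribute header is 4 bytes");

/* Hypothetical bit assignments, for illustration only. */
#define CAP_DO      (1u << 1)
#define CAP_DUMP    (1u << 2)
#define CAP_HASPOL  (1u << 3)

/* Bytes spent when every capability travels as its own payload-less
 * attribute: one header per capability that is set. */
static unsigned int bytes_as_flag_attrs(uint32_t caps)
{
	unsigned int n = 0;

	for (; caps; caps &= caps - 1)	/* clear the lowest set bit */
		n++;
	return n * (unsigned int)sizeof(struct attr_hdr);
}

/* With capabilities folded into an existing flags word, a test is one AND. */
static int cap_is_set(uint32_t flags, uint32_t cap)
{
	return (flags & cap) != 0;
}
```

With all three capabilities set, bytes_as_flag_attrs() yields 3 x 4 = 12 bytes per command, which the flags approach avoids because the flags word is already part of the message.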
[PATCH 2.6.19] AT91RM9200 Ethernet update 3
A minor fix to the Atmel AT91RM9200 Ethernet driver.

1. Use dev_alloc_skb() instead of alloc_skb().
2. It is not necessary to adjust skb->len manually.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

diff -urN linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c linux-2.6.19-final/drivers/net/arm/at91_ether.c
--- linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c	Mon Dec 4 14:42:05 2006
+++ linux-2.6.19-final/drivers/net/arm/at91_ether.c	Mon Dec 4 14:43:57 2006
@@ -855,14 +855,13 @@
 	while (dlist->descriptors[lp->rxBuffIndex].addr & EMAC_DESC_DONE) {
 		p_recv = dlist->recv_buf[lp->rxBuffIndex];
 		pktlen = dlist->descriptors[lp->rxBuffIndex].size & 0x7ff;	/* Length of frame including FCS */
-		skb = alloc_skb(pktlen + 2, GFP_ATOMIC);
+		skb = dev_alloc_skb(pktlen + 2);
 		if (skb != NULL) {
 			skb_reserve(skb, 2);
 			memcpy(skb_put(skb, pktlen), p_recv, pktlen);
 
 			skb->dev = dev;
 			skb->protocol = eth_type_trans(skb, dev);
-			skb->len = pktlen;
 			dev->last_rx = jiffies;
 			lp->stats.rx_bytes += pktlen;
 			netif_rx(skb);
Re: Network virtualization/isolation
Daniel,

On Mon, 2006-04-12 at 11:18 +0100, Daniel Lezcano wrote:
> Hi Jamal,
>
> Currently, there are some resources moved to a namespace relative access, the IPC and the utsname and this is into the 2.6.19 kernel. The work on the pid namespace is still in progress. The idea is to use a clone approach relying on the unshare_ns syscall. The syscall is called with a set of flags for pids, ipcs, utsname, network ... You can then unshare only the network and have an application into its own network environment.

Ok, so i take it this call is used by the setup manager on the host side?

> For a l3 approach, like a l2, you can run an apache server into a unshared network environment. Better, you can run several apaches server into several network namespaces without modifying the server's network configuration.

ok - as i understand it now, this will be the case for all the approaches taken?

> Some of us, consider l2 as perfectly adapted for some kind of containers like system containers and some kind of application containers running big servers, but find the l2 too much (seems to be a hammer to crush a beetle) for simple network requirements like for network migration, jails or containers which does not take care of such virtualization. For example, you want to create thousands of containers for a cluster of HPC jobs and just to have migration for these jobs. Does it make sense to have l2 approach ?

Perhaps not for the specific app you mentioned above. But it makes sense for what i described as virtual routers/bridges. I would say that the solution has to cater for a variety of applications, no?

> Dmitry Mishin and I, we thought about a l2/l3 solution and we think we found a solution to have the 2 at runtime. Roughly, it is a l3 based on bind filtering and socket isolation, very similar to what vserver provides. I did a prototype, and it works well for IPV4/unicast.

ok - so you guys seem to be reaching at least some consensus then.

> So, considering, we have a l2 isolation/virtualization, and having a l3 relying on the l2 network isolation resources subset. Is it an acceptable solution ?

As long as you can be generic enough so that a wide array of apps can be met, it should be fine. For a test app, consider the virtual bridges/routers i mentioned. The other requirement i would see is that apps that would run on a host would run unchanged. The migration of containers you folks seem to have under control - my only input on that, since it is early enough, is that you may want to build your structuring in such a way that this is easy to do.

cheers,
jamal
Re: [PATCH][XFRM] Optimize policy dumping
On Mon, 2006-04-12 at 13:24 +0100, Patrick McHardy wrote:

> A few cases that will behave incorrectly:
>
> - two policies in xfrm_policy_inexact with the same direction: after the first iteration we have last_pol = send_pol = first policy and no messages sent, after the second iteration we have send_pol = first policy, last_pol = second policy and still no messages sent. Since send_pol && send_pol != last_pol, the second to last block will send send_pol with last_dir, since count > 0 the last block will send send_pol again. So we get two times the first policy and zero times the second one.
>
> - same case as above, but policies in opposite directions. The first policy will again be sent twice, but with last_dir, which is the direction of the second policy.
>
> - three policies in xfrm_policy_inexact, two with similar direction, one with opposite direction. The first two iterations look similar and no policies are dumped, during the third iteration we have count && send_pol && send_pol != last_pol. So send_pol (the first policy) is sent, but with direction dir, which is at that time the opposite direction of the policy.
>
> I guess its easy to construct more cases. In general I don't see how remembering only the last direction can work since two policies with potentially different directions are remembered. Within the loop you always use dir, which also looks wrong.

All very valid points. Yikes, the directionality is not something i thought clearly about or tested well.

I can fix this but this code will only get fuglier. How about the following approach: I add a new callback which is passed in the invocation to walk. This callback is invoked at the end to signal the end of the walk, sort of what done() does in netlink. netlink doesnt use this call but pfkey does. So the burden is then moved to pfkey to keep track of the stoopid count.

Thoughts?

cheers,
jamal
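One possible shape of the "done callback" idea, as a userspace sketch rather than the eventual kernel change: the walker makes a single pass and always hands each entry over with that entry's own direction, and a caller that needs to flag the final entry holds one entry back and flushes it from the done callback. Everything here (struct policy, walk_policies, struct dump_state, dump_one, dump_done) is hypothetical and only models the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a policy entry; the fields are hypothetical. */
struct policy {
	int type;
	int dir;
};

typedef int (*walk_fn)(const struct policy *pol, int dir, void *data);
typedef int (*done_fn)(void *data);

/* Single pass: call func for every matching entry with that entry's own
 * direction, then call done once so the caller can flush what it held back. */
static int walk_policies(const struct policy *pols, size_t n, int type,
			 walk_fn func, done_fn done, void *data)
{
	size_t i, matched = 0;
	int err;

	for (i = 0; i < n; i++) {
		if (pols[i].type != type)
			continue;
		err = func(&pols[i], pols[i].dir, data);
		if (err)
			return err;
		matched++;
	}
	if (!matched)
		return -ENOENT;
	return done ? done(data) : 0;
}

/* Caller-side state for a consumer that must mark the final entry: it holds
 * back one entry together with the direction it arrived with. */
struct dump_state {
	const struct policy *held;
	int held_dir;
	int emitted;		/* entries sent so far */
	int last_marked;	/* times an entry was sent as "last" */
	int dir_mismatch;	/* entries sent with the wrong direction */
};

static void emit(struct dump_state *st, int is_last)
{
	assert(st->held != NULL);
	if (st->held_dir != st->held->dir)
		st->dir_mismatch++;
	st->emitted++;
	st->last_marked += is_last;
	st->held = NULL;
}

static int dump_one(const struct policy *pol, int dir, void *data)
{
	struct dump_state *st = data;

	if (st->held)
		emit(st, 0);	/* a newer entry exists, so this one is not last */
	st->held = pol;
	st->held_dir = dir;
	return 0;
}

static int dump_done(void *data)
{
	struct dump_state *st = data;

	if (st->held)
		emit(st, 1);	/* end of walk: the held entry is the last one */
	return 0;
}
```

Because the held entry and its direction are stored together on the caller side, the mismatched-direction and double-send cases listed above cannot arise in this model; a caller that does not care about the last entry can pass a NULL done callback.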
Re: [XFRM] Optimize SA dumping
On Mon, 2006-04-12 at 13:36 +0100, Patrick McHardy wrote:
> jamal wrote:
> > 	for (i = 0; i <= xfrm_state_hmask; i++) {
> > 		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
> > +			if (count && send_x != last_x) {
> > +				err = func(send_x, count, data);
> > +				if (err)
> > +					goto out;
> > +				send_x = NULL;
> > +			}
> > 			if (!xfrm_id_proto_match(x->id.proto, proto))
> > 				continue;
>
> After you sent send_x and set it to NULL, it will be different from last_x (since that is != NULL) and the NULL pointer will be given to func() when continuing here.

This one you lost me. Can you give me an example? one or two SAs found?

In any case, if i go the done callback approach, I can get rid of all this tracking thing ...

cheers,
jamal

> > -			err = func(x, --count, data);
> > -			if (err)
> > -				goto out;
> > +
> > +			if (!count) {
> > +				last_x = send_x = x;
> > +			} else {
> > +				send_x = last_x;
> > +				last_x = x;
> > +			}
> > +			count++;
> > 		}
> > 	}
Re: [XFRM] Optimize SA dumping
jamal wrote:
> On Mon, 2006-04-12 at 13:36 +0100, Patrick McHardy wrote:
> > jamal wrote:
> > > 	for (i = 0; i <= xfrm_state_hmask; i++) {
> > > 		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
> > > +			if (count && send_x != last_x) {
> > > +				err = func(send_x, count, data);
> > > +				if (err)
> > > +					goto out;
> > > +				send_x = NULL;
> > > +			}
> > > 			if (!xfrm_id_proto_match(x->id.proto, proto))
> > > 				continue;
> >
> > After you sent send_x and set it to NULL, it will be different from last_x (since that is != NULL) and the NULL pointer will be given to func() when continuing here.
>
> This one you lost me. Can you give me an example? one or two SAs found?

More than three SAs, so we have:

send_x = last_x = NULL

1. iteration: send_x = last_x = first policy
2. iteration: send_x = first policy, last_x = second policy
3. iteration: dump send_x, set send_x = NULL, continue at continue statement
4. iteration: We have send_x = NULL and last_x != NULL, so send_x != last_x, leading to dump(NULL, ...)

> In any case, if i go the done callback approach, I can get rid of all this tracking thing ...

I need to read your other mail first before commenting on this :)
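The four iterations above can be replayed in a small userspace model of the bookkeeping. It assumes, as the trace implies with "continue at continue statement", that the third entry fails the protocol match. count_null_sends() is a hypothetical helper that copies only the send_x/last_x/count logic from the quoted loop and counts how often the callback would be handed NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Replays the bookkeeping from the quoted loop over n entries.
 * matches[i] != 0 means entry i passes the protocol check.
 * Returns how many times the send callback would receive NULL. */
static int count_null_sends(const int *matches, size_t n)
{
	const int *send_x = NULL, *last_x = NULL;
	int count = 0, null_sends = 0;
	size_t i;

	assert(matches != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const int *x = &matches[i];

		if (count && send_x != last_x) {
			if (send_x == NULL)
				null_sends++;	/* func(NULL, ...) in the patch */
			send_x = NULL;
		}
		if (!*x)
			continue;	/* the "continue statement" */
		if (!count) {
			last_x = send_x = x;
		} else {
			send_x = last_x;
			last_x = x;
		}
		count++;
	}
	return null_sends;
}
```

In this model, four entries where the third does not match produce one NULL send on the fourth iteration, while four matching entries produce none, because each matching entry reassigns send_x before the next check.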
Re: Network virtualization/isolation
On Mon, 2006-04-12 at 05:15 -0700, Eric W. Biederman wrote:
> jamal [EMAIL PROTECTED] writes:
>
> Containers are a necessary first step to getting migration and checkpoint/restart assistance from the kernel.

Isn't it like a MUST have if you are doing things from scratch instead of it being an afterthought?

> > 2) the socket level bind/accept filtering with multiple IPs. From reading what Herbert has, it seems they have figured a clever way to optimize this path albeit some challenges (special casing for raw filters) etc. I am wondering if one was to use the two level muxing of the socket layer, how much more performance improvement the above scheme provides for #2?
>
> I don't follow this question.

If you had the socket tables in a two level mux, first level to hash on namespace which leads to an indirection pointer to the table to find the socket and its bindings (with zero code changes to the socket code), then isn't this fast enough? Clearly you can optimize as in the case of bind/accept filtering, but then you may have to do that for every socket family/protocol (e.g. netlink doesn't have IP addresses, but the binding to multiple groups is possible).

Am I making any more sense? ;-

> > Consider the case of L2 where by the time the packet hits the socket layer on incoming, the VE is already known; in such a case, the lookup would be very cheap. The advantage being you get rid of the special casing altogether. I don't see any issues with binds per multiple IPs etc using such a technique.
> >
> > For the case of #1 above, wouldn't it be also easier if the tables for netdevices, PIDs etc were per VE (using the 2 level mux)?
>
> Generally yes. s/VE/namespace/. There is a case with hash tables where it seems saner to add an additional entry because it is hard to dynamically allocate a hash table (because they need something larger than a single page allocation).

A page to store the namespace indirection hash doesn't seem to be such a big waste; I wonder though why you even need a page. If I had 256 hash buckets with 1024 namespaces, it is still not too much of an overhead.

> But for everything else yes it makes things much easier if you have a per namespace data structure.

Ok, I am sure you've done the research; I am just being a devil's advocate.

> A practical question is can we replace hash tables with some variant of trie or radix-tree and not take a performance hit. Given the better scaling of trees to different workload sizes if we can use them so much the better. Especially because a per namespace split gives us a lot of good properties.
>
> > Is there a patch somewhere I can stare at that you guys agree on?
>
> Well we rather expect to bash heads until we can come up with something we all can agree on with the people who more regularly have to maintain the code. The discussions so far have largely been warm ups, to actually doing something.
>
> Getting feedback from people who regularly work with the networking stack is appreciated.

I hope I am being helpful. It seems to me that folks doing the different implementations may have had different apps in mind. IMO, as long as the solution caters for all apps (can you do virtual bridges/routers?), then we should be fine. Intrusiveness may not be so bad if it needs to be done once. I have to say I like the approach where the core code and algorithms are untouched. That's why I am humping on the two level mux approach, where one level is to mux and find the namespace indirection and the second step is to use the current data structures and algorithms as is. I don't know how much cleaner or less intrusive you can be compared to that. If I compile out the first level mux, I have my old net stack as is, untouched.

cheers,
jamal
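Editor's note: the two level mux jamal describes can be sketched in user space as below. This is only an illustration of the idea, not code from any of the patch sets under discussion: every name here (`ns_table`, `ns_find`, `sock_lookup`, and so on) is hypothetical, a port number stands in for the real socket key, and `NS_BUCKETS` simply reuses the 256 bucket figure from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NS_BUCKETS   256	/* first level: hash on namespace id */
#define SOCK_BUCKETS 16		/* second level: arbitrary per-namespace size */

/* Hypothetical socket entry keyed by port only. */
struct sock_entry {
	unsigned short port;
	struct sock_entry *next;
};

/* One per namespace: the "old" socket table, unchanged in shape. */
struct ns_table {
	unsigned int ns_id;
	struct sock_entry *buckets[SOCK_BUCKETS];
	struct ns_table *next;	/* chain for first-level collisions */
};

static struct ns_table *ns_hash[NS_BUCKETS];

/* First level: find the per-namespace table. */
struct ns_table *ns_find(unsigned int ns_id)
{
	struct ns_table *t;

	for (t = ns_hash[ns_id % NS_BUCKETS]; t; t = t->next)
		if (t->ns_id == ns_id)
			return t;
	return NULL;
}

void ns_add(struct ns_table *t, unsigned int ns_id)
{
	memset(t, 0, sizeof(*t));
	t->ns_id = ns_id;
	t->next = ns_hash[ns_id % NS_BUCKETS];
	ns_hash[ns_id % NS_BUCKETS] = t;
}

void sock_add(struct ns_table *t, struct sock_entry *s, unsigned short port)
{
	s->port = port;
	s->next = t->buckets[port % SOCK_BUCKETS];
	t->buckets[port % SOCK_BUCKETS] = s;
}

/* Second level: the usual lookup, run against the table found above. */
struct sock_entry *sock_lookup(unsigned int ns_id, unsigned short port)
{
	struct ns_table *t = ns_find(ns_id);
	struct sock_entry *s;

	if (t == NULL)
		return NULL;
	for (s = t->buckets[port % SOCK_BUCKETS]; s; s = s->next)
		if (s->port == port)
			return s;
	return NULL;
}
```

Two namespaces can bind the same port without seeing each other, and the second-level code never looks at the namespace id. When the namespace is already known by the time the lookup happens (the L2 case in the message), `ns_find()` would presumably be replaced by following a pointer.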
Re: [PATCH][XFRM] Optimize policy dumping
jamal wrote:
> All very valid points. Yikes, the directionality is not something I thought clearly about or tested well. I can fix this but this code will only get fuglier. How about the following approach: I add a new callback which is passed in the invocation to walk. This callback is invoked at the end to signal the end of the walk, sort of what done() does in netlink. netlink doesn't use this call but pfkey does. So the burden is then moved to pfkey to keep track of the stoopid count. Thoughts?

I think the complications come from the fact that you remember two policies, but only one seems necessary. How about this (completely untested) patch? It simply uses increasing sequence numbers for all but the last entry and uses zero for the last one.

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 64d3938..c790420 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -860,33 +860,12 @@ EXPORT_SYMBOL(xfrm_policy_flush);
 int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
 {
-	struct xfrm_policy *pol;
+	struct xfrm_policy *pol, *last = NULL;
 	struct hlist_node *entry;
-	int dir, count, error;
+	int dir, last_dir, count, error;
 
 	read_lock_bh(&xfrm_policy_lock);
 	count = 0;
-	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
-		struct hlist_head *table = xfrm_policy_bydst[dir].table;
-		int i;
-
-		hlist_for_each_entry(pol, entry,
-				     &xfrm_policy_inexact[dir], bydst) {
-			if (pol->type == type)
-				count++;
-		}
-		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
-			hlist_for_each_entry(pol, entry, table + i, bydst) {
-				if (pol->type == type)
-					count++;
-			}
-		}
-	}
-
-	if (count == 0) {
-		error = -ENOENT;
-		goto out;
-	}
 
 	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
 		struct hlist_head *table = xfrm_policy_bydst[dir].table;
@@ -894,23 +873,39 @@ int xfrm_policy_walk(u8 type, int (*func
 
 		hlist_for_each_entry(pol, entry,
 				     &xfrm_policy_inexact[dir], bydst) {
+			if (last) {
+				error = func(last, last_dir % XFRM_POLICY_MAX,
+					     ++count, data);
+				if (error)
+					goto out;
+				last = NULL;
+			}
 			if (pol->type != type)
 				continue;
-			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-			if (error)
-				goto out;
+			last = pol;
+			last_dir = dir;
 		}
 		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
 			hlist_for_each_entry(pol, entry, table + i, bydst) {
+				if (last) {
+					error = func(last, last_dir % XFRM_POLICY_MAX,
+						     ++count, data);
+					if (error)
+						goto out;
+					last = NULL;
+				}
 				if (pol->type != type)
 					continue;
-				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-				if (error)
-					goto out;
+				last = pol;
+				last_dir = dir;
 			}
 		}
 	}
-	error = 0;
+	if (count == 0) {
+		error = -ENOENT;
+		goto out;
+	}
+	error = func(last, last_dir % XFRM_POLICY_MAX, 0, data);
 out:
 	read_unlock_bh(&xfrm_policy_lock);
 	return error;
Re: [PATCH][XFRM] Optimize policy dumping
Patrick McHardy wrote:
> jamal wrote:
> > All very valid points. Yikes, the directionality is not something I thought clearly about or tested well. I can fix this but this code will only get fuglier. How about the following approach: I add a new callback which is passed in the invocation to walk. This callback is invoked at the end to signal the end of the walk, sort of what done() does in netlink. netlink doesn't use this call but pfkey does. So the burden is then moved to pfkey to keep track of the stoopid count. Thoughts?
>
> I think the complications come from the fact that you remember two policies, but only one seems necessary. How about this (completely untested) patch? It simply uses increasing sequence numbers for all but the last entry and uses zero for the last one.

And the same for SAs.

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 864962b..8e7c52d 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1099,7 +1099,7 @@ int xfrm_state_walk(u8 proto, int (*func
 		    void *data)
 {
 	int i;
-	struct xfrm_state *x;
+	struct xfrm_state *x, *last = NULL;
 	struct hlist_node *entry;
 	int count = 0;
 	int err = 0;
@@ -1107,24 +1107,21 @@ int xfrm_state_walk(u8 proto, int (*func
 	spin_lock_bh(&xfrm_state_lock);
 	for (i = 0; i <= xfrm_state_hmask; i++) {
 		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (xfrm_id_proto_match(x->id.proto, proto))
-				count++;
+			if (last) {
+				err = func(last, ++count, data);
+				if (err)
+					goto out;
+			}
+			if (!xfrm_id_proto_match(x->id.proto, proto))
+				continue;
+			last = x;
 		}
 	}
 	if (count == 0) {
 		err = -ENOENT;
 		goto out;
 	}
-
-	for (i = 0; i <= xfrm_state_hmask; i++) {
-		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (!xfrm_id_proto_match(x->id.proto, proto))
-				continue;
-			err = func(x, --count, data);
-			if (err)
-				goto out;
-		}
-	}
+	err = func(last, 0, data);
 out:
 	spin_unlock_bh(&xfrm_state_lock);
 	return err;
Re: [PATCH][XFRM] Optimize policy dumping
On Mon, 2006-04-12 at 14:57 +0100, Patrick McHardy wrote:
> I think the complications come from the fact that you remember two policies, but only one seems necessary. How about this (completely untested) patch? It simply uses increasing sequence numbers for all but the last entry and uses zero for the last one.

I could give this a try in about 2 hours. But why don't you like the callback approach? You have to admit, this is hairy code.

> And the same for SAs.

The SA has fewer things to remember, so it is easier; but I will apply this and test it and if it meets the requirements I will look into converting the SA to the same scheme.

cheers,
jamal
Re: [PATCH][XFRM] Optimize policy dumping
jamal wrote:
> On Mon, 2006-04-12 at 14:57 +0100, Patrick McHardy wrote:
> > I think the complications come from the fact that you remember two policies, but only one seems necessary. How about this (completely untested) patch? It simply uses increasing sequence numbers for all but the last entry and uses zero for the last one.
>
> I could give this a try in about 2 hours. But why don't you like the callback approach? You have to admit, this is hairy code.

Both ways are fine I guess. But the counting has almost no overhead with the patch I sent, so I'm not sure if it's worth adding a callback (which still needs to get the last policy/SA as argument, so that part won't get any nicer).

BTW, I'm not sure whether there are further requirements than those you quoted, but according to that text, using 1 for all but the last message would be fine as well :)
Re: [PATCH][XFRM] Optimize policy dumping
Patrick,

Your approach is much cleaner. Let me give these a few tests then I will repost later today; forget about the callback approach for now.

cheers,
jamal

On Mon, 2006-04-12 at 08:58 -0500, jamal wrote:
> On Mon, 2006-04-12 at 14:57 +0100, Patrick McHardy wrote:
> > I think the complications come from the fact that you remember two policies, but only one seems necessary. How about this (completely untested) patch? It simply uses increasing sequence numbers for all but the last entry and uses zero for the last one.
>
> I could give this a try in about 2 hours. But why don't you like the callback approach? You have to admit, this is hairy code.
>
> > And the same for SAs.
>
> The SA has fewer things to remember, so it is easier; but I will apply this and test it and if it meets the requirements I will look into converting the SA to the same scheme.
>
> cheers,
> jamal
Re: [PATCH][XFRM] Optimize policy dumping
On Mon, 2006-04-12 at 15:06 +0100, Patrick McHardy wrote:
> Both ways are fine I guess. But the counting has almost no overhead with the patch I sent, so I'm not sure if it's worth adding a callback (which still needs to get the last policy/SA as argument, so that part won't get any nicer).
>
> BTW, I'm not sure whether there are further requirements than those you quoted, but according to that text, using 1 for all but the last message would be fine as well :)

The only argument for the callback is it will lead to eventually having some semi-reliable dump for pfkey. But I think that is a separate issue to be tackled later. I am actually scratching my head a little as to what happens when the pfkey socket recv is full.

cheers,
jamal
Re: d80211-drivers updated (zd1211rw-d80211 synced with zd1211rw)
John W. Linville wrote:
> Similarly, I would prefer for Ulrich or Daniel to maintain the zd1211rw git tree unless you (i.e. Michael) are going to be one of the primary authors going forward.

OK. Myself and Ulrich have discussed this, we are going to create our own git trees for zd1211rw-d80211.

The attraction of sending pull requests to Michael is that he should be able to merge patches and push out updated trees quicker. If the turnaround is quicker here it would really mean we could stop having to maintain our own git stuff and simply use the upstream tree as the base.

However, this isn't really a big deal: Ulrich wants to use git regardless, and I'm still the patch monkey, so I will continue using git as well (regardless of upstream merge time turnaround).

Daniel
Re: d80211-drivers updated (zd1211rw-d80211 synced with zd1211rw)
On Monday 04 December 2006 07:51, John W. Linville wrote:
> On Mon, Dec 04, 2006 at 02:50:39AM -0500, Michael Wu wrote:
> > Other (d80211) wireless drivers are welcome to send patches this way if they do not have their own git tree for Linville to pull.
>
> Please don't do this. It adds to my pain for reviewing patches.

Sure.

> While I'm complaining :-), I would probably prefer it if you had adm8211 and p54 in separate git trees (or at least on separate branches) as well. That way, if there is a problem in a p54 patch series, I can still pull adm8211 (or vice versa).

I will keep separate branches for each driver then.

> It is not my intent to scold (so please don't feel scolded). It just is counter-productive to prematurely consolidate merging duties.

No problem. It seemed like a good idea for zd1211rw, so I was offering to do the same for other drivers.

Thanks,
-Michael Wu
Re: [PATCH][XFRM] Optimize policy dumping
jamal wrote:
> On Mon, 2006-04-12 at 15:06 +0100, Patrick McHardy wrote:
> > Both ways are fine I guess. But the counting has almost no overhead with the patch I sent, so I'm not sure if it's worth adding a callback (which still needs to get the last policy/SA as argument, so that part won't get any nicer).
>
> The only argument for the callback is it will lead to eventually having some semi-reliable dump for pfkey. But I think that is a separate issue to be tackled later.

Agreed, that also looks a bit trickier than the optimization.

> I am actually scratching my head a little as to what happens when the pfkey socket recv is full.

dump_sp() doesn't check the return value of pfkey_broadcast, so I guess it will just try to stuff more and more data in the recv queue, leading to either all messages after the last one fitting getting dropped or random drops if dumping and reading happen in parallel. setkey will loop forever if it doesn't receive the zero sequence number.
Re: NETDEV-BCM43XX BUG - Failure to associate AP with latest devscape git pull..
On Mon, 2006-12-04 at 21:30 +1300, Paul Collins wrote:
> Robert Martin [EMAIL PROTECTED] writes:
> > I seem to be having a problem associating with my AP--everything appears fine and I can bring my wireless adapter up (the LED lights up correctly), and I don't see complaints about firmware/IRQs in the dmesg output. I am able to see operating APs with an iwlist wlan0 scan, but I am unable to connect to the AP, with or without WEP encryption enabled (tried none, hex and ascii; nothing worked). This is with the latest wireless-dev pull and a vanilla 2.6.19 kernel with irqpoll option enabled, otherwise it misbehaves. I would be happy to give more information if requested, or try different kernel/firmware options.
>
> I started playing with wireless-dev's bcm43xx-d80211 recently and noticed that I had to set the frequency manually. When I did that, setting the ap manually caused the card to associate immediately.

That sounds suspiciously like the driver either doesn't have the correct scan results to be able to pick the AP's frequency, or is not correctly searching those scan results to find it... If the AP is in the scan list, the driver should definitely be able to find and set the frequency for the AP, _unless_ you've previously specified a non-zero frequency for iwconfig ethX freq [1].

Do you see your AP in the 'iwlist wlan0 scan' output?

dan

[1] with WEXT, if you've specified a fixed frequency with iwconfig wlan0 freq, the driver should stay locked to that frequency until it is told to unlock with iwconfig wlan0 freq 0.

> Here's the sequence of commands I run. (The driver seems stable on my hardware, so I've only had to do this twice in the last week, and then only for reasons unrelated to the driver.)
>
> ip link set up wlan0
> iwlist wlan0 scan
> iwconfig wlan0 essid $essid key $wep_key
> iwconfig wlan0 (note frequencies differ)
> iwconfig wlan0 freq $freq_from_iwlist_scan
> iwconfig wlan0 ap $ap_mac_address
> dhclient wlan0
RFC: consistent disable_xfrm behaviour
Currently the behaviour of disable_xfrm is inconsistent between locally generated and forwarded packets. For locally generated packets disable_xfrm disables the policy lookup if it is set on the output device, for forwarded traffic however it looks at the input device. This makes it impossible to disable xfrm on all devices but a dummy device and use normal routing to direct traffic to that device.

The Documentation is not exactly clear about whether the input or output device is meant, but the way I read it, it talks about the output device as well (since encryption is only done at output):

disable_xfrm - BOOLEAN
	Disable IPSEC encryption on this interface, whatever the policy

Opinions?

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9f3924c..164a7ee 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1780,7 +1780,7 @@ #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
 #endif
 	if (in_dev->cnf.no_policy)
 		rth->u.dst.flags |= DST_NOPOLICY;
-	if (in_dev->cnf.no_xfrm)
+	if (out_dev->cnf.no_xfrm)
 		rth->u.dst.flags |= DST_NOXFRM;
 	rth->fl.fl4_dst	= daddr;
 	rth->rt_dst	= daddr;
Re: RFC: consistent disable_xfrm behaviour
James Morris wrote:
> On Mon, 4 Dec 2006, Patrick McHardy wrote:
> > disable_xfrm - BOOLEAN
> > 	Disable IPSEC encryption on this interface, whatever the policy
> >
> > Opinions?
>
> Looks good to me, wonder what the original rationale was, though.

Me too. It was introduced by this patch:

http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]|src/|src/net|src/net/ipv4|related/net/ipv4/route.c

which only mentions the loopback device, in which case it doesn't matter.

Alexey, do you remember what the original intent of this was?
Re: Network virtualization/isolation
On Sunday 03 December 2006 19:00, Eric W. Biederman wrote:
> Ok. Just a quick summary of where I see the discussion.
>
> We all agree that L2 isolation is needed at some point.

As we all agreed on this, maybe it is time to send patches one-by-one? For the beginning, I propose to resend Cedric's empty namespace patch as base for others - it is really empty, but necessary in order to move further.

After this patch and the following net namespace unshare patch are accepted, I could send network devices virtualization patches for review and discussion.

What do you think?

> The approaches discussed for L2 and L3 are sufficiently orthogonal that we can implement them in either order. You would need to unshare L3 to unshare L2, but if we think of them as two separate namespaces we are likely to be in better shape.
>
> The L3 discussion still has the problem that there has not been agreement on all of the semantics yet.
>
> More comments after I get some sleep.
>
> Eric

--
Thanks,
Dmitry.
Re: RFC: consistent disable_xfrm behaviour
On Mon, 4 Dec 2006, Patrick McHardy wrote:
> disable_xfrm - BOOLEAN
> 	Disable IPSEC encryption on this interface, whatever the policy
>
> Opinions?

Looks good to me, wonder what the original rationale was, though.

--
James Morris [EMAIL PROTECTED]
Re: help regarding ip address assignment in kernel module
prajakta choudhari wrote:
> Hi all:
> I am absolutely new to kernel module programming. I have to modify the hostap code such that whenever a network device is registered, it is assigned an ip address.

Do it in userspace. Listen for the netlink event that occurs when a network device is registered and send a netlink message to assign an IP address.
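Editor's note: a rough sketch of the listening half of that advice, using the rtnetlink interface from `<linux/netlink.h>` and `<linux/rtnetlink.h>`. The helper names `open_link_monitor()` and `first_new_link()` are made up for this note. Keep in mind that the kernel sends `RTM_NEWLINK` for link changes as well as for newly registered devices, so a real daemon would need to tell the two apart, and that assigning the address (an `RTM_NEWADDR` request, or simply running a tool such as `ip addr add`) is not shown here.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a NETLINK_ROUTE socket subscribed to link notifications.
 * Returns the socket descriptor, or -1 on failure. */
int open_link_monitor(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_LINK;	/* link up/down/new/del events */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Scan a buffer of netlink messages (as filled in by recv()) and return
 * the interface index of the first RTM_NEWLINK message, or -1 if none. */
int first_new_link(void *buf, int len)
{
	struct nlmsghdr *nh;

	for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
		if (nh->nlmsg_type == RTM_NEWLINK) {
			struct ifinfomsg *ifi = NLMSG_DATA(nh);

			return ifi->ifi_index;
		}
	}
	return -1;
}
```

A daemon would call `open_link_monitor()` once, then loop on `recv()` and hand each buffer to `first_new_link()`; the returned index can be turned into a name with `if_indextoname()` before configuring the address.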
[NET_SCHED]: cls_fw: fix NULL pointer dereference
Fix a regression from my nfmark mask patch for cls_fw.

Thomas, Jamal, do you have an idea what this old method stuff is used for? It seems it is only used during the below mentioned race.

[NET_SCHED]: cls_fw: fix NULL pointer dereference

When the first fw classifier is initialized, there is a small window between the ->init() and ->change() calls, during which the classifier is active but not entirely set up and tp->root is still NULL (->init() does nothing).

When a packet is queued during this window a NULL pointer dereference occurs in fw_classify() when trying to dereference head->mask;

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 07aac6f7b7e43bc1bb960b2f41a02e81d4e25ead
tree 523108861c92ec7e513fbc8561a57b5e1c56c1eb
parent d916faace3efc0bf19fe9a615a1ab8fa1a24cd93
author Patrick McHardy [EMAIL PROTECTED] Mon, 04 Dec 2006 16:29:07 +0100
committer Patrick McHardy [EMAIL PROTECTED] Mon, 04 Dec 2006 16:29:07 +0100

 net/sched/cls_fw.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index f59a2c4..c797d6a 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -101,9 +101,10 @@ static int fw_classify(struct sk_buff *s
 	struct fw_head *head = (struct fw_head*)tp->root;
 	struct fw_filter *f;
 	int r;
-	u32 id = skb->mark & head->mask;
+	u32 id = skb->mark;
 
 	if (head != NULL) {
+		id &= head->mask;
 		for (f=head->ht[fw_hash(id)]; f; f=f->next) {
 			if (f->id == id) {
 				*res = f->res;
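Editor's note: the shape of the fix is "do not touch `head` until it has been checked". A small user-space model of that ordering follows. The types and the `classify()` function are hypothetical, a plain modulo stands in for `fw_hash()`, and returning -1 for a NULL head is a simplification: the real `fw_classify()` falls back to the "old method" path mentioned above, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HT_SIZE 256	/* arbitrary table size for this model */

struct filter {
	uint32_t id;
	int classid;
	struct filter *next;
};

struct head {
	uint32_t mask;
	struct filter *ht[HT_SIZE];
};

/* Returns the matching class id, or -1 if nothing matches or the
 * classifier has not been set up yet (head == NULL). */
int classify(const struct head *head, uint32_t mark)
{
	uint32_t id = mark;	/* no dereference of head here */
	const struct filter *f;

	if (head == NULL)
		return -1;	/* the window between init and change */

	id &= head->mask;	/* safe: head was checked first */
	for (f = head->ht[id % HT_SIZE]; f; f = f->next)
		if (f->id == id)
			return f->classid;
	return -1;
}
```

Applying the mask in the initializer, as the regressed code did, would dereference `head` before the NULL check is reached.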
Re: Network virtualization/isolation
jamal [EMAIL PROTECTED] writes:
> On Mon, 2006-04-12 at 05:15 -0700, Eric W. Biederman wrote:
> > jamal [EMAIL PROTECTED] writes:
> >
> > Containers are a necessary first step to getting migration and checkpoint/restart assistance from the kernel.
>
> Isn't it like a MUST have if you are doing things from scratch instead of it being an afterthought?

Having the proper semantics is a MUST, which generally makes those a requirement to get consensus and to build the general mergeable solution.

The logic for serializing the state is totally uninteresting for the first pass at containers. The applications inside the containers simply don't care.

There are two basic techniques for containers.

1) Name filtering.
   Where you keep the same global identifiers as you do now, but applications inside the container are only allowed to deal with a subset of those names. The current vserver layer 3 networking approach is a handy example of this. But this can apply to process ids and just about everything else.

2) Independent namespaces. (Name duplication)
   Where you allow the same global name to refer to two different objects at the same time, with the context the reference comes from being used to resolve which global object you are talking about.

Independent namespaces are the only core requirement for migration, because they ensure when you get to the next machine you don't have a conflict with your global names.

So at this point simply allowing duplicate names is the only requirement for migration. But yes that part is a MUST.

> > > 2) the socket level bind/accept filtering with multiple IPs. From reading what Herbert has, it seems they have figured a clever way to optimize this path albeit some challenges (special casing for raw filters) etc. I am wondering if one was to use the two level muxing of the socket layer, how much more performance improvement the above scheme provides for #2?
> >
> > I don't follow this question.
> If you had the socket tables in a two level mux, first level to hash on namespace which leads to an indirection pointer to the table to find the socket and its bindings (with zero code changes to the socket code), then isn't this fast enough? Clearly you can optimize as in the case of bind/accept filtering, but then you may have to do that for every socket family/protocol (e.g. netlink doesn't have IP addresses, but the binding to multiple groups is possible).
>
> Am I making any more sense? ;-

Yes. As far as I can tell this is what we are doing and generally it doesn't even require a hash to get the namespace. Just an appropriate place to look for the pointer to the namespace structure.

The practical problem with socket lookup is that it is a hash table today; allocating the top level of that hash table dynamically at run-time looks problematic, as it is more than a single page.

> > > Consider the case of L2 where by the time the packet hits the socket layer on incoming, the VE is already known; in such a case, the lookup would be very cheap. The advantage being you get rid of the special casing altogether. I don't see any issues with binds per multiple IPs etc using such a technique.
> > >
> > > For the case of #1 above, wouldn't it be also easier if the tables for netdevices, PIDs etc were per VE (using the 2 level mux)?
> >
> > Generally yes. s/VE/namespace/. There is a case with hash tables where it seems saner to add an additional entry because it is hard to dynamically allocate a hash table (because they need something larger than a single page allocation).
>
> A page to store the namespace indirection hash doesn't seem to be such a big waste; I wonder though why you even need a page. If I had 256 hash buckets with 1024 namespaces, it is still not too much of an overhead.

Not for namespaces, the problem is for existing hash tables, like the ipv4 routing cache, and for the sockets...

> > But for everything else yes it makes things much easier if you have a per namespace data structure.
> Ok, I am sure you've done the research; I am just being a devil's advocate.

I don't think we have gone far enough to prove what has good performance.

> > A practical question is can we replace hash tables with some variant of trie or radix-tree and not take a performance hit. Given the better scaling of trees to different workload sizes if we can use them so much the better. Especially because a per namespace split gives us a lot of good properties.
> >
> > > Is there a patch somewhere I can stare at that you guys agree on?

For non networking stuff you can look at the uts and ipc namespaces that have been merged into 2.6.19. There is also the struct pid work that is a lead up to the pid namespace.

We have very carefully broken the problem by subsystem so we can do incremental steps to get container support into the kernel.

That I don't think is the answer you want; I think you are looking for networking stack agreement. If we had that we would be submitting
Re: [PATCH][XFRM] Optimize policy dumping
On Mon, 2006-04-12 at 09:05 -0500, jamal wrote:
> Patrick,
>
> Your approach is much cleaner. Let me give these a few tests then I will repost later today; forget about the callback approach for now.

I have just applied the policy patch; haven't compiled or tested (the setup takes me a while to put together). But by staring, I am seeing that you will end up with the same thing of sending a NULL or the same entry twice.

Consider a simple hypothetical test. You have one entry in the xfrm_policy_inexact table that matches. It happens to be the fifth out of 10 elements. You find it at the 5th iteration. At the sixth iteration you send it and last becomes NULL. All the way down, you call func with a NULL entry. You could add a check to make sure it only gets invoked when last is not NULL, but the result is in such a case, you will never send a 0 count element.

I am sure there could be other tricky scenarios like this that could be constructed.

Thoughts?

cheers,
jamal
Re: [PATCH][XFRM] Optimize policy dumping
jamal wrote:
> On Mon, 2006-04-12 at 09:05 -0500, jamal wrote:
> > Patrick,
> >
> > Your approach is much cleaner. Let me give these a few tests then I will repost later today; forget about the callback approach for now.
>
> I have just applied the policy patch; haven't compiled or tested (the setup takes me a while to put together). But by staring, I am seeing that you will end up with the same thing of sending a NULL or the same entry twice.
>
> Consider a simple hypothetical test. You have one entry in the xfrm_policy_inexact table that matches. It happens to be the fifth out of 10 elements. You find it at the 5th iteration. At the sixth iteration you send it and last becomes NULL. All the way down, you call func with a NULL entry. You could add a check to make sure it only gets invoked when last is not NULL, but the result is in such a case, you will never send a 0 count element.
>
> I am sure there could be other tricky scenarios like this that could be constructed.
>
> Thoughts?

Double sending can't happen, but you're right about potentially sending a NULL ptr when after setting it to NULL we don't find any other matching elements. This patch should fix it (and is even simpler): by moving the check for pol->type != type before sending, we make sure that last always contains a valid element unless count == 0.

Also fixed an incorrect gcc warning about last_dir potentially being used uninitialized.
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 64d3938..e19ec1e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -860,33 +860,12 @@ EXPORT_SYMBOL(xfrm_policy_flush);
 int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
 {
-	struct xfrm_policy *pol;
+	struct xfrm_policy *pol, *last = NULL;
 	struct hlist_node *entry;
-	int dir, count, error;
+	int dir, last_dir = 0, count, error;
 
 	read_lock_bh(&xfrm_policy_lock);
 	count = 0;
-	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
-		struct hlist_head *table = xfrm_policy_bydst[dir].table;
-		int i;
-
-		hlist_for_each_entry(pol, entry,
-				     &xfrm_policy_inexact[dir], bydst) {
-			if (pol->type == type)
-				count++;
-		}
-		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
-			hlist_for_each_entry(pol, entry, table + i, bydst) {
-				if (pol->type == type)
-					count++;
-			}
-		}
-	}
-
-	if (count == 0) {
-		error = -ENOENT;
-		goto out;
-	}
 
 	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
 		struct hlist_head *table = xfrm_policy_bydst[dir].table;
@@ -896,21 +875,35 @@ int xfrm_policy_walk(u8 type, int (*func
 				     &xfrm_policy_inexact[dir], bydst) {
 			if (pol->type != type)
 				continue;
-			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-			if (error)
-				goto out;
+			if (last) {
+				error = func(last, last_dir % XFRM_POLICY_MAX,
+					     ++count, data);
+				if (error)
+					goto out;
+			}
+			last = pol;
+			last_dir = dir;
 		}
 		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
 			hlist_for_each_entry(pol, entry, table + i, bydst) {
 				if (pol->type != type)
 					continue;
-				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-				if (error)
-					goto out;
+				if (last) {
+					error = func(last, last_dir % XFRM_POLICY_MAX,
+						     ++count, data);
+					if (error)
+						goto out;
+				}
+				last = pol;
+				last_dir = dir;
 			}
 		}
 	}
-	error = 0;
+	if (count == 0) {
+		error = -ENOENT;
+		goto out;
+	}
+	error = func(last, last_dir % XFRM_POLICY_MAX, 0, data);
 out:
 	read_unlock_bh(&xfrm_policy_lock);
 	return error;
Re: Network virtualization/isolation
Dmitry Mishin [EMAIL PROTECTED] writes: On Sunday 03 December 2006 19:00, Eric W. Biederman wrote: Ok. Just a quick summary of where I see the discussion. We all agree that L2 isolation is needed at some point. As we all agreed on this, maybe it is time to send patches one-by-one? For the beginning, I propose to resend Cedric's empty namespace patch as base for others - it is really empty, but necessary in order to move further. After this patch and the following net namespace unshare patch are accepted, I could send network devices virtualization patches for review and discussion. What do you think? I think sending out these patches for review sounds great. For merge order I think enabling the unshare/clone flags to anyone but developers should be about the last thing we do. Starting with clone/unshare sounds to me like hitching up the cart before it is built. I really need to focus on finishing up the pid namespace, so except for a little review and conversation I'm not going to help much on the network side. Of course I need to mess with unix domain sockets to properly implement the pid namespace, because of the pid credential passing. Eric
Re: [PATCH][XFRM] Optimize policy dumping
Patrick McHardy wrote: jamal wrote: All the way down, you call func with a NULL entry. You could add a check to make sure it only gets invoked when last is not null, but the result is in such a case, you will never send a 0 count element. I am sure there could be other tricky scenarios like this that could be constructed. Thoughts. Double sending can't happen, but you're right about potentially sending a NULL ptr when after setting it to NULL we don't find any other matching elements. This patch should fix it (and is even simpler), by moving the check for pol->type != type before sending, we make sure that last always contains a valid element unless count == 0. Also fixed an incorrect gcc warning about last_dir potentially being used uninitialized.

And again for SAs ..

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 864962b..a5877f8 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1099,7 +1099,7 @@ int xfrm_state_walk(u8 proto, int (*func
 		    void *data)
 {
 	int i;
-	struct xfrm_state *x;
+	struct xfrm_state *x, *last = NULL;
 	struct hlist_node *entry;
 	int count = 0;
 	int err = 0;
@@ -1107,24 +1107,21 @@ int xfrm_state_walk(u8 proto, int (*func
 	spin_lock_bh(&xfrm_state_lock);
 	for (i = 0; i <= xfrm_state_hmask; i++) {
 		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (xfrm_id_proto_match(x->id.proto, proto))
-				count++;
+			if (!xfrm_id_proto_match(x->id.proto, proto))
+				continue;
+			if (last) {
+				err = func(last, ++count, data);
+				if (err)
+					goto out;
+			}
+			last = x;
 		}
 	}
 	if (count == 0) {
 		err = -ENOENT;
 		goto out;
 	}
-
-	for (i = 0; i <= xfrm_state_hmask; i++) {
-		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (!xfrm_id_proto_match(x->id.proto, proto))
-				continue;
-			err = func(x, --count, data);
-			if (err)
-				goto out;
-		}
-	}
+	err = func(last, 0, data);
 out:
 	spin_unlock_bh(&xfrm_state_lock);
 	return err;
Re: Multiple end-points behind same NAT
Herbert Xu wrote: Venkat Yekkirala [EMAIL PROTECTED] wrote: I am wondering if 26sec supports NAT-Traversal for multiple endpoints behind the same NAT. In looking at xfrm_tmpl it's not obvious to me that it's supported, at least going by the following from the setkey man page: When NAT-T is enabled in the kernel, policy matching for ESP over UDP packets may be done on endpoint addresses and port (this depends on the system. System that do not perform the port check cannot support multiple endpoints behind the same NAT). When using ESP over UDP, you can specify port numbers in the endpoint addresses to get the correct matching. Here is an example: spdadd 10.0.11.0/24[any] 10.0.11.33/32[any] any -P out ipsec esp/tunnel/192.168.0.1[4500]-192.168.1.2[3]/require ; Or is this to be accomplished in a different way? It depends on whether it's transport mode or tunnel mode. In tunnel mode it should work just fine. Transport mode on the other hand has fundamental problems with NAT-T that go beyond the Linux implementation.

We are experiencing problems when using tunnel mode. Consider the example where the responder is 10.1.0.100 and there are two clients (192.168.0.100 and 192.168.0.101) behind a single NAT. The translated address is 10.1.0.200. We are having the IKE daemon (racoon) generate policy based on the initiator's policy. When 192.168.0.100 initiates a connection to 10.1.0.100, racoon creates and inserts the following SAs:

10.1.0.100[4500] - 10.1.0.200[4500]
10.1.0.200[4500] - 10.1.0.100[4500]

4500 is the NAT-T encapsulation port on the dst and src, passed in through the SADB_X_EXT_NAT_T*PORT messages.

Policy is then generated of the form (omitting fwd policies):

192.168.1.100[any] 10.1.0.100[any] any in prio def ipsec esp/tunnel/10.1.0.200-10.1.0.100/require
10.1.0.100[any] 192.168.1.100[any] any out prio def ipsec esp/tunnel/10.1.0.100-10.1.0.200/require

Everything works fine at this point :)

When the other client behind the NAT initiates a connection, the following SAs and SPD entries are created and inserted:

10.1.0.100[1024] - 10.1.0.200[4500]
10.1.0.200[4500] - 10.1.0.100[1024]

192.168.1.101[any] 10.1.0.100[any] any in prio def ipsec esp/tunnel/10.1.0.200-10.1.0.100/require
10.1.0.100[any] 192.168.1.101[any] any out prio def ipsec esp/tunnel/10.1.0.100-10.1.0.200/require

This is where things break down :( If the first client sends a message to the responder, the response gets sent to the second client. In fact, if you add more clients, responses to *all* of the clients will use the last outbound SA generated and therefore go to the last connected client, because it will be using that encapsulation port.

I believe (I'll be confirming in a bit) that racoon is sending the encap port info in the SPD, but that info is never used by the kernel. It would seem that information must be retained with the xfrm_tmpl, and used in the SA selection process (compared with the encap info in the xfrm_state), for multiple clients to work.

Does the above scenario seem to have the SAs and SPDs set up correctly (we've already made some slight changes to racoon to get it to work properly on Linux...)? What is the mechanism that would tie the SPD to particular SAs and allow it to use the SA with the appropriate encap information when the tunnel endpoint addresses are the same (clients behind the same NAT)? If something isn't clear in my explanation of the behavior that we are experiencing, please ask (I hope I got it all right).

Thanks, Darrel
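To make the ambiguity described above concrete, here is a small userspace C sketch. It is not the kernel's lookup and does not use the real xfrm structures: struct sketch_sa, struct sketch_tmpl and sketch_find_sa are hypothetical names, the "newest entry wins" scan is an assumption chosen to mimic the reported symptom, and treating the two port values from the message (4500 and 1024) as the per-client encapsulation ports is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's xfrm structures. */
struct sketch_sa {
	uint32_t saddr, daddr;	/* outer tunnel endpoint addresses */
	uint16_t encap_dport;	/* UDP encapsulation destination port */
	int id;
};

struct sketch_tmpl {
	uint32_t saddr, daddr;
	uint16_t encap_dport;	/* 0 means "no port recorded in the policy" */
};

/*
 * Scan from the most recently added SA backwards. When the template
 * carries no port, two clients behind one NAT are indistinguishable and
 * the newest SA always wins; with a port, each template finds its own SA.
 */
static const struct sketch_sa *sketch_find_sa(const struct sketch_sa *sas,
					      size_t n,
					      const struct sketch_tmpl *t)
{
	size_t i;

	for (i = n; i-- > 0; ) {
		if (sas[i].saddr != t->saddr || sas[i].daddr != t->daddr)
			continue;
		if (t->encap_dport && sas[i].encap_dport != t->encap_dport)
			continue;
		return &sas[i];
	}
	return NULL;
}
```

With two outbound SAs that share 10.1.0.100 and 10.1.0.200 as endpoints, a template without a port resolves to the second SA for both clients, while templates carrying 4500 and 1024 resolve to the first and second SA respectively.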
Re: Network virtualization/isolation
On Monday 04 December 2006 18:35, Eric W. Biederman wrote: [skip] Where and when you look to find the network namespace that applies to a packet is the primary difference between the OpenVZ L2 implementation and my L2 implementation. If there is a better and less intrusive while still being obvious method I am all for it. I do not like the OpenVZ thing of doing the lookup once and then stashing the value in current and then special casing the exceptions. Why? -- Thanks, Dmitry.
Re: [NET_SCHED]: cls_fw: fix NULL pointer dereference
On Mon, 2006-04-12 at 16:34 +0100, Patrick McHardy wrote: Fix a regression from my nfmark mask patch for cls_fw. Thomas, Jamal, do you have an idea what this old method stuff is used for? It seems it is only used during the below mentioned race. AFAIK, that has been there forever. Alexey may know. I am not sure if removing it will break any scripts etc. cheers, jamal
Re: [RFC][GENETLINK] move command capabilities to flags
* jamal [EMAIL PROTECTED] 2006-12-04 08:01 The bytes saved are one aspect; the other is the cleanliness. Transferring a boolean in that many bits is a bit of overkill. I think it is better to fix it now than later. I know you mentioned libnl uses it. But that is something you can change on your side. I don't know of any other app that uses it. Right but if some distro is based on exactly 2.6.19 the specific libnl parts won't work at all for a long time. If you really want to do it, remove the obsoleted attribute types, I don't like dead bodies lying around :-) I could resend the patch getting rid of those definitions. OK, I can live with it, I don't care for the breakage much.
Re: [RFC][GENETLINK] introduce command names
* jamal [EMAIL PROTECTED] 2006-12-04 08:09 Just makes the discovery more knowledgeable. There's a hidden meaning in that I would like, if possible, to create as much of user space as possible without the user having a single line written. Here's how I output the discovered families at the moment without the patch.

[EMAIL PROTECTED]:~/git-trees/iproute2/nov22/genl$ ./genl ctrl ls
Added Family Name: nlctrl ID: 0x10 Version: 0x1 header size: 0 max attribs: 6
commands supported:
#1: ID-0x3 flags-0x0 Capabilities: has policy; can doit; can dumpit
Added Family Name: TASKSTATS ID: 0x11 Version: 0x1 header size: 0 max attribs: 4
commands supported:
#1: ID-0x1 flags-0x0 Capabilities: has policy; can doit;
[EMAIL PROTECTED]:~/git-trees/iproute2/nov22/genl$

It would be a lot more human friendly to put better readability in the commands.

I don't agree to waste so much text section just to fancy up some userspace tool which is mainly a toy while developing. If you really need it, do it in userspace like libnl. Userspace should be aware of operation numbers when using it. I'm all for this if the direction is to move towards having some form of scriptable genetlink tool which can be used to communicate with simple genetlink families.

That is the real agenda actually. To be honest I don't know how realistic it would be. But one of the next things is to output the command policies.

Once we go that path we can reconsider a patch based on this which includes the bits to dump the information to userspace. Until then I don't see the point in this.
Re: [NET_SCHED]: cls_fw: fix NULL pointer dereference
* jamal [EMAIL PROTECTED] 2006-12-04 11:25 On Mon, 2006-04-12 at 16:34 +0100, Patrick McHardy wrote: Fix a regression from my nfmark mask patch for cls_fw. Thomas, Jamal, do you have an idea what this old method stuff is used for? It seems it is only used during the below mentioned race. AFAIK, that has been there forever. Alexey may know. I am not sure if removing it will break any scripts etc. You mean the scripts get upset when the kernel oopses? Very good spotting Patrick!
Re: [NET_SCHED]: cls_fw: fix NULL pointer dereference
Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-12-04 11:25 On Mon, 2006-04-12 at 16:34 +0100, Patrick McHardy wrote: Fix a regression from my nfmark mask patch for cls_fw. Thomas, Jamal, do you have an idea what this old method stuff is used for? It seems it is only used during the below mentioned race. AFAIK, that has been there forever. Alexey may know. I am not sure if removing it will break any scripts etc. You mean the scripts get upset when the kernel oopses? Well, it won't oops without my broken patch :) It just seems this code is entirely useless and the only thing it does is cause short term unexpected behaviour during the race I mentioned. One thing we should probably do is to move the tp->root allocation to the init function in cls_fw and the others implementing it as dummy, to at least close the race between ->init and ->change. I'll look into that as a follow-up patch.
Re: Network virtualization/isolation
On Mon, Dec 04, 2006 at 06:19:00PM +0300, Dmitry Mishin wrote: On Sunday 03 December 2006 19:00, Eric W. Biederman wrote: Ok. Just a quick summary of where I see the discussion. We all agree that L2 isolation is needed at some point. As we all agreed on this, maybe it is time to send patches one-by-one? For the beginning, I propose to resend Cedric's empty namespace patch as base for others - it is really empty, but necessary in order to move further. After this patch and the following net namespace unshare patch are accepted,

well, I have not yet seen any performance tests showing that the following is true:
- no change on network performance without the space enabled
- no change on network performance on the host with the network namespaces enabled
- no measurable overhead inside the network namespace
- good scalability for a larger number of network namespaces

I could send network devices virtualization patches for review and discussion.

that won't hurt ... best, Herbert

What do you think? The approaches discussed for L2 and L3 are sufficiently orthogonal that we can implement them in either order. You would need to unshare L3 to unshare L2, but if we think of them as two separate namespaces we are likely to be in better shape. The L3 discussion still has the problem that there has not been agreement on all of the semantics yet. More comments after I get some sleep. Eric -- Thanks, Dmitry.
Re: Network virtualization/isolation
Dmitry Mishin [EMAIL PROTECTED] writes: On Monday 04 December 2006 18:35, Eric W. Biederman wrote: [skip] Where and when you look to find the network namespace that applies to a packet is the primary difference between the OpenVZ L2 implementation and my L2 implementation. If there is a better and less intrusive while still being obvious method I am all for it. I do not like the OpenVZ thing of doing the lookup once and then stashing the value in current and then special casing the exceptions. Why? I like it when things are obvious and not implied. The implementations seem to favor fewer lines of code touched over maintainability of the code. Which if you are maintaining out of tree code is fine. At least that was my impression last time I looked at the code. I know there are a lot of silly things in the existing implementations because they were initially written without the expectation of being able to merge the code into the main kernel. This resulted in some non-general interfaces, and a preference for patches that touch as few lines of code as possible. Anyway this bit has been discussed before and we can discuss it seriously in the context of patch review. Eric
Re: RFC: consistent disable_xfrm behaviour
Hello! Alexey, do you remember what the original intent of this was? disable_policy was supposed to skip policy checks on input. It makes sense only on input device. disable_xfrm was supposed to skip transformations on output. It makes sense only on output device. If it does not work, it was done wrong. :-) As I see it, the root of the problem is that the DST_NOXFRM flag is calculated using the wrong device. out_dev should be used in __mkroute_input(). It looks like a cut-n-paste error: the code was taken from the output path, where it is correct.
Re: [NET_SCHED]: cls_fw: fix NULL pointer dereference
* Patrick McHardy [EMAIL PROTECTED] 2006-12-04 17:39 It just seems this code is entirely useless and the only thing it does is cause short term unexpected behaviour during the race I mentioned. Yes, the whole head == NULL branch should be removed. One thing we should probably do is to move the tp->root allocation to the init function in cls_fw and the others implementing it as dummy, to at least close the race between ->init and ->change. I'll look into that as a follow-up patch. Right, allocating the head in init with a mask of 0x and then allowing the user to overwrite it seems to make most sense.
Re: Network virtualization/isolation
Herbert Poetzl [EMAIL PROTECTED] writes: On Mon, Dec 04, 2006 at 06:19:00PM +0300, Dmitry Mishin wrote: On Sunday 03 December 2006 19:00, Eric W. Biederman wrote: Ok. Just a quick summary of where I see the discussion. We all agree that L2 isolation is needed at some point. As we all agreed on this, maybe it is time to send patches one-by-one? For the beginning, I propose to resend Cedric's empty namespace patch as base for others - it is really empty, but necessary in order to move further. After this patch and the following net namespace unshare patch are accepted,

well, I have not yet seen any performance tests showing that the following is true:
- no change on network performance without the space enabled
- no change on network performance on the host with the network namespaces enabled
- no measurable overhead inside the network namespace
- good scalability for a larger number of network namespaces

Yes, all important criteria for selecting the implementation. Eric
Re: RFC: consistent disable_xfrm behaviour
Alexey Kuznetsov wrote: Hello! Alexey, do you remember what the original intent of this was? disable_policy was supposed to skip policy checks on input. It makes sense only on input device. disable_xfrm was supposed to skip transformations on output. It makes sense only on output device. If it does not work, it was done wrong. :-) As I see it, root of the problem is that DST_NOXFRM flag is calculated using wrong device. out_dev should be used in __mkroute_input(). It looks as a cut-n-paste error, the code was taken from output path, where it is correct.

Thanks, that's exactly what I suspected :) Here's the patch again properly signed off.

[XFRM]: Use output device disable_xfrm for forwarded packets

Currently the behaviour of disable_xfrm is inconsistent between locally generated and forwarded packets. For locally generated packets disable_xfrm disables the policy lookup if it is set on the output device, for forwarded traffic however it looks at the input device. This makes it impossible to disable xfrm on all devices but a dummy device and use normal routing to direct traffic to that device.

Always use the output device when checking disable_xfrm.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9f3924c..164a7ee 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1780,7 +1780,7 @@ #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
 #endif
 	if (in_dev->cnf.no_policy)
 		rth->u.dst.flags |= DST_NOPOLICY;
-	if (in_dev->cnf.no_xfrm)
+	if (out_dev->cnf.no_xfrm)
 		rth->u.dst.flags |= DST_NOXFRM;
 	rth->fl.fl4_dst	= daddr;
 	rth->rt_dst	= daddr;
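The rule stated above (policy checks follow the input device, transformations follow the output device) can be shown with a tiny userspace C sketch. It is only an illustration: struct sketch_dev, sketch_forward_flags and the SKETCH_* flag values are hypothetical stand-ins and are not the kernel's in_device, route cache or DST_* definitions.

```c
/* Illustrative flag values only; not the kernel's DST_* constants. */
#define SKETCH_DST_NOXFRM	0x2u
#define SKETCH_DST_NOPOLICY	0x4u

/* Hypothetical per-device settings, mirroring disable_policy/disable_xfrm. */
struct sketch_dev {
	int no_policy;	/* skip policy checks for packets arriving here */
	int no_xfrm;	/* skip transformations for packets leaving here */
};

/*
 * Flags for a forwarded packet: no_policy is taken from the device the
 * packet came in on, no_xfrm from the device it will go out on.
 */
static unsigned int sketch_forward_flags(const struct sketch_dev *in_dev,
					 const struct sketch_dev *out_dev)
{
	unsigned int flags = 0;

	if (in_dev->no_policy)
		flags |= SKETCH_DST_NOPOLICY;
	if (out_dev->no_xfrm)
		flags |= SKETCH_DST_NOXFRM;
	return flags;
}
```

Under this rule, setting no_xfrm on every device except a dummy one means traffic routed out through the dummy device is still transformed, regardless of which device it arrived on, which is the use case the commit message describes.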
Re: Network virtualization/isolation
On Monday 04 December 2006 19:43, Herbert Poetzl wrote: On Mon, Dec 04, 2006 at 06:19:00PM +0300, Dmitry Mishin wrote: On Sunday 03 December 2006 19:00, Eric W. Biederman wrote: Ok. Just a quick summary of where I see the discussion. We all agree that L2 isolation is needed at some point. As we all agreed on this, maybe it is time to send patches one-by-one? For the beginning, I propose to resend Cedric's empty namespace patch as base for others - it is really empty, but necessary in order to move further. After this patch and the following net namespace unshare patch are accepted,

well, I have not yet seen any performance tests showing that the following is true:
- no change on network performance without the space enabled
- no change on network performance on the host with the network namespaces enabled
- no measurable overhead inside the network namespace
- good scalability for a larger number of network namespaces

These questions are for a complete L2 implementation, not for these 2 empty patches. If you need some data relating to Andrey's implementation, I'll get it. Which tests would you accept?

I could send network devices virtualization patches for review and discussion.

that won't hurt ... best, Herbert

What do you think? The approaches discussed for L2 and L3 are sufficiently orthogonal that we can implement them in either order. You would need to unshare L3 to unshare L2, but if we think of them as two separate namespaces we are likely to be in better shape. The L3 discussion still has the problem that there has not been agreement on all of the semantics yet. More comments after I get some sleep. Eric

-- Thanks, Dmitry.
[patch] add net_device_stats support to ethtool
Hi, ethtool -S only supports devices that have custom code written to print the stats. A lot of drivers use struct net_device_stats, so adding code to ethtool would make it very easy for such drivers to add support for ethtool -S. The drivers would just need to add this:

	.get_strings = ethtool_op_net_device_stats_get_strings,
	.get_stats_count = ethtool_op_net_device_stats_get_stats_count,
	.get_ethtool_stats = ethtool_op_net_device_get_ethtool_stats,

to their struct ethtool_ops (The function names are not the best... Suggestions for better ones are welcome)

Is there any interest to have this in the kernel? The patch contains a straightforward implementation for the .get_strings, .get_stats_count and .get_ethtool_stats ethtool_ops methods.

Thanks --Dan

--- ethtool.c~ 2004-12-24 13:35:50.0 -0800
+++ ethtool.c 2006-12-01 08:55:26.0 -0800
@@ -809,6 +809,64 @@
 	return -EOPNOTSUPP;
 }
 
+
+#define NET_DEVICE_NUM_STATS (sizeof(struct net_device_stats) / sizeof(unsigned long))
+
+static struct {
+	const char string[ETH_GSTRING_LEN];
+} ethtool_net_device_stats_keys[NET_DEVICE_NUM_STATS] = {
+	{ "rx_packets" },
+	{ "tx_packets" },
+	{ "rx_bytes" },
+	{ "tx_bytes" },
+	{ "rx_errors" },
+	{ "tx_errors" },
+	{ "rx_dropped" },
+	{ "tx_dropped" },
+	{ "multicast" },
+	{ "collisions" },
+	{ "rx_length_errors" },
+	{ "rx_over_errors" },
+	{ "rx_crc_errors" },
+	{ "rx_frame_errors" },
+	{ "rx_fifo_errors" },
+	{ "rx_missed_errors" },
+	{ "tx_aborted_errors" },
+	{ "tx_carrier_errors" },
+	{ "tx_fifo_errors" },
+	{ "tx_heartbeat_errors" },
+	{ "tx_window_errors" },
+	{ "rx_compressed" },
+	{ "tx_compressed" }
+};
+
+int ethtool_op_net_device_stats_get_stats_count(struct net_device *dev)
+{
+	return NET_DEVICE_NUM_STATS;
+}
+
+void ethtool_op_net_device_stats_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+	switch (stringset) {
+	case ETH_SS_STATS:
+		memcpy(buf, ethtool_net_device_stats_keys, sizeof(ethtool_net_device_stats_keys));
+		break;
+	default:
+		WARN_ON(1); /* we need a WARN() */
+		break;
+	}
+}
+
+void ethtool_op_net_device_get_ethtool_stats(struct net_device *dev,
+	struct ethtool_stats *estats, u64 *tmp_stats)
+{
+	u32 i;
+	u64 *dest = tmp_stats;
+	unsigned long *src = (unsigned long*)dev->get_stats(dev);
+	for (i = 0; i < estats->n_stats; i++)
+		*dest++ = *src++;
+}
+
 EXPORT_SYMBOL(dev_ethtool);
 EXPORT_SYMBOL(ethtool_op_get_link);
 EXPORT_SYMBOL(ethtool_op_get_sg);
@@ -817,3 +875,6 @@
 EXPORT_SYMBOL(ethtool_op_set_sg);
 EXPORT_SYMBOL(ethtool_op_set_tso);
 EXPORT_SYMBOL(ethtool_op_set_tx_csum);
+EXPORT_SYMBOL(ethtool_op_net_device_stats_get_stats_count);
+EXPORT_SYMBOL(ethtool_op_net_device_stats_get_strings);
+EXPORT_SYMBOL(ethtool_op_net_device_get_ethtool_stats);
--- ethtool.h~ 2006-09-05 12:29:45.0 -0700
+++ ethtool.h 2006-12-01 08:51:46.0 -0800
@@ -260,6 +260,12 @@
 int ethtool_op_set_sg(struct net_device *dev, u32 data);
 u32 ethtool_op_get_tso(struct net_device *dev);
 int ethtool_op_set_tso(struct net_device *dev, u32 data);
+int ethtool_op_net_device_stats_get_stats_count(struct net_device *dev);
+void ethtool_op_net_device_stats_get_strings(struct net_device *dev,
+	u32 stringset, u8 *buf);
+void ethtool_op_net_device_get_ethtool_stats(struct net_device *dev,
+	struct ethtool_stats *estats,
+	u64 *tmp_stats);
 
 /**
  * ethtool_ops - Alter and report network device settings
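The core trick in the patch above, treating a struct made only of unsigned long counters as an array and widening each entry to 64 bits, can be tried in userspace with a cut-down sketch. Everything here is hypothetical (struct sketch_stats, SKETCH_NUM_STATS, sketch_stat_names, sketch_copy_stats); it uses four counters instead of the full struct net_device_stats, and it adds a bound on the copy count, which is a choice of this sketch and not something the posted patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down counter block; every member is an unsigned long. */
struct sketch_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
	unsigned long rx_bytes;
	unsigned long tx_bytes;
};

#define SKETCH_NUM_STATS (sizeof(struct sketch_stats) / sizeof(unsigned long))

/* Names in the same order as the struct members above. */
static const char *const sketch_stat_names[SKETCH_NUM_STATS] = {
	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes",
};

/*
 * Copy up to n_out counters into a u64 array and return how many were
 * copied. Assumes the struct has no padding between members, which holds
 * in practice when all members share one type.
 */
static size_t sketch_copy_stats(const struct sketch_stats *s,
				uint64_t *out, size_t n_out)
{
	const unsigned long *src = (const unsigned long *)s;
	size_t n = n_out < SKETCH_NUM_STATS ? n_out : SKETCH_NUM_STATS;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = src[i];	/* widens on 32-bit targets */
	return n;
}
```

The name table and the struct have to be kept in the same order by hand; a mismatch would silently label counters wrongly, which is one reason the function names and layout in such a helper deserve review.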
[PATCH 0/2] NetXen: 1G/10G Ethernet Driver updates
Hi All, We have incorporated the feedback received on the last post. We have removed the bounce buffer as well. I will be sending patches shortly. These patches are against netdev#master. Thanks, --Amit
Re: [PATCH] [IPVS] transparent proxying
Hi Jinhua, home_king wrote: I am afraid that the method used in the patch is not native, it breaks on IP fragments. IPVS is a kind of layer-4 switching, it routes packets by checking layer-4 information such as address and port number. ip_vs_in() is hooked at NF_IP_LOCAL_IN, so that all the packets received by ip_vs_in() are already defragmented. On the NF_IP_FORWARD hook, there may be some IP fragments; ip_vs_in() cannot handle those IP fragments.

However, your analysis is a bit inaccurate, I think. As far as I know, policy routing in conjunction with fwmark works only under some preconditions, the most important of which is the defragmentation function provided by NF_CONNTRACK. That is, the routing core works at layer 3, and it can even route all IP fragments just by the IP info, and doesn't care about the layer 4 info, such as service ports. Firewall mark gets layer 4 involved. To retrieve the full layer 4 header, netfilter has no choice but to do defragmentation for the routing core, which is the key function of NF_CONNTRACK. In a word, without NF_CONNTRACK, neither policy routing nor my patch can deal with the defragmentation problem!

OK, it's my mistake. :-) We mark packets according to port number and those packets go through the FORWARD chain by default, so we have to do defragmentation before firewall-marking.

I will give you some proof of my words. LOOKING at the NETFILTER SOURCE: See the quotation below from /usr/include/linux/netfilter_ipv4.h. The defragmentation of netfilter owns the highest priority, so the corresponding hook will be called before any other hooks, including ipvs and iptables.

-quote start-
enum nf_ip_hook_priorities {
	NF_IP_PRI_FIRST = INT_MIN,
	NF_IP_PRI_CONNTRACK_DEFRAG = -400,
	...
	NF_IP_PRI_LAST = INT_MAX,
};
-quote end-

And see ip_conntrack_standalone.c, which defines the defrag hooks on the PREROUTING chain and the OUTPUT chain with NF_IP_PRI_CONNTRACK_DEFRAG priority.
Needless to say, all packets which flow on the INPUT and FORWARD chains are already defragmented by it; in other words, once CONNTRACK is enabled, you cannot see any fragment in the INPUT and FORWARD chains, or even in the other chains.

-quote start-
static struct nf_hook_ops ip_conntrack_defrag_ops = {
	.hook		= ip_conntrack_defrag,
	.owner		= THIS_MODULE,
	.pf		= PF_INET,
	.hooknum	= NF_IP_PRE_ROUTING,
	.priority	= NF_IP_PRI_CONNTRACK_DEFRAG,
};

static struct nf_hook_ops ip_conntrack_defrag_local_out_ops = {
	.hook		= ip_conntrack_defrag,
	.owner		= THIS_MODULE,
	.pf		= PF_INET,
	.hooknum	= NF_IP_LOCAL_OUT,
	.priority	= NF_IP_PRI_CONNTRACK_DEFRAG,
};
-quote end-

On the other hand, I wrote a simple program, test_udp_fragment.c, to test it.

-test code start--
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
#ifndef AS_SERVER
	if (argc < 2) {
		printf("SYNTAX: %s server ip\n", argv[0]);
		exit(EXIT_SUCCESS);
	}
#endif
	int sockfd;
	sockfd = socket(PF_INET, SOCK_DGRAM, 0);
	if (sockfd < 0) {
		perror("socket");
		exit(EXIT_FAILURE);
	}

#define MSG_SIZE 1 /* bigger than MTU */
#define BUF_SIZE MSG_SIZE+1
	char buf[BUF_SIZE];
	memset(buf, 0, BUF_SIZE);

	struct sockaddr_in test_addr;
	memset(&test_addr, 0, sizeof(test_addr));
	test_addr.sin_family = AF_INET;
	test_addr.sin_port = htons(1);

#ifdef AS_SERVER
	/* Server: bind to all addresses and print each received message. */
	test_addr.sin_addr.s_addr = inet_addr("0.0.0.0");
	if (bind(sockfd, (struct sockaddr *) &test_addr, sizeof(test_addr)) < 0) {
		perror("bind");
		exit(EXIT_FAILURE);
	}
	ssize_t r = 0;
	while (1) {
		r = recv(sockfd, buf, MSG_SIZE, MSG_WAITALL);
		if (r < MSG_SIZE) {
			printf("truncated!\n");
			exit(EXIT_FAILURE);
		}
		printf("recv message: %s\n", buf);
	}
#else
	/* Client: send one MSG_SIZE-byte datagram filled with 'A'. */
	memset(buf, 'A', MSG_SIZE);
	test_addr.sin_addr.s_addr = inet_addr(argv[1]);
	ssize_t s = 0;
	s = sendto(sockfd, buf, MSG_SIZE, 0, (struct sockaddr *) &test_addr, sizeof(test_addr));
	if (s != MSG_SIZE) {
		perror("send failed");
		exit(EXIT_FAILURE);
	}
#endif
	exit(EXIT_SUCCESS);
}
-test code end--

The program above implements a simple UDP server and a simple UDP client.
The client sends a message of MSG_SIZE bytes (filled with 'A') to the server, and the server receives and prints out the message. The MSG_SIZE (here I take 1 as an example) is far bigger than the normal Ethernet NIC MTU (1500), so the output message will be fragmented. Assume the IP of the server (SIP) is 172.16.100.254 and the IP of the client (CIP) is 172.16.100.63. The default gateway IP of the client is SIP. I do the below settings:

@ Server
# Mark the client's udp access
iptables -t mangle -A PREROUTING -p udp -s 172.16.100.63 --dport 1 \
	-j MARK --set-mark 1
# REDIRECT the forward packets marked with 1 to
[PATCH 1/2] NetXen: whitespace cleanup and more cleanup fixes
Signed-off-by: Amit S. Kale [EMAIL PROTECTED] netxen_nic.h | 56 -- netxen_nic_ethtool.c | 53 +--- netxen_nic_hdr.h |6 ++--- netxen_nic_hw.c | 54 + netxen_nic_hw.h | 10 netxen_nic_init.c| 61 +-- netxen_nic_ioctl.h |6 ++--- netxen_nic_isr.c | 48 +--- netxen_nic_main.c| 54 + netxen_nic_niu.c | 10 10 files changed, 165 insertions(+), 193 deletions(-) diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h index d925053..d51f437 100644 --- a/drivers/net/netxen/netxen_nic.h +++ b/drivers/net/netxen/netxen_nic.h @@ -1,25 +1,25 @@ /* * Copyright (C) 2003 - 2006 NetXen, Inc. * All rights reserved. - * + * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. - * + * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. - * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, * MA 02111-1307, USA. - * + * * The full GNU General Public License is included in this distribution * in the file called LICENSE. 
- * + * * Contact Information: *[EMAIL PROTECTED] * NetXen, @@ -89,8 +89,8 @@ * normalize a 64MB crb address to 32MB PCI window * To use NETXEN_CRB_NORMALIZE, window _must_ be set to 1 */ -#define NETXEN_CRB_NORMAL(reg)\ - (reg) - NETXEN_CRB_PCIX_HOST2 + NETXEN_CRB_PCIX_HOST +#define NETXEN_CRB_NORMAL(reg) \ + ((reg) - NETXEN_CRB_PCIX_HOST2 + NETXEN_CRB_PCIX_HOST) #define NETXEN_CRB_NORMALIZE(adapter, reg) \ pci_base_offset(adapter, NETXEN_CRB_NORMAL(reg)) @@ -164,7 +164,7 @@ enum { #define MAX_CMD_DESCRIPTORS1024 #define MAX_RCV_DESCRIPTORS32768 -#define MAX_JUMBO_RCV_DESCRIPTORS 1024 +#define MAX_JUMBO_RCV_DESCRIPTORS 4096 #define MAX_RCVSTATUS_DESCRIPTORS MAX_RCV_DESCRIPTORS #define MAX_JUMBO_RCV_DESC MAX_JUMBO_RCV_DESCRIPTORS #define MAX_RCV_DESC MAX_RCV_DESCRIPTORS @@ -559,7 +559,7 @@ typedef enum { #define PRIMARY_START (BOOTLD_START) #define FLASH_CRBINIT_SIZE (0x4000) #define FLASH_BRDCFG_SIZE (sizeof(struct netxen_board_info)) -#define FLASH_USER_SIZE(sizeof(netxen_user_info)/sizeof(u32)) +#define FLASH_USER_SIZE(sizeof(struct netxen_user_info)/sizeof(u32)) #define FLASH_SECONDARY_SIZE (USER_START-SECONDARY_START) #define NUM_PRIMARY_SECTORS(0x20) #define NUM_CONFIG_SECTORS (1) @@ -572,7 +572,7 @@ typedef enum { #else #define DPRINTK(klevel, fmt, args...) do { \ printk(KERN_##klevel PFX %s: %s: fmt, __FUNCTION__,\ - (adapter != NULL adapter-port != NULL \ + (adapter != NULL \ adapter-port[0] != NULL \ adapter-port[0]-netdev != NULL) ? 
\ adapter-port[0]-netdev-name : NULL, \ @@ -703,8 +703,6 @@ struct netxen_recv_context { #define NETXEN_NIC_MSI_ENABLED 0x02 -struct netxen_drvops; - struct netxen_adapter { struct netxen_hardware_context ahw; int port_count; /* Number of configured ports */ @@ -746,8 +744,21 @@ struct netxen_adapter { struct netxen_recv_context recv_ctx[MAX_RCV_CTX]; int is_up; - int work_done; - struct netxen_drvops *ops; + int (*enable_phy_interrupts) (struct netxen_adapter *, int); + int (*disable_phy_interrupts) (struct netxen_adapter *, int); + void (*handle_phy_intr) (struct netxen_adapter *); + int (*macaddr_set) (struct netxen_port *, netxen_ethernet_macaddr_t); + int (*set_mtu) (struct netxen_port *, int); + int (*set_promisc) (struct netxen_adapter *, int, + netxen_niu_prom_mode_t); + int (*unset_promisc) (struct netxen_adapter *, int, + netxen_niu_prom_mode_t); + int (*phy_read) (struct netxen_adapter *, long phy, long reg, u32 *); + int (*phy_write) (struct netxen_adapter *, long phy, long reg, u32 val); + int (*init_port) (struct netxen_adapter *, int); + void (*init_niu) (struct netxen_adapter *); + int (*stop_port) (struct netxen_adapter *, int); + };
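The NETXEN_CRB_NORMAL hunk in this patch wraps the whole macro expansion in parentheses. A minimal user-space sketch of why that matters follows. The DEMO_HOST and DEMO_HOST2 values and the helper names are placeholders made up for illustration, not the real NetXen constants or driver code.

```c
#include <stdint.h>

/* Placeholder offsets for illustration only; not the real NetXen values. */
#define DEMO_HOST2 0x100u
#define DEMO_HOST  0x040u

/* Old form: the expansion as a whole is not parenthesized. */
#define DEMO_NORMAL_OLD(reg) (reg) - DEMO_HOST2 + DEMO_HOST
/* New form, as in the patch: the whole expression is parenthesized. */
#define DEMO_NORMAL_NEW(reg) ((reg) - DEMO_HOST2 + DEMO_HOST)

/* Scale the normalized offset by a stride using the old macro. */
static uint32_t scaled_old(uint32_t reg, uint32_t stride)
{
	/* Expands to (reg) - DEMO_HOST2 + DEMO_HOST * stride */
	return DEMO_NORMAL_OLD(reg) * stride;
}

/* Scale the normalized offset by a stride using the new macro. */
static uint32_t scaled_new(uint32_t reg, uint32_t stride)
{
	/* Expands to ((reg) - DEMO_HOST2 + DEMO_HOST) * stride */
	return DEMO_NORMAL_NEW(reg) * stride;
}
```

With the old form, any operator that binds tighter than + or - at the call site applies only to the last term of the expansion, so the two helpers return different values for the same inputs.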
Re: RFC: consistent disable_xfrm behaviour
Hello! Here's the patch again properly signed off. I think it is correct. Alexey
Re: Network virtualization/isolation
Dmitry Mishin wrote:
On Monday 04 December 2006 19:43, Herbert Poetzl wrote:
On Mon, Dec 04, 2006 at 06:19:00PM +0300, Dmitry Mishin wrote:
On Sunday 03 December 2006 19:00, Eric W. Biederman wrote:

Ok. Just a quick summary of where I see the discussion. We all agree that L2 isolation is needed at some point.

As we all agreed on this, maybe it is time to send patches one-by-one? For the beginning, I propose to resend Cedric's empty namespace patch as base for others - it is really empty, but necessary in order to move further. After this patch and the following net namespace unshare patch will be accepted,

well, I have neither seen any performance tests showing that the following is true:
- no change on network performance without the space enabled
- no change on network performance on the host with the network namespaces enabled
- no measurable overhead inside the network namespace
- good scalability for a larger number of network namespaces

These questions are for complete L2 implementation, not for these 2 empty patches. If you need some data relating to Andrey's implementation, I'll get it. Which test do you accept?

tbench? With the following scenarios:
* intra host communication (one time with IP on eth and one time with 127.0.0.1)
* inter host communication

Each time:
- a single network namespace
- with 100 network namespaces: 1 server communicating and 99 listening but doing nothing.
Re: [PATCH][XFRM] Optimize policy dumping
On Mon, 2006-04-12 at 16:55 +0100, Patrick McHardy wrote:

This patch should fix it (and is even simpler), by moving the check for pol->type != type before sending, we make sure that last always contains a valid element unless count == 0. Also fixed an incorrect gcc warning about last_dir potentially being used uninitialized.

Ok, both looked good except for the test of a single entry. This was because you would break out of the loop with count of 0. The patch against yours would look like something attached for the state case. Dont forget there are two spots on the policy side of things;-

You can either submit both patches or i could later today. If you do, please look at some of the comments i made in the first patch and include them.

Just from the outset the numbers improvement looked the same as the way i had it. With these changes, 40K SPDs and 20K SAs dumping (subpolicies compiled out)

---
speedopolis:~# time ./ip x state
real    0m1.985s
user    0m0.000s
sys     0m1.984s

speedopolis:~# time ./ip x policy
real    0m7.901s
user    0m0.008s
sys     0m7.896s
---

As a reference point, the old numbers:

---
speedopolis:~# ./ip xf pol
real    0m13.496s
user    0m0.000s
sys     0m13.493s

speedopolis:~# time ./ip xf sta
real    0m5.321s
user    0m0.004s
sys     0m5.316s
---

Thanks a lot for your efforts Patrick.

cheers,
jamal

--- a/net/xfrm/xfrm_state.c 2006-12-04 12:06:23.0 -0500
+++ b/net/xfrm/xfrm_state.c 2006-12-04 12:07:09.0 -0500
@@ -1159,11 +1159,12 @@
 			if (!xfrm_id_proto_match(x->id.proto, proto))
 				continue;
 			if (last) {
-				err = func(last, ++count, data);
+				err = func(last, count, data);
 				if (err)
 					goto out;
 			}
 			last = x;
+			count++;
 		}
 	}
 	if (count == 0) {
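The single-entry case discussed in this message can be sketched in user space as follows. This is only a model under stated assumptions: struct demo_entry, demo_walk, demo_record and struct demo_log are hypothetical names, the proto comparison stands in for xfrm_id_proto_match(), and the tail handling after the loop (return -ENOENT when nothing matched, otherwise report the last entry with a sequence number of 0) is assumed from the description in the thread, since the hunk is cut off at that point.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an xfrm state entry. */
struct demo_entry {
	int proto;
	int id;
};

/* Callback: seq is a running number, 0 marks the final entry of the dump. */
typedef int (*demo_dump_fn)(const struct demo_entry *e, int seq, void *data);

/*
 * Walk entries[0..n) and report those matching proto. Each match is
 * reported one step late, so the last one can be sent with seq 0.
 */
static int demo_walk(const struct demo_entry *entries, size_t n, int proto,
		     demo_dump_fn func, void *data)
{
	const struct demo_entry *last = NULL;
	int count = 0;
	int err;
	size_t i;

	for (i = 0; i < n; i++) {
		if (entries[i].proto != proto)
			continue;
		if (last) {
			err = func(last, count, data);
			if (err)
				return err;
		}
		last = &entries[i];
		count++;	/* counts every match, including a lone first one */
	}
	if (count == 0)
		return -ENOENT;
	return func(last, 0, data);
}

/* Small recorder used to observe what the walker reports. */
struct demo_log {
	int ids[8];
	int seqs[8];
	int n;
};

static int demo_record(const struct demo_entry *e, int seq, void *data)
{
	struct demo_log *log = data;

	if (log->n >= 8)
		return -ENOSPC;
	log->ids[log->n] = e->id;
	log->seqs[log->n] = seq;
	log->n++;
	return 0;
}
```

If count were only incremented inside the `if (last)` branch, a table with exactly one matching entry would leave the loop with count still 0 and be reported as empty, which is the case the hunk addresses by moving the increment next to `last = x`.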
Re: [RFC][GENETLINK] introduce command names
On Mon, 2006-04-12 at 17:34 +0100, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-12-04 08:09 Just makes the discovery more knowledgeable. ... ... It would be a lot more human friendly to put better readability in the commands. I don't agree to waste so much text section just to fancy up some userspace tool which is mainly a toy while developing. Hey, it is not a toy ;- If you really need it, do it in userspace like libnl. ... That is the real agenda actually. To be honest i dont know how realistic it would be. But one of the next things is to output the command policies. Once we go that path we can reconsider a patch based on this which includes the bits to dump the information to userspace. Until then I don't see the point in this. Ok, fair enough lets defer it for then. cheers, jamal
Re: [RFC][GENETLINK] move command capabilities to flags
On Mon, 2006-04-12 at 17:31 +0100, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-12-04 08:01 The savings bytes is one aspect; the other is the cleanliness. transfering a boolean in that many bits is a little of overkill. I think it is better to fix it now than later. I know you mentioned libnl uses it. But that is something you can change on your side. I dont know of any other app that uses it. Right but if some distro is based on exactly 2.6.19 the specific libnl parts won't work at all for a long time. We can resolve that by uping the version for the controller. User will use that a signal. If you really want to do it, remove the obsoleted attribute types, I don't like dead bodies laying around :-) I could resend the patch getting rid of those definitions. OK, I can live with it, I don't care for the breakage much. cool - will send a patch later. cheers, jamal
Re: [PATCH][XFRM] Optimize policy dumping
jamal wrote: Ok, both looked good except for the test of a single entry. This was because you would break out of the loop with count of 0. The patch against yours would look like something attached for the state case. Dont forget there are two spots on the policy side of things;- You're right, that case was also broken. With your patch on top it looks all right. You can either submit both patches or i could later today. If you do, please look at some of the comments i made in the first patch and include them. I'd prefer if you did it since you're already testing the thing :)
Re: [PATCH 10/10] chelesio: transmit locking (plus bug fix).
On Sun, 3 Dec 2006 11:45:09 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: Stephen, On 12/2/06, Stephen Hemminger [EMAIL PROTECTED] wrote: If transmit lock is contended on, then push return code back and retry at higher level. Looking at qdisc_restart, it seems to me that the NETDEV_TX_LOCKED return code must only be used if the device features LLTX. With your patch, if q->lock is already grabbed, qdisc_restart is going to requeue skb without going through the collision section of qdisc_restart. The Chelsio driver already sets LLTX (see drivers/net/chelsio/cxgb2.c)
Re: [PATCH 2.6.19] AT91RM9200 Ethernet update 1
On 04 Dec 2006 14:26:57 +0200 Andrew Victor [EMAIL PROTECTED] wrote:

This patch is an update to the Atmel AT91RM9200 Ethernet driver.
1. Remove the global 'at91_dev' variable.
2. Move the global 'check_timer' variable into the private data structure.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

diff -urN linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c linux-2.6.19-final/drivers/net/arm/at91_ether.c
--- linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c Sat Dec 2 17:28:27 2006
+++ linux-2.6.19-final/drivers/net/arm/at91_ether.c Mon Dec 4 14:13:01 2006
@@ -41,9 +41,6 @@
 #define DRV_NAME "at91_ether"
 #define DRV_VERSION "1.0"
-static struct net_device *at91_dev;
-
-static struct timer_list check_timer;
 #define LINK_POLL_INTERVAL (HZ)
 /* . */
@@ -252,8 +249,8 @@
  * PHY doesn't have an IRQ pin (RTL8201, DP83847, AC101L),
  * or board does not have it connected.
  */
-		check_timer.expires = jiffies + LINK_POLL_INTERVAL;
-		add_timer(&check_timer);
+		lp->check_timer.expires = jiffies + LINK_POLL_INTERVAL;
+		add_timer(&lp->check_timer);
 		return;
 	}
@@ -300,7 +297,7 @@
 	irq_number = lp->board_data.phy_irq_pin;
 	if (!irq_number) {
-		del_timer_sync(&check_timer);
+		del_timer_sync(&lp->check_timer);
 		return;
 	}
@@ -362,13 +359,14 @@
 static void at91ether_check_link(unsigned long dev_id)
 {
 	struct net_device *dev = (struct net_device *) dev_id;
+	struct at91_private *lp = (struct at91_private *) dev->priv;

No cast needed. Use netdev_priv(dev) rather than dev->priv. netdev_priv() is a constant offset so the compiler can save a register.

--
Stephen Hemminger [EMAIL PROTECTED]
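The "constant offset" remark in this review can be illustrated with a small user-space model. This is a sketch only: struct demo_net_device, struct demo_private, DEMO_ALIGN and the demo_* helpers are hypothetical, and the layout (private data placed directly after the aligned device structure in a single allocation) is a simplification of what the kernel does, which also aligns the allocation itself.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_ALIGN      32u
#define DEMO_ALIGN_MASK (DEMO_ALIGN - 1u)

/* Hypothetical, much reduced stand-in for struct net_device. */
struct demo_net_device {
	char name[16];
	void *priv;	/* old-style pointer to the private area */
};

/* Hypothetical driver-private data. */
struct demo_private {
	int link_poll_armed;
};

/* Size of the device structure rounded up to the alignment. */
static size_t demo_dev_size(void)
{
	return (sizeof(struct demo_net_device) + DEMO_ALIGN_MASK) &
	       ~(size_t)DEMO_ALIGN_MASK;
}

/* Private area found by adding a compile-time constant: no memory load. */
static void *demo_netdev_priv(struct demo_net_device *dev)
{
	return (char *)dev + demo_dev_size();
}

/* Allocate device and private area in one block, as a model. */
static struct demo_net_device *demo_alloc_netdev(size_t sizeof_priv)
{
	struct demo_net_device *dev = calloc(1, demo_dev_size() + sizeof_priv);

	if (!dev)
		return NULL;
	dev->priv = demo_netdev_priv(dev);
	return dev;
}
```

In this model both routes reach the same address, but reading dev->priv needs a load from memory while demo_netdev_priv() is plain pointer arithmetic on dev, and since it returns void * no cast is needed at the call site.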
Re: [RFC][GENETLINK] move command capabilities to flags
* jamal [EMAIL PROTECTED] 2006-12-04 12:48 On Mon, 2006-04-12 at 17:31 +0100, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-12-04 08:01 The savings bytes is one aspect; the other is the cleanliness. transfering a boolean in that many bits is a little of overkill. I think it is better to fix it now than later. I know you mentioned libnl uses it. But that is something you can change on your side. I dont know of any other app that uses it. Right but if some distro is based on exactly 2.6.19 the specific libnl parts won't work at all for a long time. We can resolve that by uping the version for the controller. User will use that a signal. Good idea, makes me happy :-) please do that in your patch as well.
Re: [PATCH 2.6.19] AT91RM9200 Ethernet update 2
On 04 Dec 2006 14:42:08 +0200 Andrew Victor [EMAIL PROTECTED] wrote:

This patch adds NetPoll / NetConsole support to the Atmel AT91RM9200 Ethernet driver. Original patch from Bill Gatliff.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

diff -urN linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c linux-2.6.19-final/drivers/net/arm/at91_ether.c
--- linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c Mon Dec 4 14:27:21 2006
+++ linux-2.6.19-final/drivers/net/arm/at91_ether.c Mon Dec 4 14:33:35 2006
@@ -925,6 +925,17 @@
 	return IRQ_HANDLED;
 }
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void at91ether_poll_controller(struct net_device *dev)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	at91ether_interrupt(dev->irq, dev, NULL);
+	local_irq_restore(flags);
+}
+#endif

poll_controller is always called with interrupts already disabled. The third argument to interrupt routines was dropped (struct pt_regs) in 2.6.19. Maybe that never got fixed in ARM?

--
Stephen Hemminger [EMAIL PROTECTED]
[RFC patch] driver for the Opencores Ethernet Controller
Hi, Here is a driver for the Opencores Ethernet Controller. I started from a 2.4 uClinux driver, ported it to 2.6, made it work, cleaned it up and added the MII interface. The Opencores Ethernet Controller is Verilog code that can be used to implement an Ethernet device in hardware. It needs to be coupled with a PHY and some buffer memory. Because of that devices that implement this controller can be very different. The code here tries to support that by having some parameters that need to be defined at compile time. This is my first Ethernet driver, so comments/advice would be appreciated. Thanks --dan Kconfig|5 open_eth.c | 1022 + open_eth.h | 132 +++ 3 files changed, 1159 insertions(+) --- /dev/null 2006-09-20 11:38:04.545479250 -0700 +++ drivers/net/open_eth.c 2006-12-04 09:20:17.0 -0800 @@ -0,0 +1,1022 @@ +/* + * Ethernet driver for Open Ethernet Controller (www.opencores.org). + * Copyright (c) 2002 Simon Srot ([EMAIL PROTECTED]) + * Copyright (c) 2006 Tensilica Inc. + * + * Based on: + * + * Ethernet driver for Motorola MPC8xx. + * Copyright (c) 1997 Dan Malek ([EMAIL PROTECTED]) + * + * mcen302.c: A Linux network driver for Mototrola 68EN302 MCU + * + * Copyright (C) 1999 Aplio S.A. Written by Vadim Lebedev + * + * + * The Open Ethernet Controller is just a MAC, it needs to be + * combined with a PHY and buffer memory in order to create an + * ethernet device. Thus some of the hardware parameters are device + * specific. They need to be defined in asm/hardware.h. 
Example: + * + * The IRQ for the device: + * #define OETH_IRQ1 + * + * The address where the MAC registers are mapped: + * #define OETH_BASE_ADDR 0xFD03 + * + * The address where the MAC RX/TX buffers are mapped: + * #define OETH_SRAM_BUFF_BASE 0xFD80 + * + * Sizes for a RX or TX buffer: + * #define OETH_RX_BUFF_SIZE 2048 + * #define OETH_TX_BUFF_SIZE 2048 + * The number of RX and TX buffers: + * #define OETH_RXBD_NUM 16 + * #define OETH_TXBD_NUM 16 + * The PHY ID (needed if MII is enabled): + * #define OETH_PHY_ID 0 + * + * Code to perform the device specific initialization (REGS is a + * struct oeth_regs*): + * #define OETH_PLATFORM_SPECIFIC_INIT(REGS) + * it should at least initialize the device MAC address in + * REGS-mac_addr1 and REGS-mac_addr2. + * + */ + +#include linux/kernel.h +#include linux/string.h +#include linux/errno.h +#include linux/ioport.h +#include linux/slab.h +#include linux/interrupt.h +#include linux/delay.h +#include linux/init.h +#include linux/netdevice.h +#include linux/etherdevice.h +#include linux/skbuff.h +#include linux/module.h +#include linux/ethtool.h +#include linux/mii.h + +#include asm/hardware.h + +#include open_eth.h + +#define DRV_NAME OpencoresEthernet + +/* The Opencores Ethernet driver needs some parameters from the + * hardware implementation. They should be defined in the + asm/hardware.h file. */ + +#if 1 +#undef OETH_TXBD_NUM +#undef OETH_RXBD_NUM +#define OETH_RXBD_NUM 4 +#define OETH_TXBD_NUM 4 +/* #undef OETH_RX_BUFF_SIZE */ +/* #undef OETH_TX_BUFF_SIZE */ +/* #define OETH_RX_BUFF_SIZE 0x600 */ +/* #define OETH_TX_BUFF_SIZE 0x600 */ +#endif + +#define BUFFER_SCREWED 1 +/* #define BUFFER_SCREWED_ADDR (OETH_SRAM_BUFF_BASE + OETH_TXBD_NUM * OETH_TX_BUFF_SIZE + OETH_RXBD_NUM * OETH_RX_BUFF_SIZE + 4) */ +#define BUFFER_SCREWED_ADDR (0xfd803800 + 0x600) + +/* Debug helpers. 
*/ +/* #define OETH_DEBUG_TRANSMIT */ +#ifdef OETH_DEBUG_TRANSMIT +#define OEDTX(x) x +#else +#define OEDTX(x) +#endif + +/* #define OETH_DEBUG_RECEIVE */ +#ifdef OETH_DEBUG_RECEIVE +#define OEDRX(x) x +#else +#define OEDRX(x) +#endif + +#define OETH_REGS_SIZE 0x1000 /* MAC registers + RX and TX descriptors */ +#define OETH_BD_BASE(OETH_BASE_ADDR + 0x400) +#define OETH_TOTAL_BD 128 + +/* The transmitter timeout FIXME: dann this needs to be handled */ +#define OETH_TX_TIMEOUT (2*HZ) + +/* The buffer descriptors track the ring buffers. */ +struct oeth_private { + struct oeth_regs *regs; /* Address of controller registers. */ + struct oeth_bd *rx_bd_base; /* Address of Rx BDs. */ + struct oeth_bd *tx_bd_base; /* Address of Tx BDs. */ + u8 tx_next; /* Next buffer to be sent */ + u8 tx_last; /* Next buffer to be checked if packet sent */ + u8 tx_full; /* Buffer ring full indicator */ + u8 rx_cur; /* Next buffer to be checked if packet received */ + +#if CONFIG_MII + struct mii_if_info mii_if; /* MII lib hooks/info */ +#endif + spinlock_t lock; + struct net_device_stats stats; +}; + +static int oeth_open(struct net_device *dev); +static int oeth_start_xmit(struct sk_buff *skb, struct net_device *dev); +static void oeth_rx(struct net_device *dev);
Re: [PATCH 2.6.19] AT91RM9200 Ethernet update 3
On 04 Dec 2006 14:50:24 +0200 Andrew Victor [EMAIL PROTECTED] wrote:

A minor fix to the Atmel AT91RM9200 Ethernet driver.
1. Use dev_alloc_skb() instead of alloc_skb().
2. It is not necessary to adjust skb->len manually.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

diff -urN linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c linux-2.6.19-final/drivers/net/arm/at91_ether.c
--- linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c Mon Dec 4 14:42:05 2006
+++ linux-2.6.19-final/drivers/net/arm/at91_ether.c Mon Dec 4 14:43:57 2006
@@ -855,14 +855,13 @@
 	while (dlist->descriptors[lp->rxBuffIndex].addr & EMAC_DESC_DONE) {
 		p_recv = dlist->recv_buf[lp->rxBuffIndex];
 		pktlen = dlist->descriptors[lp->rxBuffIndex].size & 0x7ff; /* Length of frame including FCS */
-		skb = alloc_skb(pktlen + 2, GFP_ATOMIC);
+		skb = dev_alloc_skb(pktlen + 2);
 		if (skb != NULL) {
 			skb_reserve(skb, 2);
 			memcpy(skb_put(skb, pktlen), p_recv, pktlen);
 			skb->dev = dev;
 			skb->protocol = eth_type_trans(skb, dev);
-			skb->len = pktlen;
 			dev->last_rx = jiffies;
 			lp->stats.rx_bytes += pktlen;
 			netif_rx(skb);

Use netdev_alloc_skb instead. It sets skb->dev so you don't have to. Setting skb->len is redundant since that is what skb_put() does.

It would be best if you didn't have to copy data at all and could receive directly into the skb. If you have to copy received data, it is better to implement NAPI since that allows you to copy data in soft irq. The existing code will cause poor realtime performance since the driver is copying received data with IRQ's disabled.

--
Stephen Hemminger [EMAIL PROTECTED]
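The point that skb_put() already accounts for the length can be shown with a small user-space model. This is a sketch under assumptions: struct demo_skb and the demo_* helpers are hypothetical and only mimic the data/tail/len bookkeeping described in the review, with a fixed 64-byte buffer standing in for a real allocation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced model of an skb: one buffer, two cursors. */
struct demo_skb {
	unsigned char buf[64];
	unsigned char *data;	/* start of packet data */
	unsigned char *tail;	/* end of packet data */
	unsigned int len;	/* bytes between data and tail */
};

static void demo_skb_init(struct demo_skb *skb)
{
	skb->data = skb->buf;
	skb->tail = skb->buf;
	skb->len = 0;
}

/* Leave headroom in an empty buffer (models skb_reserve). */
static void demo_skb_reserve(struct demo_skb *skb, unsigned int n)
{
	skb->data += n;
	skb->tail += n;
}

/* Extend the data area by n bytes and account for it (models skb_put). */
static unsigned char *demo_skb_put(struct demo_skb *skb, unsigned int n)
{
	unsigned char *old_tail = skb->tail;

	skb->tail += n;
	skb->len += n;	/* length is updated here, not by the caller */
	return old_tail;
}

/* Copy a received frame into the model skb; returns the resulting len. */
static unsigned int demo_receive(struct demo_skb *skb,
				 const unsigned char *frame,
				 unsigned int pktlen)
{
	demo_skb_init(skb);
	if (pktlen + 2 > sizeof(skb->buf))
		return 0;	/* would not fit; the model just refuses */
	demo_skb_reserve(skb, 2);
	memcpy(demo_skb_put(skb, pktlen), frame, pktlen);
	return skb->len;
}
```

After demo_receive() the length already equals pktlen, so a later manual assignment to the length field would add nothing in this model.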
Re: [RFC patch] driver for the Opencores Ethernet Controller
On Mon, Dec 04, 2006 at 10:01:01AM -0800, Dan Nicolaescu wrote: The Opencores Ethernet Controller is Verilog code that can be used to implement an Ethernet device in hardware. It needs to be coupled with a PHY and some buffer memory. Because of that devices that implement this controller can be very different. The code here tries to support that by having some parameters that need to be defined at compile time. Considering this, why don't you make it a platform driver?
Re: [RFC patch] driver for the Opencores Ethernet Controller
Lennert Buytenhek [EMAIL PROTECTED] writes: On Mon, Dec 04, 2006 at 10:01:01AM -0800, Dan Nicolaescu wrote: The Opencores Ethernet Controller is Verilog code that can be used to implement an Ethernet device in hardware. It needs to be coupled with a PHY and some buffer memory. Because of that devices that implement this controller can be very different. The code here tries to support that by having some parameters that need to be defined at compile time. Considering this, why don't you make it a platform driver? I didn't know about platform drivers before your mail. I guess I could convert it to that if that is the right thing to do. (It might be an overkill given that the device is kind of simple and embedded people prefer small code...) Any comments on the driver itself? Thanks --dan
network devices don't handle pci_dma_mapping_error()'s
On Sat, 02 Dec 2006 00:32:55 -0500 Jeff Garzik [EMAIL PROTECTED] wrote:

Amit S. Kale wrote:

NetXen: 1G/10G Ethernet driver updates
- These fixes take care of driver on machines with 4G memory
- Driver cleanup

Signed-off-by: Amit S. Kale [EMAIL PROTECTED]

netxen_nic.h | 29 +--
netxen_nic_ethtool.c | 19 ++--
netxen_nic_hw.c |4
netxen_nic_hw.h |4
netxen_nic_init.c | 51 +++-
netxen_nic_isr.c |3
netxen_nic_main.c | 204 +++---
netxen_nic_phan_reg.h | 10 +-

NAK, the driver itself should not be doing bounce buffering

I notice that no current network driver handles dma mapping errors. Might that be part of the problem? On i386, this never happens, and it would be rare on most others. Why don't drivers do some checking/unwind? Here is what it would look like on Tx for sky2...

--- sky2.orig/drivers/net/sky2.c 2006-12-04 10:12:16.0 -0800
+++ sky2/drivers/net/sky2.c 2006-12-04 10:37:42.0 -0800
@@ -1277,6 +1277,38 @@
 	return count;
 }
+
+static inline void tx_le_done(struct sky2_port *sky2, unsigned idx)
+{
+	struct pci_dev *pdev = sky2->hw->pdev;
+	struct sky2_tx_le *le = sky2->tx_le + idx;
+	struct tx_ring_info *re = sky2->tx_ring + idx;
+
+	switch(le->opcode & ~HW_OWNER) {
+	case OP_LARGESEND:
+	case OP_PACKET:
+		pci_unmap_single(pdev,
+				 pci_unmap_addr(re, mapaddr),
+				 pci_unmap_len(re, maplen),
+				 PCI_DMA_TODEVICE);
+		break;
+	case OP_BUFFER:
+		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
+			       pci_unmap_len(re, maplen),
+			       PCI_DMA_TODEVICE);
+		break;
+	}
+
+	if (le->ctrl & EOP) {
+		if (unlikely(netif_msg_tx_done(sky2)))
+			printk(KERN_DEBUG "%s: tx done %u\n", sky2->netdev->name,
+			       idx);
+		dev_kfree_skb_any(re->skb);
+	}
+
+	le->opcode = 0; /* paranoia */
+}
+
 /*
  * Put one packet in ring for transmit.
  * A single packet can generate multiple list elements, and
@@ -1292,7 +1324,7 @@
 	unsigned i, len;
 	dma_addr_t mapping;
 	u32 addr64;
-	u16 mss;
+	u16 mss, first;
 	u8 ctrl;
 	if (unlikely(tx_avail(sky2) < tx_le_req(skb)))
@@ -1303,7 +1335,13 @@
 		       dev->name, sky2->tx_prod, skb->len);
 	len = skb_headlen(skb);
+	first = sky2->tx_prod;
 	mapping = pci_map_single(hw->pdev, skb->data, len, PCI_DMA_TODEVICE);
+	if (pci_dma_mapping_error(mapping)) {
+		printk(KERN_INFO "%s: tx dma mapping error\n", dev->name);
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
 	addr64 = high32(mapping);
 	/* Send high bits if changed or crosses boundary */
@@ -1383,6 +1421,10 @@
 		mapping = pci_map_page(hw->pdev, frag->page, frag->page_offset,
 				       frag->size, PCI_DMA_TODEVICE);
+
+		if (pci_dma_mapping_error(mapping))
+			goto map_error;
+
 		addr64 = high32(mapping);
 		if (addr64 != sky2->tx_addr64) {
 			le = get_tx_le(sky2);
@@ -1413,6 +1455,15 @@
 	dev->trans_start = jiffies;
 	return NETDEV_TX_OK;
+
+map_error:
+	/* map failure on fragmented send, free work from first..sky2->tx_prod */
+	printk(KERN_INFO "%s: tx dma page mapping error\n", dev->name);
+	le->ctrl |= EOP;
+	for (i = first; i != sky2->tx_prod; i = RING_NEXT(i, TX_RING_SIZE))
+		tx_le_done(sky2, i);
+	sky2->tx_prod = first;
+	return NETDEV_TX_OK;
 }
 /*
@@ -1424,40 +1475,12 @@
 static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
 {
 	struct net_device *dev = sky2->netdev;
-	struct pci_dev *pdev = sky2->hw->pdev;
 	unsigned idx;
 	BUG_ON(done >= TX_RING_SIZE);
-	for (idx = sky2->tx_cons; idx != done;
-	     idx = RING_NEXT(idx, TX_RING_SIZE)) {
-		struct sky2_tx_le *le = sky2->tx_le + idx;
-		struct tx_ring_info *re = sky2->tx_ring + idx;
-
-		switch(le->opcode & ~HW_OWNER) {
-		case OP_LARGESEND:
-		case OP_PACKET:
-			pci_unmap_single(pdev,
-					 pci_unmap_addr(re, mapaddr),
-					 pci_unmap_len(re, maplen),
-					 PCI_DMA_TODEVICE);
-			break;
-		case OP_BUFFER:
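The unwind idea in the map_error path (remember where the packet started in the ring, and on a mapping failure release everything claimed since then and rewind the producer index) can be modelled in user space. This is a sketch only: struct demo_ring, DEMO_RING_NEXT and demo_xmit are hypothetical, the ring size is assumed to be a power of two, a failure is simulated by a fragment index rather than a real DMA call, and the model returns -1 to make the failure visible where the patch drops the packet and returns NETDEV_TX_OK.

```c
#include <stddef.h>

#define DEMO_RING_SIZE 8u	/* assumed to be a power of two */
#define DEMO_RING_NEXT(x) (((x) + 1u) & (DEMO_RING_SIZE - 1u))

/* Hypothetical transmit ring: one "mapped" flag per slot. */
struct demo_ring {
	unsigned char mapped[DEMO_RING_SIZE];
	unsigned int prod;	/* next slot the producer will use */
};

/*
 * Queue nfrags fragments. The fragment with index fail_at simulates a
 * DMA mapping failure; pass a value >= nfrags for a clean run.
 * Returns 0 on success, -1 after unwinding a partial packet.
 */
static int demo_xmit(struct demo_ring *r, unsigned int nfrags,
		     unsigned int fail_at)
{
	unsigned int first = r->prod;	/* where this packet starts */
	unsigned int i;

	for (i = 0; i < nfrags; i++) {
		if (i == fail_at)
			goto map_error;
		r->mapped[r->prod] = 1;
		r->prod = DEMO_RING_NEXT(r->prod);
	}
	return 0;

map_error:
	/* Release every slot claimed for this packet: first .. prod */
	for (i = first; i != r->prod; i = DEMO_RING_NEXT(i))
		r->mapped[i] = 0;
	r->prod = first;
	return -1;
}
```

The walk from first to prod uses the same wrap-around step as the producer, so a packet that straddles the end of the ring is unwound correctly.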
Re: [RFC patch] driver for the Opencores Ethernet Controller
On Mon, 04 Dec 2006 10:01:01 -0800 Dan Nicolaescu [EMAIL PROTECTED] wrote: Hi, Here is a driver for the Opencores Ethernet Controller. I started from a 2.4 uClinux driver, ported it to 2.6, made it work, cleaned it up and added the MII interface. The Opencores Ethernet Controller is Verilog code that can be used to implement an Ethernet device in hardware. It needs to be coupled with a PHY and some buffer memory. Because of that devices that implement this controller can be very different. The code here tries to support that by having some parameters that need to be defined at compile time. This is my first Ethernet driver, so comments/advice would be appreciated. Thanks --dan Kconfig|5 open_eth.c | 1022 + open_eth.h | 132 +++ 3 files changed, 1159 insertions(+) Please run through scripts/Lindent or cleanup style. Also has trailing whitespace. --- /dev/null 2006-09-20 11:38:04.545479250 -0700 +++ drivers/net/open_eth.c2006-12-04 09:20:17.0 -0800 @@ -0,0 +1,1022 @@ +/* + * Ethernet driver for Open Ethernet Controller (www.opencores.org). + * Copyright (c) 2002 Simon Srot ([EMAIL PROTECTED]) + * Copyright (c) 2006 Tensilica Inc. + * + * Based on: + * + * Ethernet driver for Motorola MPC8xx. + * Copyright (c) 1997 Dan Malek ([EMAIL PROTECTED]) + * + * mcen302.c: A Linux network driver for Mototrola 68EN302 MCU + * + * Copyright (C) 1999 Aplio S.A. Written by Vadim Lebedev + * + * + * The Open Ethernet Controller is just a MAC, it needs to be + * combined with a PHY and buffer memory in order to create an + * ethernet device. Thus some of the hardware parameters are device + * specific. They need to be defined in asm/hardware.h. 
Example: + * + * The IRQ for the device: + * #define OETH_IRQ1 + * + * The address where the MAC registers are mapped: + * #define OETH_BASE_ADDR 0xFD03 + * + * The address where the MAC RX/TX buffers are mapped: + * #define OETH_SRAM_BUFF_BASE 0xFD80 + * + * Sizes for a RX or TX buffer: + * #define OETH_RX_BUFF_SIZE 2048 + * #define OETH_TX_BUFF_SIZE 2048 + * The number of RX and TX buffers: + * #define OETH_RXBD_NUM 16 + * #define OETH_TXBD_NUM 16 + * The PHY ID (needed if MII is enabled): + * #define OETH_PHY_ID 0 + * + * Code to perform the device specific initialization (REGS is a + * struct oeth_regs*): + * #define OETH_PLATFORM_SPECIFIC_INIT(REGS) + * it should at least initialize the device MAC address in + * REGS-mac_addr1 and REGS-mac_addr2. + * + */ + +#include linux/kernel.h +#include linux/string.h +#include linux/errno.h +#include linux/ioport.h +#include linux/slab.h +#include linux/interrupt.h +#include linux/delay.h +#include linux/init.h +#include linux/netdevice.h +#include linux/etherdevice.h +#include linux/skbuff.h +#include linux/module.h +#include linux/ethtool.h +#include linux/mii.h + +#include asm/hardware.h + +#include open_eth.h + +#define DRV_NAME OpencoresEthernet + +/* The Opencores Ethernet driver needs some parameters from the + * hardware implementation. They should be defined in the + asm/hardware.h file. */ + +#if 1 +#undef OETH_TXBD_NUM +#undef OETH_RXBD_NUM +#define OETH_RXBD_NUM 4 +#define OETH_TXBD_NUM 4 +/* #undef OETH_RX_BUFF_SIZE */ +/* #undef OETH_TX_BUFF_SIZE */ +/* #define OETH_RX_BUFF_SIZE 0x600 */ +/* #define OETH_TX_BUFF_SIZE 0x600 */ +#endif Gack, just put in correct define's avoid adding conditional compilation stuff. +#define BUFFER_SCREWED 1 +/* #define BUFFER_SCREWED_ADDR (OETH_SRAM_BUFF_BASE + OETH_TXBD_NUM * OETH_TX_BUFF_SIZE + OETH_RXBD_NUM * OETH_RX_BUFF_SIZE + 4) */ +#define BUFFER_SCREWED_ADDR (0xfd803800 + 0x600) + +/* Debug helpers. 
*/ +/* #define OETH_DEBUG_TRANSMIT */ +#ifdef OETH_DEBUG_TRANSMIT +#define OEDTX(x) x +#else +#define OEDTX(x) +#endif + +/* #define OETH_DEBUG_RECEIVE */ +#ifdef OETH_DEBUG_RECEIVE +#define OEDRX(x) x +#else +#define OEDRX(x) +#endif + +#define OETH_REGS_SIZE 0x1000 /* MAC registers + RX and TX descriptors */ +#define OETH_BD_BASE(OETH_BASE_ADDR + 0x400) +#define OETH_TOTAL_BD 128 + +/* The transmitter timeout FIXME: dann this needs to be handled */ +#define OETH_TX_TIMEOUT (2*HZ) + +/* The buffer descriptors track the ring buffers. */ +struct oeth_private { + struct oeth_regs *regs; /* Address of controller registers. */ + struct oeth_bd *rx_bd_base; /* Address of Rx BDs. */ + struct oeth_bd *tx_bd_base; /* Address of Tx BDs. */ + u8 tx_next; /* Next buffer to be sent */ + u8 tx_last; /* Next buffer to be checked if packet sent */ + u8 tx_full; /* Buffer ring full indicator */ + u8 rx_cur; /* Next buffer to be
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Nov 30, 2006, at 12:07, Maciej W. Rozycki wrote: On Mon, 23 Oct 2006, Maciej W. Rozycki wrote: I'm not too enthusiastic about requiring the ethernet drivers to call phy_disconnect in a separate thread after close is called. Assuming there's not some sort of squash work queue function that can be invoked with rtnl_lock held, I think phy_disconnect should schedule itself to flush the queue. This would also require that mdiobus_unregister hold off on freeing phydevs if any of the phys were still waiting for pending flush_pending calls to finish. Which would, in turn, require mdiobus_unregister to schedule cleaning up memory for some later time. This could work, indeed. I'm not enthusiastic about that implementation, either, but it maintains the abstractions I consider important for this code. The ethernet driver should not need to know what structures the PHY lib uses to implement its interrupt handling, and how to work around their failings, IMHO. Agreed. So what's the plan? Here's a new version of the patch that addresses your other concerns. So I think the problem is we still don't understand the problem, and the solution to the problem, except that it's causing your driver to lock up. Most of the changes below are fine with me. The confusing one is still the check for current_is_keventd(). This is related in some way to why the driver code invokes phy_disconnect from a work_queue. I admit, though, I'm not familiar enough with the work queue infrastructure to understand the problem. But I'm very certain that creating a work queue for the sole purpose of disconnecting from the PHY is crufty. Can you try again to convey how this solves your problem, so we can try to figure out if there's a better way? Andy
Re: [PATCH][XFRM] Optimize policy dumping
On Mon, 2006-04-12 at 18:59 +0100, Patrick McHardy wrote:
jamal wrote:
I'd prefer if you did it since you're already testing the thing :)

Ok, will do shortly.

cheers,
jamal
[PATCH][XFRM] Optimize policy dumping
Dave,

This one has undergone more scrutiny and testing. Against net-2.6.20

cheers,
jamal

[XFRM] Optimize policy dumping

This change optimizes the dumping of Security policies.

1) Before this change ..
speedopolis:~# time ./ip xf pol

real    0m22.274s
user    0m0.000s
sys     0m22.269s

2) Turn off sub-policies
speedopolis:~# ./ip xf pol

real    0m13.496s
user    0m0.000s
sys     0m13.493s

i suppose the above is to be expected

3) With this change ..
speedopolis:~# time ./ip x policy

real    0m7.901s
user    0m0.008s
sys     0m7.896s

-

This is probably the best we can do for now. The current code attempts to work well for PFKEY which has a broken two phase semantic.

From RFC 2367:

3.1.10 SADB_DUMP

   The SADB_DUMP message causes the kernel to dump the operating
   system's entire Key Table to the requesting key socket. Each
   Security Association is returned in its own SADB_DUMP message. A
   SADB_DUMP message with a sadb_seq field of zero indicates the end
   of the dump transaction. The dump message is used for debugging
   purposes only and is not intended for production use.

   Support for the dump message MAY be discontinued in future versions
   of PF_KEY. Key management applications MUST NOT depend on this
   message for basic operation.

Note the funny comment above on the dump message being discontinued at some point and is only for debugging ;->

The way to eventually fix this IMO and reach the goals stated by Davem of making pfkey more robust is to add to pfkey a socket-cb structure. For now i think this even improves the pfkey by reducing the compute. The advantages are noticeable when you have a large number of policies installed.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit 33b1e3fcdaee3252cce3c1cadf93a4d81f2200ff
tree 584411b6ad0ac830cc39dd184ccb32573739036d
parent 5465ae68b5ec11b2820db3f9b4c6fd94f113da44
author Patrick McHardy [EMAIL PROTECTED] Mon, 04 Dec 2006 15:33:48 -0500
committer Jamal Hadi Salim [EMAIL PROTECTED] Mon, 04 Dec 2006 15:33:48 -0500

 net/xfrm/xfrm_policy.c |   55 ++--
 1 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 64d3938..c438035 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -860,33 +860,12 @@ EXPORT_SYMBOL(xfrm_policy_flush);
 int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
 {
-	struct xfrm_policy *pol;
+	struct xfrm_policy *pol, *last = NULL;
 	struct hlist_node *entry;
-	int dir, count, error;
+	int dir, last_dir = 0, count, error;
 
 	read_lock_bh(&xfrm_policy_lock);
 	count = 0;
-	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
-		struct hlist_head *table = xfrm_policy_bydst[dir].table;
-		int i;
-
-		hlist_for_each_entry(pol, entry,
-				     &xfrm_policy_inexact[dir], bydst) {
-			if (pol->type == type)
-				count++;
-		}
-		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
-			hlist_for_each_entry(pol, entry, table + i, bydst) {
-				if (pol->type == type)
-					count++;
-			}
-		}
-	}
-
-	if (count == 0) {
-		error = -ENOENT;
-		goto out;
-	}
 
 	for (dir = 0; dir < 2*XFRM_POLICY_MAX; dir++) {
 		struct hlist_head *table = xfrm_policy_bydst[dir].table;
@@ -896,21 +875,37 @@ int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, int, void*)
 				     &xfrm_policy_inexact[dir], bydst) {
 			if (pol->type != type)
 				continue;
-			error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-			if (error)
-				goto out;
+			if (last) {
+				error = func(last, last_dir % XFRM_POLICY_MAX,
+					     count, data);
+				if (error)
+					goto out;
+			}
+			last = pol;
+			last_dir = dir;
+			count++;
 		}
 		for (i = xfrm_policy_bydst[dir].hmask; i >= 0; i--) {
 			hlist_for_each_entry(pol, entry, table + i, bydst) {
 				if (pol->type != type)
 					continue;
-				error = func(pol, dir % XFRM_POLICY_MAX, --count, data);
-				if (error)
-					goto out;
+				if (last) {
+
[PATCH][XFRM] Optimize SA dumping
The SA version

cheers,
jamal

[XFRM] Optimize SA dumping

Same comments as in [XFRM] Optimize policy dumping

The numbers are (20K SAs):
--
1) before the change ..
speedopolis:~# time ./ip xf sta

real    0m5.321s
user    0m0.004s
sys     0m5.316s

2) after the change ...
speedopolis:~# time ./ip x state

real    0m1.985s
user    0m0.000s
sys     0m1.984s
--

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit 55a2dc3caa86e03cb3d9e856215e2fceb5cf5f66
tree 42abe39f60fe9a22ec54c7bdbf358842a94e82e9
parent 33b1e3fcdaee3252cce3c1cadf93a4d81f2200ff
author Patrick McHardy [EMAIL PROTECTED] Mon, 04 Dec 2006 15:41:31 -0500
committer Jamal Hadi Salim [EMAIL PROTECTED] Mon, 04 Dec 2006 15:41:31 -0500

 net/xfrm/xfrm_state.c |   24 +++-
 1 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 864962b..11da3d3 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1099,7 +1099,7 @@ int xfrm_state_walk(u8 proto, int (*func)(struct xfrm_state *, int, void*),
 		    void *data)
 {
 	int i;
-	struct xfrm_state *x;
+	struct xfrm_state *x, *last = NULL;
 	struct hlist_node *entry;
 	int count = 0;
 	int err = 0;
@@ -1107,24 +1107,22 @@ int xfrm_state_walk(u8 proto, int (*func)(struct xfrm_state *, int, void*),
 	spin_lock_bh(&xfrm_state_lock);
 	for (i = 0; i <= xfrm_state_hmask; i++) {
 		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (xfrm_id_proto_match(x->id.proto, proto))
-				count++;
+			if (!xfrm_id_proto_match(x->id.proto, proto))
+				continue;
+			if (last) {
+				err = func(last, count, data);
+				if (err)
+					goto out;
+			}
+			last = x;
+			count++;
 		}
 	}
 	if (count == 0) {
 		err = -ENOENT;
 		goto out;
 	}
-
-	for (i = 0; i <= xfrm_state_hmask; i++) {
-		hlist_for_each_entry(x, entry, xfrm_state_bydst+i, bydst) {
-			if (!xfrm_id_proto_match(x->id.proto, proto))
-				continue;
-			err = func(x, --count, data);
-			if (err)
-				goto out;
-		}
-	}
+	err = func(last, 0, data);
 out:
 	spin_unlock_bh(&xfrm_state_lock);
 	return err;
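The single-pass walk used by both the policy and the SA patch can be sketched outside the kernel roughly as follows. This is only an illustration under stated assumptions, not the kernel code: struct item, struct walk_log, record_cb() and walk_items() are hypothetical names, and a plain array stands in for the hash tables. The callback for a matching entry is held back until the next match turns up, so the final entry can be reported with a count of 0 (the end-of-dump marker) without a separate counting pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SA or policy entry. */
struct item {
	int proto;
	int id;
};

/* Callback type: count is 0 only for the final entry of the dump. */
typedef int (*walk_fn)(const struct item *it, int count, void *data);

/* Hypothetical recorder, used only to observe what the walk reports. */
struct walk_log {
	int n;
	int ids[16];
	int counts[16];
};

static int record_cb(const struct item *it, int count, void *data)
{
	struct walk_log *seen = data;

	if (seen->n >= 16)
		return -1;	/* log full: abort the walk */
	seen->ids[seen->n] = it->id;
	seen->counts[seen->n] = count;
	seen->n++;
	return 0;
}

/*
 * One pass over the table. Each match is reported one step late, so the
 * last match is known when the loop ends and can be sent with count 0.
 * Returns 0 on success, -1 if nothing matched, or the callback's error.
 */
static int walk_items(const struct item *tab, size_t n, int proto,
		      walk_fn func, void *data)
{
	const struct item *last = NULL;
	int count = 0;
	size_t i;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		int err;

		if (tab[i].proto != proto)
			continue;
		if (last) {
			err = func(last, count, data);
			if (err)
				return err;
		}
		last = &tab[i];
		count++;
	}
	if (count == 0)
		return -1;	/* the patches return -ENOENT here */
	return func(last, 0, data);
}
```

With three matching entries the callback sees counts 1, 2 and then 0, so a consumer that treats 0 as "end of dump" keeps working while the table is traversed once instead of twice.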
[RFC][GENETLINK] move command capabilities to flags
On Mon, 2006-04-12 at 19:00 +0100, Thomas Graf wrote:
* jamal [EMAIL PROTECTED] 2006-12-04 12:48
We can resolve that by uping the version for the controller. User will use that a signal.

Good idea, makes me happy :-) please do that in your patch as well.

Ok, here she goes... If looks good, Dave please apply to net-2.6.20

cheers,
jamal

This patch moves command capabilities to command flags. Other than being cleaner, saves several bytes. We increment the nlctrl version so as to signal to user space not to expect the attributes. We will try to be careful not to do this too often ;->

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit 49ba7537cec41f0db2500054b5fde3c193c37a97
tree f56cb78a17a9847df33101fd4a807617f74d27c6
parent 55a2dc3caa86e03cb3d9e856215e2fceb5cf5f66
author Jamal Hadi Salim [EMAIL PROTECTED] Mon, 04 Dec 2006 15:49:06 -0500
committer Jamal Hadi Salim [EMAIL PROTECTED] Mon, 04 Dec 2006 15:49:06 -0500

 include/linux/genetlink.h |    6 +++---
 net/netlink/genetlink.c   |   18 --
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h
index 9049dc6..f7a9377 100644
--- a/include/linux/genetlink.h
+++ b/include/linux/genetlink.h
@@ -17,6 +17,9 @@ struct genlmsghdr {
 #define GENL_HDRLEN	NLMSG_ALIGN(sizeof(struct genlmsghdr))
 
 #define GENL_ADMIN_PERM		0x01
+#define GENL_CMD_CAP_DO		0x02
+#define GENL_CMD_CAP_DUMP	0x04
+#define GENL_CMD_CAP_HASPOL	0x08
 
 /*
  * List of reserved static generic netlink identifiers:
@@ -58,9 +61,6 @@ enum {
 	CTRL_ATTR_OP_UNSPEC,
 	CTRL_ATTR_OP_ID,
 	CTRL_ATTR_OP_FLAGS,
-	CTRL_ATTR_OP_POLICY,
-	CTRL_ATTR_OP_DOIT,
-	CTRL_ATTR_OP_DUMPIT,
 	__CTRL_ATTR_OP_MAX,
 };
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index cc874f0..3e37ea5 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -143,6 +143,13 @@ int genl_register_ops(struct genl_family *family, struct genl_ops *ops)
 		goto errout;
 	}
 
+	if (ops->dumpit)
+		ops->flags |= GENL_CMD_CAP_DO;
+	if (ops->doit)
+		ops->flags |= GENL_CMD_CAP_DUMP;
+	if (ops->policy)
+		ops->flags |= GENL_CMD_CAP_HASPOL;
+
 	genl_lock();
 	list_add_tail(&ops->ops_list, &family->ops_list);
 	genl_unlock();
@@ -387,7 +394,7 @@ static void genl_rcv(struct sock *sk, int len)
 static struct genl_family genl_ctrl = {
 	.id = GENL_ID_CTRL,
 	.name = "nlctrl",
-	.version = 0x1,
+	.version = 0x2,
 	.maxattr = CTRL_ATTR_MAX,
 };
 
@@ -425,15 +432,6 @@ static int ctrl_fill_info(struct genl_family *family, u32 pid, u32 seq,
 			NLA_PUT_U32(skb, CTRL_ATTR_OP_ID, ops->cmd);
 			NLA_PUT_U32(skb, CTRL_ATTR_OP_FLAGS, ops->flags);
 
-			if (ops->policy)
-				NLA_PUT_FLAG(skb, CTRL_ATTR_OP_POLICY);
-
-			if (ops->doit)
-				NLA_PUT_FLAG(skb, CTRL_ATTR_OP_DOIT);
-
-			if (ops->dumpit)
-				NLA_PUT_FLAG(skb, CTRL_ATTR_OP_DUMPIT);
-
 			nla_nest_end(skb, nest);
 		}
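The idea of folding the three capability attributes into bits of the existing flags word can be sketched in userspace as below. The flag values are the ones from the patch; struct demo_ops, demo_handler(), demo_set_caps() and demo_can_dump() are hypothetical names for illustration only. The sketch pairs each callback with the flag of the same name (doit with GENL_CMD_CAP_DO, dumpit with GENL_CMD_CAP_DUMP); the hunk as posted pairs them the other way round, and which pairing was intended is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as defined in the patch. */
#define GENL_ADMIN_PERM		0x01
#define GENL_CMD_CAP_DO		0x02
#define GENL_CMD_CAP_DUMP	0x04
#define GENL_CMD_CAP_HASPOL	0x08

/* Hypothetical, cut-down stand-in for struct genl_ops. */
struct demo_ops {
	unsigned int flags;
	int (*doit)(void);
	int (*dumpit)(void);
	const void *policy;
};

/* Hypothetical handler so that a callback pointer can be non-NULL. */
static int demo_handler(void)
{
	return 0;
}

/* Derive the capability bits once, at registration time. */
static void demo_set_caps(struct demo_ops *ops)
{
	assert(ops != NULL);
	if (ops->doit)
		ops->flags |= GENL_CMD_CAP_DO;
	if (ops->dumpit)
		ops->flags |= GENL_CMD_CAP_DUMP;
	if (ops->policy)
		ops->flags |= GENL_CMD_CAP_HASPOL;
}

/* What a user-space consumer of the flags word would test. */
static int demo_can_dump(unsigned int flags)
{
	return (flags & GENL_CMD_CAP_DUMP) != 0;
}
```

A single u32 then carries what previously took up to three extra flag attributes per operation, which is where the saved bytes mentioned in the description come from.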
Re: [RFC][GENETLINK] move command capabilities to flags
* jamal [EMAIL PROTECTED] 2006-12-04 16:07
On Mon, 2006-04-12 at 19:00 +0100, Thomas Graf wrote:
* jamal [EMAIL PROTECTED] 2006-12-04 12:48
We can resolve that by uping the version for the controller. User will use that a signal.

Good idea, makes me happy :-) please do that in your patch as well.

Ok, here she goes... If looks good, Dave please apply to net-2.6.20

Looks good.
[patch 8/8] net: smc91x add missing bracket
From: Mariusz Kozlowski [EMAIL PROTECTED]

Signed-off-by: Mariusz Kozlowski [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/smc91x.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/smc91x.h~net-smc91x-add-missing-bracket drivers/net/smc91x.h
--- a/drivers/net/smc91x.h~net-smc91x-add-missing-bracket
+++ a/drivers/net/smc91x.h
@@ -238,7 +238,7 @@ SMC_outw(u16 val, void __iomem *ioaddr,
 #define SMC_CAN_USE_16BIT	1
 #define SMC_CAN_USE_32BIT	0
 
-#define SMC_inb(a, r)		inb((u32)a) + (r))
+#define SMC_inb(a, r)		inb(((u32)a) + (r))
 #define SMC_inw(a, r)		inw(((u32)a) + (r))
 #define SMC_outb(v, a, r)	outb(v, ((u32)a) + (r))
 #define SMC_outw(v, a, r)	outw(v, ((u32)a) + (r))
_
[patch 2/8] 8139too: force media setting cleanup
From: Bernard Lee [EMAIL PROTECTED]

Setting bit 4 & 5 alone in 8139too module media option does not really force 100Mbps full-duplex mode. When media option bit 0-3 is cleared, 8139too module does not force media setting. Therefore, bit 0-3 requires to be set for bit 4 & 5 to take effect. The hidden bit 0-3 setting is not stated in module description.

It can be fixed by changing rtl8139_private structure default_port bitfield from 4-bit to 6-bit.

Besides, module media bit 9 is a duplicate of bit 4 (full-duplex). It is suggested that bit 9 is freed. A remark is added to module description that bit 0 can be used to force setting. It helps to clarify 10Mbps half-duplex mode.

Signed-off-by: Bernard Lee [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/8139too.c |    8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff -puN drivers/net/8139too.c~8139too-force-media-setting-fix drivers/net/8139too.c
--- a/drivers/net/8139too.c~8139too-force-media-setting-fix
+++ a/drivers/net/8139too.c
@@ -586,7 +586,7 @@ struct rtl8139_private {
 	signed char phys[4];		/* MII device addresses. */
 	char twistie, twist_row, twist_col;	/* Twister tune state. */
 	unsigned int watchdog_fired : 1;
-	unsigned int default_port : 4;	/* Last dev->if_port value. */
+	unsigned int default_port : 6;	/* Last dev->if_port value. */
 	unsigned int have_thread : 1;
 	spinlock_t lock;
 	spinlock_t rx_lock;
@@ -612,7 +612,7 @@ module_param_array(full_duplex, int, NUL
 module_param(debug, int, 0);
 MODULE_PARM_DESC (debug, "8139too bitmapped message enable number");
 MODULE_PARM_DESC (multicast_filter_limit, "8139too maximum number of filtered multicast addresses");
-MODULE_PARM_DESC (media, "8139too: Bits 4+9: force full duplex, bit 5: 100Mbps");
+MODULE_PARM_DESC (media, "8139too: bit 0: force setting, bit 4: full duplex, bit 5: 100Mbps");
 MODULE_PARM_DESC (full_duplex, "8139too: Force full duplex for board(s) (1)");
 
 static int read_eeprom (void __iomem *ioaddr, int location, int addr_len);
@@ -1068,8 +1068,8 @@ static int __devinit rtl8139_init_one (s
 	/* The lower four bits are the media type. */
 	option = (board_idx >= MAX_UNITS) ? 0 : media[board_idx];
 	if (option > 0) {
-		tp->mii.full_duplex = (option & 0x210) ? 1 : 0;
-		tp->default_port = option & 0xFF;
+		tp->mii.full_duplex = (option & 0x10) ? 1 : 0;
+		tp->default_port = option & 0x3F;
 		if (tp->default_port)
 			tp->mii.force_media = 1;
 	}
_
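Why bits 4 and 5 alone had no effect with the 4-bit field can be shown with a small userspace sketch. The struct and function names below (port4, port6, store_old(), store_new()) are hypothetical; only the field widths and the masks 0xFF and 0x3F come from the patch. Assigning to an unsigned bitfield keeps only the low bits that fit, so media=0x30 is stored as 0 in a 4-bit field and the later test on default_port never sets force_media.

```c
#include <assert.h>

/* Old layout: only bits 0-3 of the media option survive the store. */
struct port4 {
	unsigned int default_port : 4;
};

/* New layout: bits 0-5 survive, so bits 4 and 5 are preserved. */
struct port6 {
	unsigned int default_port : 6;
};

/* Mirrors the old assignment: mask with 0xFF, store into 4 bits. */
static unsigned int store_old(unsigned int option)
{
	struct port4 p;

	p.default_port = option & 0xFF;	/* silently truncated to 4 bits */
	return p.default_port;
}

/* Mirrors the new assignment: mask with 0x3F, store into 6 bits. */
static unsigned int store_new(unsigned int option)
{
	struct port6 p;

	p.default_port = option & 0x3F;
	return p.default_port;
}
```

With the old layout a non-zero value is only stored when one of bits 0-3 is also set, which is the hidden requirement the description refers to.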
[patch 7/8] sk98lin debug build fix
From: Mariusz Kozlowski [EMAIL PROTECTED]

Fix parenthesis mismatch.

Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/sk98lin/skgesirq.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/sk98lin/skgesirq.c~sk98lin-debug-build-fix drivers/net/sk98lin/skgesirq.c
--- a/drivers/net/sk98lin/skgesirq.c~sk98lin-debug-build-fix
+++ a/drivers/net/sk98lin/skgesirq.c
@@ -1319,7 +1319,7 @@ SK_BOOL	AutoNeg)	/* Is Auto-negotiation
 	SkXmPhyRead(pAC, IoC, Port, PHY_BCOM_INT_STAT, &Isrc);
 
 #ifdef xDEBUG
-	if ((Isrc & ~(PHY_B_IS_HCT | PHY_B_IS_LCT) ==
+	if ((Isrc & ~(PHY_B_IS_HCT | PHY_B_IS_LCT)) ==
 		(PHY_B_IS_SCR_S_ER | PHY_B_IS_RRS_CHANGE | PHY_B_IS_LRS_CHANGE)) {
 
 		SK_U32	Stat1, Stat2, Stat3;
_
[patch 4/8] bonding: incorrect bonding state reported via ioctl
From: Andy Gospodarek [EMAIL PROTECTED]

This is a small fix-up to finish out the work done by Jay Vosburgh to add carrier-state support for bonding devices. The output in /proc/net/bonding/bondX was correct, but when collecting the same info via an ioctl it could still be incorrect.

Signed-off-by: Andy Gospodarek [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Stephen Hemminger [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/bonding/bond_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/bonding/bond_main.c~bonding-incorrect-bonding-state-reported-via-ioctl drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c~bonding-incorrect-bonding-state-reported-via-ioctl
+++ a/drivers/net/bonding/bond_main.c
@@ -3684,7 +3684,7 @@ static int bond_do_ioctl(struct net_devi
 			mii->val_out = 0;
 			read_lock_bh(&bond->lock);
 			read_lock(&bond->curr_slave_lock);
-			if (bond->curr_active_slave) {
+			if (netif_carrier_ok(bond->dev)) {
 				mii->val_out = BMSR_LSTATUS;
 			}
 			read_unlock(&bond->curr_slave_lock);
_
[patch 1/8] Update smc91x driver with ARM Versatile board info
From: Deepak Saxena [EMAIL PROTECTED]

We need to specify a Versatile-specific SMC_IRQ_FLAGS value or the new generic IRQ layer will complain thusly:

No IRQF_TRIGGER set_type function for IRQ 25 (NULL)

Signed-off-by: Deepak Saxena [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Russell King [EMAIL PROTECTED]
Cc: Nicolas Pitre [EMAIL PROTECTED]

On Fri, 20 Oct 2006 22:50:40 +0100 Russell King [EMAIL PROTECTED] wrote:

On Fri, Oct 20, 2006 at 02:42:04PM -0700, [EMAIL PROTECTED] wrote:
We need to specify a Versatile-specific SMC_IRQ_FLAGS value or the new generic IRQ layer will complain thusly:

I don't think I heard anything back from my previous suggestion that the IRQ flags are passed through the platform device IRQ resource.

Doing so would avoid adding yet another platform specific block into the file.

BTW, Integrator platforms will also suffer from this, which will add another ifdef to this header. Let's do it right and arrange to pass these flags from the platform code. It's not like they're in a critical path.

Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/smc91x.h |   18 ++
 1 file changed, 18 insertions(+)

diff -puN drivers/net/smc91x.h~update-smc91x-driver-with-arm-versatile-board-info drivers/net/smc91x.h
--- a/drivers/net/smc91x.h~update-smc91x-driver-with-arm-versatile-board-info
+++ a/drivers/net/smc91x.h
@@ -434,6 +434,24 @@ static inline void LPD7_SMC_outsw (unsig
 
 #define SMC_IRQ_FLAGS		(0)
 
+#elif	defined(CONFIG_ARCH_VERSATILE)
+
+#define SMC_CAN_USE_8BIT	1
+#define SMC_CAN_USE_16BIT	1
+#define SMC_CAN_USE_32BIT	1
+#define SMC_NOWAIT		1
+
+#define SMC_inb(a, r)		readb((a) + (r))
+#define SMC_inw(a, r)		readw((a) + (r))
+#define SMC_inl(a, r)		readl((a) + (r))
+#define SMC_outb(v, a, r)	writeb(v, (a) + (r))
+#define SMC_outw(v, a, r)	writew(v, (a) + (r))
+#define SMC_outl(v, a, r)	writel(v, (a) + (r))
+#define SMC_insl(a, r, p, l)	readsl((a) + (r), p, l)
+#define SMC_outsl(a, r, p, l)	writesl((a) + (r), p, l)
+
+#define SMC_IRQ_FLAGS		(0)
+
 #else
 
 #define SMC_CAN_USE_8BIT	1
_
[patch 6/8] declance: Support the I/O ASIC LANCE w/o TURBOchannel
From: Maciej W. Rozycki [EMAIL PROTECTED]

The onboard LANCE of I/O ASIC systems is not a TURBOchannel device, at least from the software point of view. Therefore it does not rely on any kernel TURBOchannel bus services and can be supported even if support for TURBOchannel has not been enabled in the configuration.

Tested with the onboard LANCE of a DECstation 5000/133.

Signed-off-by: Maciej W. Rozycki [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Ralf Baechle [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/declance.c |    6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff -puN drivers/net/declance.c~declance-support-the-i-o-asic-lance-w-o-turbochannel drivers/net/declance.c
--- a/drivers/net/declance.c~declance-support-the-i-o-asic-lance-w-o-turbochannel
+++ a/drivers/net/declance.c
@@ -1065,7 +1065,6 @@ static int __init dec_lance_init(const i
 	lp->type = type;
 	lp->slot = slot;
 	switch (type) {
-#ifdef CONFIG_TC
 	case ASIC_LANCE:
 		dev->base_addr = CKSEG1ADDR(dec_kn_slot_base + IOASIC_LANCE);
 
@@ -1109,7 +1108,7 @@ static int __init dec_lance_init(const i
 			     CPHYSADDR(dev->mem_start) << 3);
 
 		break;
-
+#ifdef CONFIG_TC
 	case PMAD_LANCE:
 		claim_tc_card(slot);
 
@@ -1140,7 +1139,6 @@ static int __init dec_lance_init(const i
 
 		break;
 #endif
-
 	case PMAX_LANCE:
 		dev->irq = dec_interrupt[DEC_IRQ_LANCE];
 		dev->base_addr = CKSEG1ADDR(KN01_SLOT_BASE + KN01_LANCE);
@@ -1295,10 +1293,8 @@ static int __init dec_lance_probe(void)
 	/* Then handle onboard devices. */
 	if (dec_interrupt[DEC_IRQ_LANCE] >= 0) {
 		if (dec_interrupt[DEC_IRQ_LANCE_MERR] >= 0) {
-#ifdef CONFIG_TC
 			if (dec_lance_init(ASIC_LANCE, -1) >= 0)
 				count++;
-#endif
 		} else if (!TURBOCHANNEL) {
 			if (dec_lance_init(PMAX_LANCE, -1) >= 0)
 				count++;
_
[patch 5/8] declance: Fix PMAX and PMAD support
From: Maciej W. Rozycki [EMAIL PROTECTED]

The shared buffer used by the LANCE on the PMAX only supports halfword (16-bit) accesses. And the PMAD has the buffer wired differently. This is a change to fix these issues.

Tested with a DECstation 2100 (thanks Flo for making this possible) and a DECstation 5000/133 (both the PMAD and the onboard LANCE).

Signed-off-by: Maciej W. Rozycki [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Ralf Baechle [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/declance.c |  398 ---
 1 file changed, 207 insertions(+), 191 deletions(-)

diff -puN drivers/net/declance.c~declance-fix-pmax-and-pmad-support drivers/net/declance.c
--- a/drivers/net/declance.c~declance-fix-pmax-and-pmad-support
+++ a/drivers/net/declance.c
@@ -40,6 +40,10 @@
  *
  *      v0.009: Module support fixes, multiple interfaces support, various
  *              bits. macro
+ *
+ *      v0.010: Fixes for the PMAD mapping of the LANCE buffer and for the
+ *              PMAX requirement to only use halfword accesses to the
+ *              buffer. macro
  */
 
 #include <linux/crc32.h>
@@ -54,6 +58,7 @@
 #include <linux/spinlock.h>
 #include <linux/stddef.h>
 #include <linux/string.h>
+#include <linux/types.h>
 
 #include <asm/addrspace.h>
 #include <asm/system.h>
@@ -67,7 +72,7 @@
 #include <asm/dec/tc.h>
 
 static char version[] __devinitdata =
-"declance.c: v0.009 by Linux MIPS DECstation task force\n";
+"declance.c: v0.010 by Linux MIPS DECstation task force\n";
 
 MODULE_AUTHOR("Linux MIPS DECstation task force");
 MODULE_DESCRIPTION("DEC LANCE (DECstation onboard, PMAD-xx) driver");
@@ -110,24 +115,25 @@ MODULE_LICENSE("GPL");
 #define	LE_C3_BCON	0x1	/* Byte control */
 
 /* Receive message descriptor 1 */
-#define LE_R1_OWN	0x80	/* Who owns the entry */
-#define LE_R1_ERR	0x40	/* Error: if FRA, OFL, CRC or BUF is set */
-#define LE_R1_FRA	0x20	/* FRA: Frame error */
-#define LE_R1_OFL	0x10	/* OFL: Frame overflow */
-#define LE_R1_CRC	0x08	/* CRC error */
-#define LE_R1_BUF	0x04	/* BUF: Buffer error */
-#define LE_R1_SOP	0x02	/* Start of packet */
-#define LE_R1_EOP	0x01	/* End of packet */
-#define LE_R1_POK	0x03	/* Packet is complete: SOP + EOP */
-
-#define LE_T1_OWN	0x80	/* Lance owns the packet */
-#define LE_T1_ERR	0x40	/* Error summary */
-#define LE_T1_EMORE	0x10	/* Error: more than one retry needed */
-#define LE_T1_EONE	0x08	/* Error: one retry needed */
-#define LE_T1_EDEF	0x04	/* Error: deferred */
-#define LE_T1_SOP	0x02	/* Start of packet */
-#define LE_T1_EOP	0x01	/* End of packet */
-#define LE_T1_POK	0x03	/* Packet is complete: SOP + EOP */
+#define LE_R1_OWN	0x8000	/* Who owns the entry */
+#define LE_R1_ERR	0x4000	/* Error: if FRA, OFL, CRC or BUF is set */
+#define LE_R1_FRA	0x2000	/* FRA: Frame error */
+#define LE_R1_OFL	0x1000	/* OFL: Frame overflow */
+#define LE_R1_CRC	0x0800	/* CRC error */
+#define LE_R1_BUF	0x0400	/* BUF: Buffer error */
+#define LE_R1_SOP	0x0200	/* Start of packet */
+#define LE_R1_EOP	0x0100	/* End of packet */
+#define LE_R1_POK	0x0300	/* Packet is complete: SOP + EOP */
+
+/* Transmit message descriptor 1 */
+#define LE_T1_OWN	0x8000	/* Lance owns the packet */
+#define LE_T1_ERR	0x4000	/* Error summary */
+#define LE_T1_EMORE	0x1000	/* Error: more than one retry needed */
+#define LE_T1_EONE	0x0800	/* Error: one retry needed */
+#define LE_T1_EDEF	0x0400	/* Error: deferred */
+#define LE_T1_SOP	0x0200	/* Start of packet */
+#define LE_T1_EOP	0x0100	/* End of packet */
+#define LE_T1_POK	0x0300	/* Packet is complete: SOP + EOP */
 
 #define LE_T3_BUF	0x8000	/* Buffer error */
 #define LE_T3_UFL	0x4000	/* Error underflow */
@@ -156,69 +162,57 @@ MODULE_LICENSE("GPL");
 #undef TEST_HITS
 #define ZERO 0
 
-/* The DS2000/3000 have a linear 64 KB buffer.
-
- * The PMAD-AA has 128 kb buffer on-board.
+/*
+ * The DS2100/3100 have a linear 64 kB buffer which supports halfword
+ * accesses only. Each halfword of the buffer is word-aligned in the
+ * CPU address space.
+ *
+ * The PMAD-AA has a 128 kB buffer on-board.
  *
- * The IOASIC LANCE devices use a shared memory region. This region as seen
- * from the CPU is (max) 128 KB long and has to be on an 128 KB boundary.
- * The LANCE sees this as a 64 KB long continuous memory region.
+ * The IOASIC LANCE devices use a shared memory region. This region
+ * as seen from the CPU is (max) 128 kB long and has to be on an 128 kB
+ * boundary. The LANCE sees this as a 64 kB long continuous memory
+ * region.
  *
- * The LANCE's DMA address is used as an index in this buffer and DMA takes
- * place in bursts of eight 16-Bit words which are packed into four 32-Bit words
- * by the IOASIC. This leads to a strange
Generic Netlink doc now wiki-ized
Clicky clicky...

 * http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO

I took the latest version of the text document and put it on the OSDL networking wiki. The content is unchanged but I messed with the formatting a fair amount so it meshed with the wiki formatting. That said, I will admit that my wiki-foo is pretty weak so if anybody wants to take a stab at it please do.

I'll write up a small patch sometime tomorrow for Documentation/ that points to the wiki page and send it off to the list.

Thanks to everybody who sent me comments on the document on and off the list.

--
paul moore
linux security @ hp
Re: [PATCH 1/1] myri10ge: write as 2 32-byte blocks in myri10ge_submit_8rx
Brice Goglin wrote:
In the myri10ge_submit_8rx() routine, write the 64 byte request block as 2 32-byte blocks so that it is handled by the hardware pio write handler if write-combining is enabled.

Signed-off-by: Brice Goglin [EMAIL PROTECTED]
---
 drivers/net/myri10ge/myri10ge.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

ACK, can you resend due to patch collision in #upstream?

Jeff
Re: [PATCH 2.6.19] AT91RM9200 Ethernet update 2
Andrew Victor wrote:
This patch adds NetPoll / NetConsole support to the Atmel AT91RM9200 Ethernet driver.

Original patch from Bill Gatliff.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

ACK patch content, but comments about email description / subject line also apply here.
Re: [PATCH 2.6.19] AT91RM9200 Ethernet update 1
NAK. see inline comments in quoted text.

Andrew Victor wrote:

Please fix your email's subject line per http://linux.yyz.us/patch-format.html

The subject is a one-line summary that tells us not only what driver is being updated, but also summarizes the changes in the patch. "update 1" tells us nothing.

Your email subject line is the one-line summary of this change that will be copied directly into the kernel source code repository, archived for all eternity.

This patch is an update to the Atmel AT91RM9200 Ethernet driver.

Remove this line. Your email's body is copied directly into the kernel changelog, and it's redundant. From your subject line, we already know what this is updating.

1. Remove the global 'at91_dev' variable.
2. Move the global 'check_timer' variable into the private data structure.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

diff -urN linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c linux-2.6.19-final/drivers/net/arm/at91_ether.c
--- linux-2.6.19-final.orig/drivers/net/arm/at91_ether.c	Sat Dec 2 17:28:27 2006
+++ linux-2.6.19-final/drivers/net/arm/at91_ether.c	Mon Dec 4 14:13:01 2006
@@ -41,9 +41,6 @@
 #define DRV_NAME	"at91_ether"
 #define DRV_VERSION	"1.0"
 
-static struct net_device *at91_dev;
-
-static struct timer_list check_timer;
 #define LINK_POLL_INTERVAL	(HZ)
 
 /* . */
@@ -252,8 +249,8 @@
 		 * PHY doesn't have an IRQ pin (RTL8201, DP83847, AC101L),
 		 * or board does not have it connected.
 		 */
-		check_timer.expires = jiffies + LINK_POLL_INTERVAL;
-		add_timer(&check_timer);
+		lp->check_timer.expires = jiffies + LINK_POLL_INTERVAL;
+		add_timer(&lp->check_timer);

consider using mod_timer()

 		return;
 	}
 
@@ -300,7 +297,7 @@
 
 	irq_number = lp->board_data.phy_irq_pin;
 	if (!irq_number) {
-		del_timer_sync(&check_timer);
+		del_timer_sync(&lp->check_timer);
 		return;
 	}
 
@@ -362,13 +359,14 @@
 static void at91ether_check_link(unsigned long dev_id)
 {
 	struct net_device *dev = (struct net_device *) dev_id;
+	struct at91_private *lp = (struct at91_private *) dev->priv;
 
 	enable_mdi();
 	update_linkspeed(dev, 1);
 	disable_mdi();
 
-	check_timer.expires = jiffies + LINK_POLL_INTERVAL;
-	add_timer(&check_timer);
+	lp->check_timer.expires = jiffies + LINK_POLL_INTERVAL;
+	add_timer(&lp->check_timer);

ditto

 }
 
 /* . ADDRESS MANAGEMENT */
@@ -939,9 +937,6 @@
 	unsigned int val;
 	int res;
 
-	if (at91_dev)			/* already initialized */
-		return 0;
-
 	dev = alloc_etherdev(sizeof(struct at91_private));
 	if (!dev)
 		return -ENOMEM;
@@ -1024,7 +1019,6 @@
 		dma_free_coherent(NULL, sizeof(struct recv_desc_bufs), lp->dlist, (dma_addr_t)lp->dlist_phys);
 		return res;
 	}
-	at91_dev = dev;
 
 	/* Determine current link speed */
 	spin_lock_irq(&lp->lock);
@@ -1036,9 +1030,9 @@
 
 	/* If board has no PHY IRQ, use a timer to poll the PHY */
 	if (!lp->board_data.phy_irq_pin) {
-		init_timer(&check_timer);
-		check_timer.data = (unsigned long)dev;
-		check_timer.function = at91ether_check_link;
+		init_timer(&lp->check_timer);
+		lp->check_timer.data = (unsigned long)dev;
+		lp->check_timer.function = at91ether_check_link;
 	}
 
 	/* Display ethernet banner */
@@ -1115,15 +1109,16 @@
 
 static int __devexit at91ether_remove(struct platform_device *pdev)
 {
-	struct at91_private *lp = (struct at91_private *) at91_dev->priv;
+	struct net_device *dev = platform_get_drvdata(pdev);
+	struct at91_private *lp = (struct at91_private *) dev->priv;

use netdev_priv()

-	unregister_netdev(at91_dev);
-	free_irq(at91_dev->irq, at91_dev);
+	unregister_netdev(dev);
+	free_irq(dev->irq, dev);
 	dma_free_coherent(NULL, sizeof(struct recv_desc_bufs), lp->dlist, (dma_addr_t)lp->dlist_phys);
 	clk_put(lp->ether_clk);
 
-	free_netdev(at91_dev);
-	at91_dev = NULL;
+	platform_set_drvdata(pdev, NULL);
+	free_netdev(dev);
 	return 0;
 }
 
@@ -1131,8 +1126,8 @@
 
 static int at91ether_suspend(struct platform_device *pdev, pm_message_t mesg)
 {
-	struct at91_private *lp = (struct at91_private *) at91_dev->priv;
 	struct net_device *net_dev = platform_get_drvdata(pdev);
+	struct at91_private *lp = (struct at91_private *) net_dev->priv;

ditto

 	int phy_irq = lp->board_data.phy_irq_pin;
 
 	if (netif_running(net_dev)) {
@@ -1149,8 +1144,8 @@
 
 static int at91ether_resume(struct platform_device *pdev)
 {
-	struct at91_private *lp = (struct
Re: [PATCH 2.6.19] AT91RM9200 Ethernet update 3
Andrew Victor wrote:
A minor fix to the Atmel AT91RM9200 Ethernet driver.

1. Use dev_alloc_skb() instead of alloc_skb().
2. It is not necessary to adjust skb->len manually.

Signed-off-by: Andrew Victor [EMAIL PROTECTED]

ACK patch content
Re: [RFC patch] driver for the Opencores Ethernet Controller
Stephen Hemminger [EMAIL PROTECTED] writes:

> On Mon, 04 Dec 2006 10:01:01 -0800
> Dan Nicolaescu [EMAIL PROTECTED] wrote:
>
> > Hi,
> >
> > Here is a driver for the Opencores Ethernet Controller. I started
> > from a 2.4 uClinux driver, ported it to 2.6, made it work, cleaned
> > it up and added the MII interface.
> >
> > The Opencores Ethernet Controller is Verilog code that can be used
> > to implement an Ethernet device in hardware. It needs to be coupled
> > with a PHY and some buffer memory. Because of that, devices that
> > implement this controller can be very different. The code here tries
> > to support that by having some parameters that need to be defined at
> > compile time.
> >
> > This is my first Ethernet driver, so comments/advice would be
> > appreciated.
> >
> > Thanks
> > --dan
> >
> >  Kconfig    |    5
> >  open_eth.c | 1022 +
> >  open_eth.h |  132 +++
> >  3 files changed, 1159 insertions(+)
>
> Please run through scripts/Lindent or cleanup style.
> Also has trailing whitespace.

Thanks for the review! I managed to send in the diff of my working
file instead of the one that I had just cleaned up. Sorry about that.

> Gack, just put in the correct defines; avoid adding conditional
> compilation stuff.

Gone, that was for debugging.

> > +
> > +#if CONFIG_MII
> > +	struct mii_if_info mii_if;	/* MII lib hooks/info */
> > +#endif
>
> Use select in Kconfig, to force MII

That was intentional, I don't want to add extra code to the kernel in
case MII is not otherwise enabled (for code size for embedded
processors).

> > +		else if (!(i % 8))
> > +			printk(" ");
> > +		printk(" %.2x", *(((unsigned char *)add) + i));
> > +	}
> > +	printk("\n");
> > +}
> > +#endif
> > +
> > +int dann_int_count = 0;
> > +int dann_rx_count = 0;
> > +int dann_tx_count = 0;
> > +
> > +static int oeth_open(struct net_device *dev)
> > +{
> > +	int ret;
> > +	struct oeth_private *cep = netdev_priv(dev);
> > +	struct oeth_regs *regs = cep->regs;
> > +
> > +	/*FIXME: just for debugging*/
> > +	memset((void*)OETH_SRAM_BUFF_BASE, 0, 0x4000);
> > +
> > +	/* Install our interrupt handler. */
> > +	ret = request_irq(OETH_IRQ, oeth_interrupt, 0, "eth", (void *)dev);
> > +	if (ret)
> > +	{
>
> Why not use dev->name rather than "eth"?

Fixed.

> Indentation. See Documentation style.
> What about IRQF_SHARED?

Not sure, maybe I should make this another driver parameter. On my
platform it is not shared...

> > +		printk("request_irq failed for the Opencore ethernet device\n");
> > +		return ret;
> > +	}
> > +	/* Enable the receiver and transmiter. */
> > +	regs->moder |= OETH_MODER_RXEN | OETH_MODER_TXEN;
> > +
> > +	/* Start the queue, we are ready to process packets now. */
> > +	netif_start_queue (dev);
> > +	return 0;
> > +}
> > +
> > +static int oeth_close(struct net_device *dev)
> > +{
> > +	struct oeth_private *cep = netdev_priv(dev);
> > +	struct oeth_regs *regs = cep->regs;
> > +	volatile struct oeth_bd *bdp;
> > +	int i;
> > +
> > +	spin_lock_irq(&cep->lock);
> > +	/* Disable the receiver and transmiter. */
> > +	regs->moder &= ~(OETH_MODER_RXEN | OETH_MODER_TXEN);
> > +
> > +	bdp = cep->rx_bd_base;
> > +	for (i = 0; i < OETH_RXBD_NUM; i++) {
> > +		bdp->len_status &= ~(OETH_TX_BD_STATS | OETH_TX_BD_READY);
> > +		bdp++;
> > +	}
> > +
> > +	bdp = cep->tx_bd_base;
> > +	for (i = 0; i < OETH_TXBD_NUM; i++) {
> > +		bdp->len_status &= ~(OETH_RX_BD_STATS | OETH_RX_BD_EMPTY);
> > +		bdp++;
> > +	}
> > +
> > +	spin_unlock_irq(&cep->lock);
> > +
> > +	return 0;
> > +}
> > +
> > +#if 1
> > +static void* memcpy_hton (void *dest, void *data, size_t n)
>
> const ?

Nuked, not needed, it was just working around HW bugs.

> > +static int oeth_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > +{
> > +	struct oeth_private *cep = netdev_priv(dev);
> > +	volatile struct oeth_bd *bdp;
> > +	unsigned long flags;
> > +	u32 len_status;
> > +
> > +	spin_lock_irqsave(&cep->lock, flags);
> > +
> > +	if (cep->tx_full) {
> > +		/* All transmit buffers are full. Bail out. */
> > +		printk("%s: tx queue full!.\n", dev->name);
> > +		print_queue(cep->tx_bd_base, cep->tx_next, OETH_TXBD_NUM);
> > +		spin_unlock_irqrestore(&cep->lock, flags);
> > +		return 1;
>
> return NETDEV_TX_BUSY.
> you forgot to call stop_queue

Fixed.

What should I return in the case below:

	if (skb->len > OETH_TX_BUFF_SIZE) {
		printk("%s: tx frame too long!.\n", dev->name);
		spin_unlock_irqrestore(&cep->lock, flags);
		return 1;
	}

Even better, is there a way to make sure the network stack knows that
it should not try to send packets bigger than OETH_TX_BUFF_SIZE?

> > +
> > +	/* Copy data to TX buffer. */
> > +	memcpy_hton ((unsigned char *)bdp->addr, skb->data, skb->len);
>
> Use skb_copy_and_csum_dev and you get checksum offload for free.

Wouldn't that just add
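The two transmit cases discussed in this review (ring full versus frame too long) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver's code: `model_dev`, `model_start_xmit`, `MODEL_TX_OK`, `MODEL_TX_BUSY` and `MODEL_TX_BUFF_SIZE` are hypothetical stand-ins for the kernel's device state, the transmit hook, its return codes and `OETH_TX_BUFF_SIZE`. In a real driver the drop path would also free the skb, and the stack normally keeps frames within the device MTU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's transmit return codes. */
enum { MODEL_TX_OK = 0, MODEL_TX_BUSY = 1 };

/* Assumed transmit buffer size; the real OETH_TX_BUFF_SIZE is not shown. */
#define MODEL_TX_BUFF_SIZE 1536

/* Hypothetical per-device state, reduced to what the sketch needs. */
struct model_dev {
	bool tx_full;             /* all transmit descriptors are in use */
	bool queue_stopped;       /* models the effect of netif_stop_queue() */
	unsigned long tx_sent;    /* frames handed to the hardware */
	unsigned long tx_dropped; /* frames discarded by the driver */
};

int model_start_xmit(struct model_dev *dev, size_t len)
{
	if (dev->tx_full) {
		/* Ring full: stop the queue so the stack stops calling us,
		 * then report busy so the same packet is retried later. */
		dev->queue_stopped = true;
		return MODEL_TX_BUSY;
	}

	if (len > MODEL_TX_BUFF_SIZE) {
		/* Oversized frame: retrying can never succeed, so count it
		 * as dropped and report it as consumed. */
		dev->tx_dropped++;
		return MODEL_TX_OK;
	}

	/* Normal path: the frame fits and a descriptor is free. */
	dev->tx_sent++;
	return MODEL_TX_OK;
}
```

The design point is that "busy" is reserved for conditions that will clear by themselves (a descriptor becoming free), while a frame that cannot ever be sent is dropped and reported as handled.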
[PATCH 3/3] sky2: better ram buffer partitioning
Different chips have different sizes of RAM buffers, and some versions
have no RAM buffer at all! Be more careful about sizing the RAM usage,
because it may be a problem if the vendor keeps changing sizes.

There is the (unlikely) possibility that some of the errors on some of
the chips have been caused by partitioning not on a 1K boundary.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |   41 +
 1 file changed, 25 insertions(+), 16 deletions(-)

--- sky2.orig/drivers/net/sky2.c	2006-12-04 12:47:19.0 -0800
+++ sky2/drivers/net/sky2.c	2006-12-04 13:19:49.0 -0800
@@ -696,10 +696,15 @@
 }
 
-/* Assign Ram Buffer allocation in units of 64bit (8 bytes) */
-static void sky2_ramset(struct sky2_hw *hw, u16 q, u32 start, u32 end)
+/* Assign Ram Buffer allocation to queue */
+static void sky2_ramset(struct sky2_hw *hw, u16 q, u32 start, u32 space)
 {
-	pr_debug(PFX "q %d %#x %#x\n", q, start, end);
+	u32 end;
+
+	/* convert from K bytes to qwords used for hw register */
+	start *= 1024/8;
+	space *= 1024/8;
+	end = start + space - 1;
 
 	sky2_write8(hw, RB_ADDR(q, RB_CTRL), RB_RST_CLR);
 	sky2_write32(hw, RB_ADDR(q, RB_START), start);
@@ -708,7 +713,6 @@
 	sky2_write32(hw, RB_ADDR(q, RB_RP), start);
 
 	if (q == Q_R1 || q == Q_R2) {
-		u32 space = end - start + 1;
 		u32 tp = space - space/4;
 
 		/* On receive queue's set the thresholds
@@ -1138,7 +1142,7 @@
 	struct sky2_port *sky2 = netdev_priv(dev);
 	struct sky2_hw *hw = sky2->hw;
 	unsigned port = sky2->port;
-	u32 ramsize, rxspace, imask;
+	u32 ramsize, imask;
 	int cap, err = -ENOMEM;
 	struct net_device *otherdev = hw->dev[sky2->port^1];
 
@@ -1191,20 +1195,25 @@
 
 	sky2_mac_init(hw, port);
 
-	/* Determine available ram buffer space in qwords. */
-	ramsize = sky2_read8(hw, B2_E_0) * 4096/8;
+	/* Register is number of 4K blocks on internal RAM buffer. */
+	ramsize = sky2_read8(hw, B2_E_0) * 4;
+	printk(KERN_INFO PFX "%s: ram buffer %dK\n", dev->name, ramsize);
 
-	if (ramsize > 6*1024/8)
-		rxspace = ramsize - (ramsize + 2) / 3;
-	else
-		rxspace = ramsize / 2;
+	if (ramsize > 0) {
+		u32 rxspace;
 
-	sky2_ramset(hw, rxqaddr[port], 0, rxspace-1);
-	sky2_ramset(hw, txqaddr[port], rxspace, ramsize-1);
+		if (ramsize < 16)
+			rxspace = ramsize / 2;
+		else
+			rxspace = 8 + (2*(ramsize - 16))/3;
 
-	/* Make sure SyncQ is disabled */
-	sky2_write8(hw, RB_ADDR(port == 0 ? Q_XS1 : Q_XS2, RB_CTRL),
-		    RB_RST_SET);
+		sky2_ramset(hw, rxqaddr[port], 0, rxspace);
+		sky2_ramset(hw, txqaddr[port], rxspace, ramsize - rxspace);
+
+		/* Make sure SyncQ is disabled */
+		sky2_write8(hw, RB_ADDR(port == 0 ? Q_XS1 : Q_XS2, RB_CTRL),
+			    RB_RST_SET);
+	}
 
 	sky2_qset(hw, txqaddr[port]);
 
-- 
Stephen Hemminger [EMAIL PROTECTED]
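The partitioning arithmetic in this patch can be checked in isolation with a small userspace sketch. It assumes the split reads as in the patch (half of the buffer for receive below 16K, otherwise 8K plus two thirds of the remainder) and that sizes are converted from Kbytes to 8-byte qwords. The names `rx_space_kb`, `kb_to_qwords` and `struct rb_window` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical result type: a RAM buffer window in 8-byte qwords. */
struct rb_window {
	uint32_t start; /* first qword of the window */
	uint32_t end;   /* last qword of the window, inclusive */
};

/* Receive share in Kbytes, following the split used in the patch. */
uint32_t rx_space_kb(uint32_t ramsize_kb)
{
	if (ramsize_kb < 16)
		return ramsize_kb / 2;
	return 8 + (2 * (ramsize_kb - 16)) / 3;
}

/* Convert a window given in Kbytes to qwords (1024 / 8 qwords per K). */
struct rb_window kb_to_qwords(uint32_t start_kb, uint32_t space_kb)
{
	struct rb_window w;

	w.start = start_kb * (1024 / 8);
	w.end = w.start + space_kb * (1024 / 8) - 1;
	return w;
}
```

For a 48K buffer this gives 29K to receive and 19K to transmit. Because every size is a whole number of Kbytes before the conversion to qwords, both windows start on a 1K boundary.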
[PATCH 1/3] sky2: add PCI for 88ec033
Add another new/missing pci id for 88ec033 chip.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-12-04 12:33:26.0 -0800
+++ sky2/drivers/net/sky2.c	2006-12-04 12:38:00.0 -0800
@@ -117,6 +117,7 @@
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4351) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4352) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4353) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4356) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4360) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4361) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4362) },
[PATCH 2/3] sky2: add comments to PCI ids
Add comments to sky2 driver to show relationship between PCI id and
hardware.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |   48 +++-
 1 file changed, 23 insertions(+), 25 deletions(-)

--- sky2.orig/drivers/net/sky2.c	2006-12-04 12:47:06.0 -0800
+++ sky2/drivers/net/sky2.c	2006-12-04 12:47:19.0 -0800
@@ -100,34 +100,32 @@
 MODULE_PARM_DESC(idle_timeout, "Watchdog timer for lost interrupts (ms)");
 
 static const struct pci_device_id sky2_id_table[] = {
-	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) }, /* SK-9Sxx */
+	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) }, /* SK-9Exx */
 	{ PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4b00) },	/* DGE-560T */
 	{ PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4001) },	/* DGE-550SX */
 	{ PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4B02) },	/* DGE-560SX */
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4340) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4341) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4342) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4343) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4344) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4345) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4346) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4347) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4350) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4351) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4352) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4353) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4356) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4360) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4361) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4362) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4363) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4364) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4365) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4366) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4367) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4369) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4340) }, /* 88E8021 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4341) }, /* 88E8022 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4342) }, /* 88E8061 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4343) }, /* 88E8062 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4344) }, /* 88E8021 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4345) }, /* 88E8022 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4346) }, /* 88E8061 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4347) }, /* 88E8062 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4350) }, /* 88E8035 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4351) }, /* 88E8036 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4352) }, /* 88E8038 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4353) }, /* 88E8039 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4356) }, /* 88EC033 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4360) }, /* 88E8052 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4361) }, /* 88E8050 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4362) }, /* 88E8053 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4363) }, /* 88E8055 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4364) }, /* 88E8056 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4366) }, /* 88EC036 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4367) }, /* 88EC032 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) }, /* 88EC034 */
 	{ 0 }
 };
 
-- 
Stephen Hemminger [EMAIL PROTECTED]
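The relationship that these comments document can also be expressed as data. The sketch below is a userspace illustration only, not part of the driver: `struct chip_name`, `marvell_chips` and `chip_name_for` are hypothetical names, and the table holds just a few of the Marvell device ids and chip names taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a PCI device id with the chip name from the patch. */
struct chip_name {
	uint16_t device;
	const char *name;
};

/* A small excerpt of the Marvell entries commented in the patch. */
static const struct chip_name marvell_chips[] = {
	{ 0x4350, "88E8035" },
	{ 0x4356, "88EC033" },
	{ 0x4362, "88E8053" },
	{ 0x4364, "88E8056" },
};

/* Return the chip name for a device id, or NULL if it is not in the table. */
const char *chip_name_for(uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(marvell_chips) / sizeof(marvell_chips[0]); i++) {
		if (marvell_chips[i].device == device)
			return marvell_chips[i].name;
	}
	return NULL;
}
```

A linear scan is enough here because the table is tiny and is consulted rarely.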