[take28 0/8] kevent: Generic event handling mechanism.
Generic event handling mechanism.

Kevent is a generic subsystem which allows handling of event notifications. It supports both level-triggered and edge-triggered events. It is similar to poll/epoll in some cases, but it is more scalable, it is faster and it can work with essentially any kind of event.

Events are provided to the kernel through a control syscall and can be read back through a ring buffer or using the usual syscalls. A kevent update (i.e. readiness switching) happens directly from the internals of the appropriate state machine of the underlying subsystem (like network, filesystem, timer or any other).

Homepage:
http://tservice.net.ru/~s0mbre/old/?section=projectsitem=kevent

Documentation page:
http://linux-net.osdl.org/index.php/Kevent

Consider for inclusion.

New benchmark, which can be a hoax though, can be found at
http://tservice.net.ru/~s0mbre/blog/2006/11/30#2006_11_30
where kevent on amd64 with 1gb of ram can handle more than 7200 events per second with 8000 requests concurrency with the 'ab' benchmark and lighttpd. Although I thought it should not be published due to possible errors, I decided to send it for review.

With this release I start a 3 days resending timeout, i.e. each third day I will send either a new version (if something new was requested and agreed to be implemented) or a resend with a back counter started from three. When the back counter hits zero after three resends, I will consider that there is no interest in the subsystem and I will stop further sending.

Thanks for understanding and your time.

Changes from 'take27' patchset:
 * made kevent default yes in non embedded case.
 * added flags to callback structures - currently used to check if kevent can be requested from kernelspace only (posix timers) or userspace (all others)

Changes from 'take26' patchset:
 * made kevent visible in config only in case of embedded setup.
 * added comment about KEVENT_MAX number.
 * spell fix.

Changes from 'take25' patchset:
 * use timespec as timeout parameter.
 * added high-resolution timer to handle absolute timeouts.
 * added flags to waiting and initialization syscalls.
 * kevent_commit() has new_uidx parameter.
 * kevent_wait() has old_uidx parameter, which, if not equal to u->uidx, results in immediate wakeup (useful for the case when entries are added asynchronously from kernel (not supported for now)).
 * added interface to mark any event as ready.
 * event POSIX timers support.
 * return -ENOSYS if there is no registered event type.
 * provided file descriptor must be checked for fifo type (spotted by Eric Dumazet).
 * signal notifications.
 * documentation update.
 * lighttpd patch updated (the latest benchmarks with lighttpd patch can be found in blog).

Changes from 'take24' patchset:
 * new (old (new)) ring buffer implementation with kernel and user indexes.
 * added initialization syscall instead of opening /dev/kevent
 * kevent_commit() syscall to commit ring buffer entries
 * changed KEVENT_REQ_WAKEUP_ONE flag to KEVENT_REQ_WAKEUP_ALL, kevent wakes only first thread always if that flag is not set
 * KEVENT_REQ_ALWAYS_QUEUE flag. If set, kevent will be queued into ready queue instead of copying back to userspace when kevent is ready immediately when it is added.
 * lighttpd patch (Hail! Although nothing really outstanding compared to epoll)

Changes from 'take23' patchset:
 * kevent PIPE notifications
 * KEVENT_REQ_LAST_CHECK flag, which allows to perform last check at dequeueing time
 * fixed poll/select notifications (were broken due to tree manipulations)
 * made Documentation/kevent.txt look nice in 80-col terminal
 * fix for copy_to_user() failure report for the first kevent (Andrew Morton)
 * minor function renames

Changes from 'take22' patchset:
 * new ring buffer implementation in process' memory
 * wakeup-one-thread flag
 * edge-triggered behaviour

Changes from 'take21' patchset:
 * minor cleanups (different return values, removed unneeded variables, whitespaces and so on)
 * fixed bug in kevent removal in case when kevent being removed is the same as overflow_kevent (spotted by Eric Dumazet)

Changes from 'take20' patchset:
 * new ring buffer implementation
 * removed artificial limit on possible number of kevents

Changes from 'take19' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take18' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take17' patchset:
 * Use RB tree instead of hash table. At least for a web server, frequency of addition/deletion of new kevent is comparable with number of search access, i.e. most of the time events are added, accessed only couple of times and then
[take28 4/8] kevent: Socket notifications.
Socket notifications. This patch includes socket send/recv/accept notifications. Using trivial web server based on kevent and this features instead of epoll it's performance increased more than noticebly. More details about various benchmarks and server itself (evserver_kevent.c) can be found on project's homepage. Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/fs/inode.c b/fs/inode.c index ada7643..2740617 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -21,6 +21,7 @@ #include linux/cdev.h #include linux/bootmem.h #include linux/inotify.h +#include linux/kevent.h #include linux/mount.h /* @@ -164,12 +165,18 @@ static struct inode *alloc_inode(struct super_block *sb) } inode-i_private = 0; inode-i_mapping = mapping; +#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE + kevent_storage_init(inode, inode-st); +#endif } return inode; } void destroy_inode(struct inode *inode) { +#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE + kevent_storage_fini(inode-st); +#endif BUG_ON(inode_has_buffers(inode)); security_inode_free(inode); if (inode-i_sb-s_op-destroy_inode) diff --git a/include/net/sock.h b/include/net/sock.h index edd4d73..d48ded8 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -48,6 +48,7 @@ #include linux/netdevice.h #include linux/skbuff.h /* struct sk_buff */ #include linux/security.h +#include linux/kevent.h #include linux/filter.h @@ -450,6 +451,21 @@ static inline int sk_stream_memory_free(struct sock *sk) extern void sk_stream_rfree(struct sk_buff *skb); +struct socket_alloc { + struct socket socket; + struct inode vfs_inode; +}; + +static inline struct socket *SOCKET_I(struct inode *inode) +{ + return container_of(inode, struct socket_alloc, vfs_inode)-socket; +} + +static inline struct inode *SOCK_INODE(struct socket *socket) +{ + return container_of(socket, struct socket_alloc, socket)-vfs_inode; +} + static inline void sk_stream_set_owner_r(struct sk_buff *skb, struct sock *sk) { skb-sk = sk; @@ -477,6 
+493,7 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb) sk-sk_backlog.tail = skb; } skb-next = NULL; + kevent_socket_notify(sk, KEVENT_SOCKET_RECV); } #define sk_wait_event(__sk, __timeo, __condition) \ @@ -679,21 +696,6 @@ static inline struct kiocb *siocb_to_kiocb(struct sock_iocb *si) return si-kiocb; } -struct socket_alloc { - struct socket socket; - struct inode vfs_inode; -}; - -static inline struct socket *SOCKET_I(struct inode *inode) -{ - return container_of(inode, struct socket_alloc, vfs_inode)-socket; -} - -static inline struct inode *SOCK_INODE(struct socket *socket) -{ - return container_of(socket, struct socket_alloc, socket)-vfs_inode; -} - extern void __sk_stream_mem_reclaim(struct sock *sk); extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind); diff --git a/include/net/tcp.h b/include/net/tcp.h index 7a093d0..69f4ad2 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -857,6 +857,7 @@ static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb) tp-ucopy.memory = 0; } else if (skb_queue_len(tp-ucopy.prequeue) == 1) { wake_up_interruptible(sk-sk_sleep); + kevent_socket_notify(sk, KEVENT_SOCKET_RECV|KEVENT_SOCKET_SEND); if (!inet_csk_ack_scheduled(sk)) inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK, (3 * TCP_RTO_MIN) / 4, diff --git a/kernel/kevent/kevent_socket.c b/kernel/kevent/kevent_socket.c new file mode 100644 index 000..1798092 --- /dev/null +++ b/kernel/kevent/kevent_socket.c @@ -0,0 +1,144 @@ +/* + * kevent_socket.c + * + * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include linux/kernel.h +#include linux/types.h +#include linux/list.h +#include linux/slab.h +#include linux/spinlock.h +#include linux/timer.h +#include linux/file.h
[take28 1/8] kevent: Description.
Description.

diff --git a/Documentation/kevent.txt b/Documentation/kevent.txt
new file mode 100644
index 000..2e03a3f
--- /dev/null
+++ b/Documentation/kevent.txt
@@ -0,0 +1,240 @@
+Description.
+
+int kevent_init(struct kevent_ring *ring, unsigned int ring_size,
+    unsigned int flags);
+
+num - size of the ring buffer in events
+ring - pointer to allocated ring buffer
+flags - various flags, see KEVENT_FLAGS_* definitions.
+
+Return value: kevent control file descriptor or negative error value.
+
+ struct kevent_ring
+ {
+   unsigned int ring_kidx, ring_over;
+   struct ukevent event[0];
+ }
+
+ring_kidx - index in the ring buffer where kernel will put new events
+    when kevent_wait() or kevent_get_events() is called
+ring_over - number of overflows of ring_uidx happend from the start.
+    Overflow counter is used to prevent situation when two threads
+    are going to free the same events, but one of them was scheduled
+    away for too long, so ring indexes were wrapped, so when that
+    thread will be awakened, it will free not those events, which
+    it suppose to free.
+
+Example userspace code (ring_buffer.c) can be found on project's homepage.
+
+Each kevent syscall can be so called cancellation point in glibc, i.e. when
+thread has been cancelled in kevent syscall, thread can be safely removed
+and no events will be lost, since each syscall (kevent_wait() or
+kevent_get_events()) will copy event into special ring buffer, accessible
+from other threads or even processes (if shared memory is used).
+
+When kevent is removed (not dequeued when it is ready, but just removed),
+even if it was ready, it is not copied into ring buffer, since if it is
+removed, no one cares about it (otherwise user would wait until it becomes
+ready and got it through usual way using kevent_get_events() or kevent_wait())
+and thus no need to copy it to the ring buffer.
+
+---
+
+
+int kevent_ctl(int fd, unsigned int cmd, unsigned int num, struct ukevent *arg);
+
+fd - is the file descriptor referring to the kevent queue to manipulate.
+It is created by opening /dev/kevent char device, which is created with
+dynamic minor number and major number assigned for misc devices.
+
+cmd - is the requested operation. It can be one of the following:
+KEVENT_CTL_ADD - add event notification
+KEVENT_CTL_REMOVE - remove event notification
+KEVENT_CTL_MODIFY - modify existing notification
+KEVENT_CTL_READY - mark existing events as ready, if number of events is zero,
+    it just wakes up parked in syscall thread
+
+num - number of struct ukevent in the array pointed to by arg
+arg - array of struct ukevent
+
+Return value:
+ number of events processed or negative error value.
+
+When called, kevent_ctl will carry out the operation specified in the
+cmd parameter.
+---
+
+ int kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr,
+    struct timespec timeout, struct ukevent *buf, unsigned flags);
+
+ctl_fd - file descriptor referring to the kevent queue
+min_nr - minimum number of completed events that kevent_get_events will block
+waiting for
+max_nr - number of struct ukevent in buf
+timeout - time to wait before returning less than min_nr
+    events. If this is -1, then wait forever.
+buf - pointer to an array of struct ukevent.
+flags - various flags, see KEVENT_FLAGS_* definitions.
+
+Return value:
+ number of events copied or negative error value.
+
+kevent_get_events will wait timeout milliseconds for at least min_nr completed
+events, copying completed struct ukevents to buf and deleting any
+KEVENT_REQ_ONESHOT event requests. In nonblocking mode it returns as many
+events as possible, but not more than max_nr. In blocking mode it waits until
+timeout or if at least min_nr events are ready.
+
+This function copies event into ring buffer if it was initialized, if ring buffer
+is full, KEVENT_RET_COPY_FAILED flag is set in ret_flags field.
+---
+
+ int kevent_wait(int ctl_fd, unsigned int num, unsigned int old_uidx,
+    struct timespec timeout, unsigned int flags);
+
+ctl_fd - file descriptor referring to the kevent queue
+num - number of processed kevents
+old_uidx - the last index user is aware of
+timeout - time to wait until there is free space in kevent queue
+flags - various flags, see KEVENT_FLAGS_* definitions.
+
+Return value:
+ number of events copied into ring buffer or negative error value.
+
+This syscall waits until either timeout expires or at least one event becomes
+ready. It also copies events into special ring buffer. If ring buffer is full,
+it waits until there are ready events and then return.
+If kevent is one-shot kevent it is
[take28 8/8] kevent: Kevent posix timer notifications.
Kevent posix timer notifications. Simple extensions to POSIX timers which allows to deliver notification of the timer expiration through kevent queue. Example application posix_timer.c can be found in archive on project homepage. Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h index 8786e01..3768746 100644 --- a/include/asm-generic/siginfo.h +++ b/include/asm-generic/siginfo.h @@ -235,6 +235,7 @@ typedef struct siginfo { #define SIGEV_NONE 1 /* other notification: meaningless */ #define SIGEV_THREAD 2 /* deliver via thread creation */ #define SIGEV_THREAD_ID 4 /* deliver to thread */ +#define SIGEV_KEVENT 8 /* deliver through kevent queue */ /* * This works because the alignment is ok on all current architectures @@ -260,6 +261,8 @@ typedef struct sigevent { void (*_function)(sigval_t); void *_attribute; /* really pthread_attr_t */ } _sigev_thread; + + int kevent_fd; } _sigev_un; } sigevent_t; diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index a7dd38f..4b9deb4 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -4,6 +4,7 @@ #include linux/spinlock.h #include linux/list.h #include linux/sched.h +#include linux/kevent_storage.h union cpu_time_count { cputime_t cpu; @@ -49,6 +50,9 @@ struct k_itimer { sigval_t it_sigev_value;/* value word of sigevent struct */ struct task_struct *it_process; /* process to send signal to */ struct sigqueue *sigq; /* signal queue entry. */ +#ifdef CONFIG_KEVENT_TIMER + struct kevent_storage st; +#endif union { struct { struct hrtimer timer; diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c index e5ebcc1..74270f8 100644 --- a/kernel/posix-timers.c +++ b/kernel/posix-timers.c @@ -48,6 +48,8 @@ #include linux/wait.h #include linux/workqueue.h #include linux/module.h +#include linux/kevent.h +#include linux/file.h /* * Management arrays for POSIX timers. 
Timers are kept in slab memory @@ -224,6 +226,100 @@ static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp) return 0; } +#ifdef CONFIG_KEVENT_TIMER +static int posix_kevent_enqueue(struct kevent *k) +{ + /* +* It is not ugly - there is no pointer in the id field union, +* but its size is 64bits, which is ok for any known pointer size. +*/ + struct k_itimer *tmr = (struct k_itimer *)(unsigned long)k-event.id.raw_u64; + return kevent_storage_enqueue(tmr-st, k); +} +static int posix_kevent_dequeue(struct kevent *k) +{ + struct k_itimer *tmr = (struct k_itimer *)(unsigned long)k-event.id.raw_u64; + kevent_storage_dequeue(tmr-st, k); + return 0; +} +static int posix_kevent_callback(struct kevent *k) +{ + return 1; +} +static int posix_kevent_init(void) +{ + struct kevent_callbacks tc = { + .callback = posix_kevent_callback, + .enqueue = posix_kevent_enqueue, + .dequeue = posix_kevent_dequeue, + .flags = KEVENT_CALLBACKS_KERNELONLY}; + + return kevent_add_callbacks(tc, KEVENT_POSIX_TIMER); +} + +extern struct file_operations kevent_user_fops; + +static int posix_kevent_init_timer(struct k_itimer *tmr, int fd) +{ + struct ukevent uk; + struct file *file; + struct kevent_user *u; + int err; + + file = fget(fd); + if (!file) { + err = -EBADF; + goto err_out; + } + + if (file-f_op != kevent_user_fops) { + err = -EINVAL; + goto err_out_fput; + } + + u = file-private_data; + + memset(uk, 0, sizeof(struct ukevent)); + + uk.event = KEVENT_MASK_ALL; + uk.type = KEVENT_POSIX_TIMER; + uk.id.raw_u64 = (unsigned long)(tmr); /* Just cast to something unique */ + uk.req_flags = KEVENT_REQ_ONESHOT | KEVENT_REQ_ALWAYS_QUEUE; + uk.ptr = tmr-it_sigev_value.sival_ptr; + + err = kevent_user_add_ukevent(uk, u); + if (err) + goto err_out_fput; + + fput(file); + + return 0; + +err_out_fput: + fput(file); +err_out: + return err; +} + +static void posix_kevent_fini_timer(struct k_itimer *tmr) +{ + kevent_storage_fini(tmr-st); +} +#else +static int posix_kevent_init_timer(struct 
k_itimer *tmr, int fd) +{ + return -ENOSYS; +} +static int posix_kevent_init(void) +{ + return 0; +} +static void posix_kevent_fini_timer(struct k_itimer *tmr) +{ +} +#endif + + /* * Initialize everything, well, just everything in Posix clocks/timers ;) */ @@ -241,6 +337,11 @@ static __init int init_posix_timers(void) register_posix_clock(CLOCK_REALTIME, clock_realtime);
[take28 3/8] kevent: poll/select() notifications.
poll/select() notifications. This patch includes generic poll/select notifications. kevent_poll works simialr to epoll and has the same issues (callback is invoked not from internal state machine of the caller, but through process awake, a lot of allocations and so on). Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/fs/file_table.c b/fs/file_table.c index bc35a40..0805547 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -20,6 +20,7 @@ #include linux/cdev.h #include linux/fsnotify.h #include linux/sysctl.h +#include linux/kevent.h #include linux/percpu_counter.h #include asm/atomic.h @@ -119,6 +120,7 @@ struct file *get_empty_filp(void) f-f_uid = tsk-fsuid; f-f_gid = tsk-fsgid; eventpoll_init_file(f); + kevent_init_file(f); /* f-f_version: 0 */ return f; @@ -164,6 +166,7 @@ void fastcall __fput(struct file *file) * in the file cleanup chain. */ eventpoll_release(file); + kevent_cleanup_file(file); locks_remove_flock(file); if (file-f_op file-f_op-release) diff --git a/include/linux/fs.h b/include/linux/fs.h index 5baf3a1..8bbf3a5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -276,6 +276,7 @@ extern int dir_notify_enable; #include linux/init.h #include linux/sched.h #include linux/mutex.h +#include linux/kevent_storage.h #include asm/atomic.h #include asm/semaphore.h @@ -586,6 +587,10 @@ struct inode { struct mutexinotify_mutex; /* protects the watches list */ #endif +#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE + struct kevent_storage st; +#endif + unsigned long i_state; unsigned long dirtied_when; /* jiffies of first dirtying */ @@ -739,6 +744,9 @@ struct file { struct list_headf_ep_links; spinlock_t f_ep_lock; #endif /* #ifdef CONFIG_EPOLL */ +#ifdef CONFIG_KEVENT_POLL + struct kevent_storage st; +#endif struct address_space*f_mapping; }; extern spinlock_t files_lock; diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c new file mode 100644 index 000..7ccf7da --- /dev/null +++ 
b/kernel/kevent/kevent_poll.c @@ -0,0 +1,234 @@ +/* + * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include linux/kernel.h +#include linux/types.h +#include linux/list.h +#include linux/slab.h +#include linux/spinlock.h +#include linux/timer.h +#include linux/file.h +#include linux/kevent.h +#include linux/poll.h +#include linux/fs.h + +static kmem_cache_t *kevent_poll_container_cache; +static kmem_cache_t *kevent_poll_priv_cache; + +struct kevent_poll_ctl +{ + struct poll_table_structpt; + struct kevent *k; +}; + +struct kevent_poll_wait_container +{ + struct list_headcontainer_entry; + wait_queue_head_t *whead; + wait_queue_twait; + struct kevent *k; +}; + +struct kevent_poll_private +{ + struct list_headcontainer_list; + spinlock_t container_lock; +}; + +static int kevent_poll_enqueue(struct kevent *k); +static int kevent_poll_dequeue(struct kevent *k); +static int kevent_poll_callback(struct kevent *k); + +static int kevent_poll_wait_callback(wait_queue_t *wait, + unsigned mode, int sync, void *key) +{ + struct kevent_poll_wait_container *cont = + container_of(wait, struct kevent_poll_wait_container, wait); + struct kevent *k = cont-k; + + kevent_storage_ready(k-st, NULL, KEVENT_MASK_ALL); + return 0; +} + +static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead, + struct poll_table_struct *poll_table) +{ + struct kevent *k = + container_of(poll_table, struct kevent_poll_ctl, pt)-k; + struct 
kevent_poll_private *priv = k-priv; + struct kevent_poll_wait_container *cont; + unsigned long flags; + + cont = kmem_cache_alloc(kevent_poll_container_cache, GFP_KERNEL); + if (!cont) { + kevent_break(k); + return; + } + + cont-k = k; + init_waitqueue_func_entry(cont-wait, kevent_poll_wait_callback); + cont-whead = whead; + + spin_lock_irqsave(priv-container_lock, flags); +
Re: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and Reason Code
On Thu, 14 Dec 2006 12:02:04 +0800, Zhu Yi wrote:
> Signed-off-by: Zhu Yi [EMAIL PROTECTED]

Please Cc: me and John Linville on d80211 patches otherwise your chances of review (and inclusion) are much lower.

In addition to comments from Michael (which are all perfectly valid and you need to address all of them):

> +struct ieee802_11_ts_info {

Choose a name consistent with the rest of the header (e.g. ieee80211_ prefix).

Thanks,

 Jiri

--
Jiri Benc
SUSE Labs
Re: [PATCH 2/6] d80211: create wifi.h to define WIFI OUIs
On Thu, 14 Dec 2006 12:02:16 +0800, Zhu Yi wrote:
> --- /dev/null
> +++ b/net/d80211/wifi.h
> @@ -0,0 +1,28 @@
> +/*
> + * This file defines Wi-Fi(r) OUIs for 80211.o
> + * Copyright 2006, Zhu Yi [EMAIL PROTECTED] Intel Corp.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef D802_11_WIFI_H
> +#define D802_11_WIFI_H
> +
> +/* WI-FI Alliance OUI Type and Subtype */
> +enum wifi_oui_type {
> +	WIFI_OUI_TYPE_WPA = 1,
> +	WIFI_OUI_TYPE_WMM = 2,
> +	WIFI_OUI_TYPE_WSC = 4,
> +	WIFI_OUI_TYPE_PSD = 6,
> +};
> +
> +enum wifi_oui_stype_wmm {
> +	WIFI_OUI_STYPE_WMM_INFO = 0,
> +	WIFI_OUI_STYPE_WMM_PARAM = 1,
> +	WIFI_OUI_STYPE_WMM_TSPEC = 2,
> +};
> +
> +
> +#endif /* D802_11_WIFI_H */

AFAIK wifi is a trademark and we want to avoid using it. wlan seems to be a better alternative for the prefixes.

Also, I don't see a reason for a separate header file here.

Thanks,

 Jiri

--
Jiri Benc
SUSE Labs
Re: [PATCH 3/6] d80211: fix classify_1d() priority selection
On Thu, 14 Dec 2006 12:02:27 +0800, Zhu Yi wrote:
> I don't see any reason why packets with DSCP=0x40 should have lower IEEE
> 802.1D priority than packets with DSCP=0x20. Spare > Background. No?

Hm, seems so. Jouni, is there any reason for this?

> Signed-off-by: Zhu Yi [EMAIL PROTECTED]
> ---
>  net/d80211/wme.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> e1765ea0d80ad86619300d3253e801883fd745a5
> diff --git a/net/d80211/wme.c b/net/d80211/wme.c
> index b9505dc..f26fe6c 100644
> --- a/net/d80211/wme.c
> +++ b/net/d80211/wme.c
> @@ -131,9 +131,9 @@ static inline unsigned classify_1d(struc
>  	dscp = ip->tos & 0xfc;
>  	switch (dscp) {
>  	case 0x20:
> -		return 2;
> -	case 0x40:
>  		return 1;
> +	case 0x40:
> +		return 2;
>  	case 0x60:
>  		return 3;
>  	case 0x80:

Thanks,

 Jiri

--
Jiri Benc
SUSE Labs
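The reordering in the quoted hunk can be checked with a small standalone sketch. The helper name dscp_to_1d() is hypothetical and is not part of d80211; it mirrors only the three cases visible in the hunk above, and the default return value of 0 is an assumption made for this illustration, since the rest of the switch is not shown in the mail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the d80211 classify_1d() itself): maps an IPv4
 * TOS byte to an IEEE 802.1D priority, using only the cases that appear in
 * the quoted hunk, after the patch is applied.
 */
static unsigned int dscp_to_1d(uint8_t tos)
{
	/* Mask off the two low-order bits, as the quoted code does. */
	uint8_t dscp = tos & 0xfc;

	switch (dscp) {
	case 0x20:
		return 1;	/* was 2 before the patch */
	case 0x40:
		return 2;	/* was 1 before the patch */
	case 0x60:
		return 3;
	default:
		return 0;	/* assumption: remaining cases are not shown in the mail */
	}
}
```

With this ordering a packet marked DSCP=0x40 no longer gets a lower 802.1D value than one marked DSCP=0x20, which is the point the patch description makes.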
Re: [PATCH 6/6] d80211: add sysfs interface for QoS functions
On Thu, 14 Dec 2006 12:03:02 +0800, Zhu Yi wrote:
> The sysfs interface here is only a proof of concept. It provides a way for
> the userspace applications to use the advanced QoS features supported by
> d80211 stack. The final solution should be switched to cfg80211.

So... what about implementing that into cfg80211? :-)

I'm not inclined towards this patch (even if you address Stephen's comment).

Thanks,

 Jiri

(Btw, it will take me some time to review patches 4 and 5.)

--
Jiri Benc
SUSE Labs
Re: [PATCH] drivers/net: spidernet driver on Celleb
Christoph-san,

Thanks for your comments.

> On Tue, Dec 12, 2006 at 02:25:50PM +0900, Ishizaki Kou wrote:
> > Following are the changes.
> > -This patch enables auto-negotiation.
> > -Loading firmware is done when spidernet_open() is called.
> > -And this patch adds other several small changes for Celleb.
>
> This should be split into three separate patches, sent as a patch series.

We are now working on separating the patch. We'll send it later.

> > -/* outside loopback mode: ETOMOD signal dont matter, not connected */
> > -#define SPIDER_NET_OPMODE_VALUE 0x0063
> > +/* ETOMOD signal is brought to PHY reset. bit2 must be 1 in Celleb */
> > +#define SPIDER_NET_OPMODE_VALUE 0x0067
>
> Is it okay to simply change this value for the ibm blades?

Sorry, we didn't test on ibm blades, because we don't have one. We hope to develop together so that the driver works on both platforms.

> > +static int is1000 = 1;
>
> This should be in struct spider_net_card instead of a global flag.

We'll move it into struct spider_net_card.

> >  case SPIDER_NET_GTMFLLINT:
> > -	if (netif_msg_intr(card) && net_ratelimit())
> > -		pr_err("Spider TX RAM full\n");
> > +/*	if (netif_msg_intr(card) && net_ratelimit())
> > +		pr_err("Spider TX RAM full\n"); */
>
> Either this should be kept or removed entirely. In the latter case you
> need a good description why it's removed in the patch header.

We'll remove it entirely. GTMFLLINT occurs frequently when we use a 100M hub. We haven't found any bad influence from this interrupt so far, so we removed the output.

> > +
> > +	spider_net_write_reg(card, SPIDER_NET_GMACOPEMD,
> > +		spider_net_read_reg(card, SPIDER_NET_GMACOPEMD) | 0x4);
>
> Please make sure this doesn't overflow the 80 characters per line limit.

We'll correct it.

> > +static int spider_net_init_firmware(struct spider_net_card *);
>
> Random forward declarations in the middle of the file aren't very nice.
> If you really need them put them at the beginning of the file, but it
> would be even better if you moved spider_net_init_firmware further up in
> the file so we wouldn't need the forward-declaration at all.

We'll move some functions.

> > +	if (card->phy.def->phy_id)
> > +		mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
> > +	else
> > +		pr_err("No phy is available\n");
>
> What is this idiom about? Is not having a phy a fatal error in which
> case we should abort here, or is it tolerable in which case pr_err is
> too much.

Checking phy_id is not required here, so we'll change it to simply call mod_timer().

> > +static void spider_net_init_card(struct spider_net_card *);
>
> Same comment about forward declarations as above.

Thank you,
Kou Ishizaki
Toshiba
[PATCH][RFC] tcp: fix ambiguity in the `before' relation
While looking at DCCP sequence numbers, I stumbled over a problem with the following definition of before in tcp.h:

static inline int before(__u32 seq1, __u32 seq2)
{
        return (__s32)(seq1-seq2) < 0;
}

Problem: This definition suffers from an ambiguity, i.e. always

       before(a, (a + 2^31) % 2^32) = 1
       before((a + 2^31) % 2^32, a) = 1

In text: when the difference between a and b amounts to 2^31, a is always considered `before' b, the function can not decide. The reason is that implicitly 0 is `before' 1 ... 2^31-1 ... 2^31

Solution: There is a simple fix, by defining before in such a way that 0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1. By not using the middle between 0 and 2^32, before can be made unambiguous. This is achieved by testing whether seq2-seq1 > 0 (using signed 32-bit arithmetic).

I attach a patch to codify this. Also the `after' relation is basically a redefinition of `before', it is now defined as a macro after before.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 tcp.h |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index c99774f..b7d8317 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -242,14 +242,9 @@ extern int tcp_memory_pressure;
 
 static inline int before(__u32 seq1, __u32 seq2)
 {
-        return (__s32)(seq1-seq2) < 0;
+        return (__s32)(seq2-seq1) > 0;
 }
-
-static inline int after(__u32 seq1, __u32 seq2)
-{
-	return (__s32)(seq2-seq1) < 0;
-}
-
+#define after(seq2, seq1) 	before(seq1, seq2)
 
 /* is s2<=s1<=s3 ? */
 static inline int between(__u32 seq1, __u32 seq2, __u32 seq3)
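The ambiguity described above can be reproduced outside the kernel with a minimal userspace sketch. The names before_old(), before_new() and after_new() are made up for this illustration, and uint32_t/int32_t stand in for the kernel's __u32/__s32. The sketch also assumes the usual two's-complement behaviour when an out-of-range unsigned value is converted to int32_t, which is implementation-defined in older C standards but holds on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Definition before the patch: seq1 - seq2, read as signed, is negative. */
static inline int before_old(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Definition proposed in the patch: seq2 - seq1, read as signed, is positive. */
static inline int before_new(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) > 0;
}

/* `after' expressed through `before' with swapped arguments, as in the patch. */
#define after_new(seq2, seq1)	before_new(seq1, seq2)
```

For two values exactly 2^31 apart, for example 0 and 0x80000000, both differences are 0x80000000, which reads as the most negative 32-bit value. The old test therefore answers "before" in both directions, while the proposed test answers "before" in neither. For any other pair the two definitions agree.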
Re: [PATCH 6/6] d80211: add sysfs interface for QoS functions
On Thu, 2006-12-14 at 12:23 +0100, Jiri Benc wrote:
> On Thu, 14 Dec 2006 12:03:02 +0800, Zhu Yi wrote:
> > The sysfs interface here is only a proof of concept. It provides a way for
> > the userspace applications to use the advanced QoS features supported by
> > d80211 stack. The final solution should be switched to cfg80211.
>
> So... what about implementing that into cfg80211? :-)

I agree, we should put this into cfg80211/nl80211 from the start. It's not really hard to still have some things over WE and start using cfg/nl80211 now.

johannes
[PATCH 3/3] d80211: fix workqueue breakage (v2)
d80211: fix workqueue breakage This patch updates d80211 to use the new workqueue API. Signed-off-by: Michael Wu [EMAIL PROTECTED] --- net/d80211/ieee80211.c |7 --- net/d80211/ieee80211_i.h |8 +--- net/d80211/ieee80211_iface.c |2 +- net/d80211/ieee80211_sta.c | 32 +++- net/d80211/sta_info.c|7 --- 5 files changed, 29 insertions(+), 27 deletions(-) diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c index 6e10db5..76ee491 100644 --- a/net/d80211/ieee80211.c +++ b/net/d80211/ieee80211.c @@ -2092,13 +2092,13 @@ void ieee80211_if_shutdown(struct net_de case IEEE80211_IF_TYPE_IBSS: sdata-u.sta.state = IEEE80211_DISABLED; cancel_delayed_work(sdata-u.sta.work); - if (local-scan_work.data == sdata-dev) { + if (local-scan_dev == sdata-dev) { local-sta_scanning = 0; cancel_delayed_work(local-scan_work); flush_scheduled_work(); /* see comment in ieee80211_unregister_hw to * understand why this works */ - local-scan_work.data = NULL; + local-scan_dev = NULL; } else flush_scheduled_work(); break; @@ -4486,6 +4486,7 @@ struct ieee80211_hw *ieee80211_alloc_hw( INIT_LIST_HEAD(local-sub_if_list); spin_lock_init(local-generic_lock); + INIT_DELAYED_WORK(local-scan_work, ieee80211_sta_scan_work); init_timer(local-stat_timer); local-stat_timer.function = ieee80211_stat_refresh; local-stat_timer.data = (unsigned long) local; @@ -4686,7 +4687,7 @@ void ieee80211_unregister_hw(struct ieee if (local-stat_time) del_timer_sync(local-stat_timer); - if (local-scan_work.data) { + if (local-scan_dev) { local-sta_scanning = 0; cancel_delayed_work(local-scan_work); flush_scheduled_work(); diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h index ef303da..b7b4b35 100644 --- a/net/d80211/ieee80211_i.h +++ b/net/d80211/ieee80211_i.h @@ -240,7 +240,7 @@ struct ieee80211_if_sta { IEEE80211_ASSOCIATE, IEEE80211_ASSOCIATED, IEEE80211_IBSS_SEARCH, IEEE80211_IBSS_JOINED } state; - struct work_struct work; + struct delayed_work work; u8 bssid[ETH_ALEN], prev_bssid[ETH_ALEN]; u8 
ssid[IEEE80211_MAX_SSID_LEN]; size_t ssid_len; @@ -429,7 +429,8 @@ struct ieee80211_local { int scan_channel_idx; enum { SCAN_SET_CHANNEL, SCAN_SEND_PROBE } scan_state; unsigned long last_scan_completed; - struct work_struct scan_work; + struct delayed_work scan_work; + struct net_device *scan_dev; int scan_oper_channel; int scan_oper_channel_val; int scan_oper_power_level; @@ -638,7 +639,8 @@ int ieee80211_set_compression(struct iee struct net_device *dev, struct sta_info *sta); int ieee80211_init_client(struct net_device *dev); /* ieee80211_sta.c */ -void ieee80211_sta_work(void *ptr); +void ieee80211_sta_work(struct work_struct *work); +void ieee80211_sta_scan_work(struct work_struct *work); void ieee80211_sta_rx_mgmt(struct net_device *dev, struct sk_buff *skb, struct ieee80211_rx_status *rx_status); int ieee80211_sta_set_ssid(struct net_device *dev, char *ssid, size_t len); diff --git a/net/d80211/ieee80211_iface.c b/net/d80211/ieee80211_iface.c index 3e9d531..288dce5 100644 --- a/net/d80211/ieee80211_iface.c +++ b/net/d80211/ieee80211_iface.c @@ -185,7 +185,7 @@ void ieee80211_if_set_type(struct net_de struct ieee80211_if_sta *ifsta; ifsta = sdata-u.sta; - INIT_WORK(ifsta-work, ieee80211_sta_work, dev); + INIT_DELAYED_WORK(ifsta-work, ieee80211_sta_work); ifsta-capab = WLAN_CAPABILITY_ESS; ifsta-auth_algs = IEEE80211_AUTH_ALG_OPEN | diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c index 04bd5cd..5df585a 100644 --- a/net/d80211/ieee80211_sta.c +++ b/net/d80211/ieee80211_sta.c @@ -1837,10 +1837,11 @@ static void ieee80211_sta_merge_ibss(str } -void ieee80211_sta_work(void *ptr) +void ieee80211_sta_work(struct work_struct *work) { - struct net_device *dev = ptr; - struct ieee80211_sub_if_data *sdata; + struct ieee80211_sub_if_data *sdata = + container_of(work, struct ieee80211_sub_if_data, u.sta.work.work); + struct net_device *dev = sdata-dev; struct ieee80211_if_sta *ifsta; if (!netif_running(dev)) @@ -2407,7 +2408,7 @@ static int 
ieee80211_active_scan(struct void ieee80211_scan_completed(struct ieee80211_hw *hw) { struct ieee80211_local *local =
Re: [PATCH 1/14] Spidernet DMA coalescing
On Thu, Dec 14, 2006 at 11:05:17AM +, Christoph Hellwig wrote:
> On Wed, Dec 13, 2006 at 03:06:59PM -0600, Linas Vepstas wrote:
> > The current driver code performs 512 DMA mappings of a bunch of
> > 32-byte ring descriptor structures. This is silly, as they are
> > all in contiguous memory. This patch changes the code to
> > dma_map_coherent() each rx/tx ring as a whole.
>
> It's actually dma_alloc_coherent now that you updated the patch :)
>
> > +	chain->ring = dma_alloc_coherent(&card->pdev->dev, alloc_size,
> > +		&chain->dma_addr, GFP_KERNEL);
> > +	if (!chain->ring)
> > +		return -ENOMEM;
> > +	descr = chain->ring;
> > +	memset(descr, 0, alloc_size);
>
> dma_alloc_coherent is defined to zero the allocated memory, so you
> won't need this memset.

Being unclear on the concept, should I send a new version of this patch, or should I send a new patch that removes this?

--linas
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 12/14] Spidernet Avoid possible RX chain corruption
On Thu, Dec 14, 2006 at 11:22:43AM +1100, Michael Ellerman wrote:
> >  	spider_net_refill_rx_chain(card);
> > -	spider_net_enable_rxchtails(card);
> >  	spider_net_enable_rxdmac(card);
> >  	return 0;
>
> Didn't you just add that line?

Dagnabbit. The earlier patch was moving around existing code. Or, more precisely, trying to maintain the general function of the old code even while moving things around.

Later on, when I started looking at what the danged function actually did, and the context it was in, I realized that it was a bad idea to call the thing. So then I removed it. :-/

How should I handle this procedurally? Resend the patch sequence? Let it slide?

--linas
Re: [PATCH 1/14] Spidernet DMA coalescing
On Thu, Dec 14, 2006 at 11:07:37AM -0600, Linas Vepstas wrote:
> Being unclear on the concept, should I send a new version of this
> patch, or should I send a new patch that removes this?

For just the memset issue an incremental patch would be fine. But given the small mistake in the patch description, a resend with the fixed description might be in order here.
Re: [PATCH 12/14] Spidernet Avoid possible RX chain corruption
On Thu, Dec 14, 2006 at 11:15:11AM -0600, Linas Vepstas wrote:
> On Thu, Dec 14, 2006 at 11:22:43AM +1100, Michael Ellerman wrote:
> > >  	spider_net_refill_rx_chain(card);
> > > -	spider_net_enable_rxchtails(card);
> > >  	spider_net_enable_rxdmac(card);
> > >  	return 0;
> >
> > Didn't you just add that line?
>
> Dagnabbit. The earlier patch was moving around existing code. Or, more
> precisely, trying to maintain the general function of the old code
> even while moving things around.
>
> Later on, when I started looking at what the danged function actually
> did, and the context it was in, I realized that it was a bad idea to
> call the thing. So then I removed it. :-/
>
> How should I handle this procedurally? Resend the patch sequence?
> Let it slide?

Just keep it as is in this case. In case you have to redo the patch series for some other reason, or for similar cases in the future, put the patch that removes things in front of the one that reorders the surrounding bits.
Revised: [PATCH 1/14] Spidernet DMA coalescing
Andrew, I'm hoping it's not irritatingly bothersome to ask you to rip out the first patch of this series, and replace it with the one below.

On Thu, Dec 14, 2006 at 05:35:34PM +, Christoph Hellwig wrote:
> On Thu, Dec 14, 2006 at 11:07:37AM -0600, Linas Vepstas wrote:
> > Being unclear on the concept, should I send a new version of this
> > patch, or should I send a new patch that removes this?
>
> For just the memset issue an incremental patch would be fine. But
> given the small mistake in the patch description, a resend with the
> fixed description might be in order here.

--linas

The current driver code performs 512 DMA mappings of a bunch of 32-byte ring descriptor structures. This is silly, as they are all in contiguous memory. This patch changes the code to dma_alloc_coherent() each rx/tx ring as a whole.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: James K Lewis [EMAIL PROTECTED]
Cc: Arnd Bergmann [EMAIL PROTECTED]

 drivers/net/spider_net.c         |  101 +--
 drivers/net/spider_net.h         |   17 +-
 drivers/net/spider_net_ethtool.c |    4 -
 3 files changed, 52 insertions(+), 70 deletions(-)

Index: linux-2.6.19-git7/drivers/net/spider_net.c
===================================================================
--- linux-2.6.19-git7.orig/drivers/net/spider_net.c	2006-12-13 14:23:11.0 -0600
+++ linux-2.6.19-git7/drivers/net/spider_net.c	2006-12-14 11:02:59.0 -0600
@@ -280,72 +280,65 @@ spider_net_free_chain(struct spider_net_
 {
 	struct spider_net_descr *descr;
 
-	for (descr = chain->tail; !descr->bus_addr; descr = descr->next) {
-		pci_unmap_single(card->pdev, descr->bus_addr,
-				 SPIDER_NET_DESCR_SIZE, PCI_DMA_BIDIRECTIONAL);
+	descr = chain->ring;
+	do {
 		descr->bus_addr = 0;
-	}
+		descr->next_descr_addr = 0;
+		descr = descr->next;
+	} while (descr != chain->ring);
+
+	dma_free_coherent(&card->pdev->dev, chain->num_desc,
+	    chain->ring, chain->dma_addr);
 }
 
 /**
- * spider_net_init_chain - links descriptor chain
+ * spider_net_init_chain - alloc and link descriptor chain
  * @card: card structure
  * @chain: address of chain
- * @start_descr: address of descriptor array
- * @no: number of descriptors
  *
- * we manage a circular list that mirrors the hardware structure,
+ * We manage a circular list that mirrors the hardware structure,
  * except that the hardware uses bus addresses.
  *
- * returns 0 on success, <0 on failure
+ * Returns 0 on success, <0 on failure
  */
 static int
 spider_net_init_chain(struct spider_net_card *card,
-		       struct spider_net_descr_chain *chain,
-		       struct spider_net_descr *start_descr,
-		       int no)
+		       struct spider_net_descr_chain *chain)
 {
 	int i;
 	struct spider_net_descr *descr;
 	dma_addr_t buf;
+	size_t alloc_size;
 
-	descr = start_descr;
-	memset(descr, 0, sizeof(*descr) * no);
+	alloc_size = chain->num_desc * sizeof (struct spider_net_descr);
 
-	/* set up the hardware pointers in each descriptor */
-	for (i=0; i<no; i++, descr++) {
-		descr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
+	chain->ring = dma_alloc_coherent(&card->pdev->dev, alloc_size,
+		&chain->dma_addr, GFP_KERNEL);
 
-		buf = pci_map_single(card->pdev, descr,
-				     SPIDER_NET_DESCR_SIZE,
-				     PCI_DMA_BIDIRECTIONAL);
+	if (!chain->ring)
+		return -ENOMEM;
 
-		if (pci_dma_mapping_error(buf))
-			goto iommu_error;
+	/* Set up the hardware pointers in each descriptor */
+	descr = chain->ring;
+	buf = chain->dma_addr;
+	for (i=0; i < chain->num_desc; i++, descr++) {
+		descr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
 
 		descr->bus_addr = buf;
+		descr->next_descr_addr = 0;
 		descr->next = descr + 1;
 		descr->prev = descr - 1;
 
+		buf += sizeof(struct spider_net_descr);
 	}
 	/* do actual circular list */
-	(descr-1)->next = start_descr;
-	start_descr->prev = descr-1;
+	(descr-1)->next = chain->ring;
+	chain->ring->prev = descr-1;
 
 	spin_lock_init(&chain->lock);
-	chain->head = start_descr;
-	chain->tail = start_descr;
-
+	chain->head = chain->ring;
+	chain->tail = chain->ring;
 	return 0;
-
-iommu_error:
-	descr = start_descr;
-	for (i=0; i < no; i++, descr++)
-		if (descr->bus_addr)
-			pci_unmap_single(card->pdev, descr->bus_addr,
-					 SPIDER_NET_DESCR_SIZE,
-					 PCI_DMA_BIDIRECTIONAL);
-	return -ENOMEM;
 }
 
 /**
@@ -707,7 +700,7 @@
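For readers who want to see the idea behind this patch outside the driver, here is a minimal userspace sketch, not the spider_net code itself. It assumes calloc() as a stand-in for dma_alloc_coherent() and a made-up base value in place of a real bus address; struct descr, struct chain, chain_init() and chain_free() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a hardware ring descriptor. */
struct descr {
	uint64_t bus_addr;	/* address the "device" would use */
	struct descr *next;
	struct descr *prev;
};

struct chain {
	struct descr *ring;	/* start of the single allocation */
	uint64_t dma_addr;	/* pretend bus address of ring[0] */
	size_t num_desc;
};

/* Allocate all descriptors in one block and link them into a circle.
 * Returns 0 on success, -1 on allocation failure. */
int chain_init(struct chain *c, size_t num_desc, uint64_t fake_dma_base)
{
	size_t i;

	assert(num_desc > 0);

	/* calloc() zeroes the block, so no separate memset() is needed. */
	c->ring = calloc(num_desc, sizeof(*c->ring));
	if (!c->ring)
		return -1;
	c->dma_addr = fake_dma_base;
	c->num_desc = num_desc;

	for (i = 0; i < num_desc; i++) {
		struct descr *d = &c->ring[i];

		/* Bus addresses advance by one descriptor per slot. */
		d->bus_addr = fake_dma_base + i * sizeof(*d);
		d->next = &c->ring[(i + 1) % num_desc];
		d->prev = &c->ring[(i + num_desc - 1) % num_desc];
	}
	return 0;
}

/* One free for the whole ring, mirroring the single allocation. */
void chain_free(struct chain *c)
{
	free(c->ring);
	c->ring = NULL;
}
```

A caller would run chain_init(&c, 4, 0x1000), walk the ring through the next pointers, and release everything with one chain_free(&c). The point of the sketch is only that one allocation plus pointer arithmetic replaces a mapping call per descriptor.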
[PATCH 1/4] net: make dev_kfree_skb_irq not inline
Move the dev_kfree_skb_irq function from netdevice.h to dev.c for a couple of reasons. Primarily, I want to make softnet_data local to dev.c; also, this function is already called from 300+ places.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.20-rc1.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc1/include/linux/netdevice.h
@@ -676,20 +676,7 @@ static inline int netif_running(const st
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.
  */
-static inline void dev_kfree_skb_irq(struct sk_buff *skb)
-{
-	if (atomic_dec_and_test(&skb->users)) {
-		struct softnet_data *sd;
-		unsigned long flags;
-
-		local_irq_save(flags);
-		sd = &__get_cpu_var(softnet_data);
-		skb->next = sd->completion_queue;
-		sd->completion_queue = skb;
-		raise_softirq_irqoff(NET_TX_SOFTIRQ);
-		local_irq_restore(flags);
-	}
-}
+extern void dev_kfree_skb_irq(struct sk_buff *skb);
 
 /* Use this variant in places where it could be invoked
  * either from interrupt or non-interrupt context.
--- linux-2.6.20-rc1.orig/net/core/dev.c
+++ linux-2.6.20-rc1/net/core/dev.c
@@ -1141,6 +1141,21 @@ void dev_kfree_skb_any(struct sk_buff *s
 }
 EXPORT_SYMBOL(dev_kfree_skb_any);
 
+void dev_kfree_skb_irq(struct sk_buff *skb)
+{
+	if (atomic_dec_and_test(&skb->users)) {
+		struct softnet_data *sd;
+		unsigned long flags;
+
+		local_irq_save(flags);
+		sd = &__get_cpu_var(softnet_data);
+		skb->next = sd->completion_queue;
+		sd->completion_queue = skb;
+		raise_softirq_irqoff(NET_TX_SOFTIRQ);
+		local_irq_restore(flags);
+	}
+}
+EXPORT_SYMBOL(dev_kfree_skb_irq);
 
 /* Hot-plugging. */
 void netif_device_detach(struct net_device *dev)
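The function being moved implements a deferred free: drop a reference and, if it was the last one, push the buffer onto a completion queue to be released later from a safer context. The following is a rough userspace sketch of that pattern under simplifying assumptions: a single global list instead of per-CPU data, no interrupt masking, and C11 atomics in place of the kernel's atomic_t. The names struct buf, buf_alloc(), buf_put_deferred() and completion_drain() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer. */
struct buf {
	atomic_int users;	/* reference count */
	struct buf *next;	/* link used only while queued for freeing */
};

/* Single global queue; the kernel keeps one such queue per CPU. */
struct buf *completion_queue;

struct buf *buf_alloc(void)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b)
		atomic_init(&b->users, 1);
	return b;
}

/* Drop one reference. The last put does not free the buffer; it only
 * queues it, so the caller never pays the cost of freeing. */
void buf_put_deferred(struct buf *b)
{
	assert(b != NULL);
	/* atomic_fetch_sub() returns the old value: 1 means last reference. */
	if (atomic_fetch_sub(&b->users, 1) == 1) {
		b->next = completion_queue;
		completion_queue = b;
	}
}

/* Free everything queued so far; returns the number of buffers freed. */
size_t completion_drain(void)
{
	size_t freed = 0;
	struct buf *list = completion_queue;

	completion_queue = NULL;
	while (list) {
		struct buf *b = list;

		list = list->next;
		free(b);
		freed++;
	}
	return freed;
}
```

In this sketch completion_drain() plays the role that the TX softirq handler plays in the kernel. The sketch is not thread safe as written; the kernel version relies on disabling local interrupts around the per-CPU list update.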
[PATCH 2/4] net: uninline netif_rx_reschedule
Move netif_rx_reschedule out of line, so that softnet_data can be made local.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.20-rc1.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc1/include/linux/netdevice.h
@@ -851,21 +851,7 @@ static inline void netif_rx_schedule(str
 /* Try to reschedule poll. Called by dev->poll() after netif_rx_complete().
  * Do not inline this?
  */
-static inline int netif_rx_reschedule(struct net_device *dev, int undo)
-{
-	if (netif_rx_schedule_prep(dev)) {
-		unsigned long flags;
-
-		dev->quota += undo;
-
-		local_irq_save(flags);
-		list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
-		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
-		local_irq_restore(flags);
-		return 1;
-	}
-	return 0;
-}
+extern int netif_rx_reschedule(struct net_device *dev, int undo);
 
 /* Remove interface from poll list: it must be in the poll list
  * on current cpu. This primitive is called by dev->poll(), when
--- linux-2.6.20-rc1.orig/net/core/dev.c
+++ linux-2.6.20-rc1/net/core/dev.c
@@ -1132,6 +1132,23 @@ void __netif_rx_schedule(struct net_devi
 }
 EXPORT_SYMBOL(__netif_rx_schedule);
 
+int netif_rx_reschedule(struct net_device *dev, int undo)
+{
+	if (netif_rx_schedule_prep(dev)) {
+		unsigned long flags;
+
+		dev->quota += undo;
+
+		local_irq_save(flags);
+		list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
+		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+		local_irq_restore(flags);
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(netif_rx_reschedule);
+
 void dev_kfree_skb_any(struct sk_buff *skb)
 {
 	if (in_irq() || irqs_disabled())
NAPI wait before enabling irq's [was Re: [Cbe-oss-dev] Spider DMA wrongness]
On Wed, Nov 08, 2006 at 07:38:12AM +1100, Benjamin Herrenschmidt wrote:
> What about Linas' patches to do interrupt mitigation with NAPI
> polling? That didn't end up working?

It seems to be working as designed, which is different than working as naively expected.

For large packets:
-- a packet comes in
-- rx interrupt generated
-- rx interrupts turned off
-- tcp poll function runs, receives packet
-- completes all work before next packet has arrived, so interrupts are turned back on.
-- go to start

This results in a high number of interrupts, and a high cpu usage.

We were able to prove that napi works by stalling in the poll function just long enough to allow the next packet to arrive. In this case, napi works great, and the number of irqs is vastly reduced.

Unfortunately, I could not figure out any simple way of turning this into acceptable code. I can't just wait a little bit before turning on interrupts. Some network apps, such as netpipe, want to receive something before sending the next thing. Without the interrupt, the packet just sits there, and the OS doesn't realize (until milliseconds later) that there's a packet that can be handled. This is a variant of the so-called "rotting packet" discussed in the napi docs.

What is needed is for the tcp stack to wait for 1500 Bytes / (1 Gbit/sec) = 12 microsecs and then poll again. If there are *still* no new packets, then and only then do we re-enable interrupts. This would require a new napi. Presuming the network stack folks find it even remotely acceptable.

--linas
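The 12 microsecond figure above is just the time one full-size frame occupies the wire. As a small worked check, here is a helper that computes it; wire_time_ns() is a hypothetical name, and the calculation ignores preamble, inter-frame gap and other framing overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds needed to serialize frame_bytes onto a link that runs at
 * link_bits_per_sec. Framing overhead is deliberately ignored. */
uint64_t wire_time_ns(uint32_t frame_bytes, uint64_t link_bits_per_sec)
{
	assert(link_bits_per_sec > 0);
	return (uint64_t)frame_bytes * 8u * 1000000000ull / link_bits_per_sec;
}
```

For a 1500-byte frame at 1 Gbit/sec this gives 12000 ns, that is, 12 microseconds, which matches the wait proposed in the message. Smaller frames arrive proportionally faster, so a fixed 12 microsecond wait would be a worst-case bound for standard Ethernet frames.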
[PATCH 4/4] net: rearrange functions in netdevice.h
Use existing inline functions rather than having multiple copies of the same code.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.20-rc1.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc1/include/linux/netdevice.h
@@ -615,9 +615,14 @@ static inline int unregister_gifconf(uns
 
 extern void __netif_schedule(struct net_device *dev);
 
+static inline int netif_queue_stopped(const struct net_device *dev)
+{
+	return test_bit(__LINK_STATE_XOFF, &dev->state);
+}
+
 static inline void netif_schedule(struct net_device *dev)
 {
-	if (!test_bit(__LINK_STATE_XOFF, &dev->state))
+	if (!netif_queue_stopped(dev))
 		__netif_schedule(dev);
 }
 
@@ -645,11 +650,6 @@ static inline void netif_stop_queue(stru
 	set_bit(__LINK_STATE_XOFF, &dev->state);
 }
 
-static inline int netif_queue_stopped(const struct net_device *dev)
-{
-	return test_bit(__LINK_STATE_XOFF, &dev->state);
-}
-
 static inline int netif_running(const struct net_device *dev)
 {
 	return test_bit(__LINK_STATE_START, &dev->state);
@@ -841,15 +841,20 @@ extern int netif_rx_reschedule(struct ne
  * it completes the work. The device cannot be out of poll list at this
  * moment, it is BUG().
  */
-static inline void netif_rx_complete(struct net_device *dev)
+static inline void __netif_rx_complete(struct net_device *dev)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
 	BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));
 	list_del(&dev->poll_list);
 	smp_mb__before_clear_bit();
 	clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
+}
+
+static inline void netif_rx_complete(struct net_device *dev)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__netif_rx_complete(dev);
 	local_irq_restore(flags);
 }
 
@@ -865,17 +870,6 @@ static inline void netif_poll_enable(str
 	clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
 }
 
-/* same as netif_rx_complete, except that local_irq_save(flags)
- * has already been issued
- */
-static inline void __netif_rx_complete(struct net_device *dev)
-{
-	BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));
-	list_del(&dev->poll_list);
-	smp_mb__before_clear_bit();
-	clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
-}
-
 static inline void netif_tx_lock(struct net_device *dev)
 {
 	spin_lock(&dev->_xmit_lock);
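The first half of this cleanup is the ordinary "define the predicate once, then reuse it" refactoring. A minimal userspace sketch of the same shape follows, assuming a plain unsigned long bitmask instead of the kernel's atomic bit operations; struct netdev, queue_stopped(), stop_queue() and schedule_dev() are hypothetical names, and the scheduled counter merely stands in for the real scheduling call.

```c
#include <assert.h>

/* Hypothetical state bit, loosely modelled on __LINK_STATE_XOFF. */
enum { STATE_XOFF = 0 };

struct netdev {
	unsigned long state;	/* bitmask of state flags */
	int scheduled;		/* counts how often scheduling happened */
};

/* The single place that knows how "stopped" is encoded. */
int queue_stopped(const struct netdev *dev)
{
	assert(dev != 0);
	return (int)((dev->state >> STATE_XOFF) & 1ul);
}

void stop_queue(struct netdev *dev)
{
	dev->state |= 1ul << STATE_XOFF;
}

/* Callers use the predicate instead of open-coding the bit test. */
void schedule_dev(struct netdev *dev)
{
	if (!queue_stopped(dev))
		dev->scheduled++;
}
```

If the encoding of the stopped state ever changes, only queue_stopped() and stop_queue() need to be touched, which is the motivation the patch description gives for removing duplicate copies.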
[PATCH 3/4] net: move softnet_data
Make softnet_data local to dev.c. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- linux-2.6.20-rc1.orig/include/linux/netdevice.h +++ linux-2.6.20-rc1/include/linux/netdevice.h @@ -600,6 +600,9 @@ extern int dev_restart(struct net_devic #ifdef CONFIG_NETPOLL_TRAP extern int netpoll_trap(void); #endif +#ifdef CONFIG_NETPOLL +extern voidnetpoll_do_completion(void); +#endif typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int len); extern int register_gifconf(unsigned int family, gifconf_func_t * gifconf); @@ -608,26 +611,6 @@ static inline int unregister_gifconf(uns return register_gifconf(family, NULL); } -/* - * Incoming packets are placed on per-cpu queues so that - * no locking is needed. - */ - -struct softnet_data -{ - struct net_device *output_queue; - struct sk_buff_head input_pkt_queue; - struct list_headpoll_list; - struct sk_buff *completion_queue; - - struct net_device backlog_dev;/* Sorry. 8) */ -#ifdef CONFIG_NET_DMA - struct dma_chan *net_dma; -#endif -}; - -DECLARE_PER_CPU(struct softnet_data,softnet_data); - #define HAVE_NETIF_QUEUE extern void __netif_schedule(struct net_device *dev); --- linux-2.6.20-rc1.orig/net/core/dev.c +++ linux-2.6.20-rc1/net/core/dev.c @@ -203,10 +203,23 @@ static inline struct hlist_head *dev_ind static RAW_NOTIFIER_HEAD(netdev_chain); /* - * Device drivers call our routines to queue packets here. We empty the - * queue in the local softnet handler. + * Incoming packets are placed on per-cpu queues so that + * no locking is needed. */ -DEFINE_PER_CPU(struct softnet_data, softnet_data) = { NULL }; +struct softnet_data +{ + struct net_device *output_queue; + struct sk_buff_head input_pkt_queue; + struct list_headpoll_list; + struct sk_buff *completion_queue; + + struct net_device backlog_dev;/* Sorry. 
8) */ +#ifdef CONFIG_NET_DMA + struct dma_chan *net_dma; +#endif +}; + +static DEFINE_PER_CPU(struct softnet_data, softnet_data); #ifdef CONFIG_SYSFS extern int netdev_sysfs_init(void); @@ -1673,6 +1686,34 @@ static inline struct net_device *skb_bon return dev; } +#ifdef CONFIG_NETPOLL +void netpoll_do_completion(void) +{ + unsigned long flags; + struct softnet_data *sd = get_cpu_var(softnet_data); + + if (sd-completion_queue) { + struct sk_buff *clist; + + local_irq_save(flags); + clist = sd-completion_queue; + sd-completion_queue = NULL; + local_irq_restore(flags); + + while (clist != NULL) { + struct sk_buff *skb = clist; + clist = clist-next; + if (skb-destructor) + dev_kfree_skb_any(skb); /* put this one back */ + else + __kfree_skb(skb); + } + } + + put_cpu_var(softnet_data); +} +#endif + static void net_tx_action(struct softirq_action *h) { struct softnet_data *sd = __get_cpu_var(softnet_data); --- linux-2.6.20-rc1.orig/net/core/netpoll.c +++ linux-2.6.20-rc1/net/core/netpoll.c @@ -47,7 +47,6 @@ static atomic_t trapped; (MAX_UDP_CHUNK + sizeof(struct udphdr) + \ sizeof(struct iphdr) + sizeof(struct ethhdr)) -static void zap_completion_queue(void); static void arp_reply(struct sk_buff *skb); static void queue_process(struct work_struct *work) @@ -162,7 +161,7 @@ void netpoll_poll(struct netpoll *np) service_arp_queue(np-dev-npinfo); - zap_completion_queue(); + netpoll_do_completion(); } static void refill_skbs(void) @@ -181,7 +180,7 @@ static void refill_skbs(void) spin_unlock_irqrestore(skb_pool.lock, flags); } -static void zap_completion_queue(void) +static void netpoll_do_completion(void) { unsigned long flags; struct softnet_data *sd = get_cpu_var(softnet_data); @@ -212,7 +211,7 @@ static struct sk_buff *find_skb(struct n int count = 0; struct sk_buff *skb; - zap_completion_queue(); + netpoll_do_completion(); refill_skbs(); repeat: -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More 
majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] network device interface cleanups
This set of patches makes softnet_data local to dev.c and does some code cleanups, no API changes.
Re: NAPI wait before enabling irq's [was Re: [Cbe-oss-dev] Spider DMA wrongness]
On Thu, 14 Dec 2006, Linas Vepstas wrote:
> On Wed, Nov 08, 2006 at 07:38:12AM +1100, Benjamin Herrenschmidt wrote:
> > What about Linas' patches to do interrupt mitigation with NAPI
> > polling? That didn't end up working?
>
> It seems to be working as designed, which is different than working
> as naively expected.
>
> For large packets:
> -- a packet comes in
> -- rx interrupt generated
> -- rx interrupts turned off
> -- tcp poll function runs, receives packet
> -- completes all work before next packet has arrived, so interrupts
>    are turned back on.
> -- go to start
>
> This results in a high number of interrupts, and a high cpu usage.
>
> We were able to prove that napi works by stalling in the poll
> function just long enough to allow the next packet to arrive. In
> this case, napi works great, and the number of irqs is vastly reduced.

This sounds awfully familiar. We went through the same with the tg3 driver on Altix. In that case we succeeded in getting interrupt coalescence added to the driver, which ended up working pretty well for us. See the thread beginning with:

http://oss.sgi.com/archives/netdev/2005-05/msg00497.html

if you're interested.

As for the stalling NAPI idea, Jamal did a bit of work with that idea and wrote it up in:

www.kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf

--
Arthur
Re: 2.6.20-rc1 sky2 problems (regression?)
On Thu, 14 Dec 2006 12:47:05 -0800 Alex Romosan [EMAIL PROTECTED] wrote: under heavy network load the sky2 driver (compiled in the kernel) locks up and the only way i can get the network back is to reboot the machine (bringing the network down and back up again doesn't help). this happens on an amd64 machine (athlon 3500+ processor) and the card in question is a Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15) (from lspci). this is what i see in the syslog: kernel: sky2 eth0: rx error, status 0x414a414a length 0 kernel: eth0: hw csum failure. kernel: kernel: Call Trace: kernel: IRQ [8044681c] __skb_checksum_complete+0x4d/0x66 kernel: [80477bc5] tcp_v4_rcv+0x147/0x8ea kernel: [80479ef2] raw_rcv_skb+0x9/0x20 kernel: [8047a2ff] raw_rcv+0xbe/0xc4 kernel: [8045ea9d] ip_local_deliver+0x170/0x21b kernel: [8045e8fa] ip_rcv+0x478/0x4ab kernel: [8044905d] netif_receive_skb+0x184/0x20e kernel: [803de8e5] sky2_poll+0x68f/0x93c kernel: [802219ce] scheduler_tick+0x23/0x2f9 kernel: [8044a796] net_rx_action+0x61/0xf0 kernel: [8022a35f] __do_softirq+0x40/0x8a kernel: [8020a3cc] call_softirq+0x1c/0x28 kernel: [8020bbf0] do_softirq+0x2c/0x7d kernel: [8022a313] irq_exit+0x36/0x42 kernel: [8020bebe] do_IRQ+0x8c/0x9e kernel: [80208710] default_idle+0x0/0x3a kernel: [80209bf1] ret_from_intr+0x0/0xa kernel: EOI [80208736] default_idle+0x26/0x3a kernel: [8020878c] cpu_idle+0x42/0x75 kernel: [805df675] start_kernel+0x1ce/0x1d3 kernel: [805df140] _sinittext+0x140/0x144 kernel: kernel: eth0: hw csum failure. 
kernel: kernel: Call Trace: kernel: IRQ [8044681c] __skb_checksum_complete+0x4d/0x66 kernel: [80477bc5] tcp_v4_rcv+0x147/0x8ea kernel: [80479ef2] raw_rcv_skb+0x9/0x20 kernel: [8047a2ff] raw_rcv+0xbe/0xc4 kernel: [8045ea9d] ip_local_deliver+0x170/0x21b kernel: [8045e8fa] ip_rcv+0x478/0x4ab kernel: [8044905d] netif_receive_skb+0x184/0x20e kernel: [803de8e5] sky2_poll+0x68f/0x93c kernel: [80474647] tcp_delack_timer+0x0/0x1b5 kernel: [8044a796] net_rx_action+0x61/0xf0 kernel: [8022a35f] __do_softirq+0x40/0x8a kernel: [8020a3cc] call_softirq+0x1c/0x28 kernel: [8020bbf0] do_softirq+0x2c/0x7d kernel: [8022a313] irq_exit+0x36/0x42 kernel: [8020bebe] do_IRQ+0x8c/0x9e kernel: [80209bf1] ret_from_intr+0x0/0xa kernel: EOI [802a8402] inode2sd+0x104/0x117 kernel: [802b8cfa] search_by_key+0xa08/0xbfe kernel: [802b8475] search_by_key+0x183/0xbfe kernel: [80284778] ll_rw_block+0x89/0x9e kernel: [802b8475] search_by_key+0x183/0xbfe kernel: [80283cf5] __find_get_block_slow+0x101/0x10d kernel: [80284053] __find_get_block+0x197/0x1a5 kernel: [8026800c] inode_get_bytes+0x2a/0x52 kernel: [802a89f1] reiserfs_update_sd_size+0x7e/0x284 kernel: [80237700] kthread+0xed/0xfd kernel: [802be990] do_journal_end+0x34b/0xbdd kernel: [802b1729] reiserfs_dirty_inode+0x56/0x76 kernel: [80284c19] block_prepare_write+0x1a/0x24 kernel: [802809b1] __mark_inode_dirty+0x29/0x197 kernel: [802a8d04] reiserfs_commit_write+0x10d/0x19f kernel: [80284c19] block_prepare_write+0x1a/0x24 kernel: [802484fc] generic_file_buffered_write+0x4ad/0x6c4 kernel: [80271b3c] __pollwait+0x0/0xe0 kernel: [8022a006] current_fs_time+0x35/0x3b kernel: [80248a8c] __generic_file_aio_write_nolock+0x379/0x3ec kernel: [8049baca] unix_dgram_recvmsg+0x1be/0x1d9 kernel: [804b6516] __mutex_lock_slowpath+0x205/0x210 kernel: [80248b60] generic_file_aio_write+0x61/0xc1 kernel: [80248aff] generic_file_aio_write+0x0/0xc1 kernel: [80264e57] do_sync_readv_writev+0xc0/0x107 kernel: [802377f7] autoremove_wake_function+0x0/0x2e kernel: [80229d16] 
getnstimeofday+0x10/0x28 kernel: [80264ced] rw_copy_check_uvector+0x6c/0xdc kernel: [802654f7] do_readv_writev+0xb2/0x18b kernel: [80265a2c] sys_writev+0x45/0x93 kernel: [802096de] system_call+0x7e/0x83 and so on. some times i don't get this trace but instead i get: kernel: sky2 eth0: tx timeout kernel: sky2 eth0: transmit ring 140 .. 99 report=181 done=181 kernel: sky2 status report lost? kernel: NETDEV WATCHDOG: eth0: transmit timed out kernel: sky2 eth0: tx timeout kernel: sky2 eth0: transmit ring 181 .. 140 report=181 done=181 kernel: sky2 hardware hung? flushing but the end result is the same, the network card stops responding and i have to reboot the machine. i can reproduce this on a consistent basis so if there are any
Re: 2.6.20-rc1 sky2 problems (regression?)
Stephen Hemminger [EMAIL PROTECTED] writes: On Thu, 14 Dec 2006 12:47:05 -0800 Alex Romosan [EMAIL PROTECTED] wrote: under heavy network load the sky2 driver (compiled in the kernel) locks up and the only way i can get the network back is to reboot the machine (bringing the network down and back up again doesn't help). this happens on an amd64 machine (athlon 3500+ processor) and the card in question is a Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15) (from lspci). this is what i see in the syslog: kernel: sky2 eth0: rx error, status 0x414a414a length 0 kernel: eth0: hw csum failure. kernel: kernel: Call Trace: kernel: IRQ [8044681c] __skb_checksum_complete+0x4d/0x66 kernel: [80477bc5] tcp_v4_rcv+0x147/0x8ea kernel: [80479ef2] raw_rcv_skb+0x9/0x20 kernel: [8047a2ff] raw_rcv+0xbe/0xc4 kernel: [8045ea9d] ip_local_deliver+0x170/0x21b kernel: [8045e8fa] ip_rcv+0x478/0x4ab kernel: [8044905d] netif_receive_skb+0x184/0x20e kernel: [803de8e5] sky2_poll+0x68f/0x93c kernel: [802219ce] scheduler_tick+0x23/0x2f9 kernel: [8044a796] net_rx_action+0x61/0xf0 kernel: [8022a35f] __do_softirq+0x40/0x8a kernel: [8020a3cc] call_softirq+0x1c/0x28 kernel: [8020bbf0] do_softirq+0x2c/0x7d kernel: [8022a313] irq_exit+0x36/0x42 kernel: [8020bebe] do_IRQ+0x8c/0x9e kernel: [80208710] default_idle+0x0/0x3a kernel: [80209bf1] ret_from_intr+0x0/0xa kernel: EOI [80208736] default_idle+0x26/0x3a kernel: [8020878c] cpu_idle+0x42/0x75 kernel: [805df675] start_kernel+0x1ce/0x1d3 kernel: [805df140] _sinittext+0x140/0x144 kernel: kernel: eth0: hw csum failure. 
kernel: kernel: Call Trace: kernel: IRQ [8044681c] __skb_checksum_complete+0x4d/0x66 kernel: [80477bc5] tcp_v4_rcv+0x147/0x8ea kernel: [80479ef2] raw_rcv_skb+0x9/0x20 kernel: [8047a2ff] raw_rcv+0xbe/0xc4 kernel: [8045ea9d] ip_local_deliver+0x170/0x21b kernel: [8045e8fa] ip_rcv+0x478/0x4ab kernel: [8044905d] netif_receive_skb+0x184/0x20e kernel: [803de8e5] sky2_poll+0x68f/0x93c kernel: [80474647] tcp_delack_timer+0x0/0x1b5 kernel: [8044a796] net_rx_action+0x61/0xf0 kernel: [8022a35f] __do_softirq+0x40/0x8a kernel: [8020a3cc] call_softirq+0x1c/0x28 kernel: [8020bbf0] do_softirq+0x2c/0x7d kernel: [8022a313] irq_exit+0x36/0x42 kernel: [8020bebe] do_IRQ+0x8c/0x9e kernel: [80209bf1] ret_from_intr+0x0/0xa kernel: EOI [802a8402] inode2sd+0x104/0x117 kernel: [802b8cfa] search_by_key+0xa08/0xbfe kernel: [802b8475] search_by_key+0x183/0xbfe kernel: [80284778] ll_rw_block+0x89/0x9e kernel: [802b8475] search_by_key+0x183/0xbfe kernel: [80283cf5] __find_get_block_slow+0x101/0x10d kernel: [80284053] __find_get_block+0x197/0x1a5 kernel: [8026800c] inode_get_bytes+0x2a/0x52 kernel: [802a89f1] reiserfs_update_sd_size+0x7e/0x284 kernel: [80237700] kthread+0xed/0xfd kernel: [802be990] do_journal_end+0x34b/0xbdd kernel: [802b1729] reiserfs_dirty_inode+0x56/0x76 kernel: [80284c19] block_prepare_write+0x1a/0x24 kernel: [802809b1] __mark_inode_dirty+0x29/0x197 kernel: [802a8d04] reiserfs_commit_write+0x10d/0x19f kernel: [80284c19] block_prepare_write+0x1a/0x24 kernel: [802484fc] generic_file_buffered_write+0x4ad/0x6c4 kernel: [80271b3c] __pollwait+0x0/0xe0 kernel: [8022a006] current_fs_time+0x35/0x3b kernel: [80248a8c] __generic_file_aio_write_nolock+0x379/0x3ec kernel: [8049baca] unix_dgram_recvmsg+0x1be/0x1d9 kernel: [804b6516] __mutex_lock_slowpath+0x205/0x210 kernel: [80248b60] generic_file_aio_write+0x61/0xc1 kernel: [80248aff] generic_file_aio_write+0x0/0xc1 kernel: [80264e57] do_sync_readv_writev+0xc0/0x107 kernel: [802377f7] autoremove_wake_function+0x0/0x2e kernel: [80229d16] 
getnstimeofday+0x10/0x28 kernel: [80264ced] rw_copy_check_uvector+0x6c/0xdc kernel: [802654f7] do_readv_writev+0xb2/0x18b kernel: [80265a2c] sys_writev+0x45/0x93 kernel: [802096de] system_call+0x7e/0x83 and so on. some times i don't get this trace but instead i get: kernel: sky2 eth0: tx timeout kernel: sky2 eth0: transmit ring 140 .. 99 report=181 done=181 kernel: sky2 status report lost? kernel: NETDEV WATCHDOG: eth0: transmit timed out kernel: sky2 eth0: tx timeout kernel: sky2 eth0: transmit ring 181 .. 140 report=181 done=181 kernel: sky2 hardware hung? flushing Please report these problems to netdev@vger.kernel.org, I rarely go looking in LKML. These are the things you need to
Re: [PATCH 4/4][SCTP]: Change adaption -> adaptation as per the latest API draft.
On Wed, 2006-12-13 at 18:03 -0800, David Miller wrote: From: Sridhar Samudrala [EMAIL PROTECTED] Date: Wed, 13 Dec 2006 17:38:52 -0800 These parameters are not used by user-space apps. They define the parameters used by the protocol in SCTP headers that go on wire. There is no __KERNEL__ ifdef protection for these defines, and the linux/sctp.h header is exported to userspace via include/linux/Kbuild, therefore the interface is exposed to userspace and you cannot break it. I didn't know that all the files under include/linux are exported to userspace. AFAIK, i don't think there are any SCTP apps that use this file. But if you say that we shouldn't remove any of the APIs in linux/sctp.h, i am OK with keeping the existing ones and adding new ones with the changed name. Now that 2.6.20-rc1 is out, should i wait until 2.6.21 tree to open for re-submission or is there a window still open for 2.6.20? Thanks Sridhar - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc1 sky2 problems (regression?)
On Thu, 14 Dec 2006 14:25:06 -0800 Alex Romosan [EMAIL PROTECTED] wrote:
Stephen Hemminger [EMAIL PROTECTED] writes:

4) What is the IRQ routing? There are two issues here, first the driver will never work with edge trigger IRQ's, some motherboards also have busted BIOS and chipsets that don't do MSI properly. A couple of module parameters are available to help:
   disable_msi=1    avoids using MSI
   idle_timeout=10  polls for lost IRQ's every N ms (10)

it didn't take long to lock up the machine again. i've rebooted back into stock 2.6.20-rc1 and added the two module parameters above. cat /proc/interrupts now gives me:

 17: 203 IO-APIC-fasteoi eth0, CMI8738

so i guess the MSI interrupts are disabled. we'll see how this works.

probably won't do much but now the IRQ ends up shared.

5) What are the messages in the console log when problem happens?

kernel: NETDEV WATCHDOG: eth0: transmit timed out
kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 402 .. 361 report=406 done=406
kernel: sky2 status report lost?

The transmit timeout code tries to be smart, but doesn't really recover properly if hardware is stuck.
7) Please get a current version of ethtool from: git://git.kernel.org/pub/scm/network/ethtool/ethtool.git and run ethtool register dump after a problem occurs: ethtool -d eth0 this is the output after it stopped working: PCI config -- 00: ab 11 62 43 07 04 18 00 15 00 00 02 08 00 00 00 10: 04 c0 df fd 00 00 00 00 01 ce 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 8c 05 30: 00 00 00 00 48 00 00 00 00 00 00 00 03 01 00 00 40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 14 50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 83 00 60: 0c 10 e0 fe 00 00 00 00 61 41 00 00 00 00 00 00 70: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Control Registers - Register Access Port 0x00 LED Control/Status 0xA603164A Interrupt Source 0x4000 Interrupt Mask 0xC01D Interrupt Hardware Error Source 0x Interrupt Hardware Error Mask0x2E003F3F Bus Management Unit --- CSR Receive Queue 1 0x0001 CSR Sync Queue 1 0x CSR Async Queue 10x MAC Addresses --- Addr 100 11 09 DA 39 A3 Addr 200 11 09 DA 39 A3 Addr 300 00 00 00 00 00 Connector type 0x4A (J) PMD type 0x54 (T) PHY type 0x80 Chip Id 0xB6 Yukon-2 EC (rev 0) Ram Buffer 0x0C Status BMU: --- Control0x0002220A Last Index 0x07FF Put Index 0x0601 List Address 0x7FBF8000 Transmit 1 done index 0x0196 Transmit index threshold 0x000A Status FIFO Write Pointer0x16 Read Pointer 0x16 Level0x00 Watermark0x10 ISR Watermark0x10 Status level Init 0x30D4 Value 0x0D00 Test 0x04 Control 0x02 TX status Init 0x0001E848 Value 0x0001E848 Test 0x04 Control 0x02 ISR Init 0x09C4 Value 0x09C4 Test 0x04 Control 0x02 GMAC control 0x005A GPHY control 0x2002 LINK control 0x02 GMAC 1 Status 0xD000 Control 0x1800 Transmit 0x1000 Receive 0xE000 Transmit flow control0x Transmit parameter 0xD7C4 Serial mode 0x221E Source address: 00 11 09 DA 39 A3 Physical address: 00 11 09 DA 39 A3 Rx GMAC 1 End Address 0x007F Almost Full Thresh 0x0070 Control/Test 0x0900228A FIFO Flush Mask 0x18FB FIFO Flush Threshold 0x000B Truncation Threshold 0x017C Upper Pause 
Threshold0x Lower Pause Threshold0x0081 VLAN Tag 0x0074 FIFO Write Pointer 0x FIFO Write Level 0x007B FIFO Read Pointer0x FIFO Read Level 0x0079 Tx GMAC 1 End Address 0x007F Almost Full Thresh 0x0010 Control/Test 0x0102220A FIFO Flush Mask 0x FIFO Flush Threshold 0x Truncation Threshold 0x Upper Pause Threshold0x Lower Pause Threshold0x0081 VLAN Tag
Re: NAPI wait before enabling irq's [was Re: [Cbe-oss-dev] Spider DMA wrongness]
On Thu, Dec 14, 2006 at 12:51:14PM -0800, [EMAIL PROTECTED] wrote:
On Thu, 14 Dec 2006, Linas Vepstas wrote:
On Wed, Nov 08, 2006 at 07:38:12AM +1100, Benjamin Herrenschmidt wrote:

What about Linas patches to do interrupt mitigation with NAPI polling ? That didn't end up working ?

It seems to be working as designed, which is different than working as naively expected. For large packets:
-- a packet comes in
-- rx interrupt generated
-- rx interrupts turned off
-- tcp poll function runs, receives packet
-- completes all work before next packet has arrived, so interrupts are turned back on.
-- go to start

This results in a high number of interrupts, and a high cpu usage. We were able to prove that napi works by stalling in the poll function just long enough to allow the next packet to arrive. In this case, napi works great, and number of irqs is vastly reduced.

This sounds awfully familiar. We went through the same with the tg3 driver on Altix. In that case we succeeded getting interrupt coalescence added to the driver, which ended up working pretty well for us. See the thread beginning with: http://oss.sgi.com/archives/netdev/2005-05/msg00497.html if you're interested.

I'm interested. The tg3 seems to have hardware coalescing, which, from what I can tell, is a way of delaying an RX interrupt for some number of microseconds? I assume there's nothing more to it than that?

The spider has some suggestively named registers and functions, hinting that it can similarly delay an RX interrupt, but the docs are opaque and mysteriously worded, so I cannot really tell. Perhaps Ishizaki Kou can clue us in?

As for the stalling NAPI idea, Jamal did a bit of work with that idea and wrote it up in: www.kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf

Reading now ...

--linas
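The polling sequence Linas lists can be illustrated with a toy model. The sketch below is not driver code: it assumes packets arrive at a fixed interval, that each takes a fixed time to process, and that the poll loop may linger for a while before re-enabling interrupts. The function name `count_irqs` and all of its parameters are made up for this illustration.

```c
/* Toy model of NAPI-style polling; all names and numbers are illustrative. */

/*
 * Count the interrupts taken while receiving n packets.
 *   gap   - time between packet arrivals (packet i arrives at i * gap)
 *   proc  - time needed to process one packet
 *   stall - extra time the poll loop waits, once its queue is empty,
 *           before it gives up and re-enables interrupts
 */
static unsigned long count_irqs(unsigned long n, unsigned long gap,
                                unsigned long proc, unsigned long stall)
{
    unsigned long irqs = 0;
    unsigned long now = 0;  /* current time */
    unsigned long i = 0;    /* index of the next unprocessed packet */

    while (i < n) {
        /* Interrupts are enabled: wait until packet i raises one. */
        if (now < i * gap)
            now = i * gap;
        irqs++;

        /* Interrupts are now off and the poll loop runs. */
        for (;;) {
            /* Drain every packet that has already arrived. */
            while (i < n && i * gap <= now) {
                now += proc;
                i++;
            }
            /* Optionally linger in case another packet is about to land. */
            now += stall;
            if (i >= n || i * gap > now)
                break;  /* nothing pending: re-enable interrupts */
        }
    }
    return irqs;
}
```

In this model, with `gap = 10` and `proc = 2`, a call with `stall = 0` takes one interrupt per packet, while `stall = 8` keeps the loop polling until the next arrival so only the first packet interrupts. Hardware interrupt coalescing, as mentioned for tg3, aims at a similar reduction by delaying the interrupt rather than the poll exit.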
Re: 2.6.20-rc1 sky2 problems (regression?)
Stephen Hemminger [EMAIL PROTECTED] writes:

Another useful bit of information is the statistics (ethtool -S eth0). When there were flow control bugs, they would show up as count of 1.

we'll see if the machine locks up again.

Are you doing jumbo frames (MTU > 1500)?

no (or at least i don't think so). how can i tell?

assuming the machine doesn't lock up with msi interrupts disabled, do you want me to do anything to debug why the driver locks up when the msi interrupts are enabled?

--alex--

--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with   |
| automatism and other passive states) to systematize confusion   |
| and thus to help to discredit completely the world of reality.  |
Re: [PATCH 4/4][SCTP]: Change adaption -> adaptation as per the latest API draft.
From: Sridhar Samudrala [EMAIL PROTECTED] Date: Thu, 14 Dec 2006 14:22:16 -0800 On Wed, 2006-12-13 at 18:03 -0800, David Miller wrote: From: Sridhar Samudrala [EMAIL PROTECTED] Date: Wed, 13 Dec 2006 17:38:52 -0800 These parameters are not used by user-space apps. They define the parameters used by the protocol in SCTP headers that go on wire. There is no __KERNEL__ ifdef protection for these defines, and the linux/sctp.h header is exported to userspace via include/linux/Kbuild, therefore the interface is exposed to userspace and you cannot break it. I didn't know that all the files under include/linux are exported to userspace. Not all of them, only select ones specified in the Kbuild file. If these structures and defines are meant for kernel-only, or only partially so, you should either annotate linux/sctp.h with appropriate __KERNEL__ ifdefs, or remove the header file from include/linux/Kbuild You cannot remove the file from Kbuild if it somehow is required by your SCTP user.h header file, for example.
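The `__KERNEL__` annotation suggested above might look roughly like the following header fragment. This is only a sketch: `struct example_wire_hdr` and `EXAMPLE_INTERNAL_LIMIT` are hypothetical names, not definitions from linux/sctp.h.

```c
#include <stdint.h>

/*
 * Part of the exported interface: visible to userspace builds, so its
 * name and layout cannot be changed once applications may depend on it.
 */
struct example_wire_hdr {
    uint16_t type;
    uint16_t length;
};

#ifdef __KERNEL__
/*
 * Kernel-only definition: hidden from userspace because __KERNEL__ is
 * not defined there, so it can be renamed or removed freely.
 */
#define EXAMPLE_INTERNAL_LIMIT 64
#endif
```

A userspace program that includes such a header sees only the structure; anything inside the `#ifdef __KERNEL__` block stays private to the kernel build.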
Re: [PATCH 1/4] net: make dev_kfree_skb_irq not inline
From: Christoph Hellwig [EMAIL PROTECTED] Date: Thu, 14 Dec 2006 22:30:09 +

Maybe you should only move the slowpath out of line ala:

static inline void dev_kfree_skb_irq(struct sk_buff *skb)
{
	if (atomic_dec_and_test(&skb->users))
		__dev_kfree_skb_irq(skb);
}

The atomic operation all by itself is either a function call or a 6-7 instruction sequence, so the inlining doesn't make sense even in this case.
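For readers outside the kernel, the inline-fast-path idea under discussion can be sketched in userspace C, with C11 atomics standing in for `atomic_dec_and_test()`. The names `struct buf`, `buf_put` and `buf_release_slow` are hypothetical and exist only for this illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a reference-counted buffer. */
struct buf {
    atomic_int users;
};

static int slow_path_calls; /* counts how often the slow path ran */

/* Out-of-line slow path: only reached when the last reference goes away. */
static void buf_release_slow(struct buf *b)
{
    (void)b;
    slow_path_calls++;
}

/* Inline fast path: one atomic decrement, then a rarely taken call. */
static inline void buf_put(struct buf *b)
{
    /* atomic_fetch_sub returns the old value; 1 means the last reference was dropped. */
    if (atomic_fetch_sub(&b->users, 1) == 1)
        buf_release_slow(b);
}
```

Whether such a split pays off depends on how expensive the atomic operation itself is on the target, which is the objection raised in the reply above.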
Re: 2.6.20-rc1 sky2 problems (regression?)
Stephen Hemminger [EMAIL PROTECTED] writes:

Another useful bit of information is the statistics (ethtool -S eth0). When there were flow control bugs, they would show up as count of 1.

the driver locked up again, even with msi interrupts disabled and idle_timeout=10. the console message was pretty much as before:

kernel: NETDEV WATCHDOG: eth0: transmit timed out
kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 336 .. 296 report=336 done=336
kernel: sky2 hardware hung? flushing
kernel: NETDEV WATCHDOG: eth0: transmit timed out
kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 296 .. 255 report=336 done=336
kernel: sky2 status report lost?

and this is the output from ethtool -S:

NIC statistics:
     tx_bytes: 3092123897
     rx_bytes: 546577898
     tx_broadcast: 20
     rx_broadcast: 4376
     tx_multicast: 0
     rx_multicast: 459
     tx_unicast: 2585993
     rx_unicast: 1550758
     tx_mac_pause: 1
     rx_mac_pause: 0
     collisions: 0
     late_collision: 0
     aborted: 0
     single_collisions: 0
     multi_collisions: 0
     rx_short: 0
     rx_runt: 0
     rx_64_byte_packets: 850693
     rx_65_to_127_byte_packets: 297029
     rx_128_to_255_byte_packets: 62116
     rx_256_to_511_byte_packets: 28795
     rx_512_to_1023_byte_packets: 31357
     rx_1024_to_1518_byte_packets: 285603
     rx_1518_to_max_byte_packets: 0
     rx_too_long: 0
     rx_fifo_overflow: 0
     rx_jabber: 0
     rx_fcs_error: 0
     tx_64_byte_packets: 194159
     tx_65_to_127_byte_packets: 239961
     tx_128_to_255_byte_packets: 48148
     tx_256_to_511_byte_packets: 27635
     tx_512_to_1023_byte_packets: 95557
     tx_1024_to_1518_byte_packets: 1980554
     tx_1519_to_max_byte_packets: 0
     tx_fifo_underrun: 0

time to try the vendor driver and see if that provides any clues.

--alex--
Re: 2.6.20-rc1 sky2 problems (regression?)
On Thu, 14 Dec 2006 15:21:00 -0800 Alex Romosan [EMAIL PROTECTED] wrote: Stephen Hemminger [EMAIL PROTECTED] writes: Another useful bit of information is the statistics (ethtool -S eth0). When there were flow control bugs, they would show up as count of 1. the driver locked up again, even with msi interrupts disabled and idle_timeout=10. the console message was pretty much as before: kernel: NETDEV WATCHDOG: eth0: transmit timed out kernel: sky2 eth0: tx timeout kernel: sky2 eth0: transmit ring 336 .. 296 report=336 done=336 kernel: sky2 hardware hung? flushing kernel: NETDEV WATCHDOG: eth0: transmit timed out kernel: sky2 eth0: tx timeout kernel: sky2 eth0: transmit ring 296 .. 255 report=336 done=336 kernel: sky2 status report lost? and this is the output from ethtool -S: NIC statistics: tx_bytes: 3092123897 rx_bytes: 546577898 tx_broadcast: 20 rx_broadcast: 4376 tx_multicast: 0 rx_multicast: 459 tx_unicast: 2585993 rx_unicast: 1550758 tx_mac_pause: 1 If this is repeatable... and mac_pause is always one then the problem is hardware flow control. I saw bugs before in the bus interface where it would not resume on unaligned buffer, but that was on receive.
Re: [PATCH 3/3][BNX2]: Fix minor loopback problem.
From: Michael Chan [EMAIL PROTECTED] Date: Wed, 13 Dec 2006 18:31:19 -0800 [BNX2]: Fix minor loopback problem. Use the configured MAC address instead of the permanent MAC address for loopback frames. Update version to 1.5.2. Signed-off-by: Michael Chan [EMAIL PROTECTED] Also applied, thanks Michael.
Re: [PATCH 2/3][BNX2]: Fix bug in bnx2_nvram_write().
From: Michael Chan [EMAIL PROTECTED] Date: Wed, 13 Dec 2006 18:30:39 -0800 [BNX2]: Fix bug in bnx2_nvram_write(). Length was not calculated correctly if the NVRAM offset is on a non-aligned offset. Signed-off-by: Michael Chan [EMAIL PROTECTED] Applied, thanks a lot.
Re: [PATCH 1/3][BNX2]: Fix panic in bnx2_tx_int().
From: Michael Chan [EMAIL PROTECTED] Date: Wed, 13 Dec 2006 18:30:33 -0800 [BNX2]: Fix panic in bnx2_tx_int(). There was an off-by-one bug in bnx2_tx_avail(). If the tx ring is completely full, the producer and consumer indices may be apart by 256 even though the ring size is only 255. One entry in the ring is unused and must be properly accounted for when calculating the number of available entries. The bug caused the tx ring entries to be reused by mistake, overwriting active entries, and ultimately causing it to crash. This bug rarely occurs because the tx ring is rarely completely full. We always stop when there is less than MAX_SKB_FRAGS entries available in the ring. Thanks to Corey Kovacs [EMAIL PROTECTED] and Andy Gospodarek [EMAIL PROTECTED] for reporting the problem and helping to collect debug information. Signed-off-by: Michael Chan [EMAIL PROTECTED] Applied, thanks.
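The arithmetic behind the commit message can be sketched as follows. This is not the bnx2 code: it assumes, for illustration, free-running 16-bit indices over 256-slot pages whose last slot is reserved and skipped, and the helper names (`next_tx_idx`, `tx_avail_naive`, `tx_avail`) are hypothetical. The naive variant shows one way such a miscount can go wrong, not necessarily how the original driver computed it.

```c
#include <stdint.h>

#define TX_DESC_CNT     256u               /* slots per ring page */
#define MAX_TX_DESC_CNT (TX_DESC_CNT - 1u) /* usable slots; the last one is reserved */

/* Advance an index, stepping over the reserved last slot of each page. */
static uint16_t next_tx_idx(uint16_t idx)
{
    if ((idx & MAX_TX_DESC_CNT) == MAX_TX_DESC_CNT - 1u)
        return (uint16_t)(idx + 2u);
    return (uint16_t)(idx + 1u);
}

/* Naive count: ignores that a full ring spans the reserved slot. */
static uint32_t tx_avail_naive(uint16_t prod, uint16_t cons, uint32_t ring_size)
{
    uint32_t diff = (uint16_t)(prod - cons);
    return ring_size - diff; /* wraps to a huge value when diff is 256 */
}

/*
 * Corrected count: an index distance of 256 means all 255 usable
 * entries are in flight. Smaller distances may overcount by one when
 * they cross the reserved slot, which only errs on the safe side.
 */
static uint32_t tx_avail(uint16_t prod, uint16_t cons, uint32_t ring_size)
{
    uint32_t diff = (uint16_t)(prod - cons);

    if (diff >= TX_DESC_CNT)
        diff = MAX_TX_DESC_CNT;
    return ring_size - diff;
}
```

Queuing 255 entries from an empty ring moves the producer index to 256. The naive count then reports an enormous number of free entries, which is the kind of result that lets active entries be overwritten, while the corrected count reports zero.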
Re: dhcpclient netlink bugs (was Re: [NETLINK]: Schedule removal of old macros exported to userspace)
Stefan Rompf [EMAIL PROTECTED] wrote: Yes, the code has quite some trust into the kernel that if it answers the asked question the answer is semantically correct. But to be fair, if you issue a write(), you also expect the number of bytes written in return and not the msec taken ;-) Will fix that and the other stuff you pointed out, thanks! I hope you checked that the message is really from the kernel (based on saddr). Unconnected sockets can receive messages from any user on the host. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
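A check along the lines Herbert suggests could look like the sketch below. It assumes the source address was filled in by `recvfrom()` or `recvmsg()` on an `AF_NETLINK` socket; the helper name `nl_from_kernel` is hypothetical. Messages sent by the kernel carry a source `nl_pid` of 0.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Return true if the source address of a received netlink message
 * identifies the kernel. Anything with a non-zero nl_pid was sent by
 * another userspace socket and should not be trusted as a kernel reply.
 */
static bool nl_from_kernel(const struct sockaddr_nl *from, socklen_t fromlen)
{
    if (fromlen != sizeof(*from))
        return false;
    if (from->nl_family != AF_NETLINK)
        return false;
    return from->nl_pid == 0;
}
```

A receive loop would call this on every message and silently drop those for which it returns false before parsing any payload.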
Re: [patch sungem] improved locking
On Tue, 2006-12-12 at 06:49 +0100, Eric Lemoine wrote:
On 12/12/06, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
On Tue, 2006-12-12 at 06:33 +0100, Eric Lemoine wrote:
On 12/12/06, David Miller [EMAIL PROTECTED] wrote:

[...] Anyways, Eric your changes look fine as far as I can tell, can you give them a really good testing on some SMP boxes?

Unfortunately I can't, I don't have the hardware (only an old ibook here).

I do however, I'll give it a beating on a dual G5 as soon as I get a chance. I'm pretty swamped at the moment and the box is used by somebody else today.

Ok, thanks a lot Benjamin.

Patched driver's been running fine for a couple of days & nights with constant beating... just those RX MAC fifo overflows every now and then (though they cause no data corruption and no big hit on the driver perfs either). I suppose still worth investigating when I have a bit of time, I must have done something stupid with the pause settings.

In the meantime, Eric's patch is all good.

Ben.
Re: 2.6.20-rc1 sky2 problems (regression?)
Stephen Hemminger [EMAIL PROTECTED] writes:

If this is repeatable... and mac_pause is always one then the problem is hardware flow control. I saw bugs before in the bus interface where it would not resume on unaligned buffer, but that was on receive.

i tried to switch over to the latest vendor driver but unfortunately it doesn't work with kernel 2.6.19+. it still uses CHECKSUM_HW which looks like it was replaced by CHECKSUM_PARTIAL and CHECKSUM_COMPLETE was also added. i think i can replace CHECKSUM_HW in the marvell driver with CHECKSUM_PARTIAL, except for a couple of places where i am not sure what i am supposed to do. the first instance it says (i am kind of paraphrasing here since i am copying from the screen and not cutting and pasting):

/** does the HW need to evaluate checksum for TCP or UDP packets? */
if (pMessage->ip_summed == CHECKSUM_HW)

maybe this needs to be replaced with CHECKSUM_PARTIAL. the second one

/** TCP checksum offload */
if ((pSKPacket->pMbuf->ip_summed == CHECKSUM_HW) && (SetOpcodePacketFlag == SK_TRUE))

i wonder if this is supposed to be CHECKSUM_COMPLETE

if you have any suggestions, i'll appreciate it.

--alex--
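As background to the question above: when CHECKSUM_HW was split, CHECKSUM_PARTIAL took over the transmit-side meaning (the stack asks the device to finish the checksum) and CHECKSUM_COMPLETE the receive-side meaning (the device hands up a checksum it computed). The sketch below only encodes that rule of thumb. The enum values are arbitrary stand-ins so it builds outside the kernel, `checksum_hw_replacement` is a hypothetical helper, and which path each of the two vendor-driver snippets sits on is not established here.

```c
/*
 * Stand-in constants so the sketch builds outside the kernel; a real
 * driver takes them from <linux/skbuff.h>, and the numeric values used
 * here are arbitrary.
 */
enum {
    CHECKSUM_NONE,
    CHECKSUM_PARTIAL,     /* tx: stack asks the device to finish the checksum */
    CHECKSUM_UNNECESSARY,
    CHECKSUM_COMPLETE     /* rx: device supplies the checksum it computed */
};

enum csum_path { CSUM_PATH_TX, CSUM_PATH_RX };

/* Hypothetical helper: which new constant takes over from CHECKSUM_HW. */
static int checksum_hw_replacement(enum csum_path path)
{
    return path == CSUM_PATH_TX ? CHECKSUM_PARTIAL : CHECKSUM_COMPLETE;
}
```

Under that reading, a test of `ip_summed` on a packet the driver is about to transmit would compare against CHECKSUM_PARTIAL, and CHECKSUM_COMPLETE would only appear where the driver sets `ip_summed` on received packets.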
[AX.25 4/7] Fix unchecked nr_add_node uses
Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

 net/netrom/nr_route.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: linux-net/net/netrom/nr_route.c
===================================================================
--- linux-net.orig/net/netrom/nr_route.c
+++ linux-net/net/netrom/nr_route.c
@@ -779,9 +779,13 @@ int nr_route_frame(struct sk_buff *skb,
 	nr_src  = (ax25_address *)(skb->data + 0);
 	nr_dest = (ax25_address *)(skb->data + 7);
 
-	if (ax25 != NULL)
-		nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
-			    ax25->ax25_dev->dev, 0, sysctl_netrom_obsolescence_count_initialiser);
+	if (ax25 != NULL) {
+		ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
+		                  ax25->ax25_dev->dev, 0,
+		                  sysctl_netrom_obsolescence_count_initialiser);
+		if (ret)
+			return ret;
+	}
 
 	if ((dev = nr_dev_get(nr_dest)) != NULL) {	/* Its for me */
 		if (ax25 == NULL)			/* Its from me */
@@ -846,6 +850,7 @@ int nr_route_frame(struct sk_buff *skb,
 	ret = (nr_neigh->ax25 != NULL);
 	nr_node_unlock(nr_node);
 	nr_node_put(nr_node);
+
 	return ret;
 }
[AX.25 7/7] Fix unchecked rose_add_loopback_neigh uses
rose_add_loopback_neigh uses kmalloc and the callers were ignoring the error value. Rewrite to let the caller deal with the allocation. This allows the use of static allocation of kmalloc use entirely. Signed-off-by: Ralf Baechle [EMAIL PROTECTED] include/net/rose.h |4 ++-- net/rose/rose_loopback.c |5 +++-- net/rose/rose_route.c| 45 + 3 files changed, 26 insertions(+), 28 deletions(-) Index: linux-net/include/net/rose.h === --- linux-net.orig/include/net/rose.h +++ linux-net/include/net/rose.h @@ -188,12 +188,12 @@ extern void rose_kick(struct sock *); extern void rose_enquiry_response(struct sock *); /* rose_route.c */ -extern struct rose_neigh *rose_loopback_neigh; +extern struct rose_neigh rose_loopback_neigh; extern struct file_operations rose_neigh_fops; extern struct file_operations rose_nodes_fops; extern struct file_operations rose_routes_fops; -extern int __must_check rose_add_loopback_neigh(void); +extern void rose_add_loopback_neigh(void); extern int __must_check rose_add_loopback_node(rose_address *); extern void rose_del_loopback_node(rose_address *); extern void rose_rt_device_down(struct net_device *); Index: linux-net/net/rose/rose_loopback.c === --- linux-net.orig/net/rose/rose_loopback.c +++ linux-net/net/rose/rose_loopback.c @@ -79,7 +79,8 @@ static void rose_loopback_timer(unsigned skb-h.raw = skb-data; - if ((sk = rose_find_socket(lci_o, rose_loopback_neigh)) != NULL) { + sk = rose_find_socket(lci_o, rose_loopback_neigh); + if (sk) { if (rose_process_rx_frame(sk, skb) == 0) kfree_skb(skb); continue; @@ -87,7 +88,7 @@ static void rose_loopback_timer(unsigned if (frametype == ROSE_CALL_REQUEST) { if ((dev = rose_dev_get(dest)) != NULL) { - if (rose_rx_call_request(skb, dev, rose_loopback_neigh, lci_o) == 0) + if (rose_rx_call_request(skb, dev, rose_loopback_neigh, lci_o) == 0) kfree_skb(skb); } else { kfree_skb(skb); Index: linux-net/net/rose/rose_route.c === --- linux-net.orig/net/rose/rose_route.c +++ linux-net/net/rose/rose_route.c @@ -46,7 
+46,7 @@ static DEFINE_SPINLOCK(rose_neigh_list_l static struct rose_route *rose_route_list; static DEFINE_SPINLOCK(rose_route_list_lock); -struct rose_neigh *rose_loopback_neigh; +struct rose_neigh rose_loopback_neigh; /* * Add a new route to a node, and in the process add the node and the @@ -361,33 +361,30 @@ out: /* * Add the loopback neighbour. */ -int rose_add_loopback_neigh(void) +void rose_add_loopback_neigh(void) { - if ((rose_loopback_neigh = kmalloc(sizeof(struct rose_neigh), GFP_ATOMIC)) == NULL) - return -ENOMEM; + struct rose_neigh *sn = rose_loopback_neigh; - rose_loopback_neigh-callsign = null_ax25_address; - rose_loopback_neigh-digipeat = NULL; - rose_loopback_neigh-ax25 = NULL; - rose_loopback_neigh-dev = NULL; - rose_loopback_neigh-count = 0; - rose_loopback_neigh-use = 0; - rose_loopback_neigh-dce_mode = 1; - rose_loopback_neigh-loopback = 1; - rose_loopback_neigh-number= rose_neigh_no++; - rose_loopback_neigh-restarted = 1; + sn-callsign = null_ax25_address; + sn-digipeat = NULL; + sn-ax25 = NULL; + sn-dev = NULL; + sn-count = 0; + sn-use = 0; + sn-dce_mode = 1; + sn-loopback = 1; + sn-number= rose_neigh_no++; + sn-restarted = 1; - skb_queue_head_init(rose_loopback_neigh-queue); + skb_queue_head_init(sn-queue); - init_timer(rose_loopback_neigh-ftimer); - init_timer(rose_loopback_neigh-t0timer); + init_timer(sn-ftimer); + init_timer(sn-t0timer); spin_lock_bh(rose_neigh_list_lock); - rose_loopback_neigh-next = rose_neigh_list; - rose_neigh_list = rose_loopback_neigh; + sn-next = rose_neigh_list; + rose_neigh_list = sn; spin_unlock_bh(rose_neigh_list_lock); - - return 0; } /* @@ -421,13 +418,13 @@ int rose_add_loopback_node(rose_address rose_node-mask = 10; rose_node-count= 1; rose_node-loopback = 1; - rose_node-neighbour[0] = rose_loopback_neigh; + rose_node-neighbour[0] = rose_loopback_neigh; /* Insert at the head of list. Address is always mask=10 */ rose_node-next = rose_node_list; rose_node_list =
[AX.25 1/7] Mark all kmalloc users __must_check
The recent fix 0506d4068bad834aab1141b5dc5e748eb175c6b3 made obvious that error values were not being propagated through the AX.25 stack. To help with that this patch marks all kmalloc users in the AX.25, NETROM and ROSE stacks as __must_check. Signed-off-by: Ralf Baechle [EMAIL PROTECTED] include/net/ax25.h| 11 ++- include/net/rose.h|4 ++-- net/ax25/af_ax25.c|4 ++-- net/ax25/ax25_route.c |2 +- net/netrom/nr_route.c |8 +--- net/rose/rose_route.c |2 +- 6 files changed, 17 insertions(+), 14 deletions(-) Index: linux-net/include/net/ax25.h === --- linux-net.orig/include/net/ax25.h +++ linux-net/include/net/ax25.h @@ -277,7 +277,7 @@ struct sock *ax25_get_socket(ax25_addres extern ax25_cb *ax25_find_cb(ax25_address *, ax25_address *, ax25_digi *, struct net_device *); extern void ax25_send_to_raw(ax25_address *, struct sk_buff *, int); extern void ax25_destroy_socket(ax25_cb *); -extern ax25_cb *ax25_create_cb(void); +extern ax25_cb * __must_check ax25_create_cb(void); extern void ax25_fillin_cb(ax25_cb *, ax25_dev *); extern struct sock *ax25_make_new(struct sock *, struct ax25_dev *); @@ -333,11 +333,12 @@ extern void ax25_ds_t3timer_expiry(ax25_ extern void ax25_ds_idletimer_expiry(ax25_cb *); /* ax25_iface.c */ -extern int ax25_protocol_register(unsigned int, int (*)(struct sk_buff *, ax25_cb *)); +extern int __must_check ax25_protocol_register(unsigned int, int (*)(struct sk_buff *, ax25_cb *)); extern void ax25_protocol_release(unsigned int); -extern int ax25_linkfail_register(void (*)(ax25_cb *, int)); +extern int __must_check ax25_linkfail_register(void (*)(ax25_cb *, int)); extern void ax25_linkfail_release(void (*)(ax25_cb *, int)); -extern int ax25_listen_register(ax25_address *, struct net_device *); +extern int __must_check ax25_listen_register(ax25_address *, + struct net_device *); extern void ax25_listen_release(ax25_address *, struct net_device *); extern int (*ax25_protocol_function(unsigned int))(struct sk_buff *, ax25_cb *); extern int 
ax25_listen_mine(ax25_address *, struct net_device *); @@ -415,7 +416,7 @@ extern unsigned long ax25_display_timer( /* ax25_uid.c */ extern int ax25_uid_policy; extern ax25_uid_assoc *ax25_findbyuid(uid_t); -extern int ax25_uid_ioctl(int, struct sockaddr_ax25 *); +extern int __must_check ax25_uid_ioctl(int, struct sockaddr_ax25 *); extern struct file_operations ax25_uid_fops; extern void ax25_uid_free(void); Index: linux-net/include/net/rose.h === --- linux-net.orig/include/net/rose.h +++ linux-net/include/net/rose.h @@ -193,8 +193,8 @@ extern struct file_operations rose_neigh extern struct file_operations rose_nodes_fops; extern struct file_operations rose_routes_fops; -extern int rose_add_loopback_neigh(void); -extern int rose_add_loopback_node(rose_address *); +extern int __must_check rose_add_loopback_neigh(void); +extern int __must_check rose_add_loopback_node(rose_address *); extern void rose_del_loopback_node(rose_address *); extern void rose_rt_device_down(struct net_device *); extern void rose_link_device_down(struct net_device *); Index: linux-net/net/ax25/af_ax25.c === --- linux-net.orig/net/ax25/af_ax25.c +++ linux-net/net/ax25/af_ax25.c @@ -1088,8 +1088,8 @@ out: /* * FIXME: nonblock behaviour looks like it may have a bug. 
*/ -static int ax25_connect(struct socket *sock, struct sockaddr *uaddr, - int addr_len, int flags) +static int __must_check ax25_connect(struct socket *sock, + struct sockaddr *uaddr, int addr_len, int flags) { struct sock *sk = sock-sk; ax25_cb *ax25 = ax25_sk(sk), *ax25t; Index: linux-net/net/ax25/ax25_route.c === --- linux-net.orig/net/ax25/ax25_route.c +++ linux-net/net/ax25/ax25_route.c @@ -71,7 +71,7 @@ void ax25_rt_device_down(struct net_devi write_unlock(ax25_route_lock); } -static int ax25_rt_add(struct ax25_routes_struct *route) +static int __must_check ax25_rt_add(struct ax25_routes_struct *route) { ax25_route *ax25_rt; ax25_dev *ax25_dev; Index: linux-net/net/rose/rose_route.c === --- linux-net.orig/net/rose/rose_route.c +++ linux-net/net/rose/rose_route.c @@ -52,7 +52,7 @@ struct rose_neigh *rose_loopback_neigh; * Add a new route to a node, and in the process add the node and the * neighbour if it is new. */ -static int rose_add_node(struct rose_route_struct *rose_route, +static int __must_check rose_add_node(struct rose_route_struct *rose_route, struct net_device *dev) { struct rose_node *rose_node, *rose_tmpn, *rose_tmpp; Index: linux-net/net/netrom/nr_route.c
[AX.25 2/7] Fix unchecked ax25_protocol_register uses.
Replace ax25_protocol_register by ax25_register_pid which assumes the
caller has done the memory allocation. This allows replacing the
kmalloc allocations entirely by static allocations.

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

 include/net/ax25.h     |    9
 net/ax25/ax25_iface.c  |   41
 net/netrom/af_netrom.c |    7
 net/rose/af_rose.c     |    7
 4 files changed, 32 insertions(+), 32 deletions(-)

Index: linux-net/include/net/ax25.h
===================================================================
--- linux-net.orig/include/net/ax25.h
+++ linux-net/include/net/ax25.h
@@ -333,7 +333,14 @@ extern void ax25_ds_t3timer_expiry(ax25_
 extern void ax25_ds_idletimer_expiry(ax25_cb *);
 
 /* ax25_iface.c */
-extern int __must_check ax25_protocol_register(unsigned int, int (*)(struct sk_buff *, ax25_cb *));
+
+struct ax25_protocol {
+	struct ax25_protocol *next;
+	unsigned int pid;
+	int (*func)(struct sk_buff *, ax25_cb *);
+};
+
+extern void ax25_register_pid(struct ax25_protocol *ap);
 extern void ax25_protocol_release(unsigned int);
 extern int __must_check ax25_linkfail_register(void (*)(ax25_cb *, int));
 extern void ax25_linkfail_release(void (*)(ax25_cb *, int));
Index: linux-net/net/ax25/ax25_iface.c
===================================================================
--- linux-net.orig/net/ax25/ax25_iface.c
+++ linux-net/net/ax25/ax25_iface.c
@@ -29,11 +29,7 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 
-static struct protocol_struct {
-	struct protocol_struct *next;
-	unsigned int pid;
-	int (*func)(struct sk_buff *, ax25_cb *);
-} *protocol_list = NULL;
+static struct ax25_protocol *protocol_list;
 static DEFINE_RWLOCK(protocol_list_lock);
 
 static struct linkfail_struct {
@@ -49,36 +45,23 @@ static struct listen_struct {
 } *listen_list = NULL;
 static DEFINE_SPINLOCK(listen_lock);
 
-int ax25_protocol_register(unsigned int pid,
-	int (*func)(struct sk_buff *, ax25_cb *))
+/*
+ * Do not register the internal protocols AX25_P_TEXT, AX25_P_SEGMENT,
+ * AX25_P_IP or AX25_P_ARP ...
+ */
+void ax25_register_pid(struct ax25_protocol *ap)
 {
-	struct protocol_struct *protocol;
-
-	if (pid == AX25_P_TEXT || pid == AX25_P_SEGMENT)
-		return 0;
-#ifdef CONFIG_INET
-	if (pid == AX25_P_IP || pid == AX25_P_ARP)
-		return 0;
-#endif
-	if ((protocol = kmalloc(sizeof(*protocol), GFP_ATOMIC)) == NULL)
-		return 0;
-
-	protocol->pid = pid;
-	protocol->func = func;
-
 	write_lock_bh(&protocol_list_lock);
-	protocol->next = protocol_list;
-	protocol_list = protocol;
+	ap->next = protocol_list;
+	protocol_list = ap;
 	write_unlock_bh(&protocol_list_lock);
-
-	return 1;
 }
 
-EXPORT_SYMBOL(ax25_protocol_register);
+EXPORT_SYMBOL_GPL(ax25_register_pid);
 
 void ax25_protocol_release(unsigned int pid)
 {
-	struct protocol_struct *s, *protocol;
+	struct ax25_protocol *s, *protocol;
 
 	write_lock_bh(&protocol_list_lock);
 	protocol = protocol_list;
@@ -223,7 +206,7 @@ EXPORT_SYMBOL(ax25_listen_release);
 int (*ax25_protocol_function(unsigned int pid))(struct sk_buff *, ax25_cb *)
 {
 	int (*res)(struct sk_buff *, ax25_cb *) = NULL;
-	struct protocol_struct *protocol;
+	struct ax25_protocol *protocol;
 
 	read_lock(&protocol_list_lock);
 	for (protocol = protocol_list; protocol != NULL; protocol = protocol->next)
@@ -263,7 +246,7 @@ void ax25_link_failed(ax25_cb *ax25, int
 
 int ax25_protocol_is_registered(unsigned int pid)
 {
-	struct protocol_struct *protocol;
+	struct ax25_protocol *protocol;
 	int res = 0;
 
 	read_lock_bh(&protocol_list_lock);
Index: linux-net/net/netrom/af_netrom.c
===================================================================
--- linux-net.orig/net/netrom/af_netrom.c
+++ linux-net/net/netrom/af_netrom.c
@@ -1377,6 +1377,11 @@ static struct notifier_block nr_dev_noti
 
 static struct net_device **dev_nr;
 
+static struct ax25_protocol nr_pid = {
+	.pid	= AX25_P_NETROM,
+	.func	= nr_route_frame
+};
+
 static int __init nr_proto_init(void)
 {
 	int i;
@@ -1424,7 +1429,7 @@ static int __init nr_proto_init(void)
 
 	register_netdevice_notifier(&nr_dev_notifier);
 
-	ax25_protocol_register(AX25_P_NETROM, nr_route_frame);
+	ax25_register_pid(&nr_pid);
 	ax25_linkfail_register(nr_link_failed);
 
 #ifdef CONFIG_SYSCTL
Index: linux-net/net/rose/af_rose.c
===================================================================
--- linux-net.orig/net/rose/af_rose.c
+++ linux-net/net/rose/af_rose.c
@@ -1481,6 +1481,11 @@ static struct notifier_block rose_dev_no
 
 static struct net_device **dev_rose;
 
+static struct ax25_protocol rose_pid = {
+	.pid	= AX25_P_ROSE,
+	.func	= rose_route_frame
+};
+
 static int __init
Re: [AX.25 3/7] Fix unchecked ax25_listen_register uses
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:09 +0100

> Fix ax25_listen_register to return something that's a sane error
> code, then fix all callers to use it.
>
> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AX.25 4/7] Fix unchecked nr_add_node uses
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:10 +0100

> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.
Re: [AX.25 5/7] Fix unchecked ax25_linkfail_register uses
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:11 +0100

> ax25_linkfail_register uses kmalloc and the callers were ignoring the
> error value. Rewrite to let the caller deal with the allocation. This
> allows replacing the kmalloc use entirely with static allocation.
>
> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.
Re: [AX.25 2/7] Fix unchecked ax25_protocol_register uses.
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:08 +0100

> Replace ax25_protocol_register by ax25_register_pid which assumes the
> caller has done the memory allocation. This allows replacing the
> kmalloc allocations entirely by static allocations.
>
> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.
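
The caller-allocated registration described in this patch can be sketched in userspace as follows. This is a minimal illustration under stated assumptions, not the kernel code: proto_entry, register_entry, lookup_handler and the PID value 0x42 are hypothetical names chosen for the example, and the locking that the kernel does around the list is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ax25_protocol. */
struct proto_entry {
	struct proto_entry *next;
	unsigned int pid;
	int (*func)(int frame_len);
};

/* Head of the singly linked list; the kernel guards its list with a rwlock. */
static struct proto_entry *proto_list;

/* Cannot fail: the caller supplies the storage, so nothing is allocated here. */
static void register_entry(struct proto_entry *e)
{
	assert(e != NULL);
	e->next = proto_list;
	proto_list = e;
}

/* Return the handler registered for pid, or NULL if there is none. */
static int (*lookup_handler(unsigned int pid))(int)
{
	struct proto_entry *e;

	for (e = proto_list; e != NULL; e = e->next)
		if (e->pid == pid)
			return e->func;
	return NULL;
}

/* Illustrative handler: accepts any non-empty frame. */
static int demo_route_frame(int frame_len)
{
	return frame_len > 0;
}

/* Statically allocated entry, analogous to nr_pid and rose_pid in the patch. */
static struct proto_entry demo_entry = {
	.pid  = 0x42,	/* arbitrary demo value, not a real AX.25 PID */
	.func = demo_route_frame
};

/* Registers the static entry once; returns 1 if lookups behave as expected. */
static int demo_register(void)
{
	register_entry(&demo_entry);
	return lookup_handler(0x42) == demo_route_frame &&
	       lookup_handler(0x43) == NULL;
}
```

Because registration no longer allocates, it has no error path and can return void. One consequence worth noting is that registering the same static entry twice would link it to itself, so each entry should be registered exactly once.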
Re: [AX.25 6/7] Fix unchecked rose_add_loopback_node uses
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:12 +0100

> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.
[PATCH v4 00/13] 2.6.20 Chelsio T3 RDMA Driver
Roland,

I think this is ready to go once the ethernet driver is pulled in.

Version 4 changes:
- Cleaned up spacing in the Kconfig file
- Remove locking.txt file - it's not needed
- Remove -O1 from the debug config option
- BugFix: support new LLD interface for dual-port adapters

Version 3 changes:
- BugFix: Don't use mutex inside of the mmap function.
- BugFix: Move QP to TERMINATE when TERMINATE AE is processed
- Support the new work queue design
- Merged up to Linus's tree as of 12/8/2006
- Misc nits

Version 2 changes:
- Make code sparse endian clean
- Use IDRs for mapping QP and CQ IDs to structure pointers instead of arrays
- Clean up confusing bitfields
- Use random32() instead of local random function
- Use krefs to track endpoint reference counts
- Misc nits

The following series implements the Chelsio T3 iWARP/RDMA Driver to be
considered for inclusion in 2.6.20. It depends on the Chelsio T3
Ethernet driver which is also under review now for 2.6.20.

The latest Chelsio T3 Ethernet driver patch can be pulled from:

http://service.chelsio.com/kernel.org/cxgb3.patch.bz2

A complete GIT kernel tree with all the T3 drivers can be pulled from:

git://staging.openfabrics.org/~swise/cxgb3.git

Thanks,

Steve.
[PATCH v4 03/13] Provider Methods and Data Structures
Provider methods to support the Linux RDMA verbs.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c | 1171
 drivers/infiniband/hw/cxgb3/iwch_provider.h |  363
 drivers/infiniband/hw/cxgb3/iwch_user.h     |   68
 3 files changed, 1602 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
new file mode 100644
index 0000000..e9721b1
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -0,0 +1,1171 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/device.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/ethtool.h>
+
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/byteorder.h>
+
+#include <rdma/iw_cm.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_user_verbs.h>
+
+#include "cxio_hal.h"
+#include "iwch.h"
+#include "iwch_provider.h"
+#include "iwch_cm.h"
+#include "iwch_user.h"
+
+static int iwch_modify_port(struct ib_device *ibdev,
+			    u8 port, int port_modify_mask,
+			    struct ib_port_modify *props)
+{
+	return -ENOSYS;
+}
+
+static struct ib_ah *iwch_ah_create(struct ib_pd *pd,
+				    struct ib_ah_attr *ah_attr)
+{
+	return ERR_PTR(-ENOSYS);
+}
+
+static int iwch_ah_destroy(struct ib_ah *ah)
+{
+	return -ENOSYS;
+}
+
+static int iwch_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+	return -ENOSYS;
+}
+
+static int iwch_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+	return -ENOSYS;
+}
+
+static int iwch_process_mad(struct ib_device *ibdev,
+			    int mad_flags,
+			    u8 port_num,
+			    struct ib_wc *in_wc,
+			    struct ib_grh *in_grh,
+			    struct ib_mad *in_mad, struct ib_mad *out_mad)
+{
+	return -ENOSYS;
+}
+
+static int iwch_dealloc_ucontext(struct ib_ucontext *context)
+{
+	struct iwch_dev *rhp = to_iwch_dev(context->device);
+	struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
+	PDBG("%s context %p\n", __FUNCTION__, context);
+	cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
+	kfree(ucontext);
+	return 0;
+}
+
+static struct ib_ucontext *iwch_alloc_ucontext(struct ib_device *ibdev,
+					struct ib_udata *udata)
+{
+	struct iwch_ucontext *context;
+	struct iwch_dev *rhp = to_iwch_dev(ibdev);
+
+	PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
+	context = kmalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return ERR_PTR(-ENOMEM);
+	cxio_init_ucontext(&rhp->rdev, &context->uctx);
+	INIT_LIST_HEAD(&context->mmaps);
+	spin_lock_init(&context->mmap_lock);
+	return &context->ibucontext;
+}
+
+static int iwch_destroy_cq(struct ib_cq *ib_cq)
+{
+	struct iwch_cq *chp;
+
+	PDBG("%s ib_cq %p\n", __FUNCTION__, ib_cq);
+	chp = to_iwch_cq(ib_cq);
+
+	remove_handle(chp->rhp, &chp->rhp->cqidr, chp->cq.cqid);
+	atomic_dec(&chp->refcnt);
+	wait_event(chp->wait, !atomic_read(&chp->refcnt));
+
+	cxio_destroy_cq(&chp->rhp->rdev, &chp->cq);
+	kfree(chp);
+	return 0;
[PATCH v4 09/13] Core WQE/CQE Types
T3 WQE and CQE structures, defines, etc...

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/core/cxio_wr.h |  685
 1 files changed, 685 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
new file mode 100644
index 0000000..45870be
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
@@ -0,0 +1,685 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __CXIO_WR_H__
+#define __CXIO_WR_H__
+
+#include <asm/io.h>
+#include <linux/pci.h>
+#include <linux/timer.h>
+#include "firmware_exports.h"
+
+#define T3_MAX_SGE      4
+
+#define Q_EMPTY(rptr,wptr) ((rptr)==(wptr))
+#define Q_FULL(rptr,wptr,size_log2)  ( (((wptr)-(rptr))>>(size_log2)) && \
+				       ((rptr)!=(wptr)) )
+#define Q_GENBIT(ptr,size_log2) (!(((ptr)>>size_log2)&0x1))
+#define Q_FREECNT(rptr,wptr,size_log2) ((1UL<<size_log2)-((wptr)-(rptr)))
+#define Q_COUNT(rptr,wptr) ((wptr)-(rptr))
+#define Q_PTR2IDX(ptr,size_log2) (ptr & ((1UL<<size_log2)-1))
+
+static inline void ring_doorbell(void __iomem *doorbell, u32 qpid)
+{
+	writel(((1<<31) | qpid), doorbell);
+}
+
+#define SEQ32_GE(x,y) (!( (((u32) (x)) - ((u32) (y))) & 0x80000000 ))
+
+enum t3_wr_flags {
+	T3_COMPLETION_FLAG = 0x01,
+	T3_NOTIFY_FLAG = 0x02,
+	T3_SOLICITED_EVENT_FLAG = 0x04,
+	T3_READ_FENCE_FLAG = 0x08,
+	T3_LOCAL_FENCE_FLAG = 0x10
+} __attribute__ ((packed));
+
+enum t3_wr_opcode {
+	T3_WR_BP = FW_WROPCODE_RI_BYPASS,
+	T3_WR_SEND = FW_WROPCODE_RI_SEND,
+	T3_WR_WRITE = FW_WROPCODE_RI_RDMA_WRITE,
+	T3_WR_READ = FW_WROPCODE_RI_RDMA_READ,
+	T3_WR_INV_STAG = FW_WROPCODE_RI_LOCAL_INV,
+	T3_WR_BIND = FW_WROPCODE_RI_BIND_MW,
+	T3_WR_RCV = FW_WROPCODE_RI_RECEIVE,
+	T3_WR_INIT = FW_WROPCODE_RI_RDMA_INIT,
+	T3_WR_QP_MOD = FW_WROPCODE_RI_MODIFY_QP
+} __attribute__ ((packed));
+
+enum t3_rdma_opcode {
+	T3_RDMA_WRITE,		/* IETF RDMAP v1.0 ... */
+	T3_READ_REQ,
+	T3_READ_RESP,
+	T3_SEND,
+	T3_SEND_WITH_INV,
+	T3_SEND_WITH_SE,
+	T3_SEND_WITH_SE_INV,
+	T3_TERMINATE,
+	T3_RDMA_INIT,		/* CHELSIO RI specific ... */
+	T3_BIND_MW,
+	T3_FAST_REGISTER,
+	T3_LOCAL_INV,
+	T3_QP_MOD,
+	T3_BYPASS
+} __attribute__ ((packed));
+
+static inline enum t3_rdma_opcode wr2opcode(enum t3_wr_opcode wrop)
+{
+	switch (wrop) {
+		case T3_WR_BP: return T3_BYPASS;
+		case T3_WR_SEND: return T3_SEND;
+		case T3_WR_WRITE: return T3_RDMA_WRITE;
+		case T3_WR_READ: return T3_READ_REQ;
+		case T3_WR_INV_STAG: return T3_LOCAL_INV;
+		case T3_WR_BIND: return T3_BIND_MW;
+		case T3_WR_INIT: return T3_RDMA_INIT;
+		case T3_WR_QP_MOD: return T3_QP_MOD;
+		default: break;
+	}
+	return -1;
+}
+
+
+/* Work request id */
+union t3_wrid {
+	struct {
+		u32 hi;
+		u32 low;
+	} id0;
+	u64 id1;
+};
+
+#define WRID(wrid)		(wrid.id1)
+#define WRID_GEN(wrid)		(wrid.id0.wr_gen)
+#define WRID_IDX(wrid)		(wrid.id0.wr_idx)
+#define WRID_LO(wrid)		(wrid.id0.wr_lo)
+
+struct fw_riwrh {
+	__be32 op_seop_flags;
+	__be32 gen_tid_len;
+};
+
+#define S_FW_RIWR_OP		24
+#define M_FW_RIWR_OP		0xff
+#define V_FW_RIWR_OP(x)
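
The queue macros in this header work on free-running read and write pointers rather than on wrapped indices. The sketch below shows one reading of how they behave for an 8-entry ring. It is a userspace illustration only: the shift and mask operators are filled in on the assumption that the archive stripped them, the 0x80000000 mask in SEQ32_GE is likewise an assumption, and demo_ring is a hypothetical helper that does not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Free-running pointers over a ring of (1 << size_log2) slots. */
#define Q_EMPTY(rptr, wptr)		((rptr) == (wptr))
#define Q_FULL(rptr, wptr, size_log2)	((((wptr) - (rptr)) >> (size_log2)) && \
					 ((rptr) != (wptr)))
#define Q_GENBIT(ptr, size_log2)	(!(((ptr) >> (size_log2)) & 0x1))
#define Q_FREECNT(rptr, wptr, size_log2) ((1UL << (size_log2)) - ((wptr) - (rptr)))
#define Q_COUNT(rptr, wptr)		((wptr) - (rptr))
#define Q_PTR2IDX(ptr, size_log2)	((ptr) & ((1UL << (size_log2)) - 1))

/* Wraparound-safe "x is at or after y" for 32-bit sequence numbers. */
#define SEQ32_GE(x, y)	(!((((uint32_t)(x)) - ((uint32_t)(y))) & 0x80000000u))

/* Walks an 8-slot ring through fill, wrap and partial drain; returns 1. */
static int demo_ring(void)
{
	const unsigned int size_log2 = 3;	/* 8 slots */
	uint32_t rptr = 0, wptr = 0;

	assert(Q_EMPTY(rptr, wptr));
	assert(Q_GENBIT(wptr, size_log2) == 1);	/* first pass over the ring */

	wptr += 8;				/* producer fills every slot */
	assert(Q_FULL(rptr, wptr, size_log2));
	assert(Q_FREECNT(rptr, wptr, size_log2) == 0);
	assert(Q_PTR2IDX(wptr, size_log2) == 0);	/* index wrapped to slot 0 */
	assert(Q_GENBIT(wptr, size_log2) == 0);	/* second pass flips the bit */

	rptr += 3;				/* consumer drains three entries */
	assert(Q_COUNT(rptr, wptr) == 5);
	assert(Q_FREECNT(rptr, wptr, size_log2) == 3);
	assert(!Q_FULL(rptr, wptr, size_log2));

	/* 5 counts as later than 0xfffffff0 once the counter has wrapped. */
	assert(SEQ32_GE(5u, 0xfffffff0u));
	assert(!SEQ32_GE(0xfffffff0u, 5u));
	return 1;
}
```

Keeping the pointers free-running lets the same value yield both the slot index (low bits) and the generation bit (the bit just above them), which is how a consumer can tell a freshly written entry from a stale one left over from the previous pass.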
[PATCH v4 02/13] Device Discovery and ULLD Linkage
Code to discover all the T3 devices and register them with the T3 RDMA
Core and the Linux RDMA Core.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch.c |  189
 drivers/infiniband/hw/cxgb3/iwch.h |  175
 2 files changed, 364 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
new file mode 100644
index 0000000..acbe449
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+
+#include <rdma/ib_verbs.h>
+
+#include "cxgb3_offload.h"
+#include "iwch_provider.h"
+#include "iwch_user.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+
+#define DRV_VERSION "1.1"
+
+MODULE_AUTHOR("Boyd Faulkner, Steve Wise");
+MODULE_DESCRIPTION("Chelsio T3 RDMA Driver");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRV_VERSION);
+
+cxgb3_cpl_handler_func t3c_handlers[NUM_CPL_CMDS];
+
+static void open_rnic_dev(struct t3cdev *);
+static void close_rnic_dev(struct t3cdev *);
+
+struct cxgb3_client t3c_client = {
+	.name = "iw_cxgb3",
+	.add = open_rnic_dev,
+	.remove = close_rnic_dev,
+	.handlers = t3c_handlers,
+	.redirect = iwch_ep_redirect
+};
+
+static LIST_HEAD(dev_list);
+static DEFINE_MUTEX(dev_mutex);
+
+static void rnic_init(struct iwch_dev *rnicp)
+{
+	PDBG("%s iwch_dev %p\n", __FUNCTION__,  rnicp);
+	idr_init(&rnicp->cqidr);
+	idr_init(&rnicp->qpidr);
+	idr_init(&rnicp->mmidr);
+	spin_lock_init(&rnicp->lock);
+
+	rnicp->attr.vendor_id = 0x168;
+	rnicp->attr.vendor_part_id = 7;
+	rnicp->attr.max_qps = T3_MAX_NUM_QP - 32;
+	rnicp->attr.max_wrs = (1UL << 24) - 1;
+	rnicp->attr.max_sge_per_wr = T3_MAX_SGE;
+	rnicp->attr.max_sge_per_rdma_write_wr = T3_MAX_SGE;
+	rnicp->attr.max_cqs = T3_MAX_NUM_CQ - 1;
+	rnicp->attr.max_cqes_per_cq = (1UL << 24) - 1;
+	rnicp->attr.max_mem_regs = cxio_num_stags(&rnicp->rdev);
+	rnicp->attr.max_phys_buf_entries = T3_MAX_PBL_SIZE;
+	rnicp->attr.max_pds = T3_MAX_NUM_PD - 1;
+	rnicp->attr.mem_pgsizes_bitmask = 0x7FFF;	/* 4KB-128MB */
+	rnicp->attr.can_resize_wq = 0;
+	rnicp->attr.max_rdma_reads_per_qp = 8;
+	rnicp->attr.max_rdma_read_resources =
+	    rnicp->attr.max_rdma_reads_per_qp * rnicp->attr.max_qps;
+	rnicp->attr.max_rdma_read_qp_depth = 8;	/* IRD */
+	rnicp->attr.max_rdma_read_depth =
+	    rnicp->attr.max_rdma_read_qp_depth * rnicp->attr.max_qps;
+	rnicp->attr.rq_overflow_handled = 0;
+	rnicp->attr.can_modify_ird = 0;
+	rnicp->attr.can_modify_ord = 0;
+	rnicp->attr.max_mem_windows = rnicp->attr.max_mem_regs - 1;
+	rnicp->attr.stag0_value = 1;
+	rnicp->attr.zbva_support = 1;
+	rnicp->attr.local_invalidate_fence = 1;
+	rnicp->attr.cq_overflow_detection = 1;
+	return;
+}
+
+static void open_rnic_dev(struct t3cdev *tdev)
+{
+	struct iwch_dev *rnicp;
+	static int vers_printed;
+
+	PDBG("%s t3cdev %p\n", __FUNCTION__,  tdev);
+	if (!vers_printed++)
+		printk(KERN_INFO MOD "Chelsio T3 RDMA Driver - version %s\n",
+		       DRV_VERSION);
+	rnicp = (struct iwch_dev *)ib_alloc_device(sizeof(*rnicp));
+	if (!rnicp) {
+		printk(KERN_ERR MOD "Cannot allocate ib device\n");
+		return;
+	}
+	rnicp->rdev.ulp =
[PATCH v4 13/13] Kconfig/Makefile
Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/Kconfig           |    1
 drivers/infiniband/Makefile          |    1
 drivers/infiniband/hw/cxgb3/Kconfig  |   27
 drivers/infiniband/hw/cxgb3/Makefile |   12
 4 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 59b3932..06453ab 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -38,6 +38,7 @@ source "drivers/infiniband/hw/mthca/Kcon
 source "drivers/infiniband/hw/ipath/Kconfig"
 source "drivers/infiniband/hw/ehca/Kconfig"
 source "drivers/infiniband/hw/amso1100/Kconfig"
+source "drivers/infiniband/hw/cxgb3/Kconfig"
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index 570b30a..69bdd55 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_INFINIBAND_MTHCA)		+= hw/mt
 obj-$(CONFIG_INFINIBAND_IPATH)		+= hw/ipath/
 obj-$(CONFIG_INFINIBAND_EHCA)		+= hw/ehca/
 obj-$(CONFIG_INFINIBAND_AMSO1100)	+= hw/amso1100/
+obj-$(CONFIG_INFINIBAND_CXGB3)		+= hw/cxgb3/
 obj-$(CONFIG_INFINIBAND_IPOIB)		+= ulp/ipoib/
 obj-$(CONFIG_INFINIBAND_SRP)		+= ulp/srp/
 obj-$(CONFIG_INFINIBAND_ISER)		+= ulp/iser/
diff --git a/drivers/infiniband/hw/cxgb3/Kconfig b/drivers/infiniband/hw/cxgb3/Kconfig
new file mode 100644
index 0000000..d3db264
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/Kconfig
@@ -0,0 +1,27 @@
+config INFINIBAND_CXGB3
+	tristate "Chelsio RDMA Driver"
+	depends on CHELSIO_T3 && INFINIBAND
+	select GENERIC_ALLOCATOR
+	---help---
+	  This is an iWARP/RDMA driver for the Chelsio T3 1GbE and
+	  10GbE adapters.
+
+	  For general information about Chelsio and our products, visit
+	  our website at http://www.chelsio.com.
+
+	  For customer support, please visit our customer support page at
+	  http://www.chelsio.com/support.htm.
+
+	  Please send feedback to [EMAIL PROTECTED].
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called iw_cxgb3.
+
+config INFINIBAND_CXGB3_DEBUG
+	bool "Verbose debugging output"
+	depends on INFINIBAND_CXGB3
+	default n
+	---help---
+	  This option causes the Chelsio RDMA driver to produce copious
+	  amounts of debug messages.  Select this if you are developing
+	  the driver or trying to diagnose a problem.
diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile
new file mode 100644
index 0000000..7a89f6d
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/Makefile
@@ -0,0 +1,12 @@
+EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/cxgb3 \
+		-I$(TOPDIR)/drivers/infiniband/hw/cxgb3/core
+
+obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o
+
+iw_cxgb3-y :=  iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \
+	       iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o
+
+ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
+EXTRA_CFLAGS += -DDEBUG -g
+iw_cxgb3-y += core/cxio_dbg.o
+endif
[PATCH v4 05/13] Queue Pairs
Code to manipulate the QP.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch_qp.c | 1007
 1 files changed, 1007 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
new file mode 100644
index 0000000..9f6b251
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -0,0 +1,1007 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include "iwch_provider.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+#include "cxio_hal.h"
+
+#define NO_SUPPORT -1
+
+static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
+				       u8 * flit_cnt)
+{
+	int i;
+	u32 plen;
+
+	switch (wr->opcode) {
+	case IB_WR_SEND:
+	case IB_WR_SEND_WITH_IMM:
+		if (wr->send_flags & IB_SEND_SOLICITED)
+			wqe->send.rdmaop = T3_SEND_WITH_SE;
+		else
+			wqe->send.rdmaop = T3_SEND;
+		wqe->send.rem_stag = 0;
+		break;
+#if 0				/* Not currently supported */
+	case TYPE_SEND_INVALIDATE:
+	case TYPE_SEND_INVALIDATE_IMMEDIATE:
+		wqe->send.rdmaop = T3_SEND_WITH_INV;
+		wqe->send.rem_stag = cpu_to_be32(wr->wr.rdma.rkey);
+		break;
+	case TYPE_SEND_SE_INVALIDATE:
+		wqe->send.rdmaop = T3_SEND_WITH_SE_INV;
+		wqe->send.rem_stag = cpu_to_be32(wr->wr.rdma.rkey);
+		break;
+#endif
+	default:
+		break;
+	}
+	if (wr->num_sge > T3_MAX_SGE)
+		return -EINVAL;
+	wqe->send.reserved[0] = 0;
+	wqe->send.reserved[1] = 0;
+	wqe->send.reserved[2] = 0;
+	if (wr->opcode == IB_WR_SEND_WITH_IMM) {
+		plen = 4;
+		wqe->send.sgl[0].stag = wr->imm_data;
+		wqe->send.sgl[0].len = __constant_cpu_to_be32(0);
+		wqe->send.num_sgle = __constant_cpu_to_be32(0);
+		*flit_cnt = 5;
+	} else {
+		plen = 0;
+		for (i = 0; i < wr->num_sge; i++) {
+			if ((plen + wr->sg_list[i].length) < plen) {
+				return -EMSGSIZE;
+			}
+			plen += wr->sg_list[i].length;
+			wqe->send.sgl[i].stag =
+			    cpu_to_be32(wr->sg_list[i].lkey);
+			wqe->send.sgl[i].len =
+			    cpu_to_be32(wr->sg_list[i].length);
+			wqe->send.sgl[i].to = cpu_to_be64(wr->sg_list[i].addr);
+		}
+		wqe->send.num_sgle = cpu_to_be32(wr->num_sge);
+		*flit_cnt = 4 + ((wr->num_sge) << 1);
+	}
+	wqe->send.plen = cpu_to_be32(plen);
+	return 0;
+}
+
+static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
+					u8 *flit_cnt)
+{
+	int i;
+	u32 plen;
+	if (wr->num_sge > T3_MAX_SGE)
+		return -EINVAL;
+	wqe->write.rdmaop = T3_RDMA_WRITE;
+	wqe->write.reserved[0] = 0;
+	wqe->write.reserved[1] = 0;
+	wqe->write.reserved[2] = 0;
+	wqe->write.stag_sink = cpu_to_be32(wr->wr.rdma.rkey);
+	wqe->write.to_sink = cpu_to_be64(wr->wr.rdma.remote_addr);
+
+	if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) {
+		plen = 4;
+		wqe->write.sgl[0].stag = wr->imm_data;
+
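
The scatter/gather loop in iwch_build_rdma_send guards the running payload length against 32-bit wraparound before adding each segment. A minimal userspace sketch of that check follows, assuming the stripped comparison in the patch is a less-than test; sum_sge_lengths is a hypothetical name used only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sums n segment lengths into *plen.
 * Returns 0 on success, or -EMSGSIZE if the 32-bit total would wrap.
 */
static int sum_sge_lengths(const uint32_t *len, size_t n, uint32_t *plen)
{
	uint32_t total = 0;
	size_t i;

	assert(plen != NULL);
	for (i = 0; i < n; i++) {
		/*
		 * Unsigned addition wraps modulo 2^32, so a sum smaller
		 * than the old total means the addition overflowed.
		 */
		if ((uint32_t)(total + len[i]) < total)
			return -EMSGSIZE;
		total += len[i];
	}
	*plen = total;
	return 0;
}
```

The test relies on unsigned arithmetic being well defined on overflow in C; the same comparison on signed integers would be undefined behaviour.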
[PATCH v4 07/13] Async Event Handler
Code to handle async events coming from the T3 RDMA Core.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch_ev.c |  231
 1 files changed, 231 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
new file mode 100644
index 0000000..b0bd014
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -0,0 +1,231 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/slab.h>
+#include <linux/mman.h>
+#include <net/sock.h>
+#include "iwch_provider.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+#include "cxio_hal.h"
+#include "cxio_wr.h"
+
+static void post_qp_event(struct iwch_dev *rnicp, struct iwch_cq *chp,
+			  struct respQ_msg_t *rsp_msg,
+			  enum ib_event_type ib_event,
+			  int send_term)
+{
+	struct ib_event event;
+	struct iwch_qp_attributes attrs;
+	struct iwch_qp *qhp;
+
+	printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x "
+	       "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__,
+	       CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe),
+	       CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe),
+	       CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
+
+	spin_lock(&rnicp->lock);
+	qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe));
+
+	if (!qhp) {
+		printk(KERN_ERR "%s unaffiliated error 0x%x qpid 0x%x\n",
+		       __FUNCTION__, CQE_STATUS(rsp_msg->cqe),
+		       CQE_QPID(rsp_msg->cqe));
+		spin_unlock(&rnicp->lock);
+		return;
+	}
+
+	if ((qhp->attr.state == IWCH_QP_STATE_ERROR) ||
+	    (qhp->attr.state == IWCH_QP_STATE_TERMINATE)) {
+		PDBG("%s AE received after RTS - "
+		     "qp state %d qpid 0x%x status 0x%x\n", __FUNCTION__,
+		     qhp->attr.state, qhp->wq.qpid, CQE_STATUS(rsp_msg->cqe));
+		spin_unlock(&rnicp->lock);
+		return;
+	}
+
+	atomic_inc(&qhp->refcnt);
+	spin_unlock(&rnicp->lock);
+
+	event.event = ib_event;
+	event.device = chp->ibcq.device;
+	if (ib_event == IB_EVENT_CQ_ERR)
+		event.element.cq = &chp->ibcq;
+	else
+		event.element.qp = &qhp->ibqp;
+
+	if (qhp->ibqp.event_handler)
+		(*qhp->ibqp.event_handler)(&event, qhp->ibqp.qp_context);
+
+	if (qhp->attr.state == IWCH_QP_STATE_RTS) {
+		attrs.next_state = IWCH_QP_STATE_TERMINATE;
+		iwch_modify_qp(qhp->rhp, qhp, IWCH_QP_ATTR_NEXT_STATE,
+			       &attrs, 1);
+		if (send_term)
+			iwch_post_terminate(qhp, rsp_msg);
+	}
+
+	if (atomic_dec_and_test(&qhp->refcnt))
+		wake_up(&qhp->wait);
+}
+
+void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb)
+{
+	struct iwch_dev *rnicp;
+	struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data;
+	struct iwch_cq *chp;
+	struct iwch_qp *qhp;
+	u32 cqid = RSPQ_CQID(rsp_msg);
+
+	rnicp = (struct iwch_dev *) rdev_p->ulp;
+	spin_lock(&rnicp->lock);
+	chp = get_chp(rnicp, cqid);
+	qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe));
+	if (!chp || !qhp) {
+		printk(KERN_ERR MOD "BAD AE cqid 0x%x qpid 0x%x opcode %d "
+		       "status 0x%x type %d wrid.hi 0x%x wrid.lo 0x%x \n",
+
[PATCH v4 10/13] Core HAL
The RDMA Core interfaces with the T3 HW and ULLD providing a low level RDMA interface.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---
 drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 1302 +++
 drivers/infiniband/hw/cxgb3/core/cxio_hal.h |  201
 2 files changed, 1503 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
new file mode 100644
index 000..ffc4ec0
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
@@ -0,0 +1,1302 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <asm/semaphore.h>
+#include <asm/delay.h>
+
+#include <linux/netdevice.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+
+#include "cxio_resource.h"
+#include "cxio_hal.h"
+#include "cxgb3_offload.h"
+#include "sge_defs.h"
+
+static struct cxio_rdev *rdev_tbl[T3_MAX_NUM_RNIC];
+static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (rdev_tbl[i])
+			if (!strcmp(rdev_tbl[i]->dev_name, dev_name))
+				return rdev_tbl[i];
+	return NULL;
+}
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
+							     *tdev)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (rdev_tbl[i])
+			if (rdev_tbl[i]->t3cdev_p == tdev)
+				return rdev_tbl[i];
+	return NULL;
+}
+
+static inline int cxio_hal_add_rdev(struct cxio_rdev *rdev_p)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (!rdev_tbl[i]) {
+			rdev_tbl[i] = rdev_p;
+			break;
+		}
+	return (i == T3_MAX_NUM_RNIC);
+}
+
+static inline void cxio_hal_delete_rdev(struct cxio_rdev *rdev_p)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (rdev_tbl[i] == rdev_p) {
+			rdev_tbl[i] = NULL;
+			break;
+		}
+}
+
+int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq,
+		   enum t3_cq_opcode op, u32 credit)
+{
+	int ret;
+	struct t3_cqe *cqe;
+	u32 rptr;
+
+	struct rdma_cq_op setup;
+	setup.id = cq->cqid;
+	setup.credits = (op == CQ_CREDIT_UPDATE) ? credit : 0;
+	setup.op = op;
+	ret = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_OP, &setup);
+
+	if ((ret < 0) || (op == CQ_CREDIT_UPDATE))
+		return ret;
+
+	/*
+	 * If the rearm returned an index other than our current index,
+	 * then there might be CQE's in flight (being DMA'd).  We must wait
+	 * here for them to complete or the consumer can miss a notification.
+	 */
+	if (Q_PTR2IDX((cq->rptr), cq->size_log2) != ret) {
+		int i=0;
+
+		rptr = cq->rptr;
+
+		/*
+		 * Keep the generation correct by bumping rptr until it
+		 * matches the index returned by the rearm - 1.
+		 */
+		while (Q_PTR2IDX((rptr+1), cq->size_log2) != ret)
+			rptr++;
+
+		/*
+		 * Now rptr is the index for the (last) cqe that was
+		 * in-flight at the time the HW rearmed the CQ.  We
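The rptr bump in cxio_hal_cq_op() above can be tried out in user space. The following is only a sketch: the patch does not show how Q_PTR2IDX is defined, so the masking helper q_ptr2idx() and the wrap-parity helper q_gen() are assumptions made for illustration, and bump_rptr_to_rearm_index() is a hypothetical name, not a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a free-running queue pointer maps to a ring index by
 * masking with the ring size (2^size_log2). */
static uint32_t q_ptr2idx(uint32_t ptr, uint32_t size_log2)
{
	return ptr & ((1u << size_log2) - 1u);
}

/* Assumption: the "generation" is the parity of completed ring wraps. */
static uint32_t q_gen(uint32_t ptr, uint32_t size_log2)
{
	return (ptr >> size_log2) & 1u;
}

/* Mirror of the loop in the patch: if the rearm index differs from the
 * current index, advance rptr until (rptr + 1) maps to rearm_idx, so
 * rptr ends up one slot before the index the hardware reported. */
static uint32_t bump_rptr_to_rearm_index(uint32_t rptr, uint32_t size_log2,
					 uint32_t rearm_idx)
{
	assert(size_log2 < 32);
	assert(rearm_idx < (1u << size_log2));	/* otherwise the loop never ends */

	if (q_ptr2idx(rptr, size_log2) == rearm_idx)
		return rptr;			/* nothing in flight */

	while (q_ptr2idx(rptr + 1u, size_log2) != rearm_idx)
		rptr++;
	return rptr;
}
```

Because rptr keeps counting past the ring size instead of being reset to an index, the wrap parity moves along with it, which appears to be what the "keep the generation correct" comment in the patch refers to.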
[PATCH v4 01/13] Linux RDMA Core Changes
Support provider-specific data in ib_uverbs_cmd_req_notify_cq(). The Chelsio iwarp provider library needs to pass information to the kernel verb for re-arming the CQ.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---
 drivers/infiniband/core/uverbs_cmd.c      |    9 +++--
 drivers/infiniband/hw/amso1100/c2.h       |    2 +-
 drivers/infiniband/hw/amso1100/c2_cq.c    |    3 ++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h  |    3 ++-
 drivers/infiniband/hw/ehca/ehca_reqs.c    |    3 ++-
 drivers/infiniband/hw/ipath/ipath_cq.c    |    4 +++-
 drivers/infiniband/hw/ipath/ipath_verbs.h |    3 ++-
 drivers/infiniband/hw/mthca/mthca_cq.c    |    6 --
 drivers/infiniband/hw/mthca/mthca_dev.h   |    4 ++--
 include/rdma/ib_verbs.h                   |    5 +++--
 10 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 743247e..5dd1de9 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -959,6 +959,7 @@ ssize_t ib_uverbs_req_notify_cq(struct i
 				int out_len)
 {
 	struct ib_uverbs_req_notify_cq cmd;
+	struct ib_udata udata;
 	struct ib_cq *cq;

 	if (copy_from_user(&cmd, buf, sizeof cmd))
@@ -968,8 +969,12 @@ ssize_t ib_uverbs_req_notify_cq(struct i
 	if (!cq)
 		return -EINVAL;

-	ib_req_notify_cq(cq, cmd.solicited_only ?
-			 IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
+	INIT_UDATA(&udata, buf + sizeof cmd, 0,
+		   in_len - sizeof cmd, 0);
+
+	cq->device->req_notify_cq(cq, cmd.solicited_only ?
+				  IB_CQ_SOLICITED : IB_CQ_NEXT_COMP,
+				  &udata);

 	put_cq_read(cq);

diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h
index 04a9db5..9a76869 100644
--- a/drivers/infiniband/hw/amso1100/c2.h
+++ b/drivers/infiniband/hw/amso1100/c2.h
@@ -519,7 +519,7 @@ extern void c2_free_cq(struct c2_dev *c2
 extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index);
 extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index);
 extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
-extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
+extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, struct ib_udata *udata);

 /* CM */
 extern int c2_llp_connect(struct iw_cm_id *cm_id,
diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c
index 05c9154..7ce8bca 100644
--- a/drivers/infiniband/hw/amso1100/c2_cq.c
+++ b/drivers/infiniband/hw/amso1100/c2_cq.c
@@ -217,7 +217,8 @@ int c2_poll_cq(struct ib_cq *ibcq, int n
 	return npolled;
 }

-int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+	      struct ib_udata *udata)
 {
 	struct c2_mq_shared __iomem *shared;
 	struct c2_cq *cq;
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 3720e30..566b30c 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -135,7 +135,8 @@ int ehca_poll_cq(struct ib_cq *cq, int n

 int ehca_peek_cq(struct ib_cq *cq, int wc_cnt);

-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify);
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+		       struct ib_udata *udata);

 struct ib_qp *ehca_create_qp(struct ib_pd *pd,
 			     struct ib_qp_init_attr *init_attr,
diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c
index b46bda1..3ed6992 100644
--- a/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ b/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -634,7 +634,8 @@ poll_cq_exit0:
 	return ret;
 }

-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+		       struct ib_udata *udata)
 {
 	struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);

diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 87462e0..27ba4db 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -307,13 +307,15 @@ int ipath_destroy_cq(struct ib_cq *ibcq)
  * ipath_req_notify_cq - change the notification type for a completion queue
  * @ibcq: the completion queue
  * @notify: the type of notification to request
+ * @udata: user data
  *
  * Returns 0 for success.
  *
  * This may be called from interrupt context.  Also called by
  * ib_req_notify_cq() in the generic verbs code.
  */
-int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int ipath_req_notify_cq(struct ib_cq
Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work
On Wed, 2006-12-13 at 13:17 -0500, Michael Bommarito wrote:

> Attached is a patch that fixes this (the actual change is two lines but
> context provided in patch for review). The dmesg containing call trace
> is attached to the bugzilla entry above.

You forgot to attach the patch but IIRC it's been found and fixed already.

johannes
Re: [RFC] split NAPI from network device.
On Wed, 2006-12-13 at 15:46 -0800, Stephen Hemminger wrote:

> Split off NAPI part from network device, this patch is build tested only!
> It breaks kernel API for network devices, and only three examples are
> fixed (skge, sky2, and tg3).
>
> 1. Decomposition allows different NAPI <-> network device
>    Some hardware has N devices for one IRQ, others like MSI-X
>    want multiple receives for one device.
>
> 2. Cleanup locking with netpoll
>
> 3. Change poll callback arguments and semantics
>
> 4. Make softnet_data static (only in dev.c)

Thanks! I'll give a go at adapting emac and maybe a few more when I get 5mn to spare...

Ben.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work
Michael,

I sent a patch to this list on Sunday that fixed the problem. It seems to have migrated into the wireless-2.6 git tree.

Regards,

Uli

On 13.12.2006 at 19:17, Michael Bommarito wrote:

> This didn't get much attention on bugzilla and I figured it was
> important enough to forward along to the whole list since it's been
> lingering around in ieee80211-softmac since 19-git5 at least.
> http://bugzilla.kernel.org/show_bug.cgi?id=7657
>
> Somebody was passing the whole mac device structure to
> ieee80211softmac_assoc_work instead of just the association work, which
> led to much death and locking.
>
> Attached is a patch that fixes this (the actual change is two lines but
> context provided in patch for review). The dmesg containing call trace
> is attached to the bugzilla entry above.
>
> -Mike

--
Uli Kunitz
Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work
Hello Uli,

Yes, apologies, I had been waiting for an abandoned bugzilla entry to get attention, and when I realized it was assigned to a dead-end, I simply posted the patch without checking for prior messages. I was further confused by the fact that it hadn't made its way into any of the 19-gitX sets (and for that matter, the window for 2.6.20-rc1 has come and gone and this still remains unfixed), despite how clear the error was and how trivial the fix seems.

-Mike

On 12/14/06, Uli Kunitz [EMAIL PROTECTED] wrote:

> Michael,
>
> I sent a patch to this list on Sunday, that patched the problem. It
> seems to be migrated into the wireless-2.6 git tree.
>
> Regards,
>
> Uli
>
> On 13.12.2006 at 19:17, Michael Bommarito wrote:
>
> > This didn't get much attention on bugzilla and I figured it was
> > important enough to forward along to the whole list since it's been
> > lingering around in ieee80211-softmac since 19-git5 at least.
> > http://bugzilla.kernel.org/show_bug.cgi?id=7657
> >
> > Somebody was passing the whole mac device structure to
> > ieee80211softmac_assoc_work instead of just the association work,
> > which led to much death and locking.
> >
> > Attached is a patch that fixes this (the actual change is two lines
> > but context provided in patch for review). The dmesg containing call
> > trace is attached to the bugzilla entry above.
> >
> > -Mike
>
> --
> Uli Kunitz
Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work
Michael Bommarito wrote:

> Hello Uli,
>
> Yes, apologies, I had been waiting for an abandoned bugzilla entry to
> get attention, and when I realized it was assigned to a dead-end, I had
> simply posted the patch without checking for prior messages. I was
> further confused by the fact that it hadn't made its way into any of
> the 19-gitX sets (and for that matter, the window for 2.6.20-rc1 has
> come and gone and this still remains unfixed), despite how clear the
> error was and how trivial the fix seems.

I was not aware that a bugzilla entry existed for this problem. I learned about it when my system would hang on bootup if the bcm43xx card was installed. By bisection, I learned which commit was causing the problem. About that time, the complete fix was discussed on the netdev and bcm43xx mailing lists. I was a little perturbed that only part of the fix was accepted into 2.6.19-gitX.

The full fix was pushed to John Linville on Dec. 10, who pushed it on to Jeff Garzik on Dec. 11. I have not yet seen any message sending it on to Andrew Morton or Linus. A bug fix will always be accepted, particularly one that only changes 2 lines - it is only a new feature that will no longer be accepted once the -rc1 stage is reached.

If this message doesn't do the trick and it isn't included by -rc2, I'll ping Jeff to see what happened. Changes always take longer than one likes, but one needs to be careful.

Larry
Re: [PATCH 12/14] Spidernet Avoid possible RX chain corruption
On Thu, 2006-12-14 at 11:15 -0600, Linas Vepstas wrote:
> On Thu, Dec 14, 2006 at 11:22:43AM +1100, Michael Ellerman wrote:
> > >  	spider_net_refill_rx_chain(card);
> > > -	spider_net_enable_rxchtails(card);
> > >  	spider_net_enable_rxdmac(card);
> > >  	return 0;
> >
> > Didn't you just add that line?
>
> Dagnabbit. The earlier patch was moving around existing code.
> Or, more precisely, trying to maintain the general function
> of the old code even while moving things around.
>
> Later on, when I started looking at what the danged function
> actually did, and the context it was in, I realized that it
> was a bad idea to call the thing. So then I removed it. :-/
>
> How should I handle this procedurally? Resend the patch sequence?
> Let it slide?

If it was my code I'd redo the series, it's confusing and it's going to look confused in the git history IMHO.

Currently the driver calls spider_net_enable_rxchtails() from spider_net_enable_card() and spider_net_handle_rxram_full().

Your patch 3/14 removes spider_net_handle_rxram_full() entirely, leaving spider_net_enable_card() as the only caller of spider_net_enable_rxchtails().

Patch 10/14 adds a call to spider_net_enable_rxchtails() in spider_net_alloc_rx_skbs(), and nothing else (except comment changes).

Patch 12/14 removes the call to spider_net_enable_rxchtails() in spider_net_alloc_rx_skbs(), and nothing else.

So as far as I can tell you should just drop 10/14 and 12/14.

My worry is that amongst all that rearranging of code, it's not clear what the semantic change is. Admittedly I don't know the driver that well, but that's kind of the point - if you and Jim get moved onto a new project, someone needs to be able to pick up the driver and maintain it.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
Fw: 2.6.20-rc1 sky2 problems (regression?)
Begin forwarded message:

Date: Thu, 14 Dec 2006 12:47:05 -0800
From: Alex Romosan [EMAIL PROTECTED]
To: linux-kernel@vger.kernel.org
Subject: 2.6.20-rc1 sky2 problems (regression?)

under heavy network load the sky2 driver (compiled in the kernel) locks up and the only way i can get the network back is to reboot the machine (bringing the network down and back up again doesn't help). this happens on an amd64 machine (athlon 3500+ processor) and the card in question is a Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15) (from lspci). this is what i see in the syslog:

kernel: sky2 eth0: rx error, status 0x414a414a length 0
kernel: eth0: hw csum failure.
kernel:
kernel: Call Trace:
kernel:  <IRQ> [8044681c] __skb_checksum_complete+0x4d/0x66
kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
kernel:  [8044905d] netif_receive_skb+0x184/0x20e
kernel:  [803de8e5] sky2_poll+0x68f/0x93c
kernel:  [802219ce] scheduler_tick+0x23/0x2f9
kernel:  [8044a796] net_rx_action+0x61/0xf0
kernel:  [8022a35f] __do_softirq+0x40/0x8a
kernel:  [8020a3cc] call_softirq+0x1c/0x28
kernel:  [8020bbf0] do_softirq+0x2c/0x7d
kernel:  [8022a313] irq_exit+0x36/0x42
kernel:  [8020bebe] do_IRQ+0x8c/0x9e
kernel:  [80208710] default_idle+0x0/0x3a
kernel:  [80209bf1] ret_from_intr+0x0/0xa
kernel:  <EOI> [80208736] default_idle+0x26/0x3a
kernel:  [8020878c] cpu_idle+0x42/0x75
kernel:  [805df675] start_kernel+0x1ce/0x1d3
kernel:  [805df140] _sinittext+0x140/0x144
kernel:
kernel: eth0: hw csum failure.
kernel:
kernel: Call Trace:
kernel:  <IRQ> [8044681c] __skb_checksum_complete+0x4d/0x66
kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
kernel:  [8044905d] netif_receive_skb+0x184/0x20e
kernel:  [803de8e5] sky2_poll+0x68f/0x93c
kernel:  [80474647] tcp_delack_timer+0x0/0x1b5
kernel:  [8044a796] net_rx_action+0x61/0xf0
kernel:  [8022a35f] __do_softirq+0x40/0x8a
kernel:  [8020a3cc] call_softirq+0x1c/0x28
kernel:  [8020bbf0] do_softirq+0x2c/0x7d
kernel:  [8022a313] irq_exit+0x36/0x42
kernel:  [8020bebe] do_IRQ+0x8c/0x9e
kernel:  [80209bf1] ret_from_intr+0x0/0xa
kernel:  <EOI> [802a8402] inode2sd+0x104/0x117
kernel:  [802b8cfa] search_by_key+0xa08/0xbfe
kernel:  [802b8475] search_by_key+0x183/0xbfe
kernel:  [80284778] ll_rw_block+0x89/0x9e
kernel:  [802b8475] search_by_key+0x183/0xbfe
kernel:  [80283cf5] __find_get_block_slow+0x101/0x10d
kernel:  [80284053] __find_get_block+0x197/0x1a5
kernel:  [8026800c] inode_get_bytes+0x2a/0x52
kernel:  [802a89f1] reiserfs_update_sd_size+0x7e/0x284
kernel:  [80237700] kthread+0xed/0xfd
kernel:  [802be990] do_journal_end+0x34b/0xbdd
kernel:  [802b1729] reiserfs_dirty_inode+0x56/0x76
kernel:  [80284c19] block_prepare_write+0x1a/0x24
kernel:  [802809b1] __mark_inode_dirty+0x29/0x197
kernel:  [802a8d04] reiserfs_commit_write+0x10d/0x19f
kernel:  [80284c19] block_prepare_write+0x1a/0x24
kernel:  [802484fc] generic_file_buffered_write+0x4ad/0x6c4
kernel:  [80271b3c] __pollwait+0x0/0xe0
kernel:  [8022a006] current_fs_time+0x35/0x3b
kernel:  [80248a8c] __generic_file_aio_write_nolock+0x379/0x3ec
kernel:  [8049baca] unix_dgram_recvmsg+0x1be/0x1d9
kernel:  [804b6516] __mutex_lock_slowpath+0x205/0x210
kernel:  [80248b60] generic_file_aio_write+0x61/0xc1
kernel:  [80248aff] generic_file_aio_write+0x0/0xc1
kernel:  [80264e57] do_sync_readv_writev+0xc0/0x107
kernel:  [802377f7] autoremove_wake_function+0x0/0x2e
kernel:  [80229d16] getnstimeofday+0x10/0x28
kernel:  [80264ced] rw_copy_check_uvector+0x6c/0xdc
kernel:  [802654f7] do_readv_writev+0xb2/0x18b
kernel:  [80265a2c] sys_writev+0x45/0x93
kernel:  [802096de] system_call+0x7e/0x83

and so on. some times i don't get this trace but instead i get:

kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 140 .. 99 report=181 done=181
kernel: sky2 status report lost?
kernel: NETDEV WATCHDOG: eth0: transmit timed out
kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 181 .. 140 report=181 done=181
kernel: sky2 hardware hung? flushing

but the end result is the same, the network card stops responding and i have to reboot the machine. i can reproduce this on a consistent basis so if there
Re: 2.6.20-rc1 sky2 problems (regression?)
Alex Romosan [EMAIL PROTECTED] wrote:

> /** does the HW need to evaluate checksum for TCP or UDP packets?
> if (pMessage->ip_summed == CHECKSUM_HW)
>
> maybe this needs to be replace with CHECKSUM_PARTIAL. the second one
>
> /** TCP checksum offload
> if ((pSKPacket->pMbuf->ip_summed == CHECKSUM_HW) &&
>     (SetOpcodePacketFlag == SK_TRUE)
>
> i wonder if this is supposed to be CHECKSUM_COMPLETE

The rule of thumb is that it's COMPLETE for RX, and PARTIAL for TX.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
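To make the rule of thumb a little more concrete, here is a user-space sketch of the ones' complement arithmetic that both modes rest on. It is a simplified model, not kernel code: ones_sum(), tx_finish_partial() and rx_complete_ok() are hypothetical names, and the split of work (TX: the stack seeds the pseudo-header sum and the device finishes the checksum; RX: the device hands up a raw sum over the packet and the stack folds in the pseudo-header) is my reading of what PARTIAL and COMPLETE mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of big-endian 16-bit words (RFC 1071 style),
 * starting from 'seed' and folded down to 16 bits. */
static uint16_t ones_sum(const uint8_t *data, size_t len, uint16_t seed)
{
	uint64_t sum = seed;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* odd trailing byte is zero-padded */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* TX, PARTIAL-like: the stack supplies the pseudo-header sum as a seed,
 * the "device" sums the segment and returns the final checksum. */
static uint16_t tx_finish_partial(const uint8_t *seg, size_t len,
				  uint16_t pseudo_seed)
{
	return (uint16_t)~ones_sum(seg, len, pseudo_seed);
}

/* RX, COMPLETE-like: the "device" reports a raw sum over the segment,
 * the stack adds the pseudo-header sum and expects all ones. */
static int rx_complete_ok(uint16_t device_sum, uint16_t pseudo_sum)
{
	uint32_t t = (uint32_t)device_sum + pseudo_sum;

	t = (t & 0xffff) + (t >> 16);
	return t == 0xffff;
}
```

With the checksum field zeroed, tx_finish_partial() yields the value to store in that field; on receive, rx_complete_ok() succeeds only if the stored checksum, the data and the pseudo-header sum still add up to 0xffff.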
Re: [PATCH 13/22] e1000: disable TSO when debugging slab
Jeff Garzik [EMAIL PROTECTED] wrote:

>> +#ifdef CONFIG_DEBUG_SLAB
>> +	/* 82544's work arounds do not play nicely with DEBUG SLAB */
>> +	if (adapter->hw.mac_type == e1000_82544)
>> +		netdev->features &= ~NETIF_F_TSO;
>> +#endif
>
> ACK, provided that you greatly enhance the comment to explain -why-,
> not just the desired results.

Actually, CONFIG_DEBUG_SLAB is not the only thing that can break the 82544 work-around; Xen, for example, will also generate packets that break it.

Jesse has a more recent fix that resolves both problems. I've updated his patch to make it smaller.

Note that the only reason we don't see this normally is because the TCP stack starts writing from the end, i.e., it writes the TCP header first then slaps on the IP header, etc. So the end of the TCP header (skb->tail - 1 here) is always aligned correctly.

Had we made the start of the IP header (e.g., IPv6) 8-byte aligned instead, this would happen for normal TCP traffic as well.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 73f3a85..2c6ba42 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3168,6 +3168,16 @@ #ifdef NETIF_F_TSO
 	if (skb->data_len && (hdr_len == (skb->len - skb->data_len))) {
 		switch (adapter->hw.mac_type) {
 			unsigned int pull_size;
+		case e1000_82544:
+			/* Make sure we have room to chop off 4 bytes,
+			 * and that the end alignment will work out to
+			 * this hardware's requirements
+			 * NOTE: this is a TSO only workaround
+			 * if end byte alignment not correct move us
+			 * into the next dword */
+			if ((unsigned long)(skb->tail - 1) & 4)
+				break;
+			/* fall through */
 		case e1000_82571:
 		case e1000_82572:
 		case e1000_82573:
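The alignment test in the new e1000_82544 case is easy to check in isolation. The sketch below is a user-space illustration only; both function names are hypothetical, and the only thing taken from the patch is the expression (tail - 1) & 4 and the fact that a set bit makes the case break out, while a clear bit falls through to the handling shared with the 82571/82572/82573 cases.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the last byte before 'tail' sits in the upper dword
 * (bytes 4..7) of an aligned 8-byte block, i.e. bit 2 of its address
 * is set. This is the test the patch applies to skb->tail - 1. */
static int end_byte_in_upper_dword(uintptr_t tail)
{
	assert(tail != 0);
	return ((tail - 1) & 4) != 0;
}

/* Mirrors the control flow of the patch: break out when the bit is
 * set, otherwise fall through to the shared case labels. */
static int falls_through_to_shared_path(uintptr_t tail)
{
	return !end_byte_in_upper_dword(tail);
}
```

For example, a header ending at address 0x1007 (tail 0x1008) has bit 2 set and is left alone, while one ending at 0x1003 (tail 0x1004) takes the fall-through path.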
Re: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and Reason Code
On Thu, 2006-12-14 at 11:27 +0100, Jiri Benc wrote:
> On Thu, 14 Dec 2006 12:02:04 +0800, Zhu Yi wrote:
> > Signed-off-by: Zhu Yi [EMAIL PROTECTED]
>
> Please Cc: me and John Linville on d80211 patches otherwise your chances
> of review (and inclusion) are much lower.
>
> In addition to comments from Michael (which are all perfectly valid and
> you need to address all of them):
>
> > +struct ieee802_11_ts_info {
>
> Choose a name consistent with the rest of the header (e.g. ieee80211_
> prefix).

OK.

Thanks,
-yi
Re: 2.6.20-rc1 sky2 problems (regression?)
On Fri, 15 Dec 2006 13:24:32 +1100 Herbert Xu [EMAIL PROTECTED] wrote:

> Alex Romosan [EMAIL PROTECTED] wrote:
> >
> > /** does the HW need to evaluate checksum for TCP or UDP packets?
> > if (pMessage->ip_summed == CHECKSUM_HW)
> >
> > maybe this needs to be replace with CHECKSUM_PARTIAL. the second one
> >
> > /** TCP checksum offload
> > if ((pSKPacket->pMbuf->ip_summed == CHECKSUM_HW) &&
> >     (SetOpcodePacketFlag == SK_TRUE)
> >
> > i wonder if this is supposed to be CHECKSUM_COMPLETE
>
> The rule of thumb is that it's COMPLETE for RX, and PARTIAL for TX.
>
> Cheers,

I have a fixed up version of the vendor driver, I'll repackage it tomorrow.
Re: [PATCH 2/6] d80211: create wifi.h to define WIFI OUIs
On Thu, 2006-12-14 at 11:31 +0100, Jiri Benc wrote:
> AFAIK wifi is a trademark and we want to avoid using it. wlan seems to
> be a better alternative for the prefixes.
>
> Also, I don't see a reason for a separate header file here.

WI-FI(r) is a trademark, but wifi and WIFI_XXX are not. I'm OK with putting these into existing headers.

Thanks,
-yi
Re: [PATCH 6/6] d80211: add sysfs interface for QoS functions
On Thu, 2006-12-14 at 12:23 +0100, Jiri Benc wrote:
> So... what about implementing that into cfg80211? :-)
>
> I'm not inclined towards this patch (even if you address Stephen's
> comment).

OK. This is only for my testing (or maybe someone else wants to try the code). I'm not asking to merge it. When all the other code is reviewed and accepted, I will write a cfg80211 interface for it.

Thanks,
-yi
Re: 2.6.20-rc1 sky2 problems (regression?)
Stephen Hemminger [EMAIL PROTECTED] writes:

> I have a fixed up version of the vendor driver, I'll repackage it
> tomorrow.

as per the include file, i ended up replacing all the CHECKSUM_HW with CHECKSUM_PARTIAL since the functions in question had to do with transmit. seems to be working so far without any lockups. we'll see how long this lasts.

--alex--

--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with   |
| automatism and other passive states) to systematize confusion   |
| and thus to help to discredit completely the world of reality.  |
[announce] iproute2 2.6.19-061214
This is an update to the iproute2 command set. It can be downloaded from:
  http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.18-061214.tar.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

For more info on iproute2 see:
  http://linux-net.osdl.org/index.php/Iproute2

The version number includes the kernel version to denote what features are supported. The same source should build on older systems, but obviously the newer kernel features won't be available. As much as possible, this package tries to be source compatible across releases.

Changes from 2.6.18-061002 to 2.6.19-061214:

Boian Bonev:
      Display local route table name correctly in output of:

Hasso Tepper:
      Fixes for tc help commands

jamal:
      Multicast computation off by one
      Update generic netlink header
      Add controller support for new features exposed
      clarify ok and pass
      Fix missing class/flowid oddity
      Mention need for db dev package
      update xfrm async events
      make multicast group to bitmask conversion generic
      update xfrm monitoring to use nl_mgrp

Masahide NAKAMURA:
      ADDR: Fix print format for lifetimes.
      ADDR: Enable to add IPv6 address with valid/preferred lifetime.
      ADDR: Define 0xU as INFINITY_LIFE_TIME regarding to the kernel.
      TUNNEL: Split common functions to export them.
      TUNNEL: Import ip6tunnel.c.
      TUNNEL: IPv6-over-IPv6 tunnel support.
      XFRM: sub policy support.
      XFRM: Mobile IPv6 route optimization support.
      XFRM: support report message by monitor.
      XFRM: Mobility header support.

Noriaki TAKAMIYA:
      ADDR: Add the 'change' and 'replace' commands to the IPv6 address manipulation context.

Patrick McHardy:
      [IPROUTE]: Add support for routing rule fwmark masks

Stephen Hemminger:
      Man page for ss submitted by Alex Wirt
      Typo in man page
      Trap possible overflow in usec values to netem
      genl Makefile LDFLAGS
      SA and SP in IPSec BEET mode.
      Route metrics decode bug.
      lnstat man page
      Man page for rtmon
      Update to 2.6.19 headers
      Add more includes
      Change to post 2.6.19 sanitized headers
      Eliminate trailing whitespace

Thomas Graf:
      Add support for inverted selectors
      Add rule notification support to ip monitor
[PATCH 2/2] bcm43xx-d80211: Fix for PHYmode API change.
This fixes the PHYmode list API breakage for the bcm43xx-d80211 driver.

Signed-off-by: Michael Buesch [EMAIL PROTECTED]

Index: bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx.h
===================================================================
--- bu3sch-wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx.h	2006-12-13 19:24:25.0 +0100
+++ bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx.h	2006-12-14 17:42:42.0 +0100
@@ -561,6 +561,8 @@ struct bcm43xx_phy {
 	enum bcm43xx_firmware_compat fw;
 	/* The TX header length. This depends on the firmware. */
 	size_t txhdr_size;
+
+	struct ieee80211_hw_mode hwmode;
 };

 /* Data structures for DMA transmission, per 80211 core. */
Index: bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===================================================================
--- bu3sch-wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c	2006-12-13 19:24:25.0 +0100
+++ bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c	2006-12-14 17:57:46.0 +0100
@@ -2892,19 +2892,25 @@ static void bcm43xx_chipset_detach(struc

 static void bcm43xx_free_modes(struct bcm43xx_private *bcm)
 {
-	struct ieee80211_hw *hw = bcm->ieee;
+	struct ssb_core *core;
+	struct bcm43xx_corepriv_80211 *wlpriv;
+	struct bcm43xx_phy *phy;
 	int i;

-	for (i = 0; i < hw->num_modes; i++) {
-		kfree(hw->modes[i].channels);
-		kfree(hw->modes[i].rates);
+	for (i = 0; i < bcm->nr_80211_available; i++) {
+		core = bcm->wlcores[i];
+		wlpriv = core->priv;
+		phy = &wlpriv->phy;
+
+		kfree(phy->hwmode.channels);
+		phy->hwmode.channels = NULL;
+		kfree(phy->hwmode.rates);
+		phy->hwmode.rates = NULL;
 	}
-	kfree(hw->modes);
-	hw->modes = NULL;
-	hw->num_modes = 0;
 }

-static int bcm43xx_append_mode(struct ieee80211_hw *hw,
+static int bcm43xx_append_mode(struct bcm43xx_private *bcm,
+			       struct bcm43xx_phy *phy,
 			       int mode_id,
 			       int nr_channels,
 			       const struct ieee80211_channel *channels,
@@ -2913,10 +2919,10 @@ static int bcm43xx_append_mode(struct ie
 			       int nr_rates2,
 			       const struct ieee80211_rate *rates2)
 {
-	struct ieee80211_hw_modes *mode;
+	struct ieee80211_hw_mode *mode;
 	int err = -ENOMEM;

-	mode = &(hw->modes[hw->num_modes]);
+	mode = &phy->hwmode;

 	mode->mode = mode_id;
 	mode->num_channels = nr_channels;
@@ -2937,11 +2943,14 @@ static int bcm43xx_append_mode(struct ie
 		       sizeof(*rates2) * nr_rates2);
 	}

-	hw->num_modes++;
-	err = 0;
+	err = ieee80211_register_hwmode(bcm->ieee, mode);
+	if (err)
+		goto err_free_rates;
 out:
 	return err;

+err_free_rates:
+	kfree(mode->rates);
 err_free_channels:
 	kfree(mode->channels);
 	goto out;
@@ -2950,17 +2959,9 @@ err_free_channels:
 static int bcm43xx_setup_modes(struct bcm43xx_private *bcm)
 {
 	int err = -ENOMEM;
-	struct ieee80211_hw *hw = bcm->ieee;
 	struct ssb_core *core;
 	struct bcm43xx_corepriv_80211 *wlpriv;
-	int i, nr_modes;
-
-	nr_modes = bcm->nr_80211_available;
-	hw->modes = kzalloc(sizeof(*(hw->modes)) * nr_modes,
-			    GFP_KERNEL);
-	if (!hw->modes)
-		goto out;
-	hw->num_modes = 0;
+	int i;

 	for (i = 0; i < bcm->nr_80211_available; i++) {
 		core = bcm->wlcores[i];
@@ -2968,7 +2969,7 @@ static int bcm43xx_setup_modes(struct bc

 		switch (wlpriv->phy.type) {
 		case BCM43xx_PHYTYPE_A:
-			err = bcm43xx_append_mode(bcm->ieee, MODE_IEEE80211A,
+			err = bcm43xx_append_mode(bcm, &wlpriv->phy, MODE_IEEE80211A,
 						  ARRAY_SIZE(bcm43xx_a_chantable),
 						  bcm43xx_a_chantable,
 						  ARRAY_SIZE(bcm43xx_ofdm_ratetable),
@@ -2976,7 +2977,7 @@ static int bcm43xx_setup_modes(struct bc
 						  0, NULL);
 			break;
 		case BCM43xx_PHYTYPE_B:
-			err = bcm43xx_append_mode(bcm->ieee, MODE_IEEE80211B,
+			err = bcm43xx_append_mode(bcm, &wlpriv->phy, MODE_IEEE80211B,
 						  ARRAY_SIZE(bcm43xx_bg_chantable),
 						  bcm43xx_bg_chantable,
 						  ARRAY_SIZE(bcm43xx_cck_ratetable),
@@ -2984,7 +2985,7 @@
[PATCH 1/2] d80211: Turn PHYmode list from an array into a linked list
This turns the PHY-modes list into a linked list. The advantage is that
drivers can add modes dynamically as they probe them, and don't have to
settle on a given array size at the beginning of probing.

Signed-off-by: Michael Buesch [EMAIL PROTECTED]

--

Note that I will also send fixup patches for all other d80211 drivers
if no complaints are raised against this patch.

Index: bu3sch-wireless-dev/include/net/d80211.h
===================================================================
--- bu3sch-wireless-dev.orig/include/net/d80211.h	2006-12-05 18:09:34.0 +0100
+++ bu3sch-wireless-dev/include/net/d80211.h	2006-12-13 19:40:05.0 +0100
@@ -76,12 +76,14 @@ struct ieee80211_rate {
 		       * optimizing channel utilization estimates */
 };
 
-struct ieee80211_hw_modes {
-	int mode;
-	int num_channels;
-	struct ieee80211_channel *channels;
-	int num_rates;
-        struct ieee80211_rate *rates;
+struct ieee80211_hw_mode {
+	int mode; /* MODE_IEEE80211... */
+	int num_channels; /* Number of channels (below) */
+	struct ieee80211_channel *channels; /* Array of supported channels */
+	int num_rates; /* Number of rates (below) */
+	struct ieee80211_rate *rates; /* Array of supported rates */
+
+	struct list_head list; /* Internal, don't touch */
 };
 
 struct ieee80211_tx_queue_params {
@@ -420,9 +422,7 @@ typedef enum {
 	SET_KEY, DISABLE_KEY, REMOVE_ALL_KEYS,
 } set_key_cmd;
 
-/* This is driver-visible part of the per-hw state the stack keeps.
- * If you change something in here, call ieee80211_update_hw() to
- * notify the stack about the change. */
+/* This is driver-visible part of the per-hw state the stack keeps. */
 struct ieee80211_hw {
 	/* these are assigned by d80211, don't write */
 	int index;
@@ -512,9 +512,6 @@ struct ieee80211_hw {
 	/* This is maximum value for rssi reported by this device */
 	int maxssi;
 
-	int num_modes;
-	struct ieee80211_hw_modes *modes;
-
 	/* Number of available hardware TX queues for data packets.
 	 * WMM requires at least four queues. */
 	int queues;
@@ -750,9 +747,9 @@ static inline char *ieee80211_get_rx_led
 #endif
 }
 
-/* Call this function if you changed the hardware description after
- * ieee80211_register_hw */
-int ieee80211_update_hw(struct ieee80211_hw *hw);
+/* Register a new hardware PHYMODE capability to the stack. */
+int ieee80211_register_hwmode(struct ieee80211_hw *hw,
+			      struct ieee80211_hw_mode *mode);
 
 /* Unregister a hardware device. This function instructs 802.11 code to free
  * allocated resources and unregister netdevices from the kernel. */
Index: bu3sch-wireless-dev/net/d80211/ieee80211.c
===================================================================
--- bu3sch-wireless-dev.orig/net/d80211/ieee80211.c	2006-12-05 18:09:35.0 +0100
+++ bu3sch-wireless-dev/net/d80211/ieee80211.c	2006-12-13 19:40:05.0 +0100
@@ -1915,7 +1915,8 @@ int ieee80211_if_config_beacon(struct ne
 
 int ieee80211_hw_config(struct ieee80211_local *local)
 {
-	int i, ret = 0;
+	struct ieee80211_hw_mode *mode;
+	int ret = 0;
 
 #ifdef CONFIG_D80211_VERBOSE_DEBUG
 	printk(KERN_DEBUG "HW CONFIG: channel=%d freq=%d "
@@ -1926,12 +1927,10 @@ int ieee80211_hw_config(struct ieee80211
 	if (local->ops->config)
 		ret = local->ops->config(local_to_hw(local), &local->hw.conf);
 
-	for (i = 0; i < local->hw.num_modes; i++) {
-		struct ieee80211_hw_modes *mode = &local->hw.modes[i];
+	list_for_each_entry(mode, &local->modes_list, list) {
 		if (mode->mode == local->hw.conf.phymode) {
-			if (local->curr_rates != mode->rates) {
+			if (local->curr_rates != mode->rates)
 				rate_control_clear(local);
-			}
 			local->curr_rates = mode->rates;
 			local->num_curr_rates = mode->num_rates;
 			ieee80211_prepare_rates(local);
@@ -2511,10 +2510,10 @@ ieee80211_rx_h_data(struct ieee80211_txr
 static struct ieee80211_rate *
 ieee80211_get_rate(struct ieee80211_local *local, int phymode, int hw_rate)
 {
-	int m, r;
+	struct ieee80211_hw_mode *mode;
+	int r;
 
-	for (m = 0; m < local->hw.num_modes; m++) {
-		struct ieee80211_hw_modes *mode = &local->hw.modes[m];
+	list_for_each_entry(mode, &local->modes_list, list) {
 		if (mode->mode != phymode)
 			continue;
 		for (r = 0; r < mode->num_rates; r++) {
@@ -4351,24 +4350,6 @@ void ieee80211_if_mgmt_setup(struct net_
 	dev->destructor = ieee80211_if_free;
 }
 
-static void ieee80211_precalc_modes(struct ieee80211_local *local)
-{
-	struct ieee80211_hw_modes *mode;
-	struct ieee80211_rate *rate;
-	struct ieee80211_hw *hw = &local->hw;
-
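
The idea behind the patch above (modes are appended to a list as the driver probes them, then looked up by PHY mode) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct list_node`, `struct hw`, `struct hw_mode`, `register_hwmode()` and `find_hwmode()` are hypothetical stand-ins, not the d80211 API; the real patch uses `struct list_head` and `list_for_each_entry()` from the kernel.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head
 * (assumption: userspace illustration only). */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical, reduced version of a hardware mode entry. */
struct hw_mode {
	int mode;              /* PHY mode identifier */
	int num_rates;         /* number of supported rates */
	struct list_node list; /* linkage, owned by the stack */
};

/* Hypothetical per-device state holding the list head. */
struct hw {
	struct list_node modes_list;
};

/* An empty circular list points at itself. */
static void hw_init(struct hw *hw)
{
	hw->modes_list.next = &hw->modes_list;
	hw->modes_list.prev = &hw->modes_list;
}

/* Append a mode at the tail; no array size has to be known up front. */
static int register_hwmode(struct hw *hw, struct hw_mode *mode)
{
	struct list_node *head = &hw->modes_list;

	mode->list.prev = head->prev;
	mode->list.next = head;
	head->prev->next = &mode->list;
	head->prev = &mode->list;
	return 0;
}

/* Recover the containing hw_mode from its embedded list node. */
#define mode_of(node) \
	((struct hw_mode *)((char *)(node) - offsetof(struct hw_mode, list)))

/* Walk the list and return the entry matching phymode, or NULL. */
static struct hw_mode *find_hwmode(struct hw *hw, int phymode)
{
	struct list_node *n;

	for (n = hw->modes_list.next; n != &hw->modes_list; n = n->next)
		if (mode_of(n)->mode == phymode)
			return mode_of(n);
	return NULL;
}
```

A driver in this sketch would call `register_hwmode()` once per mode it discovers, in any order and at any point during probing.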
RE: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and Reason Code
None of this should be in the kernel. See wpa_supplicant.

Simon

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:02 PM
To: netdev@vger.kernel.org
Subject: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and Reason Code

Signed-off-by: Zhu Yi [EMAIL PROTECTED]
---
 include/net/d80211_mgmt.h | 148 +
 1 files changed, 148 insertions(+), 0 deletions(-)

d83f6236e756f5f0bb1484d99188f06704de
diff --git a/include/net/d80211_mgmt.h b/include/net/d80211_mgmt.h
index 87141d4..450c0a2 100644
--- a/include/net/d80211_mgmt.h
+++ b/include/net/d80211_mgmt.h
@@ -14,6 +14,39 @@
 
 #include <linux/types.h>
 
+struct ieee802_11_ts_info {
+	__le16 traffic_type:1;
+	__le16 tsid:4;
+	__le16 direction:2;
+	__le16 access_policy:2;
+	__le16 aggregation:1;
+	__le16 apsd:1;
+	__le16 up:3;
+	__le16 ack_policy:2;
+	u8 schedule:1;
+	u8 reserved:7;
+} __attribute__ ((packed));
+
+struct ieee802_11_elem_tspec {
+	struct ieee802_11_ts_info ts_info;
+	__le16 nominal_msdu_size;
+	__le16 max_msdu_size;
+	__le32 min_service_interval;
+	__le32 max_service_interval;
+	__le32 inactivity_interval;
+	__le32 suspension_interval;
+	__le32 service_start_time;
+	__le32 min_data_rate;
+	__le32 mean_data_rate;
+	__le32 peak_data_rate;
+	__le32 burst_size;
+	__le32 delay_bound;
+	__le32 min_phy_rate;
+	__le16 surplus_band_allow;
+	__le16 medium_time;
+} __attribute__ ((packed));
+
+
 struct ieee80211_mgmt {
 	__le16 frame_control;
 	__le16 duration;
@@ -81,9 +114,51 @@ struct ieee80211_mgmt {
 				struct {
 					u8 action_code;
 					u8 dialog_token;
+					u8 variable[0];
+				} __attribute__ ((packed)) addts_req;
+				struct {
+					u8 action_code;
+					u8 dialog_token;
+					__le16 status_code;
+					u8 variable[0];
+				} __attribute__ ((packed)) addts_resp;
+				struct {
+					u8 action_code;
+					struct ieee802_11_ts_info ts_info;
+					__le16 reason_code;
+				} __attribute__ ((packed)) delts;
+				struct {
+					u8 action_code;
+					u8 dialog_token;
 					u8 status_code;
 					u8 variable[0];
 				} __attribute__ ((packed)) wme_action;
+				struct {
+					u8 action_code;
+					u8 dest[6];
+					u8 src[6];
+					__le16 capab_info;
+					__le16 timeout;
+					/* Followed by Supported Rates and
+					 * Extended Supported Rates */
+					u8 variable[0];
+				} __attribute__ ((packed)) dls_req;
+				struct {
+					u8 action_code;
+					__le16 status_code;
+					u8 dest[6];
+					u8 src[6];
+					/* Followed by Capability Information,
+					 * Supported Rates and Extended
+					 * Supported Rates */
+					u8 variable[0];
+				} __attribute__ ((packed)) dls_resp;
+				struct {
+					u8 action_code;
+					u8 dest[6];
+					u8 src[6];
+					__le16 reason_code;
+				} __attribute__ ((packed)) dls_teardown;
 				struct{
 					u8 action_code;
 					u8 element_id;
@@ -150,6 +225,18 @@ enum ieee80211_statuscode {
 	WLAN_STATUS_UNSUPP_RSN_VERSION = 44,
 	WLAN_STATUS_INVALID_RSN_IE_CAP = 45,
 	WLAN_STATUS_CIPHER_SUITE_REJECTED = 46,
+	/* 802.11e */
+	WLAN_STATUS_UNSPECIFIED_QOS = 32,
+
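
The `ieee802_11_ts_info` struct in the quoted patch relies on C bitfields, whose in-memory layout is left to the compiler. As a hedged alternative sketch, the same fields can be packed with explicit shifts. The field order and widths are taken from the struct above; the assumption (not confirmed by the patch) is that bits are allocated least-significant first into a 3-octet little-endian field. `struct ts_info_fields`, `ts_info_pack()` and `ts_info_tsid()` are hypothetical names.

```c
#include <stdint.h>

/* One plain integer per TS Info field (hypothetical helper type,
 * not part of the patch). Widths follow the struct in the patch. */
struct ts_info_fields {
	unsigned traffic_type;  /* 1 bit  */
	unsigned tsid;          /* 4 bits */
	unsigned direction;     /* 2 bits */
	unsigned access_policy; /* 2 bits */
	unsigned aggregation;   /* 1 bit  */
	unsigned apsd;          /* 1 bit  */
	unsigned up;            /* 3 bits */
	unsigned ack_policy;    /* 2 bits */
	unsigned schedule;      /* 1 bit  */
};

/* Pack the fields into 3 octets, least significant bit first
 * (assumed bit order). Reserved bits are left at zero. */
static void ts_info_pack(const struct ts_info_fields *f, uint8_t out[3])
{
	uint32_t v = 0;

	v |= (uint32_t)(f->traffic_type  & 0x1) << 0;
	v |= (uint32_t)(f->tsid          & 0xf) << 1;
	v |= (uint32_t)(f->direction     & 0x3) << 5;
	v |= (uint32_t)(f->access_policy & 0x3) << 7;
	v |= (uint32_t)(f->aggregation   & 0x1) << 9;
	v |= (uint32_t)(f->apsd          & 0x1) << 10;
	v |= (uint32_t)(f->up            & 0x7) << 11;
	v |= (uint32_t)(f->ack_policy    & 0x3) << 14;
	v |= (uint32_t)(f->schedule      & 0x1) << 16;

	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
}

/* Extract the TSID again from the packed octets. */
static unsigned ts_info_tsid(const uint8_t in[3])
{
	return (in[0] >> 1) & 0xf;
}
```

Explicit shifts keep the on-air layout independent of the compiler and of host endianness, at the cost of a little more code than the bitfield version.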
RE: [PATCH 4/6] d80211: add IEEE802.11e/WMM Traffic Stream (TS) Management support
This policing of media time must be done in the qdisc, and made to work per DA (Destination Address), so that AP mode works as well as STA mode. In addition, the count of used time should be updated AFTER the frame has been sent, not before, because the number of retries cannot be taken into account beforehand. These retries MUST be counted. The API to the qdisc should be via TC, not cfg80211 or anything else.

Simon

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:03 PM
To: netdev@vger.kernel.org
Subject: [PATCH 4/6] d80211: add IEEE802.11e/WMM Traffic Stream (TS) Management support

The d80211 stack now maintains a sta_ts_data structure for every TSID and direction combination of all the Traffic Streams. For Access Categories (AC) that have admission control enabled, the STA can itself request a traffic stream. The stack also maintains two variables that record the admitted time and the used time for each TS. In every dot11EDCAAveragingPeriod, a timer is used to track how much time (in usec) has been used versus the admitted time. If the used time is less than the admitted time in the current dot11EDCAAveragingPeriod, the STA continues to use the admitted time in the next period. Otherwise the stack reduces the admitted time until the TS has been throttled. Finally, both the AP and the STA are able to delete the TS by sending a DELTS MLME.

Signed-off-by: Zhu Yi [EMAIL PROTECTED]
---
 net/d80211/ieee80211.c       |4
 net/d80211/ieee80211_i.h     | 49 -
 net/d80211/ieee80211_iface.c |5 +
 net/d80211/ieee80211_sta.c   | 403 ++
 net/d80211/wme.c             | 34 +++-
 5 files changed, 480 insertions(+), 15 deletions(-)

d4a326b8493fb465480a68696315c05558c03b2c
diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 6e10db5..4eba18f 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -4599,6 +4599,10 @@ int ieee80211_register_hw(struct ieee802
 		goto fail_wep;
 	}
 
+	/* Initialize QoS Params */
+	local->dot11EDCAAveragingPeriod = 5;
+	local->MPDUExchangeTime = 0;
+
 	/* TODO: add rtnl locking around device creation and qdisc install */
 	ieee80211_install_qdisc(local->mdev);
 
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index ef303da..e8929d3 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -56,6 +56,10 @@ struct ieee80211_local;
  * increased memory use (about 2 kB of RAM per entry). */
 #define IEEE80211_FRAGMENT_MAX 4
 
+/* Minimum and Maximum TSID used by EDCA. HCCA uses 0~7; EDCA uses 8~15 */
+#define EDCA_TSID_MIN 8
+#define EDCA_TSID_MAX 15
+
 struct ieee80211_fragment_entry {
 	unsigned long first_frag_time;
 	unsigned int seq;
@@ -241,6 +245,7 @@ struct ieee80211_if_sta {
 		IEEE80211_IBSS_SEARCH, IEEE80211_IBSS_JOINED
 	} state;
 	struct work_struct work;
+	struct timer_list admit_timer; /* Recompute EDCA admitted time */
 	u8 bssid[ETH_ALEN], prev_bssid[ETH_ALEN];
 	u8 ssid[IEEE80211_MAX_SSID_LEN];
 	size_t ssid_len;
@@ -328,6 +333,19 @@ struct ieee80211_sub_if_data {
 
 #define IEEE80211_DEV_TO_SUB_IF(dev) netdev_priv(dev)
 
+struct sta_ts_data {
+	enum {
+		TS_STATUS_UNUSED	= 0,
+		TS_STATUS_ACTIVE	= 1,
+		TS_STATUS_INACTIVE	= 2,
+		TS_STATUS_THROTTLING	= 3,
+	} status;
+	u8 dialog_token;
+	u8 up;
+	u32 admitted_time_usec;
+	u32 used_time_usec;
+};
+
 struct ieee80211_local {
 	/* embed the driver visible part.
 	 * don't cast (use the static inlines below), but we keep
@@ -449,18 +467,19 @@ struct ieee80211_local {
 #ifdef CONFIG_HOSTAPD_WPA_TESTING
 	u32 wpa_trigger;
 #endif /* CONFIG_HOSTAPD_WPA_TESTING */
-        /* SNMP counters */
-        /* dot11CountersTable */
-        u32 dot11TransmittedFragmentCount;
-        u32 dot11MulticastTransmittedFrameCount;
-        u32 dot11FailedCount;
+	/* SNMP counters */
+	/* dot11CountersTable */
+	u32 dot11TransmittedFragmentCount;
+	u32 dot11MulticastTransmittedFrameCount;
+	u32 dot11FailedCount;
 	u32 dot11RetryCount;
 	u32 dot11MultipleRetryCount;
 	u32 dot11FrameDuplicateCount;
-        u32 dot11ReceivedFragmentCount;
-        u32 dot11MulticastReceivedFrameCount;
-        u32 dot11TransmittedFrameCount;
-        u32 dot11WEPUndecryptableCount;
+	u32 dot11ReceivedFragmentCount;
+	u32 dot11MulticastReceivedFrameCount;
+	u32 dot11TransmittedFrameCount;
+	u32 dot11WEPUndecryptableCount;
+	u32 dot11EDCAAveragingPeriod;
 
 #ifdef CONFIG_D80211_LEDS
 	int tx_led_counter, rx_led_counter;
@@ -533,6 +552,17 @@ struct ieee80211_local {
 				* (1 << MODE_*) */
 
 	int user_space_mlme;
+
+	u32
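
The accounting discussed in this message (admitted time versus used time per averaging period, with airtime charged only after transmission so that retries are included, as the reply requests) can be sketched as follows. This is a simplified model under stated assumptions, not the patch's algorithm: the status names mirror the `sta_ts_data` enum, but `struct ts_account`, `ts_charge()`, `ts_period_end()`, `ts_may_send()` and the "throttle for one period when over budget" policy are hypothetical.

```c
#include <stdint.h>

/* Status values mirroring the enum in the quoted patch. */
enum ts_status {
	TS_STATUS_UNUSED     = 0,
	TS_STATUS_ACTIVE     = 1,
	TS_STATUS_INACTIVE   = 2,
	TS_STATUS_THROTTLING = 3
};

/* Hypothetical per-traffic-stream accounting state. */
struct ts_account {
	enum ts_status status;
	uint32_t admitted_time_usec; /* budget per averaging period */
	uint32_t used_time_usec;     /* consumed in the current period */
};

/* Charge airtime once the frame has left the device, counting the
 * first attempt plus every retry. */
static void ts_charge(struct ts_account *ts, uint32_t frame_usec,
		      unsigned retries)
{
	ts->used_time_usec += frame_usec * (retries + 1);
}

/* Called once per averaging period: stay active while under budget,
 * otherwise throttle for the coming period (assumed policy). */
static void ts_period_end(struct ts_account *ts)
{
	if (ts->status == TS_STATUS_UNUSED)
		return;
	if (ts->used_time_usec < ts->admitted_time_usec)
		ts->status = TS_STATUS_ACTIVE;
	else
		ts->status = TS_STATUS_THROTTLING;
	ts->used_time_usec = 0;
}

/* A throttled or unused stream must not transmit in this period. */
static int ts_may_send(const struct ts_account *ts)
{
	return ts->status == TS_STATUS_ACTIVE;
}
```

Charging after transmission means a frame that needed two retries costs three times its nominal airtime, which a pre-transmission estimate would miss.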
RE: [PATCH 6/6] d80211: add sysfs interface for QoS functions
This is all part of the client MLME. It would be much better to add this functionality to wpa_supplicant rather than to the kernel. Nothing here needs to be in the kernel for any reason. The client MLME functions that are in the kernel were put there for test and debugging convenience only; the right client MLME to use is the one in wpa_supplicant. Especially with all the new and very complex MLME functions being added to 802.11, we do not want this huge amount of code in the kernel when it does not need to be there. The client MLME in the kernel should be #ifdefed out and made a kernel option, off by default.

Simon

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:03 PM
To: netdev@vger.kernel.org
Subject: [PATCH 6/6] d80211: add sysfs interface for QoS functions

The sysfs interface here is only a proof of concept. It provides a way for userspace applications to use the advanced QoS features supported by the d80211 stack. The final solution should be switched to cfg80211.

Signed-off-by: Zhu Yi [EMAIL PROTECTED]
---
 net/d80211/ieee80211_i.h     | 13 ++
 net/d80211/ieee80211_sysfs.c | 245 ++
 2 files changed, 258 insertions(+), 0 deletions(-)

83d49f70af1f38c152d8bd3abd69756ec087622e
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index d09f65e..7904033 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -20,6 +20,7 @@
 #include <linux/workqueue.h>
 #include <linux/types.h>
 #include <linux/spinlock.h>
+#include <net/d80211_mgmt.h>
 #include "ieee80211_key.h"
 #include "sta_info.h"
 
@@ -329,6 +330,7 @@ struct ieee80211_sub_if_data {
 	int channel_use_raw;
 
 	struct attribute_group *sysfs_group;
+	struct attribute_group *sysfs_qos_group;
 };
 
 #define IEEE80211_DEV_TO_SUB_IF(dev) netdev_priv(dev)
@@ -702,6 +704,17 @@ struct sta_info * ieee80211_ibss_add_sta
 					 u8 *addr);
 int ieee80211_sta_deauthenticate(struct net_device *dev, u16 reason);
 int ieee80211_sta_disassociate(struct net_device *dev, u16 reason);
+void ieee80211_send_addts(struct net_device *dev,
+			  struct ieee802_11_elem_tspec *tspec);
+void wmm_send_addts(struct net_device *dev,
+		    struct ieee802_11_elem_tspec *tspec);
+void ieee80211_send_delts(struct net_device *dev, u8 tsid, u8 direction,
+			  u32 medium_time);
+void wmm_send_delts(struct net_device *dev, u8 tsid, u8 direction,
+		    u32 medium_time);
+void ieee80211_send_dls_req(struct net_device *dev, struct dls_info *dls);
+void ieee80211_send_dls_teardown(struct net_device *dev, u8 *mac, u16 reason);
+void dls_info_add(struct ieee80211_local *local, struct dls_info *dls);
 void dls_info_stop(struct ieee80211_local *local);
 int dls_link_status(struct ieee80211_local *local, u8 *addr);
 
diff --git a/net/d80211/ieee80211_sysfs.c b/net/d80211/ieee80211_sysfs.c
index 6a60077..31dc1f4 100644
--- a/net/d80211/ieee80211_sysfs.c
+++ b/net/d80211/ieee80211_sysfs.c
@@ -13,6 +13,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <net/d80211.h>
+#include <net/d80211_mgmt.h>
 #include "ieee80211_i.h"
 #include "ieee80211_rate.h"
 
@@ -21,6 +22,15 @@
 #define to_net_dev(class) \
 	container_of(class, struct net_device, class_dev)
 
+/* For sysfs and debug only */
+static struct ieee802_11_elem_tspec _tspec;
+static u8 _dls_mac[ETH_ALEN];
+
+#define TSID _tspec.ts_info.tsid
+#define TSDIR _tspec.ts_info.direction
+#define TSUP _tspec.ts_info.up
+
+
 static inline int rtnl_lock_local(struct ieee80211_local *local)
 {
 	rtnl_lock();
@@ -657,6 +667,230 @@ static struct class ieee80211_class = {
 #endif
 };
 
+
+/* QoS sysfs entries */
+static ssize_t show_ts_info(struct class_device *dev, char *buf)
+{
+	/* TSID, Direction, UP */
+	return sprintf(buf, "%u %u %u\n", TSID, TSDIR, TSUP);
+}
+
+static ssize_t store_ts_info(struct class_device *dev, const char *buf,
+			     size_t len)
+{
+	unsigned int id, index, up;
+
+	if (sscanf(buf, "%u, %u, %u", &id, &index, &up) != 3) {
+		printk(KERN_ERR "%s: sscanf error\n", __FUNCTION__);
+		return -EINVAL;
+	}
+	if (id < 8 || id > 15) {
+		printk(KERN_ERR "invalid tsid %d\n", id);
+		return -EINVAL;
+	}
+	if ((index != 0) && (index != 1) && (index != 3)) {
+		printk(KERN_ERR "invalid direction %d\n", index);
+		return -EINVAL;
+	}
+	if (up < 0 || up > 7) {
+		printk(KERN_ERR "invalid UP %d\n", up);
+		return -EINVAL;
+	}
+	TSID = id;
+	TSDIR = index;
+	TSUP = up;
+	return len;
+}
+
+static CLASS_DEVICE_ATTR(ts_info, S_IWUSR|S_IRUGO, show_ts_info,
+			 store_ts_info);
+
+static ssize_t show_tspec(struct class_device *dev, char *buf)
+{
+
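
The input checks that `store_ts_info` performs in the quoted patch can be exercised outside the kernel with a small userspace sketch. The ranges (TSID 8..15, direction 0, 1 or 3, user priority at most 7) and the comma-separated format are read from the patch text as it survives in this archive; `parse_ts_info()` is a hypothetical helper name, and returning -1 instead of -EINVAL is a simplification.

```c
#include <stdio.h>

/* Parse "tsid, direction, up" and validate each value.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_ts_info(const char *buf, unsigned *tsid,
			 unsigned *direction, unsigned *up)
{
	unsigned id, dir, prio;

	/* All three numbers must be present, separated by commas. */
	if (sscanf(buf, "%u, %u, %u", &id, &dir, &prio) != 3)
		return -1;
	/* EDCA traffic streams use TSID 8..15. */
	if (id < 8 || id > 15)
		return -1;
	/* Only directions 0, 1 and 3 are accepted by the patch. */
	if (dir != 0 && dir != 1 && dir != 3)
		return -1;
	/* User priority is a 3-bit value. */
	if (prio > 7)
		return -1;

	*tsid = id;
	*direction = dir;
	*up = prio;
	return 0;
}
```

Note that the show side of the patch prints the values separated by spaces while the store side expects commas, so in this sketch a string read back from the attribute would not parse if written unchanged.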
RE: [PATCH 5/6] d80211: add IEEE 802.11e Direct Link Setup (DLS) support
Again, this DLS management frame processing code should not be in the kernel; it should be in wpa_supplicant. Only the frame processing code should be in the kernel.

Simon

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:03 PM
To: netdev@vger.kernel.org
Subject: [PATCH 5/6] d80211: add IEEE 802.11e Direct Link Setup (DLS) support

Struct dls_info is declared to store the peer's MAC address, timeout value, supported rates and other information for the DLS link. The stack also maintains a hash table that stores the dls_info for all the DLS peers of the local interface. The peer's MAC address is used as the hash table key. The DLS MLME handling functions for DLS Setup Request, DLS Response and DLS Teardown are added. During packet TX, the stack compares the destination MAC address against the dls_info hash table to see whether a Direct Link should be used for the packet transmission. If so, it modifies the IEEE 802.11 MAC header DA, SA and BSS fields to reflect the direct link setup.

Signed-off-by: Zhu Yi [EMAIL PROTECTED]
---
 net/d80211/ieee80211.c     | 19 +-
 net/d80211/ieee80211_i.h   | 17 ++
 net/d80211/ieee80211_sta.c | 450
 3 files changed, 481 insertions(+), 5 deletions(-)

077c391798f72f356c0a5cb50f307b50143a5dcc
diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 4eba18f..b25d00e 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -1472,11 +1472,18 @@ static int ieee80211_subif_start_xmit(st
 		memcpy(hdr.addr4, skb->data + ETH_ALEN, ETH_ALEN);
 		hdrlen = 30;
 	} else if (sdata->type == IEEE80211_IF_TYPE_STA) {
-		fc |= IEEE80211_FCTL_TODS;
-		/* BSSID SA DA */
-		memcpy(hdr.addr1, sdata->u.sta.bssid, ETH_ALEN);
-		memcpy(hdr.addr2, skb->data + ETH_ALEN, ETH_ALEN);
-		memcpy(hdr.addr3, skb->data, ETH_ALEN);
+		if (dls_link_status(local, hdr.addr1) == DLS_STATUS_OK) {
+			/* DA SA BSSID */
+			memcpy(hdr.addr1, skb->data, ETH_ALEN);
+			memcpy(hdr.addr2, skb->data + ETH_ALEN, ETH_ALEN);
+			memcpy(hdr.addr3, sdata->u.sta.bssid, ETH_ALEN);
+		} else {
+			fc |= IEEE80211_FCTL_TODS;
+			/* BSSID SA DA */
+			memcpy(hdr.addr1, sdata->u.sta.bssid, ETH_ALEN);
+			memcpy(hdr.addr2, skb->data + ETH_ALEN, ETH_ALEN);
+			memcpy(hdr.addr3, skb->data, ETH_ALEN);
+		}
 		hdrlen = 24;
 	} else if (sdata->type == IEEE80211_IF_TYPE_IBSS) {
 		/* DA SA BSSID */
@@ -4602,6 +4609,7 @@ int ieee80211_register_hw(struct ieee802
 	/* Initialize QoS Params */
 	local->dot11EDCAAveragingPeriod = 5;
 	local->MPDUExchangeTime = 0;
+	spin_lock_init(&local->dls_lock);
 
 	/* TODO: add rtnl locking around device creation and qdisc install */
 	ieee80211_install_qdisc(local->mdev);
@@ -4702,6 +4710,7 @@ void ieee80211_unregister_hw(struct ieee
 
 	ieee80211_rx_bss_list_deinit(local->mdev);
 	ieee80211_clear_tx_pending(local);
+	dls_info_stop(local);
 	sta_info_stop(local);
 	rate_control_deinitialize(local);
 	ieee80211_dev_sysfs_del(local);
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index e8929d3..d09f65e 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -346,6 +346,18 @@ struct sta_ts_data {
 	u32 used_time_usec;
 };
 
+#define DLS_STATUS_OK		0
+#define DLS_STATUS_NOLINK	1
+#define DLS_STATUS_SETUP	2
+struct dls_info {
+	atomic_t refcnt;
+	int status;
+	u8 addr[ETH_ALEN];
+	struct dls_info *hnext; /* next entry in hash table list */
+	u32 timeout;
+	u32 supp_rates;
+};
+
 struct ieee80211_local {
 	/* embed the driver visible part.
 	 * don't cast (use the static inlines below), but we keep
@@ -558,6 +570,9 @@ struct ieee80211_local {
 #define STA_TSDIR_NUM 2
 	/* HCCA: 0~7, EDCA: 8~15 */
 	struct sta_ts_data ts_data[STA_TSID_NUM][STA_TSDIR_NUM];
+
+	struct dls_info *dls_hash[STA_HASH_SIZE];
+	spinlock_t dls_lock;
 };
 
 enum sta_link_direction {
@@ -687,6 +702,8 @@ struct sta_info * ieee80211_ibss_add_sta
 					 u8 *addr);
 int ieee80211_sta_deauthenticate(struct net_device *dev, u16 reason);
 int ieee80211_sta_disassociate(struct net_device *dev, u16 reason);
+void dls_info_stop(struct ieee80211_local *local);
+int dls_link_status(struct ieee80211_local *local, u8 *addr);
 
 /* ieee80211_dev.c */
 int ieee80211_dev_alloc_index(struct ieee80211_local *local);
diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c
index 81b2ded..393a294 100644
--- a/net/d80211/ieee80211_sta.c
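
The TX-path decision described in this patch (look the destination up in a hash table keyed by MAC address, then pick the address order for the 802.11 header) can be sketched in userspace C. The status values and the two address orders come from the quoted diff; everything else is an assumption for illustration: `struct dls_peer`, `struct dls_table`, `struct hdr_addrs`, `dls_add()`, `sta_fill_addrs()`, the bucket count and hashing on the last address byte are hypothetical, and the lookup here is keyed on the destination address of the outgoing frame.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN      6
#define DLS_HASH_SIZE 256 /* assumption: one bucket per last-byte value */

enum { DLS_STATUS_OK = 0, DLS_STATUS_NOLINK = 1, DLS_STATUS_SETUP = 2 };

/* Reduced, hypothetical version of the patch's struct dls_info. */
struct dls_peer {
	int status;
	uint8_t addr[ETH_ALEN];
	struct dls_peer *hnext; /* next entry in the same bucket */
};

struct dls_table {
	struct dls_peer *hash[DLS_HASH_SIZE];
};

/* Hypothetical result type: the three header addresses plus ToDS. */
struct hdr_addrs {
	uint8_t addr1[ETH_ALEN], addr2[ETH_ALEN], addr3[ETH_ALEN];
	int to_ds;
};

static unsigned dls_hash(const uint8_t *addr)
{
	return addr[ETH_ALEN - 1]; /* assumed hash: last address byte */
}

/* Insert a peer at the head of its bucket. */
static void dls_add(struct dls_table *t, struct dls_peer *p)
{
	unsigned h = dls_hash(p->addr);

	p->hnext = t->hash[h];
	t->hash[h] = p;
}

/* Return the link status for addr, or NOLINK if it is unknown. */
static int dls_link_status(const struct dls_table *t, const uint8_t *addr)
{
	const struct dls_peer *p;

	for (p = t->hash[dls_hash(addr)]; p; p = p->hnext)
		if (memcmp(p->addr, addr, ETH_ALEN) == 0)
			return p->status;
	return DLS_STATUS_NOLINK;
}

/* Direct link: DA SA BSSID with ToDS clear. Otherwise send via the
 * AP: BSSID SA DA with ToDS set. */
static void sta_fill_addrs(const struct dls_table *t, const uint8_t *da,
			   const uint8_t *sa, const uint8_t *bssid,
			   struct hdr_addrs *h)
{
	int direct = dls_link_status(t, da) == DLS_STATUS_OK;

	memcpy(h->addr1, direct ? da : bssid, ETH_ALEN);
	memcpy(h->addr2, sa, ETH_ALEN);
	memcpy(h->addr3, direct ? bssid : da, ETH_ALEN);
	h->to_ds = !direct;
}
```

A peer whose link is still in the SETUP state falls through to the via-AP branch in this sketch, since only `DLS_STATUS_OK` selects the direct path.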
RE: d80211-drivers pull request (week-48)
Devicescape does understand that the hardware can do retries, but it adds software retries on top. This allows higher reliability, as well as correct handling of the powersave state machine (the PS bit from a STA is supposed to stop the AP's transmission immediately).

Simon

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Buesch
Sent: Tuesday, December 12, 2006 1:35 AM
To: Daniel Drake
Cc: Michael Wu; John Linville; netdev@vger.kernel.org; Ulrich Kunitz
Subject: Re: d80211-drivers pull request (week-48)

On Tuesday 12 December 2006 02:07, Daniel Drake wrote:
> Michael Wu wrote:
> > zd1211rw-d80211: Use ieee80211_tx_status
>
> I've thought some more about this and I'm not so sure that this is the
> right approach. Can't devicescape be taught that the ZD1211 handles
> retries in hardware and the stack doesn't need to worry about it?
>
> What does devicescape do in response to not getting an ack?

It does ratecontrol based on that.
Basically: No ACK == failed packet. If too many failures, lower the rate.

--
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
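
The rule stated in the quoted reply ("No ACK == failed packet. If too many failures, lower the rate.") can be written down as a very small rate controller. This is a sketch under stated assumptions and not the Devicescape algorithm: the window size, the failure threshold, `struct rate_ctl` and `rc_tx_status()` are all hypothetical.

```c
#define RC_WINDOW   10 /* frames per decision window (hypothetical) */
#define RC_FAIL_PCT 50 /* step down above this failure share (hypothetical) */

/* Hypothetical rate-control state for one peer. */
struct rate_ctl {
	int rate_idx;    /* index into the supported-rates table */
	unsigned tx;     /* frames reported in the current window */
	unsigned failed; /* frames that got no ACK in the window */
};

/* Feed one TX status report; a missing ACK counts as a failure.
 * At the end of each window, lower the rate by one step if too
 * many frames failed, then start a new window. */
static void rc_tx_status(struct rate_ctl *rc, int acked)
{
	rc->tx++;
	if (!acked)
		rc->failed++;
	if (rc->tx < RC_WINDOW)
		return;
	if (rc->failed * 100 > rc->tx * RC_FAIL_PCT && rc->rate_idx > 0)
		rc->rate_idx--;
	rc->tx = 0;
	rc->failed = 0;
}
```

If the hardware retries silently and always reports success, a controller like this never sees a failure and never lowers the rate, which is why accurate TX status reporting matters to the stack.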