[take28 0/8] kevent: Generic event handling mechanism.

2006-12-14 Thread Evgeniy Polyakov

Generic event handling mechanism.

Kevent is a generic subsystem which allows handling event notifications.
It supports both level and edge triggered events. It is similar to
poll/epoll in some cases, but it is more scalable, it is faster and
allows working with essentially any kind of events.

Events are submitted to the kernel through a control syscall and can be read
back through a ring buffer or using the usual syscalls.
A kevent update (i.e. readiness switching) happens directly from the internals
of the appropriate state machine of the underlying subsystem (like
network, filesystem, timer or any other).
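
The read-back path can be pictured with a small userspace model. This is only a sketch and not kevent code: toy_ring, toy_ring_put() and toy_ring_commit() are hypothetical names, and the fixed capacity is an assumption. It shows the general idea of a ring buffer with a producer (kernel side) index and a consumer (user side) index, where the consumer explicitly commits the entries it has processed.

```c
#define TOY_RING_SIZE 4u	/* hypothetical capacity, in events */

struct toy_event {
	unsigned int type;	/* which subsystem produced the event */
	unsigned int id;	/* which object became ready */
};

struct toy_ring {
	unsigned int kidx;	/* next slot the producer writes */
	unsigned int uidx;	/* next slot the consumer reads */
	unsigned int used;	/* entries stored but not yet committed */
	struct toy_event event[TOY_RING_SIZE];
};

/* Producer side: store a ready event; returns 0, or -1 if the ring is full. */
static int toy_ring_put(struct toy_ring *r, struct toy_event ev)
{
	if (r->used == TOY_RING_SIZE)
		return -1;
	r->event[r->kidx] = ev;
	r->kidx = (r->kidx + 1) % TOY_RING_SIZE;
	r->used++;
	return 0;
}

/* Consumer side: release up to num processed entries; returns how many were freed. */
static unsigned int toy_ring_commit(struct toy_ring *r, unsigned int num)
{
	if (num > r->used)
		num = r->used;
	r->uidx = (r->uidx + num) % TOY_RING_SIZE;
	r->used -= num;
	return num;
}
```

In this model a full ring makes the producer fail instead of overwriting entries, so the consumer never loses an event it has not committed yet.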

Homepage:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent

Documentation page:
http://linux-net.osdl.org/index.php/Kevent

Consider for inclusion.

A new benchmark, which can be a hoax though, can be found at 
http://tservice.net.ru/~s0mbre/blog/2006/11/30#2006_11_30
where kevent on amd64 with 1 GB of RAM handles more than 7200 events per 
second at a concurrency of 8000 requests, using the 'ab' benchmark and lighttpd.
Although I thought it should not be published due to possible errors,
I decided to send it for review.

With this release I start a three-day resend timeout: every third day I 
will send either a new version (if something new was requested and agreed to 
be implemented) or a resend with a countdown counter starting from three. 
When the counter hits zero after three resends, I will consider that there is 
no interest in the subsystem and stop sending it.

Thanks for understanding and your time.

Changes from 'take27' patchset:
 * made kevent default yes in non embedded case.
 * added flags to callback structures - currently used to check if kevent
can be requested from kernelspace only (posix timers) or 
userspace (all others)

Changes from 'take26' patchset:
 * made kevent visible in config only in case of embedded setup.
 * added comment about KEVENT_MAX number.
 * spell fix.

Changes from 'take25' patchset:
 * use timespec as timeout parameter.
 * added high-resolution timer to handle absolute timeouts.
 * added flags to waiting and initialization syscalls.
 * kevent_commit() has new_uidx parameter.
 * kevent_wait() has old_uidx parameter, which, if not equal to u->uidx,
results in immediate wakeup (useful for the case when entries
are added asynchronously from kernel (not supported for now)).
 * added interface to mark any event as ready.
 * event POSIX timers support.
 * return -ENOSYS if there is no registered event type.
 * provided file descriptor must be checked for fifo type (spotted by Eric 
Dumazet).
 * signal notifications.
 * documentation update.
 * lighttpd patch updated (the latest benchmarks with lighttpd patch can be 
found in blog).

Changes from 'take24' patchset:
 * new (old (new)) ring buffer implementation with kernel and user indexes.
 * added initialization syscall instead of opening /dev/kevent
 * kevent_commit() syscall to commit ring buffer entries
 * changed KEVENT_REQ_WAKEUP_ONE flag to KEVENT_REQ_WAKEUP_ALL, kevent wakes
   only first thread always if that flag is not set
 * KEVENT_REQ_ALWAYS_QUEUE flag. If set, kevent will be queued into ready queue
   instead of copying back to userspace when kevent is ready immediately when
   it is added.
 * lighttpd patch (Hail! Although nothing really outstanding compared to epoll)

Changes from 'take23' patchset:
 * kevent PIPE notifications
 * KEVENT_REQ_LAST_CHECK flag, which allows to perform last check at dequeueing 
time
 * fixed poll/select notifications (were broken due to tree manipulations)
 * made Documentation/kevent.txt look nice in 80-col terminal
 * fix for copy_to_user() failure report for the first kevent (Andrew Morton)
 * minor function renames

Changes from 'take22' patchset:
 * new ring buffer implementation in process' memory
 * wakeup-one-thread flag
 * edge-triggered behaviour

Changes from 'take21' patchset:
 * minor cleanups (different return values, removed unneeded variables, 
whitespaces and so on)
 * fixed bug in kevent removal in case when kevent being removed
   is the same as overflow_kevent (spotted by Eric Dumazet)

Changes from 'take20' patchset:
 * new ring buffer implementation
 * removed artificial limit on possible number of kevents

Changes from 'take19' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take18' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take17' patchset:
 * Use RB tree instead of hash table. 
At least for a web server, the frequency of addition/deletion of new kevents 
is comparable with the number of search accesses, i.e. most of the time 
events are added, accessed only a couple of times and then 

[take28 4/8] kevent: Socket notifications.

2006-12-14 Thread Evgeniy Polyakov

Socket notifications.

This patch includes socket send/recv/accept notifications.
With a trivial web server using kevent and these features
instead of epoll, its performance increased more than noticeably.
More details about various benchmarks and the server itself 
(evserver_kevent.c) can be found on the project's homepage.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/fs/inode.c b/fs/inode.c
index ada7643..2740617 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@
 #include <linux/cdev.h>
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
+#include <linux/kevent.h>
 #include <linux/mount.h>
 
 /*
@@ -164,12 +165,18 @@ static struct inode *alloc_inode(struct super_block *sb)
 		}
 		inode->i_private = 0;
 		inode->i_mapping = mapping;
+#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
+		kevent_storage_init(inode, &inode->st);
+#endif
 	}
 	return inode;
 }
 
 void destroy_inode(struct inode *inode) 
 {
+#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
+	kevent_storage_fini(&inode->st);
+#endif
 	BUG_ON(inode_has_buffers(inode));
 	security_inode_free(inode);
 	if (inode->i_sb->s_op->destroy_inode)
diff --git a/include/net/sock.h b/include/net/sock.h
index edd4d73..d48ded8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -48,6 +48,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>	/* struct sk_buff */
 #include <linux/security.h>
+#include <linux/kevent.h>
 
 #include <linux/filter.h>
 
@@ -450,6 +451,21 @@ static inline int sk_stream_memory_free(struct sock *sk)
 
 extern void sk_stream_rfree(struct sk_buff *skb);
 
+struct socket_alloc {
+	struct socket socket;
+	struct inode vfs_inode;
+};
+
+static inline struct socket *SOCKET_I(struct inode *inode)
+{
+	return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
+}
+
+static inline struct inode *SOCK_INODE(struct socket *socket)
+{
+	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
+}
+
 static inline void sk_stream_set_owner_r(struct sk_buff *skb, struct sock *sk)
 {
 	skb->sk = sk;
@@ -477,6 +493,7 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 		sk->sk_backlog.tail = skb;
 	}
 	skb->next = NULL;
+	kevent_socket_notify(sk, KEVENT_SOCKET_RECV);
 }
 
 #define sk_wait_event(__sk, __timeo, __condition)  \
@@ -679,21 +696,6 @@ static inline struct kiocb *siocb_to_kiocb(struct sock_iocb *si)
 	return si->kiocb;
 }
 
-struct socket_alloc {
-	struct socket socket;
-	struct inode vfs_inode;
-};
-
-static inline struct socket *SOCKET_I(struct inode *inode)
-{
-	return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
-}
-
-static inline struct inode *SOCK_INODE(struct socket *socket)
-{
-	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
-}
-
 extern void __sk_stream_mem_reclaim(struct sock *sk);
 extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7a093d0..69f4ad2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -857,6 +857,7 @@ static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 			tp->ucopy.memory = 0;
 		} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
 			wake_up_interruptible(sk->sk_sleep);
+			kevent_socket_notify(sk, KEVENT_SOCKET_RECV|KEVENT_SOCKET_SEND);
 			if (!inet_csk_ack_scheduled(sk))
 				inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
 						          (3 * TCP_RTO_MIN) / 4,
diff --git a/kernel/kevent/kevent_socket.c b/kernel/kevent/kevent_socket.c
new file mode 100644
index 000..1798092
--- /dev/null
+++ b/kernel/kevent/kevent_socket.c
@@ -0,0 +1,144 @@
+/*
+ * kevent_socket.c
+ * 
+ * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/timer.h>
+#include <linux/file.h>

[take28 1/8] kevent: Description.

2006-12-14 Thread Evgeniy Polyakov

Description.


diff --git a/Documentation/kevent.txt b/Documentation/kevent.txt
new file mode 100644
index 000..2e03a3f
--- /dev/null
+++ b/Documentation/kevent.txt
@@ -0,0 +1,240 @@
+Description.
+
+int kevent_init(struct kevent_ring *ring, unsigned int ring_size, 
+	unsigned int flags);
+
+ring_size - size of the ring buffer in events 
+ring - pointer to allocated ring buffer
+flags - various flags, see KEVENT_FLAGS_* definitions.
+
+Return value: kevent control file descriptor or negative error value.
+
+ struct kevent_ring
+ {
+   unsigned int ring_kidx, ring_over;
+   struct ukevent event[0];
+ }
+
+ring_kidx - index in the ring buffer where kernel will put new events 
+		when kevent_wait() or kevent_get_events() is called 
+ring_over - number of overflows of ring_uidx that have happened since the start.
+		The overflow counter is used to prevent a situation where two threads 
+		are going to free the same events, but one of them was scheduled 
+		away for too long, so the ring indexes wrapped; when that 
+		thread is awakened, it would free events other than those 
+		it is supposed to free.
+
+Example userspace code (ring_buffer.c) can be found on project's homepage.
+
+Each kevent syscall can be so called cancellation point in glibc, i.e. when 
+thread has been cancelled in kevent syscall, thread can be safely removed 
+and no events will be lost, since each syscall (kevent_wait() or 
+kevent_get_events()) will copy event into special ring buffer, accessible 
+from other threads or even processes (if shared memory is used).
+
+When kevent is removed (not dequeued when it is ready, but just removed), 
+even if it was ready, it is not copied into ring buffer, since if it is 
+removed, no one cares about it (otherwise user would wait until it becomes 
+ready and got it through usual way using kevent_get_events() or kevent_wait()) 
+and thus no need to copy it to the ring buffer.
+
+---
+
+
+int kevent_ctl(int fd, unsigned int cmd, unsigned int num, struct ukevent *arg);
+
+fd - is the file descriptor referring to the kevent queue to manipulate. 
+It is created by opening /dev/kevent char device, which is created with 
+dynamic minor number and major number assigned for misc devices. 
+
+cmd - is the requested operation. It can be one of the following:
+KEVENT_CTL_ADD - add event notification 
+KEVENT_CTL_REMOVE - remove event notification 
+KEVENT_CTL_MODIFY - modify existing notification 
+KEVENT_CTL_READY - mark existing events as ready, if number of events is zero,
+		it just wakes up a thread parked in the syscall
+
+num - number of struct ukevent in the array pointed to by arg 
+arg - array of struct ukevent
+
+Return value: 
+ number of events processed or negative error value.
+
+When called, kevent_ctl will carry out the operation specified in the 
+cmd parameter.
+---
+
+ int kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr, 
+   struct timespec timeout, struct ukevent *buf, unsigned flags);
+
+ctl_fd - file descriptor referring to the kevent queue 
+min_nr - minimum number of completed events that kevent_get_events will block 
+waiting for 
+max_nr - number of struct ukevent in buf 
+timeout - time to wait before returning less than min_nr 
+ events. If this is -1, then wait forever. 
+buf - pointer to an array of struct ukevent. 
+flags - various flags, see KEVENT_FLAGS_* definitions.
+
+Return value:
+ number of events copied or negative error value.
+
+kevent_get_events will wait timeout milliseconds for at least min_nr completed 
+events, copying completed struct ukevents to buf and deleting any 
+KEVENT_REQ_ONESHOT event requests. In nonblocking mode it returns as many 
+events as possible, but not more than max_nr. In blocking mode it waits until 
+timeout or if at least min_nr events are ready.
+
+This function copies events into the ring buffer if it was initialized; if the
+ring buffer is full, the KEVENT_RET_COPY_FAILED flag is set in the ret_flags field.
+---
+
+ int kevent_wait(int ctl_fd, unsigned int num, unsigned int old_uidx, 
+   struct timespec timeout, unsigned int flags);
+
+ctl_fd - file descriptor referring to the kevent queue 
+num - number of processed kevents 
+old_uidx - the last index user is aware of
+timeout - time to wait until there is free space in kevent queue
+flags - various flags, see KEVENT_FLAGS_* definitions.
+
+Return value:
+ number of events copied into ring buffer or negative error value.
+
+This syscall waits until either timeout expires or at least one event becomes 
+ready. It also copies events into special ring buffer. If ring buffer is full,
+it waits until there are ready events and then return.
+If kevent is one-shot kevent it is 

[take28 8/8] kevent: Kevent posix timer notifications.

2006-12-14 Thread Evgeniy Polyakov

Kevent posix timer notifications.

A simple extension to POSIX timers which allows
delivering notification of timer expiration
through the kevent queue.

Example application posix_timer.c can be found
in archive on project homepage.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]


diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 8786e01..3768746 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -235,6 +235,7 @@ typedef struct siginfo {
 #define SIGEV_NONE 1   /* other notification: meaningless */
 #define SIGEV_THREAD   2   /* deliver via thread creation */
 #define SIGEV_THREAD_ID 4  /* deliver to thread */
+#define SIGEV_KEVENT   8   /* deliver through kevent queue */
 
 /*
  * This works because the alignment is ok on all current architectures
@@ -260,6 +261,8 @@ typedef struct sigevent {
void (*_function)(sigval_t);
void *_attribute;   /* really pthread_attr_t */
} _sigev_thread;
+
+   int kevent_fd;
} _sigev_un;
 } sigevent_t;
 
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index a7dd38f..4b9deb4 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -4,6 +4,7 @@
 #include <linux/spinlock.h>
 #include <linux/list.h>
 #include <linux/sched.h>
+#include <linux/kevent_storage.h>
 
 union cpu_time_count {
cputime_t cpu;
@@ -49,6 +50,9 @@ struct k_itimer {
sigval_t it_sigev_value;/* value word of sigevent struct */
struct task_struct *it_process; /* process to send signal to */
struct sigqueue *sigq;  /* signal queue entry. */
+#ifdef CONFIG_KEVENT_TIMER
+   struct kevent_storage st;
+#endif
union {
struct {
struct hrtimer timer;
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index e5ebcc1..74270f8 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -48,6 +48,8 @@
 #include <linux/wait.h>
 #include <linux/workqueue.h>
 #include <linux/module.h>
+#include <linux/kevent.h>
+#include <linux/file.h>
 
 /*
  * Management arrays for POSIX timers.  Timers are kept in slab memory
@@ -224,6 +226,100 @@ static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp)
 	return 0;
 }
 
+#ifdef CONFIG_KEVENT_TIMER
+static int posix_kevent_enqueue(struct kevent *k)
+{
+	/*
+	 * It is not ugly - there is no pointer in the id field union, 
+	 * but its size is 64bits, which is ok for any known pointer size.
+	 */
+	struct k_itimer *tmr = (struct k_itimer *)(unsigned long)k->event.id.raw_u64;
+	return kevent_storage_enqueue(&tmr->st, k);
+}
+static int posix_kevent_dequeue(struct kevent *k)
+{
+	struct k_itimer *tmr = (struct k_itimer *)(unsigned long)k->event.id.raw_u64;
+	kevent_storage_dequeue(&tmr->st, k);
+	return 0;
+}
+static int posix_kevent_callback(struct kevent *k)
+{
+	return 1;
+}
+static int posix_kevent_init(void)
+{
+	struct kevent_callbacks tc = {
+		.callback = posix_kevent_callback,
+		.enqueue = posix_kevent_enqueue,
+		.dequeue = posix_kevent_dequeue,
+		.flags = KEVENT_CALLBACKS_KERNELONLY};
+
+	return kevent_add_callbacks(&tc, KEVENT_POSIX_TIMER);
+}
+
+extern struct file_operations kevent_user_fops;
+
+static int posix_kevent_init_timer(struct k_itimer *tmr, int fd)
+{
+	struct ukevent uk;
+	struct file *file;
+	struct kevent_user *u;
+	int err;
+
+	file = fget(fd);
+	if (!file) {
+		err = -EBADF;
+		goto err_out;
+	}
+
+	if (file->f_op != &kevent_user_fops) {
+		err = -EINVAL;
+		goto err_out_fput;
+	}
+
+	u = file->private_data;
+
+	memset(&uk, 0, sizeof(struct ukevent));
+
+	uk.event = KEVENT_MASK_ALL;
+	uk.type = KEVENT_POSIX_TIMER;
+	uk.id.raw_u64 = (unsigned long)(tmr); /* Just cast to something unique */
+	uk.req_flags = KEVENT_REQ_ONESHOT | KEVENT_REQ_ALWAYS_QUEUE;
+	uk.ptr = tmr->it_sigev_value.sival_ptr;
+
+	err = kevent_user_add_ukevent(&uk, u);
+	if (err)
+		goto err_out_fput;
+
+	fput(file);
+
+	return 0;
+
+err_out_fput:
+	fput(file);
+err_out:
+	return err;
+}
+
+static void posix_kevent_fini_timer(struct k_itimer *tmr)
+{
+	kevent_storage_fini(&tmr->st);
+}
+#else
+static int posix_kevent_init_timer(struct k_itimer *tmr, int fd)
+{
+   return -ENOSYS;
+}
+static int posix_kevent_init(void)
+{
+   return 0;
+}
+static void posix_kevent_fini_timer(struct k_itimer *tmr)
+{
+}
+#endif
+
+
 /*
  * Initialize everything, well, just everything in Posix clocks/timers ;)
  */
@@ -241,6 +337,11 @@ static __init int init_posix_timers(void)
 	register_posix_clock(CLOCK_REALTIME, &clock_realtime);

[take28 3/8] kevent: poll/select() notifications.

2006-12-14 Thread Evgeniy Polyakov

poll/select() notifications.

This patch includes generic poll/select notifications.
kevent_poll works similarly to epoll and has the same issues (the callback
is invoked not from the internal state machine of the caller, but through
process wakeup, with a lot of allocations and so on).

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/fs/file_table.c b/fs/file_table.c
index bc35a40..0805547 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -20,6 +20,7 @@
 #include <linux/cdev.h>
 #include <linux/fsnotify.h>
 #include <linux/sysctl.h>
+#include <linux/kevent.h>
 #include <linux/percpu_counter.h>
 
 #include <asm/atomic.h>
@@ -119,6 +120,7 @@ struct file *get_empty_filp(void)
 	f->f_uid = tsk->fsuid;
 	f->f_gid = tsk->fsgid;
 	eventpoll_init_file(f);
+	kevent_init_file(f);
 	/* f->f_version: 0 */
 	return f;
 
@@ -164,6 +166,7 @@ void fastcall __fput(struct file *file)
 	 * in the file cleanup chain.
 	 */
 	eventpoll_release(file);
+	kevent_cleanup_file(file);
 	locks_remove_flock(file);
 
 	if (file->f_op && file->f_op->release)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5baf3a1..8bbf3a5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -276,6 +276,7 @@ extern int dir_notify_enable;
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/mutex.h>
+#include <linux/kevent_storage.h>
 
 #include <asm/atomic.h>
 #include <asm/semaphore.h>
@@ -586,6 +587,10 @@ struct inode {
 	struct mutex		inotify_mutex;	/* protects the watches list */
 #endif
 
+#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
+   struct kevent_storage   st;
+#endif
+
unsigned long   i_state;
unsigned long   dirtied_when;   /* jiffies of first dirtying */
 
@@ -739,6 +744,9 @@ struct file {
 	struct list_head	f_ep_links;
 	spinlock_t		f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
+#ifdef CONFIG_KEVENT_POLL
+	struct kevent_storage	st;
+#endif
 	struct address_space	*f_mapping;
 };
 extern spinlock_t files_lock;
diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c
new file mode 100644
index 000..7ccf7da
--- /dev/null
+++ b/kernel/kevent/kevent_poll.c
@@ -0,0 +1,234 @@
+/*
+ * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/timer.h>
+#include <linux/file.h>
+#include <linux/kevent.h>
+#include <linux/poll.h>
+#include <linux/fs.h>
+
+static kmem_cache_t *kevent_poll_container_cache;
+static kmem_cache_t *kevent_poll_priv_cache;
+
+struct kevent_poll_ctl
+{
+	struct poll_table_struct	pt;
+	struct kevent			*k;
+};
+
+struct kevent_poll_wait_container
+{
+	struct list_head		container_entry;
+	wait_queue_head_t		*whead;
+	wait_queue_t			wait;
+	struct kevent			*k;
+};
+
+struct kevent_poll_private
+{
+	struct list_head		container_list;
+	spinlock_t			container_lock;
+};
+
+static int kevent_poll_enqueue(struct kevent *k);
+static int kevent_poll_dequeue(struct kevent *k);
+static int kevent_poll_callback(struct kevent *k);
+
+static int kevent_poll_wait_callback(wait_queue_t *wait,
+		unsigned mode, int sync, void *key)
+{
+	struct kevent_poll_wait_container *cont =
+		container_of(wait, struct kevent_poll_wait_container, wait);
+	struct kevent *k = cont->k;
+
+	kevent_storage_ready(k->st, NULL, KEVENT_MASK_ALL);
+	return 0;
+}
+
+static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead,
+		struct poll_table_struct *poll_table)
+{
+	struct kevent *k =
+		container_of(poll_table, struct kevent_poll_ctl, pt)->k;
+	struct kevent_poll_private *priv = k->priv;
+	struct kevent_poll_wait_container *cont;
+	unsigned long flags;
+
+	cont = kmem_cache_alloc(kevent_poll_container_cache, GFP_KERNEL);
+	if (!cont) {
+		kevent_break(k);
+		return;
+	}
+
+	cont->k = k;
+	init_waitqueue_func_entry(&cont->wait, kevent_poll_wait_callback);
+	cont->whead = whead;
+
+	spin_lock_irqsave(&priv->container_lock, flags);
+   

Re: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and Reason Code

2006-12-14 Thread Jiri Benc
On Thu, 14 Dec 2006 12:02:04 +0800, Zhu Yi wrote:
> Signed-off-by: Zhu Yi [EMAIL PROTECTED]

Please Cc: me and John Linville on d80211 patches otherwise your
chances of review (and inclusion) are much lower.

In addition to comments from Michael (which are all perfectly valid and
you need to address all of them):

> +struct ieee802_11_ts_info {

Choose a name consistent with the rest of the header (e.g. ieee80211_
prefix).

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] d80211: create wifi.h to define WIFI OUIs

2006-12-14 Thread Jiri Benc
On Thu, 14 Dec 2006 12:02:16 +0800, Zhu Yi wrote:
> --- /dev/null
> +++ b/net/d80211/wifi.h
> @@ -0,0 +1,28 @@
> +/*
> + * This file defines Wi-Fi(r) OUIs for 80211.o
> + * Copyright 2006, Zhu Yi [EMAIL PROTECTED]  Intel Corp.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef D802_11_WIFI_H
> +#define D802_11_WIFI_H
> +
> +/* WI-FI Alliance OUI Type and Subtype */
> +enum wifi_oui_type {
> +	WIFI_OUI_TYPE_WPA = 1,
> +	WIFI_OUI_TYPE_WMM = 2,
> +	WIFI_OUI_TYPE_WSC = 4,
> +	WIFI_OUI_TYPE_PSD = 6,
> +};
> +
> +enum wifi_oui_stype_wmm {
> +	WIFI_OUI_STYPE_WMM_INFO = 0,
> +	WIFI_OUI_STYPE_WMM_PARAM = 1,
> +	WIFI_OUI_STYPE_WMM_TSPEC = 2,
> +};
> +
> +
> +#endif /* D802_11_WIFI_H */

AFAIK wifi is a trademark and we want to avoid using it. wlan seems
to be a better alternative for the prefixes. Also, I don't see a reason
for a separate header file here.

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [PATCH 3/6] d80211: fix classify_1d() priority selection

2006-12-14 Thread Jiri Benc
On Thu, 14 Dec 2006 12:02:27 +0800, Zhu Yi wrote:
> I don't see any reason why packets with DSCP=0x40 should have lower IEEE
> 802.1D priority than packets with DSCP=0x20. Spare > Background. No?

Hm, seems so. Jouni, is there any reason for this?

 
> Signed-off-by: Zhu Yi [EMAIL PROTECTED]
> 
> ---
> 
>  net/d80211/wme.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> e1765ea0d80ad86619300d3253e801883fd745a5
> diff --git a/net/d80211/wme.c b/net/d80211/wme.c
> index b9505dc..f26fe6c 100644
> --- a/net/d80211/wme.c
> +++ b/net/d80211/wme.c
> @@ -131,9 +131,9 @@ static inline unsigned classify_1d(struc
>  	dscp = ip->tos & 0xfc;
>  	switch (dscp) {
>  	case 0x20:
> -		return 2;
> -	case 0x40:
>  		return 1;
> +	case 0x40:
> +		return 2;
>  	case 0x60:
>  		return 3;
>  	case 0x80:
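
For reference, the mapping after this change can be tried in userspace with a sketch like the one below. It is not the d80211 function: dscp_to_1d_sketch() is a hypothetical name, only the three cases visible in the quoted hunk are reproduced, and the default branch returning 0 is an assumption made to keep the example complete.

```c
#include <stdint.h>

/* Hypothetical helper: map an IPv4 TOS byte to an 802.1D priority. */
static unsigned int dscp_to_1d_sketch(uint8_t tos)
{
	unsigned int dscp = tos & 0xfc;	/* drop the two low bits, keep the DSCP field */

	switch (dscp) {
	case 0x20:
		return 1;
	case 0x40:
		return 2;
	case 0x60:
		return 3;
	default:
		return 0;	/* assumption: cases not quoted above are left out here */
	}
}
```

With the patch applied, a larger DSCP value among these three never maps to a lower priority than a smaller one.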

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [PATCH 6/6] d80211: add sysfs interface for QoS functions

2006-12-14 Thread Jiri Benc
On Thu, 14 Dec 2006 12:03:02 +0800, Zhu Yi wrote:
> The sysfs interface here is only a proof of concept. It provides a way for
> the userspace applications to use the advanced QoS features supported by
> d80211 stack. The final solution should be switched to cfg80211.

So... what about implementing that into cfg80211? :-)

I'm not inclined towards this patch (even if you address Stephen's
comment).

Thanks,

 Jiri

(Btw, it will take me some time to review patches 4 and 5.)

-- 
Jiri Benc
SUSE Labs


Re: [PATCH] drivers/net: spidernet driver on Celleb

2006-12-14 Thread Ishizaki Kou
Christoph-san,

Thanks for your comments.

> On Tue, Dec 12, 2006 at 02:25:50PM +0900, Ishizaki Kou wrote:
> > 
> > Following are the changes.
> > -This patch enables auto-negotiation.
> > -Loading firmware is done when spidernet_open() is called.
> > -And this patch adds other several small changes for Celleb. 
> 
> This should be split into three separate patches, sent as a patch
> series.

We are now working on separating the patch. We'll send it later.

> > -/* outside loopback mode: ETOMOD signal dont matter, not connected */
> > -#define SPIDER_NET_OPMODE_VALUE 0x0063
> > +/* ETOMOD signal is brought to PHY reset. bit2 must be 1 in Celleb */
> > +#define SPIDER_NET_OPMODE_VALUE 0x0067
> 
> Is it okay to simply change this value for the ibm blades?

Sorry, we didn't test on ibm blades, because we don't have one.
We hope to develop together so that the driver works on both platforms.

> > +static int is1000 = 1;
> 
> This should be in struct spider_net_card instead of a global flag.

We'll move it into struct spider_net_card.

> >  	case SPIDER_NET_GTMFLLINT:
> > -		if (netif_msg_intr(card) && net_ratelimit())
> > -			pr_err("Spider TX RAM full\n");
> > +		/* if (netif_msg_intr(card) && net_ratelimit())
> > +			pr_err("Spider TX RAM full\n"); */
> 
> Either this should be kept or removed entirely.  In the latter case you
> need a good description why it's removed in the patch header.

We'll remove it entirely.

GTMFLLINT occurs frequently when we use a 100M hub. We didn't find any
bad influence from this interrupt so far, so we removed the output.

> > +
> > +	spider_net_write_reg(card, SPIDER_NET_GMACOPEMD,
> > +		spider_net_read_reg(card, SPIDER_NET_GMACOPEMD) | 0x4);
> 
> Please make sure this doesn't overflow the 80 characters per line limit.

We'll correct it. 

> > +static int spider_net_init_firmware(struct spider_net_card *);
> 
> Random forward declarations in the middle of the file aren't very nice.
> If you really need them put them at the beginning of the file, but it would
> be even better if you moved spider_net_init_firmware further up in the
> file so we wouldn't need the forward-declaration at all.

We'll move some functions.

> > +	if (card->phy.def->phy_id)
> > +		mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
> > +	else
> > +		pr_err("No phy is available\n");
> 
> What is this idiom about?  Is not having a phy a fatal error in which case
> we should abort here, or is it tolerable in which case pr_err is too much.

Checking phy_id is not required here, so we'll change it to simply call
mod_timer().

> > +static void spider_net_init_card(struct spider_net_card *);
> 
> Same comment about forward declarations as above.

Thank you,
Kou Ishizaki
Toshiba


[PATCH][RFC] tcp: fix ambiguity in the `before' relation

2006-12-14 Thread Gerrit Renker
While looking at DCCP sequence numbers, I stumbled over a problem with
the following definition of before in tcp.h:

static inline int before(__u32 seq1, __u32 seq2)
{
	return (__s32)(seq1-seq2) < 0;
}

Problem: This definition suffers from an ambiguity, i.e. always
   
   before(a, (a + 2^31) % 2^32) = 1
   before((a + 2^31) % 2^32, a) = 1
 
 In text: when the difference between a and b amounts to 2^31,
 a is always considered `before' b and b is always considered `before' a;
 the function cannot decide. 
 The reason is that implicitly 0 is `before' 1 ... 2^31-1 ... 2^31
  
Solution: There is a simple fix, by defining before in such a way that 
  0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1
  By not using the middle between 0 and 2^32, before can be made 
  unambiguous. 
  This is achieved by testing whether seq2-seq1 > 0 (using signed
  32-bit arithmetic).

I attach a patch to codify this. Also the `after' relation is basically 
a redefinition of `before', it is now defined as a macro after before.
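
The ambiguity can be reproduced in userspace with a small sketch. The names before_old() and before_new() are hypothetical, uint32_t/int32_t stand in for __u32/__s32, and the sketch assumes that converting an out-of-range value to int32_t wraps around, which is implementation-defined in C but is what common compilers do.

```c
#include <stdint.h>

/* Existing definition: signed difference seq1 - seq2 is below zero. */
static inline int before_old(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Proposed definition: signed difference seq2 - seq1 is above zero. */
static inline int before_new(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) > 0;
}
```

For b = a + 2^31, before_old(a, b) and before_old(b, a) both return 1, while before_new(a, b) and before_new(b, a) both return 0; for all other distances the two definitions agree.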

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 tcp.h |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)


diff --git a/include/net/tcp.h b/include/net/tcp.h
index c99774f..b7d8317 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -242,14 +242,9 @@ extern int tcp_memory_pressure;
 
 static inline int before(__u32 seq1, __u32 seq2)
 {
-        return (__s32)(seq1-seq2) < 0;
+        return (__s32)(seq2-seq1) > 0;
 }
-
-static inline int after(__u32 seq1, __u32 seq2)
-{
-	return (__s32)(seq2-seq1) < 0;
-}
-
+#define after(seq2, seq1) 	before(seq1, seq2)
 
 /* is s2<=s1<=s3 ? */
 static inline int between(__u32 seq1, __u32 seq2, __u32 seq3)




Re: [PATCH 6/6] d80211: add sysfs interface for QoS functions

2006-12-14 Thread Johannes Berg
On Thu, 2006-12-14 at 12:23 +0100, Jiri Benc wrote:
> On Thu, 14 Dec 2006 12:03:02 +0800, Zhu Yi wrote:
> > The sysfs interface here is only a proof of concept. It provides a way for
> > the userspace applications to use the advanced QoS features supported by
> > d80211 stack. The final solution should be switched to cfg80211.
> 
> So... what about implementing that into cfg80211? :-)

I agree, we should put this into cfg80211/nl80211 from the start. It's
not really hard to still have some things over WE and start using
cfg/nl80211 now.

johannes




[PATCH 3/3] d80211: fix workqueue breakage (v2)

2006-12-14 Thread Michael Wu
d80211: fix workqueue breakage

This patch updates d80211 to use the new workqueue API.
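
As background for the conversion: under the new API the handler is
passed a pointer to the work item itself, so the owning object is
recovered with container_of() rather than through a separately stored
data pointer. A minimal userspace sketch of that pattern follows. The
struct layouts, demo_iface and demo_scan_work are simplified,
hypothetical stand-ins and not the d80211 definitions.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel's work structures. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

struct delayed_work {
	struct work_struct work;
	unsigned long delay;	/* placeholder for the kernel's timer */
};

/* Hypothetical owner object, playing the role of the interface data. */
struct demo_iface {
	int id;
	struct delayed_work scan;
};

static int last_handled_id = -1;

/* New-style handler: it receives the work item, not an arbitrary pointer. */
static void demo_scan_work(struct work_struct *work)
{
	/* Step outwards twice: work -> delayed_work -> owning object. */
	struct delayed_work *dwork =
		container_of(work, struct delayed_work, work);
	struct demo_iface *iface =
		container_of(dwork, struct demo_iface, scan);

	last_handled_id = iface->id;
}
```

Because the owner is derived from the work item, a field such as
scan_work.data is no longer available, which is presumably why the
patch introduces a separate scan_dev pointer.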

Signed-off-by: Michael Wu [EMAIL PROTECTED]
---

 net/d80211/ieee80211.c   |7 ---
 net/d80211/ieee80211_i.h |8 +---
 net/d80211/ieee80211_iface.c |2 +-
 net/d80211/ieee80211_sta.c   |   32 +++-
 net/d80211/sta_info.c|7 ---
 5 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 6e10db5..76ee491 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -2092,13 +2092,13 @@ void ieee80211_if_shutdown(struct net_de
case IEEE80211_IF_TYPE_IBSS:
sdata->u.sta.state = IEEE80211_DISABLED;
cancel_delayed_work(&sdata->u.sta.work);
-   if (local->scan_work.data == sdata->dev) {
+   if (local->scan_dev == sdata->dev) {
local->sta_scanning = 0;
cancel_delayed_work(&local->scan_work);
flush_scheduled_work();
/* see comment in ieee80211_unregister_hw to
 * understand why this works */
-   local->scan_work.data = NULL;
+   local->scan_dev = NULL;
} else
flush_scheduled_work();
break;
@@ -4486,6 +4486,7 @@ struct ieee80211_hw *ieee80211_alloc_hw(
INIT_LIST_HEAD(&local->sub_if_list);
 
 spin_lock_init(&local->generic_lock);
+   INIT_DELAYED_WORK(&local->scan_work, ieee80211_sta_scan_work);
init_timer(&local->stat_timer);
local->stat_timer.function = ieee80211_stat_refresh;
local->stat_timer.data = (unsigned long) local;
@@ -4686,7 +4687,7 @@ void ieee80211_unregister_hw(struct ieee
 
if (local->stat_time)
del_timer_sync(&local->stat_timer);
-   if (local->scan_work.data) {
+   if (local->scan_dev) {
local->sta_scanning = 0;
cancel_delayed_work(&local->scan_work);
flush_scheduled_work();
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index ef303da..b7b4b35 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -240,7 +240,7 @@ struct ieee80211_if_sta {
IEEE80211_ASSOCIATE, IEEE80211_ASSOCIATED,
IEEE80211_IBSS_SEARCH, IEEE80211_IBSS_JOINED
} state;
-   struct work_struct work;
+   struct delayed_work work;
u8 bssid[ETH_ALEN], prev_bssid[ETH_ALEN];
u8 ssid[IEEE80211_MAX_SSID_LEN];
size_t ssid_len;
@@ -429,7 +429,8 @@ struct ieee80211_local {
int scan_channel_idx;
enum { SCAN_SET_CHANNEL, SCAN_SEND_PROBE } scan_state;
unsigned long last_scan_completed;
-   struct work_struct scan_work;
+   struct delayed_work scan_work;
+   struct net_device *scan_dev;
int scan_oper_channel;
int scan_oper_channel_val;
int scan_oper_power_level;
@@ -638,7 +639,8 @@ int ieee80211_set_compression(struct iee
  struct net_device *dev, struct sta_info *sta);
 int ieee80211_init_client(struct net_device *dev);
 /* ieee80211_sta.c */
-void ieee80211_sta_work(void *ptr);
+void ieee80211_sta_work(struct work_struct *work);
+void ieee80211_sta_scan_work(struct work_struct *work);
 void ieee80211_sta_rx_mgmt(struct net_device *dev, struct sk_buff *skb,
   struct ieee80211_rx_status *rx_status);
 int ieee80211_sta_set_ssid(struct net_device *dev, char *ssid, size_t len);
diff --git a/net/d80211/ieee80211_iface.c b/net/d80211/ieee80211_iface.c
index 3e9d531..288dce5 100644
--- a/net/d80211/ieee80211_iface.c
+++ b/net/d80211/ieee80211_iface.c
@@ -185,7 +185,7 @@ void ieee80211_if_set_type(struct net_de
struct ieee80211_if_sta *ifsta;
 
ifsta = &sdata->u.sta;
-   INIT_WORK(&ifsta->work, ieee80211_sta_work, dev);
+   INIT_DELAYED_WORK(&ifsta->work, ieee80211_sta_work);
 
ifsta->capab = WLAN_CAPABILITY_ESS;
ifsta->auth_algs = IEEE80211_AUTH_ALG_OPEN |
diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c
index 04bd5cd..5df585a 100644
--- a/net/d80211/ieee80211_sta.c
+++ b/net/d80211/ieee80211_sta.c
@@ -1837,10 +1837,11 @@ static void ieee80211_sta_merge_ibss(str
 }
 
 
-void ieee80211_sta_work(void *ptr)
+void ieee80211_sta_work(struct work_struct *work)
 {
-   struct net_device *dev = ptr;
-   struct ieee80211_sub_if_data *sdata;
+   struct ieee80211_sub_if_data *sdata =
+   container_of(work, struct ieee80211_sub_if_data, u.sta.work.work);
+   struct net_device *dev = sdata->dev;
struct ieee80211_if_sta *ifsta;
 
if (!netif_running(dev))
@@ -2407,7 +2408,7 @@ static int ieee80211_active_scan(struct
 void ieee80211_scan_completed(struct ieee80211_hw *hw)
 {
struct ieee80211_local *local = 

Re: [PATCH 1/14] Spidernet DMA coalescing

2006-12-14 Thread Linas Vepstas
On Thu, Dec 14, 2006 at 11:05:17AM +, Christoph Hellwig wrote:
> On Wed, Dec 13, 2006 at 03:06:59PM -0600, Linas Vepstas wrote:
> > 
> > The current driver code performs 512 DMA mappings of a bunch of 
> > 32-byte ring descriptor structures. This is silly, as they are 
> > all in contiguous memory. This patch changes the code to 
> > dma_map_coherent() each rx/tx ring as a whole.
> 
> It's actually dma_alloc_coherent now that you updated the patch :)
> 
> > +   chain->ring = dma_alloc_coherent(&card->pdev->dev, alloc_size,
> > +   &chain->dma_addr, GFP_KERNEL);
> >  
> > +   if (!chain->ring)
> > +   return -ENOMEM;
> >  
> > +   descr = chain->ring;
> > +   memset(descr, 0, alloc_size);
> 
> dma_alloc_coherent is defined to zero the allocated memory, so you
> won't need this memset.

Being unclear on the concept, should I send a new version of this patch,
or should I send a new patch that removes this?

--linas


Re: [PATCH 12/14] Spidernet Avoid possible RX chain corruption

2006-12-14 Thread Linas Vepstas
On Thu, Dec 14, 2006 at 11:22:43AM +1100, Michael Ellerman wrote:
> > spider_net_refill_rx_chain(card);
> > -   spider_net_enable_rxchtails(card);
> > spider_net_enable_rxdmac(card);
> > return 0;
> 
> Didn't you just add that line?

Dagnabbit. The earlier patch was moving around existing code.
Or, more precisely, trying to maintain the general function
of the old code even while moving things around.

Later on, when I started looking at what the danged function 
actually did, and the context it was in, I realized that it 
was a bad idea to call the thing.  So then I removed it. :-/

How should I handle this procedurally? Resend the patch sequence? 
Let it slide?

--linas



Re: [PATCH 1/14] Spidernet DMA coalescing

2006-12-14 Thread Christoph Hellwig
On Thu, Dec 14, 2006 at 11:07:37AM -0600, Linas Vepstas wrote:
> Being unclear on the concept, should I send a new version of this patch,
> or should I send a new patch that removes this?

For just the memset issue an incremental patch would be fine.  But given
the small mistake in the patch description a resend with the fixed
description might be in order here.



Re: [PATCH 12/14] Spidernet Avoid possible RX chain corruption

2006-12-14 Thread Christoph Hellwig
On Thu, Dec 14, 2006 at 11:15:11AM -0600, Linas Vepstas wrote:
> On Thu, Dec 14, 2006 at 11:22:43AM +1100, Michael Ellerman wrote:
> > > spider_net_refill_rx_chain(card);
> > > - spider_net_enable_rxchtails(card);
> > > spider_net_enable_rxdmac(card);
> > > return 0;
> > 
> > Didn't you just add that line?
> 
> Dagnabbit. The earlier patch was moving around existing code.
> Or, more precisely, trying to maintain the general function
> of the old code even while moving things around.
> 
> Later on, when I started looking at what the danged function 
> actually did, and the context it was in, I realized that it 
> was a bad idea to call the thing.  So then I removed it. :-/
> 
> How should I handle this procedurally? Resend the patch sequence? 
> Let it slide?

Just keep it as is in this case.  In case you have to redo the patch
series for some other reason or for similar cases in the future put
the patch to remove things in front of the one that reorders the surrounding
bits.


Revised: [PATCH 1/14] Spidernet DMA coalescing

2006-12-14 Thread Linas Vepstas

Andrew, 

I'm hoping it's not irritatingly bothersome to ask you to rip out 
the first patch of this series, and replace it with the one below. 

On Thu, Dec 14, 2006 at 05:35:34PM +, Christoph Hellwig wrote:
> On Thu, Dec 14, 2006 at 11:07:37AM -0600, Linas Vepstas wrote:
> > Being unclear on the concept, should I send a new version of this patch,
> > or should I send a new patch that removes this?
> 
> For just the memset issue an incremental patch would be fine.  But given
> the small mistake in the patch description a resend with the fixed
> description might be in order here.

--linas

The current driver code performs 512 DMA mappings of a bunch of 
32-byte ring descriptor structures. This is silly, as they are 
all in contiguous memory. This patch changes the code to 
dma_alloc_coherent() each rx/tx ring as a whole.
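
The idea can be sketched in userspace as follows: allocate all
descriptors as one contiguous block, then derive each descriptor's
device-visible address by offset instead of mapping it individually.
This is only an illustration. demo_descr and demo_init_ring are
hypothetical names, calloc() stands in for dma_alloc_coherent(), and
bus_base stands in for the DMA handle the kernel call would return.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified descriptor; not the driver's real layout. */
struct demo_descr {
	uint64_t bus_addr;		/* address the device would use */
	struct demo_descr *next;
	struct demo_descr *prev;
};

/*
 * Allocate num descriptors as one contiguous, zeroed block and link
 * them into a circular list. Returns NULL if num is 0 or the
 * allocation fails; the caller releases the ring with free().
 */
static struct demo_descr *demo_init_ring(size_t num, uint64_t bus_base)
{
	struct demo_descr *ring;
	size_t i;

	if (num == 0)
		return NULL;

	ring = calloc(num, sizeof(*ring));
	if (!ring)
		return NULL;

	for (i = 0; i < num; i++) {
		/* One base address plus an offset replaces a per-descriptor mapping. */
		ring[i].bus_addr = bus_base + i * sizeof(*ring);
		ring[i].next = &ring[(i + 1) % num];
		ring[i].prev = &ring[(i + num - 1) % num];
	}
	return ring;
}
```

With a single allocation there is also a single failure point, which
is why the per-descriptor unmapping in the old error path can go away.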

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: James K Lewis [EMAIL PROTECTED]
Cc: Arnd Bergmann [EMAIL PROTECTED]


 drivers/net/spider_net.c |  101 +--
 drivers/net/spider_net.h |   17 +-
 drivers/net/spider_net_ethtool.c |4 -
 3 files changed, 52 insertions(+), 70 deletions(-)

Index: linux-2.6.19-git7/drivers/net/spider_net.c
===
--- linux-2.6.19-git7.orig/drivers/net/spider_net.c 2006-12-13 14:23:11.0 -0600
+++ linux-2.6.19-git7/drivers/net/spider_net.c  2006-12-14 11:02:59.0 -0600
@@ -280,72 +280,65 @@ spider_net_free_chain(struct spider_net_
 {
struct spider_net_descr *descr;
 
-   for (descr = chain->tail; !descr->bus_addr; descr = descr->next) {
-   pci_unmap_single(card->pdev, descr->bus_addr,
-SPIDER_NET_DESCR_SIZE, PCI_DMA_BIDIRECTIONAL);
+   descr = chain->ring;
+   do {
descr->bus_addr = 0;
-   }
+   descr->next_descr_addr = 0;
+   descr = descr->next;
+   } while (descr != chain->ring);
+
+   dma_free_coherent(&card->pdev->dev, chain->num_desc,
+   chain->ring, chain->dma_addr);
 }
 
 /**
- * spider_net_init_chain - links descriptor chain
+ * spider_net_init_chain - alloc and link descriptor chain
  * @card: card structure
  * @chain: address of chain
- * @start_descr: address of descriptor array
- * @no: number of descriptors
  *
- * we manage a circular list that mirrors the hardware structure,
+ * We manage a circular list that mirrors the hardware structure,
  * except that the hardware uses bus addresses.
  *
- * returns 0 on success, <0 on failure
+ * Returns 0 on success, <0 on failure
  */
 static int
 spider_net_init_chain(struct spider_net_card *card,
-  struct spider_net_descr_chain *chain,
-  struct spider_net_descr *start_descr,
-  int no)
+  struct spider_net_descr_chain *chain)
 {
int i;
struct spider_net_descr *descr;
dma_addr_t buf;
+   size_t alloc_size;
 
-   descr = start_descr;
-   memset(descr, 0, sizeof(*descr) * no);
+   alloc_size = chain->num_desc * sizeof (struct spider_net_descr);
 
-   /* set up the hardware pointers in each descriptor */
-   for (i=0; i<no; i++, descr++) {
-   descr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
+   chain->ring = dma_alloc_coherent(&card->pdev->dev, alloc_size,
+   &chain->dma_addr, GFP_KERNEL);
 
-   buf = pci_map_single(card->pdev, descr,
-SPIDER_NET_DESCR_SIZE,
-PCI_DMA_BIDIRECTIONAL);
+   if (!chain->ring)
+   return -ENOMEM;
 
-   if (pci_dma_mapping_error(buf))
-   goto iommu_error;
+   /* Set up the hardware pointers in each descriptor */
+   descr = chain->ring;
+   buf = chain->dma_addr;
+   for (i=0; i < chain->num_desc; i++, descr++) {
+   descr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
 
descr->bus_addr = buf;
+   descr->next_descr_addr = 0;
descr->next = descr + 1;
descr->prev = descr - 1;
 
+   buf += sizeof(struct spider_net_descr);
}
/* do actual circular list */
-   (descr-1)->next = start_descr;
-   start_descr->prev = descr-1;
+   (descr-1)->next = chain->ring;
+   chain->ring->prev = descr-1;
 
spin_lock_init(&chain->lock);
-   chain->head = start_descr;
-   chain->tail = start_descr;
-
+   chain->head = chain->ring;
+   chain->tail = chain->ring;
return 0;
-
-iommu_error:
-   descr = start_descr;
-   for (i=0; i < no; i++, descr++)
-   if (descr->bus_addr)
-   pci_unmap_single(card->pdev, descr->bus_addr,
-SPIDER_NET_DESCR_SIZE,
-PCI_DMA_BIDIRECTIONAL);
-   return -ENOMEM;
 }
 
 /**
@@ -707,7 +700,7 @@ 

[PATCH 1/4] net: make dev_kfree_skb_irq not inline

2006-12-14 Thread Stephen Hemminger
Move the dev_kfree_skb_irq function from netdevice.h to dev.c
for a couple of reasons. Primarily, I want to make softnet_data
local to dev.c; also, this function is already called from 300+ places.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.20-rc1.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc1/include/linux/netdevice.h
@@ -676,20 +676,7 @@ static inline int netif_running(const st
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.
  */
-static inline void dev_kfree_skb_irq(struct sk_buff *skb)
-{
-   if (atomic_dec_and_test(&skb->users)) {
-   struct softnet_data *sd;
-   unsigned long flags;
-
-   local_irq_save(flags);
-   sd = &__get_cpu_var(softnet_data);
-   skb->next = sd->completion_queue;
-   sd->completion_queue = skb;
-   raise_softirq_irqoff(NET_TX_SOFTIRQ);
-   local_irq_restore(flags);
-   }
-}
+extern void dev_kfree_skb_irq(struct sk_buff *skb);
 
 /* Use this variant in places where it could be invoked
  * either from interrupt or non-interrupt context.
--- linux-2.6.20-rc1.orig/net/core/dev.c
+++ linux-2.6.20-rc1/net/core/dev.c
@@ -1141,6 +1141,21 @@ void dev_kfree_skb_any(struct sk_buff *s
 }
 EXPORT_SYMBOL(dev_kfree_skb_any);
 
+void dev_kfree_skb_irq(struct sk_buff *skb)
+{
+   if (atomic_dec_and_test(&skb->users)) {
+   struct softnet_data *sd;
+   unsigned long flags;
+
+   local_irq_save(flags);
+   sd = &__get_cpu_var(softnet_data);
+   skb->next = sd->completion_queue;
+   sd->completion_queue = skb;
+   raise_softirq_irqoff(NET_TX_SOFTIRQ);
+   local_irq_restore(flags);
+   }
+}
+EXPORT_SYMBOL(dev_kfree_skb_irq);
 
 /* Hot-plugging. */
 void netif_device_detach(struct net_device *dev)

-- 



[PATCH 2/4] net: uninline netif_rx_reschedule

2006-12-14 Thread Stephen Hemminger
Move netif_rx_reschedule out of line, so that softnet_data can be
made local.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- linux-2.6.20-rc1.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc1/include/linux/netdevice.h
@@ -851,21 +851,7 @@ static inline void netif_rx_schedule(str
 /* Try to reschedule poll. Called by dev->poll() after netif_rx_complete().
  * Do not inline this?
  */
-static inline int netif_rx_reschedule(struct net_device *dev, int undo)
-{
-   if (netif_rx_schedule_prep(dev)) {
-   unsigned long flags;
-
-   dev->quota += undo;
-
-   local_irq_save(flags);
-   list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
-   __raise_softirq_irqoff(NET_RX_SOFTIRQ);
-   local_irq_restore(flags);
-   return 1;
-   }
-   return 0;
-}
+extern int netif_rx_reschedule(struct net_device *dev, int undo);
 
 /* Remove interface from poll list: it must be in the poll list
  * on current cpu. This primitive is called by dev->poll(), when
--- linux-2.6.20-rc1.orig/net/core/dev.c
+++ linux-2.6.20-rc1/net/core/dev.c
@@ -1132,6 +1132,23 @@ void __netif_rx_schedule(struct net_devi
 }
 EXPORT_SYMBOL(__netif_rx_schedule);
 
+int netif_rx_reschedule(struct net_device *dev, int undo)
+{
+   if (netif_rx_schedule_prep(dev)) {
+   unsigned long flags;
+
+   dev->quota += undo;
+
+   local_irq_save(flags);
+   list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
+   __raise_softirq_irqoff(NET_RX_SOFTIRQ);
+   local_irq_restore(flags);
+   return 1;
+   }
+   return 0;
+}
+EXPORT_SYMBOL(netif_rx_reschedule);
+
 void dev_kfree_skb_any(struct sk_buff *skb)
 {
if (in_irq() || irqs_disabled())

-- 



NAPI wait before enabling irq's [was Re: [Cbe-oss-dev] Spider DMA wrongness]

2006-12-14 Thread Linas Vepstas
On Wed, Nov 08, 2006 at 07:38:12AM +1100, Benjamin Herrenschmidt wrote:
> 
> What about Linas patches to do interrupt mitigation with NAPI polling ?
> That didn't end up working ?

It seems to be working as designed, which is different than working
as naively expected.

For large packets: 
-- a packet comes in
-- rx interrupt generated
-- rx interrupts turned off
-- tcp poll function runs, receives packet
-- completes all work before next packet has arrived, 
   so interrupts are turned back on.
-- go to start

This results in a high number of interrupts, and a high cpu usage.
We were able to prove that napi works by stalling in the poll function
just long enough to allow the next packet to arrive.  In this case, 
napi works great, and number of irqs is vastly reduced. 

Unfortunately, I could not figure out any simple way of turning this
into acceptable code.  I can't just wait a little bit before turning 
on interrupts.  Some network apps, such as netpipe, want to receive 
something before sending the next thing. Without the interrupt, the
packet just sits there, and the OS doesn't realize (until milliseconds
later) that there's a packet that can be handled.  This is a variant
of the so-called rotting packet discussed in the napi docs.

What is needed is for the tcp stack to wait for 
  1500Bytes / (1Gbit/sec) = 12 microsecs
and then poll again. If there are *still* no new packets, then and
only then do we re-enable interrupts. This would require a new napi.
Presuming the network stack folks find it even remotely acceptable.
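
The 12 microsecond figure is the time one full-sized frame occupies the
wire. A small sketch of that arithmetic, where wire_time_ns is a
hypothetical helper name and framing overhead (preamble, inter-frame
gap) is ignored as in the estimate above:

```c
#include <stdint.h>

/* Time in nanoseconds needed to serialize frame_bytes at link_bps bits/sec. */
static uint64_t wire_time_ns(uint64_t frame_bytes, uint64_t link_bps)
{
	return frame_bytes * 8u * 1000000000ull / link_bps;
}
```

wire_time_ns(1500, 1000000000) gives 12000 ns, i.e. 12 microseconds,
which is roughly how long the poll routine would need to wait before a
second look at the ring can be expected to find the next packet.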

--linas



[PATCH 4/4] net: rearrange functions in netdevice.h

2006-12-14 Thread Stephen Hemminger
Use existing inline functions rather than having multiple
copies of same code.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- linux-2.6.20-rc1.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc1/include/linux/netdevice.h
@@ -615,9 +615,14 @@ static inline int unregister_gifconf(uns
 
 extern void __netif_schedule(struct net_device *dev);
 
+static inline int netif_queue_stopped(const struct net_device *dev)
+{
+   return test_bit(__LINK_STATE_XOFF, &dev->state);
+}
+
 static inline void netif_schedule(struct net_device *dev)
 {
-   if (!test_bit(__LINK_STATE_XOFF, &dev->state))
+   if (!netif_queue_stopped(dev))
__netif_schedule(dev);
 }
 
@@ -645,11 +650,6 @@ static inline void netif_stop_queue(stru
set_bit(__LINK_STATE_XOFF, &dev->state);
 }
 
-static inline int netif_queue_stopped(const struct net_device *dev)
-{
-   return test_bit(__LINK_STATE_XOFF, &dev->state);
-}
-
 static inline int netif_running(const struct net_device *dev)
 {
return test_bit(__LINK_STATE_START, &dev->state);
@@ -841,15 +841,20 @@ extern int netif_rx_reschedule(struct ne
  * it completes the work. The device cannot be out of poll list at this
  * moment, it is BUG().
  */
-static inline void netif_rx_complete(struct net_device *dev)
+static inline void __netif_rx_complete(struct net_device *dev)
 {
-   unsigned long flags;
-
-   local_irq_save(flags);
BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));
list_del(&dev->poll_list);
smp_mb__before_clear_bit();
clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
+}
+
+static inline void netif_rx_complete(struct net_device *dev)
+{
+   unsigned long flags;
+
+   local_irq_save(flags);
+   __netif_rx_complete(dev);
local_irq_restore(flags);
 }
 
@@ -865,17 +870,6 @@ static inline void netif_poll_enable(str
clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
 }
 
-/* same as netif_rx_complete, except that local_irq_save(flags)
- * has already been issued
- */
-static inline void __netif_rx_complete(struct net_device *dev)
-{
-   BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));
-   list_del(&dev->poll_list);
-   smp_mb__before_clear_bit();
-   clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
-}
-
 static inline void netif_tx_lock(struct net_device *dev)
 {
spin_lock(&dev->_xmit_lock);

-- 



[PATCH 3/4] net: move softnet_data

2006-12-14 Thread Stephen Hemminger
Make softnet_data local to dev.c.  

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- linux-2.6.20-rc1.orig/include/linux/netdevice.h
+++ linux-2.6.20-rc1/include/linux/netdevice.h
@@ -600,6 +600,9 @@ extern int  dev_restart(struct net_devic
 #ifdef CONFIG_NETPOLL_TRAP
 extern int netpoll_trap(void);
 #endif
+#ifdef CONFIG_NETPOLL
+extern void netpoll_do_completion(void);
+#endif
 
 typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int len);
 extern int register_gifconf(unsigned int family, gifconf_func_t * gifconf);
@@ -608,26 +611,6 @@ static inline int unregister_gifconf(uns
return register_gifconf(family, NULL);
 }
 
-/*
- * Incoming packets are placed on per-cpu queues so that
- * no locking is needed.
- */
-
-struct softnet_data
-{
-   struct net_device   *output_queue;
-   struct sk_buff_head input_pkt_queue;
-   struct list_head    poll_list;
-   struct sk_buff  *completion_queue;
-
-   struct net_device   backlog_dev;    /* Sorry. 8) */
-#ifdef CONFIG_NET_DMA
-   struct dma_chan *net_dma;
-#endif
-};
-
-DECLARE_PER_CPU(struct softnet_data,softnet_data);
-
 #define HAVE_NETIF_QUEUE
 
 extern void __netif_schedule(struct net_device *dev);
--- linux-2.6.20-rc1.orig/net/core/dev.c
+++ linux-2.6.20-rc1/net/core/dev.c
@@ -203,10 +203,23 @@ static inline struct hlist_head *dev_ind
 static RAW_NOTIFIER_HEAD(netdev_chain);
 
 /*
- * Device drivers call our routines to queue packets here. We empty the
- * queue in the local softnet handler.
+ * Incoming packets are placed on per-cpu queues so that
+ * no locking is needed.
  */
-DEFINE_PER_CPU(struct softnet_data, softnet_data) = { NULL };
+struct softnet_data
+{
+   struct net_device   *output_queue;
+   struct sk_buff_head input_pkt_queue;
+   struct list_head    poll_list;
+   struct sk_buff  *completion_queue;
+
+   struct net_device   backlog_dev;    /* Sorry. 8) */
+#ifdef CONFIG_NET_DMA
+   struct dma_chan *net_dma;
+#endif
+};
+
+static DEFINE_PER_CPU(struct softnet_data, softnet_data);
 
 #ifdef CONFIG_SYSFS
 extern int netdev_sysfs_init(void);
@@ -1673,6 +1686,34 @@ static inline struct net_device *skb_bon
return dev;
 }
 
+#ifdef CONFIG_NETPOLL
+void netpoll_do_completion(void)
+{
+   unsigned long flags;
+   struct softnet_data *sd = &get_cpu_var(softnet_data);
+
+   if (sd->completion_queue) {
+   struct sk_buff *clist;
+
+   local_irq_save(flags);
+   clist = sd->completion_queue;
+   sd->completion_queue = NULL;
+   local_irq_restore(flags);
+
+   while (clist != NULL) {
+   struct sk_buff *skb = clist;
+   clist = clist->next;
+   if (skb->destructor)
+   dev_kfree_skb_any(skb); /* put this one back */
+   else
+   __kfree_skb(skb);
+   }
+   }
+
+   put_cpu_var(softnet_data);
+}
+#endif
+
 static void net_tx_action(struct softirq_action *h)
 {
struct softnet_data *sd = &__get_cpu_var(softnet_data);
--- linux-2.6.20-rc1.orig/net/core/netpoll.c
+++ linux-2.6.20-rc1/net/core/netpoll.c
@@ -47,7 +47,6 @@ static atomic_t trapped;
(MAX_UDP_CHUNK + sizeof(struct udphdr) + \
sizeof(struct iphdr) + sizeof(struct ethhdr))
 
-static void zap_completion_queue(void);
 static void arp_reply(struct sk_buff *skb);
 
 static void queue_process(struct work_struct *work)
@@ -162,7 +161,7 @@ void netpoll_poll(struct netpoll *np)
 
service_arp_queue(np->dev->npinfo);
 
-   zap_completion_queue();
+   netpoll_do_completion();
 }
 
 static void refill_skbs(void)
@@ -181,7 +180,7 @@ static void refill_skbs(void)
spin_unlock_irqrestore(&skb_pool.lock, flags);
 }
 
-static void zap_completion_queue(void)
+static void netpoll_do_completion(void)
 {
unsigned long flags;
struct softnet_data *sd = &get_cpu_var(softnet_data);
@@ -212,7 +211,7 @@ static struct sk_buff *find_skb(struct n
int count = 0;
struct sk_buff *skb;
 
-   zap_completion_queue();
+   netpoll_do_completion();
refill_skbs();
 repeat:
 

-- 



[PATCH 0/4] network device interface cleanups

2006-12-14 Thread Stephen Hemminger
This set of patches makes softnet_data local to dev.c and
does some code cleanups, no API changes.

-- 



Re: NAPI wait before enabling irq's [was Re: [Cbe-oss-dev] Spider DMA wrongness]

2006-12-14 Thread akepner

On Thu, 14 Dec 2006, Linas Vepstas wrote:

> On Wed, Nov 08, 2006 at 07:38:12AM +1100, Benjamin Herrenschmidt wrote:
> >
> > What about Linas patches to do interrupt mitigation with NAPI polling ?
> > That didn't end up working ?
>
> It seems to be working as designed, which is different than working
> as naively expected.
>
> For large packets:
> -- a packet comes in
> -- rx interrupt generated
> -- rx interrupts turned off
> -- tcp poll function runs, receives packet
> -- completes all work before next packet has arrived,
>    so interrupts are turned back on.
> -- go to start
>
> This results in a high number of interrupts, and a high cpu usage.
> We were able to prove that napi works by stalling in the poll function
> just long enough to allow the next packet to arrive.  In this case,
> napi works great, and number of irqs is vastly reduced.



This sounds awfully familiar. We went through the same
with the tg3 driver on Altix. In that case we succeeded
getting interrupt coalescence added to the driver, which
ended up working pretty well for us. See the thread
beginning with:

http://oss.sgi.com/archives/netdev/2005-05/msg00497.html

if you're interested.

As for the stalling NAPI idea, Jamal did a bit of work
with that idea and wrote it up in:

www.kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf

--
Arthur




Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Stephen Hemminger
On Thu, 14 Dec 2006 12:47:05 -0800
Alex Romosan [EMAIL PROTECTED] wrote:

 under heavy network load the sky2 driver (compiled in the kernel)
 locks up and the only way i can get the network back is to reboot the
 machine (bringing the network down and back up again doesn't help).
 this happens on an amd64 machine (athlon 3500+ processor) and the card
 in question is a Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
 Ethernet Controller (rev 15) (from lspci). this is what i see in the
 syslog:
 
 kernel: sky2 eth0: rx error, status 0x414a414a length 0
 kernel: eth0: hw csum failure.
 kernel: 
 kernel: Call Trace:
 kernel:  IRQ  [8044681c] __skb_checksum_complete+0x4d/0x66
 kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
 kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
 kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
 kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
 kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
 kernel:  [8044905d] netif_receive_skb+0x184/0x20e
 kernel:  [803de8e5] sky2_poll+0x68f/0x93c
 kernel:  [802219ce] scheduler_tick+0x23/0x2f9
 kernel:  [8044a796] net_rx_action+0x61/0xf0
 kernel:  [8022a35f] __do_softirq+0x40/0x8a
 kernel:  [8020a3cc] call_softirq+0x1c/0x28
 kernel:  [8020bbf0] do_softirq+0x2c/0x7d
 kernel:  [8022a313] irq_exit+0x36/0x42
 kernel:  [8020bebe] do_IRQ+0x8c/0x9e
 kernel:  [80208710] default_idle+0x0/0x3a
 kernel:  [80209bf1] ret_from_intr+0x0/0xa
 kernel:  EOI  [80208736] default_idle+0x26/0x3a
 kernel:  [8020878c] cpu_idle+0x42/0x75
 kernel:  [805df675] start_kernel+0x1ce/0x1d3
 kernel:  [805df140] _sinittext+0x140/0x144
 kernel: 
 kernel: eth0: hw csum failure.
 kernel: 
 kernel: Call Trace:
 kernel:  IRQ  [8044681c] __skb_checksum_complete+0x4d/0x66
 kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
 kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
 kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
 kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
 kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
 kernel:  [8044905d] netif_receive_skb+0x184/0x20e
 kernel:  [803de8e5] sky2_poll+0x68f/0x93c
 kernel:  [80474647] tcp_delack_timer+0x0/0x1b5
 kernel:  [8044a796] net_rx_action+0x61/0xf0
 kernel:  [8022a35f] __do_softirq+0x40/0x8a
 kernel:  [8020a3cc] call_softirq+0x1c/0x28
 kernel:  [8020bbf0] do_softirq+0x2c/0x7d
 kernel:  [8022a313] irq_exit+0x36/0x42
 kernel:  [8020bebe] do_IRQ+0x8c/0x9e
 kernel:  [80209bf1] ret_from_intr+0x0/0xa
 kernel:  EOI  [802a8402] inode2sd+0x104/0x117
 kernel:  [802b8cfa] search_by_key+0xa08/0xbfe
 kernel:  [802b8475] search_by_key+0x183/0xbfe
 kernel:  [80284778] ll_rw_block+0x89/0x9e
 kernel:  [802b8475] search_by_key+0x183/0xbfe
 kernel:  [80283cf5] __find_get_block_slow+0x101/0x10d
 kernel:  [80284053] __find_get_block+0x197/0x1a5
 kernel:  [8026800c] inode_get_bytes+0x2a/0x52
 kernel:  [802a89f1] reiserfs_update_sd_size+0x7e/0x284
 kernel:  [80237700] kthread+0xed/0xfd
 kernel:  [802be990] do_journal_end+0x34b/0xbdd
 kernel:  [802b1729] reiserfs_dirty_inode+0x56/0x76
 kernel:  [80284c19] block_prepare_write+0x1a/0x24
 kernel:  [802809b1] __mark_inode_dirty+0x29/0x197
 kernel:  [802a8d04] reiserfs_commit_write+0x10d/0x19f
 kernel:  [80284c19] block_prepare_write+0x1a/0x24
 kernel:  [802484fc] generic_file_buffered_write+0x4ad/0x6c4
 kernel:  [80271b3c] __pollwait+0x0/0xe0
 kernel:  [8022a006] current_fs_time+0x35/0x3b
 kernel:  [80248a8c] __generic_file_aio_write_nolock+0x379/0x3ec
 kernel:  [8049baca] unix_dgram_recvmsg+0x1be/0x1d9
 kernel:  [804b6516] __mutex_lock_slowpath+0x205/0x210
 kernel:  [80248b60] generic_file_aio_write+0x61/0xc1
 kernel:  [80248aff] generic_file_aio_write+0x0/0xc1
 kernel:  [80264e57] do_sync_readv_writev+0xc0/0x107
 kernel:  [802377f7] autoremove_wake_function+0x0/0x2e
 kernel:  [80229d16] getnstimeofday+0x10/0x28
 kernel:  [80264ced] rw_copy_check_uvector+0x6c/0xdc
 kernel:  [802654f7] do_readv_writev+0xb2/0x18b
 kernel:  [80265a2c] sys_writev+0x45/0x93
 kernel:  [802096de] system_call+0x7e/0x83
 
 and so on. some times i don't get this trace but instead i get:
 
 kernel: sky2 eth0: tx timeout
 kernel: sky2 eth0: transmit ring 140 .. 99 report=181 done=181
 kernel: sky2 status report lost?
 kernel: NETDEV WATCHDOG: eth0: transmit timed out
 kernel: sky2 eth0: tx timeout
 kernel: sky2 eth0: transmit ring 181 .. 140 report=181 done=181
 kernel: sky2 hardware hung? flushing
 
 but the end result is the same, the network card stops responding and
 i have to reboot the machine. i can reproduce this on a consistent
 basis so if there are any 

Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Alex Romosan
Stephen Hemminger [EMAIL PROTECTED] writes:

 On Thu, 14 Dec 2006 12:47:05 -0800
 Alex Romosan [EMAIL PROTECTED] wrote:

 under heavy network load the sky2 driver (compiled in the kernel)
 locks up and the only way i can get the network back is to reboot the
 machine (bringing the network down and back up again doesn't help).
 this happens on an amd64 machine (athlon 3500+ processor) and the card
 in question is a Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
 Ethernet Controller (rev 15) (from lspci). this is what i see in the
 syslog:
 
 kernel: sky2 eth0: rx error, status 0x414a414a length 0
 kernel: eth0: hw csum failure.
 kernel: 
 kernel: Call Trace:
 kernel:  IRQ  [8044681c] __skb_checksum_complete+0x4d/0x66
 kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
 kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
 kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
 kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
 kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
 kernel:  [8044905d] netif_receive_skb+0x184/0x20e
 kernel:  [803de8e5] sky2_poll+0x68f/0x93c
 kernel:  [802219ce] scheduler_tick+0x23/0x2f9
 kernel:  [8044a796] net_rx_action+0x61/0xf0
 kernel:  [8022a35f] __do_softirq+0x40/0x8a
 kernel:  [8020a3cc] call_softirq+0x1c/0x28
 kernel:  [8020bbf0] do_softirq+0x2c/0x7d
 kernel:  [8022a313] irq_exit+0x36/0x42
 kernel:  [8020bebe] do_IRQ+0x8c/0x9e
 kernel:  [80208710] default_idle+0x0/0x3a
 kernel:  [80209bf1] ret_from_intr+0x0/0xa
 kernel:  EOI  [80208736] default_idle+0x26/0x3a
 kernel:  [8020878c] cpu_idle+0x42/0x75
 kernel:  [805df675] start_kernel+0x1ce/0x1d3
 kernel:  [805df140] _sinittext+0x140/0x144
 kernel: 
 kernel: eth0: hw csum failure.
 kernel: 
 kernel: Call Trace:
 kernel:  IRQ  [8044681c] __skb_checksum_complete+0x4d/0x66
 kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
 kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
 kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
 kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
 kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
 kernel:  [8044905d] netif_receive_skb+0x184/0x20e
 kernel:  [803de8e5] sky2_poll+0x68f/0x93c
 kernel:  [80474647] tcp_delack_timer+0x0/0x1b5
 kernel:  [8044a796] net_rx_action+0x61/0xf0
 kernel:  [8022a35f] __do_softirq+0x40/0x8a
 kernel:  [8020a3cc] call_softirq+0x1c/0x28
 kernel:  [8020bbf0] do_softirq+0x2c/0x7d
 kernel:  [8022a313] irq_exit+0x36/0x42
 kernel:  [8020bebe] do_IRQ+0x8c/0x9e
 kernel:  [80209bf1] ret_from_intr+0x0/0xa
 kernel:  EOI  [802a8402] inode2sd+0x104/0x117
 kernel:  [802b8cfa] search_by_key+0xa08/0xbfe
 kernel:  [802b8475] search_by_key+0x183/0xbfe
 kernel:  [80284778] ll_rw_block+0x89/0x9e
 kernel:  [802b8475] search_by_key+0x183/0xbfe
 kernel:  [80283cf5] __find_get_block_slow+0x101/0x10d
 kernel:  [80284053] __find_get_block+0x197/0x1a5
 kernel:  [8026800c] inode_get_bytes+0x2a/0x52
 kernel:  [802a89f1] reiserfs_update_sd_size+0x7e/0x284
 kernel:  [80237700] kthread+0xed/0xfd
 kernel:  [802be990] do_journal_end+0x34b/0xbdd
 kernel:  [802b1729] reiserfs_dirty_inode+0x56/0x76
 kernel:  [80284c19] block_prepare_write+0x1a/0x24
 kernel:  [802809b1] __mark_inode_dirty+0x29/0x197
 kernel:  [802a8d04] reiserfs_commit_write+0x10d/0x19f
 kernel:  [80284c19] block_prepare_write+0x1a/0x24
 kernel:  [802484fc] generic_file_buffered_write+0x4ad/0x6c4
 kernel:  [80271b3c] __pollwait+0x0/0xe0
 kernel:  [8022a006] current_fs_time+0x35/0x3b
 kernel:  [80248a8c] __generic_file_aio_write_nolock+0x379/0x3ec
 kernel:  [8049baca] unix_dgram_recvmsg+0x1be/0x1d9
 kernel:  [804b6516] __mutex_lock_slowpath+0x205/0x210
 kernel:  [80248b60] generic_file_aio_write+0x61/0xc1
 kernel:  [80248aff] generic_file_aio_write+0x0/0xc1
 kernel:  [80264e57] do_sync_readv_writev+0xc0/0x107
 kernel:  [802377f7] autoremove_wake_function+0x0/0x2e
 kernel:  [80229d16] getnstimeofday+0x10/0x28
 kernel:  [80264ced] rw_copy_check_uvector+0x6c/0xdc
 kernel:  [802654f7] do_readv_writev+0xb2/0x18b
 kernel:  [80265a2c] sys_writev+0x45/0x93
 kernel:  [802096de] system_call+0x7e/0x83
 
 and so on. some times i don't get this trace but instead i get:
 
 kernel: sky2 eth0: tx timeout
 kernel: sky2 eth0: transmit ring 140 .. 99 report=181 done=181
 kernel: sky2 status report lost?
 kernel: NETDEV WATCHDOG: eth0: transmit timed out
 kernel: sky2 eth0: tx timeout
 kernel: sky2 eth0: transmit ring 181 .. 140 report=181 done=181
 kernel: sky2 hardware hung? flushing
 
 Please report these problems to netdev@vger.kernel.org, I rarely go
 looking in LKML.

 These are the things you need to 

Re: [PATCH 4/4][SCTP]: Change adaption -> adaptation as per the latest API draft.

2006-12-14 Thread Sridhar Samudrala
On Wed, 2006-12-13 at 18:03 -0800, David Miller wrote:
 From: Sridhar Samudrala [EMAIL PROTECTED]
 Date: Wed, 13 Dec 2006 17:38:52 -0800
 
  These parameters are not used by user-space apps. They define the
  parameters used by the protocol in SCTP headers that go on wire.
 
 There is no __KERNEL__ ifdef protection for these defines,
 and the linux/sctp.h header is exported to userspace via
 include/linux/Kbuild, therefore the interface is exposed to
 userspace and you cannot break it.

I didn't know that all the files under include/linux are exported
to userspace.
AFAIK, i don't think there are any SCTP apps that use this file.

But if you say that we shouldn't remove any of the APIs in 
linux/sctp.h, i am OK with keeping the existing ones and adding
new ones with the changed name.

Now that 2.6.20-rc1 is out, should i wait until 2.6.21 tree to open
for re-submission or is there a window still open for 2.6.20?

Thanks
Sridhar

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Stephen Hemminger
On Thu, 14 Dec 2006 14:25:06 -0800
Alex Romosan [EMAIL PROTECTED] wrote:

 Stephen Hemminger [EMAIL PROTECTED] writes:
 
  4) What is the IRQ routing?
 There are two issues here, first the driver will never work with edge
 trigger IRQ's, some motherboards also have busted BIOS and chipsets
 that don't do MSI properly. A couple of module parameters are available
 to help:
disable_msi=1 avoids using MSI
idle_timeout=10   polls for lost IRQ's every N ms (10)
 
 it didn't take long to lock up the machine again. i've rebooted back
 into stock 2.6.20-rc1 and added the two module parameters above. cat
 /proc/interrupts now gives me:
 
  17:203   IO-APIC-fasteoi   eth0, CMI8738
 
 so i guess the MSI interrupts are disabled. we'll see how this works.

probably won't do much but now the IRQ ends up shared.

  5) What are the messages in the console log when problem happens?
 
 kernel: NETDEV WATCHDOG: eth0: transmit timed out
 kernel: sky2 eth0: tx timeout
 kernel: sky2 eth0: transmit ring 402 .. 361 report=406 done=406
 kernel: sky2 status report lost?

The transmit timeout code tries to be smart, but doesn't really
recover properly if hardware is stuck.


  7) Please get a current version of ethtool from:
 git://git.kernel.org/pub/scm/network/ethtool/ethtool.git
 and run ethtool register dump after a problem occurs:
ethtool -d eth0
 
 this is the output after it stopped working:
 
 
 PCI config
 --
 00: ab 11 62 43 07 04 18 00 15 00 00 02 08 00 00 00
 10: 04 c0 df fd 00 00 00 00 01 ce 00 00 00 00 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 8c 05
 30: 00 00 00 00 48 00 00 00 00 00 00 00 03 01 00 00
 40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 14
 50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 83 00
 60: 0c 10 e0 fe 00 00 00 00 61 41 00 00 00 00 00 00
 70: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 
 Control Registers
 -
 Register Access Port 0x00
 LED Control/Status   0xA603164A
 Interrupt Source 0x4000
 Interrupt Mask   0xC01D
 Interrupt Hardware Error Source  0x
 Interrupt Hardware Error Mask0x2E003F3F
 
 Bus Management Unit
 ---
 CSR Receive Queue 1  0x0001
 CSR Sync Queue 1 0x
 CSR Async Queue 10x
 
 MAC Addresses
 ---
 Addr 100 11 09 DA 39 A3
 Addr 200 11 09 DA 39 A3
 Addr 300 00 00 00 00 00
 
 Connector type   0x4A (J)
 PMD type 0x54 (T)
 PHY type 0x80
 Chip Id  0xB6 Yukon-2 EC
  (rev 0)
 Ram Buffer   0x0C
 
 Status BMU:
 ---
 Control0x0002220A
 Last Index 0x07FF
 Put Index  0x0601
 List Address   0x7FBF8000
 Transmit 1 done index  0x0196
 Transmit index threshold   0x000A
 
 Status FIFO
   Write Pointer0x16
   Read Pointer 0x16
   Level0x00
   Watermark0x10
   ISR Watermark0x10
 Status level
   Init 0x30D4 Value 0x0D00
   Test 0x04   Control 0x02
 TX status
   Init 0x0001E848 Value 0x0001E848
   Test 0x04   Control 0x02
 ISR
   Init 0x09C4 Value 0x09C4
   Test 0x04   Control 0x02
 
 GMAC control 0x005A
 GPHY control 0x2002
 LINK control 0x02
 
 GMAC 1
 Status   0xD000
 Control  0x1800
 Transmit 0x1000
 Receive  0xE000
 Transmit flow control0x
 Transmit parameter   0xD7C4
 Serial mode  0x221E
   Source address:  00 11 09 DA 39 A3
 Physical address:  00 11 09 DA 39 A3
 
 Rx GMAC 1
 End Address  0x007F
 Almost Full Thresh   0x0070
 Control/Test 0x0900228A
 FIFO Flush Mask  0x18FB
 FIFO Flush Threshold 0x000B
 Truncation Threshold 0x017C
 Upper Pause Threshold0x
 Lower Pause Threshold0x0081
 VLAN Tag 0x0074
 FIFO Write Pointer   0x
 FIFO Write Level 0x007B
 FIFO Read Pointer0x
 FIFO Read Level  0x0079
 
 Tx GMAC 1
 End Address  0x007F
 Almost Full Thresh   0x0010
 Control/Test 0x0102220A
 FIFO Flush Mask  0x
 FIFO Flush Threshold 0x
 Truncation Threshold 0x
 Upper Pause Threshold0x
 Lower Pause Threshold0x0081
 VLAN Tag 

Re: NAPI wait before enabling irq's [was Re: [Cbe-oss-dev] Spider DMA wrongness]

2006-12-14 Thread Linas Vepstas
On Thu, Dec 14, 2006 at 12:51:14PM -0800, [EMAIL PROTECTED] wrote:
 On Thu, 14 Dec 2006, Linas Vepstas wrote:
 
 On Wed, Nov 08, 2006 at 07:38:12AM +1100, Benjamin Herrenschmidt wrote:
 
 What about Linas patches to do interrupt mitigation with NAPI polling ?
 That didn't end up working ?
 
 It seems to be working as designed, which is different than working
 as naively expected.
 
 For large packets:
 -- a packet comes in
 -- rx interrupt generated
 -- rx interrupts turned off
 -- tcp poll function runs, receives packet
 -- completes all work before next packet has arrived,
    so interrupts are turned back on.
 -- go to start
 
 This results in a high number of interrupts, and a high cpu usage.
 We were able to prove that napi works by stalling in the poll function
 just long enough to allow the next packet to arrive.  In this case,
 napi works great, and number of irqs is vastly reduced.
 
 
 This sounds awfully familiar. We went through the same
 with the tg3 driver on Altix. In that case we succeeded
 getting interrupt coalescence added to the driver, which
 ended up working pretty well for us. See the thread
 beginning with:
 
 http://oss.sgi.com/archives/netdev/2005-05/msg00497.html
 
 if you're interested.

I'm interested. The tg3 seems to have hardware coalescing,
which, from what I can tell, is a way of delaying an RX
interrupt for some number of microseconds?  I assume there's 
nothing more to it than that?
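
To make the effect of such a delay concrete, here is a rough model in C.
It is only a sketch under stated assumptions: packets arrive at a fixed
gap, the poll function drains the queue instantly, and the device holds
the RX interrupt back by a fixed delay after the first pending packet.
The function name and parameters are made up for illustration and come
from neither the tg3 nor the spider driver.

```c
/*
 * Count RX interrupts for n packets arriving every gap_us microseconds
 * when the device delays each interrupt by delay_us after the first
 * pending packet. Assumes gap_us > 0 and an instantaneous poll.
 */
static unsigned count_irqs(unsigned n, unsigned gap_us, unsigned delay_us)
{
    unsigned irqs = 0;
    unsigned i = 0;

    while (i < n) {
        /* Packets that have arrived by the time the interrupt fires. */
        unsigned batch = 1 + delay_us / gap_us;

        irqs++;          /* one interrupt services the whole batch */
        i += batch;
    }
    return irqs;
}
```

With no delay every packet costs one interrupt, which matches the
behaviour described for large packets; with a delay of nine packet gaps
the same traffic needs a tenth of the interrupts.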

The spider has some suggestively named registers and functions,
hinting that it can similarly delay an RX interrupt, but the 
docs are opaque and mysteriously worded, so I cannot really tell.

Perhaps Ishizaki Kou can clue us in? 

 As for the stalling NAPI idea, Jamal did a bit of work
 with that idea and wrote it up in:
 
 www.kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf

Reading now ... 

--linas


Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Alex Romosan
Stephen Hemminger [EMAIL PROTECTED] writes:

 Another useful bit of information is the statistics (ethtool -S eth0).
 When there were flow control bugs, they would show up as count of 1.

we'll see if the machine locks up again.

 Are you doing jumbo frames (MTU > 1500)?

no (or at least i don't think so). how can i tell?

assuming the machine doesn't lock up with msi interrupts disabled, do
you want me to do anything to debug why the driver locks up when the
msi interrupts are enabled?

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |


Re: [PATCH 4/4][SCTP]: Change adaption -> adaptation as per the latest API draft.

2006-12-14 Thread David Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 14:22:16 -0800

 On Wed, 2006-12-13 at 18:03 -0800, David Miller wrote:
  From: Sridhar Samudrala [EMAIL PROTECTED]
  Date: Wed, 13 Dec 2006 17:38:52 -0800
  
   These parameters are not used by user-space apps. They define the
   parameters used by the protocol in SCTP headers that go on wire.
  
  There is no __KERNEL__ ifdef protection for these defines,
  and the linux/sctp.h header is exported to userspace via
  include/linux/Kbuild, therefore the interface is exposed to
  userspace and you cannot break it.
 
 I didn't know that all the files under include/linux are exported
 to userspace.

Not all of them, only select ones specified in the Kbuild file.

If these structures and defines are meant for kernel-only, or only
partially so, you should either annotate linux/sctp.h with
appropriate __KERNEL__ ifdefs, or remove the header file from
include/linux/Kbuild

You cannot remove the file from Kbuild if it somehow is required
by your SCTP user.h header file, for example.
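
As a small illustration of the first option, a header can keep its
exported part stable and hide the rest behind __KERNEL__. The header
name and both macros below are invented for this sketch and are not
taken from linux/sctp.h.

```c
/* Hypothetical header; the names are illustrative only. */
#ifndef EXAMPLE_PROTO_H
#define EXAMPLE_PROTO_H

/* Visible to userspace: part of the exported interface, must stay stable. */
#define EXAMPLE_PARAM_OLD_NAME 1

#ifdef __KERNEL__
/* Kernel-only: applications never see this, so it is free to change. */
#define EXAMPLE_INTERNAL_LIMIT 32
#endif /* __KERNEL__ */

#endif /* EXAMPLE_PROTO_H */
```

A userspace build, which does not define __KERNEL__, sees only the first
macro.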


Re: [PATCH 1/4] net: make dev_kfree_skb_irq not inline

2006-12-14 Thread David Miller
From: Christoph Hellwig [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 22:30:09 +

 Maybe you should only move the slowpath out of line ala:
 
 static inline void dev_kfree_skb_irq(struct sk_buff *skb)
 {
 	if (atomic_dec_and_test(&skb->users))
 		__dev_kfree_skb_irq(skb);
 }

The atomic operation all by itself is either a function
call or a 6-7 instruction sequence, so the inlining doesn't
make sense even in this case.
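
For readers unfamiliar with the pattern under discussion, here is a
userspace sketch of "inline fast path, out-of-line slow path" using C11
atomics. The struct and function names are hypothetical stand-ins, not
the kernel's sk_buff API. The objection above still applies: the atomic
decrement is itself the expensive part, so inlining it may gain little.

```c
#include <stdatomic.h>

struct buf {                 /* hypothetical stand-in for struct sk_buff */
    atomic_int users;        /* reference count */
    int freed;               /* set once the slow path has run */
};

/* Slow path: runs only when the last reference is dropped. */
static void buf_release_slow(struct buf *b)
{
    b->freed = 1;
}

/* Fast path: one atomic decrement, then a rarely taken call. */
static inline void buf_put(struct buf *b)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&b->users, 1) == 1)
        buf_release_slow(b);
}
```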


Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Alex Romosan
Stephen Hemminger [EMAIL PROTECTED] writes:

 Another useful bit of information is the statistics (ethtool -S eth0).
 When there were flow control bugs, they would show up as count of 1.

the driver locked up again, even with msi interrupts disabled and
idle_timeout=10. the console message was pretty much as before:

kernel: NETDEV WATCHDOG: eth0: transmit timed out
kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 336 .. 296 report=336 done=336
kernel: sky2 hardware hung? flushing
kernel: NETDEV WATCHDOG: eth0: transmit timed out
kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 296 .. 255 report=336 done=336
kernel: sky2 status report lost?

and this is the output from ethtool -S:

NIC statistics:
 tx_bytes: 3092123897
 rx_bytes: 546577898
 tx_broadcast: 20
 rx_broadcast: 4376
 tx_multicast: 0
 rx_multicast: 459
 tx_unicast: 2585993
 rx_unicast: 1550758
 tx_mac_pause: 1
 rx_mac_pause: 0
 collisions: 0
 late_collision: 0
 aborted: 0
 single_collisions: 0
 multi_collisions: 0
 rx_short: 0
 rx_runt: 0
 rx_64_byte_packets: 850693
 rx_65_to_127_byte_packets: 297029
 rx_128_to_255_byte_packets: 62116
 rx_256_to_511_byte_packets: 28795
 rx_512_to_1023_byte_packets: 31357
 rx_1024_to_1518_byte_packets: 285603
 rx_1518_to_max_byte_packets: 0
 rx_too_long: 0
 rx_fifo_overflow: 0
 rx_jabber: 0
 rx_fcs_error: 0
 tx_64_byte_packets: 194159
 tx_65_to_127_byte_packets: 239961
 tx_128_to_255_byte_packets: 48148
 tx_256_to_511_byte_packets: 27635
 tx_512_to_1023_byte_packets: 95557
 tx_1024_to_1518_byte_packets: 1980554
 tx_1519_to_max_byte_packets: 0
 tx_fifo_underrun: 0

time to try the vendor driver and see if that provides any clues.

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |


Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Stephen Hemminger
On Thu, 14 Dec 2006 15:21:00 -0800
Alex Romosan [EMAIL PROTECTED] wrote:

 Stephen Hemminger [EMAIL PROTECTED] writes:
 
  Another useful bit of information is the statistics (ethtool -S eth0).
  When there were flow control bugs, they would show up as count of 1.
 
 the driver locked up again, even with msi interrupts disabled and
 idle_timeout=10. the console message was pretty much as before:
 
 kernel: NETDEV WATCHDOG: eth0: transmit timed out
 kernel: sky2 eth0: tx timeout
 kernel: sky2 eth0: transmit ring 336 .. 296 report=336 done=336
 kernel: sky2 hardware hung? flushing
 kernel: NETDEV WATCHDOG: eth0: transmit timed out
 kernel: sky2 eth0: tx timeout
 kernel: sky2 eth0: transmit ring 296 .. 255 report=336 done=336
 kernel: sky2 status report lost?
 
 and this is the output from ethtool -S:
 
 NIC statistics:
  tx_bytes: 3092123897
  rx_bytes: 546577898
  tx_broadcast: 20
  rx_broadcast: 4376
  tx_multicast: 0
  rx_multicast: 459
  tx_unicast: 2585993
  rx_unicast: 1550758
  tx_mac_pause: 1

If this is repeatable... and mac_pause is always one then the
problem is hardware flow control.  I saw bugs before in the bus
interface where it would not resume on unaligned buffer, but
that was on receive.


Re: [PATCH 3/3][BNX2]: Fix minor loopback problem.

2006-12-14 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Wed, 13 Dec 2006 18:31:19 -0800

 [BNX2]: Fix minor loopback problem.
 
 Use the configured MAC address instead of the permanent MAC address
 for loopback frames.
 
 Update version to 1.5.2.
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Also applied, thanks Michael.


Re: [PATCH 2/3][BNX2]: Fix bug in bnx2_nvram_write().

2006-12-14 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Wed, 13 Dec 2006 18:30:39 -0800

 [BNX2]: Fix bug in bnx2_nvram_write().
 
 Length was not calculated correctly if the NVRAM offset is on a non-
 aligned offset.
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied, thanks a lot.


Re: [PATCH 1/3][BNX2]: Fix panic in bnx2_tx_int().

2006-12-14 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Wed, 13 Dec 2006 18:30:33 -0800

 [BNX2]: Fix panic in bnx2_tx_int().
 
 There was an off-by-one bug in bnx2_tx_avail().  If the tx ring is
 completely full, the producer and consumer indices may be apart by
 256 even though the ring size is only 255.  One entry in the ring is
 unused and must be properly accounted for when calculating the number
 of available entries.  The bug caused the tx ring entries to be
 reused by mistake, overwriting active entries, and ultimately causing
 it to crash.
 
 This bug rarely occurs because the tx ring is rarely completely full.
 We always stop when there is less than MAX_SKB_FRAGS entries available
 in the ring.
 
 Thanks to Corey Kovacs [EMAIL PROTECTED] and Andy Gospodarek
 [EMAIL PROTECTED] for reporting the problem and helping to collect
 debug information.
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied, thanks.
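
The accounting problem described in the commit message can be sketched
in a few lines of C. This is a simplified model, not the bnx2 code: the
constants, names and 16-bit free-running indices are assumptions chosen
to mirror the "256 apart, 255 usable" case, and only the full-ring case
is corrected.

```c
#include <stdint.h>

#define RING_SLOTS  256u               /* index space (hypothetical) */
#define RING_USABLE (RING_SLOTS - 1u)  /* one slot is never used */

/* Entries in flight, from free-running producer/consumer indices. */
static uint32_t ring_in_flight(uint16_t prod, uint16_t cons)
{
    uint32_t diff = (uint16_t)(prod - cons);  /* wraps modulo 65536 */

    /* Full ring: the indices are 256 apart but only 255 entries are used. */
    if (diff == RING_SLOTS)
        diff = RING_USABLE;
    return diff;
}

/* Entries still available to the transmit path. */
static uint32_t ring_avail(uint16_t prod, uint16_t cons)
{
    return RING_USABLE - ring_in_flight(prod, cons);
}
```

Without the special case, a full ring would compute 255 - 256 and wrap
to a huge unsigned value, so the caller would believe there was plenty
of room and overwrite active entries.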


Re: dhcpclient netlink bugs (was Re: [NETLINK]: Schedule removal of old macros exported to userspace)

2006-12-14 Thread Herbert Xu
Stefan Rompf [EMAIL PROTECTED] wrote:

 Yes, the code has quite some trust into the kernel that if it answers the 
 asked question the answer is semantically correct. But to be fair, if you 
 issue a write(), you also expect the number of bytes written in return and 
 not the msec taken ;-) Will fix that and the other stuff you pointed out, 
 thanks!

I hope you checked that the message is really from the kernel (based on
saddr).  Unconnected sockets can receive messages from any user on the
host.
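
A minimal sketch of such a check, assuming the source address has been
filled in by recvfrom() or recvmsg() on a netlink socket. The helper
name is made up; the test relies on the kernel using netlink port ID 0.

```c
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Return 1 if the source address of a received netlink message
 * identifies the kernel (port ID 0), 0 otherwise.
 */
static int nl_msg_from_kernel(const struct sockaddr_nl *src, socklen_t len)
{
    if (len != sizeof(*src))
        return 0;               /* truncated or unexpected address */
    if (src->nl_family != AF_NETLINK)
        return 0;
    return src->nl_pid == 0;    /* the kernel sends with nl_pid 0 */
}
```

Messages that fail the check should be dropped before any of their
contents are parsed.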

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [patch sungem] improved locking

2006-12-14 Thread Benjamin Herrenschmidt
On Tue, 2006-12-12 at 06:49 +0100, Eric Lemoine wrote:
 On 12/12/06, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
  On Tue, 2006-12-12 at 06:33 +0100, Eric Lemoine wrote:
   On 12/12/06, David Miller [EMAIL PROTECTED] wrote:
[...]
Anyways, Eric your changes look fine as far as I can tell, can you
give them a really good testing on some SMP boxes?
  
   Unfortunately I can't, I don't have the hardware (only an old ibook here).
 
  I do however, I'll give it a beating on a dual G5 as soon as I get a
  chance. I'm pretty swamped at the moment and the box is used by somebody
  else today.
 
 Ok, thanks a lot Benjamin.

Patched driver's been running fine for a couple of days & nights with
constant beating... just those RX MAC fifo overflows every now and then
(though they cause no data corruption and no big hit on the driver perfs
neither). I suppose still worth investigating when I have a bit of time,
I must have done something stupid with the pause settings.

In the meantime, Eric's patch is all good.

Ben.




Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Alex Romosan
Stephen Hemminger [EMAIL PROTECTED] writes:

 If this is repeatable... and mac_pause is always one then the
 problem is hardware flow control.  I saw bugs before in the bus
 interface where it would not resume on unaligned buffer, but
 that was on receive.

i tried to switch over to the latest vendor driver but unfortunately
it doesn't work with kernel 2.6.19+. it still uses CHECKSUM_HW which
looks like it was replaced by CHECKSUM_PARTIAL and CHECKSUM_COMPLETE
was also added. i think i can replace CHECKSUM_HW in the marvell
driver with CHECKSUM_PARTIAL, except for a couple of places where
i am not sure what i am supposed to do. the first instance it says (i
am kind of paraphrasing here since i am copying from the screen and
not cutting and pasting):

/** does the HW need to evaluate checksum for TCP or UDP packets?
if (pMessage->ip_summed == CHECKSUM_HW)

maybe this needs to be replaced with CHECKSUM_PARTIAL. the second one

/** TCP checksum offload
if ((pSKPacket->pMbuf->ip_summed == CHECKSUM_HW) &&
(SetOpcodePacketFlag == SK_TRUE)

i wonder if this is supposed to be CHECKSUM_COMPLETE

if you have any suggestions, i'll appreciate it.

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |


[AX.25 4/7] Fix unchecked nr_add_node uses

2006-12-14 Thread Ralf Baechle
Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

 net/netrom/nr_route.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: linux-net/net/netrom/nr_route.c
===
--- linux-net.orig/net/netrom/nr_route.c
+++ linux-net/net/netrom/nr_route.c
@@ -779,9 +779,13 @@ int nr_route_frame(struct sk_buff *skb, 
        nr_src  = (ax25_address *)(skb->data + 0);
        nr_dest = (ax25_address *)(skb->data + 7);
 
-       if (ax25 != NULL)
-               nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
-                           ax25->ax25_dev->dev, 0, sysctl_netrom_obsolescence_count_initialiser);
+       if (ax25 != NULL) {
+               ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
+                                 ax25->ax25_dev->dev, 0,
+                                 sysctl_netrom_obsolescence_count_initialiser);
+               if (ret)
+                       return ret;
+       }
 
if ((dev = nr_dev_get(nr_dest)) != NULL) {  /* Its for me */
if (ax25 == NULL)   /* Its from me */
@@ -846,6 +850,7 @@ int nr_route_frame(struct sk_buff *skb, 
        ret = (nr_neigh->ax25 != NULL);
nr_node_unlock(nr_node);
nr_node_put(nr_node);
+
return ret;
 }
 


[AX.25 7/7] Fix unchecked rose_add_loopback_neigh uses

2006-12-14 Thread Ralf Baechle
rose_add_loopback_neigh uses kmalloc and the callers were ignoring the
error value.  Rewrite to let the caller deal with the allocation.  This
allows the use of static allocation instead of kmalloc entirely.

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]
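
The idea behind the change can be shown with a small userspace sketch.
The struct and variable names here are hypothetical and much simpler
than the real struct rose_neigh; the point is only that an object with
static storage needs no allocation, so its setup function has no error
to report and can return void.

```c
#include <stddef.h>

struct neigh {                 /* hypothetical, not struct rose_neigh */
    int loopback;
    unsigned number;
    struct neigh *next;
};

static struct neigh loopback_neigh;   /* static storage: cannot fail */
static struct neigh *neigh_list;      /* head of the neighbour list */
static unsigned neigh_no = 1;         /* next neighbour number */

/* Initialise the static object and link it at the head of the list. */
static void add_loopback_neigh(void)
{
    struct neigh *sn = &loopback_neigh;

    sn->loopback = 1;
    sn->number = neigh_no++;
    sn->next = neigh_list;
    neigh_list = sn;
}
```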

 include/net/rose.h   |4 ++--
 net/rose/rose_loopback.c |5 +++--
 net/rose/rose_route.c|   45 +
 3 files changed, 26 insertions(+), 28 deletions(-)

Index: linux-net/include/net/rose.h
===
--- linux-net.orig/include/net/rose.h
+++ linux-net/include/net/rose.h
@@ -188,12 +188,12 @@ extern void rose_kick(struct sock *);
 extern void rose_enquiry_response(struct sock *);
 
 /* rose_route.c */
-extern struct rose_neigh *rose_loopback_neigh;
+extern struct rose_neigh rose_loopback_neigh;
 extern struct file_operations rose_neigh_fops;
 extern struct file_operations rose_nodes_fops;
 extern struct file_operations rose_routes_fops;
 
-extern int __must_check rose_add_loopback_neigh(void);
+extern void rose_add_loopback_neigh(void);
 extern int __must_check rose_add_loopback_node(rose_address *);
 extern void rose_del_loopback_node(rose_address *);
 extern void rose_rt_device_down(struct net_device *);
Index: linux-net/net/rose/rose_loopback.c
===
--- linux-net.orig/net/rose/rose_loopback.c
+++ linux-net/net/rose/rose_loopback.c
@@ -79,7 +79,8 @@ static void rose_loopback_timer(unsigned
 
                skb->h.raw = skb->data;
 
-               if ((sk = rose_find_socket(lci_o, rose_loopback_neigh)) != NULL) {
+               sk = rose_find_socket(lci_o, &rose_loopback_neigh);
+               if (sk) {
                        if (rose_process_rx_frame(sk, skb) == 0)
                                kfree_skb(skb);
                        continue;
@@ -87,7 +88,7 @@ static void rose_loopback_timer(unsigned
 
                if (frametype == ROSE_CALL_REQUEST) {
                        if ((dev = rose_dev_get(dest)) != NULL) {
-                               if (rose_rx_call_request(skb, dev, rose_loopback_neigh, lci_o) == 0)
+                               if (rose_rx_call_request(skb, dev, &rose_loopback_neigh, lci_o) == 0)
                                        kfree_skb(skb);
                        } else {
                                kfree_skb(skb);
Index: linux-net/net/rose/rose_route.c
===
--- linux-net.orig/net/rose/rose_route.c
+++ linux-net/net/rose/rose_route.c
@@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(rose_neigh_list_l
 static struct rose_route *rose_route_list;
 static DEFINE_SPINLOCK(rose_route_list_lock);
 
-struct rose_neigh *rose_loopback_neigh;
+struct rose_neigh rose_loopback_neigh;
 
 /*
  * Add a new route to a node, and in the process add the node and the
@@ -361,33 +361,30 @@ out:
 /*
  * Add the loopback neighbour.
  */
-int rose_add_loopback_neigh(void)
+void rose_add_loopback_neigh(void)
 {
-       if ((rose_loopback_neigh = kmalloc(sizeof(struct rose_neigh), GFP_ATOMIC)) == NULL)
-               return -ENOMEM;
+       struct rose_neigh *sn = &rose_loopback_neigh;
 
-       rose_loopback_neigh->callsign  = null_ax25_address;
-       rose_loopback_neigh->digipeat  = NULL;
-       rose_loopback_neigh->ax25      = NULL;
-       rose_loopback_neigh->dev       = NULL;
-       rose_loopback_neigh->count     = 0;
-       rose_loopback_neigh->use       = 0;
-       rose_loopback_neigh->dce_mode  = 1;
-       rose_loopback_neigh->loopback  = 1;
-       rose_loopback_neigh->number    = rose_neigh_no++;
-       rose_loopback_neigh->restarted = 1;
+       sn->callsign  = null_ax25_address;
+       sn->digipeat  = NULL;
+       sn->ax25      = NULL;
+       sn->dev       = NULL;
+       sn->count     = 0;
+       sn->use       = 0;
+       sn->dce_mode  = 1;
+       sn->loopback  = 1;
+       sn->number    = rose_neigh_no++;
+       sn->restarted = 1;
 
-       skb_queue_head_init(&rose_loopback_neigh->queue);
+       skb_queue_head_init(&sn->queue);
 
-       init_timer(&rose_loopback_neigh->ftimer);
-       init_timer(&rose_loopback_neigh->t0timer);
+       init_timer(&sn->ftimer);
+       init_timer(&sn->t0timer);
 
        spin_lock_bh(&rose_neigh_list_lock);
-       rose_loopback_neigh->next = rose_neigh_list;
-       rose_neigh_list           = rose_loopback_neigh;
+       sn->next = rose_neigh_list;
+       rose_neigh_list           = sn;
        spin_unlock_bh(&rose_neigh_list_lock);
-
-   return 0;
 }
 
 /*
@@ -421,13 +418,13 @@ int rose_add_loopback_node(rose_address 
        rose_node->mask         = 10;
        rose_node->count        = 1;
        rose_node->loopback     = 1;
-       rose_node->neighbour[0] = rose_loopback_neigh;
+       rose_node->neighbour[0] = &rose_loopback_neigh;
 
/* Insert at the head of list. Address is always mask=10 */
rose_node-next = rose_node_list;
rose_node_list  = 

[AX.25 1/7] Mark all kmalloc users __must_check

2006-12-14 Thread Ralf Baechle
The recent fix 0506d4068bad834aab1141b5dc5e748eb175c6b3 made obvious that
error values were not being propagated through the AX.25 stack.  To help
with that this patch marks all kmalloc users in the AX.25, NETROM and
ROSE stacks as __must_check.

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]
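
For context, a userspace sketch of what the annotation buys. It assumes
GCC or Clang, where the kernel's __must_check maps to the
warn_unused_result attribute; the macro, struct and function names
below are hypothetical and not part of the AX.25 code.

```c
#include <stdlib.h>

/* Assumption: GCC/Clang attribute, the userspace analogue of __must_check. */
#define must_check __attribute__((warn_unused_result))

struct node { int id; };     /* hypothetical object that needs allocating */

/* Allocation can fail, so the caller must look at the result. */
static must_check struct node *node_create(int id)
{
    struct node *n = malloc(sizeof(*n));

    if (n)
        n->id = id;
    return n;
}

/* Propagate the allocation failure instead of dropping it. */
static must_check int node_register(int id, struct node **out)
{
    struct node *n = node_create(id);

    if (!n)
        return -1;
    *out = n;
    return 0;
}
```

Calling node_register() and discarding its return value makes the
compiler emit a warning, which is how ignored error paths become
visible at build time.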

 include/net/ax25.h|   11 ++-
 include/net/rose.h|4 ++--
 net/ax25/af_ax25.c|4 ++--
 net/ax25/ax25_route.c |2 +-
 net/netrom/nr_route.c |8 +---
 net/rose/rose_route.c |2 +-
 6 files changed, 17 insertions(+), 14 deletions(-)

Index: linux-net/include/net/ax25.h
===
--- linux-net.orig/include/net/ax25.h
+++ linux-net/include/net/ax25.h
@@ -277,7 +277,7 @@ struct sock *ax25_get_socket(ax25_addres
 extern ax25_cb *ax25_find_cb(ax25_address *, ax25_address *, ax25_digi *, struct net_device *);
 extern void ax25_send_to_raw(ax25_address *, struct sk_buff *, int);
 extern void ax25_destroy_socket(ax25_cb *);
-extern ax25_cb *ax25_create_cb(void);
+extern ax25_cb * __must_check ax25_create_cb(void);
 extern void ax25_fillin_cb(ax25_cb *, ax25_dev *);
 extern struct sock *ax25_make_new(struct sock *, struct ax25_dev *);
 
@@ -333,11 +333,12 @@ extern void ax25_ds_t3timer_expiry(ax25_
 extern void ax25_ds_idletimer_expiry(ax25_cb *);
 
 /* ax25_iface.c */
-extern int  ax25_protocol_register(unsigned int, int (*)(struct sk_buff *, ax25_cb *));
+extern int __must_check ax25_protocol_register(unsigned int, int (*)(struct sk_buff *, ax25_cb *));
 extern void ax25_protocol_release(unsigned int);
-extern int  ax25_linkfail_register(void (*)(ax25_cb *, int));
+extern int __must_check ax25_linkfail_register(void (*)(ax25_cb *, int));
 extern void ax25_linkfail_release(void (*)(ax25_cb *, int));
-extern int  ax25_listen_register(ax25_address *, struct net_device *);
+extern int __must_check ax25_listen_register(ax25_address *,
+   struct net_device *);
 extern void ax25_listen_release(ax25_address *, struct net_device *);
 extern int  (*ax25_protocol_function(unsigned int))(struct sk_buff *, ax25_cb *);
 extern int  ax25_listen_mine(ax25_address *, struct net_device *);
@@ -415,7 +416,7 @@ extern unsigned long ax25_display_timer(
 /* ax25_uid.c */
 extern int  ax25_uid_policy;
 extern ax25_uid_assoc *ax25_findbyuid(uid_t);
-extern int  ax25_uid_ioctl(int, struct sockaddr_ax25 *);
+extern int __must_check ax25_uid_ioctl(int, struct sockaddr_ax25 *);
 extern struct file_operations ax25_uid_fops;
 extern void ax25_uid_free(void);
 
Index: linux-net/include/net/rose.h
===
--- linux-net.orig/include/net/rose.h
+++ linux-net/include/net/rose.h
@@ -193,8 +193,8 @@ extern struct file_operations rose_neigh
 extern struct file_operations rose_nodes_fops;
 extern struct file_operations rose_routes_fops;
 
-extern int  rose_add_loopback_neigh(void);
-extern int  rose_add_loopback_node(rose_address *);
+extern int __must_check rose_add_loopback_neigh(void);
+extern int __must_check rose_add_loopback_node(rose_address *);
 extern void rose_del_loopback_node(rose_address *);
 extern void rose_rt_device_down(struct net_device *);
 extern void rose_link_device_down(struct net_device *);
Index: linux-net/net/ax25/af_ax25.c
===
--- linux-net.orig/net/ax25/af_ax25.c
+++ linux-net/net/ax25/af_ax25.c
@@ -1088,8 +1088,8 @@ out:
 /*
  * FIXME: nonblock behaviour looks like it may have a bug.
  */
-static int ax25_connect(struct socket *sock, struct sockaddr *uaddr,
-   int addr_len, int flags)
+static int __must_check ax25_connect(struct socket *sock,
+   struct sockaddr *uaddr, int addr_len, int flags)
 {
     struct sock *sk = sock->sk;
     ax25_cb *ax25 = ax25_sk(sk), *ax25t;
Index: linux-net/net/ax25/ax25_route.c
===
--- linux-net.orig/net/ax25/ax25_route.c
+++ linux-net/net/ax25/ax25_route.c
@@ -71,7 +71,7 @@ void ax25_rt_device_down(struct net_devi
     write_unlock(&ax25_route_lock);
 }
 
-static int ax25_rt_add(struct ax25_routes_struct *route)
+static int __must_check ax25_rt_add(struct ax25_routes_struct *route)
 {
ax25_route *ax25_rt;
ax25_dev *ax25_dev;
Index: linux-net/net/rose/rose_route.c
===
--- linux-net.orig/net/rose/rose_route.c
+++ linux-net/net/rose/rose_route.c
@@ -52,7 +52,7 @@ struct rose_neigh *rose_loopback_neigh;
  * Add a new route to a node, and in the process add the node and the
  * neighbour if it is new.
  */
-static int rose_add_node(struct rose_route_struct *rose_route,
+static int __must_check rose_add_node(struct rose_route_struct *rose_route,
struct net_device *dev)
 {
struct rose_node  *rose_node, *rose_tmpn, *rose_tmpp;
Index: linux-net/net/netrom/nr_route.c

[AX.25 2/7] Fix unchecked ax25_protocol_register uses.

2006-12-14 Thread Ralf Baechle
Replace ax25_protocol_register by ax25_register_pid which assumes the
caller has done the memory allocation.  This allows replacing the
kmalloc allocations entirely by static allocations.
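
A minimal user-space sketch of the idea, under the assumption that registration only needs to link a caller-owned entry into a list: because nothing is allocated inside the register call, it has no failure path and can return void. The names `proto_entry`, `proto_register`, `proto_lookup`, `demo_handler` and `demo_proto` are hypothetical, the PID value is arbitrary, and the locking that the real code performs is left out.

```c
#include <stddef.h>

/* Hypothetical model of a caller-allocated protocol entry. */
struct proto_entry {
	struct proto_entry *next;
	unsigned int pid;
	int (*func)(int frame);
};

static struct proto_entry *proto_list;

/* Registration only links the entry in, so it cannot fail. */
static void proto_register(struct proto_entry *p)
{
	p->next = proto_list;
	proto_list = p;
}

/* Return the handler registered for pid, or NULL if there is none. */
static int (*proto_lookup(unsigned int pid))(int)
{
	struct proto_entry *p;

	for (p = proto_list; p != NULL; p = p->next)
		if (p->pid == pid)
			return p->func;
	return NULL;
}

/* Example handler; a real one would process a received frame. */
static int demo_handler(int frame)
{
	return frame + 1;
}

/* Statically allocated entry, as a protocol module would provide it. */
static struct proto_entry demo_proto = {
	.pid  = 0xcf,	/* arbitrary illustrative value */
	.func = demo_handler,
};
```

A module would call `proto_register(&demo_proto)` once from its init path; the entry lives in static storage for the module's lifetime, so there is no allocation to check and no error value to ignore.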

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

 include/net/ax25.h |9 -
 net/ax25/ax25_iface.c  |   41 -
 net/netrom/af_netrom.c |7 ++-
 net/rose/af_rose.c |7 ++-
 4 files changed, 32 insertions(+), 32 deletions(-)

Index: linux-net/include/net/ax25.h
===
--- linux-net.orig/include/net/ax25.h
+++ linux-net/include/net/ax25.h
@@ -333,7 +333,14 @@ extern void ax25_ds_t3timer_expiry(ax25_
 extern void ax25_ds_idletimer_expiry(ax25_cb *);
 
 /* ax25_iface.c */
-extern int __must_check ax25_protocol_register(unsigned int, int (*)(struct sk_buff *, ax25_cb *));
+
+struct ax25_protocol {
+   struct ax25_protocol *next;
+   unsigned int pid;
+   int (*func)(struct sk_buff *, ax25_cb *);
+};
+
+extern void ax25_register_pid(struct ax25_protocol *ap);
 extern void ax25_protocol_release(unsigned int);
 extern int __must_check ax25_linkfail_register(void (*)(ax25_cb *, int));
 extern void ax25_linkfail_release(void (*)(ax25_cb *, int));
Index: linux-net/net/ax25/ax25_iface.c
===
--- linux-net.orig/net/ax25/ax25_iface.c
+++ linux-net/net/ax25/ax25_iface.c
@@ -29,11 +29,7 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 
-static struct protocol_struct {
-   struct protocol_struct *next;
-   unsigned int pid;
-   int (*func)(struct sk_buff *, ax25_cb *);
-} *protocol_list = NULL;
+static struct ax25_protocol *protocol_list;
 static DEFINE_RWLOCK(protocol_list_lock);
 
 static struct linkfail_struct {
@@ -49,36 +45,23 @@ static struct listen_struct {
 } *listen_list = NULL;
 static DEFINE_SPINLOCK(listen_lock);
 
-int ax25_protocol_register(unsigned int pid,
-   int (*func)(struct sk_buff *, ax25_cb *))
+/*
+ * Do not register the internal protocols AX25_P_TEXT, AX25_P_SEGMENT,
+ * AX25_P_IP or AX25_P_ARP ...
+ */
+void ax25_register_pid(struct ax25_protocol *ap)
 {
-   struct protocol_struct *protocol;
-
-   if (pid == AX25_P_TEXT || pid == AX25_P_SEGMENT)
-   return 0;
-#ifdef CONFIG_INET
-   if (pid == AX25_P_IP || pid == AX25_P_ARP)
-   return 0;
-#endif
-   if ((protocol = kmalloc(sizeof(*protocol), GFP_ATOMIC)) == NULL)
-   return 0;
-
-   protocol->pid  = pid;
-   protocol->func = func;
-
     write_lock_bh(&protocol_list_lock);
-   protocol->next = protocol_list;
-   protocol_list  = protocol;
+   ap->next = protocol_list;
+   protocol_list = ap;
     write_unlock_bh(&protocol_list_lock);
-
-   return 1;
 }
 
-EXPORT_SYMBOL(ax25_protocol_register);
+EXPORT_SYMBOL_GPL(ax25_register_pid);
 
 void ax25_protocol_release(unsigned int pid)
 {
-   struct protocol_struct *s, *protocol;
+   struct ax25_protocol *s, *protocol;
 
     write_lock_bh(&protocol_list_lock);
     protocol = protocol_list;
@@ -223,7 +206,7 @@ EXPORT_SYMBOL(ax25_listen_release);
 int (*ax25_protocol_function(unsigned int pid))(struct sk_buff *, ax25_cb *)
 {
int (*res)(struct sk_buff *, ax25_cb *) = NULL;
-   struct protocol_struct *protocol;
+   struct ax25_protocol *protocol;
 
     read_lock(&protocol_list_lock);
     for (protocol = protocol_list; protocol != NULL; protocol = protocol->next)
@@ -263,7 +246,7 @@ void ax25_link_failed(ax25_cb *ax25, int
 
 int ax25_protocol_is_registered(unsigned int pid)
 {
-   struct protocol_struct *protocol;
+   struct ax25_protocol *protocol;
int res = 0;
 
     read_lock_bh(&protocol_list_lock);
Index: linux-net/net/netrom/af_netrom.c
===
--- linux-net.orig/net/netrom/af_netrom.c
+++ linux-net/net/netrom/af_netrom.c
@@ -1377,6 +1377,11 @@ static struct notifier_block nr_dev_noti
 
 static struct net_device **dev_nr;
 
+static struct ax25_protocol nr_pid = {
+   .pid= AX25_P_NETROM,
+   .func   = nr_route_frame
+};
+
 static int __init nr_proto_init(void)
 {
int i;
@@ -1424,7 +1429,7 @@ static int __init nr_proto_init(void)

     register_netdevice_notifier(&nr_dev_notifier);
 
-   ax25_protocol_register(AX25_P_NETROM, nr_route_frame);
+   ax25_register_pid(&nr_pid);
     ax25_linkfail_register(nr_link_failed);
 
 #ifdef CONFIG_SYSCTL
Index: linux-net/net/rose/af_rose.c
===
--- linux-net.orig/net/rose/af_rose.c
+++ linux-net/net/rose/af_rose.c
@@ -1481,6 +1481,11 @@ static struct notifier_block rose_dev_no
 
 static struct net_device **dev_rose;
 
+static struct ax25_protocol rose_pid = {
+   .pid= AX25_P_ROSE,
+   .func   = rose_route_frame
+};
+
 static int __init 

Re: [AX.25 3/7] Fix unchecked ax25_listen_register uses

2006-12-14 Thread David Miller
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:09 +0100

> Fix ax25_listen_register to return something that's a sane error code,
> then all callers to use it.
>
> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AX.25 4/7] Fix unchecked nr_add_node uses

2006-12-14 Thread David Miller
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:10 +0100

> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.


Re: [AX.25 5/7] Fix unchecked ax25_linkfail_register uses

2006-12-14 Thread David Miller
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:11 +0100

> ax25_linkfail_register uses kmalloc and the callers were ignoring the
> error value.  Rewrite to let the caller deal with the allocation.  This
> allows the use of static allocation of kmalloc use entirely.
>
> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.


Re: [AX.25 2/7] Fix unchecked ax25_protocol_register uses.

2006-12-14 Thread David Miller
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:08 +0100

> Replace ax25_protocol_register by ax25_register_pid which assumes the
> caller has done the memory allocation.  This allows replacing the
> kmalloc allocations entirely by static allocations.
>
> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.


Re: [AX.25 6/7] Fix unchecked rose_add_loopback_node uses

2006-12-14 Thread David Miller
From: Ralf Baechle [EMAIL PROTECTED]
Date: Thu, 14 Dec 2006 23:42:12 +0100

> Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

Applied.


[PATCH v4 00/13] 2.6.20 Chelsio T3 RDMA Driver

2006-12-14 Thread Steve Wise

Roland, 

I think this is ready to go once the ethernet driver is pulled in.

Version 4 changes:

- Cleaned up spacing in the Kconfig file
- Remove locking.txt file - it's not needed
- Remove -O1 from the debug config option
- BugFix: support new LLD interface for dual-port adapters

Version 3 changes:

- BugFix: Don't use mutex inside of the mmap function.
- BugFix: Move QP to TERMINATE when TERMINATE AE is processed
- Support the new work queue design
- Merged up to Linus's tree as of 12/8/2006
- Misc nits

Version 2 changes:

- Make code sparse endian clean
- Use IDRs for mapping QP and CQ IDs to structure pointers instead
  of arrays
- Clean up confusing bitfields
- Use random32() instead of local random function
- Use krefs to track endpoint reference counts
- Misc nits

-

The following series implements the Chelsio T3 iWARP/RDMA Driver to
be considered for inclusion in 2.6.20.  It depends on the Chelsio T3
Ethernet driver which is also under review now for 2.6.20. 

The latest Chelsio T3 Ethernet driver patch can be pulled from:

http://service.chelsio.com/kernel.org/cxgb3.patch.bz2

A complete GIT kernel tree with all the T3 drivers can be pulled from:

git://staging.openfabrics.org/~swise/cxgb3.git

Thanks,

Steve.


[PATCH v4 03/13] Provider Methods and Data Structures

2006-12-14 Thread Steve Wise

Provider methods to support the Linux RDMA verbs.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c | 1171 +++
 drivers/infiniband/hw/cxgb3/iwch_provider.h |  363 
 drivers/infiniband/hw/cxgb3/iwch_user.h |   68 ++
 3 files changed, 1602 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
b/drivers/infiniband/hw/cxgb3/iwch_provider.c
new file mode 100644
index 000..e9721b1
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -0,0 +1,1171 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/device.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/ethtool.h>
+
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/byteorder.h>
+
+#include <rdma/iw_cm.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_user_verbs.h>
+
+#include "cxio_hal.h"
+#include "iwch.h"
+#include "iwch_provider.h"
+#include "iwch_cm.h"
+#include "iwch_user.h"
+
+static int iwch_modify_port(struct ib_device *ibdev,
+   u8 port, int port_modify_mask,
+   struct ib_port_modify *props)
+{
+   return -ENOSYS;
+}
+
+static struct ib_ah *iwch_ah_create(struct ib_pd *pd,
+   struct ib_ah_attr *ah_attr)
+{
+   return ERR_PTR(-ENOSYS);
+}
+
+static int iwch_ah_destroy(struct ib_ah *ah)
+{
+   return -ENOSYS;
+}
+
+static int iwch_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+   return -ENOSYS;
+}
+
+static int iwch_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+   return -ENOSYS;
+}
+
+static int iwch_process_mad(struct ib_device *ibdev,
+   int mad_flags,
+   u8 port_num,
+   struct ib_wc *in_wc,
+   struct ib_grh *in_grh,
+   struct ib_mad *in_mad, struct ib_mad *out_mad)
+{
+   return -ENOSYS;
+}
+
+static int iwch_dealloc_ucontext(struct ib_ucontext *context)
+{
+   struct iwch_dev *rhp = to_iwch_dev(context->device);
+   struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
+   PDBG("%s context %p\n", __FUNCTION__, context);
+   cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
+   kfree(ucontext);
+   return 0;
+}
+
+static struct ib_ucontext *iwch_alloc_ucontext(struct ib_device *ibdev,
+   struct ib_udata *udata)
+{
+   struct iwch_ucontext *context;
+   struct iwch_dev *rhp = to_iwch_dev(ibdev);
+
+   PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
+   context = kmalloc(sizeof(*context), GFP_KERNEL);
+   if (!context)
+       return ERR_PTR(-ENOMEM);
+   cxio_init_ucontext(&rhp->rdev, &context->uctx);
+   INIT_LIST_HEAD(&context->mmaps);
+   spin_lock_init(&context->mmap_lock);
+   return &context->ibucontext;
+}
+
+static int iwch_destroy_cq(struct ib_cq *ib_cq)
+{
+   struct iwch_cq *chp;
+
+   PDBG("%s ib_cq %p\n", __FUNCTION__, ib_cq);
+   chp = to_iwch_cq(ib_cq);
+
+   remove_handle(chp->rhp, &chp->rhp->cqidr, chp->cq.cqid);
+   atomic_dec(&chp->refcnt);
+   wait_event(chp->wait, !atomic_read(&chp->refcnt));
+
+   cxio_destroy_cq(&chp->rhp->rdev, &chp->cq);
+   kfree(chp);
+   return 0;

[PATCH v4 09/13] Core WQE/CQE Types

2006-12-14 Thread Steve Wise

T3 WQE and CQE structures, defines, etc...
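
The queue macros in this header (Q_EMPTY, Q_FULL, Q_FREECNT, Q_PTR2IDX, Q_GENBIT) appear to treat the read and write pointers as free-running 32-bit counters that are only masked down to a ring index when a slot is accessed. The sketch below restates that arithmetic as user-space inline functions so it can be exercised in isolation. The `q_*` and `seq32_ge` names are hypothetical, and `seq32_ge` assumes the comparison mask is the 32-bit sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* The ring holds 1 << size_log2 entries; rptr and wptr never get masked,
 * they simply count up and wrap modulo 2^32. */

static inline uint32_t q_count(uint32_t rptr, uint32_t wptr)
{
	return wptr - rptr;	/* unsigned subtraction survives counter wrap */
}

static inline int q_empty(uint32_t rptr, uint32_t wptr)
{
	return rptr == wptr;
}

static inline int q_full(uint32_t rptr, uint32_t wptr, unsigned int size_log2)
{
	assert(size_log2 < 32);
	return ((wptr - rptr) >> size_log2) && rptr != wptr;
}

static inline uint32_t q_freecnt(uint32_t rptr, uint32_t wptr,
				 unsigned int size_log2)
{
	return (UINT32_C(1) << size_log2) - (wptr - rptr);
}

/* Mask a free-running pointer down to a slot index. */
static inline uint32_t q_ptr2idx(uint32_t ptr, unsigned int size_log2)
{
	return ptr & ((UINT32_C(1) << size_log2) - 1);
}

/* The generation bit flips each time the pointer laps the ring. */
static inline int q_genbit(uint32_t ptr, unsigned int size_log2)
{
	return !((ptr >> size_log2) & 0x1);
}

/* Serial-number comparison: x >= y when the 32-bit difference has its
 * sign bit clear (assumed mask, see the lead-in). */
static inline int seq32_ge(uint32_t x, uint32_t y)
{
	return !((x - y) & UINT32_C(0x80000000));
}
```

With an 8-entry ring (size_log2 = 3), a write pointer of 10 maps to slot 2, and the pair rptr = 0xFFFFFFFE, wptr = 2 still counts as four outstanding entries because the subtraction is done modulo 2^32.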

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/core/cxio_wr.h |  685 
 1 files changed, 685 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h 
b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
new file mode 100644
index 000..45870be
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
@@ -0,0 +1,685 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __CXIO_WR_H__
+#define __CXIO_WR_H__
+
+#include <asm/io.h>
+#include <linux/pci.h>
+#include <linux/timer.h>
+#include "firmware_exports.h"
+
+#define T3_MAX_SGE  4
+
+#define Q_EMPTY(rptr,wptr) ((rptr)==(wptr))
+#define Q_FULL(rptr,wptr,size_log2)  ( (((wptr)-(rptr))>>(size_log2)) && \
+  ((rptr)!=(wptr)) )
+#define Q_GENBIT(ptr,size_log2) (!(((ptr)>>size_log2)&0x1))
+#define Q_FREECNT(rptr,wptr,size_log2) ((1UL<<size_log2)-((wptr)-(rptr)))
+#define Q_COUNT(rptr,wptr) ((wptr)-(rptr))
+#define Q_PTR2IDX(ptr,size_log2) (ptr & ((1UL<<size_log2)-1))
+
+static inline void ring_doorbell(void __iomem *doorbell, u32 qpid) 
+{
+   writel(((1<<31) | qpid), doorbell);
+}
+
+#define SEQ32_GE(x,y) (!( (((u32) (x)) - ((u32) (y))) & 0x80000000 ))
+
+enum t3_wr_flags {
+   T3_COMPLETION_FLAG = 0x01,
+   T3_NOTIFY_FLAG = 0x02,
+   T3_SOLICITED_EVENT_FLAG = 0x04,
+   T3_READ_FENCE_FLAG = 0x08,
+   T3_LOCAL_FENCE_FLAG = 0x10
+} __attribute__ ((packed));
+
+enum t3_wr_opcode {
+   T3_WR_BP = FW_WROPCODE_RI_BYPASS,
+   T3_WR_SEND = FW_WROPCODE_RI_SEND,
+   T3_WR_WRITE = FW_WROPCODE_RI_RDMA_WRITE,
+   T3_WR_READ = FW_WROPCODE_RI_RDMA_READ,
+   T3_WR_INV_STAG = FW_WROPCODE_RI_LOCAL_INV,
+   T3_WR_BIND = FW_WROPCODE_RI_BIND_MW,
+   T3_WR_RCV = FW_WROPCODE_RI_RECEIVE,
+   T3_WR_INIT = FW_WROPCODE_RI_RDMA_INIT,
+   T3_WR_QP_MOD = FW_WROPCODE_RI_MODIFY_QP
+} __attribute__ ((packed));
+
+enum t3_rdma_opcode {
+   T3_RDMA_WRITE,  /* IETF RDMAP v1.0 ... */
+   T3_READ_REQ,
+   T3_READ_RESP,
+   T3_SEND,
+   T3_SEND_WITH_INV,
+   T3_SEND_WITH_SE,
+   T3_SEND_WITH_SE_INV,
+   T3_TERMINATE,
+   T3_RDMA_INIT,   /* CHELSIO RI specific ... */
+   T3_BIND_MW,
+   T3_FAST_REGISTER,
+   T3_LOCAL_INV,
+   T3_QP_MOD,
+   T3_BYPASS
+} __attribute__ ((packed));
+
+static inline enum t3_rdma_opcode wr2opcode(enum t3_wr_opcode wrop)
+{
+   switch (wrop) {
+   case T3_WR_BP: return T3_BYPASS;
+   case T3_WR_SEND: return T3_SEND;
+   case T3_WR_WRITE: return T3_RDMA_WRITE;
+   case T3_WR_READ: return T3_READ_REQ;
+   case T3_WR_INV_STAG: return T3_LOCAL_INV;
+   case T3_WR_BIND: return T3_BIND_MW;
+   case T3_WR_INIT: return T3_RDMA_INIT;
+   case T3_WR_QP_MOD: return T3_QP_MOD;
+   default: break;
+   }
+   return -1;
+}
+
+
+/* Work request id */
+union t3_wrid {
+   struct {
+   u32 hi;
+   u32 low;
+   } id0;
+   u64 id1;
+};
+
+#define WRID(wrid) (wrid.id1)
+#define WRID_GEN(wrid) (wrid.id0.wr_gen)
+#define WRID_IDX(wrid) (wrid.id0.wr_idx)
+#define WRID_LO(wrid)  (wrid.id0.wr_lo)
+
+struct fw_riwrh {
+   __be32 op_seop_flags;
+   __be32 gen_tid_len;
+};
+
+#define S_FW_RIWR_OP   24
+#define M_FW_RIWR_OP   0xff
+#define V_FW_RIWR_OP(x)  

[PATCH v4 02/13] Device Discovery and ULLD Linkage

2006-12-14 Thread Steve Wise

Code to discover all the T3 devices and register them 
with the T3 RDMA Core and the Linux RDMA Core.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch.c |  189 
 drivers/infiniband/hw/cxgb3/iwch.h |  175 +
 2 files changed, 364 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c 
b/drivers/infiniband/hw/cxgb3/iwch.c
new file mode 100644
index 000..acbe449
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+
+#include <rdma/ib_verbs.h>
+
+#include "cxgb3_offload.h"
+#include "iwch_provider.h"
+#include "iwch_user.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+
+#define DRV_VERSION "1.1"
+
+MODULE_AUTHOR("Boyd Faulkner, Steve Wise");
+MODULE_DESCRIPTION("Chelsio T3 RDMA Driver");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRV_VERSION);
+
+cxgb3_cpl_handler_func t3c_handlers[NUM_CPL_CMDS];
+
+static void open_rnic_dev(struct t3cdev *);
+static void close_rnic_dev(struct t3cdev *);
+
+struct cxgb3_client t3c_client = {
+   .name = "iw_cxgb3",
+   .add = open_rnic_dev,
+   .remove = close_rnic_dev,
+   .handlers = t3c_handlers,
+   .redirect = iwch_ep_redirect
+};
+
+static LIST_HEAD(dev_list);
+static DEFINE_MUTEX(dev_mutex);
+
+static void rnic_init(struct iwch_dev *rnicp)
+{
+   PDBG("%s iwch_dev %p\n", __FUNCTION__,  rnicp);
+   idr_init(&rnicp->cqidr);
+   idr_init(&rnicp->qpidr);
+   idr_init(&rnicp->mmidr);
+   spin_lock_init(&rnicp->lock);
+
+   rnicp->attr.vendor_id = 0x168;
+   rnicp->attr.vendor_part_id = 7;
+   rnicp->attr.max_qps = T3_MAX_NUM_QP - 32;
+   rnicp->attr.max_wrs = (1UL << 24) - 1;
+   rnicp->attr.max_sge_per_wr = T3_MAX_SGE;
+   rnicp->attr.max_sge_per_rdma_write_wr = T3_MAX_SGE;
+   rnicp->attr.max_cqs = T3_MAX_NUM_CQ - 1;
+   rnicp->attr.max_cqes_per_cq = (1UL << 24) - 1;
+   rnicp->attr.max_mem_regs = cxio_num_stags(&rnicp->rdev);
+   rnicp->attr.max_phys_buf_entries = T3_MAX_PBL_SIZE;
+   rnicp->attr.max_pds = T3_MAX_NUM_PD - 1;
+   rnicp->attr.mem_pgsizes_bitmask = 0x7FFF;   /* 4KB-128MB */
+   rnicp->attr.can_resize_wq = 0;
+   rnicp->attr.max_rdma_reads_per_qp = 8;
+   rnicp->attr.max_rdma_read_resources =
+       rnicp->attr.max_rdma_reads_per_qp * rnicp->attr.max_qps;
+   rnicp->attr.max_rdma_read_qp_depth = 8; /* IRD */
+   rnicp->attr.max_rdma_read_depth =
+       rnicp->attr.max_rdma_read_qp_depth * rnicp->attr.max_qps;
+   rnicp->attr.rq_overflow_handled = 0;
+   rnicp->attr.can_modify_ird = 0;
+   rnicp->attr.can_modify_ord = 0;
+   rnicp->attr.max_mem_windows = rnicp->attr.max_mem_regs - 1;
+   rnicp->attr.stag0_value = 1;
+   rnicp->attr.zbva_support = 1;
+   rnicp->attr.local_invalidate_fence = 1;
+   rnicp->attr.cq_overflow_detection = 1;
+   return;
+}
+
+static void open_rnic_dev(struct t3cdev *tdev)
+{
+   struct iwch_dev *rnicp;
+   static int vers_printed;
+
+   PDBG("%s t3cdev %p\n", __FUNCTION__,  tdev);
+   if (!vers_printed++)
+       printk(KERN_INFO MOD "Chelsio T3 RDMA Driver - version %s\n",
+              DRV_VERSION);
+   rnicp = (struct iwch_dev *)ib_alloc_device(sizeof(*rnicp));
+   if (!rnicp) {
+       printk(KERN_ERR MOD "Cannot allocate ib device\n");
+       return;
+   }
+   rnicp->rdev.ulp = 

[PATCH v4 13/13] Kconfig/Makefile

2006-12-14 Thread Steve Wise

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/Kconfig   |1 +
 drivers/infiniband/Makefile  |1 +
 drivers/infiniband/hw/cxgb3/Kconfig  |   27 +++
 drivers/infiniband/hw/cxgb3/Makefile |   12 
 4 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 59b3932..06453ab 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -38,6 +38,7 @@ source drivers/infiniband/hw/mthca/Kcon
 source "drivers/infiniband/hw/ipath/Kconfig"
 source "drivers/infiniband/hw/ehca/Kconfig"
 source "drivers/infiniband/hw/amso1100/Kconfig"
+source "drivers/infiniband/hw/cxgb3/Kconfig"
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index 570b30a..69bdd55 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_INFINIBAND_MTHCA)  += hw/mt
 obj-$(CONFIG_INFINIBAND_IPATH) += hw/ipath/
 obj-$(CONFIG_INFINIBAND_EHCA)  += hw/ehca/
 obj-$(CONFIG_INFINIBAND_AMSO1100)  += hw/amso1100/
+obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
 obj-$(CONFIG_INFINIBAND_SRP)   += ulp/srp/
 obj-$(CONFIG_INFINIBAND_ISER)  += ulp/iser/
diff --git a/drivers/infiniband/hw/cxgb3/Kconfig 
b/drivers/infiniband/hw/cxgb3/Kconfig
new file mode 100644
index 000..d3db264
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/Kconfig
@@ -0,0 +1,27 @@
+config INFINIBAND_CXGB3
+   tristate "Chelsio RDMA Driver"
+   depends on CHELSIO_T3 && INFINIBAND
+   select GENERIC_ALLOCATOR
+   ---help---
+ This is an iWARP/RDMA driver for the Chelsio T3 1GbE and
+ 10GbE adapters.
+
+ For general information about Chelsio and our products, visit
+ our website at http://www.chelsio.com.
+
+ For customer support, please visit our customer support page at
+ http://www.chelsio.com/support.htm.
+
+ Please send feedback to [EMAIL PROTECTED].
+
+ To compile this driver as a module, choose M here: the module
+ will be called iw_cxgb3.
+
+config INFINIBAND_CXGB3_DEBUG
+   bool "Verbose debugging output"
+   depends on INFINIBAND_CXGB3
+   default n
+   ---help---
+ This option causes the Chelsio RDMA driver to produce copious
+ amounts of debug messages.  Select this if you are developing
+ the driver or trying to diagnose a problem.
diff --git a/drivers/infiniband/hw/cxgb3/Makefile 
b/drivers/infiniband/hw/cxgb3/Makefile
new file mode 100644
index 000..7a89f6d
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/Makefile
@@ -0,0 +1,12 @@
+EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/cxgb3 \
+   -I$(TOPDIR)/drivers/infiniband/hw/cxgb3/core 
+
+obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o
+
+iw_cxgb3-y :=  iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \
+  iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o
+
+ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
+EXTRA_CFLAGS += -DDEBUG -g 
+iw_cxgb3-y += core/cxio_dbg.o
+endif


[PATCH v4 05/13] Queue Pairs

2006-12-14 Thread Steve Wise

Code to manipulate the QP.
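
One detail of the send-building code in this patch is worth isolating: the payload length is accumulated in a u32, and each scatter/gather length is checked for wrap-around before it is added. The following user-space sketch shows that check on its own. `struct sge`, `MAX_SGE` and `sum_sge_lengths` are hypothetical stand-ins, not names from the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SGE 4	/* mirrors the T3_MAX_SGE limit used by the driver */

/* Hypothetical stand-in for one scatter/gather element. */
struct sge {
	uint32_t length;
};

/*
 * Sum the SGE lengths into *plen.
 * Returns 0 on success, -EINVAL if there are too many entries, or
 * -EMSGSIZE if the 32-bit total would wrap. *plen is only written
 * on success.
 */
int sum_sge_lengths(const struct sge *sgl, size_t num_sge, uint32_t *plen)
{
	uint32_t total = 0;
	size_t i;

	if (num_sge > MAX_SGE)
		return -EINVAL;
	for (i = 0; i < num_sge; i++) {
		/* Unsigned addition wraps, so a smaller sum means overflow. */
		if ((uint32_t)(total + sgl[i].length) < total)
			return -EMSGSIZE;
		total += sgl[i].length;
	}
	*plen = total;
	return 0;
}
```

The comparison relies on unsigned arithmetic being defined to wrap modulo 2^32, so no wider type is needed to detect the overflow.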

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch_qp.c | 1007 +
 1 files changed, 1007 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c 
b/drivers/infiniband/hw/cxgb3/iwch_qp.c
new file mode 100644
index 000..9f6b251
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -0,0 +1,1007 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include "iwch_provider.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+#include "cxio_hal.h"
+
+#define NO_SUPPORT -1
+
+static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
+  u8 * flit_cnt)
+{
+   int i;
+   u32 plen;
+
+   switch (wr->opcode) {
+   case IB_WR_SEND:
+   case IB_WR_SEND_WITH_IMM:
+       if (wr->send_flags & IB_SEND_SOLICITED)
+           wqe->send.rdmaop = T3_SEND_WITH_SE;
+       else
+           wqe->send.rdmaop = T3_SEND;
+       wqe->send.rem_stag = 0;
+       break;
+#if 0  /* Not currently supported */
+   case TYPE_SEND_INVALIDATE:
+   case TYPE_SEND_INVALIDATE_IMMEDIATE:
+       wqe->send.rdmaop = T3_SEND_WITH_INV;
+       wqe->send.rem_stag = cpu_to_be32(wr->wr.rdma.rkey);
+       break;
+   case TYPE_SEND_SE_INVALIDATE:
+       wqe->send.rdmaop = T3_SEND_WITH_SE_INV;
+       wqe->send.rem_stag = cpu_to_be32(wr->wr.rdma.rkey);
+       break;
+#endif
+   default:
+       break;
+   }
+   if (wr->num_sge > T3_MAX_SGE)
+       return -EINVAL;
+   wqe->send.reserved[0] = 0;
+   wqe->send.reserved[1] = 0;
+   wqe->send.reserved[2] = 0;
+   if (wr->opcode == IB_WR_SEND_WITH_IMM) {
+       plen = 4;
+       wqe->send.sgl[0].stag = wr->imm_data;
+       wqe->send.sgl[0].len = __constant_cpu_to_be32(0);
+       wqe->send.num_sgle = __constant_cpu_to_be32(0);
+       *flit_cnt = 5;
+   } else {
+       plen = 0;
+       for (i = 0; i < wr->num_sge; i++) {
+           if ((plen + wr->sg_list[i].length) < plen) {
+               return -EMSGSIZE;
+           }
+           plen += wr->sg_list[i].length;
+           wqe->send.sgl[i].stag =
+               cpu_to_be32(wr->sg_list[i].lkey);
+           wqe->send.sgl[i].len =
+               cpu_to_be32(wr->sg_list[i].length);
+           wqe->send.sgl[i].to = cpu_to_be64(wr->sg_list[i].addr);
+       }
+       wqe->send.num_sgle = cpu_to_be32(wr->num_sge);
+       *flit_cnt = 4 + ((wr->num_sge) << 1);
+   }
+   wqe->send.plen = cpu_to_be32(plen);
+   return 0;
+}
+
+static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
+					u8 *flit_cnt)
+{
+	int i;
+	u32 plen;
+	if (wr->num_sge > T3_MAX_SGE)
+		return -EINVAL;
+	wqe->write.rdmaop = T3_RDMA_WRITE;
+	wqe->write.reserved[0] = 0;
+	wqe->write.reserved[1] = 0;
+	wqe->write.reserved[2] = 0;
+	wqe->write.stag_sink = cpu_to_be32(wr->wr.rdma.rkey);
+	wqe->write.to_sink = cpu_to_be64(wr->wr.rdma.remote_addr);
+
+	if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) {
+		plen = 4;
+		wqe->write.sgl[0].stag = wr->imm_data;
+  

[PATCH v4 07/13] Async Event Handler

2006-12-14 Thread Steve Wise

Code to handle async events coming from the T3 RDMA Core.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/iwch_ev.c |  231 +
 1 files changed, 231 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
new file mode 100644
index 000..b0bd014
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -0,0 +1,231 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/slab.h>
+#include <linux/mman.h>
+#include <net/sock.h>
+#include "iwch_provider.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+#include "cxio_hal.h"
+#include "cxio_wr.h"
+
+static void post_qp_event(struct iwch_dev *rnicp, struct iwch_cq *chp,
+			  struct respQ_msg_t *rsp_msg,
+			  enum ib_event_type ib_event,
+			  int send_term)
+{
+	struct ib_event event;
+	struct iwch_qp_attributes attrs;
+	struct iwch_qp *qhp;
+
+	printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x "
+	       "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__,
+	       CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe),
+	       CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe),
+	       CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
+
+	spin_lock(&rnicp->lock);
+	qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe));
+
+	if (!qhp) {
+		printk(KERN_ERR "%s unaffiliated error 0x%x qpid 0x%x\n",
+		       __FUNCTION__, CQE_STATUS(rsp_msg->cqe),
+		       CQE_QPID(rsp_msg->cqe));
+		spin_unlock(&rnicp->lock);
+		return;
+	}
+
+	if ((qhp->attr.state == IWCH_QP_STATE_ERROR) ||
+	    (qhp->attr.state == IWCH_QP_STATE_TERMINATE)) {
+		PDBG("%s AE received after RTS - "
+		     "qp state %d qpid 0x%x status 0x%x\n", __FUNCTION__,
+		     qhp->attr.state, qhp->wq.qpid, CQE_STATUS(rsp_msg->cqe));
+		spin_unlock(&rnicp->lock);
+		return;
+	}
+
+	atomic_inc(&qhp->refcnt);
+	spin_unlock(&rnicp->lock);
+
+	event.event = ib_event;
+	event.device = chp->ibcq.device;
+	if (ib_event == IB_EVENT_CQ_ERR)
+		event.element.cq = &chp->ibcq;
+	else
+		event.element.qp = &qhp->ibqp;
+
+	if (qhp->ibqp.event_handler)
+		(*qhp->ibqp.event_handler)(&event, qhp->ibqp.qp_context);
+
+	if (qhp->attr.state == IWCH_QP_STATE_RTS) {
+		attrs.next_state = IWCH_QP_STATE_TERMINATE;
+		iwch_modify_qp(qhp->rhp, qhp, IWCH_QP_ATTR_NEXT_STATE,
+			       &attrs, 1);
+		if (send_term)
+			iwch_post_terminate(qhp, rsp_msg);
+	}
+
+	if (atomic_dec_and_test(&qhp->refcnt))
+		wake_up(&qhp->wait);
+}
+
+void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb)
+{
+	struct iwch_dev *rnicp;
+	struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data;
+	struct iwch_cq *chp;
+	struct iwch_qp *qhp;
+	u32 cqid = RSPQ_CQID(rsp_msg);
+
+	rnicp = (struct iwch_dev *) rdev_p->ulp;
+	spin_lock(&rnicp->lock);
+	chp = get_chp(rnicp, cqid);
+	qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe));
+	if (!chp || !qhp) {
+		printk(KERN_ERR MOD "BAD AE cqid 0x%x qpid 0x%x opcode %d "
+		       "status 0x%x type %d wrid.hi 0x%x wrid.lo 0x%x \n",
+  

[PATCH v4 10/13] Core HAL

2006-12-14 Thread Steve Wise

The RDMA Core interfaces with the T3 HW and ULLD providing a low level
RDMA interface.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 1302 +++
 drivers/infiniband/hw/cxgb3/core/cxio_hal.h |  201 
 2 files changed, 1503 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
new file mode 100644
index 000..ffc4ec0
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
@@ -0,0 +1,1302 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <asm/semaphore.h>
+#include <asm/delay.h>
+
+#include <linux/netdevice.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+
+#include "cxio_resource.h"
+#include "cxio_hal.h"
+#include "cxgb3_offload.h"
+#include "sge_defs.h"
+
+static struct cxio_rdev *rdev_tbl[T3_MAX_NUM_RNIC];
+static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (rdev_tbl[i])
+			if (!strcmp(rdev_tbl[i]->dev_name, dev_name))
+				return rdev_tbl[i];
+	return NULL;
+}
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
+							     *tdev)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (rdev_tbl[i])
+			if (rdev_tbl[i]->t3cdev_p == tdev)
+				return rdev_tbl[i];
+	return NULL;
+}
+
+static inline int cxio_hal_add_rdev(struct cxio_rdev *rdev_p)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (!rdev_tbl[i]) {
+			rdev_tbl[i] = rdev_p;
+			break;
+		}
+	return (i == T3_MAX_NUM_RNIC);
+}
+
+static inline void cxio_hal_delete_rdev(struct cxio_rdev *rdev_p)
+{
+	int i;
+	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+		if (rdev_tbl[i] == rdev_p) {
+			rdev_tbl[i] = NULL;
+			break;
+		}
+}
+
+int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq,
+		   enum t3_cq_opcode op, u32 credit)
+{
+	int ret;
+	struct t3_cqe *cqe;
+	u32 rptr;
+
+	struct rdma_cq_op setup;
+	setup.id = cq->cqid;
+	setup.credits = (op == CQ_CREDIT_UPDATE) ? credit : 0;
+	setup.op = op;
+	ret = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_OP, &setup);
+
+	if ((ret < 0) || (op == CQ_CREDIT_UPDATE))
+		return ret;
+
+	/*
+	 * If the rearm returned an index other than our current index,
+	 * then there might be CQE's in flight (being DMA'd).  We must wait
+	 * here for them to complete or the consumer can miss a notification.
+	 */
+	if (Q_PTR2IDX((cq->rptr), cq->size_log2) != ret) {
+		int i=0;
+
+		rptr = cq->rptr;
+
+		/*
+		 * Keep the generation correct by bumping rptr until it
+		 * matches the index returned by the rearm - 1.
+		 */
+		while (Q_PTR2IDX((rptr+1), cq->size_log2) != ret)
+			rptr++;
+
+		/*
+		 * Now rptr is the index for the (last) cqe that was
+		 * in-flight at the time the HW rearmed the CQ.  We

[PATCH v4 01/13] Linux RDMA Core Changes

2006-12-14 Thread Steve Wise

Support provider-specific data in ib_uverbs_cmd_req_notify_cq().
The Chelsio iwarp provider library needs to pass information to the
kernel verb for re-arming the CQ.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/core/uverbs_cmd.c  |9 +++--
 drivers/infiniband/hw/amso1100/c2.h   |2 +-
 drivers/infiniband/hw/amso1100/c2_cq.c|3 ++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h  |3 ++-
 drivers/infiniband/hw/ehca/ehca_reqs.c|3 ++-
 drivers/infiniband/hw/ipath/ipath_cq.c|4 +++-
 drivers/infiniband/hw/ipath/ipath_verbs.h |3 ++-
 drivers/infiniband/hw/mthca/mthca_cq.c|6 --
 drivers/infiniband/hw/mthca/mthca_dev.h   |4 ++--
 include/rdma/ib_verbs.h   |5 +++--
 10 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 743247e..5dd1de9 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -959,6 +959,7 @@ ssize_t ib_uverbs_req_notify_cq(struct i
 				int out_len)
 {
 	struct ib_uverbs_req_notify_cq cmd;
+	struct ib_udata                udata;
 	struct ib_cq                  *cq;
 
 	if (copy_from_user(&cmd, buf, sizeof cmd))
@@ -968,8 +969,12 @@ ssize_t ib_uverbs_req_notify_cq(struct i
 	if (!cq)
 		return -EINVAL;
 
-	ib_req_notify_cq(cq, cmd.solicited_only ?
-			 IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
+	INIT_UDATA(&udata, buf + sizeof cmd, 0,
+		   in_len - sizeof cmd, 0);
+
+	cq->device->req_notify_cq(cq, cmd.solicited_only ?
+				  IB_CQ_SOLICITED : IB_CQ_NEXT_COMP,
+				  &udata);
 
 	put_cq_read(cq);
 
diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h
index 04a9db5..9a76869 100644
--- a/drivers/infiniband/hw/amso1100/c2.h
+++ b/drivers/infiniband/hw/amso1100/c2.h
@@ -519,7 +519,7 @@ extern void c2_free_cq(struct c2_dev *c2
 extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index);
 extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index);
 extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
-extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
+extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, struct ib_udata *udata);
 
 /* CM */
 extern int c2_llp_connect(struct iw_cm_id *cm_id,
diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c
index 05c9154..7ce8bca 100644
--- a/drivers/infiniband/hw/amso1100/c2_cq.c
+++ b/drivers/infiniband/hw/amso1100/c2_cq.c
@@ -217,7 +217,8 @@ int c2_poll_cq(struct ib_cq *ibcq, int n
 	return npolled;
 }
 
-int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+	      struct ib_udata *udata)
 {
 	struct c2_mq_shared __iomem *shared;
 	struct c2_cq *cq;
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 3720e30..566b30c 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -135,7 +135,8 @@ int ehca_poll_cq(struct ib_cq *cq, int n
 
 int ehca_peek_cq(struct ib_cq *cq, int wc_cnt);
 
-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify);
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+  struct ib_udata *udata);
 
 struct ib_qp *ehca_create_qp(struct ib_pd *pd,
 struct ib_qp_init_attr *init_attr,
diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c
index b46bda1..3ed6992 100644
--- a/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ b/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -634,7 +634,8 @@ poll_cq_exit0:
return ret;
 }
 
-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+  struct ib_udata *udata)
 {
struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 87462e0..27ba4db 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -307,13 +307,15 @@ int ipath_destroy_cq(struct ib_cq *ibcq)
  * ipath_req_notify_cq - change the notification type for a completion queue
  * @ibcq: the completion queue
  * @notify: the type of notification to request
+ * @udata: user data 
  *
  * Returns 0 for success.
  *
  * This may be called from interrupt context.  Also called by
  * ib_req_notify_cq() in the generic verbs code.
  */
-int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int ipath_req_notify_cq(struct ib_cq 

Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work

2006-12-14 Thread Johannes Berg
On Wed, 2006-12-13 at 13:17 -0500, Michael Bommarito wrote:

> Attached is a patch that fixes this (the actual change is two lines
> but context provided in patch for review).  The dmesg containing call
> trace is attached to the bugzilla entry above.

You forgot to attach the patch but IIRC it's been found and fixed
already.

johannes




Re: [RFC] split NAPI from network device.

2006-12-14 Thread Benjamin Herrenschmidt
On Wed, 2006-12-13 at 15:46 -0800, Stephen Hemminger wrote:
> Split off NAPI part from network device, this patch is build tested
> only! It breaks kernel API for network devices, and only three examples
> are fixed (skge, sky2, and tg3).
>
> 1. Decomposition allows different NAPI <-> network device
>    Some hardware has N devices for one IRQ, others like MSI-X
>    want multiple receives for one device.
>
> 2. Cleanup locking with netpoll
>
> 3. Change poll callback arguments and semantics
>
> 4. Make softnet_data static (only in dev.c)

Thanks !

I'll give a go at adapting emac and maybe a few more when I get 5mn to
spare...

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work

2006-12-14 Thread Uli Kunitz

Michael,

I sent a patch to this list on Sunday that fixed the problem. It
seems to have been merged into the wireless-2.6 git tree.


Regards,

Uli
On 13.12.2006 at 19:17, Michael Bommarito wrote:


This didn't get much attention on bugzilla and I figured it was
important enough to forward along to the whole list since it's been
lingering around in ieee80211-softmac since 19-git5 at least.
http://bugzilla.kernel.org/show_bug.cgi?id=7657

Somebody was passing the whole mac device structure to
ieee80211softmac_assoc_work instead of just the association work, which
led to much death and locking.
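
A minimal userspace sketch of this kind of bug, assuming a workqueue-style handler that is handed a pointer to the embedded work item and recovers its enclosing structure from it. All structure and function names below are hypothetical stand-ins, not the real ieee80211softmac definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real softmac structures. */
struct work_item {
	int pending;
};

struct assoc_info {
	int channel;
	struct work_item work;	/* work item embedded in its owner */
};

struct mac_device {
	int id;
	struct assoc_info assoc;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The handler expects the embedded work item, nothing else. */
static int assoc_work(struct work_item *work)
{
	struct assoc_info *ai;

	assert(work != NULL);
	ai = container_of(work, struct assoc_info, work);
	return ai->channel;
}

/* Correct call site: pass the work item, not the whole device. */
static int run_assoc(struct mac_device *mac)
{
	return assoc_work(&mac->assoc.work);
}
```

If the caller handed over the device pointer instead (for example through a cast), the pointer arithmetic in container_of would land on unrelated memory, which is consistent with the hangs described here.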

Attached is a patch that fixes this (the actual change is two lines
but context provided in patch for review).  The dmesg containing call
trace is attached to the bugzilla entry above.

-Mike


--
Uli Kunitz





Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work

2006-12-14 Thread Michael Bommarito

Hello Uli,
 Yes, apologies, I had been waiting for an abandoned bugzilla entry
to get attention, and when I realized it was assigned to a dead-end, I
had simply posted the patch without checking for prior messages.
 I was further confused by the fact that it hadn't made its way into
any of the 19-gitX sets (and for that matter, the window for
2.6.20-rc1 has come and gone and this still remains unfixed), despite
how clear the error was and how trivial the fix seems.

-Mike

On 12/14/06, Uli Kunitz [EMAIL PROTECTED] wrote:

Michael,

I sent a patch to this list on Sunday that fixed the problem. It
seems to have been merged into the wireless-2.6 git tree.

Regards,

Uli
On 13.12.2006 at 19:17, Michael Bommarito wrote:

 This didn't get much attention on bugzilla and I figured it was
 important enough to forward along to the whole list since it's been
 lingering around in ieee80211-softmac since 19-git5 at least.
 http://bugzilla.kernel.org/show_bug.cgi?id=7657

 Somebody was passing the whole mac device structure to
 ieee80211softmac_assoc_work instead of just the association work, which
 led to much death and locking.

 Attached is a patch that fixes this (the actual change is two lines
 but context provided in patch for review).  The dmesg containing call
 trace is attached to the bugzilla entry above.

 -Mike

--
Uli Kunitz







Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work

2006-12-14 Thread Larry Finger

Michael Bommarito wrote:

Hello Uli,
 Yes, apologies, I had been waiting for an abandoned bugzilla entry
to get attention, and when I realized it was assigned to a dead-end, I
had simply posted the patch without checking for prior messages.
 I was further confused by the fact that it hadn't made its way into
any of the 19-gitX sets (and for that matter, the window for
2.6.20-rc1 has come and gone and this still remains unfixed), despite
how clear the error was and how trivial the fix seems.


I was not aware that a bugzilla entry existed for this problem. I learned about it when my system 
would hang on bootup if the bcm43xx card was installed. By bisection, I learned which commit was 
causing the problem. About that time, the complete fix was discussed on the netdev and bcm43xx 
mailing lists. I was a little perturbed that only part of the fix was accepted into 2.6.19-gitX.


The full fix was pushed to John Linville on Dec. 10, who pushed it on to Jeff Garzik on Dec. 11. I 
have not yet seen any message sending it on to Andrew Morton or Linus.


A bug fix will always be accepted, particularly one that only changes 2 lines - it is only a new 
feature that will no longer be accepted once the -rc1 stage is reached. If this message doesn't do 
the trick and it isn't included by -rc2, I'll ping Jeff to see what happened. Changes always take 
longer than one likes, but one needs to be careful.


Larry



Re: [PATCH 12/14] Spidernet Avoid possible RX chain corruption

2006-12-14 Thread Michael Ellerman
On Thu, 2006-12-14 at 11:15 -0600, Linas Vepstas wrote:
 On Thu, Dec 14, 2006 at 11:22:43AM +1100, Michael Ellerman wrote:
 spider_net_refill_rx_chain(card);
   - spider_net_enable_rxchtails(card);
 spider_net_enable_rxdmac(card);
 return 0;
  
  Didn't you just add that line?
 
 Dagnabbit. The earlier patch was moving around existing code.
 Or, more precisely, trying to maintain the general function
 of the old code even while moving things around.
 
 Later on, when I started looking at what the danged function 
 actually did, and the context it was in, I realized that it 
 was a bad idea to call the thing.  So then I removed it. :-/
 
 How should I handle this procedurally? Resend the patch sequence? 
 Let it slide?

If it was my code I'd redo the series, it's confusing and it's going to
look confused in the git history IMHO.

Currently the driver calls spider_net_enable_rxchtails() from
spider_net_enable_card() and spider_net_handle_rxram_full().

Your patch 3/14 removes spider_net_handle_rxram_full() entirely, leaving
spider_net_enable_card() as the only caller of
spider_net_enable_rxchtails().

Patch 10/14 adds a call to spider_net_enable_rxchtails() in
spider_net_alloc_rx_skbs(), and nothing else (except comment changes).

Patch 12/14 removes the call to spider_net_enable_rxchtails() in
spider_net_alloc_rx_skbs(), and nothing else.

So as far as I can tell you should just drop 10/14 and 12/14. 

My worry is that amongst all that rearranging of code, it's not clear
what the semantic change is. Admittedly I don't know the driver that
well, but that's kind of the point - if you and Jim get moved onto a new
project, someone needs to be able to pick up the driver and maintain it.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




Fw: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Andrew Morton


Begin forwarded message:

Date: Thu, 14 Dec 2006 12:47:05 -0800
From: Alex Romosan [EMAIL PROTECTED]
To: linux-kernel@vger.kernel.org
Subject: 2.6.20-rc1 sky2 problems (regression?)


under heavy network load the sky2 driver (compiled in the kernel)
locks up and the only way i can get the network back is to reboot the
machine (bringing the network down and back up again doesn't help).
this happens on an amd64 machine (athlon 3500+ processor) and the card
in question is a Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 15) (from lspci). this is what i see in the
syslog:

kernel: sky2 eth0: rx error, status 0x414a414a length 0
kernel: eth0: hw csum failure.
kernel: 
kernel: Call Trace:
kernel:  IRQ  [8044681c] __skb_checksum_complete+0x4d/0x66
kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
kernel:  [8044905d] netif_receive_skb+0x184/0x20e
kernel:  [803de8e5] sky2_poll+0x68f/0x93c
kernel:  [802219ce] scheduler_tick+0x23/0x2f9
kernel:  [8044a796] net_rx_action+0x61/0xf0
kernel:  [8022a35f] __do_softirq+0x40/0x8a
kernel:  [8020a3cc] call_softirq+0x1c/0x28
kernel:  [8020bbf0] do_softirq+0x2c/0x7d
kernel:  [8022a313] irq_exit+0x36/0x42
kernel:  [8020bebe] do_IRQ+0x8c/0x9e
kernel:  [80208710] default_idle+0x0/0x3a
kernel:  [80209bf1] ret_from_intr+0x0/0xa
kernel:  EOI  [80208736] default_idle+0x26/0x3a
kernel:  [8020878c] cpu_idle+0x42/0x75
kernel:  [805df675] start_kernel+0x1ce/0x1d3
kernel:  [805df140] _sinittext+0x140/0x144
kernel: 
kernel: eth0: hw csum failure.
kernel: 
kernel: Call Trace:
kernel:  IRQ  [8044681c] __skb_checksum_complete+0x4d/0x66
kernel:  [80477bc5] tcp_v4_rcv+0x147/0x8ea
kernel:  [80479ef2] raw_rcv_skb+0x9/0x20
kernel:  [8047a2ff] raw_rcv+0xbe/0xc4
kernel:  [8045ea9d] ip_local_deliver+0x170/0x21b
kernel:  [8045e8fa] ip_rcv+0x478/0x4ab
kernel:  [8044905d] netif_receive_skb+0x184/0x20e
kernel:  [803de8e5] sky2_poll+0x68f/0x93c
kernel:  [80474647] tcp_delack_timer+0x0/0x1b5
kernel:  [8044a796] net_rx_action+0x61/0xf0
kernel:  [8022a35f] __do_softirq+0x40/0x8a
kernel:  [8020a3cc] call_softirq+0x1c/0x28
kernel:  [8020bbf0] do_softirq+0x2c/0x7d
kernel:  [8022a313] irq_exit+0x36/0x42
kernel:  [8020bebe] do_IRQ+0x8c/0x9e
kernel:  [80209bf1] ret_from_intr+0x0/0xa
kernel:  EOI  [802a8402] inode2sd+0x104/0x117
kernel:  [802b8cfa] search_by_key+0xa08/0xbfe
kernel:  [802b8475] search_by_key+0x183/0xbfe
kernel:  [80284778] ll_rw_block+0x89/0x9e
kernel:  [802b8475] search_by_key+0x183/0xbfe
kernel:  [80283cf5] __find_get_block_slow+0x101/0x10d
kernel:  [80284053] __find_get_block+0x197/0x1a5
kernel:  [8026800c] inode_get_bytes+0x2a/0x52
kernel:  [802a89f1] reiserfs_update_sd_size+0x7e/0x284
kernel:  [80237700] kthread+0xed/0xfd
kernel:  [802be990] do_journal_end+0x34b/0xbdd
kernel:  [802b1729] reiserfs_dirty_inode+0x56/0x76
kernel:  [80284c19] block_prepare_write+0x1a/0x24
kernel:  [802809b1] __mark_inode_dirty+0x29/0x197
kernel:  [802a8d04] reiserfs_commit_write+0x10d/0x19f
kernel:  [80284c19] block_prepare_write+0x1a/0x24
kernel:  [802484fc] generic_file_buffered_write+0x4ad/0x6c4
kernel:  [80271b3c] __pollwait+0x0/0xe0
kernel:  [8022a006] current_fs_time+0x35/0x3b
kernel:  [80248a8c] __generic_file_aio_write_nolock+0x379/0x3ec
kernel:  [8049baca] unix_dgram_recvmsg+0x1be/0x1d9
kernel:  [804b6516] __mutex_lock_slowpath+0x205/0x210
kernel:  [80248b60] generic_file_aio_write+0x61/0xc1
kernel:  [80248aff] generic_file_aio_write+0x0/0xc1
kernel:  [80264e57] do_sync_readv_writev+0xc0/0x107
kernel:  [802377f7] autoremove_wake_function+0x0/0x2e
kernel:  [80229d16] getnstimeofday+0x10/0x28
kernel:  [80264ced] rw_copy_check_uvector+0x6c/0xdc
kernel:  [802654f7] do_readv_writev+0xb2/0x18b
kernel:  [80265a2c] sys_writev+0x45/0x93
kernel:  [802096de] system_call+0x7e/0x83

and so on. sometimes i don't get this trace but instead i get:

kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 140 .. 99 report=181 done=181
kernel: sky2 status report lost?
kernel: NETDEV WATCHDOG: eth0: transmit timed out
kernel: sky2 eth0: tx timeout
kernel: sky2 eth0: transmit ring 181 .. 140 report=181 done=181
kernel: sky2 hardware hung? flushing

but the end result is the same, the network card stops responding and
i have to reboot the machine. i can reproduce this on a consistent
basis so if there 

Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Herbert Xu
Alex Romosan [EMAIL PROTECTED] wrote:
 /** does the HW need to evaluate checksum for TCP or UDP packets?
 if (pMessage->ip_summed == CHECKSUM_HW)
 
 maybe this needs to be replaced with CHECKSUM_PARTIAL. the second one
 
 /** TCP checksum offload
 if ((pSKPacket->pMbuf->ip_summed == CHECKSUM_HW) &&
 (SetOpcodePacketFlag == SK_TRUE)
 
 i wonder if this is supposed to be CHECKSUM_COMPLETE

The rule of thumb is that it's COMPLETE for RX, and PARTIAL for TX.
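
A minimal sketch of that rule of thumb, assuming a driver is being converted away from the old single CHECKSUM_HW value. The enum and helper below are hypothetical stand-ins for illustration; they are not the kernel's definitions and the numeric values carry no meaning:

```c
#include <assert.h>

/* Hypothetical stand-ins for the ip_summed states discussed above. */
enum csum_state {
	CSUM_NONE,
	CSUM_COMPLETE,	/* receive: hardware has summed the packet */
	CSUM_PARTIAL,	/* transmit: hardware is asked to finish the sum */
};

enum direction { DIR_RX, DIR_TX };

/* Pick the replacement for an old CHECKSUM_HW use based on the data path. */
static enum csum_state split_checksum_hw(enum direction dir)
{
	assert(dir == DIR_RX || dir == DIR_TX);
	return dir == DIR_RX ? CSUM_COMPLETE : CSUM_PARTIAL;
}
```

Under this reading, both snippets quoted above sit on the transmit side (they decide whether the hardware should compute a checksum), so both would map to the PARTIAL case.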

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 13/22] e1000: disable TSO when debugging slab

2006-12-14 Thread Herbert Xu
Jeff Garzik [EMAIL PROTECTED] wrote:
  
 +#ifdef CONFIG_DEBUG_SLAB
 + /* 82544's work arounds do not play nicely with DEBUG SLAB */
 + if (adapter->hw.mac_type == e1000_82544)
 + netdev->features &= ~NETIF_F_TSO;
 +#endif
 
 ACK, provided that you greatly enhance the comment to explain -why-, not 
 just the desired results.

Actually, CONFIG_DEBUG_SLAB is not the only thing that can break the
82544 work-around; Xen, for example, will also generate packets that
break it.  Jesse has a more recent fix that resolves both problems.

I've updated his patch to make it smaller.

Note that the only reason we don't see this normally is because the
TCP stack starts writing from the end, i.e., it writes the TCP header
first then slaps on the IP header, etc.  So the end of the TCP header
(skb->tail - 1 here) is always aligned correctly.

Had we made the start of the IP header (e.g., IPv6) 8-byte aligned
instead, this would happen for normal TCP traffic as well.
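
The end-alignment test in this workaround can be sketched in userspace, assuming the intended expression is `(unsigned long)(skb->tail - 1) & 4`. The helper name is hypothetical, and plain integers stand in for buffer addresses:

```c
#include <assert.h>
#include <stdint.h>

/*
 * 'tail' is the address one past the last byte of the linear data,
 * playing the role of skb->tail. Returns 1 when the last byte lies in
 * the upper dword of its 8-byte-aligned qword (address bit 2 set).
 */
static int last_byte_in_upper_dword(uintptr_t tail)
{
	assert(tail > 0);
	return ((tail - 1) & 4) != 0;
}
```

For example, a buffer ending at 0x1008 has its last byte at 0x1007, where bit 2 is set, while one ending at 0x1004 has its last byte at 0x1003, where it is clear. In the patch, a nonzero result takes the `break` and a zero result falls through to the handling shared with the later MAC types.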

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 73f3a85..2c6ba42 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3168,6 +3168,16 @@ #ifdef NETIF_F_TSO
 		if (skb->data_len && (hdr_len == (skb->len - skb->data_len))) {
 			switch (adapter->hw.mac_type) {
 				unsigned int pull_size;
+			case e1000_82544:
+				/* Make sure we have room to chop off 4 bytes,
+				 * and that the end alignment will work out to
+				 * this hardware's requirements
+				 * NOTE: this is a TSO only workaround
+				 * if end byte alignment not correct move us
+				 * into the next dword */
+				if ((unsigned long)(skb->tail - 1) & 4)
+					break;
+				/* fall through */
 			case e1000_82571:
 			case e1000_82572:
 			case e1000_82573:


Re: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and Reason Code

2006-12-14 Thread Zhu Yi
On Thu, 2006-12-14 at 11:27 +0100, Jiri Benc wrote:
 On Thu, 14 Dec 2006 12:02:04 +0800, Zhu Yi wrote:
  Signed-off-by: Zhu Yi [EMAIL PROTECTED]
 
 Please Cc: me and John Linville on d80211 patches otherwise your
 chances of review (and inclusion) are much lower.
 
 In addition to comments from Michael (which are all perfectly valid and
 you need to address all of them):
 
  +struct ieee802_11_ts_info {
 
 Choose a name consistent with the rest of the header (e.g. ieee80211_
 prefix).

OK.

Thanks,
-yi


Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Stephen Hemminger
On Fri, 15 Dec 2006 13:24:32 +1100
Herbert Xu [EMAIL PROTECTED] wrote:

 Alex Romosan [EMAIL PROTECTED] wrote:
  /** does the HW need to evaluate checksum for TCP or UDP packets?
  if (pMessage->ip_summed == CHECKSUM_HW)
  
  maybe this needs to be replaced with CHECKSUM_PARTIAL. the second one
  
  /** TCP checksum offload
  if ((pSKPacket->pMbuf->ip_summed == CHECKSUM_HW) &&
  (SetOpcodePacketFlag == SK_TRUE)
  
  i wonder if this is supposed to be CHECKSUM_COMPLETE
 
 The rule of thumb is that it's COMPLETE for RX, and PARTIAL for TX.
 
 Cheers,

I have a fixed up version of the vendor driver, I'll repackage it tomorrow.


Re: [PATCH 2/6] d80211: create wifi.h to define WIFI OUIs

2006-12-14 Thread Zhu Yi
On Thu, 2006-12-14 at 11:31 +0100, Jiri Benc wrote:
> AFAIK wifi is a trademark and we want to avoid using it. wlan seems
> to be a better alternative for the prefixes. Also, I don't see a reason
> for a separate header file here.

WI-FI(r) is a trademark, but wifi and WIFI_XXX are not. I'm OK with
putting these into the existing headers.

Thanks,
-yi


Re: [PATCH 6/6] d80211: add sysfs interface for QoS functions

2006-12-14 Thread Zhu Yi
On Thu, 2006-12-14 at 12:23 +0100, Jiri Benc wrote:
 So... what about implementing that into cfg80211? :-)
 
 I'm not inclined towards this patch (even if you address Stephen's
 comment). 

OK. This is only for my testing (or maybe someone else wants to try the
code). I'm not asking to merge it. When all the other code is reviewed
and accepted, I will write a cfg80211 interface for it.

Thanks,
-yi


Re: 2.6.20-rc1 sky2 problems (regression?)

2006-12-14 Thread Alex Romosan
Stephen Hemminger [EMAIL PROTECTED] writes:

 I have a fixed up version of the vendor driver, I'll repackage it tomorrow.

as per the include file, i ended up replacing all the CHECKSUM_HW with
CHECKSUM_PARTIAL since the functions in question had to do with
transmit. seems to be working so far without any lockups. we'll see
how long this lasts.

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |


[announce] iproute2 2.6.19-061214

2006-12-14 Thread Stephen Hemminger
This is an update to the iproute2 command set.
It can be downloaded from:
  http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.18-061214.tar.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

For more info on iproute2 see:
  http://linux-net.osdl.org/index.php/Iproute2

The version number includes the kernel version to denote what features are
supported. The same source should build on older systems, but obviously the
newer kernel features won't be available. As much as possible, this package
tries to be source compatible across releases.
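
As an illustration of the versioning scheme described above, the following sketch splits a release string such as 2.6.19-061214 into the kernel version it tracks and the trailing snapshot stamp. The struct and function names are hypothetical, and treating the suffix as a plain decimal number (so a leading zero is dropped) is an assumption made for brevity.

```c
#include <assert.h>
#include <stdio.h>

struct iproute2_version {
	int kmajor, kminor, kpatch; /* kernel version the release tracks */
	int snapshot;               /* trailing stamp, parsed as decimal */
};

/* Returns 0 on success, -1 if the string does not have the
 * <major>.<minor>.<patch>-<stamp> layout. */
static int parse_iproute2_version(const char *s, struct iproute2_version *v)
{
	char extra;
	int n;

	assert(s != NULL && v != NULL);
	n = sscanf(s, "%d.%d.%d-%d%c", &v->kmajor, &v->kminor,
		   &v->kpatch, &v->snapshot, &extra);
	/* Exactly four conversions: anything less is incomplete,
	 * a fifth means trailing garbage. */
	return n == 4 ? 0 : -1;
}

/* Non-zero if the release tracks at least the given kernel version. */
static int tracks_kernel_at_least(const struct iproute2_version *v,
				  int major, int minor, int patch)
{
	if (v->kmajor != major)
		return v->kmajor > major;
	if (v->kminor != minor)
		return v->kminor > minor;
	return v->kpatch >= patch;
}
```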

Changes from 2.6.18-061002 to 2.6.19-061214:

Boian Bonev:
  Display local route table name correctly in output of:

Hasso Tepper:
  Fixes for tc help commands

jamal:
  Multicast computation off by one
  Update generic netlink header
  Add controller support for new features exposed
  clarify ok and pass
  Fix missing class/flowid oddity
  Mention need for db dev package
  update xfrm async events
  make multicast group to bitmask conversion generic
  update xfrm monitoring to use nl_mgrp

Masahide NAKAMURA:
  ADDR: Fix print format for lifetimes.
  ADDR: Enable to add IPv6 address with valid/preferred lifetime.
  ADDR: Define 0xU as INFINITY_LIFE_TIME regarding to the kernel.
  TUNNEL: Split common functions to export them.
  TUNNEL: Import ip6tunnel.c.
  TUNNEL: IPv6-over-IPv6 tunnel support.
  XFRM: sub policy support.
  XFRM: Mobile IPv6 route optimization support.
  XFRM: support report message by monitor.
  XFRM: Mobility header support.

Noriaki TAKAMIYA:
  ADDR: Add the 'change' and 'replace' commands to the IPv6 address
    manipulation context.

Patrick McHardy:
  [IPROUTE]: Add support for routing rule fwmark masks

Stephen Hemminger:
  Man page for ss submitted by Alex Wirt
  Typo in man page
  Trap possible overflow in usec values to netem
  genl Makefile LDFLAGS
  SA and SP in IPSec BEET mode.
  Route metrics decode bug.
  lnstat man page
  Man page for rtmon
  Update to 2.6.19 headers
  Add more includes
  Change to post 2.6.19 sanitized headers
  Eliminate trailing whitespace


Thomas Graf:
  Add support for inverted selectors
  Add rule notification support to ip monitor



[PATCH 2/2] bcm43xx-d80211: Fix for PHYmode API change.

2006-12-14 Thread Michael Buesch
This fixes the PHYmode list API breakage for the bcm43xx-d80211 driver.

Signed-off-by: Michael Buesch [EMAIL PROTECTED]

Index: bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx.h
===
--- bu3sch-wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx.h  
2006-12-13 19:24:25.0 +0100
+++ bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx.h   
2006-12-14 17:42:42.0 +0100
@@ -561,6 +561,8 @@ struct bcm43xx_phy {
enum bcm43xx_firmware_compat fw;
/* The TX header length. This depends on the firmware. */
size_t txhdr_size;
+
+   struct ieee80211_hw_mode hwmode;
 };
 
 /* Data structures for DMA transmission, per 80211 core. */
Index: bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===
--- 
bu3sch-wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 
2006-12-13 19:24:25.0 +0100
+++ bu3sch-wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c  
2006-12-14 17:57:46.0 +0100
@@ -2892,19 +2892,25 @@ static void bcm43xx_chipset_detach(struc
 
 static void bcm43xx_free_modes(struct bcm43xx_private *bcm)
 {
-   struct ieee80211_hw *hw = bcm->ieee;
+   struct ssb_core *core;
+   struct bcm43xx_corepriv_80211 *wlpriv;
+   struct bcm43xx_phy *phy;
int i;
 
-   for (i = 0; i < hw->num_modes; i++) {
-   kfree(hw->modes[i].channels);
-   kfree(hw->modes[i].rates);
+   for (i = 0; i < bcm->nr_80211_available; i++) {
+   core = bcm->wlcores[i];
+   wlpriv = core->priv;
+   phy = &wlpriv->phy;
+
+   kfree(phy->hwmode.channels);
+   phy->hwmode.channels = NULL;
+   kfree(phy->hwmode.rates);
+   phy->hwmode.rates = NULL;
}
-   kfree(hw->modes);
-   hw->modes = NULL;
-   hw->num_modes = 0;
 }
 
-static int bcm43xx_append_mode(struct ieee80211_hw *hw,
+static int bcm43xx_append_mode(struct bcm43xx_private *bcm,
+  struct bcm43xx_phy *phy,
   int mode_id,
   int nr_channels,
   const struct ieee80211_channel *channels,
@@ -2913,10 +2919,10 @@ static int bcm43xx_append_mode(struct ie
   int nr_rates2,
   const struct ieee80211_rate *rates2)
 {
-   struct ieee80211_hw_modes *mode;
+   struct ieee80211_hw_mode *mode;
int err = -ENOMEM;
 
-   mode = &(hw->modes[hw->num_modes]);
+   mode = &phy->hwmode;
 
mode->mode = mode_id;
mode->num_channels = nr_channels;
@@ -2937,11 +2943,14 @@ static int bcm43xx_append_mode(struct ie
   sizeof(*rates2) * nr_rates2);
}
 
-   hw->num_modes++;
-   err = 0;
+   err = ieee80211_register_hwmode(bcm->ieee, mode);
+   if (err)
+   goto err_free_rates;
 out:
return err;
 
+err_free_rates:
+   kfree(mode->rates);
 err_free_channels:
kfree(mode->channels);
goto out;
@@ -2950,17 +2959,9 @@ err_free_channels:
 static int bcm43xx_setup_modes(struct bcm43xx_private *bcm)
 {
int err = -ENOMEM;
-   struct ieee80211_hw *hw = bcm->ieee;
struct ssb_core *core;
struct bcm43xx_corepriv_80211 *wlpriv;
-   int i, nr_modes;
-
-   nr_modes = bcm->nr_80211_available;
-   hw->modes = kzalloc(sizeof(*(hw->modes)) * nr_modes,
- GFP_KERNEL);
-   if (!hw->modes)
-   goto out;
-   hw->num_modes = 0;
+   int i;
 
for (i = 0; i < bcm->nr_80211_available; i++) {
core = bcm->wlcores[i];
@@ -2968,7 +2969,7 @@ static int bcm43xx_setup_modes(struct bc
 
switch (wlpriv->phy.type) {
case BCM43xx_PHYTYPE_A:
-   err = bcm43xx_append_mode(bcm->ieee, MODE_IEEE80211A,
+   err = bcm43xx_append_mode(bcm, &wlpriv->phy, MODE_IEEE80211A,
  ARRAY_SIZE(bcm43xx_a_chantable),
  bcm43xx_a_chantable,
  ARRAY_SIZE(bcm43xx_ofdm_ratetable),
@@ -2976,7 +2977,7 @@ static int bcm43xx_setup_modes(struct bc
  0, NULL);
break;
case BCM43xx_PHYTYPE_B:
-   err = bcm43xx_append_mode(bcm->ieee, MODE_IEEE80211B,
+   err = bcm43xx_append_mode(bcm, &wlpriv->phy, MODE_IEEE80211B,
  ARRAY_SIZE(bcm43xx_bg_chantable),
  bcm43xx_bg_chantable,
  ARRAY_SIZE(bcm43xx_cck_ratetable),
@@ -2984,7 +2985,7 @@ 

[PATCH 1/2] d80211: Turn PHYmode list from an array into a linked list

2006-12-14 Thread Michael Buesch
This turns the PHY-modes list into a linked list.
The advantage is that drivers can add modes dynamically as they probe
them, and don't have to settle on a given array size at the beginning
of probing.

Signed-off-by: Michael Buesch [EMAIL PROTECTED]

--

Note that I will also send fixup patches for all other d80211 drivers
if there are no complaints about this patch.
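
To illustrate the idea of registering modes one at a time instead of sizing an array up front, here is a self-contained userspace sketch. It is not the d80211 implementation: struct list_node is a minimal stand-in for the kernel's struct list_head, and hw_init, register_hwmode and find_mode are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

struct hw_mode {
	int mode;              /* PHY mode identifier */
	int num_channels;
	int num_rates;
	struct list_node list; /* links the mode into hw->modes */
};

struct hw {
	struct list_node modes; /* list head; empty when it points to itself */
};

static void hw_init(struct hw *hw)
{
	hw->modes.next = hw->modes.prev = &hw->modes;
}

/* Append one mode to the tail of the list. The caller owns the
 * storage, so no array size has to be known in advance. */
static int register_hwmode(struct hw *hw, struct hw_mode *mode)
{
	assert(hw != NULL && mode != NULL);
	mode->list.prev = hw->modes.prev;
	mode->list.next = &hw->modes;
	hw->modes.prev->next = &mode->list;
	hw->modes.prev = &mode->list;
	return 0;
}

/* Walk the list and return the mode with the given id, or NULL. */
static struct hw_mode *find_mode(struct hw *hw, int mode_id)
{
	struct list_node *n;

	for (n = hw->modes.next; n != &hw->modes; n = n->next) {
		struct hw_mode *m = (struct hw_mode *)
			((char *)n - offsetof(struct hw_mode, list));
		if (m->mode == mode_id)
			return m;
	}
	return NULL;
}
```

A driver following this pattern can call register_hwmode() once per PHY as it is probed; lookups then iterate the list rather than indexing an array.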

Index: bu3sch-wireless-dev/include/net/d80211.h
===
--- bu3sch-wireless-dev.orig/include/net/d80211.h   2006-12-05 
18:09:34.0 +0100
+++ bu3sch-wireless-dev/include/net/d80211.h2006-12-13 19:40:05.0 
+0100
@@ -76,12 +76,14 @@ struct ieee80211_rate {
   * optimizing channel utilization estimates */
 };
 
-struct ieee80211_hw_modes {
-   int mode;
-   int num_channels;
-   struct ieee80211_channel *channels;
-   int num_rates;
-struct ieee80211_rate *rates;
+struct ieee80211_hw_mode {
+   int mode; /* MODE_IEEE80211... */
+   int num_channels; /* Number of channels (below) */
+   struct ieee80211_channel *channels; /* Array of supported channels */
+   int num_rates; /* Number of rates (below) */
+struct ieee80211_rate *rates; /* Array of supported rates */
+
+   struct list_head list; /* Internal, don't touch */
 };
 
 struct ieee80211_tx_queue_params {
@@ -420,9 +422,7 @@ typedef enum {
SET_KEY, DISABLE_KEY, REMOVE_ALL_KEYS,
 } set_key_cmd;
 
-/* This is driver-visible part of the per-hw state the stack keeps.
- * If you change something in here, call ieee80211_update_hw() to
- * notify the stack about the change. */
+/* This is driver-visible part of the per-hw state the stack keeps. */
 struct ieee80211_hw {
/* these are assigned by d80211, don't write */
int index;
@@ -512,9 +512,6 @@ struct ieee80211_hw {
/* This is maximum value for rssi reported by this device */
int maxssi;
 
-   int num_modes;
-   struct ieee80211_hw_modes *modes;
-
/* Number of available hardware TX queues for data packets.
 * WMM requires at least four queues. */
int queues;
@@ -750,9 +747,9 @@ static inline char *ieee80211_get_rx_led
 #endif
 }
 
-/* Call this function if you changed the hardware description after
- * ieee80211_register_hw */
-int ieee80211_update_hw(struct ieee80211_hw *hw);
+/* Register a new hardware PHYMODE capability to the stack. */
+int ieee80211_register_hwmode(struct ieee80211_hw *hw,
+ struct ieee80211_hw_mode *mode);
 
 /* Unregister a hardware device. This function instructs 802.11 code to free
  * allocated resources and unregister netdevices from the kernel. */
Index: bu3sch-wireless-dev/net/d80211/ieee80211.c
===
--- bu3sch-wireless-dev.orig/net/d80211/ieee80211.c 2006-12-05 
18:09:35.0 +0100
+++ bu3sch-wireless-dev/net/d80211/ieee80211.c  2006-12-13 19:40:05.0 
+0100
@@ -1915,7 +1915,8 @@ int ieee80211_if_config_beacon(struct ne
 
 int ieee80211_hw_config(struct ieee80211_local *local)
 {
-   int i, ret = 0;
+   struct ieee80211_hw_mode *mode;
+   int ret = 0;
 
 #ifdef CONFIG_D80211_VERBOSE_DEBUG
printk(KERN_DEBUG "HW CONFIG: channel=%d freq=%d "
@@ -1926,12 +1927,10 @@ int ieee80211_hw_config(struct ieee80211
if (local->ops->config)
ret = local->ops->config(local_to_hw(local), &local->hw.conf);
 
-   for (i = 0; i < local->hw.num_modes; i++) {
-   struct ieee80211_hw_modes *mode = &local->hw.modes[i];
+   list_for_each_entry(mode, &local->modes_list, list) {
if (mode->mode == local->hw.conf.phymode) {
-   if (local->curr_rates != mode->rates) {
+   if (local->curr_rates != mode->rates)
rate_control_clear(local);
-   }
local->curr_rates = mode->rates;
local->num_curr_rates = mode->num_rates;
ieee80211_prepare_rates(local);
@@ -2511,10 +2510,10 @@ ieee80211_rx_h_data(struct ieee80211_txr
 static struct ieee80211_rate *
 ieee80211_get_rate(struct ieee80211_local *local, int phymode, int hw_rate)
 {
-   int m, r;
+   struct ieee80211_hw_mode *mode;
+   int r;
 
-   for (m = 0; m < local->hw.num_modes; m++) {
-   struct ieee80211_hw_modes *mode = &local->hw.modes[m];
+   list_for_each_entry(mode, &local->modes_list, list) {
if (mode->mode != phymode)
continue;
for (r = 0; r < mode->num_rates; r++) {
@@ -4351,24 +4350,6 @@ void ieee80211_if_mgmt_setup(struct net_
dev->destructor = ieee80211_if_free;
 }
 
-static void ieee80211_precalc_modes(struct ieee80211_local *local)
-{
-   struct ieee80211_hw_modes *mode;
-   struct ieee80211_rate *rate;
-   struct ieee80211_hw *hw = &local->hw;
-  

RE: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and Reason Code

2006-12-14 Thread Simon Barber
None of this should be in the kernel. See wpa_supplicant.

Simon 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:02 PM
To: netdev@vger.kernel.org
Subject: [PATCH 1/6] d80211: add IEEE802.11e/WMM MLMEs, Status Code and
Reason Code

Signed-off-by: Zhu Yi [EMAIL PROTECTED]

---

 include/net/d80211_mgmt.h |  148
+
 1 files changed, 148 insertions(+), 0 deletions(-)

d83f6236e756f5f0bb1484d99188f06704de
diff --git a/include/net/d80211_mgmt.h b/include/net/d80211_mgmt.h
index 87141d4..450c0a2 100644
--- a/include/net/d80211_mgmt.h
+++ b/include/net/d80211_mgmt.h
@@ -14,6 +14,39 @@
 
 #include <linux/types.h>
 
+struct ieee802_11_ts_info {
+   __le16 traffic_type:1;
+   __le16 tsid:4;
+   __le16 direction:2;
+   __le16 access_policy:2;
+   __le16 aggregation:1;
+   __le16 apsd:1;
+   __le16 up:3;
+   __le16 ack_policy:2;
+   u8 schedule:1;
+   u8 reserved:7;
+} __attribute__ ((packed));
+
+struct ieee802_11_elem_tspec {
+   struct ieee802_11_ts_info ts_info;
+   __le16 nominal_msdu_size;
+   __le16 max_msdu_size;
+   __le32 min_service_interval;
+   __le32 max_service_interval;
+   __le32 inactivity_interval;
+   __le32 suspension_interval;
+   __le32 service_start_time;
+   __le32 min_data_rate;
+   __le32 mean_data_rate;
+   __le32 peak_data_rate;
+   __le32 burst_size;
+   __le32 delay_bound;
+   __le32 min_phy_rate;
+   __le16 surplus_band_allow;
+   __le16 medium_time;
+} __attribute__ ((packed));
+
+
 struct ieee80211_mgmt {
__le16 frame_control;
__le16 duration;
@@ -81,9 +114,51 @@ struct ieee80211_mgmt {
struct {
u8 action_code;
u8 dialog_token;
+   u8 variable[0];
+   } __attribute__ ((packed)) addts_req;
+   struct {
+   u8 action_code;
+   u8 dialog_token;
+   __le16 status_code;
+   u8 variable[0];
+   } __attribute__ ((packed)) addts_resp;
+   struct {
+   u8 action_code;
+   struct ieee802_11_ts_info ts_info;
+   __le16 reason_code;
+   } __attribute__ ((packed)) delts;
+   struct {
+   u8 action_code;
+   u8 dialog_token;
u8 status_code;
u8 variable[0];
} __attribute__ ((packed)) wme_action;
+   struct {
+   u8 action_code;
+   u8 dest[6];
+   u8 src[6];
+   __le16 capab_info;
+   __le16 timeout;
+   /* Followed by Supported Rates and
+* Extended Supported Rates */
+   u8 variable[0];
+   } __attribute__ ((packed)) dls_req;
+   struct {
+   u8 action_code;
+   __le16 status_code;
+   u8 dest[6];
+   u8 src[6];
+   /* Followed by Capability Information,
+* Supported Rates and Extended
+* Supported Rates */
+   u8 variable[0];
+   } __attribute__ ((packed)) dls_resp;
+   struct {
+   u8 action_code;
+   u8 dest[6];
+   u8 src[6];
+   __le16 reason_code;
+   } __attribute__ ((packed)) dls_teardown;
struct{
u8 action_code;
u8 element_id;
@@ -150,6 +225,18 @@ enum ieee80211_statuscode {
WLAN_STATUS_UNSUPP_RSN_VERSION = 44,
WLAN_STATUS_INVALID_RSN_IE_CAP = 45,
WLAN_STATUS_CIPHER_SUITE_REJECTED = 46,
+   /* 802.11e */
+   WLAN_STATUS_UNSPECIFIED_QOS = 32,
+   

RE: [PATCH 4/6] d80211: add IEEE802.11e/WMM Traffic Stream (TS) Management support

2006-12-14 Thread Simon Barber
This policing of media time must be done in the qdisc - and made to work
per DA (Destination Address) - in order that AP mode will work as well
as STA mode. In addition the count of used time should be updated AFTER
the frame has been sent, not before, since the number of retries done
cannot be taken into account before. These MUST be counted.

The API to the qdisc should be via TC - not cfg80211 or other.

Simon

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:03 PM
To: netdev@vger.kernel.org
Subject: [PATCH 4/6] d80211: add IEEE802.11e/WMM Traffic Stream (TS)
Management support

The d80211 stack now maintains a sta_ts_data structure for every TSID and
direction combination of all the Traffic Streams. For Access Categories
(AC) that have admission control enabled, the STA can request a traffic
stream on its own initiative. The stack also maintains two variables to
record the admitted time and used time for each TS. In every
dot11EDCAAveragingPeriod, a timer is used to track how much time (in
usec) has been used (vs the admitted time). If it finds that the used
time is less than the admitted time in the current
dot11EDCAAveragingPeriod, the STA will continue to fulfil the admitted
time in the next period. Otherwise the stack will reduce the admitted
time until the TS has been throttled. Finally, both the AP and STA are
able to delete the TS by sending a DELTS MLME.

Signed-off-by: Zhu Yi [EMAIL PROTECTED]

---

 net/d80211/ieee80211.c   |4 
 net/d80211/ieee80211_i.h |   49 -
 net/d80211/ieee80211_iface.c |5 +
 net/d80211/ieee80211_sta.c   |  403
++
 net/d80211/wme.c |   34 +++-
 5 files changed, 480 insertions(+), 15 deletions(-)

d4a326b8493fb465480a68696315c05558c03b2c
diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 6e10db5..4eba18f 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -4599,6 +4599,10 @@ int ieee80211_register_hw(struct ieee802
goto fail_wep;
}
 
+   /* Initialize QoS Params */
+   local->dot11EDCAAveragingPeriod = 5;
+   local->MPDUExchangeTime = 0;
+
/* TODO: add rtnl locking around device creation and qdisc install */
ieee80211_install_qdisc(local->mdev);
 
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index ef303da..e8929d3 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -56,6 +56,10 @@ struct ieee80211_local;
  * increased memory use (about 2 kB of RAM per entry). */
 #define IEEE80211_FRAGMENT_MAX 4
 
+/* Minimum and Maximum TSID used by EDCA. HCCA uses 0~7; EDCA uses 8~15 */
+#define EDCA_TSID_MIN 8
+#define EDCA_TSID_MAX 15
+
 struct ieee80211_fragment_entry {
unsigned long first_frag_time;
unsigned int seq;
@@ -241,6 +245,7 @@ struct ieee80211_if_sta {
IEEE80211_IBSS_SEARCH, IEEE80211_IBSS_JOINED
} state;
struct work_struct work;
+   struct timer_list admit_timer; /* Recompute EDCA admitted time */
u8 bssid[ETH_ALEN], prev_bssid[ETH_ALEN];
u8 ssid[IEEE80211_MAX_SSID_LEN];
size_t ssid_len;
@@ -328,6 +333,19 @@ struct ieee80211_sub_if_data {
 
 #define IEEE80211_DEV_TO_SUB_IF(dev) netdev_priv(dev)
 
+struct sta_ts_data {
+   enum {
+   TS_STATUS_UNUSED= 0,
+   TS_STATUS_ACTIVE= 1,
+   TS_STATUS_INACTIVE  = 2,
+   TS_STATUS_THROTTLING= 3,
+   } status;
+   u8 dialog_token;
+   u8 up;
+   u32 admitted_time_usec;
+   u32 used_time_usec;
+};
+
 struct ieee80211_local {
/* embed the driver visible part.
 * don't cast (use the static inlines below), but we keep
@@ -449,18 +467,19 @@ struct ieee80211_local {
 #ifdef CONFIG_HOSTAPD_WPA_TESTING
u32 wpa_trigger;
 #endif /* CONFIG_HOSTAPD_WPA_TESTING */
-/* SNMP counters */
-/* dot11CountersTable */
-u32 dot11TransmittedFragmentCount;
-u32 dot11MulticastTransmittedFrameCount;
-u32 dot11FailedCount;
+   /* SNMP counters */
+   /* dot11CountersTable */
+   u32 dot11TransmittedFragmentCount;
+   u32 dot11MulticastTransmittedFrameCount;
+   u32 dot11FailedCount;
u32 dot11RetryCount;
u32 dot11MultipleRetryCount;
u32 dot11FrameDuplicateCount;
-u32 dot11ReceivedFragmentCount;
-u32 dot11MulticastReceivedFrameCount;
-u32 dot11TransmittedFrameCount;
-u32 dot11WEPUndecryptableCount;
+   u32 dot11ReceivedFragmentCount;
+   u32 dot11MulticastReceivedFrameCount;
+   u32 dot11TransmittedFrameCount;
+   u32 dot11WEPUndecryptableCount;
+   u32 dot11EDCAAveragingPeriod;
 
 #ifdef CONFIG_D80211_LEDS
int tx_led_counter, rx_led_counter;
@@ -533,6 +552,17 @@ struct ieee80211_local {
* (1 << MODE_*) */
 
int user_space_mlme;
+
+   u32 

RE: [PATCH 6/6] d80211: add sysfs interface for QoS functions

2006-12-14 Thread Simon Barber
This is all part of the client MLME - it would be much better to add
this functionality to wpa_supplicant, rather than adding it to the
kernel. Nothing here needs to be in the kernel for any reason.

The client MLME functions that are in the kernel were put in there for
test and debugging convenience only - the right client MLME to use is
the one in wpa_supplicant. Especially with all the new and very complex
MLME functions that are being added to 802.11 we do not want this huge
amount of code in the kernel when it does not need to be there.

The client MLME in the kernel should be #ifdefed out and made a kernel
option - off by default.

Simon
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:03 PM
To: netdev@vger.kernel.org
Subject: [PATCH 6/6] d80211: add sysfs interface for QoS functions

The sysfs interface here is only a proof of concept. It provides a way
for the userspace applications to use the advanced QoS features
supported by
d80211 stack. The finial solution should be switched to cfg80211.

Signed-off-by: Zhu Yi [EMAIL PROTECTED]

---

 net/d80211/ieee80211_i.h |   13 ++
 net/d80211/ieee80211_sysfs.c |  245
++
 2 files changed, 258 insertions(+), 0 deletions(-)

83d49f70af1f38c152d8bd3abd69756ec087622e
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index d09f65e..7904033 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -20,6 +20,7 @@
 #include <linux/workqueue.h>
 #include <linux/types.h>
 #include <linux/spinlock.h>
+#include <net/d80211_mgmt.h>
 #include "ieee80211_key.h"
 #include "sta_info.h"
 
@@ -329,6 +330,7 @@ struct ieee80211_sub_if_data {
 int channel_use_raw;
 
struct attribute_group *sysfs_group;
+   struct attribute_group *sysfs_qos_group;
 };
 
 #define IEEE80211_DEV_TO_SUB_IF(dev) netdev_priv(dev)
@@ -702,6 +704,17 @@ struct sta_info * ieee80211_ibss_add_sta
 u8 *addr);
 int ieee80211_sta_deauthenticate(struct net_device *dev, u16 reason);
 int ieee80211_sta_disassociate(struct net_device *dev, u16 reason);
+void ieee80211_send_addts(struct net_device *dev,
+ struct ieee802_11_elem_tspec *tspec);
+void wmm_send_addts(struct net_device *dev,
+   struct ieee802_11_elem_tspec *tspec);
+void ieee80211_send_delts(struct net_device *dev, u8 tsid, u8 direction,
+ u32 medium_time);
+void wmm_send_delts(struct net_device *dev, u8 tsid, u8 direction,
+   u32 medium_time);
+void ieee80211_send_dls_req(struct net_device *dev, struct dls_info *dls);
+void ieee80211_send_dls_teardown(struct net_device *dev, u8 *mac, u16 reason);
+void dls_info_add(struct ieee80211_local *local, struct dls_info *dls);
 void dls_info_stop(struct ieee80211_local *local);
 int dls_link_status(struct ieee80211_local *local, u8 *addr);
 
diff --git a/net/d80211/ieee80211_sysfs.c b/net/d80211/ieee80211_sysfs.c
index 6a60077..31dc1f4 100644
--- a/net/d80211/ieee80211_sysfs.c
+++ b/net/d80211/ieee80211_sysfs.c
@@ -13,6 +13,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <net/d80211.h>
+#include <net/d80211_mgmt.h>
 #include "ieee80211_i.h"
 #include "ieee80211_rate.h"
 
@@ -21,6 +22,15 @@
 #define to_net_dev(class) \
container_of(class, struct net_device, class_dev)
 
+/* For sysfs and debug only */
+static struct ieee802_11_elem_tspec _tspec;
+static u8 _dls_mac[ETH_ALEN];
+
+#define TSID _tspec.ts_info.tsid
+#define TSDIR _tspec.ts_info.direction
+#define TSUP _tspec.ts_info.up
+
+
 static inline int rtnl_lock_local(struct ieee80211_local *local)
 {
rtnl_lock();
@@ -657,6 +667,230 @@ static struct class ieee80211_class = {
 #endif
};
 
+
+/* QoS sysfs entries */
+static ssize_t show_ts_info(struct class_device *dev, char *buf)
+{
+   /* TSID, Direction, UP */
+   return sprintf(buf, "%u %u %u\n", TSID, TSDIR, TSUP);
+}
+
+static ssize_t store_ts_info(struct class_device *dev, const char *buf,
+size_t len)
+{
+   unsigned int id, index, up;
+
+   if (sscanf(buf, "%u, %u, %u", &id, &index, &up) != 3) {
+   printk(KERN_ERR "%s: sscanf error\n", __FUNCTION__);
+   return -EINVAL;
+   }
+   if (id < 8 || id > 15) {
+   printk(KERN_ERR "invalid tsid %d\n", id);
+   return -EINVAL;
+   }
+   if ((index != 0) && (index != 1) && (index != 3)) {
+   printk(KERN_ERR "invalid direction %d\n", index);
+   return -EINVAL;
+   }
+   if (up < 0 || up > 7) {
+   printk(KERN_ERR "invalid UP %d\n", up);
+   return -EINVAL;
+   }
+   TSID = id;
+   TSDIR = index;
+   TSUP = up;
+   return len;
+}
+
+static CLASS_DEVICE_ATTR(ts_info, S_IWUSR|S_IRUGO, show_ts_info,
+store_ts_info);
+
+static ssize_t show_tspec(struct class_device *dev, char *buf) {
+ 

RE: [PATCH 5/6] d80211: add IEEE 802.11e Direct Link Setup (DLS) support

2006-12-14 Thread Simon Barber
Again - this DLS management frame processing code should not be in the
kernel - it should be in wpa_supplicant.

Only the frame processing code should be in the kernel.

Simon
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Zhu Yi
Sent: Wednesday, December 13, 2006 8:03 PM
To: netdev@vger.kernel.org
Subject: [PATCH 5/6] d80211: add IEEE 802.11e Direct Link Setup (DLS)
support

Struct dls_info is declared to store the peer's MAC address, timeout
value, supported rates and other information for the DLS link. The stack
also maintains a hash table that stores the dls_info for all the DLS
peers of the local interface. The peer's MAC address is used as the hash
table key. Handling functions for the DLS MLMEs (DLS Setup Request, DLS
Response and DLS Teardown) are added.

During packet TX, the stack compares the destination MAC address against
the dls_info hash table to see whether a Direct Link should be used for
the packet transmission. If so, it modifies the IEEE 802.11 MAC header
DA, SA and BSS fields to reflect the direct link setup.
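
The lookup described above can be sketched in plain C as a chained hash table keyed by MAC address. This is an illustration only: the table size, the hash function (last byte of the address) and the names dls_peer, dls_table, dls_add, dls_status and first_addr are assumptions made for the example; only the status values and the idea of chaining through an hnext pointer follow the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6
#define DLS_HASH_SIZE 256 /* assumed; matches the one-byte hash below */

enum { DLS_OK = 0, DLS_NOLINK = 1, DLS_SETUP = 2 };

struct dls_peer {
	int status;
	unsigned char addr[MAC_LEN];
	struct dls_peer *hnext; /* next entry in the same hash bucket */
};

struct dls_table {
	struct dls_peer *hash[DLS_HASH_SIZE];
};

/* Assumed hash: the last byte of the MAC address. */
static unsigned int dls_hash(const unsigned char *addr)
{
	return addr[MAC_LEN - 1];
}

static void dls_add(struct dls_table *t, struct dls_peer *p)
{
	unsigned int h = dls_hash(p->addr);

	p->hnext = t->hash[h];
	t->hash[h] = p;
}

/* Return the link status for addr, or DLS_NOLINK if it is unknown. */
static int dls_status(const struct dls_table *t, const unsigned char *addr)
{
	const struct dls_peer *p;

	assert(t != NULL && addr != NULL);
	for (p = t->hash[dls_hash(addr)]; p != NULL; p = p->hnext)
		if (memcmp(p->addr, addr, MAC_LEN) == 0)
			return p->status;
	return DLS_NOLINK;
}

/* Address 1 of the 802.11 header: the peer itself on a direct
 * link, otherwise the BSSID (frame goes via the AP). */
static const unsigned char *first_addr(const struct dls_table *t,
				       const unsigned char *da,
				       const unsigned char *bssid)
{
	return dls_status(t, da) == DLS_OK ? da : bssid;
}
```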

Signed-off-by: Zhu Yi [EMAIL PROTECTED]

---

 net/d80211/ieee80211.c |   19 +-
 net/d80211/ieee80211_i.h   |   17 ++
 net/d80211/ieee80211_sta.c |  450

 3 files changed, 481 insertions(+), 5 deletions(-)

077c391798f72f356c0a5cb50f307b50143a5dcc
diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 4eba18f..b25d00e 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -1472,11 +1472,18 @@ static int ieee80211_subif_start_xmit(st
 memcpy(hdr.addr4, skb->data + ETH_ALEN, ETH_ALEN);
 hdrlen = 30;
 } else if (sdata->type == IEEE80211_IF_TYPE_STA) {
-   fc |= IEEE80211_FCTL_TODS;
-   /* BSSID SA DA */
-   memcpy(hdr.addr1, sdata->u.sta.bssid, ETH_ALEN);
-   memcpy(hdr.addr2, skb->data + ETH_ALEN, ETH_ALEN);
-   memcpy(hdr.addr3, skb->data, ETH_ALEN);
+   if (dls_link_status(local, hdr.addr1) == DLS_STATUS_OK) {
+   /* DA SA BSSID */
+   memcpy(hdr.addr1, skb->data, ETH_ALEN);
+   memcpy(hdr.addr2, skb->data + ETH_ALEN, ETH_ALEN);
+   memcpy(hdr.addr3, sdata->u.sta.bssid, ETH_ALEN);
+   } else {
+   fc |= IEEE80211_FCTL_TODS;
+   /* BSSID SA DA */
+   memcpy(hdr.addr1, sdata->u.sta.bssid, ETH_ALEN);
+   memcpy(hdr.addr2, skb->data + ETH_ALEN, ETH_ALEN);
+   memcpy(hdr.addr3, skb->data, ETH_ALEN);
+   }
hdrlen = 24;
} else if (sdata->type == IEEE80211_IF_TYPE_IBSS) {
/* DA SA BSSID */
@@ -4602,6 +4609,7 @@ int ieee80211_register_hw(struct ieee802
/* Initialize QoS Params */
local->dot11EDCAAveragingPeriod = 5;
local->MPDUExchangeTime = 0;
+   spin_lock_init(&local->dls_lock);
 
/* TODO: add rtnl locking around device creation and qdisc install */
ieee80211_install_qdisc(local->mdev);
@@ -4702,6 +4710,7 @@ void ieee80211_unregister_hw(struct ieee
 
ieee80211_rx_bss_list_deinit(local->mdev);
ieee80211_clear_tx_pending(local);
+   dls_info_stop(local);
sta_info_stop(local);
rate_control_deinitialize(local);
ieee80211_dev_sysfs_del(local);
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index e8929d3..d09f65e 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -346,6 +346,18 @@ struct sta_ts_data {
u32 used_time_usec;
 };
 
+#define DLS_STATUS_OK  0
+#define DLS_STATUS_NOLINK  1
+#define DLS_STATUS_SETUP   2
+struct dls_info {
+   atomic_t refcnt;
+   int status;
+   u8 addr[ETH_ALEN];
+   struct dls_info *hnext; /* next entry in hash table list */
+   u32 timeout;
+   u32 supp_rates;
+};
+
 struct ieee80211_local {
/* embed the driver visible part.
 * don't cast (use the static inlines below), but we keep
@@ -558,6 +570,9 @@ struct ieee80211_local {
 #define STA_TSDIR_NUM  2
/* HCCA: 0~7, EDCA: 8~15 */
struct sta_ts_data ts_data[STA_TSID_NUM][STA_TSDIR_NUM];
+
+   struct dls_info *dls_hash[STA_HASH_SIZE];
+   spinlock_t dls_lock;
 };
 
 enum sta_link_direction {
@@ -687,6 +702,8 @@ struct sta_info * ieee80211_ibss_add_sta
 u8 *addr);
 int ieee80211_sta_deauthenticate(struct net_device *dev, u16 reason);
int ieee80211_sta_disassociate(struct net_device *dev, u16 reason);
+void dls_info_stop(struct ieee80211_local *local);
+int dls_link_status(struct ieee80211_local *local, u8 *addr);
 
 /* ieee80211_dev.c */
 int ieee80211_dev_alloc_index(struct ieee80211_local *local);
diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c
index 81b2ded..393a294 100644
--- a/net/d80211/ieee80211_sta.c

RE: d80211-drivers pull request (week-48)

2006-12-14 Thread Simon Barber
Devicescape does understand that the hardware can do retries - but it
adds software retries on top. This allows higher reliability, as well as
correct handling of the powersave state machine. (The PS bit from a STA
is supposed to stop the AP's transmission immediately.)

Simon

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Michael Buesch
Sent: Tuesday, December 12, 2006 1:35 AM
To: Daniel Drake
Cc: Michael Wu; John Linville; netdev@vger.kernel.org; Ulrich Kunitz
Subject: Re: d80211-drivers pull request (week-48)

On Tuesday 12 December 2006 02:07, Daniel Drake wrote:
 Michael Wu wrote:
zd1211rw-d80211: Use ieee80211_tx_status
 
 I've thought some more about this and I'm not so sure that this is the
 right approach.
 
 Can't devicescape be taught that the ZD1211 handles retries in 
 hardware and the stack doesn't need to worry about it?
 
 What does devicescape do in response to not getting an ack?

It does ratecontrol based on that.
Basically: No ACK == failed packet. If too many failures, lower the
rate.

--
Greetings Michael.