Hi Baruch and all,

As you recommended, I used oprofile, and it turned out that most of the time was spent in the __udp4_lib_lookup function. There is a UDP hash table, and sockets are looked up by the 7 least significant bits of the destination port number. So what happened is now quite obvious: I had many thousands of sockets, all with the same destination port, so they were all chained into the same slot of that hash table. When I tried using different ports, it was much faster.

Since the hash table seemed to be related to the proc fs, I tried compiling a kernel without it. I could only boot that kernel with the init=/bin/bash option; otherwise many things did not work. My program stopped right after sending out the first packet, so the proc fs must be essential to the receive operation (though it might have been something else, too). If that is the case, do you think I can remove the proc fs dependency so that I can still use the regular socket functions (i.e. socket, connect, listen, send, recv, etc.)?

Now I also understand why this receive-related function was called right inside dev_queue_xmit, where some softirq processing also happens, when both the send and receive parts resided on the same host. As soon as I ran the test on two hosts, separating the send and receive sides, the considerable send delay simply disappeared. Could you help me understand why? I expected the delay to show up on the receiving side, but it didn't.

The connection setup part (socket+connect), however, still increased linearly with the number of sockets. Hopefully I can send you the oprofile results for this part, too, but if you already have any idea how I can get rid of this dependency, please let me know!

thx a lot: Zacco

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
