On Friday 06 June 2008, Aggelos Economopoulos wrote:
> On Monday 05 May 2008, Aggelos Economopoulos wrote:
[...]
> > On second thought, let me ask for input sooner rather than later.
[...]
> OK, same thing, but now it's the pcbs. TCP is "easy".
[...]
> My plan is to start a discussion on the more interesting in_pcb
> situation on kernel@ this weekend.
Currently, the inpcbs for UDP all live in a single global hash table, which is protected only by (you guessed it) the BGL. The straightforward way to avoid this would be to split the table per cpu, à la TCP. This presents two problems.

First, a UDP socket can call connect() multiple times (thus changing faddr and fport) and can call bind() to go from a wildcard laddr to a specific one. When this happens, the inpcb must be moved to another cpu. This shouldn't be too hard to handle: just mark the old inpcb as BEING_DELETED and delete it only after the new inpcb has been inserted. Dropping a few packets is acceptable for UDP, and this shouldn't happen very often anyway.

Then there is the more interesting issue of how to hash. Given the above, the lport is the only field we can be sure is not a wildcard (faddr and fport stay wildcards on an unconnected socket, and laddr may be INADDR_ANY). Now consider a UDP server (say, DNS). Such a server does not normally connect(), so whatever hash function we choose, its inpcb is going to end up on one cpu, and that is the cpu we would normally dispatch its incoming datagrams to. In other words, all datagrams for our UDP server go through the same cpu, so a busy DNS server just can't scale: a single protocol thread becomes the bottleneck.

If we instead allow multiple UDP protocol threads to access the socket, we may have to lock around socket accesses, but AFAICT that won't be necessary. UDP does not touch most socket fields in the input/output paths, and it seems to me the code can survive a socket option changing under it. Our sockbuf, however, can't handle concurrent access, so we'd need multiple sockbufs (one per cpu), and the socket layer would have to pull data from all of them (probably in round-robin fashion). UDP does not guarantee in-order delivery, but since datagrams usually do arrive in order in practice, I'm not sure how well applications cope with reordering. On top of that, we'd need to decide what to do about buffer size limits and whether the sockbufs should stay in struct socket.
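To make the hashing trade-off concrete, here is a rough userland sketch in C. Everything in it is hypothetical and illustrative (NCPUS, SB_CAP, udp_lport_cpu(), udp_spread_cpu(), struct pcpu_sockbuf, so_receive_rr()); it is not DragonFly kernel code, and it is single-threaded, so it shows the data layout and the ordering effect rather than any real synchronization. Hashing on lport alone sends every datagram for an unconnected server to one cpu; spreading by source address instead needs one sockbuf per cpu plus a round-robin reader, which can hand datagrams to the application out of arrival order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for this sketch. */
#define NCPUS  4   /* number of UDP protocol threads */
#define SB_CAP 8   /* datagrams each per-cpu sockbuf can hold */

/*
 * Hashing on lport only: an unconnected, wildcard-bound server's inpcb
 * has no usable faddr/fport/laddr, so every datagram for that port is
 * dispatched to the same cpu, whoever sent it.
 */
static int udp_lport_cpu(uint16_t lport)
{
    return lport % NCPUS;
}

/*
 * Hypothetical alternative: spread datagrams by source address/port so
 * several protocol threads can feed one socket. This is what forces the
 * per-cpu sockbufs below.
 */
static int udp_spread_cpu(uint32_t faddr, uint16_t fport)
{
    return (int)((faddr ^ fport) % NCPUS);
}

/* One receive buffer per cpu: a fixed-size ring of datagram ids. */
struct pcpu_sockbuf {
    int data[SB_CAP];
    int head;
    int count;
};

struct udp_socket {
    struct pcpu_sockbuf rcv[NCPUS];
    int next_cpu;   /* where the reader's round-robin scan resumes */
};

/*
 * Meant to be called only by the owning cpu's protocol thread; this
 * single-threaded sketch does no synchronization with the reader.
 */
static int sb_append(struct pcpu_sockbuf *sb, int dgram)
{
    if (sb->count == SB_CAP)
        return -1;  /* buffer full: drop, which UDP is allowed to do */
    sb->data[(sb->head + sb->count) % SB_CAP] = dgram;
    sb->count++;
    return 0;
}

/*
 * Socket-layer read: take one datagram, scanning the per-cpu sockbufs
 * round-robin from where the previous read stopped. Returns -1 if all
 * are empty. Datagrams queued on different cpus may come out in a
 * different order than they arrived.
 */
static int so_receive_rr(struct udp_socket *so)
{
    assert(so->next_cpu >= 0 && so->next_cpu < NCPUS);
    for (int i = 0; i < NCPUS; i++) {
        int cpu = (so->next_cpu + i) % NCPUS;
        struct pcpu_sockbuf *sb = &so->rcv[cpu];
        if (sb->count > 0) {
            int dgram = sb->data[sb->head];
            sb->head = (sb->head + 1) % SB_CAP;
            sb->count--;
            so->next_cpu = (cpu + 1) % NCPUS;
            return dgram;
        }
    }
    return -1;
}
```

For example, if one client (say 192.0.2.8) sends datagrams 0, 1, 2, 3 from source ports 40003, 40002, 40001, 40000, udp_spread_cpu() queues them on cpus 3, 2, 1, 0, and a reader starting at cpu 0 gets them back as 3, 2, 1, 0. The round-robin cursor keeps one busy cpu from starving the others, but it is exactly what exposes the reordering question raised above.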
OK, this should get the discussion started :)

Aggelos
