Hi all,

After getting multiple receive queues working on two NICs, I now propose adding a Toeplitz hash function to map packets to CPUs. It is mainly used to support "receive side scaling" (http://www.microsoft.com/whdc/device/network/ndis_rss.mspx) in hardware.

To make a long story short: the hardware calculates the hash when it receives a packet and puts the packet on the proper RX queue along with the calculated hash. This means we don't need to calculate the hash ourselves, and input processing can be fully parallelized (if multiple TX queue support is added, the whole forwarding path is even CPU localized).

If the packet is non-fragmented TCP, the hash is calculated from laddr, faddr, lport, fport; otherwise it is calculated from laddr, faddr only.
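As a rough illustration of the hash itself, here is a minimal bit-by-bit sketch. It is not the code from toe.c: the function names, the 40-byte key length and the repeated two-byte placeholder key are all assumptions made for the example, and the tuple is serialized in network byte order in the order laddr, faddr, lport, fport.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOEPLITZ_KEYLEN	40	/* assumed key size; hardware dependent */

/* Fill the key by repeating a two-byte pattern (placeholder key). */
static void
toeplitz_key_init(uint8_t key[TOEPLITZ_KEYLEN], uint8_t b0, uint8_t b1)
{
	for (size_t i = 0; i < TOEPLITZ_KEYLEN; i += 2) {
		key[i] = b0;
		key[i + 1] = b1;
	}
}

/*
 * Plain Toeplitz hash: for every set bit of the input (MSB first),
 * XOR in the 32-bit window of the key that starts at that bit position.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t res = 0, window;

	assert(keylen >= 4);
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (size_t i = 0; i < datalen; i++) {
		/* Key byte whose bits slide into the window next. */
		uint8_t next = (i + 4 < keylen) ? key[i + 4] : 0;

		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				res ^= window;
			window = (window << 1) |
			    ((uint32_t)(next >> bit) & 1u);
		}
	}
	return res;
}

static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

static void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = v >> 8; p[1] = v;
}

/*
 * Hash a flow.  Addresses and ports are host-order integers here;
 * with_ports is nonzero for non-fragmented TCP, zero otherwise.
 */
static uint32_t
toeplitz_hash_tuple(const uint8_t key[TOEPLITZ_KEYLEN],
    uint32_t laddr, uint32_t faddr, uint16_t lport, uint16_t fport,
    int with_ports)
{
	uint8_t buf[12];
	size_t len = 8;

	put_be32(buf, laddr);
	put_be32(buf + 4, faddr);
	if (with_ports) {
		put_be16(buf + 8, lport);
		put_be16(buf + 10, fport);
		len = 12;
	}
	return toeplitz_hash(key, TOEPLITZ_KEYLEN, buf, len);
}
```

A caller would fill the key once with toeplitz_key_init() and then call toeplitz_hash_tuple() per packet, passing with_ports = 1 only for non-fragmented TCP.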
There are two things we need to overcome:

1) The result of the hash function is non-commutative in the Microsoft paper, i.e. faddr, laddr, fport, lport and laddr, faddr, lport, fport give different results.
   Thanks to corecode's suggestion: as long as a two-byte pattern such as 0xabcd is replicated to form the key, the result of the hash function _is_ commutative.

2) It is computationally heavy.
   Thanks to corecode again: we can cache a pre-calculated result table, so we actually only need to index an array and XOR the results.

A simple implementation is at:
http://leaf.dragonflybsd.org/~sephe/toe.c
I used it to verify that the hardware gives the correct result :)

The whole thing is not implemented yet, but if you don't think it's a bad idea, I will move on to implement it.

Note that it is not intended to replace the current packet hash function.

Best Regards,
sephe

--
Live Free or Die