While working on the non-SRQ support for IPoIB CM, I observed that scatter-gather lists adversely impact performance compared to running without them. On the whole, CM mode does improve performance, with or without scatter-gather lists, but we lose a lot of throughput (something over 15%) with sg lists.

I looked at the profiles and found that ipoib_cm_alloc_rx_skb() (and the associated alloc_page()) shows up far more often (> 10X) in the profile with sg lists than without them. To put this in perspective: on receipt of every packet we call ipoib_cm_alloc_rx_skb(), which in turn ends up calling alloc_page() 16 times (every time!). I believe that is where we are taking a big hit with sg lists. This, together with the associated sg list processing, is what causes the throughput drop.

I looked at the e1000 driver to see how it handles this issue. Here are a few things I learnt, which we may try to incorporate as we find suitable:

1. The e1000 driver does not use sg lists in all cases.
2. The e1000 driver uses a maximum of 3 fragments (to handle jumbo frames).
3. The e1000 driver takes "copybreak" as a module parameter. For small packets (shorter than copybreak) it actually goes ahead and unsplits the packet. In fact, the driver specifically calls out alloc_page() and put_page() as eating up CPU cycles and tries to avoid them when feasible.
4. A decision is made (rx_ps_pages) on whether to use packet split or not. This decision is based on several factors, such as MTU and page size.

Can we try to incorporate items 1, 3 and 4 into the implementation of IPoIB CM? What is the general opinion about this? Should we look at some other drivers?

Pradeep
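
P.S. To make items 3 and 4 a bit more concrete, here is a rough userspace sketch of the kind of per-packet decision I have in mind. It is only an illustration under stated assumptions: the names (rx_strategy, rx_pages_needed, rx_pick_strategy) are hypothetical and are not taken from the e1000 or IPoIB sources, and the real drivers weigh more factors than packet length, page size and copybreak.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not driver code: every name below is made up
 * for illustration and does not exist in e1000 or ipoib.
 */
enum rx_strategy {
    RX_COPY_SMALL, /* short packet: copy into a small linear buffer,
                      leave the already-allocated pages in place */
    RX_LINEAR,     /* fits in one page: no sg list needed */
    RX_SCATTER     /* needs several pages: use an sg list */
};

/* Number of pages a packet of len bytes occupies, rounded up. */
static size_t rx_pages_needed(size_t len, size_t page_size)
{
    assert(page_size > 0);
    return (len + page_size - 1) / page_size;
}

/*
 * Pick a receive strategy from the packet length alone.
 * copybreak plays the role of the e1000 module parameter: packets
 * strictly shorter than it are copied rather than split.
 */
static enum rx_strategy rx_pick_strategy(size_t pkt_len,
                                         size_t page_size,
                                         size_t copybreak)
{
    if (pkt_len < copybreak)
        return RX_COPY_SMALL;
    if (rx_pages_needed(pkt_len, page_size) <= 1)
        return RX_LINEAR;
    return RX_SCATTER;
}
```

As an example with assumed values (4096-byte pages, a copybreak of 256 bytes): a 128-byte packet would be copied, a 1500-byte packet would go into a single page, and only packets larger than a page would need an sg list. With the same page size, a 65520-byte packet occupies 16 pages, while a 1500-byte packet occupies just one, so a receive path that always replaces all 16 pages does far more allocation than small packets require.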
