On Wed, 7 Dec 2005, David S. Miller wrote:
> > The difference between the cases was not significant, and the
> > prefetching cases were better than no prefetching.  Again, still no
> > detriment to performance.

> I still think what e1000 is doing is way too aggressive.
> 
> I know of at least one platform, sparc64, that doesn't
> even have enough prefetch slots on certain chips to support
> the number of outstanding prefetches you are issuing.
> 
> One, maybe two, prefetches per RX skb processed should be
> more than enough, as demonstrated by Robert.
> 
> Please reduce the aggressiveness of your prefetching, at
> least for the first implementation that goes into the tree.
> Ok?  We can discuss doing more aggressive things in the
> future.

Okay, so I tested with just pktgen sending to a machine with a PCI Express Intel 82571 server adapter. The stack just discards the data; there is no routing.
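
For reference, the sender side was plain pktgen.  A minimal setup along
these lines would do it; this is only a sketch against the
/proc/net/pktgen interface documented in pktgen.txt, and the interface
name, addresses, and MAC below are placeholders, not what my test box
actually used:

	#include <stdio.h>
	#include <stdlib.h>

	/* write a single pktgen command string into a /proc/net/pktgen file */
	static void pgset(const char *path, const char *cmd)
	{
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			exit(1);
		}
		fprintf(f, "%s\n", cmd);
		fclose(f);
	}

	int main(void)
	{
		/* bind the TX interface (placeholder: eth1) to pktgen thread 0 */
		pgset("/proc/net/pktgen/kpktgend_0", "rem_device_all");
		pgset("/proc/net/pktgen/kpktgend_0", "add_device eth1");

		/* minimal-size frames, no inter-packet delay, run until stopped */
		pgset("/proc/net/pktgen/eth1", "count 0");
		pgset("/proc/net/pktgen/eth1", "pkt_size 60");
		pgset("/proc/net/pktgen/eth1", "delay 0");
		pgset("/proc/net/pktgen/eth1", "dst 10.0.0.2");
		pgset("/proc/net/pktgen/eth1", "dst_mac 00:04:23:00:00:01");

		/* blocks while transmitting; write "stop" to pgctrl to end it */
		pgset("/proc/net/pktgen/pgctrl", "start");
		return 0;
	}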

There are 5 prefetches in the e1000_clean_rx_irq_ps function (sketched below):
#1 skb->data
#2 next rx descriptor
#3 next rx buffer_info
#4 next skb
#5 next skb->data

There are _no_ copybreaks in this code.
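
To make that concrete, here is roughly where those five prefetches sit
in the packet-split clean loop.  This is an illustrative sketch only,
not a cut-and-paste of e1000_clean_rx_irq_ps(); the local names
(next_rxd, next_buffer, i) are simplified from the real code:

	while (staterr & E1000_RXD_STAT_DD) {
		skb = buffer_info->skb;
		prefetch(skb->data - NET_IP_ALIGN);	/* #1 this skb->data */

		if (++i == rx_ring->count)
			i = 0;
		next_rxd = E1000_RX_DESC_PS(*rx_ring, i);
		prefetch(next_rxd);			/* #2 next rx descriptor */

		next_buffer = &rx_ring->buffer_info[i];
		prefetch(next_buffer);			/* #3 next rx buffer_info */
		prefetch(next_buffer->skb);		/* #4 next skb */
		prefetch(next_buffer->skb->data);	/* #5 next skb->data */

		/* ... unmap DMA, set skb->len, hand the skb to the stack,
		 * update ring bookkeeping ... */

		rx_desc = next_rxd;
		buffer_info = next_buffer;
		staterr = le32_to_cpu(rx_desc->wb.middle.status_error);
	}

Testing a combination like #1#2#5 just means commenting out the
prefetches you don't want.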

Here are my results.  The sender is pushing ~700000 pkt/s; the receiver
is a dual Xeon, 2.8GHz, 1MB cache, running 2.6.14.

all tests with e1000 driver 6.2.15, receive rate in pkt/s:
no prefetch:    517500
#1:             539000
#1#2:           547500
#1#2#5:         565500
#1#2#3#5:       565100
#1#2#3:         545500
#1#2#3#4:       544200

Prefetching the next rx descriptor (#2) really helps us, and
prefetching the next skb->data (#5) also really helps.

In no case did the prefetch hurt anything on this box.  I'd prefer to leave #1#2#5 in for now.

Jesse
