Quoting Andrea Bittau:
| * the most expensive thing should be checksum calculating [~25%].
I have thought about it and I think the main reason is that DCCP first assembles
the packet (copy from user, add header) and checksums only after all that work
has been done.
| * After checksum calculation, the profile should be flat. That is, 100000
| functions, each taking 0.1%.
There used to be a similar situation in UDP, until people checksummed and
copied at the same time (see Partridge/Pink, "A Faster UDP", TON 1993).
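
To make the idea concrete, here is a minimal user-space sketch of copying and
checksumming in a single pass. It is not the kernel code: the names
copy_and_sum() and csum_fold16() are made up for illustration, and it works
byte-wise on a flat buffer rather than on an iovec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes from src to dst and, in the same
 * loop, add them to a running 16-bit one's-complement sum (big-endian
 * word order). The unfolded 32-bit sum is returned; this is fine for
 * packet-sized buffers, where the accumulator cannot overflow.
 */
static uint32_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len,
                             uint32_t sum)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {
        /* Odd trailing byte: treated as if padded with a zero byte. */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return sum;
}

/* Fold the carries back into 16 bits and take the complement. */
static uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The point is only that each byte is touched once: the data is already being
read for the copy, so the additions come almost for free compared with a
second pass over the assembled packet.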
The kernel has csum_partial_copy_fromiovecend(), which is used e.g. by
ip_generic_getfrag().
The difficulty in using this function with partial checksums lies in telling
it to
 * copy `len' bytes from user space,
 * checksum only cscov <= len bytes
   (i.e. continue copying, but stop checksumming after cscov bytes), and
 * leave the checksum in skb->csum as before.
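
As a rough illustration of these requirements, here is a user-space sketch
under simplifying assumptions: a flat buffer instead of an iovec, no header,
and a made-up function name copy_and_sum_partial(). It says nothing about how
this would best be wired into the existing kernel helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy all len bytes from src to dst, but add only
 * the first cscov bytes to the running one's-complement sum. The caller
 * would store the (folded) result where skb->csum is kept today.
 */
static uint32_t copy_and_sum_partial(uint8_t *dst, const uint8_t *src,
                                     size_t len, size_t cscov, uint32_t sum)
{
    size_t i;

    assert(cscov <= len);

    /* Phase 1: copy and checksum the covered part in one pass. */
    for (i = 0; i + 1 < cscov; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < cscov) {
        /* Odd coverage: last covered byte is padded with a zero byte. */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
        i++;
    }

    /* Phase 2: plain copy of the uncovered remainder, no checksumming. */
    memcpy(dst + i, src + i, len - i);
    return sum;
}
```

With cscov == len this degenerates to the ordinary copy-and-checksum case, and
with cscov == 0 to a plain copy. If the coverage is restricted to multiples of
4 bytes, the odd-byte branch is never taken.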
If someone can find a way of adding this, including respecting 4-byte
boundaries, it may improve performance to some degree. In that case I would
like to hear about it, since a similar case arises in UDP-Lite (RFC 3828).
Using partial checksums may give performance close to the copy_and_checksum
case, since in the extreme case only the header is checksummed - and this has
to be done irrespective of which copy function is used.
| Regarding checksums, have a look at:
| http://darkircop.org/check.png
This is very interesting to see, but I could not tell what the axes were for -
do higher numbers mean better relative performance, or the other way around?