I think the point is that if the common usage is to
sum many different files, or one file at a time over
long spans of time, then the cost of getting the
bytes from the filesystem into user space may
outweigh any cache optimization gains

the ast apps are already at a disadvantage because they
pull in extra .so's relative to the base case(s) they are measured against

what I need is a big-picture analysis of at least a few more variables
so that reasonable decisions can be made about ifdef'ing up the code

e.g.,

what is the startup cost of the extra .so's?
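
one way I could imagine measuring that (rough sketch only; the library
name below is just a placeholder for whatever extra .so's the ast apps
pull in) is to time dlopen() with full relocation:

    /* cc -O2 dlcost.c -ldl */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <time.h>

    int main(int argc, char** argv)
    {
        const char* lib = argc > 1 ? argv[1] : "libast.so"; /* placeholder name */
        struct timespec t0, t1;
        void* h;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        h = dlopen(lib, RTLD_NOW);  /* RTLD_NOW forces full symbol resolution */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (!h)
        {
            fprintf(stderr, "dlopen %s: %s\n", lib, dlerror());
            return 1;
        }
        printf("%s: dlopen+relocate %.1f us\n", lib,
            (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3);
        dlclose(h);
        return 0;
    }

timing a trivial main() linked against the same .so's vs not linked
against them would catch the exec+ld.so part of the cost too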

what are the effects, if any, of timing apps repeatedly over the same file
vs
timing the apps over enough files to blow the fs cache(s)?
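
fwiw, instead of rotating through enough files to blow the cache, the
cold-cache case can usually be forced per run; a sketch, assuming
linux semantics for POSIX_FADV_DONTNEED (other systems may ignore
the advice):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* ask the kernel to drop the cached pages for each named file
     * so the next timing run starts cold (clean pages only) */
    int main(int argc, char** argv)
    {
        int i;

        for (i = 1; i < argc; i++)
        {
            int fd = open(argv[i], O_RDONLY);

            if (fd < 0 || posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED))
                fprintf(stderr, "could not drop cache for %s\n", argv[i]);
            if (fd >= 0)
                close(fd);
        }
        return 0;
    }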

what are the interactions between io/mmap block sizes and L? cache
block sizes being controlled by the prefetch calls?
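
to poke at that I'd want something like the following, where BLOCK and
PREFETCH_DIST are the two knobs in question (names made up here, not
what the ast code actually uses):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #ifndef BLOCK
    #define BLOCK         (64 * 1024)   /* user io/mmap chunk size */
    #endif
    #ifndef PREFETCH_DIST
    #define PREFETCH_DIST 256           /* bytes ahead for the explicit prefetch */
    #endif

    int main(int argc, char** argv)
    {
        struct stat    st;
        unsigned char* p;
        unsigned int   sum = 0;
        off_t          base;
        off_t          i;
        int            fd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0 || fstat(fd, &st) < 0)
            return 1;
        if ((p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0)) == MAP_FAILED)
            return 1;
        for (base = 0; base < st.st_size; base += BLOCK)
        {
            off_t end = base + BLOCK < st.st_size ? base + BLOCK : st.st_size;

            for (i = base; i < end; i++)
            {
    #ifdef USE_PREFETCH
                __builtin_prefetch(p + i + PREFETCH_DIST);  /* explicit cache prefetch */
    #endif
                sum += p[i];
            }
        }
        printf("%u\n", sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

running that across a matrix of BLOCK values, with and without
USE_PREFETCH, warm and cold cache, would say whether the two knobs
actually interact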

my suspicion is that tweaking the user io/mmap block sizes (which can be done
in a general way for all apps, possibly with an ifdef in one place, as sketched
below) may change the timings and diminish the effects of the explicit prefetch calls

would it be enough to make them not worth it?
I don't know without more data
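
the "ifdef in one place" I have in mind is nothing fancier than a
single shared header, names invented here:

    /* one central knob for all the apps' io/mmap buffer sizes,
     * so block-size experiments never touch the prefetch code */
    #ifndef AST_IO_BLOCK
    #ifdef AST_BIG_IO
    #define AST_IO_BLOCK (1024 * 1024)  /* favor large sequential reads */
    #else
    #define AST_IO_BLOCK (64 * 1024)    /* conservative default */
    #endif
    #endif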

also
are there performance results for the unhacked gnu sum vs the hacked gnu sum?
are there performance results for the hacked gnu sum vs the solaris sum?
