Forgot to CC the list:
> I did a quick time -v, and found that sorting a 96M file with -S500M
> there were 36358 page faults, but only 5380 page faults with -S10M.
>
> Wow.
>
> So system time goes up, but user time goes down. It seems odd
> that user time would go down, but I believe the difference is in how
> the merged output is written.
>
> In an internal sort, the output is written only after all the merging
> has finished, while in an external merge each line is written out as
> it is merged. While working on parallel sort with my group, we noticed
> a ~14% speedup when we wrote output during the top level of merging,
> as opposed to all at once after the sort completed.
>
> bash-3.2$ /usr/bin/time -v sort -S10M randL > /dev/null
> Command being timed: "sort -S10M randL"
> User time (seconds): 4.74
> System time (seconds): 0.57
> Percent of CPU this job got: 99%
> Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.32
> Average shared text size (kbytes): 0
> Average unshared data size (kbytes): 0
> Average stack size (kbytes): 0
> Average total size (kbytes): 0
> Maximum resident set size (kbytes): 0
> Average resident set size (kbytes): 0
> Major (requiring I/O) page faults: 0
> Minor (reclaiming a frame) page faults: 5380
> Voluntary context switches: 14
> Involuntary context switches: 11
> Swaps: 0
> File system inputs: 0
> File system outputs: 0
> Socket messages sent: 0
> Socket messages received: 0
> Signals delivered: 0
> Page size (bytes): 4096
> Exit status: 0
> bash-3.2$ /usr/bin/time -v sort -S500M randL > /dev/null
> Command being timed: "sort -S500M randL"
> User time (seconds): 5.27
> System time (seconds): 0.28
> Percent of CPU this job got: 99%
> Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.56
> Average shared text size (kbytes): 0
> Average unshared data size (kbytes): 0
> Average stack size (kbytes): 0
> Average total size (kbytes): 0
> Maximum resident set size (kbytes): 0
> Average resident set size (kbytes): 0
> Major (requiring I/O) page faults: 0
> Minor (reclaiming a frame) page faults: 36358
> Voluntary context switches: 3
> Involuntary context switches: 11
> Swaps: 0
> File system inputs: 0
> File system outputs: 0
> Socket messages sent: 0
> Socket messages received: 0
> Signals delivered: 0
> Page size (bytes): 4096
> Exit status: 0
> >
> >
> > ----- Original Message ----
> > From: Philip Rowlands
> > To: Pádraig Brady
> > Cc: Report bugs to ; Joey Degges
> > Sent: Tue, March 2, 2010 5:21:15 AM
> > Subject: Re: Taking advantage of L1 and L2 cache in sort
> >
> > On Tue, 2 Mar 2010, Pádraig Brady wrote:
> >
> > > Currently when sorting we take advantage of the RAM vs disk
> > > speed bump by using a large mem buffer dependent on the size of RAM.
> > > However we don't take advantage of the cache layer in the
> > > memory hierarchy which has an increasing importance in modern
> > > systems given the disparity between CPU and RAM speed increases.
> >
> > [snip data]
> >
> > Interesting results; this type of analysis might also benefit from
> > running the various tests under cachegrind, which would give detailed
> > results about L1/L2 cache miss rates.
> >
> > Cheers,
> > Phil
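For anyone wanting to reproduce the comparison above, here is a rough sketch. The 96M `randL` file and the buffer sizes are from the thread; how the original file was generated isn't stated, so the generation step below (and the smaller file size, to keep it quick) is an assumption:

```shell
# Generate a random test file (the thread used a ~96MB file called randL;
# a much smaller file is used here purely to keep the sketch quick).
head -c 1000000 /dev/urandom | base64 | fold -w 32 > randL.small

# Internal sort: buffer comfortably larger than the input.
sort -S500M randL.small > out.internal

# External merge sort: force a small buffer so sort spills to temp files.
sort -S1M randL.small > out.external

# The two orderings must agree regardless of the buffer size.
cmp out.internal out.external

# Page-fault counts as in the thread (GNU time, not the shell builtin):
#   /usr/bin/time -v sort -S10M randL.small > /dev/null
#
# Cache miss rates, as Phil suggests:
#   valgrind --tool=cachegrind sort -S10M randL.small > /dev/null
```

Note that `time -v` must be the external GNU time binary; bash's builtin `time` does not report page faults.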
