At Robert Hyatt's prompting, I have looked more closely at
this. The degree to which a pipe (i.e., cmd1 | cmd2)
parallelizes seems to depend quite sensitively on (a) the
size of the writes/reads and (b) the amount of computation
done between the writes/reads.

I've written a small program (attached) that writes/reads a
megabyte of data and does some pointless computation for
each byte it writes or reads. There are two parameters:
"write_size", which is the size of each write/read, and
"think_exponent", which is the log-2 of the amount of
computation done per byte.

Here are the results for elapsed time and % CPU as functions
of think_exponent (1 to 12) and write_size (1k to 1024k):

Elapsed time (rows: write_size; columns: think_exponent):

        1     2     3     4     5     6     7     8     9    10    11    12
   1k  0.08  0.13  0.20  0.36  0.67  1.28  2.45  4.85  9.67 19.58 39.76 78.84
   2k  0.07  0.13  0.21  0.36  0.66  1.27  2.47  4.89  9.78 19.41 38.61 77.40
   4k  0.08  0.14  0.20  0.36  0.67  1.27  2.47  4.87  9.74 19.41 38.45 77.07
   8k  0.09  0.15  0.22  0.37  0.87  1.28  2.50  5.14  9.92 19.44 39.18 78.04
  16k  0.10  0.16  0.21  0.46  0.72  1.36  2.54  5.00  9.87 19.78 40.06 81.95
  32k  0.11  0.23  0.38  0.67  0.74  1.30  2.47  5.05  9.88 20.77 39.00 79.51
  64k  0.11  0.22  0.39  0.63  0.93  1.85  2.80  5.10 10.37 22.89 40.62 79.54
 128k  0.11  0.23  0.36  0.69  1.21  2.10  3.66  6.61 10.91 19.59 39.04 77.79
 256k  0.10  0.20  0.35  0.48  0.80  1.39  2.55  4.99  9.81 19.57 38.74 76.93
 512k  0.12  0.21  0.24  0.41  0.71  1.29  2.67  5.11  9.89 19.31 38.85 77.87
1024k  0.11  0.14  0.22  0.36  0.69  1.32  2.62  4.95  9.67 19.33 39.76 77.46

% CPU (rows: write_size; columns: think_exponent):

        1     2     3     4     5     6     7     8     9    10    11    12
   1k  188   176   196   193   195   195   199   198   197   195   191   193 
   2k  181   186   189   191   194   196   197   197   196   197   198   197 
   4k  187   178   186   196   195   196   197   198   197   197   196   197 
   8k  159   169   180   198   148   190   194   188   194   197   195   196 
  16k  132   130   194   151   180   181   191   193   196   193   190   186 
  32k   86   105   101   104   170   190   196   190   193   184   195   192 
  64k  106   101    97   106   141   130   172   189   185   166   185   191 
 128k  100   103   105   100   106   118   133   145   176   195   196   196 
 256k  120   115   108   141   160   178   190   193   195   192   194   199 
 512k  115   114   162   167   179   193   184   188   194   198   197   197 
1024k  107   163   165   192   187   189   186   195   199   194   188   197 

When the pipe parallelizes well, the %CPU is close to 200%;
when it parallelizes badly, the %CPU is close to 100%.
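
For reference, here is a minimal sketch of how such a %CPU
figure is usually derived. It assumes the numbers come from a
time(1)-style measurement of the whole pipeline, i.e. user
plus system CPU time of both processes divided by elapsed
time; the measurement method is not stated above, and
percent_cpu is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Hypothetical helper: percent CPU for a whole pipeline, computed the
 * way time(1)-style tools usually report it (an assumption; the post
 * does not say how its figures were measured).
 * user_s and sys_s are summed over both processes, in seconds.
 */
double
percent_cpu(double user_s, double sys_s, double elapsed_s)
{
  assert(elapsed_s > 0.0);
  return 100.0 * (user_s + sys_s) / elapsed_s;
}
```

With two processes that are each busy for the whole run on
their own processor, the CPU time is twice the elapsed time
and the result is 200; if they only ever run one at a time,
the CPU time equals the elapsed time and the result is 100.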

You can see that the pipe parallelizes badly when the
write/read size is in a certain interval (e.g., 16k to 512k
for think_exponent of 4) and the amount of computation being
done is small or moderate.

Can anyone explain this? Can anyone fix it?

Again, I don't think this is latency, because if I replace
the line "n = write_size" in the function write_or_read with
"n = 1" then everything parallelizes well. It seems that,
under certain circumstances, the amount of data being
passed and the amount of computation being performed
conspire to keep both processes on the same processor.
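
One way to test the same-processor hypothesis is to pin the
writer and the reader to different processors and see whether
the slow cases go away. The following is only a sketch: it
assumes a current Linux with glibc, where sched_setaffinity()
and sched_getcpu() are available as GNU extensions, and
pin_to_cpu and current_cpu are hypothetical helper names that
are not part of the attached program.

```c
#define _GNU_SOURCE   /* must come before any #include */
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling process to one CPU.
 * Returns 0 on success, -1 on failure (errno is set by the kernel).
 */
int
pin_to_cpu(int cpu)
{
  cpu_set_t set;

  assert(cpu >= 0 && cpu < CPU_SETSIZE);
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof(set), &set);  /* 0 = this process */
}

/*
 * Hypothetical helper: the CPU the caller is running on at the moment
 * of the call, or -1 on failure.
 */
int
current_cpu(void)
{
  return sched_getcpu();
}
```

For example, calling pin_to_cpu(0) at the start of main when
argv[1] is "write" and pin_to_cpu(1) when it is "read" would
force the two ends of the pipe apart; logging current_cpu()
in an unpinned run would instead show where the scheduler
places them on its own.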

Regards,

Alan

/*
 * use as (the writer piped into the reader, same parameters for both):
 * ./a.out write think_exponent write_size_in_k |
 *   ./a.out read  think_exponent write_size_in_k
 */
#include <unistd.h>
#include <stdlib.h>
#include <string.h>   /* strcmp */

#define NBYTES  (1024L * 1024L)

long think_exponent;
size_t write_size;

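void
write_or_read(char *which)
{
  /* static: keeps the 1 MB buffer off the stack and zero-initialized */
  static unsigned char buffer[NBYTES];
  ssize_t n, m;
  n = write_size;
  do {
    if (strcmp(which, "write") == 0)
      m = write(1, buffer, n);
    else
      m = read(0, buffer, n);
    if (m <= 0)   /* error or end of file: stop instead of looping forever */
      exit(1);
    n -= m;
  } while (n > 0);
}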

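void
think(void)
{
  long i;
  size_t j;   /* same type as write_size, avoids a signed/unsigned compare */
  volatile unsigned int v = 1;
  /* (2 << think_exponent) multiplications for each byte of the block */
  for (j = 0; j < write_size; ++j)
    for (i = 0; i < (2L << think_exponent); ++i)
      v *= 3;
}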

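int
main(int argc, char **argv)
{
  size_t i;

  if (argc != 4)
    return 1;

  think_exponent = atol(argv[2]);
  write_size = atol(argv[3]) * 1024;

  /* a zero size would divide by zero; a larger one would overrun the buffer */
  if (write_size == 0 || write_size > NBYTES)
    return 1;

  for (i = 0; i < NBYTES / write_size; ++i) {
    write_or_read(argv[1]);
    think();
  }

  return 0;
}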

-- 
Dr Alan Watson
Instituto de Astronomía UNAM