Just to chip in, I've had a few complaints about this as one of the guys
behind malariacontrol.net.

The basic formula for the volume of data being saved to disk per hour is
s*f*n, where s is the size of a checkpoint, f is the checkpoint frequency
(number written per hour), and n is the number of jobs active in parallel.
Obviously keeping s small is the job of the project (we try, but there is a
limit), and n is up to the user, but f is specified by the client/user and
AFAICT the client default cannot be set by the project (we had one guy
complain that several computers using a 1000 MBit/s NAS were totally choking
the network).
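For a back-of-the-envelope feel for the numbers, here is a quick sketch of
that formula (the figures below are made-up illustrative values, not
measurements from any real project):

```python
# Disk write load from checkpointing: volume = s * f * n
#   s: checkpoint size (MB), f: checkpoints written per hour,
#   n: number of jobs running in parallel.
# All numbers below are hypothetical, purely for illustration.

def hourly_checkpoint_volume_mb(s_mb: float, f_per_hour: float, n_jobs: int) -> float:
    """Return MB written to disk per hour by checkpointing."""
    return s_mb * f_per_hour * n_jobs

# e.g. 50 MB checkpoints, one every 5 minutes (12/hour), 8 parallel jobs:
print(hourly_checkpoint_volume_mb(50, 12, 8))  # 4800.0 MB/hour

# Doubling the checkpoint interval halves f, and hence halves the volume:
print(hourly_checkpoint_volume_mb(50, 6, 8))   # 2400.0 MB/hour
```

Note that the volume scales linearly in each factor, which is why the
checkpoint interval (which sets f) is the easiest lever for the user.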

Short version of the story: increasing the checkpoint interval (even quite
drastically, since checkpoints are only useful if the application crashes in
a non-deterministic fashion or is killed unexpectedly) should help, but I
don't know what else can be done.
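If I remember correctly, the client-side knob for this is the disk_interval
preference in the client's global_prefs.xml (the "write to disk at most every
N seconds" setting), so something like the fragment below, assuming I have
the tag name right:

```
<!-- Hypothetical fragment of the BOINC client's global_prefs.xml.
     disk_interval is (IIRC) the minimum number of seconds between
     checkpoint writes; raising it, e.g. to 600s, cuts f accordingly. -->
<global_preferences>
  <disk_interval>600</disk_interval>
</global_preferences>
```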

-Diggory
