Justin Mason wrote:
Daryl C. W. O'Shea writes:

The zone's nightly-mc corpora (the uploaded corpora) are this big (in KB):

  2       /export/home/bbmass/rawcor/doc
  19760   /export/home/bbmass/rawcor/fredt
  6764040 /export/home/bbmass/rawcor/jm (mostly spam, since May 2007)
  209393  /export/home/bbmass/rawcor/zmi

So that's pretty big.  In terms of disk space, a cp -al probably
wouldn't use much; but it'd take a fair bit of time, especially on
the zone, which has serious I/O bottleneck problems.

As an aside, if bandwidth is free, the whole mass-check will run quite a bit faster when the corpus is rsynced to each of the slaves. Of course, that assumes you've got the disk space and I/O to spare (I/O you may already have if /tmp isn't a ramdisk).

yeah, rsyncing about 7GB of corpora, nightly, would definitely be slow ;)

Not really, it's probably less than 100MB of change a day. My current personal spam corpus is 2.1 GB over the last 60 days. Rsync'ing it nightly over my 128kbit upload link doesn't take very long. If the objective is to get our disk I/O usage down on the zone, this would make a serious difference.
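
As a rough check of those figures, here is a back-of-the-envelope sketch. It assumes rsync sends only the newly added mail, with no compression or protocol overhead, and that the 2.1 GB accumulated evenly over the 60 days. The helper name transfer_minutes is made up for illustration.

```python
# Back-of-the-envelope estimate of nightly rsync time.
# Assumptions (not from any real measurement): only new mail is sent,
# no compression, no protocol overhead, decimal units (1 MB = 10^6 bytes).

def transfer_minutes(delta_mb: float, link_kbit_per_s: float) -> float:
    """Minutes needed to send delta_mb megabytes over a link_kbit_per_s link."""
    total_bytes = delta_mb * 1_000_000
    bytes_per_second = link_kbit_per_s * 1000 / 8
    return total_bytes / bytes_per_second / 60

# 2.1 GB of spam collected over 60 days, averaged per day.
personal_daily_mb = 2.1 * 1000 / 60

# Nightly delta for the personal corpus over a 128 kbit/s uplink.
print(f"{personal_daily_mb:.0f} MB/day: {transfer_minutes(personal_daily_mb, 128):.1f} min")

# Upper bound quoted for the zone's corpus (100 MB/day) over the same link.
print(f"100 MB/day: {transfer_minutes(100, 128):.1f} min")
```

Under those assumptions the personal corpus grows by about 35 MB a day, which is a bit over half an hour of transfer at 128 kbit/s, and even the 100 MB upper bound stays under two hours on that same slow link.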

Daryl
