Justin Mason wrote:
Daryl C. W. O'Shea writes:
The zone's nightly-mc corpora (the uploaded corpora) are this big (in KB):
2 /export/home/bbmass/rawcor/doc
19760 /export/home/bbmass/rawcor/fredt
6764040 /export/home/bbmass/rawcor/jm (mostly spam, since May 2007)
209393 /export/home/bbmass/rawcor/zmi
So that's pretty big. A cp -al (hard-link copy) probably wouldn't
take much disk space, but it'd take a fair bit of time, especially
on the zone, which has serious I/O bottleneck problems.
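
For a rough sense of scale, the listing above can be totalled with a few lines of Python. This is only a sketch: the directory names and sizes are copied from the listing, and it assumes the KB figures are 1024-byte units (as du -k would report), which the message does not state.

```python
# Sizes in KB, copied from the listing above.
# Assumption: 1 KB = 1024 bytes (du -k style units).
sizes_kb = {
    "doc": 2,
    "fredt": 19760,
    "jm": 6764040,
    "zmi": 209393,
}

total_kb = sum(sizes_kb.values())
total_gib = total_kb / 1024**2       # KB -> GiB
total_gb = total_kb * 1024 / 1e9     # KB -> decimal GB

print(f"total: {total_kb} KB (~{total_gib:.2f} GiB, ~{total_gb:.2f} GB)")
```

Under that assumption the total comes to roughly 7 GB, nearly all of it in the jm corpus.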
As an aside, if bandwidth is free, the whole mass-check will run quite a
bit faster if you rsync the corpus to each of the slaves. Of course
that assumes you've got the disk space and i/o to spare (i/o you may
already have if /tmp isn't a ramdisk).
yeah, rsyncing about 7GB of corpora, nightly, would definitely be slow ;)
Not really, it's probably less than 100MB of change a day. My current
personal spam corpus is 2.1 GB over the last 60 days. Rsync'ing it
nightly with my 128kbit upload speed doesn't take very long. If the
objective is to get our disk I/O usage down on the zone, this would make
a serious difference.
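
As a back-of-the-envelope check on those numbers, here is a minimal sketch of the transfer-time arithmetic. The function name transfer_seconds is made up for illustration, and the estimate assumes decimal units (1 MB = 1,000,000 bytes, 1 kbit = 1,000 bits), a fully used link, and no rsync compression or protocol overhead, so real runs will differ.

```python
def transfer_seconds(size_mb, rate_kbit_per_s):
    """Seconds needed to send size_mb megabytes over a rate_kbit_per_s link.

    Assumes decimal units and ignores compression and protocol overhead.
    """
    bits = size_mb * 1_000_000 * 8
    return bits / (rate_kbit_per_s * 1000)


# Average daily growth of a 2.1 GB corpus collected over 60 days.
daily_mb = 2100 / 60

for label, size_mb in [("personal corpus, daily delta", daily_mb),
                       ("100MB upper bound", 100)]:
    minutes = transfer_seconds(size_mb, 128) / 60
    print(f"{label}: {size_mb:.0f} MB -> about {minutes:.0f} minutes at 128 kbit/s")
```

On these assumptions the personal corpus works out to about 35 MB of new mail a day, so a nightly run over a 128 kbit uplink is a matter of minutes to a couple of hours, not a full 7 GB copy.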
Daryl