David Golden wrote:
On Jan 18, 2008 8:09 AM, David Cantrell <[EMAIL PROTECTED]> wrote:
I would like a plain-text log alongside the indexed one - it's easier to
look at from shell scripts so I can do things like compare the log to
the list of distributions I'm testing and see where a smoker broke.
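As a rough illustration of that comparison, here is a minimal Python sketch. The log format is an assumption, not something the reporter defines here: one report per line, with the distribution name as the first whitespace-separated field. The function name `untested` is hypothetical.

```python
def untested(distributions, log_lines):
    """Return the distributions that have no entry in the plain-text log.

    Assumed (hypothetical) log format: one report per line, distribution
    name as the first whitespace-separated field.
    """
    # Collect the first field of every non-blank log line.
    seen = {line.split()[0] for line in log_lines if line.strip()}
    # Preserve the order of the input list so gaps are easy to spot.
    return [dist for dist in distributions if dist not in seen]
```

Feeding it the list a smoker was asked to test and the lines of its log would show roughly where the run stopped.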
Out of curiosity, do you clear out your log file periodically? If
you're really up to ~80K reports, I've got to think that your log
files are getting huge.
My biggest one is 2MB, recording all the reports I've sent from Linux.
I don't clear them out. If something is a common pre-requisite but
always fails on 5.6.2, like WWW::Mechanize does, then I really shouldn't
clear that out of my logs and hassle Andy again about a version that
I've already tested. He already knows about the test failure.
If they do ever get too big, then I suppose I could write a little
script to strip out anything that's been superseded on the CPAN.
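Such a script might look something like the sketch below. Everything about the data layout is an assumption: log lines are taken to start with a distribution name and a version separated by whitespace, and `latest` is a hypothetical mapping from distribution name to the version currently on the CPAN, obtained by whatever means is convenient.

```python
def strip_superseded(log_lines, latest):
    """Drop log entries for versions that are no longer current.

    log_lines: lines assumed to start with "<dist> <version>".
    latest:    hypothetical dict mapping dist name to its current version.
    """
    kept = []
    for line in log_lines:
        fields = line.split()
        if len(fields) < 2:
            # Keep anything we cannot parse rather than lose data.
            kept.append(line)
            continue
        dist, version = fields[0], fields[1]
        # Distributions missing from `latest` are kept as they are.
        if latest.get(dist, version) == version:
            kept.append(line)
    return kept
```

Entries for the current version survive, so a known failure on the latest release is not tested and reported a second time.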
It might also be useful to have OS (and version) and hostname - the
former to cope with OS upgrades on a machine, which would make sending
the reports again a legitimate thing to do, the latter for the case
where a home directory is mounted over NFS and shared between several
smoke boxes.
The OS/version are part of the "unique" characteristics of a report
already so those have to go in. Hostname seems a bit more like
overkill. I mean, if you test Foo-Bar-1.23 on one machine, do you
really want to be testing it again on the same perl/arch/os but just a
different hostname?
I suppose not. Having the OS/version is probably sufficient.
Filename length limits. Case-sensitivity. Consumption of vast numbers
of inodes. That last one is a killer. If we have 30,000 test reports
in the database, each with some combination of:
author/dist/version/perl/epoch/grade/platform/hostname
then that's [tappity-tap] 240,000 inodes.
Inodes. Right. Ick. I'm not sure I buy the math, but inode
consumption could be relevant -- particularly given the number of
reports being submitted by the leaders.
Thinking about it, I don't buy the maths either :-) You could keep the
number down by carefully ordering the components in the path to restrict
the number of directories created - keep that which varies the least at
the beginning, like architecture, perl version, grade, and that which
varies the most - distribution and epoch - at the end.
An inode is consumed for every file and every directory.
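To make the effect of ordering concrete, here is a small Python sketch that counts the inodes a directory layout would need: one per distinct directory prefix plus one per report file. The sample data (one architecture, two perls, 100 made-up distributions) is purely illustrative, and `count_inodes` is a hypothetical helper, not part of any existing tool.

```python
def count_inodes(reports, order):
    """Count the files and directories needed to store every report.

    Each report is a dict; `order` lists the keys used to build its path,
    with the last key naming the file and the earlier ones directories.
    """
    nodes = set()
    for report in reports:
        path = tuple(report[key] for key in order)
        # Every distinct prefix of the path is one directory or file.
        for depth in range(1, len(path) + 1):
            nodes.add(path[:depth])
    return len(nodes)


# Illustrative data only: 2 perls x 100 distributions on one architecture.
reports = [
    {"arch": "linux", "perl": perl, "dist": f"Dist-{i}",
     "grade": "pass" if i % 2 == 0 else "fail"}
    for perl in ("5.6.2", "5.8.8")
    for i in range(100)
]

least_varying_first = count_inodes(reports, ["arch", "perl", "grade", "dist"])
most_varying_first = count_inodes(reports, ["dist", "grade", "perl", "arch"])
```

With this sample, putting the least-varying components first needs 207 inodes for 200 reports, while the reverse order needs 600.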
I think CPAN::YACSmoke uses SDBM_File, but
from MJD's presentation on lightweight databases, it looks like it
might have issues as the number of keys gets into the thousands.
http://perl.plover.com/yak/lightweight-db/materials/slides/slide077.html
I don't know if DBM::Deep has similar issues.
It seems to cope with Number::Phone::UK::DetailedLocations OK, which has
about a quarter of a million records in a __DATA__ section.
--
David Cantrell | Minister for Arbitrary Justice
Computer Science is about lofty design goals and careful algorithmic
optimisation. Sysadminning is about cleaning up the resulting mess.