On Wed, May 13, 2009 at 09:25:17AM -0400, David Golden wrote:
> My null hypothesis at the moment is
>
> ./history/$perlversion-$archname/history.db
> PASS.db
> FAIL.db
> UNKNOWN.db
> NA.db
>
> Where each db is a sorted list of distfile name and associated GUIDS
> of that grade (including multiples if that is allowed):
>
> DAGOLDEN/File-Marker-0.13.tar.gz {GUID} {GUID} {GUID}
>
> That would make checking for a duplicate report very fast -- binary
> search in the right grade file.

For binary search, you need to either:
  seek to the approximate middle of the range;
  rewind or fast-forward to an actual record boundary;
  read, wash, rinse, repeat
or have fixed-width fields. But the gods like to punish people who
arbitrarily restrict their data thus.
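
To make the seek-and-resync loop concrete, here is a minimal sketch in Python (an illustration only, not anything the CT clients do today). It assumes one record per line in the "distfile GUID GUID ..." layout quoted above, sorted by distfile name in byte order, with no blank lines. The function name find_guids is made up for this example.

```python
import os

def find_guids(path, distfile):
    """Return the GUIDs recorded for distfile, or None if it is absent.

    Assumes `path` holds one record per line ("name guid guid ..."),
    sorted by name in byte order, with no blank lines.
    """
    key = distfile.encode()
    with open(path, "rb") as fh:
        # Records starting before `lo` sort below key (lo is always a
        # record boundary); records starting at or after `hi` sort above it.
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            if mid > lo:
                # Fast-forward to the first record boundary at or after mid.
                fh.seek(mid - 1)
                fh.readline()
            else:
                fh.seek(lo)
            start = fh.tell()
            if start >= hi:
                # No record begins in [mid, hi), so shrink from the top.
                hi = mid
                continue
            fields = fh.readline().split()
            if fields[0] == key:
                return [guid.decode() for guid in fields[1:]]
            if fields[0] < key:
                lo = fh.tell()  # resume just after this record
            else:
                hi = start      # this record and all later ones sort too high
    return None
```

Calling find_guids("PASS.db", "DAGOLDEN/File-Marker-0.13.tar.gz") would return that distfile's GUID list, or None if there is no such record. Each probe costs one seek and at most two line reads, which is the "rewind or fast-forward" overhead that fixed-width fields would avoid.
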
Perhaps it makes sense to use a binary format like GDBM_File. People
who need plain-text data for their shell scripts can trivially dump that
back out, and GDBM_File has always been in core.
GDBM_File doesn't, of course, let you store an array of GUIDs, but a
space-separated list would probably do the job just fine. If you really
need structured data, DBM::Deep is the way to go, at the expense of
adding a non-core module to the dependencies. Of course, you could
still rename it to CPAN::Testers::DBM::Deep like I did with
Number::Phone to avoid "polluting" testers' machines with an unnecessary
module.
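
As a rough illustration of the space-separated-list idea, here is a sketch using Python's standard dbm module as a stand-in for GDBM_File (it picks whichever DBM backend is installed). The function names record_report, reports_for and dump are invented for this example, and the one-database-per-grade layout is only assumed from the proposal quoted above.

```python
import dbm

def record_report(db_path, distfile, guid):
    """Add guid to the space-separated list stored under distfile."""
    key = distfile.encode()
    with dbm.open(db_path, "c") as db:  # "c": create the file if missing
        guids = db[key].split() if key in db else []
        if guid.encode() not in guids:
            guids.append(guid.encode())
        db[key] = b" ".join(guids)

def reports_for(db_path, distfile):
    """Return the list of GUIDs stored for distfile (empty if none)."""
    key = distfile.encode()
    with dbm.open(db_path, "c") as db:
        return db[key].decode().split() if key in db else []

def dump(db_path):
    """Dump the database back out as sorted plain-text lines."""
    with dbm.open(db_path, "c") as db:
        return sorted(
            k.decode() + " " + db[k].decode() for k in db.keys()
        )
```

A duplicate check is then a single keyed lookup, and adding a result rewrites one value rather than re-sorting a whole file. dump() gives the plain-text view that shell scripts would want.
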
> Since getting smoker speedups depends on not retesting distributions
> with a known result
You still need to test (to find conflicts with other recently installed
modules) and install common dependencies every time if you test against
a reasonably clean perl install. The only thing you can reliably skip
is generating and sending the report.
> optimizing for search seems to make sense for me.
> Writing a new result is slow due to the sort, but that's the
> tradeoff.
That has the disadvantage of really hammering the network if the logs are
kept on NFS. Mine are on some of the machines I use, and until I moved
the perl I was testing with etc. into /tmp the sysadmins got rather
annoyed at me. I can't really move the logs into /tmp (and back to $HOME
at the end of a session), as then they wouldn't be shared between
instances running on different machines but sharing the same $HOME.
> >> I think it makes sense to allow the CT client config file to have
> >> "sections" for automated testing clients, but that change may take a while
> >> to happen (if it happens at all).
> > Not sure what you mean by this.
> In YAMLish-yadda-yadda terms:
> global:
>   profile: myprofile.json
>   ...
> CPAN::Reporter::Smoker
>   status_file: ~/smoking.txt
>   timeout: 3600
>   ...
> POE::Component::BinGOs::Skynet::Smoker
>   queue_module: ...
Ah, OK. That makes perfect sense.
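
For illustration, here is one way a client might read such a sectioned config, sketched in Python. The merge rule (a client's own section overrides "global") is my assumption, not something stated above, and the CONFIG dict and settings_for name are hypothetical.

```python
# Hypothetical in-memory form of the sectioned config quoted above;
# the elided ("...") entries are simply left out.
CONFIG = {
    "global": {"profile": "myprofile.json"},
    "CPAN::Reporter::Smoker": {
        "status_file": "~/smoking.txt",
        "timeout": 3600,
    },
}

def settings_for(config, client):
    """Return the global settings overlaid with the client's own section."""
    merged = dict(config.get("global", {}))
    merged.update(config.get(client, {}))  # assumption: client keys win
    return merged
```

A client with no section of its own would just see the global settings.
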
--
David Cantrell | Bourgeois reactionary pig