Re: Feedback wanted on new CPAN::Testers::Client configuration mechanism

David Golden Wed, 13 May 2009 06:25:32 -0700

On Tue, May 12, 2009 at 11:37 AM, David Cantrell <[email protected]> wrote:
> All of it should be carried over, but choice of editor should default to
> just looking at $EDITOR at runtime.  Needs to be configurable though in
> case you're using a platform where setting that isn't customary, such as
> Windows.


The pattern that git uses is pretty good.  You can set editor in
config or it will use GIT_EDITOR or it will use EDITOR (then maybe
VISUAL, but that often has issues).
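
A rough sketch of that lookup order, for illustration only. The helper name resolve_editor and the CPANTESTERS_EDITOR variable are made up here (standing in for GIT_EDITOR), and the precedence simply follows the list above rather than git's exact behaviour:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first editor that is set, checking the
# config file first and then the environment, in the order listed above.
sub resolve_editor {
    my ($config, $env) = @_;
    my @candidates = (
        $config->{editor},             # explicit setting in the config file
        $env->{CPANTESTERS_EDITOR},    # assumed tool-specific override
        $env->{EDITOR},
        $env->{VISUAL},
    );
    for my $candidate (@candidates) {
        return $candidate if defined $candidate && length $candidate;
    }
    return;    # nothing set; the caller decides on a fallback
}

my $editor = resolve_editor( {}, \%ENV );
print defined $editor ? "editor: $editor\n" : "no editor configured\n";
```

Passing the environment in as a hash reference keeps the function easy to test, and leaves the "nothing set" case (common on Windows) to the caller.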

> I suppose you could drop configuring whether to send duplicates, as
> sending 'em would, I think, *always* be the wrong thing to do.

That one is really a "hidden/advanced" option in CPAN::Reporter that I
use mostly for testing/debugging so I can just keep generating the
report for a single dist over and over again.  (Usually File::Marker,
which is probably the reason it has 150 reports or so.)

>> Should config files be human readable/editable or should they be
>> structured data (e.g. JSON)?
>
> Yes, and yes.  Sorry :-)  Thankfully, YAML is both, and JSON can be too
> provided it has suitable whitespace.  There's also no particular reason
> that I can see why to not just serialise a perl structure out to disk
> Data::Dumper-stylee - that has the advantage of not requiring that YAML
> or JSON be available in the testing environment.

On reflection, I'm doing it with Data::Dumper, cribbing from
Module::Build::Dumper.  So it's just Perl, which hopefully, anyone
testing Perl can read and edit.  :-)
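
A minimal sketch of what a Data::Dumper round trip could look like. The function names, the file name config.pl and the keys are assumptions for illustration, not the actual CPAN::Testers::Client code:

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Spec;
use File::Temp qw(tempdir);

# Write the config as a bare Perl structure that a human can edit.
sub write_config {
    my ($path, $config) = @_;
    local $Data::Dumper::Terse    = 1;    # no leading "$VAR1 ="
    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Sortkeys = 1;    # stable key order between writes
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} Dumper($config);
    close $fh or die "Cannot close $path: $!";
    return;
}

# Read it back by evaluating the file contents as Perl.
sub read_config {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my $source = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my $config = eval $source;
    die "Cannot parse $path: $@" if $@;
    die "$path does not contain a hash reference\n"
        unless ref $config eq 'HASH';
    return $config;
}

my $path = File::Spec->catfile( tempdir( CLEANUP => 1 ), 'config.pl' );
write_config( $path, { editor => 'vim', send_duplicates => 0 } );
my $config = read_config($path);
print "editor: $config->{editor}\n";
```

The catch with this approach is that reading the file runs whatever Perl is in it, so it only suits config files the tester controls.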

>>                                 (I will note that JSON will be a
>> prerequisite for Metabase and will thus be available for CT config as
>> well.)
>
> Ah, OK.  In that case, use JSON, not YAML.

Since Gabor raised the idea of testing an external perl without (most
of) the CPAN Testers tools installed, I'm going to stick with Perl so
a really minimal CPAN Testers setup can access common config data
without JSON.

> Also consider multiple concurrent testing runs (of different
> distributions, obviously) using the same version and architecture.  This
> means that you can't just read the history file at startup, writing
> additions to disk but not re-reading the file.  You need to 'tail' the
> file to spot additions by other processes.  Or at least make it easy for
> someone else to add that later when they need it.

CPAN::Reporter already locks/reads/checks whenever history is queried
for that reason.  I'm not sure about whether I'd optimize for tailing
or for random-search.  Either way, it should live behind an API and
tools shouldn't be querying history directly.  There will probably be
a CPAN::Testers::History or equivalent that will provide common access
across tools.
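
As a sketch of the lock/read/check idea behind such an API: the function names and the one-record-per-line "distfile grade" layout below are assumptions for illustration, not CPAN::Reporter's actual code or file format.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;
use File::Temp qw(tempdir);

# Append one record while holding an exclusive lock.
sub record_report {
    my ($path, $distfile, $grade) = @_;
    open my $fh, '>>', $path or die "Cannot append to $path: $!";
    flock( $fh, LOCK_EX ) or die "Cannot lock $path: $!";
    print {$fh} "$distfile $grade\n";
    close $fh or die "Cannot close $path: $!";    # closing releases the lock
    return;
}

# Re-read the file on every query so that records written by other
# processes since the last query are seen.
sub has_report {
    my ($path, $distfile) = @_;
    open my $fh, '<', $path or return 0;          # no history file yet
    flock( $fh, LOCK_SH ) or die "Cannot lock $path: $!";
    my $found = 0;
    while ( my $line = <$fh> ) {
        my ($name) = split ' ', $line;
        if ( defined $name && $name eq $distfile ) {
            $found = 1;
            last;
        }
    }
    close $fh;
    return $found;
}

my $history = File::Spec->catfile( tempdir( CLEANUP => 1 ), 'history' );
record_report( $history, 'DAGOLDEN/File-Marker-0.13.tar.gz', 'PASS' );
print has_report( $history, 'DAGOLDEN/File-Marker-0.13.tar.gz' )
    ? "already reported\n"
    : "not yet reported\n";
```

Keeping these two calls behind a module means the storage can later change from linear scan to something indexed without touching the tools that use it.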

I think that kind of component architecture is going to be the goal
for CPAN::Testers 2.0.  For example, if BinGOs wants to write a PoCo
around the local test history, he just needs that module and it's not
embedded in some larger CPAN::Testers::Client.

> Note that using filenames (eg creating files like
> 5.8.9--Foo-Bar-1.23.tar.gz--linux--blah-PASS) to store this info probably
> won't do the trick, because that will lead to an explosion of files, vast
> numbers of inodes, and people running up against filesystem quotas.
> Even if the files are zero-length, quota systems generally ration both
> space and number of files.

My null hypothesis at the moment is

    ./history/$perlversion-$archname/history.db
        PASS.db
        FAIL.db
        UNKNOWN.db
        NA.db

Where each db is a sorted list of distfile name and associated GUIDs
of that grade (including multiples if that is allowed):

    DAGOLDEN/File-Marker-0.13.tar.gz {GUID} {GUID} {GUID}

That would make checking for a duplicate report very fast -- binary
search in the right grade file.  Checking for any report is only
slightly slower since it's a search across those 4 files and we know
the likelihood is PASS > FAIL > UNKNOWN > NA.  (Last two are pretty
close these days).
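
A small sketch of that lookup, under the assumption that each grade file has already been read into an array of lines sorted by distfile name. The function names and the guid-N values are placeholders; a real version would more likely seek within the file on disk than load it all.

```perl
use strict;
use warnings;

# Binary search over lines sorted by distfile name. Each line is assumed
# to look like "AUTHOR/Dist-1.23.tar.gz GUID GUID ...".
sub find_guids {
    my ($lines, $distfile) = @_;
    my ($lo, $hi) = ( 0, $#{$lines} );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        my ($name, @guids) = split ' ', $lines->[$mid];
        my $cmp = $distfile cmp $name;
        return \@guids if $cmp == 0;
        if   ( $cmp < 0 ) { $hi = $mid - 1 }
        else              { $lo = $mid + 1 }
    }
    return;    # not present in this grade
}

# Look for any report, trying the most likely grades first.
sub find_any_report {
    my ($history, $distfile) = @_;
    for my $grade (qw(PASS FAIL UNKNOWN NA)) {
        my $guids = find_guids( $history->{$grade} || [], $distfile );
        return ( $grade, $guids ) if $guids;
    }
    return;
}

# Placeholder data; the lines in each list must already be sorted.
my %history = (
    PASS => [
        'DAGOLDEN/File-Marker-0.13.tar.gz guid-1 guid-2',
        'DAGOLDEN/Other-Dist-0.01.tar.gz guid-3',
    ],
    FAIL => [ 'DAGOLDEN/Broken-Dist-0.02.tar.gz guid-4' ],
);

my ($grade, $guids) =
    find_any_report( \%history, 'DAGOLDEN/File-Marker-0.13.tar.gz' );
print defined $grade
    ? "found $grade with " . scalar(@$guids) . " report(s)\n"
    : "no report yet\n";
```

A duplicate check for a known grade only needs find_guids on that one list; find_any_report is the "is there already a report?" query a smoker would make.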

Since getting smoker speedups depends on not retesting distributions
with a known result, optimizing for search seems to make sense to me.
Writing a new result is slow due to the sort, but that's the
tradeoff.  Smoking a distribution with lots of dependencies
(<cough>Moose</cough>) could mean dozens of "is there already a
report?" queries and it would be really nice to not have those be
linear (reverse) search if history is just appended.

I'll probably re-read MJD's talk on lightweight databases and figure
out something I like that can be done simply using pure perl:
http://perl.plover.com/classes/lightweight-db/

>> I think it makes sense to allow the CT client config file to have
>> "sections" for automated testing clients, but that change may take a while
>> to happen (if it happens at all).
>
> Not sure what you mean by this.

In YAMLish-yadda-yadda terms:

    global:
        profile: myprofile.json
        ...
    CPAN::Reporter::Smoker:
        status_file: ~/smoking.txt
        timeout: 3600
        ...
    POE::Component::BinGOs::Skynet::Smoker:
        queue_module: ...
        log_module: ...
        irc_channels: ...

The point being that over time I'd like to see all CPAN Testers config
stuff migrate into one directory (at least) if not one file, so that
CPAN Testers ecosystem config isn't sprayed across .cpantesters,
.cpanreporter, .cpanplus, .whatever

-- David
