I. Rationale

The CPANTesters.org site has been running "in a degraded state" for some time: 
many reports that were previously available simply aren't there any more, the 
platform has frequent outages during which it will not accept new reports, and 
its request API is non-functional. Doug Bell (preaction) has been the primary 
maintainer of this crucial piece of infrastructure for some time, but has 
lately been hard to reach for questions, concerns, and even offers of 
assistance.

This creates a "bus factor" problem for the larger Perl community; module 
authors and users, as well as the core developers, depend on CPAN testing for 
assurance that their work is sound and functions as designed across the many 
platforms and versions of Perl that exist in the wild.

While I (and others) will be attending the Toolchain Summit with the specific 
goal of working with Doug to help make the existing infrastructure more robust 
and capable, I believe there is also room in our community for parallel 
development of a new system, which is being called the "Perl Magpie."

The existing code contains artifacts of testing dating back to the end of the 
20th century; the Magpie team believes it is time for an end-to-end redesign 
that preserves the functionality of the 25-year-old code while opening up 
prospects for new ways of reporting tests.

II. Design philosophy and program structure

Magpies, birds of the family Corvidae, are found throughout Eurasia in a 
variety of species. They are known to collect found objects, and will reputedly 
display them to peers and other animals they interact with. Scientists regard 
them as highly intelligent, able to make and use tools, imitate human speech, 
and work in teams.

Inspired by the magpie, we wish to develop a simply-structured web application 
that can collect CPAN testing and other data about CPAN modules, and present it 
in multiple friendly formats upon request. Obviously, we need 
backwards-compatibility in such an application, so that existing testers may 
send it test results with only minor changes to their configuration. 
Additionally, we would like the application to provide new means of ingesting 
tests, and a wide variety of API accessors to read the information back after 
ingestion, so that others may devise creative new ways of displaying the data.

In my daily work, I write a lot of Dancer2 applications, using PostgreSQL as a 
database, with simple JavaScript/jQuery front-ends. Dancer2 handles the web 
routing engine, and with plugins, can provide for authentication/registration, 
RSS feeds, and a fully RESTful API. Using JavaScript and jQuery lets us send 
some of the rendering work to the browser, offloading that work from the server 
and allowing for customized rendering in browsers focused on accessibility for 
users with (particularly) visual disabilities.
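As a sketch of that client-side rendering idea, here is a small vanilla 
JavaScript function (no jQuery required) that turns a JSON test summary into 
an HTML table row. The JSON shape and CSS class names are assumptions for 
illustration, not the project's actual API:

```javascript
// Sketch of client-side rendering: turn a JSON summary of test results
// (field names assumed, not the project's actual API) into an HTML row.
// A pure function like this runs in the browser or under Node, and an
// accessibility-focused client could swap in its own renderer.
function renderSummaryRow(summary) {
  const cells = ["pass", "fail", "na", "unknown"]
    .map((grade) => `<td class="grade-${grade}">${summary[grade] ?? 0}</td>`)
    .join("");
  return `<tr><th>${summary.dist}</th>${cells}</tr>`;
}

// Example: a browser script would insert this into a results <table>.
const row = renderSummaryRow({ dist: "Foo-Bar-1.23", pass: 42, fail: 1 });
console.log(row);
```

Because the function only builds a string from data, the server never has to 
render this HTML itself.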

Our first-phase design philosophy centers on the KISS principle: make it as 
complex as necessary, and no more so; don't add features or functionality 
because "we might want that someday," and don't prejudge the use cases, but 
instead wait for them to arise. We begin by ensuring that existing 
functionality is preserved--to a reporter using 
Test::Reporter::Transport::Metabase, the Magpie should "look like" a Metabase, 
responding to the same URLs that the existing system does. Thus, a test 
reporter could change only the address (not the whole URL) in their 
.cpanreporter configuration file and submit tests--it should Just Work, 
properly responding to a correctly-formatted test report by ingesting it and 
promptly making it available for reporting.
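Concretely, the change to a tester's .cpanreporter configuration might be as 
small as editing the host in the transport line. The Magpie hostname below is 
a placeholder, and the exact legacy URI may differ in your config:

```
# .cpanreporter/config.ini -- the "transport" line, before and after.
# (magpie.example.org is hypothetical; the Metabase-style path is kept.)
#
# Before (legacy Metabase):
#   transport = Metabase uri https://metabase.cpantesters.org/api/v1/ id_file metabase_id.json
#
# After (pointing at a Magpie instance):
transport = Metabase uri https://magpie.example.org/api/v1/ id_file metabase_id.json
```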

The database structure is simple: the "Test" table stores a compressed text of 
each report, with key fields extracted into indexed columns for aggregate 
reporting (Perl version, distribution, author, tester, OS name, OS version, 
platform/config options, and the test result). As we coax out new information 
for the Magpie to collect, the DB schema will expand incrementally. As much as 
possible, we'll avoid "statistics" tables for the aggregation of data, until 
and unless we reach a point where the database engine can't handle the load 
for us. While real-world rate data for this dataset is unavailable to us at 
this time, I have worked with and developed applications that handle many 
millions of records, summing numerical data in under a second.
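A first cut at that table might look like the following PostgreSQL DDL; the 
column names and types here are illustrative assumptions, and the real schema 
will expand incrementally as described above:

```
-- Hypothetical first-pass schema; names and types are illustrative only.
CREATE TABLE test (
    id           BIGSERIAL   PRIMARY KEY,
    report       BYTEA       NOT NULL,  -- compressed full text of the report
    grade        TEXT        NOT NULL,  -- pass / fail / na / unknown
    distribution TEXT        NOT NULL,
    author       TEXT        NOT NULL,
    tester       TEXT        NOT NULL,
    perl_version TEXT        NOT NULL,
    os_name      TEXT        NOT NULL,
    os_version   TEXT,
    platform     TEXT,                  -- platform/config options
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Indexes on the extracted fields support the aggregate queries.
CREATE INDEX test_distribution_idx ON test (distribution);
CREATE INDEX test_grade_idx        ON test (grade, distribution);
```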

Layout is largely left to the remote consumer of the data, using simple 
JavaScript for rendering. Simple Template::Toolkit templates provide a default 
interface, and APIs will be created and documented for consumption in ways we 
haven't yet conceived.

III. Feature Roadmap, and current status of work

The repository for the code is at https://github.com/Perl-Magpie/Perl-Magpie, 
and pull requests are, of course, welcome!

V1.0: The basics -- Hope to have this completed and deployed prior to PTS May 1:
  * Accept a properly-formatted test result from cpanm-reporter, using the 
Metabase transport pointing at its address. (DONE)
  * Display a matrix for a given distribution, showing all available test 
results in an informative aggregate page. (DONE)
  * Allow for a browser click on the matrix elements, to drill down to a list 
of tests that meet the desired criteria (OS, Perl version, distribution) in 
any combination. (IN PROGRESS)
  * Allow for a report ID to be clicked on, as in the current matrix system, to 
allow display of a specific test report.
  * Create a front-end page to allow for searching on module or distribution 
name, which, if successful, returns the matrix for the most recent release of 
a module, or for a specific version, if any tests are available.
  * Provide for appropriate responses in the case of malformed test reports to 
the "Metabase" transport route, and for "No results available" in the case of a 
search that does not hit in the database.
  * Add robust external monitoring of services to ensure service reliability 
and uptime for all users of the Perl ecosystem. This would include API health, 
ping thresholds, disk usage, CPU load, backend health, etc., which would be 
published to an external site for community visibility.
  * Optionally have health results published to IRC or the CPT mailing list 
with information about the platform and its responsiveness for community 
consumption. This should help prevent the "bus factor" issue we have today, and 
allow other people with supporting skillsets to get involved.

V1.1: Make it play with the rest of the ecosystem
  * Utilizing a cron job or other strategy, capture the current Metabase 
"log.txt" periodically, and use that to accumulate test results currently 
coming in to the legacy Metabase.
  * Provide for a "log.txt" result search for the Magpie, similarly formatted, 
so testers can see that their tests are being ingested properly.
  * Create an API that could be used by MetaCPAN and others to provide a fast 
summary of results in JSON format for a given distribution.
  * Work out some sort of mechanism to associate one or more Metabase UUIDs to 
names and email addresses, to replicate the old "registration/moderation" 
behavior after CPAN::Reporter configuration commands.
  * Add some top-lists and interesting data to the front page--top testers, 
most-tested, most-failed, etc.
  * Add email functionality to allow for emailing failed tests to module 
developers (including opt-out and summary capability).
  * Create RSS feeds for distributions and CPAN authors, to give test reports 
via RSS.
  * Add "other versions" to the matrix page, to allow for checking tests on 
prior/other versions of a distribution.
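For the JSON summary API mentioned above, a response for a given distribution 
might look something like this; the field names are assumptions, offered as a 
starting point for discussion rather than a settled format:

```
{
  "distribution": "Foo-Bar",
  "version": "1.23",
  "summary": { "pass": 42, "fail": 1, "na": 0, "unknown": 2 }
}
```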

V1.x: Ensure that other CPAN test tooling besides cpanm-reporter is functional.

V2.0: "Out there.  Thataway."
  * Possibilities without end--a new reporter module for CPAN that sends 
reports to the Magpie via a new API?  Create and/or collect CPANTS information? 
Ideas are welcome, and it's time to think outside of the box.
