I. Rationale

The CPANTesters.org site has been running "in a degraded state" for some time: many reports that were previously available are simply no longer there, the platform has frequent outages during which it will not accept new reports, and its API for retrieving reports is not functioning. Doug Bell (preaction) has been the primary maintainer of this crucial piece of infrastructure, but has lately been hard to reach for questions, concerns, and even offers of assistance.
This creates a "Bus Factor" problem for the larger Perl community; both module authors and users, as well as the core developers, depend on CPAN testing for assurance that their work is sound and functioning as designed across the many platforms and versions of Perl that exist in the wild. While I (and others) will be attending the Toolchain Summit with the specific goal of working with Doug to help make the existing infrastructure more robust and capable, I believe there is also room in our community for parallel development of a new system, which is being called the "Perl Magpie." The existing code contains artifacts of testing dating back to the end of the 20th century; the Magpie team believes that it is time for an end-to-end redesign, that preserves the functionality of the 25-year-old code, while opening up prospects for new ways of reporting tests. II. Design philosophy and program structure Magpies, birds of the family Corvidae, are seen throughout Eurasia, in a variety of species. They are known to collect found objects, and will reputedly display found objects to peers and other animals that they may interact with. They are regarded by scientists to be very intelligent, including the ability to make and use tools, imitate human speech, and work in teams. Inspired by the magpie, we wish to develop a simply-structured web application that can collect CPAN testing and other data about CPAN modules, and present it in multiple friendly formats upon request. Obviously, we need backwards-compatibility in such an application, so that existing testers may send it test results with only minor changes to their configuration. Additionally, we would like the application to provide additional means of ingesting tests, and a wide variety of API accessors to read back the information after ingestion, so that others may generate creative new ways of displaying the data. In my daily work, I write a lot of Dancer2 applications, using PostgreSQL as a database, with simple JavaScript/jQuery front-ends. Dancer2 handles the web routing engine, and with plugins, can provide for authentication/registration, RSS feeds, and a fully RESTful API. Using JavaScript and jQuery lets us send some of the rendering work to the browser, offloading that work from the server and allowing for customized rendering in browsers focused on accessibility for users with (particularly) visual disabilities. Our first-phase design philosophy centers on the KISS principle--make it as complex as necessary, and no moreso, and not adding features or functionality because "we might want that someday," or prejudging the use cases, instead waiting for them to arise. We begin by ensuring that existing functionality is preserved--to a reporter using Test::Reporter::Transport::Metabase, it should "look like" a Metabase, responding to the same URLs that the existing system will do. Thus, a test reporter could change the address (not the whole URL) in their .cpanreporter configuration file, and submit tests--it should Just Work, and properly respond to a correctly-formatted test report by ingesting it, and promptly making it available for reporting. The database structure is simple, with the "Test" table storing a compressed text of the report, and extracting key fields for aggregate reporting needs to indexed fields (Perl version, distribution, author, tester, OS name, OS version, platform/config options, and the test result). As we coax out new information for the Magpie to collect, the DB schema will expand incrementally. 
As much as possible, we'll avoid "statistics" tables for aggregating data, unless and until we reach a point where the database engine can no longer handle the load for us. While real-world rate data for this dataset is unavailable to us at this time, I have worked with and developed applications handling many millions of records that sum numerical data in under a second. Layout is largely left to the remote consumer of the data, using simple JavaScript for rendering. Simple templates built with Template::Toolkit provide a default interface, and APIs will be created and documented for consumption in ways we haven't yet conceived.

III. Feature Roadmap, and current status of work

The repository for the code is at https://github.com/Perl-Magpie/Perl-Magpie, and pull requests are, of course, welcome!

V1.0: The basics -- hope to have this completed and deployed prior to PTS, May 1:

* Accept a properly-formatted test result from cpanm-reporter, using the Metabase transport pointing at the Magpie's address. (DONE)
* Display a matrix for a given distribution, showing all available test results in an informative aggregate page. (DONE)
* Allow a browser click on matrix elements to drill down to a list of tests that meet the desired criteria (OS, Perl version, and distribution, in any combination). (IN PROGRESS)
* Allow a report ID to be clicked, as in the current matrix system, to display a specific test report.
* Create a front-end page for searching on module or distribution name which, if successful, returns the matrix for the most recent release of a module, or for a specific version, if any tests are available.
* Provide appropriate responses for malformed test reports sent to the "Metabase" transport route, and a "No results available" response for a search that finds nothing in the database.
* Add robust external monitoring of services to ensure reliability and uptime for all users of the Perl ecosystem. This would include API health, ping thresholds, disk usage, CPU load, backend health, etc., published to an external site for community visibility.
* Optionally publish health results to IRC or the CPT mailing list, with information about the platform and its responsiveness, for community consumption. This should help prevent the "bus factor" issue we have today and allow other people with supporting skill sets to get involved.

V1.1: Make it play with the rest of the ecosystem

* Using a cron job or other strategy, capture the current Metabase "log.txt" periodically, and use it to accumulate test results still coming in to the legacy Metabase.
* Provide a similarly-formatted "log.txt" result search for the Magpie, so testers can see that their tests are being ingested properly.
* Create an API that MetaCPAN and others could use to get a fast summary of results in JSON format for a given distribution (see the sketch at the end of this section).
* Work out a mechanism to associate one or more Metabase UUIDs with names and email addresses, replicating the old "registration/moderation" behavior that follows CPAN::Reporter configuration.
* Add top-lists and other interesting data to the front page--top testers, most-tested, most-failed, etc.
* Add email functionality for sending failed tests to module developers (including opt-out and summary capability).
* Create RSS feeds for distributions and CPAN authors, delivering test reports via RSS.
* Add "other versions" to the matrix page, to allow for checking tests on prior/other versions of a distribution. V1.x: Ensure that other CPAN test tooling besides cpanm-reporter is functional. V2.0: "Out there. Thataway." * Possibilities without end--a new reporter module for CPAN that sends reports to the Magpie via a new API? Create and/or collect CPANTS information? Ideas are welcome, and it's time to think outside of the box.