On Thu, Dec 31, 2009 at 11:45 AM, Barbie <[email protected]> wrote:

> I currently poll the NNTP server every hour for updates. I think the RSS
> feed would be fine to use to know what to update, if it was updated
> frequently enough.
I think we can get it to be much, much more frequent.

> However, the bigger interest for me is whether I can grab the full
> reports in a bundle. A daily DB snapshot would be too infrequent for
> updates, but would be fine to ensure that everything was captured at the
> end of each day. Would requesting each report in turn be too much of a
> drain?

Metabase is supposed to support that kind of batch behavior, though we've not tested it. (It might not be implemented yet, even.)

> I'm currently working on supplying all the current reports via the
> existing Reports site; this will then relieve the NNTP archive on
> perl.org. Once I've got it running live in the next week or so, there
> will be a definitive URL template that can be used to display the SMTP-
> or HTTP-submitted reports. This can be used in the RSS feed to relieve
> any direct burden on the Metabase itself.

I'm a bit concerned that we're pulling in separate directions. Or at least, if we're all touching separate parts of the elephant, I haven't seen the whole elephant yet. I think I need to find some time to jot down or sketch out what I see as the new architecture and let people react to what fits and what doesn't.

My goal is to get enough of the CT2.0/Metabase infrastructure up and running that we can stop slinging around raw reports and start referencing Metabase report fact objects. Part of the goal of CT2.0 is to have reports as structured data, after all.

Here's a very quick overview of what I'm envisioning:

* Existing clients use Test::Reporter::Transport::Metabase to send report objects to a master CT2.0 Metabase server on an Amazon EC2 virtual server (or a load-balanced cluster of servers if necessary). (There's a rough sketch of the client side further down.)

* The addition of a report to the master CT2.0 Metabase is syndicated in a way that allows interested parties to update databases, post to IRC, whatever. The syndicated data could just be the "indexed" data about the report (dist name, grade, platform, perl, etc.) and the GUID that references the full report fact object in the CT2.0 Metabase. (The GUID replaces the NNTP ID, but in a way that lets us convert existing NNTP IDs to GUIDs and vice versa.) The point is that it comes pre-parsed rather than as a chunk of text. (A sketch of what such a record might look like is also below.)

* Some authorized site mirrors the actual report text locally to a cheaper-to-operate server and serves it up via a standardized link/template, much like nntp.perl.org does now. Reports could be put into a local metabase or stored in whatever format makes sense to the administrator. Worst case, they stay stored in S3 and the data is fetched directly as needed. (Expensive, but easy to implement.)

The framework for all the Metabase stuff already exists. The Metabase backend on Amazon needs to be implemented. (Not hard, just takes some time.) The syndication service needs to be written. (Also not hard, but easier to do after the backend design is done.)

I'm pretty confident in my ability to do the first part of that. The second I could do, but I could use a volunteer to help write the syndication service. The third part -- substituting for nntp.perl.org -- I'd also like a volunteer for.

And, of course, given the syndication service, do we (a) just wire that up to the existing stats DB for downstream consumers, or (b) make it easy for downstream consumers to update their own DBs from the syndication feed?
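To make the first bullet a bit more concrete, here's roughly what the client side looks like with Test::Reporter and the Metabase transport. Treat it as a sketch only: the URI is a placeholder for wherever the master CT2.0 Metabase ends up living, and the id_file path is just an example location for a tester's Metabase profile.

  use Test::Reporter;

  my $reporter = Test::Reporter->new(
      grade          => 'pass',
      distribution   => 'Foo-Bar-1.23',
      comments       => 'automated smoke test output goes here',
      transport      => 'Metabase',
      transport_args => [
          # placeholder -- the real master server address is still TBD
          uri     => 'https://metabase.example.org/api/v1/',
          # the tester's Metabase profile/credentials file
          id_file => '/home/smoker/.metabase/metabase_id.json',
      ],
  );

  $reporter->send;

The idea is that the smoker tools would carry the transport settings in their existing config, so testers shouldn't have to change much beyond that.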
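For the second bullet, the sort of pre-parsed record I imagine the syndication feed carrying would be something along these lines. Every field name here is made up for illustration; none of this is designed yet.

  my $syndicated_record = {
      # placeholder GUID -- replaces the NNTP ID as the canonical reference
      guid     => 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
      dist     => 'Foo-Bar',
      version  => '1.23',
      grade    => 'pass',
      platform => 'x86_64-linux',
      perl     => '5.10.1',
  };

A consumer could key its own database on the GUID and only fetch the full report fact from the Metabase (or from a mirror like the Reports site URL template) when it actually needs the raw text.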
And, as I've said, in the *worst* case, if the syndication service doesn't get done in time, it would be pretty trivial to regenerate the stats database on an Amazon EC2 instance/cluster as often as necessary (EC2 has high bandwidth to S3 at no data-transfer cost) and then let stats.cpantesters.org just rsync it every so often. That's not the best solution, but I'm trying to chart a path with a lot of options for design redundancy.

I'll write this up further and try to describe the components and broad tasks.

-- David
