Things have been fairly stable for the last 48 hours, so here's a report on the current status:
Work done:
* Created a new API to write incoming test reports to a MySQL database
* This removes Amazon SimpleDB and saves us $250/mo
* In the future, we will be able to reduce our total disk usage by removing
duplicate data
* Translated the Metabase reports to the new test report format
* The new test report format has more fields for future expansion,
including places for testers to report on all the dependencies of the
distribution they tested
* The new API performs this translation transparently
* Another script is migrating all the existing data to the new format
* Started processing incoming test reports in parallel using the Minion job
runner (a rough sketch follows this list)
* Incoming reports generate a job on the queue
* Worker processes handle individual reports
* This work can be spread across multiple machines if we can get access to
hardware
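
Here is a rough sketch of how that queue fits together, assuming a
Mojolicious::Lite app and the current SQLite Minion backend
(Minion::Backend::SQLite). This is illustrative, not the actual API code: in
the real system the raw report is first written to the test_report table,
but here the report document is simply passed as the job argument to keep
the sketch self-contained.

    use Mojolicious::Lite;

    # Current setup: Minion backed by a local SQLite file
    plugin Minion => { SQLite => 'sqlite:minion.db' };

    # Task run by worker processes, possibly on several machines later
    app->minion->add_task( process_report => sub {
        my ( $job, $report ) = @_;
        # ... translate the report and write it to the MySQL tables ...
    } );

    # Submitting a report only enqueues a job, so the HTTP response stays fast
    post '/report' => sub {
        my $c  = shift;
        my $id = $c->minion->enqueue( process_report => [ $c->req->json ] );
        $c->render( json => { job_id => $id }, status => 201 );
    };

    app->start;

Workers are started separately (e.g. `perl report-api.pl minion worker`),
which is what makes it possible to run them on more than one machine.
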
Outstanding issues around this migration:
* The test_report table is latin1, and some reports are submitted with UTF-8
characters.
* Mitigation: `ascii => 1` in the serializer_options for the JSON column
(sketched below, after this list)
* Future Change: Make this table UTF-8 safe
* The Amazon Metabase instances are still up
* After another week or two of stable operation, I will shut these down
* The original CPAN Testers generate process is still running
* Once the Metabase instances are shut down, these processes will be removed
from cron
* The Minion task runner must be moved to MySQL (see the Minion sketch after
this list)
* Presently it uses SQLite, and lock timeouts are an occasional annoyance
* Moving to MySQL will let us run Minion workers on multiple machines
* MySQL allows for greater concurrency when accessing the database (to insert
new jobs and update job status)
* The queue can grow to >10,000 unprocessed reports during the day, and I'd
like to keep that from happening.
* The Minion task runner needs monitoring
* Queue size will be a good indicator of how the system is doing
* This will be a lot easier when Minion is using MySQL
* The test report and processed test report tables need monitoring
* Right now the only monitoring is manual: Andreas e-mails me every couple
of weeks to tell me that report processing has stopped
* Counting the rows in each table and comparing the two counts should be a
good indicator (a sketch of this is after this list)
* InnoDB tables can't answer `SELECT COUNT(*) FROM <table>` cheaply, though,
since they don't keep an exact row count
* These tables could be altered to add auto-increment ID columns, whose
maximum values would be a fast indicator of table size
* Some existing processes are still using the MySQL Metabase cache:
* These processes are:
* The original generate process
* view-report.cgi, which displays the full text of a report
* Moving these processes to the new test report format will improve
performance
* Then we can delete the Metabase cache to free up disk space
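
To illustrate the `ascii => 1` mitigation above, here is a rough sketch of
what the column definition could look like, assuming the report column is
inflated with DBIx::Class::InflateColumn::Serializer; the package, table,
and column names are illustrative, not the actual schema:

    package Local::Schema::Result::TestReport;   # illustrative name
    use strict;
    use warnings;
    use base 'DBIx::Class::Core';

    __PACKAGE__->load_components('InflateColumn::Serializer');
    __PACKAGE__->table('test_report');
    __PACKAGE__->add_columns(
        id     => { data_type => 'char', size => 36 },
        report => {
            data_type          => 'text',
            serializer_class   => 'JSON',
            # Escape everything outside ASCII as \uXXXX so UTF-8 report text
            # survives a round trip through the latin1 table
            serializer_options => { ascii => 1 },
        },
    );
    __PACKAGE__->set_primary_key('id');

    1;

The longer-term fix is to convert the table itself to UTF-8 (for example,
`ALTER TABLE test_report CONVERT TO CHARACTER SET utf8mb4`), after which the
escaping would no longer be needed.
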
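For the Minion items, the move to MySQL could be as small as swapping the
backend, assuming the Minion::Backend::mysql module from CPAN; the DSN and
the monitoring route here are made up for illustration:

    use Mojolicious::Lite;

    # Same setup as the earlier sketch, but the Minion backend points at
    # MySQL instead of SQLite, so workers on several machines can share
    # one queue
    plugin Minion => { mysql => 'mysql://minion:secret@db.example.com/minion' };

    # A queue-size check for monitoring: "inactive" jobs are waiting to run
    get '/health/minion' => sub {
        my $c       = shift;
        my $waiting = $c->minion->stats->{inactive_jobs};
        # Alert (503) when the backlog passes the ~10,000-report level
        # mentioned above
        $c->render(
            json   => { waiting_jobs => $waiting },
            status => $waiting > 10_000 ? 503 : 200,
        );
    };

    app->start;
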
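For the table monitoring item, here is a sketch of the count comparison
using plain DBI; the DSN, table, and column names are made up, and it
assumes both tables have the auto-increment ID columns proposed above:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect(
        'dbi:mysql:database=cpantesters', 'monitor', 'secret',
        { RaiseError => 1 },
    );

    # MAX() on an indexed auto-increment column is a cheap index lookup,
    # unlike SELECT COUNT(*), which InnoDB answers with a full index scan
    my ($submitted) = $dbh->selectrow_array('SELECT MAX(id) FROM test_report');
    my ($processed) = $dbh->selectrow_array('SELECT MAX(id) FROM test_report_processed');

    # Rough backlog indicator: how far processing is behind submission
    my $backlog = $submitted - $processed;
    warn "Report processing may be stuck: roughly $backlog reports behind\n"
        if $backlog > 10_000;   # example threshold only
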
Thanks to:
* Joel Berger for writing the CPAN::Testers::Backend::ProcessReports module at
the 2017 Perl QA Summit
* Andreas König and Slaven Rezić for their help troubleshooting backwards
compatibility issues and report processing issues
* Barbie for finding some missing parts from the new report processor and
quickly writing new scripts to fix them
* Everyone who helped test the new API code before this migration (Chris
Williams, Ioan Rogers)
Next Steps:
* The machine is occasionally overloaded by view-report.cgi, which causes
report submissions to time out for 5-10 minutes at a time, so changing this
to a daemon that uses the new report format is a high priority (a rough
sketch of such a daemon follows).
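
As a very rough sketch of that daemon (and only a sketch): a minimal
Mojolicious::Lite app that reads the new test report rows straight from
MySQL via Mojo::mysql, with made-up route, DSN, and column names:

    use Mojolicious::Lite;
    use Mojo::mysql;

    # One persistent connection pool instead of forking a CGI per request
    helper mysql => sub {
        state $mysql = Mojo::mysql->new('mysql://view@localhost/cpantesters');
    };

    # Serve the full text of a single report by its ID
    get '/report/:id' => sub {
        my $c   = shift;
        my $row = $c->mysql->db->query(
            'SELECT report FROM test_report WHERE id = ?', $c->param('id'),
        )->hash;
        return $c->reply->not_found unless $row;
        $c->render( text => $row->{report} );
    };

    # Run under a preforking server, e.g.: hypnotoad view-report.pl
    app->start;
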
Doug Bell
[email protected]
> On Aug 12, 2017, at 2:40 PM, Doug Bell <[email protected]> wrote:
>
> The Metabase API has been changed over. Some bugs were fixed and everything
> appears stable. If there are any problems, I will revert to the old Metabase
> API. If I am unresponsive, testers can point their machines to
> `metabase-old.cpantesters.org` to
> reach the old Metabase API. All the old infrastructure is still ticking over
> and will continue to do so until the new things have been stable for at least
> a week.
>
> Moving away from EC2 is a fairly major cost savings: what we were paying
> $250/mo for now costs us $0/mo. That said, the Metabase section of the
> site has been its own server for a long time, and now we're adding its work
> to our one existing server that already does everything else except the
> database.
>
> To help spread the work out, if anyone still has one or two older, outdated
> servers sitting in a rack somewhere doing nothing and could make them
> available to the CPAN Testers project, let me know. I know people have
> volunteered hardware before, but I wasn't in a place where I could easily
> make use of it. Now that the new Metabase API exists, and now that I am very
> close to a Minion job-queue-based backend processing system, I can more
> easily spread work across multiple machines.
>
> For donating, you'll get a place on the sponsors list
> (http://iheart.cpantesters.org), and you'll
> help continue making Perl and CPAN a community of people collaborating on
> stable, useful software projects.
>
> If there are any other questions, problems, or concerns, please let me know.
>
> Thanks,
>
>
>
> Doug Bell
> [email protected]
>
>
>
>> On Aug 9, 2017, at 3:03 PM, Doug Bell <[email protected]> wrote:
>>
>> [Cross-posting from the CPAN Testers blog:
>> http://blog.cpantesters.org/diary/209]
>>
>> Summary: I will be doing work on the Metabase API on 2017-08-12. Writing
>> test reports may be unresponsive for a few minutes, and there may be bugs.
>> Please let me know if there are any problems submitting test reports.
>>
>> I have completed the processing script for the new test report format. This
>> was the last step in moving the Metabase API away from Amazon and on to our
>> MySQL cluster for cost and stability reasons: Amazon SimpleDB is too
>> expensive, and its limitations for our purposes outweigh its costs. We have
>> always maintained a copy of the Metabase data in our MySQL database, and
>> there's no real need to continue having two live copies of the same data
>> (especially when one of the copies costs money every time you ask for a
>> piece of data).
>>
>> This Saturday, 2017-08-12, around 1:00 PM US/Central (18:00 UTC), I will be
>> switching DNS over to the new, backwards-compatible Metabase API which
>> writes to our MySQL database. A few months ago, I asked for testers to try
>> this new API out, and everything went well (thanks to everyone who helped
>> with that). The new API works the same as the old API: No changes are needed
>> for your testers or anyone consuming the minimal data feeds out of the
>> Metabase API (the log.txt view).
>>
>> Since this is only a DNS change, the downtime for the change should be zero
>> as DNS propagates and your testers are pointed at the new IP address. Since
>> it's possible for me to mess up this change, there may be some downtime.
>> Since all software has bugs, there may be some downtime if any bugs are
>> revealed by all the testers being migrated to the new API.
>>
>> This change (and all the work around this change) sets up the project for
>> new changes down the road:
>>
>> * Speeding up report processing by triggering individual report processing
>> jobs as reports are submitted
>> * Distributing those processing jobs over multiple machines to improve
>> performance
>> * Making the test report text available immediately after submission instead
>> of having to wait for backend processing jobs
>>
>> If you have any questions, feel free to reply to this thread or to me
>> directly.
>>
>> Doug Bell
>> [email protected]
>>
>>
>>
>