One other consideration: it probably still makes sense for the CGI to describe to the end user the check that is being made. A regular expression is barely adequate for that, but it is better than no indication.
- Sam Ruby

On Wed, Apr 11, 2018 at 9:46 AM, Shane Curcuru <[email protected]> wrote:
> I'd like to simplify some of the site-scan.rb/site.cgi processing by
> centralizing some of the core things that the scripts are searching for
> into site-scan.rb. While I appreciate the original design motivation,
> we currently have duplicate regexes - and we have more people interested
> in using the results of the site scan (esp. with events) and officers
> potentially requesting changes to the requirements.
>
> Roughly, I'd like to move most of CHECKS into site-scan.rb for
> simplicity and use those to implement most of the link scans. Some of
> the scans still have more logic (which would still be custom), but some
> of them can be mechanical.
>
> CHECKS = {
>   'events' =>
>   [
>     '',
>     # a_text regex to scan for - for events, we don't care, so blank
>     '/apache.org/events',
>     # a_href minimal regex to capture - for events, this tells us what link to capture from the page
>     %r{^https?://.*apache.org/events/current-event}
>     # a_href full regex to expect for compliance (used in site.cgi)
>   ],
>
>   'license' =>
>   [
>     '/licenses?/',
>     # a_text regex to scan for - for license, this is required
>     'apache.org',
>     # a_href minimal regex to capture - for license, we only capture the link if it points to apache.org
>     %r{^https?://.*apache.org/licenses/$}
>     # a_href full regex to expect for compliance; it must point to one of our actual licenses to pass
>   ],
>   ...etc.
> }
>
> Any overall objections? It's making me twitchy seeing most of the
> regexes we use for scanning in separate places.
>
> --
>
> - Shane
> Director & Member
> The Apache Software Foundation
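A minimal sketch of how one shared CHECKS table could drive both the mechanical link scan and the compliance test, while also giving the CGI a (regex-derived) description to show the end user. This is illustrative only, not the actual site-scan.rb/site.cgi code. The method names `scan_links`, `check_compliance` and `describe_check` are hypothetical; links are assumed to be already extracted from the page as `[text, href]` pairs (no HTML parsing is shown); and every pattern is assumed to be a Regexp with escaped dots, whereas the proposal above mixes strings and regexes.

```ruby
# Hypothetical shared table: [a_text capture, a_href capture, a_href full compliance].
# Assumption: all three entries are Regexps; /.*/ stands in for "don't care".
CHECKS = {
  'events' => [
    /.*/,                                             # a_text: any link text
    %r{apache\.org/events},                           # a_href: minimal pattern to capture
    %r{^https?://.*apache\.org/events/current-event}  # a_href: full compliance pattern
  ],
  'license' => [
    /licenses?/i,                                     # a_text: link text must mention license(s)
    %r{apache\.org},                                  # a_href: only capture apache.org links
    %r{^https?://.*apache\.org/licenses/$}            # a_href: must be the licenses page
  ]
}.freeze

# Scan side: for each check, record the href of the first link whose text
# and href both match the capture patterns (nil if none matched).
def scan_links(links)
  CHECKS.each_with_object({}) do |(name, (text_re, href_re, _full_re)), found|
    match = links.find { |text, href| text =~ text_re && href =~ href_re }
    found[name] = match && match.last
  end
end

# CGI side: true/false per check, or nil when nothing was captured at all.
def check_compliance(found)
  found.to_h { |name, href| [name, href && CHECKS[name][2].match?(href)] }
end

# Tell the end user what was checked; the regex source is a crude description.
def describe_check(name)
  "#{name}: link must match #{CHECKS[name][2].source}"
end

links = [
  ['License', 'https://www.apache.org/licenses/'],
  ['ApacheCon', 'https://www.apache.org/events/current-event.html']
]
check_compliance(scan_links(links)).each do |name, ok|
  puts "#{ok ? 'PASS' : 'FAIL'} #{describe_check(name)}"
end
```

Keeping all three patterns for a check in one entry means a changed requirement is edited in one place, and the CGI's message to the end user is derived from the same regex it tests against.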
