One other consideration: it probably still makes sense for the CGI to
describe to the end user the check that is being made.  A regular
expression is barely adequate for that, but it is better than no
indication at all.
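
For illustration only, here is a minimal sketch of how a check could carry
a plain-language description next to its regex, with the regex as the
fallback.  The hash layout, the :description key, and the helper names
describe_check and compliant? are assumptions made for this example; they
are not part of site-scan.rb, site.cgi, or the proposal quoted below.

```ruby
# Hypothetical layout: each check pairs the compliance regex from the
# proposal with an optional plain-language description.
CHECKS = {
  'events' => {
    expect: %r{^https?://.*apache.org/events/current-event},
    description: 'a link to apache.org/events/current-event'
  },
  'license' => {
    # No description yet, so the regex itself is shown to the user.
    expect: %r{^https?://.*apache.org/licenses/$}
  }
}.freeze

# Text the CGI could show to the end user for a given check.
def describe_check(name)
  check = CHECKS.fetch(name)
  check[:description] || "a link matching #{check[:expect].source}"
end

# Whether a captured href satisfies the compliance regex for a check.
def compliant?(name, href)
  CHECKS.fetch(name)[:expect].match?(href.to_s)
end

puts describe_check('events')
puts compliant?('license', 'https://www.apache.org/licenses/')
```

With something like this, a check that has no description still tells the
user something, and descriptions can be added one check at a time.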

- Sam Ruby

On Wed, Apr 11, 2018 at 9:46 AM, Shane Curcuru <a...@shanecurcuru.org> wrote:
> I'd like to simplify some of the site-scan.rb/site.cgi processing by
> centralizing some of the core things that the scripts are searching for
> into site-scan.rb.  While I appreciate the original design motivation,
> we currently have duplicate regexes - and we have more people interested
> in using the results of the site scan (esp. with events) and officers
> potentially requesting changes to the requirements.
>
> Roughly, I'd like to move most of CHECKS into site-scan.rb for
> simplicity and use those to implement most of the link scans.  Some of
> the scans still have more logic (which would still be custom), but some
> of them can be mechanical.
>
> CHECKS = {
>   'events'      =>
>     [
>       '',
>       # a_text regex to scan for - for events, we don't care, so blank
>       '/apache.org/events',
>       # a_href minimal regex to capture - for events, this tells us what
>       # link to capture from the page
>       %r{^https?://.*apache.org/events/current-event}
>       # a_href full regex to expect for compliance (used in site.cgi)
>     ],
>
>   'license'      =>
>     [
>       '/licenses?/',
>       # a_text regex to scan for - for license, this is required
>       'apache.org',
>       # a_href minimal regex to capture - for license, we only capture
>       # the link if it points to apache.org
>       %r{^https?://.*apache.org/licenses/$}
>       # a_href full regex to expect for compliance; it must point to one
>       # of our actual licenses to pass
>     ],
> ...etc.
> }
>
> Any overall objections?  It's making me twitchy seeing most of the
> regexes we use for scanning in separate places.
>
> --
>
> - Shane
>   Director & Member
>   The Apache Software Foundation
