sebb wrote on 5/9/17 6:08 AM:
> The site scanner currently looks for specific links *or* specific text.
>
> This does not always work well, e.g. httpd uses 'Sponsors' for the
> 'Thanks' link, so it appears to have no link rather than one with an
> 'incorrect' name.
>
> I think it would be better to search for both the expected text and
> the expected link, and record any matches for either.
Agreed.
Note that the analysis step is unlikely ever to be 100% accurate,
since the current policy is written with the intent in mind, not a
specific formula. But you're right: looking for, and storing scan
data for, both links and text is a great way to improve results.
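A rough sketch of that idea in Ruby, assuming the scan has already
reduced a page to a list of anchors. The scan_check name, the anchor
hash shape, and the sample data are made up for illustration; this is
not the scanner's actual code:

```ruby
# Hypothetical helper: check one set of anchors for BOTH the expected
# text and the expected link, recording whatever matches either one.
# anchors is an array of { text: ..., href: ... } hashes (assumed shape).
def scan_check(anchors, expected_text, expected_link)
  # Anchors whose visible text equals the expected text (case-insensitive).
  text_hits = anchors.select { |a| a[:text].to_s.strip.casecmp?(expected_text) }
  # Anchors whose target matches the expected link pattern.
  link_hits = anchors.select { |a| a[:href].to_s.match?(expected_link) }
  {
    text: { expected: expected_text, found: text_hits.map { |a| a[:href] } },
    link: { expected: expected_link.source, found: link_hits.map { |a| a[:text] } }
  }
end

# Made-up anchors resembling the httpd case: right link, different text.
anchors = [
  { text: 'Sponsors', href: 'https://www.apache.org/foundation/thanks.html' },
  { text: 'License',  href: 'https://www.apache.org/licenses/' }
]
result = scan_check(anchors, 'Thanks', %r{apache\.org/foundation/thanks})
```

With data like this, the text search finds nothing but the link search
records 'Sponsors', so the analysis can tell "link present under a
different name" apart from "no link at all".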
Separately, I do think an "approved exceptions" list is, in some cases,
an easier way to improve results than funkier regexes or the like. See
the concept in "Re: Rename site-check.rb => site-scan.rb?", but
improved to match your additions here:
site-exceptions.json
{
  "axis": {
    "trademarks": { "allowed_string": "Trademark Registered of The ASF" },
    "events": { "allowed_url": "http://www.apache.org/special-event" }
  },
  ...
}
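One way the analysis might consult such a list, sketched in Ruby. This
assumes the file is valid JSON keyed by project and then by check, with
"allowed_string" / "allowed_url" entries as above; the
exception_applies? helper and its arguments are hypothetical:

```ruby
require 'json'

# Sample data in the shape of site-exceptions.json, inlined so the
# sketch runs on its own; a real script would read the file instead.
exceptions = JSON.parse(<<~EOS)
  {
    "axis": {
      "trademarks": { "allowed_string": "Trademark Registered of The ASF" },
      "events": { "allowed_url": "http://www.apache.org/special-event" }
    }
  }
EOS

# Hypothetical helper: true if an approved exception covers this check.
# page_text is the page's visible text; hrefs is its list of link targets.
def exception_applies?(exceptions, project, check, page_text, hrefs)
  rule = exceptions.dig(project, check)
  return false unless rule
  str = rule['allowed_string']
  url = rule['allowed_url']
  (!str.nil? && page_text.include?(str)) || (!url.nil? && hrefs.include?(url))
end

ok = exception_applies?(exceptions, 'axis', 'events', '',
                        ['http://www.apache.org/special-event'])
```

The analysis would then only fall back to this lookup when the normal
text and link checks both fail, so the exceptions list stays short.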
>
> Probably the search targets should also be recorded in the analysis output.
> This should make it easier for the analysis to report what was expected.
>
> for example:
>
> httpd: {
>   ...
>   sponsorship: {
>     text: {
>       expected: "Thanks",
>       found: ["http://.../"]
>     },
>     link: {
>       expected: "http://...",
>       found: ["Sponsors"]
>     },
>   }
> }
>
>
> Obviously this would mean changes to the analysis as well.
>
> Thoughts?
>
--
- Shane
https://www.apache.org/foundation/marks/resources