On Mon, Apr 24, 2017 at 10:34 AM, sebb <[email protected]> wrote:
> The site-check code currently looks at the link text when searching
> for required links.
>
> Maybe it would make more sense to look for the target URL?
> That should not vary much, if at all, so it should be easier to find.

If this turns out to be a real problem, both could be extracted.

> Either way, whatever analyses the output probably needs to check that
> the values are sensible.
> A License link that points to www.apache.org is not much use, nor is a
> link to http://www.apache.org/foundation/thanks.html that says
> "Security"

My thoughts were to split the data gathering and analysis steps.
That's why when I matched on the text, I provided the link.  And when
I match on the link, I try to gather the text (or img[src]).
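
A rough sketch of that split, for illustration only. This is not the
actual site-check code: the Python below, the names gather/analyse,
and the REQUIRED table (keywords and URL fragments) are all assumptions
made up for the example. The gathering step records every link with its
href, text, and any img src; the analysis step then decides whether the
text and the target agree.

```python
from html.parser import HTMLParser


class LinkGatherer(HTMLParser):
    """Gathering step: record href, text and img src for every <a href>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._current = {"href": attrs["href"], "text": "", "img": None}
        elif tag == "img" and self._current is not None:
            # Image links may have no text, so keep the image source too.
            self._current["img"] = attrs.get("src")

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the link text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def gather(html):
    """Return a list of {'href', 'text', 'img'} dicts; no judgement made."""
    parser = LinkGatherer()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical rules: name -> (word expected in text, fragment expected in URL).
REQUIRED = {
    "license": ("license", "apache.org/licenses"),
    "thanks": ("thanks", "apache.org/foundation/thanks"),
    "security": ("security", "apache.org/security"),
}


def analyse(links):
    """Analysis step: report 'ok', 'mismatch' or 'missing' for each rule."""
    report = {}
    for name, (word, url_part) in REQUIRED.items():
        by_text = [link for link in links if word in link["text"].lower()]
        by_url = [link for link in links if url_part in link["href"]]
        if any(link in by_url for link in by_text):
            report[name] = "ok"        # some link matches on both counts
        elif by_text or by_url:
            report[name] = "mismatch"  # text and target disagree
        else:
            report[name] = "missing"
    return report
```

Because gather() keeps both halves of each link, analyse() can flag a
"License" link that only points at www.apache.org, or a thanks.html
link labelled "Security", without fetching the page again.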

- Sam Ruby
