On Mon, Apr 24, 2017 at 10:34 AM, sebb <[email protected]> wrote:
> The site-check code currently looks at the link text when searching
> for required links.
>
> Maybe it would make more sense to look for the target URL?
> That should not vary much, if at all, so it should be easier to find.

If this turns out to be a real problem, both could be extracted.

> Either way, whatever analyses the output probably needs to check that
> the values are sensible.
> A License link that points to www.apache.org is not much use, nor is a
> link to http://www.apache.org/foundation/thanks.html that says
> "Security"

My thoughts were to split the data gathering and analysis steps.
That's why when I matched on the text, I provided the link. And when I
match on the link, I try to gather the text (or img[src]).

- Sam Ruby
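
For illustration only, here is a minimal sketch of what splitting gathering from analysis could look like. It is not the actual site-check code: the names LinkGatherer, gather, analyse and the EXPECTED table of label/path pairs are all hypothetical, and it uses only the Python standard library. The gathering step records both the target and the text (falling back to img[src]) without judging them; the analysis step then checks whether the two agree.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkGatherer(HTMLParser):
    """Gathering step: record href and text for every link, no judgement."""

    def __init__(self):
        super().__init__()
        self.links = []        # list of {"href": ..., "text": ...}
        self._current = None   # link being read, if we are inside an <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._current = {"href": attrs["href"], "text": "", "img": None}
        elif tag == "img" and self._current is not None:
            self._current["img"] = attrs.get("src")

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            link = self._current
            text = " ".join(link["text"].split())  # collapse whitespace
            # Fall back to the image source when the link has no text.
            self.links.append({"href": link["href"], "text": text or link["img"]})
            self._current = None


def gather(html):
    """Return every link found in the page as href/text pairs."""
    parser = LinkGatherer()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical expectations: link label -> path prefix of a sensible target.
EXPECTED = {
    "License": "/licenses",
    "Security": "/security",
    "Thanks": "/foundation/thanks",
}


def analyse(links):
    """Analysis step: report labels and targets that do not agree."""
    problems = []
    for label, prefix in EXPECTED.items():
        by_text = [l for l in links if (l["text"] or "").lower() == label.lower()]
        by_href = [l for l in links if urlparse(l["href"]).path.startswith(prefix)]
        if not by_text and not by_href:
            problems.append(f"{label}: missing")
        for l in by_text:
            if l not in by_href:
                problems.append(f"{label}: text points at {l['href']}")
        for l in by_href:
            if l not in by_text:
                problems.append(f"{label}: {l['href']} is labelled {l['text']!r}")
    return problems
```

Because the gathered data keeps both halves of each link, the two cases sebb mentions (a License link to www.apache.org, a thanks.html link labelled "Security") can be flagged by the analysis without re-reading the page.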
