Can someone point me in the direction of a good, robust broken link scanner 
other than Xenu, which is not quite powerful or adaptable enough for my needs?  
We are trying to get more serious about our content strategy at my library, and 
link maintenance in various parts of our site is abysmal.  Here's my dream app...

A web app that collects, from a non-technical library staff user, a base URL 
path under which to crawl and scan links.  The user creates the object, which 
includes a descriptive title, their email address, and some hidden metadata 
such as creation date.  The app crawls the given URL and any children, ignoring 
site URLs not under that path, and returns a report (web, PDF, CSV, whatever) 
of page title/page URL/broken link text/broken link URL/error code.

Further, the app is hooked into cron and runs a new report based on the 
existing criteria every X days.  On day X, the user gets an email with the 
updated report.  At login, the user has a sortable table view of all of their 
objects, and each object keeps a record of its reports.  Stats on how many 
links per section, and on how often they break (tracked over time), would be 
nice but not a deal killer.

From the admin side of things, we would need to be able to configure global 
error codes to include/exclude, internal URLs to exclude, timeout lengths, 
crawl depths, and websites to treat specially since they may not play well with 
the crawler, proxy, whatever.  Finally, it might be nice to override these and 
other settings at the local object level as well (e.g. set a shorter or longer 
day cycle, set a maximum depth to crawl, etc.).
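Just to make the core crawl-and-report piece concrete, here's a rough sketch of 
what I mean, in Python with requests and BeautifulSoup (my choice of libraries 
for illustration only; every name, constant, and URL below is made up, not an 
existing tool):

#!/usr/bin/env python
# Rough sketch only -- every name, constant, and URL here is hypothetical.
import csv
from urllib.parse import urljoin, urldefrag

import requests
from bs4 import BeautifulSoup

BASE_PATH = "https://library.example.edu/research-guides/"  # hypothetical section
MAX_DEPTH = 3          # admin-configurable crawl depth
TIMEOUT = 10           # seconds per request
IGNORED_CODES = {403}  # status codes an admin chose to exclude from reports


def check_section(base_path):
    """Crawl pages under base_path, check every link, report the broken ones."""
    seen, queue, rows = set(), [(base_path, 0)], []
    while queue:
        page_url, depth = queue.pop(0)
        if page_url in seen or depth > MAX_DEPTH:
            continue
        seen.add(page_url)
        try:
            resp = requests.get(page_url, timeout=TIMEOUT)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.title.string.strip() if soup.title and soup.title.string else page_url
        for a in soup.find_all("a", href=True):
            link = urldefrag(urljoin(page_url, a["href"]))[0]
            try:
                # A real tool would fall back to GET for servers that reject HEAD.
                status = requests.head(link, timeout=TIMEOUT,
                                       allow_redirects=True).status_code
            except requests.RequestException:
                status = 0  # connection error or timeout
            if status == 0 or (status >= 400 and status not in IGNORED_CODES):
                rows.append([title, page_url, a.get_text(strip=True), link, status])
            if link.startswith(base_path):
                queue.append((link, depth + 1))  # child page under the path: keep crawling
    return rows


if __name__ == "__main__":
    with open("broken_links.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["page title", "page url", "link text", "link url", "status"])
        writer.writerows(check_section(BASE_PATH))

Hook something like that into cron every X days and mail out the CSV, and that 
covers the scheduling piece; the admin settings are basically just which of 
those constants get set globally versus overridden per object.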

It seems like something of this sort should exist, but I'm not finding exactly 
what I want.  The closest right now is LinkTiger, but I don't want to set 
librarians loose on the whole site, just their targeted areas.

Thoughts?

W
