Hello,

I'm currently working on a new design for the broken link checker. After having a first pass with my team, I will send it out to dev@ to get broader feedback from the community. I expect this to happen on Tuesday.

Best regards,
Marco

On Fri, Aug 3, 2018 at 4:32 PM sandeep krishnamurthy <[email protected]> wrote:

> Thank you Aaron for starting this very important thread on how we deal with
> website broken links that is actually the first thing every user sees!
>
> 1. Custom landing page for 404s and broken links => This is very critical
> and best practice. Definitely we should seek infra team help to set this
> up.
>
> 2. Access to server logs => I think yes, this will be super useful
> information to identify issues in the site and take corrective actions.
> However, I am not sure how this would work out from a security standpoint.
> Assuming we get logs in S3, who should have access etc. needs to be
> discussed and resolved.
>
> 3. Removing regression test - I agree. After analyzing the broken links
> report, I see that the regression check reports links that existed some
> time ago but no longer do, which is a mix of important and
> no-longer-important links.
>
> @Community: I am pasting the link to a recent broken link checker CI job
> for a quick look at the current broken link reports:
>
> http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/job/Broken_Link_Checker_Pipeline/168/console
>
> Best,
> Sandeep
>
> On Thu, Aug 2, 2018 at 4:44 PM Aaron Markham <[email protected]> wrote:
>
> > Pedro, that would be cool.
> >
> > On a related note, I've reviewed the results from the broken link checker
> > job and submitted a PR for 3 links out of >50 reports.
> > 35 of these are in a "regression" category. The others are mostly
> > redirects, and the tool will get updated to deal with those (not report
> > temp moves, warn about perm moves).
> >
> > Regarding the regression category, I propose that we remove this check in
> > favor of getting the server logs and analyzing issues from that end.
> >
> > The 35 "broken links" are pages that existed at one time, but do not now.
> > There's no data on whether any site visitor even tried to hit these old
> > pages. Yet, this regression info comes in a report mixed with actual
> > broken links and makes it seem like there's a problem when, well, maybe
> > there's not. If we had data to say, 500 users a week go to this link and
> > we give them a 404. Ok, that's a problem. But this regression check
> > doesn't do that. It just says "a link existed one day, and it isn't there
> > now, and I (the link checker) might be the only one in the world that
> > cares." Also, this regression check has no idea if users are going to a
> > page that doesn't exist due to a bad link in a blog or other outside
> > resource. If we get the server logs, we'll know what pages users are
> > going to that aren't there. We can make an informed decision for fixing
> > it by redirecting it or by putting something there and how to prioritize.
> > Also, these 35 links just say they're missing. Someone would have to
> > investigate or just guess where to redirect them, and maybe all that work
> > doesn't even yield any benefit.
> >
> > To summarize (remove the regression check, and...):
> > * check server logs on what pages users are getting errors on, and make
> >   plans to fix those
> > * show users better error pages
> > * report broken links only when there's a live link on the site that
> >   yields a true 404 (not a 301 or 303 or anything else)
> >
> > Cheers,
> > Aaron
> >
> > On Thu, Aug 2, 2018 at 3:17 PM, Pedro Larroy <[email protected]> wrote:
> >
> > > Yes, we can do something fun for these errors, something maybe with a
> > > cat and a DL theme, or some funny style transfer stuff.
> > >
> > > Pedro
> > >
> > > On Thu, Aug 2, 2018 at 11:54 PM Aaron Markham <[email protected]> wrote:
> > >
> > > > Hi everyone,
> > > > I would like to suggest that we adopt a custom landing page for
> > > > missing pages, rather than the dead-end error the website has now.
> > > > Typically you set this up by modifying the server config and pointing
> > > > it to a custom page.
> > > >
> > > > I'd like to know how we go about requesting that kind of change on
> > > > the Apache infra.
> > > >
> > > > Also, so that we know if people are getting 404s on certain pages, or
> > > > if pages are breaking and throwing 500 errors, we could look at the
> > > > server logs for the domain. A cron job that pushes them to S3 would
> > > > work if we can't get direct access.
> > > >
> > > > How do we get regular access to or a feed of these logs that are
> > > > residing on the web server?
> > > >
> > > > Cheers,
> > > > Aaron
>
> --
> Sandeep Krishnamurthy
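
For illustration only, the log-based check proposed in the thread (finding which pages actually return 404s or 5xx errors to visitors) could be sketched roughly as below. This is not the project's tooling. It assumes the web server writes access logs in the Common/Combined Log Format, and the function name `count_errors`, the regex, and the sample log lines (with documentation-range IP addresses and invented paths) are all hypothetical.

```python
import re
from collections import Counter

# Assumption: logs use the Common/Combined Log Format, where the quoted
# request ("GET /path HTTP/1.1") is followed by the 3-digit status code.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\s'
)


def count_errors(lines):
    """Tally requested paths that returned 404 or a 5xx status.

    Returns a pair of Counters: (not_found, server_errors).
    """
    not_found = Counter()
    server_errors = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        status = int(match.group("status"))
        path = match.group("path")
        if status == 404:
            not_found[path] += 1
        elif 500 <= status <= 599:
            server_errors[path] += 1
    return not_found, server_errors


# Made-up sample lines, not real traffic from the site.
sample = [
    '203.0.113.5 - - [02/Aug/2018:15:17:00 +0000] "GET /old/page.html HTTP/1.1" 404 512',
    '203.0.113.9 - - [02/Aug/2018:15:18:10 +0000] "GET /old/page.html HTTP/1.1" 404 512',
    '203.0.113.7 - - [02/Aug/2018:15:19:42 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '203.0.113.7 - - [02/Aug/2018:15:20:03 +0000] "GET /api/broken HTTP/1.1" 500 0',
]

not_found, server_errors = count_errors(sample)

# Paths ordered by how often visitors hit them, most frequent first.
for path, hits in not_found.most_common():
    print(f"404 x{hits}: {path}")
for path, hits in server_errors.most_common():
    print(f"5xx x{hits}: {path}")
```

Sorting by hit count is what makes the "500 users a week go to this link" kind of prioritization possible: pages that nobody requests simply never appear in the tally.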
