Hello,

I'm currently working on a new design for the broken link checker. After a
first review pass with my team, I will send it to dev@ to get broader
feedback from the community. I expect this to happen on Tuesday.

Best regards,
Marco

On Fri, Aug 3, 2018 at 4:32 PM sandeep krishnamurthy <[email protected]> wrote:

> Thank you, Aaron, for starting this very important thread on how we deal
> with broken links on the website, which is the first thing every user sees!
>
> 1. Custom landing page for 404s and broken links => This is critical and
> a best practice. We should definitely ask the infra team for help setting
> this up.
>
> 2. Access to server logs => I think yes, this would be very useful
> information for identifying issues on the site and taking corrective
> action. However, I am not sure how this would work from a security
> standpoint. Assuming we get the logs in S3, who should have access, etc.,
> needs to be discussed and resolved.
>
> 3. Removing the regression test => I agree. After analyzing the broken link
> report, the links flagged by the regression check (pages that existed some
> time ago but no longer do) are a mix of links that still matter and links
> that no longer do.
>
> @Community: Here is a link to a recent run of the broken link checker CI
> job, so you can take a quick look at the current broken link reports:
>
> http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/job/Broken_Link_Checker_Pipeline/168/console
>
>
> Best,
> Sandeep
>
> On Thu, Aug 2, 2018 at 4:44 PM Aaron Markham <[email protected]> wrote:
>
> > Pedro, that would be cool.
> >
> > On a related note, I've reviewed the results from the broken link checker
> > job and submitted a PR fixing 3 links out of more than 50 reports.
> > 35 of these are in a "regression" category. The others are mostly
> > redirects, and the tool will be updated to handle those (temporary moves
> > will not be reported, and permanent moves will produce a warning).
> >
> > Regarding the regression category, I propose that we remove this check in
> > favor of getting the server logs and analyzing issues from that end.
> >
> > The 35 "broken links" are pages that existed at one time but do not now.
> > There's no data on whether any site visitor has even tried to hit these
> > old pages. Yet this regression info comes in a report mixed with actual
> > broken links and makes it seem like there's a problem when, well, maybe
> > there's not. If we had data saying that 500 users a week go to this link
> > and we give them a 404, OK, that's a problem. But this regression check
> > doesn't do that. It just says "a link existed one day, and it isn't there
> > now, and I (the link checker) might be the only one in the world that
> > cares." Also, this regression check has no idea whether users are going
> > to a page that doesn't exist because of a bad link in a blog or another
> > outside resource. If we get the server logs, we'll know which missing
> > pages users are trying to reach. We can then make an informed decision
> > about whether to fix each one by redirecting it or by putting something
> > there, and about how to prioritize. Also, these 35 links only say they're
> > missing. Someone would have to investigate, or just guess, where to
> > redirect them, and all that work might not yield any benefit.
> >
> > To summarize (remove the regression check, and...):
> > * check the server logs for pages where users are getting errors, and
> > make plans to fix those
> > * show users better error pages
> > * report broken links only when there's a live link on the site that
> > yields a true 404 (not a 301, a 303, or anything else)
> >
> > Cheers,
> > Aaron
> >
> >
> > On Thu, Aug 2, 2018 at 3:17 PM, Pedro Larroy <[email protected]>
> > wrote:
> >
> > > Yes, we can do something fun for these errors, maybe something with a
> > > cat and a DL theme, or some funny style transfer stuff.
> > >
> > > Pedro
> > >
> > > On Thu, Aug 2, 2018 at 11:54 PM Aaron Markham <[email protected]>
> > > wrote:
> > >
> > > > Hi everyone,
> > > > I would like to suggest that we adopt a custom landing page for missing
> > > > pages, rather than the dead-end error page the website has now.
> > > > Typically, you set this up by modifying the server config to point to a
> > > > custom page.
> > > >
> > > > I'd like to know how we go about requesting that kind of change on the
> > > > Apache infra.
> > > >
> > > > Also, so that we know whether people are getting 404s on certain pages,
> > > > or whether pages are breaking and throwing 500 errors, we could look at
> > > > the server logs for the domain. A cron job that pushes them to S3 would
> > > > work if we can't get direct access.
> > > >
> > > > How do we get regular access to, or a feed of, these logs that reside
> > > > on the web server?
> > > >
> > > > Cheers,
> > > > Aaron
> > > >
> > >
> >
>
>
> --
> Sandeep Krishnamurthy
>
