I overwrote the file at the result link. Much better: no more 429s.

https://12.am/tmp/cassandra.apache.org_muffet.log.txt

- lots of page anchor problems
- quite a few busted links
- quite a few hosts that are gone
- one link timeout
- (a few "error" reports are actually 200s, just headers)

$ egrep '^\sid #' c*.log.txt |wc -l
1416
$ egrep '^\s4' c*.log.txt |wc -l
55
$ egrep '^\slookup' c*.log.txt |wc -l
20
$ egrep '^\stimeout' c*.log.txt |wc -l
1
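The four egrep counts above can also be gathered in a single pass. This is a minimal sketch, assuming GNU or BSD grep and a log laid out the way the patterns above imply (each error line begins with one whitespace character); the function name `tally_muffet` is hypothetical, not part of muffet:

```shell
#!/bin/sh
# tally_muffet (hypothetical helper): print one "label<TAB>count" line per
# error category, reusing the patterns from the egrep pipelines above.
# Assumption: each error line in the muffet log starts with one whitespace char.
tally_muffet() {
  log="$1"
  for label in 'id #' '4' 'lookup' 'timeout'; do
    # grep -c prints the number of matching lines (0 if none);
    # [[:space:]] is the POSIX spelling of GNU grep's \s.
    count=$(grep -cE "^[[:space:]]${label}" "$log")
    printf '%s\t%s\n' "$label" "$count"
  done
}
```

Running `tally_muffet cassandra.apache.org_muffet.log.txt` should print the same four numbers as the separate pipelines, one category per line, which makes it easier to compare runs before and after fixes.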

Warm regards,
Michael

On 11/6/21 11:59 AM, Michael Shuler wrote:
FYI: I'm going to try slowing down the checks, since I just noticed that many of the 4xx errors are "HTTP 429 Too Many Requests".

Kind regards,
Michael

On 11/6/21 11:52 AM, Michael Shuler wrote:
(Sending to dev@ which seems a better place to discuss; updated subject. Thanks OP!)

I ran a couple of link-checking tools on the site, and there are many more problems than the two noted. This seems like a good way for a non-developer to make a substantial impact on the project. Muffet [0] seemed the quickest way to get some decent output. I grabbed the v2.4.4 binary release [1]; tar xzvf .., and:

$ ./muffet https://cassandra.apache.org/ \
  | tee -a cassandra.apache.org_muffet.log.txt

result (2950 lines):
https://12.am/tmp/cassandra.apache.org_muffet.log.txt

$ egrep '^\s4' cassandra.apache.org_muffet.log.txt \
  | wc -l
841
$ egrep '^\sid #' cassandra.apache.org_muffet.log.txt \
  | wc -l
1401

[0] https://github.com/raviqqe/muffet
[1] https://github.com/raviqqe/muffet/releases

Kind regards,
Michael

On 11/5/21 4:09 PM, Greg Stein wrote:
see below:

---------- Forwarded message ---------
From: *Hubert Kulas* <hubertzku...@gmail.com>
Date: Fri, Nov 5, 2021 at 1:29 PM
Subject: Not working links
To: <webmas...@apache.org>


Hi,

I am writing my thesis about big data, and I was doing some research about real-world use cases of Cassandra. While doing that, I found that clicking "read more" under 'Coursera' leads to the DataStax website, where we are greeted with a "You do not have access to view this page" message. To reproduce it, go to https://cassandra.apache.org/_/case-studies.html, find Coursera, and click "read more".

Then, while trying to find a way to contact you about the problem, I encountered another problem on this page of the website: https://cassandra.apache.org/doc/3.11.5/contactus.html. Clicking the icon there leads to https://cassandra.apache.org/feed.xml, which returns a 404 Not Found message.
[Attached screenshot: 2021-11-05_19h26_44.png]

Best Regards,
Hubert Kulas
