I overwrote the log at the result link: much better, no more 429s.
https://12.am/tmp/cassandra.apache.org_muffet.log.txt
- lots of page anchor problems
- quite a few busted links
- quite a few hosts that are gone
- one link timeout
- (a few "error" reports are actually 200s, just headers)
$ egrep '^\sid #' c*.log.txt |wc -l
1416
$ egrep '^\s4' c*.log.txt |wc -l
55
$ egrep '^\slookup' c*.log.txt |wc -l
20
$ egrep '^\stimeout' c*.log.txt |wc -l
1
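The four counts above can also be collected in a single pass over the log. This is a minimal sketch, not part of muffet. It assumes, as the egrep patterns do, that each error line starts with one whitespace character. The function name tally_muffet_log is hypothetical, and [ \t] only approximates egrep's \s:

```shell
# Hypothetical helper: tally the categories counted by the egrep commands
# above in one pass. Assumes each error line starts with a single space or
# tab (an approximation of egrep's '^\s').
tally_muffet_log() {
  awk '
    /^[ \t]id #/    { anchors++ }   # missing page anchors
    /^[ \t]4/       { http4xx++ }   # 4xx HTTP status codes
    /^[ \t]lookup/  { lookups++ }   # hosts that no longer resolve
    /^[ \t]timeout/ { timeouts++ }  # link timeouts
    END {
      printf "anchors %d\n4xx %d\nlookup %d\ntimeout %d\n",
        anchors, http4xx, lookups, timeouts
    }
  ' "$@"
}
```

For example, `tally_muffet_log c*.log.txt` reads every matching file in turn and prints combined totals, just as the egrep | wc -l pipelines do.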
Warm regards,
Michael
On 11/6/21 11:59 AM, Michael Shuler wrote:
FYI: I'm going to try to slow down the checks, since I just noticed that
a bunch of the 4xx errors are "HTTP 429 Too Many Requests".
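A per-status-code breakdown can separate rate-limited 429s from genuinely broken links. This is a minimal sketch. It assumes each error line starts with one whitespace character followed by the three-digit status code, which is the same shape that '^\s4' matches. The name status_breakdown is hypothetical:

```shell
# Hypothetical helper: count 4xx error lines per status code, so that
# "429 Too Many Requests" can be told apart from genuinely broken links.
# Assumes each error line is one space or tab followed by the code.
status_breakdown() {
  awk '/^[ \t]4[0-9][0-9]/ { codes[substr($0, 2, 3)]++ }
       END { for (c in codes) print c, codes[c] }' "$@" | sort
}
```

For example, `status_breakdown cassandra.apache.org_muffet.log.txt` prints one "code count" line per status code, sorted by code.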
Kind regards,
Michael
On 11/6/21 11:52 AM, Michael Shuler wrote:
(Sending to dev@, which seems a better place to discuss; updated
subject. Thanks, OP!)
I ran a couple of link-checking tools on the site, and there are many more
problems than the two noted. This seems like a good task for a
non-dev to make a substantial impact on the project. Muffet [0] seemed the
quickest way to get some decent output. I grabbed the v2.4.4 binary
release [1], ran tar xzvf .., and then:
$ ./muffet https://cassandra.apache.org/ \
| tee -a cassandra.apache.org_muffet.log.txt
result (2950 lines):
https://12.am/tmp/cassandra.apache.org_muffet.log.txt
$ egrep '^\s4' cassandra.apache.org_muffet.log.txt \
| wc -l
841
$ egrep '^\sid #' cassandra.apache.org_muffet.log.txt \
| wc -l
1401
[0] https://github.com/raviqqe/muffet
[1] https://github.com/raviqqe/muffet/releases
Kind regards,
Michael
On 11/5/21 4:09 PM, Greg Stein wrote:
see below:
---------- Forwarded message ---------
From: Hubert Kulas <hubertzku...@gmail.com>
Date: Fri, Nov 5, 2021 at 1:29 PM
Subject: Not working links
To: <webmas...@apache.org>
Hi,
I am writing my thesis about big data and was doing some research
about real-world use cases of Cassandra. While doing that, I found
that clicking "read more" under "Coursera" leads to the
DataStax website, where we are greeted with a "You do not have access to
view this page" message. To reproduce it, go to
https://cassandra.apache.org/_/case-studies.html, find
Coursera, and click "read more". Then, while trying to find a way to
contact you about the problem, I encountered another problem on
this part of the website:
https://cassandra.apache.org/doc/3.11.5/contactus.html
Clicking the icon leads to
https://cassandra.apache.org/feed.xml, which returns a 404 Not
Found message.
[Attached screenshot: 2021-11-05_19h26_44.png]
Best Regards,
Hubert Kulas
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org