On Fri, Dec 27, 2019 at 10:49 PM Guilherme Janczak
<janczak.guilhe...@gmail.com> wrote:
>
> On Thu, 26 Dec 2019 16:13:33 +0000
> "goleo ." <goleo...@gmail.com> wrote:
>
> > I was wondering how much space distfiles on "ftp" take, so because
> > I couldn't see that in my web browser clearly, I downloaded the page
> > https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt
>
> With wget, you can download the HTML of a web page, and also recurse
> into links within it.
>
> $ wget -r -l 0 -A '*.html' --no-parent -O everything.html \
>     https://ftp.openbsd.org/pub/OpenBSD/distfiles/
>
> This command follows links to unlimited depth (-l 0) without going up
> in the hierarchy into the parent directory (--no-parent), keeps only
> .html files (from which more links can be acquired), and writes
> everything into a single "everything.html" file.
>
> After a few minutes running and just ~1.7MiB of HTML downloaded, it
> tried to recurse into a lot of non-existing directories, so I cut it
> short there. The figure may not be perfect.
>
> $ grep -E '[0-9]$' everything.html | sed 's|.* \([0-9]*\)$|\1|' |
>     awk '{sum+=$1} END{print sum / 1024 / 1024}'
> 65629
>
>
> The sum of all file sizes, which are listed in kibibytes, divided by
> 1024^2 to turn it into gibibytes, comes to 65629 gibibytes, or about
> 64 tebibytes.
> This number seems a little absurd; I'm not sure if I made a mistake.
> It does not seem completely implausible either, however: the tree
> does have files dating all the way back to 1990.
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ja-fonts/

File sizes are listed in plain bytes, so dividing by 1024^2 gives
mebibytes: your calculation shows 65629 MiB, or about 64 GiB.
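
For what it's worth, here is a minimal sketch of the same sum with the
units spelled out. It assumes the last field of each listing line is the
size in bytes; the helper name sum_sizes and the two sample lines are
made up for illustration and are not taken from the real listing.

```sh
#!/bin/sh
# sum_sizes (hypothetical helper): reads listing lines on stdin and adds up
# the last field of every line whose last field is purely numeric.
# Assumption: that field is a file size in bytes.
sum_sizes() {
    awk '$NF ~ /^[0-9]+$/ { sum += $NF }
         END {
             mib = sum / 1024 / 1024   # bytes -> mebibytes
             gib = mib / 1024          # mebibytes -> gibibytes
             printf "%.0f bytes = %.1f MiB = %.2f GiB\n", sum, mib, gib
         }'
}

# Made-up sample input: one 1 GiB entry and one 0.5 GiB entry.
printf '%s\n' \
    '<a href="a.tgz">a.tgz</a>  01-Jan-2019 00:00  1073741824' \
    '<a href="b.tgz">b.tgz</a>  01-Jan-2019 00:00  536870912' | sum_sizes
# prints: 1610612736 bytes = 1536.0 MiB = 1.50 GiB
```

Against the real download it would be run as "sum_sizes < everything.html".
Lines that do not end in a number, such as directory entries, are skipped.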

Still nice, I didn't know it was so easy to fetch the contents of
subdirectories :)
