Hello,

I’m trying to archive a website that has the following URL structure:

/

The index, with a link to /blog

/blog

A list of all blog posts

/blog/great-drinks.html

Blog post about drinks

/blog/great-food.html

Blog post about food

I’m using the following command to try to archive it in a form I can host
statically:

wget -mpckr --user-agent="" -e robots=off --wait 1 --random-wait --max-redirect=1 localhost:8000

Given reports such as
https://lists.gnu.org/archive/html/bug-wget/2016-09/msg00088.html, I was
expecting the following file structure:

/index.html

/blog/index.html

/blog/great-drinks.html

/blog/great-food.html

Instead, I got:

/index.html

/blog/great-drinks.html

/blog/great-food.html

I’m hoping I’ve just passed the wrong flags, but I can’t see how to fix
this.
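To make the difference between the two listings easy to check after each run, here is a minimal Python sketch (a hypothetical helper, not part of wget) that compares a mirror directory against the expected layout above. It assumes the mirror root is the `localhost:8000` directory that `wget -m` creates, as the "Saving to:" lines in the log below show.

```python
from pathlib import Path

# Hypothetical expected layout, taken from the "expected" listing above.
EXPECTED = {
    "index.html",
    "blog/index.html",
    "blog/great-drinks.html",
    "blog/great-food.html",
}


def mirror_files(root: Path) -> set[str]:
    """Return every regular file under root as a POSIX-style relative path."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare(root: Path) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) files relative to EXPECTED."""
    found = mirror_files(root)
    return EXPECTED - found, found - EXPECTED


if __name__ == "__main__":
    # wget -m saves http://localhost:8000/ into a directory named "localhost:8000".
    root = Path("localhost:8000")
    if root.is_dir():
        missing, unexpected = compare(root)
        print("missing:", sorted(missing))
        print("unexpected:", sorted(unexpected))
    else:
        print(f"{root} does not exist; run the wget command first")
```

Against the layout I actually got, `missing` would contain only `blog/index.html` and `unexpected` would be empty.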

Below is some debug and reproduction information, which I hope helps. This
happens every time for me, with both of the Wget versions listed below.


The log shows that /blog was downloaded, but the saved file was then
removed when the blog directory was created:

wget -d -mpckr --user-agent="" -e robots=off --wait 1 --random-wait --max-redirect=1 localhost:8000

Setting --mirror (mirror) to 1

Setting --page-requisites (pagerequisites) to 1

Setting --continue (continue) to 1

Setting --convert-links (convertlinks) to 1

Setting --recursive (recursive) to 1

Setting --user-agent (useragent) to

Setting robots (robots) to off

Setting --wait (wait) to 1

Setting --random-wait (randomwait) to 1

Setting --max-redirect (maxredirect) to 1

DEBUG output created by Wget 1.25.0 on darwin24.1.0.

Reading HSTS entries from /Users/richard/.wget-hsts

Prepended http:// to 'localhost:8000'

URI encoding = ‘UTF-8’

URI encoding = ‘UTF-8’

Enqueuing http://localhost:8000/ at depth 0

Queue count 1, maxcount 1.

[IRI Enqueuing ‘http://localhost:8000/’ with ‘UTF-8’

Dequeuing http://localhost:8000/ at depth 0

Queue count 0, maxcount 1.

Converted file name 'localhost:8000/index.html' (UTF-8) -> 'localhost:8000/index.html' (UTF-8)

--2025-05-23 11:42:20--  http://localhost:8000/

Resolving localhost (localhost)... ::1, 127.0.0.1

Caching localhost => ::1 127.0.0.1

Connecting to localhost (localhost)|::1|:8000... connected.

Created socket 5.

Releasing 0x000000012a704140 (new refcount 1).

---request begin---

GET / HTTP/1.1

Host: localhost:8000

Accept: */*

Accept-Encoding: identity

Connection: Keep-Alive

---request end---

HTTP request sent, awaiting response...

---response begin---

HTTP/1.0 200 OK

Server: SimpleHTTP/0.6 Python/3.13.3

Date: Fri, 23 May 2025 10:42:20 GMT

Content-type: text/html

Content-Length: 285

Last-Modified: Fri, 23 May 2025 10:15:29 GMT

---response end---

200 OK

Registered socket 5 for persistent reuse.

Length: 285 [text/html]

Saving to: ‘localhost:8000/index.html’

localhost:8000/index.html 100%[====================================================================================>] 285  --.-KB/s in 0s

2025-05-23 11:42:20 (67.9 MB/s) - ‘localhost:8000/index.html’ saved [285/285]

Loaded localhost:8000/index.html (size 285).

URI encoding = ‘UTF-8’

localhost:8000/index.html: merge(‘http://localhost:8000/’, ‘/blog’) -> http://localhost:8000/blog

appending ‘http://localhost:8000/blog’ to urlpos.

nofollow in localhost:8000/index.html: 0

Deciding whether to enqueue "http://localhost:8000/blog".

Decided to load it.

URI encoding = None

Enqueuing http://localhost:8000/blog at depth 1

Queue count 1, maxcount 1.

[IRI Enqueuing ‘http://localhost:8000/blog’ with None

Dequeuing http://localhost:8000/blog at depth 1

Queue count 0, maxcount 1.

Converted file name 'localhost:8000/blog' (UTF-8) -> 'localhost:8000/blog' (UTF-8)

sleep_between_retrievals: avg=1.000000,sleep=1.144345

--2025-05-23 11:42:21--  http://localhost:8000/blog

Disabling further reuse of socket 5.

Closed fd 5

Found localhost in host_name_addresses_map (0x12a704140)

Connecting to localhost (localhost)|::1|:8000... connected.

Created socket 5.

Releasing 0x000000012a704140 (new refcount 1).

---request begin---

GET /blog HTTP/1.1

Host: localhost:8000

Referer: http://localhost:8000/

Accept: */*

Accept-Encoding: identity

Connection: Keep-Alive

---request end---

HTTP request sent, awaiting response...

---response begin---

HTTP/1.0 301 Moved Permanently

Server: SimpleHTTP/0.6 Python/3.13.3

Date: Fri, 23 May 2025 10:42:21 GMT

Location: /blog/

Content-Length: 0

---response end---

301 Moved Permanently

Registered socket 5 for persistent reuse.

Location: /blog/ [following]

] done.

URI content encoding = None

Converted file name 'localhost:8000/blog' (UTF-8) -> 'localhost:8000/blog' (UTF-8)

sleep_between_retrievals: avg=1.000000,sleep=0.771367

--2025-05-23 11:42:22--  http://localhost:8000/blog/

Disabling further reuse of socket 5.

Closed fd 5

Found localhost in host_name_addresses_map (0x12a704140)

Connecting to localhost (localhost)|::1|:8000... connected.

Created socket 5.

Releasing 0x000000012a704140 (new refcount 1).

---request begin---

GET /blog/ HTTP/1.1

Host: localhost:8000

Referer: http://localhost:8000/

Accept: */*

Accept-Encoding: identity

Connection: Keep-Alive

---request end---

HTTP request sent, awaiting response...

---response begin---

HTTP/1.0 200 OK

Server: SimpleHTTP/0.6 Python/3.13.3

Date: Fri, 23 May 2025 10:42:22 GMT

Content-type: text/html

Content-Length: 282

Last-Modified: Fri, 23 May 2025 10:10:13 GMT

---response end---

200 OK

Registered socket 5 for persistent reuse.

Length: 282 [text/html]

Saving to: ‘localhost:8000/blog’

localhost:8000/blog 100%[====================================================================================>] 282  --.-KB/s in 0s

2025-05-23 11:42:22 (5.49 MB/s) - ‘localhost:8000/blog’ saved [282/282]

Deciding whether to enqueue "http://localhost:8000/blog/".

Decided to load it.

Loaded localhost:8000/blog (size 282).

URI encoding = ‘UTF-8’

localhost:8000/blog: merge(‘http://localhost:8000/blog/’, ‘great-food.html’) -> http://localhost:8000/blog/great-food.html

appending ‘http://localhost:8000/blog/great-food.html’ to urlpos.

URI encoding = ‘UTF-8’

localhost:8000/blog: merge(‘http://localhost:8000/blog/’, ‘great-drinks.html’) -> http://localhost:8000/blog/great-drinks.html

appending ‘http://localhost:8000/blog/great-drinks.html’ to urlpos.

nofollow in localhost:8000/blog: 0

Deciding whether to enqueue "http://localhost:8000/blog/great-food.html".

Decided to load it.

URI encoding = None

Enqueuing http://localhost:8000/blog/great-food.html at depth 2

Queue count 1, maxcount 1.

[IRI Enqueuing ‘http://localhost:8000/blog/great-food.html’ with None

Deciding whether to enqueue "http://localhost:8000/blog/great-drinks.html".

Decided to load it.

URI encoding = None

Enqueuing http://localhost:8000/blog/great-drinks.html at depth 2

Queue count 2, maxcount 2.

[IRI Enqueuing ‘http://localhost:8000/blog/great-drinks.html’ with None

Dequeuing http://localhost:8000/blog/great-food.html at depth 2

Queue count 1, maxcount 2.

pathconf: Not a directory

Converted file name 'localhost:8000/blog/great-food.html' (UTF-8) -> 'localhost:8000/blog/great-food.html' (UTF-8)

sleep_between_retrievals: avg=1.000000,sleep=1.127769

--2025-05-23 11:42:23--  http://localhost:8000/blog/great-food.html

Disabling further reuse of socket 5.

Closed fd 5

Found localhost in host_name_addresses_map (0x12a704140)

Connecting to localhost (localhost)|::1|:8000... connected.

Created socket 5.

Releasing 0x000000012a704140 (new refcount 1).

---request begin---

GET /blog/great-food.html HTTP/1.1

Host: localhost:8000

Referer: http://localhost:8000/blog/

Accept: */*

Accept-Encoding: identity

Connection: Keep-Alive

---request end---

HTTP request sent, awaiting response...

---response begin---

HTTP/1.0 200 OK

Server: SimpleHTTP/0.6 Python/3.13.3

Date: Fri, 23 May 2025 10:42:23 GMT

Content-type: text/html

Content-Length: 346

Last-Modified: Fri, 23 May 2025 10:11:52 GMT

---response end---

200 OK

Registered socket 5 for persistent reuse.

Length: 346 [text/html]

Removing localhost:8000/blog because of directory danger!

Saving to: ‘localhost:8000/blog/great-food.html’

localhost:8000/blog/great-food.html 100%[====================================================================================>] 346  --.-KB/s in 0s

2025-05-23 11:42:23 (13.2 MB/s) - ‘localhost:8000/blog/great-food.html’ saved [346/346]

Loaded localhost:8000/blog/great-food.html (size 346).

URI encoding = ‘UTF-8’

localhost:8000/blog/great-food.html: merge(‘http://localhost:8000/blog/great-food.html’, ‘/’) -> http://localhost:8000/

appending ‘http://localhost:8000/’ to urlpos.

nofollow in localhost:8000/blog/great-food.html: 0

Deciding whether to enqueue "http://localhost:8000/".

Already on the black list.

Decided NOT to load it.

Dequeuing http://localhost:8000/blog/great-drinks.html at depth 2

Queue count 0, maxcount 2.

Converted file name 'localhost:8000/blog/great-drinks.html' (UTF-8) -> 'localhost:8000/blog/great-drinks.html' (UTF-8)

sleep_between_retrievals: avg=1.000000,sleep=1.466063

--2025-05-23 11:42:24--  http://localhost:8000/blog/great-drinks.html

Disabling further reuse of socket 5.

Closed fd 5

Found localhost in host_name_addresses_map (0x12a704140)

Connecting to localhost (localhost)|::1|:8000... connected.

Created socket 5.

Releasing 0x000000012a704140 (new refcount 1).

---request begin---

GET /blog/great-drinks.html HTTP/1.1

Host: localhost:8000

Referer: http://localhost:8000/blog/

Accept: */*

Accept-Encoding: identity

Connection: Keep-Alive

---request end---

HTTP request sent, awaiting response...

---response begin---

HTTP/1.0 200 OK

Server: SimpleHTTP/0.6 Python/3.13.3

Date: Fri, 23 May 2025 10:42:24 GMT

Content-type: text/html

Content-Length: 353

Last-Modified: Fri, 23 May 2025 10:11:32 GMT

---response end---

200 OK

Registered socket 5 for persistent reuse.

Length: 353 [text/html]

Saving to: ‘localhost:8000/blog/great-drinks.html’

localhost:8000/blog/great-drinks.html 100%[====================================================================================>] 353  --.-KB/s in 0s

2025-05-23 11:42:24 (782 KB/s) - ‘localhost:8000/blog/great-drinks.html’ saved [353/353]

Loaded localhost:8000/blog/great-drinks.html (size 353).

URI encoding = ‘UTF-8’

localhost:8000/blog/great-drinks.html: merge(‘http://localhost:8000/blog/great-drinks.html’, ‘/’) -> http://localhost:8000/

appending ‘http://localhost:8000/’ to urlpos.

nofollow in localhost:8000/blog/great-drinks.html: 0

Deciding whether to enqueue "http://localhost:8000/".

Already on the black list.

Decided NOT to load it.

FINISHED --2025-05-23 11:42:24--

Total wall clock time: 4.6s

Downloaded: 4 files, 1.2K in 0.001s (2.33 MB/s)

Scanning localhost:8000/blog (from http://localhost:8000/blog/)

localhost:8000/blog: Is a directory

Converting links in localhost:8000/blog... nothing to do.

Scanning localhost:8000/index.html (from http://localhost:8000/)

Loaded localhost:8000/index.html (size 285).

URI encoding = ‘UTF-8’

localhost:8000/index.html: merge(‘http://localhost:8000/’, ‘/blog’) -> http://localhost:8000/blog

appending ‘http://localhost:8000/blog’ to urlpos.

nofollow in localhost:8000/index.html: 0

URI encoding = ‘UTF-8’

will convert url http://localhost:8000/blog to local localhost:8000/blog

Converting links in localhost:8000/index.html... 1.

TO_RELATIVE: http://localhost:8000/blog to blog at position 246 in localhost:8000/index.html.

1-0

Scanning localhost:8000/blog/great-drinks.html (from http://localhost:8000/blog/great-drinks.html)

Loaded localhost:8000/blog/great-drinks.html (size 353).

URI encoding = ‘UTF-8’

localhost:8000/blog/great-drinks.html: merge(‘http://localhost:8000/blog/great-drinks.html’, ‘/’) -> http://localhost:8000/

appending ‘http://localhost:8000/’ to urlpos.

nofollow in localhost:8000/blog/great-drinks.html: 0

URI encoding = ‘UTF-8’

will convert url http://localhost:8000/ to local localhost:8000/index.html

Converting links in localhost:8000/blog/great-drinks.html... 1.

TO_RELATIVE: http://localhost:8000/ to ../index.html at position 311 in localhost:8000/blog/great-drinks.html.

1-0

Scanning localhost:8000/blog/great-food.html (from http://localhost:8000/blog/great-food.html)

Loaded localhost:8000/blog/great-food.html (size 346).

URI encoding = ‘UTF-8’

localhost:8000/blog/great-food.html: merge(‘http://localhost:8000/blog/great-food.html’, ‘/’) -> http://localhost:8000/

appending ‘http://localhost:8000/’ to urlpos.

nofollow in localhost:8000/blog/great-food.html: 0

URI encoding = ‘UTF-8’

will convert url http://localhost:8000/ to local localhost:8000/index.html

Converting links in localhost:8000/blog/great-food.html... 1.

TO_RELATIVE: http://localhost:8000/ to ../index.html at position 304 in localhost:8000/blog/great-food.html.

1-0

Converted links in 4 files in 0.005 seconds.
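The key events in the log are the 301 from /blog to /blog/, "Saving to: ‘localhost:8000/blog’" (a file named after the pre-redirect URL), and then "Removing localhost:8000/blog because of directory danger!" once blog/great-food.html needs a blog directory. Below is a minimal Python sketch of both halves, using only the standard library as an illustration and not wget's own code. It shows that the relative links on that page only resolve correctly against the post-redirect base /blog/, and that a path cannot be a regular file and a directory at the same time. The `save_page` helper is hypothetical.

```python
import tempfile
from pathlib import Path
from urllib.parse import urljoin

# Relative links resolve differently depending on the trailing slash.
# The server redirected /blog to /blog/, so wget merged against /blog/.
with_slash = urljoin("http://localhost:8000/blog/", "great-food.html")
without_slash = urljoin("http://localhost:8000/blog", "great-food.html")
print(with_slash)     # http://localhost:8000/blog/great-food.html
print(without_slash)  # http://localhost:8000/great-food.html


def save_page(root: Path, rel: str, body: str) -> bool:
    """Hypothetical helper: save body at root/rel, or return False on a
    file/directory clash instead of deleting the file as wget does."""
    target = root / rel
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
    except (FileExistsError, NotADirectoryError):
        # A regular file already occupies a parent path (here: "blog").
        return False
    target.write_text(body)
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # The /blog/ listing was saved under the pre-redirect name "blog".
    blog_saved = save_page(root, "blog", "<ul>...</ul>")
    # Saving a post now needs "blog" to be a directory, which it is not.
    post_saved = save_page(root, "blog/great-food.html", "<h1>Great Food</h1>")
    print(blog_saved, post_saved)  # True False
```

Saving the listing as blog/index.html instead, as in the expected layout, would avoid the clash entirely.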

Here are the versions I tried:

wget --version

GNU Wget 1.25.0 built on darwin24.1.0.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls

+ntlm +opie -psl +ssl/openssl

Wgetrc:

/opt/homebrew/etc/wgetrc (system)

Locale:

/opt/homebrew/Cellar/wget/1.25.0/share/locale

Compile:

clang -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/opt/homebrew/etc/wgetrc"

-DLOCALEDIR="/opt/homebrew/Cellar/wget/1.25.0/share/locale" -I.

-I../lib -I../lib -I/opt/homebrew/opt/openssl@3/include

-I/opt/homebrew/Cellar/libidn2/2.3.7/include -DNDEBUG -g -O2

Link:

clang -I/opt/homebrew/Cellar/libidn2/2.3.7/include -DNDEBUG -g -O2

-L/opt/homebrew/Cellar/libidn2/2.3.7/lib -lidn2

-L/opt/homebrew/opt/openssl@3/lib -lssl -lcrypto -ldl -lz

../lib/libgnu.a -liconv -lintl -Wl,-framework -Wl,CoreFoundation

-Wl,-framework -Wl,CoreServices -lunistring

And

GNU Wget 1.21.4 built on linux-gnu.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls

+ntlm +opie +psl +ssl/openssl

Wgetrc:

/etc/wgetrc (system)

Locale:

/usr/share/locale

Compile:

gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"

-DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib

-I../../lib -Wdate-time -D_FORTIFY_SOURCE=3 -DHAVE_LIBSSL -DNDEBUG

-g -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer

-ffile-prefix-map=/build/wget-SlgjzS/wget-1.21.4=. -flto=auto

-ffat-lto-objects -fstack-protector-strong -fstack-clash-protection

-Wformat -Werror=format-security -mbranch-protection=standard

-fdebug-prefix-map=/build/wget-SlgjzS/wget-1.21.4=/usr/src/wget-1.21.4-1ubuntu4.1

-DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall

Link:

gcc -DHAVE_LIBSSL -DNDEBUG -g -O2 -fno-omit-frame-pointer

-mno-omit-leaf-frame-pointer

-ffile-prefix-map=/build/wget-SlgjzS/wget-1.21.4=. -flto=auto

-ffat-lto-objects -fstack-protector-strong -fstack-clash-protection

-Wformat -Werror=format-security -mbranch-protection=standard

-fdebug-prefix-map=/build/wget-SlgjzS/wget-1.21.4=/usr/src/wget-1.21.4-1ubuntu4.1

-DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-Bsymbolic-functions

-flto=auto -ffat-lto-objects -Wl,-z,relro -Wl,-z,now -lpcre2-8

-luuid -lidn2 -lssl -lcrypto -lz -lpsl ../lib/libgnu.a

Simple reproduction case: create the files and directory structure below.
Then, provided you have python3 installed, run the following command in
that directory to serve the files over HTTP (on port 8000 by default). You
can then run the wget command above from a different directory and watch
the blog file get created and then deleted.

python3 -m http.server


index.html

<!DOCTYPE html>

<html lang="en">

<head>

<meta charset="UTF-8">

<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Example Filehost</title>

</head>

<body>

<h1>Welcome to Example Filehost</h1>

<a href="/blog">Go to Blog</a>

</body>

</html>

/blog/index.html

<!DOCTYPE html>

<html lang="en">

<head>

  <meta charset="UTF-8">

  <title>Blog Links</title>

</head>

<body>

  <h1>Welcome to the Blog</h1>

  <ul>

<li><a href="great-food.html">Great Food</a></li>

<li><a href="great-drinks.html">Great Drinks</a></li>

  </ul>

</body>

</html>


/blog/great-drinks.html

<!DOCTYPE html>

<html lang="en">

<head>

  <meta charset="UTF-8">

  <title>Great Drinks</title>

</head>

<body>

  <h1>Great Drinks</h1>

  <p>Welcome to our blog post about great drinks! Here are a few of our
favorites:</p>

  <ul>

<li>Water - Simple but useful to stave off dehydration.</li>

  </ul>

  <a href="/">Back to the index</a>

</body>

</html>

/blog/great-food.html

<!DOCTYPE html>

<html lang="en">

<head>

  <meta charset="UTF-8">

  <title>Great Food</title>

</head>

<body>

  <h1>Great Food</h1>

  <p>Welcome to our blog post about great food! Here are a few of our
favorites:</p>

  <ul>

<li>Bread - Simple but useful to stave off starvation.</li>

  </ul>

  <a href="/">Back to the index</a>

</body>

</html>
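Rather than creating the four files by hand, the tree can be written with a short script. This is a minimal Python sketch (a hypothetical helper, not part of the original report) that writes a trimmed version of the same pages into a directory named `site`. The pages keep their titles and the links that drive the crawl, but drop the paragraphs, lists of favorites and viewport meta tag shown above.

```python
from pathlib import Path


def page(title: str, body: str) -> str:
    """Wrap body in a minimal HTML skeleton like the pages above."""
    return (
        '<!DOCTYPE html>\n<html lang="en">\n<head>\n'
        f'  <meta charset="UTF-8">\n  <title>{title}</title>\n</head>\n'
        f"<body>\n{body}\n</body>\n</html>\n"
    )


# Trimmed copies of the reproduction pages; only the links matter to wget.
PAGES = {
    "index.html": page(
        "Example Filehost",
        '<h1>Welcome to Example Filehost</h1>\n<a href="/blog">Go to Blog</a>',
    ),
    "blog/index.html": page(
        "Blog Links",
        "<h1>Welcome to the Blog</h1>\n<ul>\n"
        '<li><a href="great-food.html">Great Food</a></li>\n'
        '<li><a href="great-drinks.html">Great Drinks</a></li>\n</ul>',
    ),
    "blog/great-drinks.html": page(
        "Great Drinks", '<h1>Great Drinks</h1>\n<a href="/">Back to the index</a>'
    ),
    "blog/great-food.html": page(
        "Great Food", '<h1>Great Food</h1>\n<a href="/">Back to the index</a>'
    ),
}


def write_site(root: Path) -> list[Path]:
    """Write every page under root, creating blog/ as needed."""
    written = []
    for rel, html in PAGES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(html, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    for path in write_site(Path("site")):
        print("wrote", path)
```

After running it, `cd site && python3 -m http.server` serves the tree on port 8000, and the wget command above can be run from another directory.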


I hope this is enough detail; let me know if you have any questions or need
more information.

Thanks folks,

Richard
