Hello, I’m trying to archive a website that has the following URL structure:

    /                          The index, with a link to /blog
    /blog                      A list of all blog posts
    /blog/great-drinks.html    Blog post about drinks
    /blog/great-food.html      Blog post about food

I’m using the following command to try to archive this in a way that I can host statically:

    wget -mpckr --user-agent="" -e robots=off --wait 1 --random-wait --max-redirect=1 localhost:8000

I was expecting the following file structure, especially given reports like
https://lists.gnu.org/archive/html/bug-wget/2016-09/msg00088.html

    /index.html
    /blog/index.html
    /blog/great-drinks.html
    /blog/great-food.html

Instead I got

    /index.html
    /blog/great-drinks.html
    /blog/great-food.html

I’m hoping I’ve just provided the wrong flags, but I can’t see how to fix this.

Below is various debug/reproduction information, which I hope helps.

For me this happens every time, with both of the versions listed further down.

The logs show that /blog was downloaded and saved as a file, but that file was then removed when the blog/ directory was created ("Removing localhost:8000/blog because of directory danger!"):

    wget -d -mpckr --user-agent="" -e robots=off --wait 1 --random-wait --max-redirect=1 localhost:8000

Setting --mirror (mirror) to 1
Setting --page-requisites (pagerequisites) to 1
Setting --continue (continue) to 1
Setting --convert-links (convertlinks) to 1
Setting --recursive (recursive) to 1
Setting --user-agent (useragent) to
Setting robots (robots) to off
Setting --wait (wait) to 1
Setting --random-wait (randomwait) to 1
Setting --max-redirect (maxredirect) to 1
DEBUG output created by Wget 1.25.0 on darwin24.1.0.

Reading HSTS entries from /Users/richard/.wget-hsts
Prepended http:// to 'localhost:8000'
URI encoding = ‘UTF-8’
URI encoding = ‘UTF-8’
Enqueuing http://localhost:8000/ at depth 0
Queue count 1, maxcount 1.
[IRI Enqueuing ‘http://localhost:8000/’ with ‘UTF-8’
Dequeuing http://localhost:8000/ at depth 0
Queue count 0, maxcount 1.
Converted file name 'localhost:8000/index.html' (UTF-8) -> 'localhost:8000/index.html' (UTF-8)
--2025-05-23 11:42:20-- http://localhost:8000/
Resolving localhost (localhost)... ::1, 127.0.0.1
Caching localhost => ::1 127.0.0.1
Connecting to localhost (localhost)|::1|:8000... connected.
Created socket 5.
Releasing 0x000000012a704140 (new refcount 1).

---request begin---
GET / HTTP/1.1
Host: localhost:8000
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.13.3
Date: Fri, 23 May 2025 10:42:20 GMT
Content-type: text/html
Content-Length: 285
Last-Modified: Fri, 23 May 2025 10:15:29 GMT
---response end---
200 OK
Registered socket 5 for persistent reuse.
Length: 285 [text/html]
Saving to: ‘localhost:8000/index.html’
localhost:8000/index.html 100%[====================================================================================>] 285 --.-KB/s in 0s
2025-05-23 11:42:20 (67.9 MB/s) - ‘localhost:8000/index.html’ saved [285/285]

Loaded localhost:8000/index.html (size 285).
URI encoding = ‘UTF-8’
localhost:8000/index.html: merge(‘http://localhost:8000/’, ‘/blog’) -> http://localhost:8000/blog
appending ‘http://localhost:8000/blog’ to urlpos.
nofollow in localhost:8000/index.html: 0
Deciding whether to enqueue "http://localhost:8000/blog".
Decided to load it.
URI encoding = None
Enqueuing http://localhost:8000/blog at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing ‘http://localhost:8000/blog’ with None
Dequeuing http://localhost:8000/blog at depth 1
Queue count 0, maxcount 1.
Converted file name 'localhost:8000/blog' (UTF-8) -> 'localhost:8000/blog' (UTF-8)
sleep_between_retrievals: avg=1.000000,sleep=1.144345
--2025-05-23 11:42:21-- http://localhost:8000/blog
Disabling further reuse of socket 5.
Closed fd 5
Found localhost in host_name_addresses_map (0x12a704140)
Connecting to localhost (localhost)|::1|:8000... connected.
Created socket 5.
Releasing 0x000000012a704140 (new refcount 1).

---request begin---
GET /blog HTTP/1.1
Host: localhost:8000
Referer: http://localhost:8000/
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.0 301 Moved Permanently
Server: SimpleHTTP/0.6 Python/3.13.3
Date: Fri, 23 May 2025 10:42:21 GMT
Location: /blog/
Content-Length: 0
---response end---
301 Moved Permanently
Registered socket 5 for persistent reuse.
Location: /blog/ [following]
] done.
URI content encoding = None
Converted file name 'localhost:8000/blog' (UTF-8) -> 'localhost:8000/blog' (UTF-8)
sleep_between_retrievals: avg=1.000000,sleep=0.771367
--2025-05-23 11:42:22-- http://localhost:8000/blog/
Disabling further reuse of socket 5.
Closed fd 5
Found localhost in host_name_addresses_map (0x12a704140)
Connecting to localhost (localhost)|::1|:8000... connected.
Created socket 5.
Releasing 0x000000012a704140 (new refcount 1).

---request begin---
GET /blog/ HTTP/1.1
Host: localhost:8000
Referer: http://localhost:8000/
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.13.3
Date: Fri, 23 May 2025 10:42:22 GMT
Content-type: text/html
Content-Length: 282
Last-Modified: Fri, 23 May 2025 10:10:13 GMT
---response end---
200 OK
Registered socket 5 for persistent reuse.
Length: 282 [text/html]
Saving to: ‘localhost:8000/blog’
localhost:8000/blog 100%[====================================================================================>] 282 --.-KB/s in 0s
2025-05-23 11:42:22 (5.49 MB/s) - ‘localhost:8000/blog’ saved [282/282]

Deciding whether to enqueue "http://localhost:8000/blog/".
Decided to load it.
Loaded localhost:8000/blog (size 282).
URI encoding = ‘UTF-8’
localhost:8000/blog: merge(‘http://localhost:8000/blog/’, ‘great-food.html’) -> http://localhost:8000/blog/great-food.html
appending ‘http://localhost:8000/blog/great-food.html’ to urlpos.
URI encoding = ‘UTF-8’
localhost:8000/blog: merge(‘http://localhost:8000/blog/’, ‘great-drinks.html’) -> http://localhost:8000/blog/great-drinks.html
appending ‘http://localhost:8000/blog/great-drinks.html’ to urlpos.
nofollow in localhost:8000/blog: 0
Deciding whether to enqueue "http://localhost:8000/blog/great-food.html".
Decided to load it.
URI encoding = None
Enqueuing http://localhost:8000/blog/great-food.html at depth 2
Queue count 1, maxcount 1.
[IRI Enqueuing ‘http://localhost:8000/blog/great-food.html’ with None
Deciding whether to enqueue "http://localhost:8000/blog/great-drinks.html".
Decided to load it.
URI encoding = None
Enqueuing http://localhost:8000/blog/great-drinks.html at depth 2
Queue count 2, maxcount 2.
[IRI Enqueuing ‘http://localhost:8000/blog/great-drinks.html’ with None
Dequeuing http://localhost:8000/blog/great-food.html at depth 2
Queue count 1, maxcount 2.
pathconf: Not a directory
Converted file name 'localhost:8000/blog/great-food.html' (UTF-8) -> 'localhost:8000/blog/great-food.html' (UTF-8)
sleep_between_retrievals: avg=1.000000,sleep=1.127769
--2025-05-23 11:42:23-- http://localhost:8000/blog/great-food.html
Disabling further reuse of socket 5.
Closed fd 5
Found localhost in host_name_addresses_map (0x12a704140)
Connecting to localhost (localhost)|::1|:8000... connected.
Created socket 5.
Releasing 0x000000012a704140 (new refcount 1).

---request begin---
GET /blog/great-food.html HTTP/1.1
Host: localhost:8000
Referer: http://localhost:8000/blog/
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.13.3
Date: Fri, 23 May 2025 10:42:23 GMT
Content-type: text/html
Content-Length: 346
Last-Modified: Fri, 23 May 2025 10:11:52 GMT
---response end---
200 OK
Registered socket 5 for persistent reuse.
Length: 346 [text/html]
Removing localhost:8000/blog because of directory danger!
Saving to: ‘localhost:8000/blog/great-food.html’
localhost:8000/blog/great-food.html 100%[====================================================================================>] 346 --.-KB/s in 0s
2025-05-23 11:42:23 (13.2 MB/s) - ‘localhost:8000/blog/great-food.html’ saved [346/346]

Loaded localhost:8000/blog/great-food.html (size 346).
URI encoding = ‘UTF-8’
localhost:8000/blog/great-food.html: merge(‘http://localhost:8000/blog/great-food.html’, ‘/’) -> http://localhost:8000/
appending ‘http://localhost:8000/’ to urlpos.
nofollow in localhost:8000/blog/great-food.html: 0
Deciding whether to enqueue "http://localhost:8000/".
Already on the black list.
Decided NOT to load it.
Dequeuing http://localhost:8000/blog/great-drinks.html at depth 2
Queue count 0, maxcount 2.
Converted file name 'localhost:8000/blog/great-drinks.html' (UTF-8) -> 'localhost:8000/blog/great-drinks.html' (UTF-8)
sleep_between_retrievals: avg=1.000000,sleep=1.466063
--2025-05-23 11:42:24-- http://localhost:8000/blog/great-drinks.html
Disabling further reuse of socket 5.
Closed fd 5
Found localhost in host_name_addresses_map (0x12a704140)
Connecting to localhost (localhost)|::1|:8000... connected.
Created socket 5.
Releasing 0x000000012a704140 (new refcount 1).

---request begin---
GET /blog/great-drinks.html HTTP/1.1
Host: localhost:8000
Referer: http://localhost:8000/blog/
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.13.3
Date: Fri, 23 May 2025 10:42:24 GMT
Content-type: text/html
Content-Length: 353
Last-Modified: Fri, 23 May 2025 10:11:32 GMT
---response end---
200 OK
Registered socket 5 for persistent reuse.
Length: 353 [text/html]
Saving to: ‘localhost:8000/blog/great-drinks.html’
localhost:8000/blog/great-drinks.html 100%[====================================================================================>] 353 --.-KB/s in 0s
2025-05-23 11:42:24 (782 KB/s) - ‘localhost:8000/blog/great-drinks.html’ saved [353/353]

Loaded localhost:8000/blog/great-drinks.html (size 353).
URI encoding = ‘UTF-8’
localhost:8000/blog/great-drinks.html: merge(‘http://localhost:8000/blog/great-drinks.html’, ‘/’) -> http://localhost:8000/
appending ‘http://localhost:8000/’ to urlpos.
nofollow in localhost:8000/blog/great-drinks.html: 0
Deciding whether to enqueue "http://localhost:8000/".
Already on the black list.
Decided NOT to load it.
FINISHED --2025-05-23 11:42:24--
Total wall clock time: 4.6s
Downloaded: 4 files, 1.2K in 0.001s (2.33 MB/s)
Scanning localhost:8000/blog (from http://localhost:8000/blog/)
localhost:8000/blog: Is a directory
Converting links in localhost:8000/blog... nothing to do.
Scanning localhost:8000/index.html (from http://localhost:8000/)
Loaded localhost:8000/index.html (size 285).
URI encoding = ‘UTF-8’
localhost:8000/index.html: merge(‘http://localhost:8000/’, ‘/blog’) -> http://localhost:8000/blog
appending ‘http://localhost:8000/blog’ to urlpos.
nofollow in localhost:8000/index.html: 0
URI encoding = ‘UTF-8’
will convert url http://localhost:8000/blog to local localhost:8000/blog
Converting links in localhost:8000/index.html... 1.
TO_RELATIVE: http://localhost:8000/blog to blog at position 246 in localhost:8000/index.html.
1-0
Scanning localhost:8000/blog/great-drinks.html (from http://localhost:8000/blog/great-drinks.html)
Loaded localhost:8000/blog/great-drinks.html (size 353).
URI encoding = ‘UTF-8’
localhost:8000/blog/great-drinks.html: merge(‘http://localhost:8000/blog/great-drinks.html’, ‘/’) -> http://localhost:8000/
appending ‘http://localhost:8000/’ to urlpos.
nofollow in localhost:8000/blog/great-drinks.html: 0
URI encoding = ‘UTF-8’
will convert url http://localhost:8000/ to local localhost:8000/index.html
Converting links in localhost:8000/blog/great-drinks.html... 1.
TO_RELATIVE: http://localhost:8000/ to ../index.html at position 311 in localhost:8000/blog/great-drinks.html.
1-0
Scanning localhost:8000/blog/great-food.html (from http://localhost:8000/blog/great-food.html)
Loaded localhost:8000/blog/great-food.html (size 346).
URI encoding = ‘UTF-8’
localhost:8000/blog/great-food.html: merge(‘http://localhost:8000/blog/great-food.html’, ‘/’) -> http://localhost:8000/
appending ‘http://localhost:8000/’ to urlpos.
nofollow in localhost:8000/blog/great-food.html: 0
URI encoding = ‘UTF-8’
will convert url http://localhost:8000/ to local localhost:8000/index.html
Converting links in localhost:8000/blog/great-food.html... 1.
TO_RELATIVE: http://localhost:8000/ to ../index.html at position 304 in localhost:8000/blog/great-food.html.
1-0
Converted links in 4 files in 0.005 seconds.

Here are the versions I tried:

    wget --version

GNU Wget 1.25.0 built on darwin24.1.0.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls +ntlm +opie -psl +ssl/openssl

Wgetrc: /opt/homebrew/etc/wgetrc (system)
Locale: /opt/homebrew/Cellar/wget/1.25.0/share/locale
Compile: clang -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/opt/homebrew/etc/wgetrc" -DLOCALEDIR="/opt/homebrew/Cellar/wget/1.25.0/share/locale" -I. -I../lib -I../lib -I/opt/homebrew/opt/openssl@3/include -I/opt/homebrew/Cellar/libidn2/2.3.7/include -DNDEBUG -g -O2
Link: clang -I/opt/homebrew/Cellar/libidn2/2.3.7/include -DNDEBUG -g -O2 -L/opt/homebrew/Cellar/libidn2/2.3.7/lib -lidn2 -L/opt/homebrew/opt/openssl@3/lib -lssl -lcrypto -ldl -lz ../lib/libgnu.a -liconv -lintl -Wl,-framework -Wl,CoreFoundation -Wl,-framework -Wl,CoreServices -lunistring

And

GNU Wget 1.21.4 built on linux-gnu.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls +ntlm +opie +psl +ssl/openssl

Wgetrc: /etc/wgetrc (system)
Locale: /usr/share/locale
Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc" -DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib -I../../lib -Wdate-time -D_FORTIFY_SOURCE=3 -DHAVE_LIBSSL -DNDEBUG -g -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -ffile-prefix-map=/build/wget-SlgjzS/wget-1.21.4=. -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -mbranch-protection=standard -fdebug-prefix-map=/build/wget-SlgjzS/wget-1.21.4=/usr/src/wget-1.21.4-1ubuntu4.1 -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall
Link: gcc -DHAVE_LIBSSL -DNDEBUG -g -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -ffile-prefix-map=/build/wget-SlgjzS/wget-1.21.4=. -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -mbranch-protection=standard -fdebug-prefix-map=/build/wget-SlgjzS/wget-1.21.4=/usr/src/wget-1.21.4-1ubuntu4.1 -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -Wl,-z,relro -Wl,-z,now -lpcre2-8 -luuid -lidn2 -lssl -lcrypto -lz -lpsl ../lib/libgnu.a

Simple reproduction case: create the files and directory structure below. Then, provided you have python3 installed, run the following Python command to start an HTTP server that serves those files.
You can then run the wget command above from a different directory and watch the blog file get created and then deleted.

    python3 -m http.server

index.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Example Filehost</title>
</head>
<body>
  <h1>Welcome to Example Filehost</h1>
  <a href="/blog">Go to Blog</a>
</body>
</html>

/blog/index.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Blog Links</title>
</head>
<body>
  <h1>Welcome to the Blog</h1>
  <ul>
    <li><a href="great-food.html">Great Food</a></li>
    <li><a href="great-drinks.html">Great Drinks</a></li>
  </ul>
</body>
</html>

/blog/great-drinks.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Great Drinks</title>
</head>
<body>
  <h1>Great Drinks</h1>
  <p>Welcome to our blog post about great drinks! Here are a few of our favorites:</p>
  <ul>
    <li>Water - Simple but useful to stave off dehydration.</li>
  </ul>
  <a href="/">Back to the index</a>
</body>
</html>

/blog/great-food.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Great Food</title>
</head>
<body>
  <h1>Great Food</h1>
  <p>Welcome to our blog post about great food! Here are a few of our favorites:</p>
  <ul>
    <li>Bread - Simple but useful to stave off starvation.</li>
  </ul>
  <a href="/">Back to the index</a>
</body>
</html>

I hope this is enough detail; let me know if you have any questions or need any more information.

Thanks folks,
Richard
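
P.S. For anyone who wants to reproduce this quickly, here is a rough sketch of a script that writes the four files from the reproduction case into a scratch directory. It assumes a POSIX shell. The directory name wget-repro is only a placeholder chosen for illustration, and the HTML is the same markup as in the reproduction case with the indentation dropped, so the byte sizes may differ from those in the log.

```shell
#!/bin/sh
# Sketch: recreate the reproduction files in a scratch directory.
# "wget-repro" is a hypothetical directory name, not part of the original setup.
set -eu

root="wget-repro"
mkdir -p "$root/blog"

# Site index, linking to /blog without a trailing slash.
cat > "$root/index.html" <<'EOF'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Example Filehost</title>
</head>
<body>
<h1>Welcome to Example Filehost</h1>
<a href="/blog">Go to Blog</a>
</body>
</html>
EOF

# Blog index, served for /blog/ after the 301 redirect from /blog.
cat > "$root/blog/index.html" <<'EOF'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Blog Links</title>
</head>
<body>
<h1>Welcome to the Blog</h1>
<ul>
<li><a href="great-food.html">Great Food</a></li>
<li><a href="great-drinks.html">Great Drinks</a></li>
</ul>
</body>
</html>
EOF

cat > "$root/blog/great-drinks.html" <<'EOF'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Great Drinks</title>
</head>
<body>
<h1>Great Drinks</h1>
<p>Welcome to our blog post about great drinks! Here are a few of our favorites:</p>
<ul>
<li>Water - Simple but useful to stave off dehydration.</li>
</ul>
<a href="/">Back to the index</a>
</body>
</html>
EOF

cat > "$root/blog/great-food.html" <<'EOF'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Great Food</title>
</head>
<body>
<h1>Great Food</h1>
<p>Welcome to our blog post about great food! Here are a few of our favorites:</p>
<ul>
<li>Bread - Simple but useful to stave off starvation.</li>
</ul>
<a href="/">Back to the index</a>
</body>
</html>
EOF

# To reproduce, serve the directory (http.server listens on port 8000 by default)
# and run wget from a different directory:
#   (cd wget-repro && python3 -m http.server)
#   wget -mpckr --user-agent="" -e robots=off --wait 1 --random-wait --max-redirect=1 localhost:8000
```

The script only writes the files; the server and the wget run are left as comments so that nothing blocks or touches the network unless you start them yourself.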