So I've run into another version of the problem: I'm using --page-requisites, and the requisites are getting filtered in much the same way as redirections. However, the new fixes don't change that behavior.
The example case is that

$ wget --mirror --convert-links --page-requisites --limit-rate=20k \
       --include-directories=/assignments \
       http://www.iana.org/assignments/index.html

does not fetch the CSS specified by http://www.iana.org/assignments/index.html in

    <link rel="stylesheet" media="screen" href="../_css/2015.1/screen.css"/>

which is http://www.iana.org/_css/2015.1/screen.css.

It looks like requisite URLs are flagged by setting link_inline_p of struct urlpos to true. If that flag is set and opt.page_requisites is set, then test 4 of download_child (the --no-parent test) is suppressed.

This change seems to add the same logic as is applied to redirections:

diff --git a/src/recur.c b/src/recur.c
index 1469e31..b1f9109 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -462,6 +462,12 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
               r = download_child (child, url_parsed, depth,
                                   start_url_parsed, blacklist, i);
+              if (child->link_inline_p &&
+                  (r == WG_RR_LIST || r == WG_RR_REGEX))
+                {
+                  DEBUGP (("Ignoring decision for page requisite, decided to load it.\n"));
+                  r = WG_RR_SUCCESS;
+                }
               if (r == WG_RR_SUCCESS)
                 {
                   ci = iri_new ();

and it has the expected effect: the requisites for index.html are downloaded.

I've attached a patch for this that includes an update to the manual page, although the update doesn't mention the suppression of the --no-parent test.

Dale
diff --git a/doc/wget.texi b/doc/wget.texi
index f42773e..04d1562 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -2289,7 +2289,11 @@ wget -p http://@var{site}/1.html
 @end example
 
 Note that Wget will behave as if @samp{-r} had been specified, but only
-that single page and its requisites will be downloaded. Links from that
+that single page and its requisites will be downloaded.
+(As with @samp{-r}, the @samp{--include-directories},
+@samp{--exclude-directories}, @samp{--accept-regex}, and @samp{--reject-regex}
+tests are not applied to page requisites.)
+Links from that
 page to external documents will not be followed. Actually, to download
 a single page and all its requisites (even if they exist on separate
 websites), and make sure the lot displays properly locally, this author
diff --git a/src/recur.c b/src/recur.c
index 1469e31..fdb1d2e 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -462,6 +462,12 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
               r = download_child (child, url_parsed, depth,
                                   start_url_parsed, blacklist, i);
+              if (child->link_inline_p &&
+                  (r == WG_RR_LIST || r == WG_RR_REGEX))
+                {
+                  DEBUGP (("Ignoring decision for page requisite, decided to load it.\n"));
+                  r = WG_RR_SUCCESS;
+                }
               if (r == WG_RR_SUCCESS)
                 {
                   ci = iri_new ();
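
For anyone who wants to check the decision logic without building Wget, here is a minimal standalone sketch of the override the recur.c hunk adds. It is only a model: the enum is a cut-down stand-in for Wget's reject_reason (WG_RR_PARENT is assumed here as the name of the --no-parent rejection), and override_for_requisite is a hypothetical helper that does not exist in the Wget source.

```c
#include <stdbool.h>

/* Cut-down stand-in for Wget's reject_reason enum.  Only the values
   discussed in this thread are listed; the real enum has more.  */
typedef enum
{
  WG_RR_SUCCESS,
  WG_RR_PARENT,   /* assumed name: rejected by the --no-parent test */
  WG_RR_LIST,     /* rejected by the include/exclude directory lists */
  WG_RR_REGEX     /* rejected by --accept-regex / --reject-regex */
} reject_reason;

/* Hypothetical helper (not part of Wget): given the verdict returned
   by download_child for one link, apply the override from the patch.
   A page requisite (link_inline_p true) that was rejected only by the
   directory lists or the regex tests is accepted anyway; every other
   verdict is passed through unchanged.  */
static reject_reason
override_for_requisite (bool link_inline_p, reject_reason r)
{
  if (link_inline_p && (r == WG_RR_LIST || r == WG_RR_REGEX))
    return WG_RR_SUCCESS;
  return r;
}
```

In this model, the screen.css link from the example (an inline requisite rejected by --include-directories) comes back as WG_RR_SUCCESS, while an ordinary hyperlink rejected for the same reason stays WG_RR_LIST. A --no-parent rejection is not touched by this override, since the patch relies on download_child to suppress that test for requisites.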