On 04/07/2011 05:26 AM, Giuseppe Scrivano wrote:
> "David Skalinder" <[email protected]> writes:
>
>>> I want to mirror part of a website that contains two links pages, each of
>>> which contains links to many root-level directories and also to the other
>>> links page. I want to download recursively all the links from one links
>>> page, but not from the other: that is, I want to tell wget "download
>>> links1 and follow all of its links, but do not download or follow links
>>> from links2".
>>>
>>> I've put a demo of this problem up at http://fangjaw.com/wgettest -- there
>>> is a diagram there that might state the problem more clearly.
>>>
>>> This functionality seems so basic that I assume I must be overlooking
>>> something. Clearly wget has been designed to give users control over
>>> which files they download; but all I can find is that -X controls both
>>> saving and link-following at the directory level, while -R controls saving
>>> at the file level but still follows links from unsaved files.
>
> why doesn't -X work in the scenario you have described? If all links
> from `links2' are under /B, you can exclude them using something like:
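(Giuseppe's example command is truncated in this copy of the thread; a
hedged reconstruction of what his prose describes, with links1.html as an
assumed filename:)

    # Recurse from links1, pruning the whole /B directory tree, which by
    # Giuseppe's assumption holds everything links2 points at; -X
    # suppresses both saving and link-following for excluded directories:
    wget -r -X /B http://fangjaw.com/wgettest/links1.html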
That scenario seems rather unlikely, unless we're talking about
autogenerated folder index files... This issue would be resolved if wget
had a way to avoid its current behavior of always unconditionally
downloading HTML files, regardless of what the rejection rules say. Then
you could just reject that single file (and, if need be, download it as
part of a separate session).

-- 
Micah J. Cowan
http://micah.cowan.name/
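(A sketch of the flow Micah describes, with links1.html and links2.html
as assumed filenames. Note that the first command does not yet behave
this way: as of this writing, wget still fetches and parses -R'd HTML
files to harvest their links, deleting the local copies only afterward.)

    # Under Micah's proposed behavior, rejecting links2.html would stop
    # wget from downloading it at all, so its links would never be seen:
    wget -r -R links2.html http://fangjaw.com/wgettest/links1.html

    # The separate session he mentions: fetch links2 by itself,
    # non-recursively, so nothing it links to is followed:
    wget http://fangjaw.com/wgettest/links2.html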
