On 04/07/2011 05:26 AM, Giuseppe Scrivano wrote:
> "David Skalinder" <[email protected]> writes:
>
>>> I want to mirror part of a website that contains two links pages, each of
>>> which contains links to many root-level directories and also to the other
>>> links page. I want to download recursively all the links from one links
>>> page, but not from the other: that is, I want to tell wget "download
>>> links1 and follow all of its links, but do not download or follow links
>>> from links2".
>>>
>>> I've put a demo of this problem up at http://fangjaw.com/wgettest -- there
>>> is a diagram there that might state the problem more clearly.
>>>
>>> This functionality seems so basic that I assume I must be overlooking
>>> something. Clearly wget has been designed to give users control over
>>> which files they download; but all I can find is that -X controls both
>>> saving and link-following at the directory level, while -R controls saving
>>> at the file level but still follows links from unsaved files.
>
> why doesn't -X work in the scenario you have described? If all links
> from `links2' are under /B, you can exclude them using something like:
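(Giuseppe's example command is truncated in this copy of the thread; a
hedged reconstruction of what his prose describes, with links1.html as an
assumed filename:)

    # Recurse from links1, pruning the whole /B directory tree, which by
    # Giuseppe's assumption holds everything links2 points at; -X
    # suppresses both saving and link-following for excluded directories:
    wget -r -X /B http://fangjaw.com/wgettest/links1.html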
That scenario seems rather unlikely, unless we're talking about
autogenerated folder index files... This issue would be resolved if wget
had a way to avoid its current behavior of always unconditionally
downloading HTML files, regardless of what the rejection rules say. Then
you could just reject that single file (and, if need be, download it as
part of a separate session).

-- 
Micah J. Cowan
http://micah.cowan.name/
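(A sketch of the flow Micah describes, with links1.html and links2.html
as assumed filenames. Note that the first command does not yet behave
this way: as of this writing, wget still fetches and parses -R'd HTML
files to harvest their links, deleting the local copies only afterward.)

    # Under Micah's proposed behavior, rejecting links2.html would stop
    # wget from downloading it at all, so its links would never be seen:
    wget -r -R links2.html http://fangjaw.com/wgettest/links1.html

    # The separate session he mentions: fetch links2 by itself,
    # non-recursively, so nothing it links to is followed:
    wget http://fangjaw.com/wgettest/links2.html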
