Matthew White <[email protected]> writes:
> wget --recursive \
>      --page-requisites \
>      --convert-links \
>      --domains="www.iana.org" \
>      --reject "robots.txt","reports","contact" \
>      --exclude-directories="/go,/assignments,/_img,/_js,/_css,/domains,/performance,/about,/protocols,/procedures,/dnssec,/reports,/help,/abuse,/numbers,/reviews,/time-zones,/2000,/2001" \
>      http://www.iana.org/assignments/index.html

True, using --exclude-directories I can isolate what I want, but as you
note, that requires actually knowing all of the children of the root in
advance. Whereas it seems to me that there should be a straightforward
way of instructing wget to exclude "everything but X".

> wget --recursive \
>      --no-clobber \
>      --page-requisites \
>      --adjust-extension \
>      --convert-links \
>      --span-hosts \
>      --domains="www.iana.org" \
>      http://www.iana.org/assignments/index.html

As you said, that command returned lots of things that aren't in
http://www.iana.org/assignments.

Dale
