On Wed, 03 Aug 2016 11:46:22 -0400 [email protected] (Dale R. Worley) wrote:
> Matthew White <[email protected]> writes:
> > wget --recursive \
> >      --page-requisites \
> >      --convert-links \
> >      --domains="www.iana.org" \
> >      --reject "robots.txt","reports","contact" \
> >      --exclude-directories="/go,/assignments,/_img,/_js,/_css,/domains,/performance,/about,/protocols,/procedures,/dnssec,/reports,/help,/abuse,/numbers,/reviews,/time-zones,/2000,/2001" \
> >      http://www.iana.org/assignments/index.html
>
> True, using --exclude-directories I can isolate what I want, but as you
> note, that requires actually knowing all of the children of the root in
> advance.  Whereas it seems to me that there should be a straightforward
> way of instructing wget to exclude "everything but X".
>
> > wget --recursive \
> >      --no-clobber \
> >      --page-requisites \
> >      --adjust-extension \
> >      --convert-links \
> >      --span-hosts \
> >      --domains="www.iana.org" \
> >      http://www.iana.org/assignments/index.html
>
> As you said, that command returned lots of things that aren't in
> http://www.iana.org/assignments.
>
> Dale

Hi Dale!

Quick update.

I'm trying the first command you mentioned in "reverse" with a
combination of -A, -R, --accept-regex, --reject-regex, -I, and -X.

Still no good results for "exclude all, include this and that".

[to build an exclude/include list you need to experiment a little]

Later,
Matthew

-- 
Matthew White <[email protected]>
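
For anyone experimenting along the same lines, here is an untested sketch of one "include only this and that" combination built from -I (--include-directories) and --no-parent. It is not a confirmed solution. The names INCLUDE, WGET and fetch_assignments are made up for illustration, and the /_img, /_js and /_css entries are only a guess (taken from the exclude list quoted above) at where the page requisites live.

```shell
#!/bin/sh
# Untested sketch: allow-list crawl of www.iana.org limited to /assignments.
# ASSUMPTION: page requisites are served from /_img, /_js and /_css
# (guessed from the exclude list quoted above); adjust after a trial run.
INCLUDE="/assignments,/_img,/_js,/_css"

fetch_assignments() {
    # WGET is a hypothetical override hook: set WGET=echo to print the
    # assembled command instead of running wget.
    "${WGET:-wget}" --recursive \
        --no-parent \
        --page-requisites \
        --convert-links \
        --domains="www.iana.org" \
        --include-directories="$INCLUDE" \
        http://www.iana.org/assignments/index.html
}
```

Running `WGET=echo fetch_assignments` prints the command line without touching the network, which makes it easy to tweak the INCLUDE list before calling `fetch_assignments` for a real crawl.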
