Matthew White <[email protected]> writes:
> wget --recursive                               \
>      --page-requisites                         \
>      --convert-links                           \
>      --domains="www.iana.org"                  \
>      --reject "robots.txt","reports","contact" \
>      --exclude-directories="/go,/assignments,/_img,/_js,/_css,/domains,/performance,/about,/protocols,/procedures,/dnssec,/reports,/help,/abuse,/numbers,/reviews,/time-zones,/2000,/2001" \
>      http://www.iana.org/assignments/index.html

True, using --exclude-directories I can isolate what I want, but as you
note, that requires knowing all of the children of the root in advance.
It seems to me that there should be a straightforward way of instructing
wget to exclude "everything but X".
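
One possible way to express "everything but X" is an allow-list rather
than a deny-list. The sketch below assumes wget's documented
--include-directories and --no-parent options fit this case. It has not
been run against www.iana.org, and the variable names and the RUN switch
are made up for illustration. It prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch (assumption): keep only one directory instead of listing every
# sibling to exclude. "site", "keep" and "RUN" are illustrative names.
site="http://www.iana.org"
keep="/assignments"

# Build the argument list so it can be inspected before anything runs.
set -- wget --recursive \
            --page-requisites \
            --convert-links \
            --domains="www.iana.org" \
            --no-parent \
            --include-directories="$keep" \
            "$site$keep/index.html"

# Print by default; set RUN=1 in the environment to start the download.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    printf '%s\n' "$*"
fi
```

If page requisites that live outside /assignments (for example /_css or
/_js) turn out to be skipped as well, adding those directories to the
--include-directories list may help.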

> wget --recursive              \
>      --no-clobber             \
>      --page-requisites        \
>      --adjust-extension       \
>      --convert-links          \
>      --span-hosts             \
>      --domains="www.iana.org" \
>      http://www.iana.org/assignments/index.html

As you said, that command returned lots of things that aren't in
http://www.iana.org/assignments.

Dale
