After rereading the wget manual (gnu.org/software/wget/manual/wget.html) I still don't see how to combine the following options:

1) fetch all linked pages and their requisites (style sheets, images, ...), starting from a given URL, ONLY if they belong to the same domain as that URL;

2) for externally linked pages, fetch only that single page (with its requisites) without crawling the rest of that site;

3) some regexp functionality to filter certain sites more finely.

The thing is that with '--domains=domain-list' you can restrict recursion to those domains, but you cannot allow for single external pages. Also, some sites carry the exact same information in more than one language, and you may not be interested in downloading every language variant.

Can you achieve this with wget?

Anyway, wget may not have been designed for exactly this. Do you know of any other open-source project that would, for example, consolidate many pages into one, or strip out all inline scripts from, say, googlesyndication.com and similar crud?

thanks lbrtchx
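For reference, the closest combination I can put together from the manual is sketched below. This is only an approximation, assuming GNU wget >= 1.14 (which added '--reject-regex'); the domain, start URL, and language regex are placeholders, not anything from the question. As I read the manual, '--domains' limits *recursion*, while '--page-requisites' together with '--span-hosts' still lets single off-domain requisites (images, style sheets) be fetched without crawling the other site. It does not cover point 2 for full external pages, only their requisites.

```shell
# Sketch, assuming GNU wget >= 1.14; example.com, the start URL and the
# language regex are hypothetical placeholders.
#
# --domains       : recursion stays on example.com
# --span-hosts    : but page requisites may come from other hosts
# --page-requisites: also fetch CSS, images, etc. for each page
# --reject-regex  : skip duplicate language variants (point 3);
#                   it matches against the full URL
wget --recursive --level=inf \
     --page-requisites \
     --span-hosts \
     --domains=example.com \
     --convert-links \
     --reject-regex='/(de|fr|es)/' \
     'https://example.com/start.html'
```

Whether '--span-hosts' plus '--page-requisites' behaves this way in all wget versions is worth verifying against the manual before relying on it.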
