Hello all,

I am trying to do a recursive download of a webpage and span multiple hosts within the same domain, but not cross to other domains. The issue is that the crawl does extend to other domains. My full command is this:

wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --adjust-extension \
    --span-hosts \
    --domains=scapino.nl \
    --no-parent \
    --tries=2 \
    --wait=1 \
    --random-wait \
    --waitretry=2 \
    --header='User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' \
    https://www.scapino.nl/winkels/scapino-utrecht-510061

From this combination of --span-hosts and --domains, I would expect to download assets from cdn.scapino.nl and www.scapino.nl, but not other domains. For some reason that I don't understand, wget also starts to do what looks like a full crawl of the domain werkenbijscapino.nl, which is referenced from the original page.

Any thoughts or direction would be much appreciated. I am using wget 1.18 on Debian.

Best regards,
Friso
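
P.S. To make the expectation concrete, here is a small sketch of the host filter I had in mind for --domains=scapino.nl. Both functions are hypothetical helpers of my own and not taken from wget. The first accepts a host only when it equals the domain or ends in a dot followed by the domain. The second is a plain suffix comparison with no dot boundary; it is only an unverified guess at the kind of matching that would let werkenbijscapino.nl through.

```shell
#!/bin/sh
# Hypothetical helpers, NOT wget code: they only model host filtering.

# What I expected: the host equals the domain, or ends in ".<domain>".
expected_match() {
    host=$1
    domain=$2
    case "$host" in
        "$domain" | *".$domain") echo accept ;;
        *) echo reject ;;
    esac
}

# Plain suffix comparison without a dot boundary (an assumption, not
# verified against wget's source).
plain_suffix_match() {
    host=$1
    domain=$2
    case "$host" in
        *"$domain") echo accept ;;
        *) echo reject ;;
    esac
}

# Compare both models on the three hosts seen in the crawl.
for h in www.scapino.nl cdn.scapino.nl werkenbijscapino.nl; do
    printf '%s expected=%s plain_suffix=%s\n' "$h" \
        "$(expected_match "$h" scapino.nl)" \
        "$(plain_suffix_match "$h" scapino.nl)"
done
```

Under these two models, only werkenbijscapino.nl is treated differently: the dot-boundary check rejects it, while the plain suffix check accepts it because the name happens to end in "scapino.nl".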
