On Friday, 7 October 2016 15:40:55 CEST Dale R. Worley wrote:
> Tim Ruehsen <tim.rueh...@gmx.de> writes:
> > the changes in recur.c are not acceptable. They circumvent too many checks
> > like host-spanning, excludes and even --https-only.
> 
> I suppose it depends on what you consider the semantics to be.
> Generally, I look at it if I've specified to download http://x/y/z and
> http://x/y/z redirects to http://a/b/c, if http://x/y/z passes the tests
> I've specified, then the page should be downloaded; the fact that it's
> redirected to http://a/b/c is incidental.  Most checks *should* be
> circumvented.
> 
> I guess I'd make exceptions for --https-only, which is presumably
> placing a requirement on *how* the pages should be fetched, and probably
> the robots check, as that's a policy statement by the server.

If you are redirected to another host/domain, wget's policy is not to follow
the redirect unless the user explicitly allows it (--span-hosts or --domains).

Your case is a redirection within the same domain, which my patch considers
to be OK (even if the redirect target contains an explicitly unwanted
path/component). Even that might be dangerous as default behavior, which is
why I want to see some more opinions.
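
To make the rule under discussion concrete, here is a rough sketch of the
decision in Python. It is not the code from the patch or from recur.c; the
function name redirect_allowed and its parameters are made up for
illustration. It assumes that "same domain" means an identical host name and
that either --span-hosts or a matching --domains entry counts as the user
opting in to a foreign host, which is a simplification.

```python
from urllib.parse import urlsplit


def redirect_allowed(original, target, span_hosts=False, domains=(),
                     https_only=False):
    """Decide whether a redirect from `original` to `target` is followed.

    Hypothetical policy sketch only; names and parameters are assumptions.
    """
    src = urlsplit(original)
    dst = urlsplit(target)

    # --https-only restricts how pages are fetched, so it is always checked.
    if https_only and dst.scheme != "https":
        return False

    src_host = (src.hostname or "").lower()
    dst_host = (dst.hostname or "").lower()

    # Same host: follow the redirect, even if the new path would otherwise
    # have been excluded. This is the part that may be risky as a default.
    if src_host == dst_host:
        return True

    # Different host: follow only if the user explicitly opted in.
    if span_hosts:
        return True
    wanted = [d.lower() for d in domains]
    return any(dst_host == d or dst_host.endswith("." + d) for d in wanted)
```

With these assumptions, a redirect from http://x/y/z to http://a/b/c is
refused unless span_hosts is set or "a" matches an entry in domains, while a
redirect from http://x/y/z to http://x/b/c is followed regardless of path
excludes.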

We could add another CLI option for fine-tuning here.

Tim

