Hi,
I have a similar issue. I'm using wget recursively as a link-checking
spider. I don't save the downloaded files, so the -c and -N options
won't help me. What I'd love is for wget to keep a list of the links it
has already followed and skip any link on that list. As it is, I
download 250K links, of which only 70K are unique.
I'm thinking this is a feature request, but if there's a way I can cut
down on the extra downloads today, I'd love to know it.
Here's the command I use:
wget --input-file=spider_pages.html --force-html --no-cache \
     --no-check-certificate --recursive --page-requisites --no-parent \
     -e "robots=off" --delete-after --no-directories \
     --no-host-directories --no-verbose
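One way to cut down on the duplicates today, sketched below, is to deduplicate the link list before handing it to wget, so each URL appears in the input file at most once. This is only a workaround sketch, not a wget feature; the file names and example URLs here are hypothetical.

```shell
#!/bin/sh
# Hypothetical link list with duplicates (stand-in for URLs extracted
# from spider_pages.html).
cat > links.txt <<'EOF'
http://example.com/a
http://example.com/b
http://example.com/a
EOF

# Deduplicate the list; wget then sees each URL only once.
sort -u links.txt > links.unique.txt

# Feed the unique list to wget, e.g.:
#   wget --input-file=links.unique.txt --no-verbose --delete-after ...
```

This doesn't stop wget from rediscovering the same URL during recursion, but it does ensure the seed list itself contains no repeats.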
Thanks
--Allan
Message: 1
Date: Sun, 27 Dec 2009 13:10:25 -0800
From: Micah Cowan <[email protected]>
Subject: Re: [Bug-wget] Prevent wget from redownloading when using
recursive option?
To: David <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
David wrote:
Is there a way to prevent wget from redownloading files it has already
downloaded when using the recursive -r option? I know that -c is used
when resuming a large download, but I wasn't sure whether it could also
be used to accomplish this. It seems that even if it were set not to
download files, it would still have to check that each file had been
completely downloaded. Right now it's hard for me to tell whether this
is its behavior when using -rc, as the individual files are small and
thus do not take long to download (I cannot tell if wget is actually
downloading the full file or just requesting the file's size from the
server and moving on upon seeing that the file is already complete).
I typically use -rc. -rN is also a possibility.
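The two variants mentioned above can be sketched as follows; this is a CLI usage fragment only, and the URL is a stand-in, not from the thread:

```shell
# -c: resume partial files; files already fully retrieved are not refetched
wget -r -c http://example.com/docs/

# -N: timestamping; refetch a file only if the server's copy is newer
wget -r -N http://example.com/docs/
```

Note that neither option helps when --delete-after is in use, since the local copies needed for the comparison are gone.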