Re: [Bug-wget] How to ignore link like "index.html?lang=ja"?

Micah Cowan Sun, 06 Jun 2010 13:54:46 -0700

On 06/06/2010 01:46 PM, Guillaume Turri wrote:
> Tony Lewis wrote:
>> Guillaume Turri wrote:
>>
>>  
>>> In fact, why is this option treated after a download?
>>>     
>>
>> When mirroring, all HTML files have to be downloaded (whether or not it is
>> desired to ultimately keep the HTML file) in order to find all the
>> interesting files. For example:
>>
>> wget http://www.somesite.com/index.html --mirror --accept=pdf
> Indeed. I didn't realise it could be used that way.
> 
> Thank you for this explanation.


Yeah, that was the original thinking. But I still hate it. For one
thing, there are no longer any guarantees that recurse-able HTML files
end in ".html"; for another, it does the wrong thing if you want to do
-r -l1 -A.pdf (just grab all the PDF links from the given page). It's
better to let you explicitly specify which files to download, plus a
separately specified set of files to be deleted afterwards (or, more
accurately, files to download only for parsing/recursion purposes, since
at some point in the future we might not actually download all files
directly to disk just in order to parse them).
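To make the two designs concrete, here is a toy Python model. It is not wget's actual code: the function names, the suffix heuristic, and the (fetch, keep) return convention are all hypothetical. `filter_after_download` models the behavior described above, where anything that looks like HTML is fetched for link extraction and discarded afterwards if it doesn't match the accept list. `split_policy` models the suggestion of separate "keep" and "parse only" sets, skipping parse-only fetches once the depth limit rules out further recursion.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Toy model of the two designs discussed above; all names are hypothetical
# and none of this mirrors wget's actual source.


def path_of(url: str) -> str:
    # The query string (e.g. "?lang=ja") is not part of the URL path.
    return urlparse(url).path


def looks_like_html(url: str) -> bool:
    # Fragile suffix heuristic: guess "recurse-able" from the path alone.
    path = path_of(url)
    return path == "" or path.endswith(("/", ".html", ".htm"))


def filter_after_download(url: str, accept: tuple[str, ...]) -> tuple[bool, bool]:
    """Current design: return (fetch, keep) for a discovered link."""
    keep = path_of(url).endswith(accept)
    # HTML-looking files are fetched anyway so their links can be followed;
    # keep=False then models deleting the file after it has been parsed.
    fetch = keep or looks_like_html(url)
    return fetch, keep


def split_policy(
    url: str,
    keep: tuple[str, ...],
    parse_only: tuple[str, ...],
    depth: int,
    max_depth: int,
) -> tuple[bool, bool]:
    """Proposed design: return (fetch, keep) from two explicit sets."""
    path = path_of(url)
    wanted = path.endswith(keep)
    # A parse-only fetch is pointless once recursion cannot go any deeper.
    parse = path.endswith(parse_only) and depth < max_depth
    return wanted or parse, wanted


if __name__ == "__main__":
    links = [
        "http://www.somesite.com/report.pdf",
        "http://www.somesite.com/about.html",
        "http://www.somesite.com/index.html?lang=ja",
        "http://www.somesite.com/news.php",  # HTML served without ".html"
    ]
    for link in links:
        # Links found on the start page sit at depth 1; "-r -l1" means max_depth=1.
        print(
            link,
            filter_after_download(link, (".pdf",)),
            split_policy(link, (".pdf",), (".html", ".php"), depth=1, max_depth=1),
        )
```

In this model, with an accept list of just ".pdf", the current design fetches about.html and index.html?lang=ja only to throw them away, and never fetches news.php at all because its path doesn't end in ".html". The split policy at the -l1 limit fetches neither page, while report.pdf is kept under both designs.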

-- 
Micah J. Cowan
http://micah.cowan.name/
