On 06/06/2010 01:46 PM, Guillaume Turri wrote:
> Tony Lewis wrote:
>> Guillaume Turri wrote:
>>
>>> In fact, why is this option treated after a download?
>>
>> When mirroring, all HTML files have to be downloaded (whether or not
>> it is desired to ultimately keep the HTML file) in order to find all
>> the interesting files. For example:
>>
>> wget http://www.somesite.com/index.html --mirror --accept=pdf
>
> Indeed. I didn't realise it could be used that way.
>
> Thank you for this explanation.

Yeah, that was the original thinking. But I still hate it. For one
thing, there are no longer any guarantees that recurse-able HTML files
end in ".html"; for another, it does the wrong thing if you want to do
-r -l1 -A.pdf (just grab all the pdf links from the given page).

It's better to let you explicitly specify which files to download, plus
a separately specified set of files to be deleted afterwards (or, more
accurately, files to download only for parsing/recursion purposes, as
at some point in the future we might not actually download all files
directly to disk just in order to parse them).

-- 
Micah J. Cowan
http://micah.cowan.name/
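
The split proposed above (one set of files to keep, a separate set to fetch only for parsing/recursion) could be sketched roughly as follows. This is only an illustration in Python, not wget's code: the function names, the two pattern lists, and the "keep" / "parse-only" / "skip" labels are all hypothetical, and wget's real accept matching differs in its details.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse
import posixpath


def matches(url, patterns):
    """True if the last path component of the URL matches any shell-style pattern."""
    name = posixpath.basename(urlparse(url).path)
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def classify(url, keep_patterns, parse_only_patterns):
    """Decide what to do with a URL under the two-list scheme (hypothetical).

    "keep"       -> download and leave on disk
    "parse-only" -> fetch just to extract links, never kept
    "skip"       -> do not fetch at all
    """
    if matches(url, keep_patterns):
        return "keep"
    if matches(url, parse_only_patterns):
        return "parse-only"
    return "skip"


if __name__ == "__main__":
    keep = ["*.pdf"]
    parse_only = ["*.html", "*.php"]  # assumption: the user names these explicitly
    for url in (
        "http://www.somesite.com/index.html",
        "http://www.somesite.com/docs/manual.pdf",
        "http://www.somesite.com/img/logo.png",
    ):
        print(url, "->", classify(url, keep, parse_only))
```

Because the parse-only list is given explicitly, pages that do not end in ".html" can still be followed, and nothing has to be inferred from the keep list.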
