Re: trouble with -p
On Sun, 20 Jul 2008 23:08:56 +0200, Matthias Vill wrote:
> Brian Keck wrote:
>> If you do
>>
>>   wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>>
>> then you get an HTML file called iPhone3G. But if you do
>>
>>   wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>>
>> then you get a directory called iPhone3G.
>> ...
>> But of course I want both. Is there a way of getting wget -p to do
>> something clever, like renaming the HTML file?
>
> ... maybe this helps: --html-extension

That's what I was hoping for. At least it works for the above. (It also renames diggthis.js to diggthis.js.html, but I don't care about that.)

Thanks,
Brian Keck
Re: trouble with -p
On Sat, 19 Jul 2008 10:26:25 MST, Micah Cowan wrote:
> Brian Keck wrote:
>> If you do
>>
>>   wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>>
>> then you get an HTML file called iPhone3G. But if you do
>>
>>   wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>>
>> then you get a directory called iPhone3G.
>> ...
>
> If you specify the link with a trailing slash, then Wget will realize
> iPhone3G is a directory, and will store the file it finds there as
> iPhone3G/index.html.
> ...

I should have thought of adding a trailing slash ... it works in this case.

Thanks,
Brian Keck
trouble with -p
Hello,

If you do

  wget http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get an HTML file called iPhone3G. But if you do

  wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get a directory called iPhone3G.

This makes sense if you look at the links in the HTML file, like

  /Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg

But of course I want both. Is there a way of getting wget -p to do something clever, like renaming the HTML file?

I've looked through wget(1), /usr/share/doc/wget, and the comments in the 1.10.2 source without seeing anything relevant.

Thanks,
Brian Keck
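The clash described above can be sketched in a few lines of Python. This is only an illustration: `local_path` and `clashes` are hypothetical helpers, and the URL-to-path mapping is a simplifying assumption (it ignores the host directory and is not wget's actual code). It shows that the page URL and the image URL want the same name, once as a file and once as a directory, and that a trailing slash on the page URL removes the conflict.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path(url: str) -> PurePosixPath:
    """Hypothetical URL-to-file mapping (an assumption, not wget's real logic)."""
    path = urlsplit(url).path
    if path.endswith("/"):
        # A URL ending in "/" is treated as a directory with an index file.
        path += "index.html"
    return PurePosixPath(path.lstrip("/"))


def clashes(urls):
    """Return the paths that are needed both as a file and as a directory."""
    paths = [local_path(u) for u in urls]
    files = set(paths)
    return sorted(
        {str(parent) for p in paths for parent in p.parents if parent in files}
    )


page = "http://www.ifixit.com/Guide/First-Look/iPhone3G"
image = page + "/images/3jYKHyIVrAHnG4Br-standard.jpg"

print(clashes([page, image]))        # the page name is also a directory
print(clashes([page + "/", image]))  # with a trailing slash: no clash
```

The first call reports Guide/First-Look/iPhone3G as needed in both roles; the second reports nothing, because the page would be stored as iPhone3G/index.html.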
bug in escaped filename calculation?
Hello,

I'm wondering if I've found a bug in the excellent wget. I'm not asking for help, because it turned out not to be the reason one of my scripts was failing.

The possible bug is in the derivation of the filename from a URL which contains UTF-8. The case is:

  wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk

Of course these are all ASCII characters, but underlying them are 3 non-ASCII characters, whose UTF-8 encodings are:

  hex    octal     name
  ----   -------   ---------
  C387   303 207   C-cedilla
  C3B6   303 266   o-umlaut
  C3BC   303 274   u-umlaut

The file created has a name that's almost, but not quite, a valid UTF-8 bytestring ...

  ls *y*k | od -tc
  000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n

I.e. the o-umlaut and u-umlaut UTF-8 encodings occur in the bytestring, but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the 3-byte string %87. I'm guessing this is not intended.

I would have sent a fix too, but after finding my way through http.c and retr.c I got lost in url.c.

Brian Keck
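For anyone who wants to check the byte values, here is a minimal Python sketch. It percent-decodes the last path component of the URL and compares the result with the file name bytes from the od output above. The `observed` value is transcribed by hand from that output, and `octal_dump` and `is_valid_utf8` are helper names made up for this example.

```python
from urllib.parse import unquote_to_bytes

# Last path component of the URL, still percent-encoded.
encoded = "%C3%87atalh%C3%B6y%C3%BCk"

# What full percent-decoding gives: the raw UTF-8 bytes of the page title.
expected = unquote_to_bytes(encoded)

# File name bytes transcribed from the od output (without the trailing newline).
observed = b"\xc3%87atalh\xc3\xb6y\xc3\xbck"


def octal_dump(data: bytes) -> str:
    """Render bytes roughly as od -tc does: printable ASCII as-is, others in octal."""
    return " ".join(
        chr(b) if 0x20 <= b < 0x7F else format(b, "03o") for b in data
    )


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(octal_dump(expected))
print(octal_dump(observed))
print(is_valid_utf8(expected), is_valid_utf8(observed))
```

The only difference between the two byte strings is the second byte of C-cedilla, 0x87 (octal 207). That value lies in the range 128-159, which the wget manual's description of --restrict-file-names lists among the control characters escaped by default; whether that is what happens here is a guess, not something confirmed in this thread.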
Re: trouble with -p
On Sun, 12 Aug 2007 19:44:36 MST, Micah Cowan wrote:
> Brian Keck wrote:
>> Sometimes -p doesn't work. For instance:
>> ...
>
> You want the -H option.

Thanks, so I do,
Brian Keck
trouble with -p
Hello,

Sometimes -p doesn't work. For instance:

  wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object

This fetches several images from en.wikipedia.org, but none of the several images from upload.wikimedia.org.

Is this normal behaviour? There's some javascript, but it looks harmless. The source in the neighbourhood of one of the omitted images (tidied up a bit) is ...

  <div class="thumbinner" style="width:152px;">
  <a href="http://en.wikipedia.org/wiki/Image:HH_object_diagram.svg"
     class="internal"
     title="Schematic diagram of how HH objects arise">
  <img alt="Schematic diagram of how HH objects arise"
       longdesc="/wiki/Image:HH_object_diagram.svg"
       class="thumbimage"
       src="http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/HH_object_diagram.svg/150px-HH_object_diagram.svg.png"
       width="150" height="300" />
  </a>

I'm using wget 1.10.2 on debian unstable.

Thanks for any help,
Brian Keck
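The reply above points to the -H option, which lets wget go to other hosts. A small Python sketch makes the host mismatch visible: it compares the host of the page with the host of the omitted image. `same_host` is a hypothetical helper written for this illustration; it is not how wget itself decides what to fetch.

```python
from urllib.parse import urlsplit

page = "http://en.wikipedia.org/wiki/Herbig-Haro_object"

# The src attribute of the omitted image, taken from the HTML quoted above.
image = (
    "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/"
    "HH_object_diagram.svg/150px-HH_object_diagram.svg.png"
)


def same_host(a: str, b: str) -> bool:
    """True if both URLs name the same host (illustrative check only)."""
    return urlsplit(a).hostname == urlsplit(b).hostname


print(urlsplit(page).hostname)   # host of the page
print(urlsplit(image).hostname)  # host of the image
print(same_host(page, image))    # False: the image lives on another host
```

Because the image is served from upload.wikimedia.org rather than en.wikipedia.org, it is skipped unless host spanning is enabled, as in wget -p -H.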