Re: trouble with -p

2008-07-24 Thread Brian Keck

On Sun, 20 Jul 2008 23:08:56 +0200, Matthias Vill wrote:
> Brian Keck wrote:
>> If you do
>> wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get an HTML file called iPhone3G.
>> But if you do
>> wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get a directory called iPhone3G.
>> ...
>> But of course I want both.  Is there a way of getting wget -p to do
>> something clever, like renaming the HTML file?
>> ...
> maybe this helps:
> --html-extension

That's what I was hoping for.

At least it works for the above.

(It also renames diggthis.js to diggthis.js.html, but I don't care about
that).

Thanks,
Brian Keck


Re: trouble with -p

2008-07-23 Thread Brian Keck

On Sat, 19 Jul 2008 10:26:25 MST, Micah Cowan wrote:
> Brian Keck wrote:
>> If you do
>> wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get an HTML file called iPhone3G.
>> But if you do
>> wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get a directory called iPhone3G.
>> ...
> If you specify the link with a trailing slash, then Wget will realize
> iPhone3G is a directory, and will store the file it finds there as
> iPhone3G/index.html.
> ...

I should have thought of adding a trailing slash ... it works in this
case.
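
A rough model of why the trailing slash helps, as a Python sketch. The
local_path and clashes helpers are hypothetical and only imitate the naming
described in this thread (host directory, then URL path, with index.html for
directory URLs); they are not wget's actual code. The image URL is built from
the link quoted in the original report.

```python
from urllib.parse import urlparse

def local_path(url: str) -> str:
    """Hypothetical, simplified URL-to-file mapping (an assumption, not wget's code)."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Per the reply: a directory URL is stored as index.html inside it.
        path += "index.html"
    return parts.hostname + path

def clashes(paths):
    """Pairs (a, b) where a is saved as a file but b needs a to be a directory."""
    return [(a, b) for a in paths for b in paths if b.startswith(a + "/")]

page = "http://www.ifixit.com/Guide/First-Look/iPhone3G"
image = ("http://www.ifixit.com/Guide/First-Look/iPhone3G"
         "/images/3jYKHyIVrAHnG4Br-standard.jpg")

# Without the slash, iPhone3G must be both a file and a directory.
print(len(clashes([local_path(page), local_path(image)])))        # 1
# With the slash, the page becomes iPhone3G/index.html and nothing collides.
print(len(clashes([local_path(page + "/"), local_path(image)])))  # 0
```

In this model the clash comes purely from the page and its images sharing the
path prefix iPhone3G, which matches the "file or directory" symptom reported.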

Thanks,
Brian Keck



trouble with -p

2008-07-19 Thread Brian Keck
Hello,

If you do

wget http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get an HTML file called iPhone3G.

But if you do

wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get a directory called iPhone3G.  

This makes sense if you look at the links in the HTML file, like

/Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg

But of course I want both.  Is there a way of getting wget -p to do
something clever, like renaming the HTML file?  I've looked through
wget(1), /usr/share/doc/wget and the comments in the 1.10.2 source
without seeing anything relevant.

Thanks,
Brian Keck


bug in escaped filename calculation?

2007-10-04 Thread Brian Keck

Hello,

I'm wondering if I've found a bug in the excellent wget.
I'm not asking for help, because it turned out not to be the reason
one of my scripts was failing.

The possible bug is in the derivation of the filename from a URL which
contains UTF-8.

The case is:

  wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk

Of course these are all ascii characters, but underlying it are
3 nonascii characters, whose UTF-8 encoding is:

  hex   octal    name
  ----  -------  ---------
  C387  303 207  C-cedilla
  C3B6  303 266  o-umlaut
  C3BC  303 274  u-umlaut

The file created has a name that's almost, but not quite, a valid UTF-8
bytestring ... 

  ls *y*k | od -tc
  000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n

I.e. the o-umlaut & u-umlaut UTF-8 encodings occur in the bytestring,
but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
3-byte string %87.
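
The byte-level claim can be checked with a short Python sketch. It decodes the
last path segment of the URL and compares it with the bytes shown by od; the
observed value is typed in by hand from that od output, and is_valid_utf8 is
a helper name made up for this illustration.

```python
from urllib.parse import unquote_to_bytes

# Last path segment of the URL from the report.
segment = "%C3%87atalh%C3%B6y%C3%BCk"

# Fully percent-decoded bytes: what a valid UTF-8 filename would contain.
decoded = unquote_to_bytes(segment)

# Bytes as reported by od: 0xC3 is raw, but 0x87 stayed as the text "%87".
observed = b"\xc3%87atalh\xc3\xb6y\xc3\xbck"

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(decoded.decode("utf-8"))   # Çatalhöyük
print(is_valid_utf8(decoded))    # True
print(is_valid_utf8(observed))   # False

# Hex and octal for each of the three non-ASCII characters.
for ch in "Çöü":
    raw = ch.encode("utf-8")
    print(raw.hex().upper(), " ".join(f"{b:03o}" for b in raw))
```

Note that 0x87 lies in the range 0x80-0x9F while 0xB6 and 0xBC do not. The
wget manual says --restrict-file-names escapes bytes 128-159 by default on
Unix, which may be what is happening here, though nothing in this message
confirms it.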

I'm guessing this is not intended.  

I would have sent a fix too, but after finding my way through http.c &
retr.c I got lost in url.c.

Brian Keck


Re: trouble with -p

2007-08-13 Thread Brian Keck
On Sun, 12 Aug 2007 19:44:36 MST, Micah Cowan wrote:
> Brian Keck wrote:
>> Sometimes -p doesn't work.  For instance:
>> ...
> You want the -H option.

Thanks, so I do,
Brian Keck


trouble with -p

2007-08-12 Thread Brian Keck

Hello,

Sometimes -p doesn't work.  For instance:

wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object

This fetches several images from en.wikipedia.org, but none of the
several images from upload.wikimedia.org.

Is this normal behaviour?

There's some javascript, but it looks harmless.  The source in the neighbourhood
of one of the omitted images (tidied up a bit) is ...

<div class="thumbinner" style="width:152px;">
<a href="http://en.wikipedia.org/wiki/Image:HH_object_diagram.svg"
  class="internal" title="Schematic diagram of how HH objects arise">
<img alt="Schematic diagram of how HH objects arise"
  longdesc="/wiki/Image:HH_object_diagram.svg" class="thumbimage"
  src="http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/HH_object_diagram.svg/150px-HH_object_diagram.svg.png"
  width="150" height="300" />
</a>
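
The page and the omitted image are on different hosts, which is the case
wget's -H (--span-hosts) option covers. A minimal Python sketch of that
comparison, assuming a plain hostname equality check; same_host is a
hypothetical helper for illustration, not wget's own logic.

```python
from urllib.parse import urlparse

page = "http://en.wikipedia.org/wiki/Herbig-Haro_object"
link = "http://en.wikipedia.org/wiki/Image:HH_object_diagram.svg"
image = ("http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/"
         "HH_object_diagram.svg/150px-HH_object_diagram.svg.png")

def same_host(a: str, b: str) -> bool:
    """Hypothetical check: do two URLs share exactly the same hostname?"""
    return urlparse(a).hostname == urlparse(b).hostname

print(same_host(page, link))   # True:  both on en.wikipedia.org
print(same_host(page, image))  # False: image is on upload.wikimedia.org
```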

I'm using wget 1.10.2 on debian unstable.

Thanks for any help,
Brian Keck