trouble with -p

2008-07-19 Thread Brian Keck
Hello,

If you do

wget http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get an HTML file called iPhone3G.

But if you do

wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get a directory called iPhone3G.  

This makes sense if you look at the links in the HTML file, like

/Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg

But of course I want both.  Is there a way of getting wget -p to do
something clever, like renaming the HTML file?  I've looked through
wget(1), /usr/share/doc/wget, and the comments in the 1.10.2 source
without seeing anything relevant.

Thanks,
Brian Keck


Re: trouble with -p

2008-07-19 Thread Micah Cowan

Brian Keck wrote:
> Hello,
>
> If you do
>
> wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>
> then you get an HTML file called iPhone3G.
>
> But if you do
>
> wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>
> then you get a directory called iPhone3G.
>
> This makes sense if you look at the links in the HTML file, like
>
> /Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg
>
> But of course I want both.  Is there a way of getting wget -p to do
> something clever, like renaming the HTML file?  I've looked through
> wget(1), /usr/share/doc/wget, and the comments in the 1.10.2 source
> without seeing anything relevant.

That strikes me as not quite right. If Wget sees
http://www.ifixit.com/Guide/First-Look/iPhone3G, and it's not redirected
to http://www.ifixit.com/Guide/First-Look/iPhone3G/, then Wget will use
a file name. What's more, if it later sees it with the slash, it will
fail to create a directory at all, since the file already exists with
that pathname.

I'm not sure what you mean by "I want both". You can't possibly have a
regular file named iPhone3G and another file named iPhone3G/images/...;
it can't be both a file and a directory at once.

If you specify the link with a trailing slash, then Wget will realize
iPhone3G is a directory, and will store the file it finds there as
iPhone3G/index.html. You're out of luck, though, if some links refer to
it with the trailing slash and some without, on a server that doesn't
redirect to the slash version (as Apache does).
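
The file-versus-directory conflict can be reproduced without Wget at
all. The following is a minimal Python sketch, not anything Wget itself
runs; the function name file_then_directory is made up for illustration,
and the exact exception type raised depends on the platform:

```python
import os
import tempfile


def file_then_directory(root):
    """Save a page as a regular file, then try to create a directory
    with the same name. Returns the name of the error raised, or None
    if (unexpectedly) no error occurred."""
    page = os.path.join(root, "iPhone3G")
    with open(page, "w") as f:
        f.write("<html></html>")
    try:
        # This needs iPhone3G to be a directory, but it is already a file.
        os.makedirs(os.path.join(page, "images"))
    except OSError as exc:
        return type(exc).__name__
    return None


with tempfile.TemporaryDirectory() as root:
    print(file_then_directory(root))
```

Whichever name gets claimed first wins, which is why the order in which
the two forms of the URL are seen matters.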

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: trouble with -p

2008-07-19 Thread James Cloos
>>>>> "Micah" == Micah Cowan <[EMAIL PROTECTED]> writes:

Micah> I'm not sure what you mean by "I want both".

He means that, when the -p option is given, he wants to mangle either
the created filename or the created directory name so that both do in
fact get created on the filesystem and all related files get saved.

Perhaps delaying the initial open(2) until after parsing the first
document and then pretending that the initial URL had a trailing
solidus might work?
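
As a rough illustration of that idea (a hypothetical sketch only; Wget
does not work this way, and the function name needs_directory is an
assumption made up here), the check after parsing the first document
could look something like this in Python:

```python
from urllib.parse import urljoin, urlsplit


def needs_directory(start_url, links):
    """Return True if any link found in the first document resolves to
    a path beneath start_url's own path on the same host, which would
    mean the start URL has to be stored as a directory."""
    base = urlsplit(start_url)
    prefix = base.path.rstrip("/") + "/"
    for link in links:
        target = urlsplit(urljoin(start_url, link))
        if (target.netloc == base.netloc
                and target.path.startswith(prefix)
                and target.path != prefix):
            return True
    return False


start = "http://www.ifixit.com/Guide/First-Look/iPhone3G"
links = ["/Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg"]
print(needs_directory(start, links))
```

This only looks at links in the first document, so it would miss a
conflict that first shows up in a page fetched later.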

-JimC
-- 
James Cloos <[EMAIL PROTECTED]>         OpenPGP: 1024D/ED7DAEA6


Re: trouble with -p

2008-07-19 Thread Micah Cowan

James Cloos wrote:
> >>>>> "Micah" == Micah Cowan <[EMAIL PROTECTED]> writes:
>
> Micah> I'm not sure what you mean by "I want both".
>
> He means that, when the -p option is given, he wants to mangle either
> the created filename or the created directory name so that both do in
> fact get created on the filesystem and all related files get saved.
>
> Perhaps delaying the initial open(2) until after parsing the first
> document and then pretending that the initial URL had a trailing
> solidus might work?

Not possible with the current architecture. And it wouldn't solve the
problem if the URL happens not to appear with the trailing solidus in
the links contained in that first document.

https://savannah.gnu.org/bugs/index.php?23756 covers my solution for
handling this.

The easy workaround for now would be to supply the URL with the solidus
in the first place, though, as mentioned, I'm not sure that will work if
Wget later encounters a version without the solidus.
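
To make the effect of the trailing solidus concrete, here is a
simplified Python model of the URL-to-path mapping described in this
thread. It is a sketch under stated assumptions, not Wget's actual code:
the function name local_path is hypothetical, and the host-name
directory prefix is assumed to match the default layout used with -p.

```python
from urllib.parse import urlsplit


def local_path(url, default_page="index.html"):
    """Simplified model (not Wget's real logic) of where a URL is saved:
    a path ending in "/" is treated as a directory and the document is
    stored inside it as index.html; anything else is a plain file."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += default_page
    # Assumption: files are stored under a directory named after the host.
    return parts.netloc + path


print(local_path("http://www.ifixit.com/Guide/First-Look/iPhone3G"))
# www.ifixit.com/Guide/First-Look/iPhone3G
print(local_path("http://www.ifixit.com/Guide/First-Look/iPhone3G/"))
# www.ifixit.com/Guide/First-Look/iPhone3G/index.html
```

With the solidus, the page and the images under iPhone3G/images/ can
coexist, because iPhone3G is only ever used as a directory.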

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/