You're right - this was raised before, and in fact it was a feature Mauro
Tortonesi intended to implement for the 1.12 release, but it seems to have
been forgotten somewhere along the line. I wrote to the list in 2006 describing
what I consider a compelling reason to support file://. Here is what I wrote:
At 03:45 PM 26/06/2006, David wrote:
In replies to the post requesting support of the "file://" scheme, requests
were made for someone to provide a compelling reason to want to do this.
Perhaps the following is such a reason.
I have a CD with HTML content (it is a CD of abstracts from a scientific
conference), however for space reasons not all the content was included on the
CD - there remain links to figures and diagrams on a remote web site. I'd like
to create an archive of the complete content locally by having wget retrieve
everything and convert the links to point to the retrieved material. Thus the
wget functionality when retrieving the local files should work the same as if
the files were retrieved from a web server (i.e. the input local file needs to
be processed, both local and remote content retrieved, and the copies made of
the local and remote files all need to be adjusted to now refer to the local
copy rather than the remote content). A simple shell script that runs cp or
rsync on local files without any further processing would not achieve this aim.
Regarding where the local files should be copied, I suggest a default scheme
similar to current http functionality. For example, if the local source was
/source/index.htm, and I ran something like:
wget.exe -m -np -k file:///source/index.htm
this could be retrieved to ./source/index.htm (assuming that I ran the command
from anywhere other than the root directory). On Windows, if the local source
file is c:\test.htm, then the destination could be .\c\test.htm. It would
probably be fair enough for wget to throw an error if the source and
destination were the same file (and perhaps helpfully suggest that the user
change into a new subdirectory and retry the command).
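The mapping suggested above can be sketched as follows. This is an illustrative helper, not actual wget code; the function name and the drive-letter handling are assumptions based on the examples in this message.

```python
import os
from urllib.parse import urlparse, unquote

def file_url_to_dest(url):
    """Map a file:// URL to a destination under the current directory,
    mirroring wget's http:// mirroring layout (hypothetical helper)."""
    path = unquote(urlparse(url).path)
    # A Windows file URL like file:///c:/test.htm yields the path
    # "/c:/test.htm"; turn the drive letter into a top-level
    # directory so the copy lands in .\c\test.htm
    if len(path) >= 3 and path[0] == "/" and path[2] == ":":
        path = "/" + path[1] + path[3:]
    # Strip the leading slash so the file is saved relative to
    # the directory wget was run from
    return os.path.join(".", path.lstrip("/"))

print(file_url_to_dest("file:///source/index.htm"))  # ./source/index.htm
print(file_url_to_dest("file:///c:/test.htm"))       # ./c/test.htm
```

As in the http:// case, running this from the root of the source tree itself would make source and destination collide, which is where the error suggested above would apply.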
One additional problem this scheme needs to deal with is when one or more /../
in the path specification results in the destination being above the current
parent directory; then the destination would have to be adjusted to ensure the
file remained within the parent directory structure. For example, if I am in
/dir/dest/ and ran
wget.exe -m -np -k file://../../source/index.htm
this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm)
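The adjustment described above amounts to resolving each path component and discarding any leading ".." that would climb out of the destination tree. A minimal sketch, with an assumed function name:

```python
def clamp_to_dest(rel_path):
    """Resolve ../ components in a relative source path, dropping any
    that would escape the current directory, so the saved copy always
    stays inside the destination tree (illustrative sketch)."""
    parts = []
    for part in rel_path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # pop a component if we can; a leading ".." with nothing
            # left to pop is simply discarded
            if parts:
                parts.pop()
        else:
            parts.append(part)
    return "./" + "/".join(parts)

print(clamp_to_dest("../../source/index.htm"))  # ./source/index.htm
```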
At 08:49 AM 3/09/2008, you wrote:
Petri Koistinen wrote:
> It would be nice if wget would also support file://.
Feel free to file an issue for this (I'll mark it "Needs Discussion" and
set at low priority). I'd thought there was already an issue for this,
but can't find it (either open or closed). I know this has come up
before, at least.
I think I'd need some convincing on this, as well as a clear definition
of what the scope for such a feature ought to be. Unlike curl, which
"groks urls", Wget "W(eb)-gets", and file:// can't really be argued to
be part of the web.
That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.
Even the system "cp" command is likely to handle things much better than
Wget. In particular, special OS-specific, extended file attributes,
extended permissions and the like, are among the things that existing
system tools probably handle quite well, and that Wget is unlikely to. I
don't really want Wget to be in the business of duplicating the system
"cp" command, but I might conceivably not mind "file://" support if it
means simple _content_ transfer, and not actual file duplication.
Also in need of addressing is what "recursion" should mean for file://.
Between ftp:// and http://, "recursion" currently means different
things. In FTP, it means "traverse the file hierarchy recursively",
whereas in HTTP it means "traverse links recursively". I'm guessing
file:// should work like FTP (i.e., recurse when the path is a
directory, ignore HTML-ness), but anyway this is something that'd need
to be worked out.
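The FTP-style semantics guessed at above - recurse when the path is a directory, fetch a single file otherwise, with no link-following - could be sketched like this (an assumption about the proposed behaviour, not actual wget code):

```python
import os

def recurse_like_ftp(path):
    """Yield every file to fetch for a file:// path under FTP-style
    recursion: walk directories recursively, ignore HTML links
    entirely, and treat a plain file as a single retrieval."""
    if os.path.isdir(path):
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in filenames:
                yield os.path.join(dirpath, name)
    else:
        yield path
```

Under HTTP-style semantics the walk would instead be driven by parsing links out of each retrieved HTML file, which is exactly the difference in question.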
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq