Re: Support for file://

2008-09-27 Thread Petr Pisar

Michelle Konzack wrote:

Am 2008-09-20 22:05:35, schrieb Micah Cowan:

I'm confused. If you can successfully download the files from
HOSTINGPROVIDER in the first place, then why would a difference exist?
And if you can't, then this wouldn't be an effective way to find out.


I mean, IF you have a local (master) mirror and your website @ISP and
you want to know whether the two websites are identical and have no
cruft in them, you can


I didn't follow this thread closely; however, just FYI, there is an
excellent (not only) FTP client called lftp that has a built-in mirror
command. The command has a similar effect to the rsync tool, i.e. it
synchronizes remote and local directories recursively.
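
For example, a minimal sketch of such a mirror run (user, host and the
remote path here are only placeholders):

  # pull the remote directory into a local directory "localcopy"
  lftp -e 'mirror --verbose /public_html localcopy; quit' ftp://user@website.isp.tld/

  # or the reverse: push the local directory up to the server
  lftp -e 'mirror --reverse --verbose localcopy /public_html; quit' ftp://user@website.isp.tld/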


-- Petr






Re: Support for file://

2008-09-26 Thread Michelle Konzack
Am 2008-09-20 22:05:35, schrieb Micah Cowan:
 I'm confused. If you can successfully download the files from
 HOSTINGPROVIDER in the first place, then why would a difference exist?
 And if you can't, then this wouldn't be an effective way to find out.

I mean, IF you have a local (master) mirror and your website @ISP and
you want to know whether the two websites are identical and have no
cruft in them, you can

  1)  fetch the website from your ISP recursively with
  wget -r -nH -P /tmp/tmp_ISP http://website.isp.tld/

  2)  fetch the local mirror with
  wget -r -nH -P /tmp/tmp_LOC file:///path/to/local/mirror/

where the path in 2) refers to the same content as the website in 1), and
then compare the results with

  3)  /path/to/local/mirror/

If you have edited the files locally and remotely, you can get surprising
results.
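
Put together, the whole check would look roughly like this (a sketch only:
file:// support is exactly the feature under discussion here, and
/tmp/tmp_ISP and /tmp/tmp_LOC are just scratch directories):

  # 1) fetch the live site from the ISP
  wget -r -nH -P /tmp/tmp_ISP http://website.isp.tld/

  # 2) fetch the local master mirror (hypothetical file:// support)
  wget -r -nH -P /tmp/tmp_LOC file:///path/to/local/mirror/

  # 3) compare the trees recursively (either against each other or
  #    against the mirror itself)
  diff -r /tmp/tmp_ISP /tmp/tmp_LOC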

Fetching /index.html recursively means that ALL files which are mentioned
in ANY HTML file are downloaded.  So if 1) differs from

ftp://website.isp.tld/

then there is something wrong with the site...


Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917                  ICQ #328449886
+49/177/9351947    50, rue de Soultz         MSN LinuxMichi
+33/6/61925193     67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: Support for file://

2008-09-22 Thread David

Hi Micah,

You're right - this was raised before, and in fact it was a feature Mauro 
Tortonesi intended to implement for the 1.12 release, but it seems to have 
been forgotten somewhere along the line. I wrote to the list in 2006 describing 
what I consider a compelling reason to support file://. Here is what I wrote 
then:

At 03:45 PM 26/06/2006, David wrote:
In replies to the post requesting support of the file:// scheme, requests 
were made for someone to provide a compelling reason to want to do this. 
Perhaps the following is such a reason.
I have a CD with HTML content (it is a CD of abstracts from a scientific 
conference), however for space reasons not all the content was included on the 
CD - there remain links to figures and diagrams on a remote web site. I'd like 
to create an archive of the complete content locally by having wget retrieve 
everything and convert the links to point to the retrieved material. Thus the 
wget functionality when retrieving the local files should work the same as if 
the files were retrieved from a web server (i.e. the input local file needs to 
be processed, both local and remote content retrieved, and the copies made of 
the local and remote files all need to be adjusted to now refer to the local 
copy rather than the remote content). A simple shell script that runs cp or 
rsync on local files without any further processing would not achieve this aim.
Regarding where the local files should be copied, I suggest a default scheme 
similar to the current HTTP functionality. For example, if the local source was 
/source/index.htm and I ran something like:
   wget.exe -m -np -k file:///source/index.htm
this could be retrieved to ./source/index.htm (assuming that I ran the command 
from anywhere other than the root directory). On Windows, if the local source 
file is c:\test.htm, then the destination could be .\c\test.htm. It would 
probably be fair enough for wget to throw up an error if the source and 
destination were the same file (and perhaps helpfully suggest that the user 
change into a new subdirectory and retry the command).
One additional problem this scheme needs to deal with is when one or more /../ 
in the path specification results in the destination being above the current 
parent directory; then  the destination would have to be adjusted to ensure the 
file remained within the parent directory structure. For example, if I am in 
/dir/dest/ and ran
   wget.exe -m -np -k file://../../source/index.htm
this could be saved to ./source/index.htm  (i.e. /dir/dest/source/index.htm)
-David. 
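
To illustrate just the proposed path mapping (wget itself cannot do any of
this today), a rough shell sketch with a hard-coded example URL:

  # map a file:// URL to a destination below the current directory,
  # following the scheme described above
  url='file:///source/index.htm'
  path=${url#file://}      # strip the scheme   -> /source/index.htm
  dest=./${path#/}         # drop the leading / -> ./source/index.htm
  echo "$dest"             # prints ./source/index.htm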


At 08:49 AM 3/09/2008, you wrote:


Petri Koistinen wrote:
 Hi,
 
 It would be nice if wget would also support file://.

Feel free to file an issue for this (I'll mark it Needs Discussion and
set at low priority). I'd thought there was already an issue for this,
but can't find it (either open or closed). I know this has come up
before, at least.

I think I'd need some convincing on this, as well as a clear definition
of what the scope for such a feature ought to be. Unlike curl, which
groks urls, Wget W(eb)-gets, and file:// can't really be argued to
be part of the web.

That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.

Even the system cp command is likely to handle things much better than
Wget. In particular, special OS-specific, extended file attributes,
extended permissions and the like, are among the things that existing
system tools probably handle quite well, and that Wget is unlikely to. I
don't really want Wget to be in the business of duplicating the system
cp command, but I might conceivably not mind file:// support if it
means simple _content_ transfer, and not actual file duplication.

Also in need of addressing is what recursion should mean for file://.
Between ftp:// and http://, recursion currently means different
things. In FTP, it means traverse the file hierarchy recursively,
whereas in HTTP it means traverse links recursively. I'm guessing
file:// should work like FTP (i.e., recurse when the path is a
directory, ignore HTML-ness), but anyway this is something that'd need
answering.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/



Re: Support for file://

2008-09-22 Thread Micah Cowan

David wrote:
 
 Hi Micah,
 
 You're right - this was raised before, and in fact it was a feature
 Mauro Tortonesi intended to implement for the 1.12 release, but it
 seems to have been forgotten somewhere along the line. I wrote to the
 list in 2006 describing what I consider a compelling reason to support
 file://. Here is what I wrote then:
 
 At 03:45 PM 26/06/2006, David wrote:
 In replies to the post requesting support of the file:// scheme,
 requests were made for someone to provide a compelling reason to want to
 do this. Perhaps the following is such a reason.
 I have a CD with HTML content (it is a CD of abstracts from a scientific
 conference), however for space reasons not all the content was included
 on the CD - there remain links to figures and diagrams on a remote web
 site. I'd like to create an archive of the complete content locally by
 having wget retrieve everything and convert the links to point to the
 retrieved material. Thus the wget functionality when retrieving the
 local files should work the same as if the files were retrieved from a
 web server (i.e. the input local file needs to be processed, both local
 and remote content retrieved, and the copies made of the local and
 remote files all need to be adjusted to now refer to the local copy
 rather than the remote content). A simple shell script that runs cp or
 rsync on local files without any further processing would not achieve
 this aim.

Fair enough. This example at least makes sense to me. I suppose it can't
hurt to provide this, so long as we document clearly that it is not a
replacement for cp or rsync, and is never intended to be (won't handle
attributes and special file properties).

However, support for file:// will introduce security issues, so care is needed.

For instance, file:// should never be respected when it comes from the
web. Even on the local machine, it could be problematic to use it on
files writable by other users (as they can then craft links to download
privileged files with upgraded permissions). Perhaps files that are only
readable for root should always be skipped, or wget should require a
--force sort of option if the current mode can result in more
permissive settings on the downloaded file.

Perhaps it would be wise to make this a configurable option. It might
also be prudent to enable an option for file:// to be disallowed for root.

https://savannah.gnu.org/bugs/?24347

If any of you can think of additional security issues that will need
consideration, please add them in comments to the report.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: Support for file://

2008-09-20 Thread Michelle Konzack
Hello Micah,

Am 2008-09-02 15:49:15, schrieb Micah Cowan:
 I think I'd need some convincing on this, as well as a clear definition
 of what the scope for such a feature ought to be. Unlike curl, which
 groks urls, Wget W(eb)-gets, and file:// can't really be argued to
 be part of the web.

Right but...

 That in and of itself isn't really a reason not to support it, but my
 real misgivings have to do with the existence of various excellent tools
 that already do local-file transfers, and likely do it _much_ better
 than Wget could hope to. Rsync springs readily to mind.
 
 Even the system cp command is likely to handle things much better than
 Wget. In particular, special OS-specific, extended file attributes,
 extended permissions and the like, are among the things that existing
 system tools probably handle quite well, and that Wget is unlikely to. I
 don't really want Wget to be in the business of duplicating the system
 cp command, but I might conceivably not mind file:// support if it
 means simple _content_ transfer, and not actual file duplication.
 
 Also in need of addressing is what recursion should mean for file://.
 Between ftp:// and http://, recursion currently means different
 things. In FTP, it means traverse the file hierarchy recursively,
 whereas in HTTP it means traverse links recursively. I'm guessing
 file:// should work like FTP (i.e., recurse when the path is a
 directory, ignore HTML-ness), but anyway this is something that'd need
 answering.

Imagine you have a local mirror of your website and you want to know why
the site @HOSTINGPROVIDER has some extra files or the like.

You can spider the website @HOSTINGPROVIDER recursively into a local tmp1
directory and then, with the same command line, do the same with the local
mirror, downloading the files recursively into tmp2. Now you can make a
recursive fs-diff and know which files are used on both the local mirror
and @HOSTINGPROVIDER.

I have looked for such a feature several times, and currently the only way
is to install a webserver locally, which is not always possible.
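
(For reference, that workaround looks roughly like this, assuming Python 2's
SimpleHTTPServer module is available and port 8000 is free:)

  # serve the local mirror over HTTP with a throwaway webserver ...
  cd /path/to/local/mirror && python -m SimpleHTTPServer 8000 &

  # ... and then spider it exactly like the remote site
  wget -r -nH -P /tmp/tmp_LOC http://localhost:8000/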

Maybe this is worth a discussion?

Greetings
Michelle

-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/ 




Re: Support for file://

2008-09-20 Thread Micah Cowan

Michelle Konzack wrote:
 Imagine you have a local mirror of your website and you want to know why
 the site @HOSTINGPROVIDER has some extra files or the like.
 
 You can spider the website @HOSTINGPROVIDER recursively into a local tmp1
 directory and then, with the same command line, do the same with the local
 mirror, downloading the files recursively into tmp2. Now you can make a
 recursive fs-diff and know which files are used on both the local mirror
 and @HOSTINGPROVIDER.

I'm confused. If you can successfully download the files from
HOSTINGPROVIDER in the first place, then why would a difference exist?
And if you can't, then this wouldn't be an effective way to find out.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: Support for file://

2008-09-02 Thread Micah Cowan

Petri Koistinen wrote:
 Hi,
 
 It would be nice if wget would also support file://.

Feel free to file an issue for this (I'll mark it Needs Discussion and
set at low priority). I'd thought there was already an issue for this,
but can't find it (either open or closed). I know this has come up
before, at least.

I think I'd need some convincing on this, as well as a clear definition
of what the scope for such a feature ought to be. Unlike curl, which
groks urls, Wget W(eb)-gets, and file:// can't really be argued to
be part of the web.

That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.

Even the system cp command is likely to handle things much better than
Wget. In particular, special OS-specific, extended file attributes,
extended permissions and the like, are among the things that existing
system tools probably handle quite well, and that Wget is unlikely to. I
don't really want Wget to be in the business of duplicating the system
cp command, but I might conceivably not mind file:// support if it
means simple _content_ transfer, and not actual file duplication.

Also in need of addressing is what recursion should mean for file://.
Between ftp:// and http://, recursion currently means different
things. In FTP, it means traverse the file hierarchy recursively,
whereas in HTTP it means traverse links recursively. I'm guessing
file:// should work like FTP (i.e., recurse when the path is a
directory, ignore HTML-ness), but anyway this is something that'd need
answering.
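
For comparison, the two existing behaviors side by side (website.isp.tld
is only a placeholder):

  # HTTP: recursion follows the links found in downloaded HTML pages
  wget -r http://website.isp.tld/index.html

  # FTP: recursion walks the directory tree on the server
  wget -r ftp://website.isp.tld/pub/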

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/