Wget query (very urgent)

2001-04-23 Thread Inderbir

Hi,

I'm Inderbir, working for an IT enabler in CA.

Wanted to key in a few words of praise for Wget: I've found the utility
really handy for mirroring site contents.

I do have the following queries regarding its functionality, though.

1. Is Wget capable of copying HTML files, image files, etc. that are called from
a JavaScript function embedded in the HTML to the new (mirror) site?
E.g., MM_showHideLayers and MM_swapImage are JavaScript functions used for
showing/hiding layers and swapping images, and they are called in the HTML as:

<a href="xyz.htm"
   onMouseOver="MM_showHideLayers('whatsnew','','hide','questions','','show');
                MM_swapImage('Bquest','','images/img2.gif',1)"
   onMouseOut="MM_showHideLayers('questions','','hide');
               MM_swapImgRestore()">
  <img src="images/img.gif" width="154" height="20" name="Bquest" border="0"
       alt="Questions?">
</a>

Can Wget also copy the image file (images/img2.gif) to the mirror site?
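
For illustration, if Wget cannot pick the image up from the JavaScript, I
assume it would have to be fetched separately with something like the command
below (the host name is only a placeholder for our actual site):

    wget --force-directories --directory-prefix=mirror \
         http://www.example.com/images/img2.gif

That fetches just the one file and places it under
mirror/www.example.com/images/img2.gif.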


2. When run on a symbolic link to the actual physical directory, Wget
copies the whole site as it is on the server, without checking whether the
links are active or inactive.


3. Does Wget create its own temporary files? We ran it on a symbolic link
for site mirroring. The original site has 9500 files in 385 directories,
but after execution the mirrored site contains about 12500 files in 385
directories. Are these additional files temporary in nature?

I would appreciate it if you could answer the above queries. I need the
answers very urgently.

Regards,
Inderbir.




Re: Wget query (very urgent)

2001-04-23 Thread Hrvoje Niksic

Inderbir [EMAIL PROTECTED] writes:

 The second and the third queries are related to Wget on Linux.
 
 We've run Wget on a link to a directory structure in Linux and found
 that the mirror site generated has all the files which physically
 reside in the original site's directory structure, regardless of
 whether they are active or inactive links.

I assume you're mirroring part of the contents under the web server
root via HTTP.  You seem to say that Wget somehow copied the files
that are not linked from anywhere.  This is very unlikely, since Wget
has no way of knowing which files are available, unless it encounters
a link.

I see two possible explanations:

1. The files are linked, but the links are not apparent.  Maybe the
   links are commented out?  Wget 1.6 doesn't understand HTML comments,
   so it will still follow links that appear inside them (see the first
   sketch at the end of this message).

2. Your HTTP server automatically generates an HTML index of the
   directories where `index.html' is not present.  (Apache does this
   by default.)  If such a directory is reached, Wget will copy all
   the files simply because the links are kindly provided by the
   server-generated index page.

   This would also explain the "temporary" files ?D=A, etc., because
   links of that name are created by Apache when it generates the
   directory listing (see the second sketch below).
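
To illustrate the first case, here is a minimal, made-up example of a
commented-out link; since Wget 1.6 doesn't skip the contents of comments,
it would still follow and download the linked page:

    <!-- old navigation, no longer used:
    <a href="oldstuff/index.html">Old stuff</a>
    -->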
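
To illustrate the second case, the index Apache generates for a directory
without index.html contains sort links along these lines (sketched from
memory, not literal Apache output):

    <a href="?N=D">Name</a>   <a href="?M=A">Last modified</a>
    <a href="?S=A">Size</a>   <a href="?D=A">Description</a>
    <hr>
    <a href="img.gif">img.gif</a>    23-Apr-2001 10:15   2k
    <a href="img2.gif">img2.gif</a>  23-Apr-2001 10:15   3k

Each of those ?X=Y links gets saved as a separate file in the mirror, which
is where file names like ?D=A come from.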