wget  

Inconsistent link translation; wrong folder destination

Belov, Charles
Thu, 03 Mar 2005 15:39:28 -0800

I am having a problem in which absolute / links are being inconsistently 
translated into relative links. Furthermore, files are going into the wrong 
directory.

Executing wget 1.9.1 from /www/htdocs/cms/dpt:
 
wget --input-file=[filename1] --output-file=[filename2] --verbose 
--timestamping --limit-rate=20k --wait=2 --random-wait --no-host-directories 
--html-extension --convert-links --restrict-file-names='windows' --debug 
 
Running this with a 2-line input file containing:
 
http://www.sfgov.org/site/dpt_index.asp
http://www.sfgov.org/site/dpt_index.asp?id=13438
 
incorrectly results in pages:
 
http://www.sfmuni.com/cms/dpt/dpt_index.asp.html
http://www.sfmuni.com/cms/dpt/[EMAIL PROTECTED]

where I was expecting:

http://www.sfmuni.com/cms/dpt/site/dpt_index.asp.html
http://www.sfmuni.com/cms/dpt/site/[EMAIL PROTECTED]

although I suppose I could live with that if the link translation had worked.
 
The right column on each page contains a series of links in the heading 
"Explore". Under this heading:
 
Page http://www.sfmuni.com/cms/dpt/[EMAIL PROTECTED] has a good "Home" link 
which correctly links to http://www.sfmuni.com/cms/dpt/dpt_index.asp.html 
(original source code link of "/site/dpt_index.asp" was correctly translated 
to new source code link of "dpt_index.asp.html") and a good self-referential 
link to "About us" as well.
 
However,
 
Page http://www.sfmuni.com/cms/dpt/dpt_index.asp.html has a bad "About us" 
link, which incorrectly links to 
http://www.sfmuni.com/www/htdocs/cms/dpt/[EMAIL PROTECTED] (original source 
code link of "/site/dpt_index.asp?id=13438" was incorrectly translated to new 
source code link of "/www/htdocs/cms/dpt/[EMAIL PROTECTED]" where I would 
expect a working "[EMAIL PROTECTED]"). The self-referential link to "Home" is 
similarly bad.
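
To make the expectation concrete, here is a minimal sketch of the relative link I would expect between two files saved in the same directory. It is not wget's code. The function name is made up for illustration, and "about_us.html" is only a stand-in for the second page's local file name, which the archive has obscured.

```python
import posixpath


def expected_relative_link(from_page: str, to_page: str) -> str:
    """Return the href that should lead from one local file to another."""
    # A relative link is resolved against the directory of the linking page.
    return posixpath.relpath(to_page, start=posixpath.dirname(from_page))


download_dir = "/www/htdocs/cms/dpt"  # directory wget was run from
home = posixpath.join(download_dir, "dpt_index.asp.html")
about = posixpath.join(download_dir, "about_us.html")  # hypothetical stand-in name

print(expected_relative_link(home, about))  # about_us.html
print(expected_relative_link(about, home))  # dpt_index.asp.html
```

Because both files sit in the same directory, the expected link is just the bare file name in either direction, with no "/www/htdocs/cms/dpt/" prefix.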
 
So it appears the links on http://www.sfmuni.com/cms/dpt/dpt_index.asp.html 
were not correctly translated into relative links, but the links on the local 
copy of http://www.sfgov.org/site/dpt_index.asp?id=13438 were.

Any idea how to work around this problem (at least the link problem, if not the 
destination problem)?
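
One stopgap I can imagine, pending a real fix, is to post-process the saved pages and turn any link that begins with the download directory back into a relative one. The sketch below is only that, a sketch: the function name and the regular expression are my own, it assumes the bad links always start with the directory wget was run from, and a regex is a crude way to edit HTML.

```python
import posixpath
import re

# Matches href="..." or src="..." with either quote style.
LINK_ATTR = re.compile(
    r'(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["\'])(?P<url>.*?)(?P=q)',
    re.IGNORECASE,
)


def relativize_links(html: str, page_path: str, download_dir: str) -> str:
    """Rewrite links that start with download_dir as paths relative to page_path."""
    prefix = download_dir.rstrip("/") + "/"
    page_dir = posixpath.dirname(page_path)

    def fix(match):
        url = match.group("url")
        if not url.startswith(prefix):
            return match.group(0)  # leave every other link untouched
        relative = posixpath.relpath(url, start=page_dir)
        quote = match.group("q")
        return f"{match.group('attr')}{quote}{relative}{quote}"

    return LINK_ATTR.sub(fix, html)


# "about_us.html" is a hypothetical stand-in for the real file name.
sample = '<a href="/www/htdocs/cms/dpt/about_us.html">About us</a>'
print(relativize_links(sample,
                       "/www/htdocs/cms/dpt/dpt_index.asp.html",
                       "/www/htdocs/cms/dpt"))
```

Links that already point elsewhere, such as the complete http://www.sfgov.org/... URLs, are left alone because they do not start with the download directory.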

Charles "Chas" Belov
 
Relevant lines from the debug file:
 
--14:42:13--  http://www.sfgov.org/site/dpt_index.asp
           => `dpt_index.asp'
...
 --14:42:14--  http://www.sfgov.org/site/dpt_index.asp?id=13438
           => [EMAIL PROTECTED]'
...
 
FINISHED --14:42:16--
Downloaded: 37,524 bytes in 2 files
Scanning dpt_index.asp.html (from http://www.sfgov.org/site/dpt_index.asp)
...
dpt_index.asp.html: merge("http://www.sfgov.org/site/dpt_index.asp", "/site/dpt_index.asp") -> http://www.sfgov.org/site/dpt_index.asp
appending "http://www.sfgov.org/site/dpt_index.asp" to urlpos.
...
dpt_index.asp.html: merge("http://www.sfgov.org/site/dpt_index.asp", "/site/dpt_index.asp?id=13438") -> http://www.sfgov.org/site/dpt_index.asp?id=13438
appending "http://www.sfgov.org/site/dpt_index.asp?id=13438" to urlpos.
...
will convert url http://www.sfgov.org/site/dpt_index.asp to local dpt_index.asp.html
will convert url http://www.sfgov.org/images/spacer.gif to complete
will convert url http://www.sfgov.org/site/dpt_index.asp?id=13438 to local dpt_index[EMAIL PROTECTED]
...
Converting dpt_index.asp.html... TO_COMPLETE: <something> to http://www.sfgov.org/scripts/main.css at position 112 in dpt_index.asp.html.
...
TO_RELATIVE: http://www.sfgov.org/site/dpt_index.asp to dpt_index.asp.html at position 8274 in dpt_index.asp.html.
TO_COMPLETE: <something> to http://www.sfgov.org/images/spacer.gif at position 8322 in dpt_index.asp.html.
TO_RELATIVE: http://www.sfgov.org/site/dpt_index.asp?id=13438 to [EMAIL PROTECTED]438.html at position 8396 in dpt_index.asp.html.
...
Scanning [EMAIL PROTECTED] (from http://www.sfgov.org/site/dpt_index.asp?id=13438)
...
will convert url http://www.sfgov.org/site/dpt_index.asp to local dpt_index.asp.html
will convert url http://www.sfgov.org/images/spacer.gif to complete
will convert url http://www.sfgov.org/site/dpt_index.asp?id=13438 to local dpt_index[EMAIL PROTECTED]
...
TO_RELATIVE: http://www.sfgov.org/site/dpt_index.asp to dpt_index.asp.html at position 18574 in [EMAIL PROTECTED]
TO_COMPLETE: <something> to http://www.sfgov.org/images/spacer.gif at position 18622 in [EMAIL PROTECTED]
TO_RELATIVE: http://www.sfgov.org/site/dpt_index.asp?id=13438 to [EMAIL PROTECTED]438.html at position 18696 in [EMAIL PROTECTED]
...
Converted 2 files in 0.07 seconds.


 