Hi,
I've just installed htdig to index our intranet site and I'm having
trouble getting it to follow links.
I can replace the start_url with http://www.htdig.org and it works fine,
but when I replace it with http://intranet.eoc.org.uk it doesn't follow
any links in the index document.
Our root directory contains the following files:
index.html
left_index.html <------ contains most of the links
header_index.html
body_index.html
footer_index.html
All other documents are in a sub-directory off the root called "html".
Here's what I've changed in my config file:
start_url: http://intranet.eoc.org.uk
limit_urls_to: http://intranet.eoc.org.uk
And here are the contents of http://intranet.eoc.org.uk/index.html:
=======begin========
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 FINAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<META NAME="Author" CONTENT="EOC">
<META NAME="Generator" CONTENT="NetObjects Fusion 4.0 for Windows">
<TITLE>EOC Intranet</TITLE>
</HEAD>
<FRAMESET BORDER=0 FRAMESPACING=0 FRAMEBORDER=0 COLS="136,*">
<FRAME NAME="left" SRC="left_index.html" SCROLLING=AUTO
MARGINWIDTH="2" MARGINHEIGHT="1" FRAMEBORDER=NO BORDER="0" NORESIZE>
<FRAMESET BORDER=0 FRAMESPACING=0 FRAMEBORDER=0 ROWS="118,*,46">
<FRAME NAME="header" SRC="header_index.html" SCROLLING=AUTO
MARGINWIDTH="2" MARGINHEIGHT="1" FRAMEBORDER=NO BORDER="0" NORESIZE>
<FRAME NAME="body" SRC="body_index.html" SCROLLING=AUTO
MARGINWIDTH=2 MARGINHEIGHT=2>
<FRAME NAME="footer" SRC="footer_index.html" SCROLLING=AUTO
MARGINWIDTH="2" MARGINHEIGHT="1" FRAMEBORDER=NO BORDER="0" NORESIZE>
</FRAMESET>
</FRAMESET>
</HTML>
========end=========
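For reference, the frame targets a crawler would be expected to find in that page can be listed with a short Python sketch. This is only an illustration, not how htdig itself parses HTML; the names FrameSrcParser and frame_urls are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the SRC attribute of every FRAME tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative SRC values against the page URL.
                self.frames.append(urljoin(self.base_url, src))


def frame_urls(html, base_url):
    """Return the absolute URLs of all frames in the given HTML text."""
    parser = FrameSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.frames
```

Calling frame_urls() on the page above with the base URL http://intranet.eoc.org.uk/ should give the four *_index.html frame documents, which is the set of links I would expect to be followed.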
This is the output of rundig -vvv:
=======begin========
1:0:http://intranet.eoc.org.uk/
New server: intranet.eoc.org.uk, 80
Retrieval command for http://intranet.eoc.org.uk/robots.txt: GET
/robots.txt HTTP/1.0
User-Agent: htdig/3.1.2 ([EMAIL PROTECTED])
Host: intranet.eoc.org.uk
Header line: HTTP/1.1 404 Not Found
Header line: Date: Fri, 11 Jun 1999 16:04:54 GMT
Header line: Server: Apache/1.3.6 (Unix) PHP/3.0.7
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 1
pushed
pick: intranet.eoc.org.uk, # servers = 1
0:0:0:http://intranet.eoc.org.uk/: Retrieval command for
http://intranet.eoc.org.uk/: GET / HTTP/1.0
User-Agent: htdig/3.1.2 ([EMAIL PROTECTED])
Host: intranet.eoc.org.uk
Header line: HTTP/1.1 200 OK
Header line: Date: Fri, 11 Jun 1999 16:04:54 GMT
Header line: Server: Apache/1.3.6 (Unix) PHP/3.0.7
Header line: Last-Modified: Fri, 11 Jun 1999 14:42:02 GMT
Translated Fri, 11 Jun 1999 14:42:02 GMT to 11 Jun 1999 14:42:02 (99)
And converted to Fri, 11 Jun 1999 14:42:02
Header line: ETag: "13619-3b4-3761203a"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 948
Header line: Connection: close
Header line: Content-Type: text/html
Header line:
returnStatus = 0
Read 948 from document
Read a total of 948 bytes
title: EOC Intranet
href: http://intranet.eoc.org.uk/left_index.html ()
resolving 'http://intranet.eoc.org.uk/left_index.html'
href: http://intranet.eoc.org.uk/header_index.html ()
resolving 'http://intranet.eoc.org.uk/header_index.html'
href: http://intranet.eoc.org.uk/body_index.html ()
resolving 'http://intranet.eoc.org.uk/body_index.html'
href: http://intranet.eoc.org.uk/footer_index.html ()
resolving 'http://intranet.eoc.org.uk/footer_index.html'
size = 948
pick: intranet.eoc.org.uk, # servers = 1
htmerge: Sorting...
htmerge: Merging...
0/http://intranet.eoc.org.uk/
========end=========
It appears to be seeing the URIs but not following them, or am I
mis-reading the log?
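To make that reading of the log checkable, here is a rough Python sketch that lists every URL reported on an "href:" line but never requested. The two patterns are guesses based only on the output quoted above, not on any documented htdig log format, and unfollowed_links is a name invented for this example.

```python
import re

# Both patterns are assumptions drawn from the rundig -vvv output above.
HREF_RE = re.compile(r"^href:\s+(\S+)", re.MULTILINE)
# \s+ after "for" lets the match survive lines wrapped by the mailer.
RETRIEVAL_RE = re.compile(r"Retrieval command for\s+(\S+?):\s+GET")


def unfollowed_links(log_text):
    """Return hrefs that were extracted but never requested, in log order."""
    retrieved = set(RETRIEVAL_RE.findall(log_text))
    result = []
    for url in HREF_RE.findall(log_text):
        if url not in retrieved and url not in result:
            result.append(url)
    return result
```

Run over the log above, this should report the four *_index.html URLs, since the only retrieval commands are for robots.txt and the root document.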
Also, if I change my start_url to
http://intranet.eoc.org.uk/left_index.html then I get the same output,
i.e. it seems to start at http://intranet.eoc.org.uk/ rather than
http://intranet.eoc.org.uk/left_index.html.
Further to that, if I change my start_url to
http://intranet.eoc.org.uk/html/pay.html then it appears to work OK, i.e.
the links in pay.html are extracted and retrieved (except those in the
root directory, i.e. left_index.html, header_index.html, etc.).
Can anyone suggest where the problem might lie?
Thanks,
R.
--
Robin Bowes - System Development Manager - Room 405A
E.O.C., Overseas House, Quay St., Manchester, M3 3HN, UK.
Tel: +44 161 838 8321 Fax: +44 161 835 1657