Hi,
I upgraded htdig to htdig-3.2.0-2.011302, and I still have problems.

I also added the trailing slash in my htdig.conf file:
start_url:      http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/
limit_urls_to:  ${start_url}
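
As a rough illustration of why most links in the output below are reported as "URL not in the limits!", here is a minimal Python sketch. It assumes limit_urls_to acts as a plain, case-sensitive substring match against each candidate URL, which is how the ht://Dig attribute documentation describes the default behaviour. The in_limits helper and the msg00001.html file name are hypothetical and not part of htdig.

```python
# Sketch only: models limit_urls_to as a substring test (an assumption
# based on the ht://Dig attribute documentation, not on htdig's source).
START_URL = "http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/"
LIMIT_URLS_TO = [START_URL]  # mirrors limit_urls_to: ${start_url}


def in_limits(url, patterns=LIMIT_URLS_TO):
    """Return True if the URL contains at least one of the limit patterns."""
    return any(pattern in url for pattern in patterns)


if __name__ == "__main__":
    candidates = [
        START_URL + "msg00001.html",  # hypothetical archive page
        "http://prodbrass.web.boeing.com/account/login.php",
        "http://prodbrass.web.boeing.com/source.php"
        "?page_url=/mhonarchive/test-ulysses-final/index.php",
    ]
    for url in candidates:
        verdict = "accepted" if in_limits(url) else "rejected"
        print(verdict, url)
```

Under that assumption, only URLs that contain the full start_url string pass, so the site-wide menu links (login, register, docs) fall outside the limits.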

The other settings are the defaults.  Basically, rundig still wouldn't
index into the /test-ulysses-final subdirectory.

I ran rundig -c /usr/htdig/htdig.conf -vvvv and got the following output.
(Sorry for attaching such a long excerpt; I just want to make sure I
include all the details.)
-----------------------------------------------------
ht://dig Start Time: Thu Aug  8 14:42:21 2002
        1:1:http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/
New server: prodbrass.web.boeing.com, 80
 - Persistent connections: enabled
 - HEAD before GET: disabled
 - Timeout: 30
 - Connection space: 0
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
 - Accept-Language: 
Trying to retrieve robots.txt file
Making HTTP request on http://prodbrass.web.boeing.com/robots.txt
Header line: HTTP/1.1 404 Not Found
Header line: Date: Thu, 08 Aug 2002 21:42:21 GMT
Header line: Server: Apache/1.3.20 (Unix)  (Red-Hat/Linux) mod_python/2.7.6
Python/1.5.2 mod_ssl/2.8.4 OpenSSL/0.9.6b DAV/1.0.2 PHP/4.2.2
mod_perl/1.24_01 mod_throttle/3.1.2
Header line: Connection: close
Header line: Transfer-Encoding: chunked
Header line: Content-Type: text/html; charset=iso-8859-1
No modification time returned: assuming now
Retrieving document /robots.txt on host: prodbrass.web.boeing.com:80
Http version      : HTTP/1.1
Server            : HTTP/1.1
Status Code       : 404
Reason            : Not Found
Access Time       : Thu, 08 Aug 2002 21:42:21 GMT
Modification Time : Thu, 08 Aug 2002 21:42:21 GMT
Content-type      : text/html; charset=iso-8859-1
Transfer-encoding : chunked
Connection        : close
Request time: 0 secs
 pushed
pick: prodbrass.web.boeing.com, # servers = 1
> prodbrass.web.boeing.com with a traditional HTTP connection
0:2:0:http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/:
Making HTTP request on
http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/
Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 08 Aug 2002 21:42:26 GMT
Header line: Server: Apache/1.3.20 (Unix)  (Red-Hat/Linux) mod_python/2.7.6
Python/1.5.2 mod_ssl/2.8.4 OpenSSL/0.9.6b DAV/1.0.2 PHP/4.2.2
mod_perl/1.24_01 mod_throttle/3.1.2
Header line: X-Powered-By: PHP/4.2.2
Discarded header line: X-Powered-By: PHP/4.2.2
Header line: Connection: close
Header line: Transfer-Encoding: chunked
Header line: Content-Type: text/html
No modification time returned: assuming now
Retrieving document /mhonarchive/test-ulysses-final/ on host:
prodbrass.web.boeing.com:80
Http version      : HTTP/1.1
Server            : HTTP/1.1
Status Code       : 200
Reason            : OK
Access Time       : Thu, 08 Aug 2002 21:42:26 GMT
Modification Time : Thu, 08 Aug 2002 21:42:26 GMT
Content-type      : text/html
Transfer-encoding : chunked
Connection        : close
Request time: 5 secs
Tag: <html lang="en">, matched -1
Tag: <head>, matched -1
Tag: <meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">, matched 20
Tag: <TITLE>, matched 0
word: BRASS@1
word: Exiting@2
word: with@3
word: Error@4
Tag: </TITLE>, matched 1

title: BRASS: Exiting with Error
Tag: <SCRIPT language="JavaScript">, matched 29
Tag: </SCRIPT>, matched 30
Tag: <style type="text/css">, matched 27
Tag: </style>, matched 28
Tag: </HEAD>, matched -1
Tag: <body text="#333333" link="#6666aa" alink="#aa6666" vlink="#6666aa"
bgcolor="#aacccc" leftmargin="0" rightmargin="0" topmargin="0"
bottommargin="0" marginwidth="0" marginheight="0">, matched -1
Tag: <table width="100%" cellpadding="0" cellspacing="0" border="0"
bgcolor="#CCCCCC">, matched -1
Tag: <tr>, matched -1
Tag: <td valign="middle" align="left" bgcolor="#6C7198">, matched -1
Tag: </SPAN>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <table width="100%" cellpadding="0" cellspacing="0" border="0">,
matched -1
Tag: <tbody>, matched -1
Tag: <tr>, matched -1
Tag: <td valign="center" align="center" width="100%" bgcolor="#d5d7d9"
background="/images/steel3.jpg">, matched -1
Tag: <img src="/images/BRASSstretched.gif" ALIGN=left width="100"
height="50" hspace="0"
                border="0" alt=" Boeing BRASS logo ">, matched 18
word: Boeing@1
word: BRASS@2
word: logo@3
image: http://prodbrass.web.boeing.com/images/BRASSstretched.gif
Tag: <b>, matched -1
Tag: </b>, matched -1
word: UNCLASSIFIED@5
Tag: <p>, matched -1
Tag: <img src="/images/flying_flag.gif" width="30" height="26" hspace="0"
border="0" alt=" GOD BLESS AMERICA ">, matched 18
word: GOD@1
word: BLESS@2
word: AMERICA@3
image: http://prodbrass.web.boeing.com/images/flying_flag.gif
Tag: </a>, matched 3
Tag: </a>, matched 3
Tag: </p>, matched -1
Tag: </td>, matched -1
Tag: <td valign="center" align="left" bgcolor="#d5d7d9"
background="/images/steel3.jpg">, matched -1
Tag: <img src="/images/Boeing_oneline_logo.gif" width="100" height="60"
hspace="0" border="0" alt=" Boeing logo ">, matched 18
word: Boeing@1
word: logo@2
image: http://prodbrass.web.boeing.com/images/Boeing_oneline_logo.gif
Tag: </a>, matched 3
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </tbody>, matched -1
Tag: </table>, matched -1
Tag: <img src="/images/blank.gif" width="100" height="5" alt="">, matched 18
image: http://prodbrass.web.boeing.com/images/blank.gif
Tag: <br>, matched -1
Tag: <CENTER>, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="99%">, matched
-1
Tag: <tr>, matched -1
Tag: <td
background="//prodbrass.web.boeing.com//themes/forged/images/tbar1.png"
width="1%" height="17">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/tleft1.png"
border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/tleft1.png
Tag: </td>, matched -1
Tag: <td
background="//prodbrass.web.boeing.com//themes/forged/images/tbar1.png"
align="center" colspan="3" width="99%">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/tbar1.png"
border=0 width=1 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/tbar1.png
Tag: </td>, matched -1
Tag: <td>, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/tright1.png"
border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/tright1.png
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td width="17"
background="//prodbrass.web.boeing.com//themes/forged/images/leftbar1.png"
align="left" valign="bottom">, matched -1
Tag: <IMG
src="//prodbrass.web.boeing.com//themes/forged/images/leftbar1.png" border=0
width=17 height=25>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/leftbar1.png
Tag: </td>, matched -1
Tag: <td colspan="3" bgcolor="#ffffff">, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="100%">,
matched -1
Tag: <tr>, matched -1
Tag: <td width="141"
background="//prodbrass.web.boeing.com//themes/forged/images/steel3.jpg"
bgcolor="#cfd1d4" align="left" valign="top">, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="140">, matched
-1
Tag: <tr>, matched -1
Tag: <td align="left" valign="middle">, matched -1
Tag: <b>, matched -1
word: Status@6
Tag: </b>, matched -1
Tag: <br>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td align="right" valign="middle">, matched -1
Tag: <h4>, matched 7
Tag: <FONT COLOR="#990000">, matched -1
word: NOT@7
word: LOGGED@8
Tag: </h4>, matched 13
Tag: <A class="menus" href="/account/login.php">, matched 2
word: Login@9
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/account/login.php (Login)

   Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/account/login.php
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png"
border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <A class="menus" href="/account/register.php">, matched 2
word: Register@10
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/account/register.php (Register Me)

   Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/account/register.php
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png"
border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <BR>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="140">, matched
-1
Tag: <tr>, matched -1
Tag: <td align="left" valign="middle">, matched -1
Tag: <b>, matched -1
Tag: </b>, matched -1
Tag: <br>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td align="right" valign="middle">, matched -1
Tag: <A class="menus" href="/doc/site/">, matched 2
word: Site@11
word: Docs@12
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/doc/site/ (Site Docs)

   Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/doc/site/
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png"
border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <A class="menus" href="/aboutbrass.php">, matched 2
word: About@13
word: BRASS@14
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/aboutbrass.php (About BRASS)

   Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/aboutbrass.php
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png"
border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <A class="menus"
href="/sendmessage.php?[EMAIL PROTECTED]">, matched 2
word: Contact@15
word: BRASS@16
Tag: </A>, matched 3
href:
http://prodbrass.web.boeing.com/sendmessage.php?[EMAIL PROTECTED]
eing.com (Contact BRASS)

   Rejected: Extension is invalid!
url rejected: (level
1)http://prodbrass.web.boeing.com/sendmessage.php?[EMAIL PROTECTED].
boeing.com
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png"
border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <BR>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <P>, matched -1
Tag: </TD>, matched -1
Tag: <td width="20"
background="//prodbrass.web.boeing.com//themes/forged/images/fade1.png"
nowrap>, matched -1
Tag: </td>, matched -1
Tag: <td valign="top" bgcolor="#FFFFFF" width="99%">, matched -1
Tag: <BR>, matched -1
Tag: <H2>, matched 5
Tag: <font color="#FF3333">, matched -1
word: PERMISSION@17
Tag: </font>, matched -1
word: DENIED@18
Tag: </H2>, matched 11
Tag: <P>, matched -1
word: Need@19
word: login@20
word: view@21
word: this@22
word: page.@23
Tag: <p>, matched -1
Tag: </p>, matched -1
Tag: </td>, matched -1
Tag: <td width="9" bgcolor="#FFFFFF">, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: </td>, matched -1
Tag: <td width="17"
background="//prodbrass.web.boeing.com//themes/forged/images/rightbar1.png"
align="right" valign="bottom">, matched -1
Tag: <IMG
src="//prodbrass.web.boeing.com//themes/forged/images/rightbar1.png"
border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/rightbar1.png
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td
background="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png"
height="17">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/bleft1.png"
border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/bleft1.png
Tag: </td>, matched -1
Tag: <td
background="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png"
align="center" colspan="3">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png"
border=0 width=1 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/bbar1.png
Tag: </td>, matched -1
Tag: <td
background="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png"
bgcolor="#7c8188">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/bright1.png"
border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/bright1.png
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <P>, matched -1
Tag: <A
HREF="/source.php?page_url=/mhonarchive/test-ulysses-final/index.php">,
matched 2
Tag: <B>, matched -1
Tag: <FONT COLOR="BLACK">, matched -1
Tag: </FONT>, matched -1
word: UNCLASSIFIED@24
Tag: </B>, matched -1
Tag: </A>, matched 3
href:
http://prodbrass.web.boeing.com/source.php?page_url=/mhonarchive/test-ulysse
s-final/index.php (UNCLASSIFIED)

   Rejected: URL not in the limits!
url rejected: (level
1)http://prodbrass.web.boeing.com/source.php?page_url=/mhonarchive/test-ulys
ses-final/index.php
Tag: <P>, matched -1
Tag: <P class="footer">, matched -1
Tag: <BR>, matched -1
Tag: </body>, matched -1
Tag: </html>, matched -1
 size = 8015
pick: prodbrass.web.boeing.com, # servers = 1
> prodbrass.web.boeing.com with a traditional HTTP connection
ht://dig End Time: Thu Aug  8 14:42:26 2002
ID: 2 URL: http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/


Preamble text:


Postamble text:
Note: This message will be sent again if you do not change or
take away the notification of the above mentioned HTML page.

Find out more about the notification service at

    http://www.htdig.org/meta.html

Cheers!

ht://Dig Notification Service

Thank you so much!

Mary 

-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: 08 August, 2002 1:59 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] rundig won't dig URLS

The "Moved Permanently" message is the standard description of a 301
return code, which is caused by a redirect.  It's nothing to worry about,
as htdig does handle redirects.  It's also standard procedure for a web
server, when given a request for the URL of a directory that's missing
the trailing slash, to give the client a redirect to the corrected URL
with the trailing slash.  This is to prevent the client from having
problems interpreting links relative to that directory.

You should have the trailing slash on your start_url to avoid the extra
redirect.  Its absence shouldn't be a problem, but it does cause
unnecessary extra traffic.
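
To make the relative-link problem concrete, here is a small Python sketch using the standard library's urllib.parse.urljoin. It is only an illustration of standard URL resolution, not htdig's own code, and the msg00001.html link name is hypothetical.

```python
from urllib.parse import urljoin

# Same directory URL, without and with the trailing slash.
BASE_NO_SLASH = "http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final"
BASE_WITH_SLASH = BASE_NO_SLASH + "/"

# Hypothetical relative link as it might appear in the directory's index page.
RELATIVE_LINK = "msg00001.html"


def resolve(base, link=RELATIVE_LINK):
    """Resolve a relative link against a base URL using standard URL rules."""
    return urljoin(base, link)


if __name__ == "__main__":
    # Without the slash, the last path segment is treated as a file name
    # and is replaced, so the link lands one directory too high.
    print(resolve(BASE_NO_SLASH))
    # With the slash, the link resolves inside the directory as intended.
    print(resolve(BASE_WITH_SLASH))
```

This is why the server redirects to the slash form before the client starts interpreting the links on the page.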

Unfortunately, your output excerpt ends just when it gets interesting,
right after htdig fetches the "test-ulysses-final/" directory listing.
Was there any output after that?  htdig should have parsed a bunch of
hrefs at that point.

I did notice that your server is using chunked encoding.  I think there
were problems with reading chunked input as recently as the Feb. 3/02
snapshot of 3.2.0b4.  You didn't mention which one you're running, but
if it's less recent than Feb. 10, you may want to upgrade.
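
For readers unfamiliar with chunked transfer encoding, the following Python sketch shows roughly what a client has to do to reassemble such a body. It is a simplified illustration of the HTTP/1.1 framing, not htdig's implementation: it assumes the whole body is already in memory and it ignores trailer headers. The decode_chunked name is hypothetical.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble an HTTP/1.1 chunked body held entirely in memory.

    Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a chunk of
    size 0 ends the body. Trailer headers after it are ignored here.
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)             # end of the size line
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)                 # chunk size is hexadecimal
        pos = eol + 2                              # move past the size line
        if size == 0:
            break                                  # last-chunk marker
        out += body[pos:pos + size]
        pos += size + 2                            # skip data and its CRLF
    return bytes(out)


if __name__ == "__main__":
    sample = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
    print(decode_chunked(sample))
```

A client that mishandles this framing can end up with truncated or corrupted page content, which is the kind of problem the older snapshots may have had.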

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to
<[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

