Hi, I upgraded htdig to htdig-3.2.0-2.011302, and I still had problems.
I also added the trailing slash in my htdig.conf file:

start_url: http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/
limit_urls_to: ${start_url}

The other defines are the default stuff. Basically, rundig wouldn't index into the subdirectory /test-ulysses-final. I ran

rundig -c /usr/htdig/htdig.conf -vvvv

and the following output is what I got. (Sorry for attaching such a long output excerpt; I just want to make sure I include all the details.)

-----------------------------------------------------
ht://dig Start Time: Thu Aug 8 14:42:21 2002
1:1:http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/
New server: prodbrass.web.boeing.com, 80
 - Persistent connections: enabled
 - HEAD before GET: disabled
 - Timeout: 30
 - Connection space: 0
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
 - Accept-Language:
Trying to retrieve robots.txt file
Making HTTP request on http://prodbrass.web.boeing.com/robots.txt
Header line: HTTP/1.1 404 Not Found
Header line: Date: Thu, 08 Aug 2002 21:42:21 GMT
Header line: Server: Apache/1.3.20 (Unix) (Red-Hat/Linux) mod_python/2.7.6 Python/1.5.2 mod_ssl/2.8.4 OpenSSL/0.9.6b DAV/1.0.2 PHP/4.2.2 mod_perl/1.24_01 mod_throttle/3.1.2
Header line: Connection: close
Header line: Transfer-Encoding: chunked
Header line: Content-Type: text/html; charset=iso-8859-1
No modification time returned: assuming now
Retrieving document /robots.txt on host: prodbrass.web.boeing.com:80
Http version : HTTP/1.1
Server : HTTP/1.1
Status Code : 404
Reason : Not Found
Access Time : Thu, 08 Aug 2002 21:42:21 GMT
Modification Time : Thu, 08 Aug 2002 21:42:21 GMT
Content-type : text/html; charset=iso-8859-1
Transfer-encoding : chunked
Connection : close
Request time: 0 secs
pushed
pick: prodbrass.web.boeing.com, # servers = 1
> prodbrass.web.boeing.com with a traditional HTTP connection
0:2:0:http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/:
Making HTTP request on http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/
Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 08 Aug 2002 21:42:26 GMT
Header line: Server: Apache/1.3.20 (Unix) (Red-Hat/Linux) mod_python/2.7.6 Python/1.5.2 mod_ssl/2.8.4 OpenSSL/0.9.6b DAV/1.0.2 PHP/4.2.2 mod_perl/1.24_01 mod_throttle/3.1.2
Header line: X-Powered-By: PHP/4.2.2
Discarded header line: X-Powered-By: PHP/4.2.2
Header line: Connection: close
Header line: Transfer-Encoding: chunked
Header line: Content-Type: text/html
No modification time returned: assuming now
Retrieving document /mhonarchive/test-ulysses-final/ on host: prodbrass.web.boeing.com:80
Http version : HTTP/1.1
Server : HTTP/1.1
Status Code : 200
Reason : OK
Access Time : Thu, 08 Aug 2002 21:42:26 GMT
Modification Time : Thu, 08 Aug 2002 21:42:26 GMT
Content-type : text/html
Transfer-encoding : chunked
Connection : close
Request time: 5 secs
Tag: <html lang="en">, matched -1
Tag: <head>, matched -1
Tag: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">, matched 20
Tag: <TITLE>, matched 0
word: BRASS@1
word: Exiting@2
word: with@3
word: Error@4
Tag: </TITLE>, matched 1
title: BRASS: Exiting with Error
Tag: <SCRIPT language="JavaScript">, matched 29
Tag: </SCRIPT>, matched 30
Tag: <style type="text/css">, matched 27
Tag: </style>, matched 28
Tag: </HEAD>, matched -1
Tag: <body text="#333333" link="#6666aa" alink="#aa6666" vlink="#6666aa" bgcolor="#aacccc" leftmargin="0" rightmargin="0" topmargin="0" bottommargin="0" marginwidth="0" marginheight="0">, matched -1
Tag: <table width="100%" cellpadding="0" cellspacing="0" border="0" bgcolor="#CCCCCC">, matched -1
Tag: <tr>, matched -1
Tag: <td valign="middle" align="left" bgcolor="#6C7198">, matched -1
Tag: </SPAN>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <table width="100%" cellpadding="0" cellspacing="0" border="0">, matched -1
Tag: <tbody>, matched -1
Tag: <tr>, matched -1
Tag: <td valign="center" align="center" width="100%" bgcolor="#d5d7d9" background="/images/steel3.jpg">, matched -1
Tag: <img src="/images/BRASSstretched.gif" ALIGN=left width="100" height="50" hspace="0" border="0" alt=" Boeing BRASS logo ">, matched 18
word: Boeing@1
word: BRASS@2
word: logo@3
image: http://prodbrass.web.boeing.com/images/BRASSstretched.gif
Tag: <b>, matched -1
Tag: </b>, matched -1
word: UNCLASSIFIED@5
Tag: <p>, matched -1
Tag: <img src="/images/flying_flag.gif" width="30" height="26" hspace="0" border="0" alt=" GOD BLESS AMERICA ">, matched 18
word: GOD@1
word: BLESS@2
word: AMERICA@3
image: http://prodbrass.web.boeing.com/images/flying_flag.gif
Tag: </a>, matched 3
Tag: </a>, matched 3
Tag: </p>, matched -1
Tag: </td>, matched -1
Tag: <td valign="center" align="left" bgcolor="#d5d7d9" background="/images/steel3.jpg">, matched -1
Tag: <img src="/images/Boeing_oneline_logo.gif" width="100" height="60" hspace="0" border="0" alt=" Boeing logo ">, matched 18
word: Boeing@1
word: logo@2
image: http://prodbrass.web.boeing.com/images/Boeing_oneline_logo.gif
Tag: </a>, matched 3
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </tbody>, matched -1
Tag: </table>, matched -1
Tag: <img src="/images/blank.gif" width="100" height="5" alt="">, matched 18
image: http://prodbrass.web.boeing.com/images/blank.gif
Tag: <br>, matched -1
Tag: <CENTER>, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="99%">, matched -1
Tag: <tr>, matched -1
Tag: <td background="//prodbrass.web.boeing.com//themes/forged/images/tbar1.png" width="1%" height="17">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/tleft1.png" border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/tleft1.png
Tag: </td>, matched -1
Tag: <td background="//prodbrass.web.boeing.com//themes/forged/images/tbar1.png" align="center" colspan="3" width="99%">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/tbar1.png" border=0 width=1 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/tbar1.png
Tag: </td>, matched -1
Tag: <td>, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/tright1.png" border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/tright1.png
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td width="17" background="//prodbrass.web.boeing.com//themes/forged/images/leftbar1.png" align="left" valign="bottom">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/leftbar1.png" border=0 width=17 height=25>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/leftbar1.png
Tag: </td>, matched -1
Tag: <td colspan="3" bgcolor="#ffffff">, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="100%">, matched -1
Tag: <tr>, matched -1
Tag: <td width="141" background="//prodbrass.web.boeing.com//themes/forged/images/steel3.jpg" bgcolor="#cfd1d4" align="left" valign="top">, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="140">, matched -1
Tag: <tr>, matched -1
Tag: <td align="left" valign="middle">, matched -1
Tag: <b>, matched -1
word: Status@6
Tag: </b>, matched -1
Tag: <br>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td align="right" valign="middle">, matched -1
Tag: <h4>, matched 7
Tag: <FONT COLOR="#990000">, matched -1
word: NOT@7
word: LOGGED@8
Tag: </h4>, matched 13
Tag: <A class="menus" href="/account/login.php">, matched 2
word: Login@9
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/account/login.php (Login)
Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/account/login.php
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png" border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <A class="menus" href="/account/register.php">, matched 2
word: Register@10
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/account/register.php (Register Me)
Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/account/register.php
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png" border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <BR>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <table cellpadding="0" cellspacing="0" border="0" width="140">, matched -1
Tag: <tr>, matched -1
Tag: <td align="left" valign="middle">, matched -1
Tag: <b>, matched -1
Tag: </b>, matched -1
Tag: <br>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td align="right" valign="middle">, matched -1
Tag: <A class="menus" href="/doc/site/">, matched 2
word: Site@11
word: Docs@12
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/doc/site/ (Site Docs)
Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/doc/site/
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png" border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <A class="menus" href="/aboutbrass.php">, matched 2
word: About@13
word: BRASS@14
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/aboutbrass.php (About BRASS)
Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/aboutbrass.php
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png" border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <A class="menus" href="/sendmessage.php?[EMAIL PROTECTED]">, matched 2
word: Contact@15
word: BRASS@16
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/sendmessage.php?[EMAIL PROTECTED] eing.com (Contact BRASS)
Rejected: Extension is invalid!
url rejected: (level 1)http://prodbrass.web.boeing.com/sendmessage.php?[EMAIL PROTECTED]. boeing.com
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/point1.png" border=0 width=7 height=7>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/point1.png
Tag: <br>, matched -1
Tag: <BR>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <P>, matched -1
Tag: </TD>, matched -1
Tag: <td width="20" background="//prodbrass.web.boeing.com//themes/forged/images/fade1.png" nowrap>, matched -1
Tag: </td>, matched -1
Tag: <td valign="top" bgcolor="#FFFFFF" width="99%">, matched -1
Tag: <BR>, matched -1
Tag: <H2>, matched 5
Tag: <font color="#FF3333">, matched -1
word: PERMISSION@17
Tag: </font>, matched -1
word: DENIED@18
Tag: </H2>, matched 11
Tag: <P>, matched -1
word: Need@19
word: login@20
word: view@21
word: this@22
word: page.@23
Tag: <p>, matched -1
Tag: </p>, matched -1
Tag: </td>, matched -1
Tag: <td width="9" bgcolor="#FFFFFF">, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: </td>, matched -1
Tag: <td width="17" background="//prodbrass.web.boeing.com//themes/forged/images/rightbar1.png" align="right" valign="bottom">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/rightbar1.png" border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/rightbar1.png
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: <tr>, matched -1
Tag: <td background="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png" height="17">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/bleft1.png" border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/bleft1.png
Tag: </td>, matched -1
Tag: <td background="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png" align="center" colspan="3">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png" border=0 width=1 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/bbar1.png
Tag: </td>, matched -1
Tag: <td background="//prodbrass.web.boeing.com//themes/forged/images/bbar1.png" bgcolor="#7c8188">, matched -1
Tag: <IMG src="//prodbrass.web.boeing.com//themes/forged/images/bright1.png" border=0 width=17 height=17>, matched 18
image: http://prodbrass.web.boeing.com/themes/forged/images/bright1.png
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: <P>, matched -1
Tag: <A HREF="/source.php?page_url=/mhonarchive/test-ulysses-final/index.php">, matched 2
Tag: <B>, matched -1
Tag: <FONT COLOR="BLACK">, matched -1
Tag: </FONT>, matched -1
word: UNCLASSIFIED@24
Tag: </B>, matched -1
Tag: </A>, matched 3
href: http://prodbrass.web.boeing.com/source.php?page_url=/mhonarchive/test-ulysses-final/index.php (UNCLASSIFIED)
Rejected: URL not in the limits!
url rejected: (level 1)http://prodbrass.web.boeing.com/source.php?page_url=/mhonarchive/test-ulysses-final/index.php
Tag: <P>, matched -1
Tag: <P class="footer">, matched -1
Tag: <BR>, matched -1
Tag: </body>, matched -1
Tag: </html>, matched -1
size = 8015
pick: prodbrass.web.boeing.com, # servers = 1
> prodbrass.web.boeing.com with a traditional HTTP connection
ht://dig End Time: Thu Aug 8 14:42:26 2002

ID: 2
URL: http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/
Preamble text:
Postamble text:

Note: This message will be sent again if you do not change or take away the notification of the above mentioned HTML page.

Find out more about the notification service at http://www.htdig.org/meta.html

Cheers!

ht://Dig Notification Service

Thank you so much!
Mary

-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: 08 August, 2002 1:59 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] rundig won't dig URLS

The "Moved Permanently" message is the standard description of a 301 return code, which is caused by a redirect. It's nothing to worry about, as htdig does handle redirects. It's also standard procedure for a web server, when given a request for the URL of a directory that's missing the trailing slash, to give the client a redirect to the corrected URL with the trailing slash. This is to prevent the client from having problems interpreting links relative to that directory.

You should have the trailing slash on your start_url to avoid the extra redirect. Its absence shouldn't be a problem, but it does cause unnecessary extra traffic.

Unfortunately, your output excerpt ends just when it gets interesting, right after htdig fetches the "test-ulysses-final/" directory listing. Was there any output after that? htdig should have parsed a bunch of hrefs at that point.

I did notice that your server is using chunked encoding. I think there were problems with reading chunked input as recently as the Feb. 3/02 snapshot of 3.2.0b4. You didn't mention which one you're running, but if it's less recent than Feb. 10, you may want to upgrade.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
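
For anyone following the thread, here is a rough Python sketch of two points raised above: why the trailing slash on a directory URL matters when relative links are resolved, and why the links in the log come back as "URL not in the limits!". It uses only the standard urllib.parse.urljoin. The within_limits function is a simplified stand-in that assumes limit_urls_to behaves like a plain prefix test; it is not htdig's actual matching code. The file name msg00000.html is a made-up relative link, not something taken from the log.

```python
from urllib.parse import urljoin

# start_url as given in the htdig.conf quoted in this thread.
START_URL = "http://prodbrass.web.boeing.com/mhonarchive/test-ulysses-final/"


def within_limits(url, limits):
    """Simplified stand-in for limit_urls_to (assumption: plain prefix match)."""
    return any(url.startswith(prefix) for prefix in limits)


# "msg00000.html" is a hypothetical relative link, not taken from the log.
# With the trailing slash, the link resolves inside the directory.
with_slash = urljoin(START_URL, "msg00000.html")
# Without it, the last path segment is treated as a file and gets replaced.
without_slash = urljoin(START_URL.rstrip("/"), "msg00000.html")

# A site-absolute link that does appear in the log output.
login = urljoin(START_URL, "/account/login.php")

for url in (with_slash, without_slash, login):
    verdict = "accepted" if within_limits(url, [START_URL]) else "not in the limits"
    print(f"{url}: {verdict}")
```

Under these assumptions, only the link resolved against the URL with the trailing slash stays inside the start_url prefix. The login link, like every other href in the log, resolves to a path outside /mhonarchive/test-ulysses-final/.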

