According to thorstuff:
> Gilles Detillieux wrote:
> >As you searched the mailing list archives, you're no doubt aware
> >of the list, which is a better way to get answers than to mail
> >individuals directly. (See http://www.htdig.org/FAQ.html#q1.16)
> >
> Sorry about that. I thought that by hitting reply to an email from the
> list I would do so. That was my intention anyway.

No probs. It's a common mistake. The same applies to any followup
postings. See http://www.htdig.org/FAQ.html#q1.17

> >Your problem doesn't sound like it's OS-specific or RPM-specific
> >(I'm using the above 2 RPMs on my own system with no problems), so it
> >should be regarded as a general configuration problem. The FAQ has a
> >few entries that may help... http://www.htdig.org/FAQ.html#q5.25 and
> >http://www.htdig.org/FAQ.html#q5.27
> >
> Yes I have read those but they did not seem to solve the problem.

The main idea behind all this is to try to follow the chain of links,
the way htdig or any other web spider would. By looking at the output
of htdig -vvv, you can see the links (<a href=...> tags) as htdig
encounters them, and see what it's doing with them. If it's rejecting
them, FAQ 5.27 will help figure out why. If it's not even seeing links
that you think it should, FAQ 5.25, and the entries to which it refers
(esp. 5.18 & 5.1) can provide some possible explanations. Failing all
that, you should look at the HTML code yourself to see what htdig is
seeing. Point a web browser to the URL in start_url, and select "View
Page Source" to look at the code to find the <a href=...> tags.

> >Also, try running htsearch from the command line, rather than a
> >browser, to see if it still can't find any words that you know to be
> >in /var/lib/htdig/db.wordlist.
> >
> >Make sure htsearch is using the same
> >config file as htdig (or rundig), or at least that the two use the
> >same database.
> >
> How do I go about that exactly? If you could just point in the right
> direction on this that would help.

On Red Hat 9, htsearch will be in /var/www/cgi-bin, so you can run a
query like:

    /var/www/cgi-bin/htsearch 'words=foo&config=htdig'

or you can run htsearch with no arguments and it will prompt you for
the words (and format, to which you can just hit Enter).
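
If you'd rather script that check than type the query by hand,
something along these lines might do. It is only a sketch: the helper
names (build_query, run_htsearch) are made up for illustration, the path
is the Red Hat 9 location mentioned above, and it assumes htsearch takes
the query string as its single argument, exactly as in the command shown.

```python
import os
import subprocess
from urllib.parse import urlencode

# Red Hat 9 location from the message above; adjust for other systems.
HTSEARCH = "/var/www/cgi-bin/htsearch"


def build_query(words, config="htdig"):
    """Build a CGI-style query string, e.g. 'words=foo&config=htdig'."""
    # urlencode also escapes spaces and punctuation in the search words.
    return urlencode({"words": words, "config": config})


def run_htsearch(words, config="htdig", binary=HTSEARCH):
    """Run htsearch with the query string as its only argument.

    Returns whatever htsearch writes to standard output (the results page).
    """
    result = subprocess.run(
        [binary, build_query(words, config)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    if os.access(HTSEARCH, os.X_OK):
        print(run_htsearch("foo"))
    else:
        # htsearch is not installed here, so just show the command line.
        print(HTSEARCH, "'" + build_query("foo") + "'")
```

Passing the arguments as a list means the shell never sees the "&", so
no extra quoting is needed around the query string.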

As for the config files, the htdig316 RPM package you installed sets
CONFIG_DIR to /etc/htdig, so for a "config=whatever" input parameter
to htsearch, it will use /etc/htdig/whatever.conf. This should agree
with any config file you provide to htdig or rundig via the -c option.
In the simplest (default) case, both htdig and htsearch will use
/etc/htdig/htdig.conf, but I don't know how exactly you are calling
htdig and htsearch so I didn't want to just assume the default config.
If you are using two different configs for htdig and htsearch, which
is OK, then you can ensure they are using the same database by making
sure any settings of database_dir or database_base agree in the two
config files.

I brought up the issue of databases and config files because you
claimed that htsearch wasn't finding words that you knew to be in the
wordlist, so this seemed to be a plausible trouble spot. However, it
seems from the output below that htsearch is indeed finding some
matches, just not all the documents you'd like it to. It now seems to
me the most likely problem is just that htdig isn't finding links to
most/all of your documents.

> >If it still doesn't work, it would be useful to know
> >what, if any, output you do get from htsearch.
> >
> Running htsearch from the command line does produce a search result:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <html><head><title>Search results for 'config'</title></head>
> <body bgcolor="#eef7ff">
> <h2><img src="/htdig/htdig.gif" alt="ht://Dig">
>
> Search results for 'config'</h2>
...
> <strong>Documents 1 - 1 of 1 matches.
> More <img src="/htdig/star.gif" alt="*">'s indicate a better match.
> </strong>
> <hr noshade size="1">
> <dl><dt><strong><a
> href="http://web2.forefrontnet.com/">phpinfo()</a></strong><img
> src="/htdig/star.gif" alt="*"><img src="/htdig/star.gif" alt="*"><img
> src="/htdig/star.gif" alt="*"><img src="/htdig/star.gif" alt="*">
> </dt><dd><b><tt>...
> </tt></b>'--libexecdir=/usr/libexec' ...
> '--with-<strong>config</strong>-file-scan-dir=/etc/php.d'
> '--enable-force-cgi-redirect' '--disable-debug' '--enable-pic' '<b><tt>
> ...</tt></b><br>
> <em><a
> href="http://web2.forefrontnet.com/">http://web2.forefrontnet.com/</a></em>
> <font size="-1">08/13/03, 53940 bytes</font>
> </dd></dl>
>
>
> <hr noshade size="4">
> <a href="http://www.htdig.org/">
> <img src="/htdig/htdig.gif" border="0" alt="">ht://Dig 3.1.6</a>
> </body></html>
>
> As you can see some of the site has been indexed but I can tell that not
> all of it has been. As you say it is probably a configuration issue and
> any suggestions would really be appreciated.

Well, when I look at http://web2.forefrontnet.com/ myself, I see only a
single link to another document on your site, and that is
http://web2.forefrontnet.com/index.php?=PHPB8B5F2A0-3C92-11d3-A3A9-4C7B08C10000
(PHP 4 Credits), which doesn't seem to have any links at all in it. The
way I see it, htdig will only be able to find those two documents
unless you give it more URLs than just http://web2.forefrontnet.com/ in
start_url. Any URLs you add to start_url must of course be reachable
via HTTP, which you can determine by testing them in a web browser.

> Once again, sorry for the personal email I'll try to make sure I don't
> do that again. An honest mistake.

Not a problem, really.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

