OK, I have been working on this for the past nine hours or so, but I seem to be stumped.

I have been working primarily in the htdig.conf file (since that is the file to which 
my attention was directed).  I have a main home directory as the starting point.  Under 
that, my directory structure consists of text files, image files, htdig files, forum 
files [the forum files are the ones I do not want indexed], applet files and cgi-bin.  
Under each sub-directory there are many, many sub-directories.  The site itself is 
about 300 MB.

There is a MySQL database for the forum, but it is tucked away on the server and not 
referenced in the directory structure, except by the file which calls it up.  Under 
excluded URLs, I had listed /forum.  However, this listing did not stop htdig from 
crawling the forum and indexing thousands of unwanted pages.  A typical 
listing looks like:  /forum/viewforum.php?f=3&sid=f4d181d874cbc2cc0f41f2927959f2c5

I also tried /forum/ [with a trailing forward-slash], but that did not help.  Would it 
be possible to start at the sub-directory level instead, perhaps with multiple starting 
points?
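
To make the question concrete, this is the sort of htdig.conf entry I have in mind.  
It is only a sketch: www.example.com and the directory names are placeholders, not 
my real paths, and I am assuming from the documentation that start_url accepts 
several space-separated URLs and that limit_urls_to defaults to the value of 
start_url.

```
# Hypothetical: start the dig from selected sub-directories instead of the
# site root, leaving /forum/ out of the list entirely.
start_url:      http://www.example.com/text/ \
                http://www.example.com/images/ \
                http://www.example.com/applets/
```

If those assumptions hold, I would expect htdig to stay inside the listed 
sub-directories, provided nothing in them links back into the forum.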

At present, the search engine is totally useless because it indexes the same forum 
pages repeatedly.  Your reply asks where the best places to prune are.  I would reply: 
anywhere that excludes /forum and everything under it.

I tried to understand what you mean by the bad query string process, but I cannot 
figure it out.  I have read all the material and inspected htdig.conf 
closely, but (I apologize) I do not know what I am supposed to do.  Help!  Thanks.

      
--

On Thu, 10 Oct 2002 17:17:21, Gilles Detillieux wrote:
>According to Pub Litics:
>> Thanks for the helpful and prompt response.
>> 
>> Have read 4.24 and 5.29 and supporting material.  I have ruled out
>> blocking the extensions, because there are no discernible extensions.
>> For example, here are a few of the multifarious search results:
>> 
>> /forum/viewtopic.php?t=18&start=0&postdays=0&postorder=asc&highlight=&sid=981ab2115a733110a6f9753da88aa73f
>> 
>> /forum/index.php?sid=a7fe32da1d54d5e820d29cef03db11b5 
>> 
>> I already have /forum listed under URLs to be blocked, in the CONFIG
>> file, but apparently that is not halting the searching process in
>> that respect.
>> 
>> I, of course, would like to exclude all of such types, per your
>> suggestion.  So, I am focusing on bad query string as the likely
>> culprit.  Only problem is, here is your example:
>> 
>> bad_querystr: forum=private section=topsecret&passwd=required  
>> 
>> Pardon if I sound dense, but I cannot find in the instructions where
>> to place this.  Would it go in the CONFIG file?  If so, whereabouts?
>
>I think you're confusing the CONFIG file, which is used for setting
>certain paths and other settings at compile time, and htdig.conf, which
>is used for setting config attributes at run time.  See
>http://www.htdig.org/FAQ.html#q4.18
>http://www.htdig.org/config.html#htdig.conf
>http://www.htdig.org/confindex.html
>
>> Would it be changed?  I assume that "forum" would be the name of my
>> forum.  But, not sure what "section" refers to or "topsecret" means.
>> I assume that "password" would refer to my password? Or, should the word
>> "required" stay as it is?
>
>I think you're taking the example in the documentation far too literally.
>It's just an example of usage, not a suggestion tailor-made for your site.
>The general idea is to include any portions of query strings that appear
>in your URLs, upon which you'd want htdig to stop going any deeper
>and not index those URLs.  I think before you can use bad_querystr or
>exclude_urls effectively, you need to step back and get an idea of how
>the whole hierarchy of links from document to document fits together on
>your site - i.e.  what does this tree look like, and where are the best
>places to prune it?
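>
>As a rough sketch only, based on the two URLs you quoted above and not
>tested against your site: the patterns below are assumptions, and they
>belong in htdig.conf, not CONFIG.  The first two exclude_urls patterns
>are assumed to be the stock defaults, so check them against your own file.
>
>```
># Reject any URL containing one of these space-separated substrings.
>exclude_urls:   /cgi-bin/ .cgi /forum/
>
># Reject any URL whose query string contains one of these patterns,
># such as the session IDs the forum appends to its links.
>bad_querystr:   sid=
>```
>
>Either attribute alone may be enough for your case.  The point is that
>each pattern is a fragment that actually appears in the URLs you want
>skipped, not the literal text from the documentation's example.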
>
>-- 
>Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
>Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
>Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
>


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
