According to Jim Cole:
> Jose Julian Buda's bits of Wed, 15 Aug 2001 translated to:
> >I want to know whether htdig by default indexes "ALL"
> >words in an HTML document and ALL documents specified in
> >start_url.
> 
> In general this will be true. However there are a lot of cases where a
> document, or word in a document, may not be indexed. By default, a word
> will be excluded if it is less than three characters, more than 12
> characters, or a number.

This is mostly true.  However, by default words over 12 characters are
truncated, not excluded.  Because the same truncation occurs at search
time, you'll still get a match, only it's a fuzzy match because any
characters past the first 12 are ignored.  See
http://www.htdig.org/attrs.html#maximum_word_length
http://www.htdig.org/attrs.html#minimum_word_length
http://www.htdig.org/attrs.html#allow_numbers
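
To make the difference between excluding and truncating concrete, here is a
rough Python sketch of the rules as described in this thread. It is not
htdig's actual code: the function name normalize_word and the plain
all-digits test are assumptions made for illustration, and the limits of 3
and 12 are simply the figures mentioned above.

```python
MIN_LEN = 3            # stands in for minimum_word_length (figure from this thread)
MAX_LEN = 12           # stands in for maximum_word_length (figure from this thread)
ALLOW_NUMBERS = False  # stands in for allow_numbers


def normalize_word(word, min_len=MIN_LEN, max_len=MAX_LEN,
                   allow_numbers=ALLOW_NUMBERS):
    """Return the form of the word that is kept, or None if it is dropped.

    Hypothetical helper: a simplified model of the behaviour described
    in the thread, not htdig's real word parser.
    """
    if len(word) < min_len:
        return None                 # too short: excluded
    if not allow_numbers and word.isdigit():
        return None                 # a plain number: excluded
    return word[:max_len]           # too long: truncated, not excluded


if __name__ == "__main__":
    for w in ("ab", "2001", "htdig", "internationalization",
              "internationalisation"):
        print(repr(w), "->", repr(normalize_word(w)))
```

Under this model, "internationalization" and "internationalisation" both
reduce to the same 12-character key, which is why the match is fuzzy.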

> A document can be excluded due to settings for
> limit_urls_to, bad_extensions, exclude_urls, max_doc_size, etc. Indexing
> can also be prevented by robots.txt and tags in the document that specify
> it not be indexed. You might want to browse all of the htdig config
> settings.

See also
http://www.htdig.org/FAQ.html#q5.27
http://www.htdig.org/FAQ.html#q5.25
http://www.htdig.org/FAQ.html#q5.18
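
When a page is unexpectedly missing, it can help to think of these settings
as a chain of filters. The sketch below is only a simplified model in
Python: it assumes plain substring matching for limit_urls_to and
exclude_urls and suffix matching for bad_extensions, the function name
exclusion_reason is made up, and all the sample values are hypothetical.
Check the attribute documentation for the exact matching rules.

```python
def exclusion_reason(url, limit_urls_to, exclude_urls, bad_extensions):
    """Return the name of the setting that would reject url, or None.

    Hypothetical helper: assumes substring matches for limit_urls_to and
    exclude_urls, and a suffix match for bad_extensions.
    """
    if not any(pattern in url for pattern in limit_urls_to):
        return "limit_urls_to"      # URL matches none of the allowed patterns
    if any(pattern in url for pattern in exclude_urls):
        return "exclude_urls"       # URL matches an excluded pattern
    if any(url.lower().endswith(ext) for ext in bad_extensions):
        return "bad_extensions"     # URL ends in a rejected extension
    return None                     # nothing in this model rejects the URL


if __name__ == "__main__":
    # Sample values only; they are not taken from any real configuration.
    limit = ["http://www.example.com/"]
    exclude = ["/cgi-bin/"]
    bad = [".gif", ".zip"]
    for u in ("http://www.example.com/docs/a.html",
              "http://other.example.org/a.html",
              "http://www.example.com/cgi-bin/search",
              "http://www.example.com/logo.gif"):
        print(u, "->", exclusion_reason(u, limit, exclude, bad))
```

This model leaves out max_doc_size, robots.txt and in-document robots
tags, which depend on the fetched content rather than on the URL alone.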

> On the search side, the default is to only display the first 100 hits. So
> if you are expecting hundreds/thousands of hits from some of your
> searches, you will never see them all unless you adjust your htsearch
> config. (e.g. matches_per_page, maximum_pages).

See also
http://www.htdig.org/FAQ.html#q4.19
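
The 100-hit ceiling is just the product of the two attributes Jim mentions.
A small Python sketch of the arithmetic, assuming values of 10 for
matches_per_page and 10 for maximum_pages (consistent with the 100 figure
above); the function name visible_hits is made up for illustration.

```python
def visible_hits(total_matches, matches_per_page=10, maximum_pages=10):
    """Return how many matches can be reached through the result pages.

    Hypothetical helper: the 10 x 10 defaults are an assumption chosen
    to agree with the 100-hit figure quoted in the thread.
    """
    return min(total_matches, matches_per_page * maximum_pages)


if __name__ == "__main__":
    print(visible_hits(2500))          # capped by 10 pages of 10 matches
    print(visible_hits(42))            # fewer matches than the cap
    print(visible_hits(2500, 20, 50))  # after raising both attributes
```

So a search with thousands of matches still only exposes the first
matches_per_page * maximum_pages of them until one or both values are raised.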

> >because when I run rundig to make the database, and then
> >search for any text, some pages
> >appear and some pages don't, and the text is in ALL
> >files.
> >Why are some pages displayed and others not?
> 
> If all else fails, you might want to try running htdig with -vv or -vvv
> in order to see what it is and isn't indexing.

By the way, Jim, thanks for all the questions you fielded while Geoff and I
were away.  It sure cut down the number of e-mails I had to deal with on
my return, and for the most part your answers were bang-on.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
