According to Jim Cole:
> Jose Julian Buda's bits of Wed, 15 Aug 2001 translated to:
> >I want to know whether htdig, by default, indexes ALL words
> >in an HTML document and ALL documents specified in
> >start_url.
>
> In general this will be true. However there are a lot of cases where a
> document, or word in a document, may not be indexed. By default, a word
> will be excluded if it is less than three characters, more than 12
> characters, or a number.
This is mostly true. However, by default words over 12 characters are
truncated, not excluded. Because the same truncation occurs at search
time, you'll still get a match, only it's a fuzzy match because any
characters past the first 12 are ignored. See
http://www.htdig.org/attrs.html#maximum_word_length
http://www.htdig.org/attrs.html#minimum_word_length
http://www.htdig.org/attrs.html#allow_numbers
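
To make the truncate-versus-exclude distinction concrete, here is a rough Python model of the word rules described above. It is only a sketch, not ht://Dig's actual code: the function name is hypothetical, the constants mirror the defaults mentioned in this thread, and treating "a number" as "all digits" is a simplifying assumption.

```python
# Illustrative model only; this is NOT ht://Dig source code.
# The constants mirror the defaults discussed in this thread.
MINIMUM_WORD_LENGTH = 3
MAXIMUM_WORD_LENGTH = 12
ALLOW_NUMBERS = False


def normalize_word(word):
    """Return the form of `word` that would be indexed, or None if excluded.

    normalize_word is a hypothetical helper name used for illustration.
    """
    if len(word) < MINIMUM_WORD_LENGTH:
        return None  # too short: excluded
    if not ALLOW_NUMBERS and word.isdigit():
        return None  # assumption: "a number" means all digits
    # Long words are truncated, not excluded.
    return word[:MAXIMUM_WORD_LENGTH]


# The same rule is applied to the search term, so two different long
# words that share their first 12 characters match each other.
indexed = normalize_word("internationalization")
searched = normalize_word("internationally")
print(indexed, searched, indexed == searched)
```

Raising maximum_word_length in this model makes the two words distinct again, which is the trade-off the attribute controls.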
> A document can be excluded due to settings for
> limit_urls_to, bad_extensions, exclude_urls, max_doc_size, etc. Indexing
> can also be prevented by robots.txt and tags in the document that specify
> it not be indexed. You might want to browse all of the htdig config
> settings.
See also
http://www.htdig.org/FAQ.html#q5.27
http://www.htdig.org/FAQ.html#q5.25
http://www.htdig.org/FAQ.html#q5.18
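
As a rough way to reason about why a given URL was skipped, the URL-based attributes can be modelled in Python as below. This is a sketch under assumptions, not ht://Dig's implementation: it assumes limit_urls_to and exclude_urls are plain substring tests and bad_extensions is a suffix test, and the function name and pattern values are made up for the example. It ignores max_doc_size, robots.txt and robots meta tags.

```python
# Illustrative model only; this is NOT ht://Dig source code.
def url_allowed(url, limit_urls_to, exclude_urls, bad_extensions):
    """Return True if `url` passes all three URL-based filters.

    url_allowed is a hypothetical helper; the matching rules are
    assumed to be substring tests (limit/exclude) and a suffix test
    (extensions).
    """
    if not any(pattern in url for pattern in limit_urls_to):
        return False  # outside the area being indexed
    if any(pattern in url for pattern in exclude_urls):
        return False  # explicitly excluded
    if url.endswith(tuple(bad_extensions)):
        return False  # file type that is not parsed
    return True


# Example values below are hypothetical, not anyone's real config.
limit = ["http://www.example.com/"]
exclude = ["/cgi-bin/", ".cgi"]
bad_ext = [".gif", ".jpg", ".zip"]

for u in ["http://www.example.com/docs/a.html",
          "http://www.example.com/cgi-bin/search",
          "http://www.example.com/img/logo.gif",
          "http://other.example.org/a.html"]:
    print(u, url_allowed(u, limit, exclude, bad_ext))
```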
> On the search side, the default is to only display the first 100 hits. So
> if you are expecting hundreds/thousands of hits from some of your
> searches, you will never see them all unless you adjust your htsearch
> config (e.g. matches_per_page, maximum_pages).
See also
http://www.htdig.org/FAQ.html#q4.19
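
The 100-hit ceiling can be worked out with a little arithmetic, sketched here in Python. The sketch assumes the ceiling is simply matches_per_page times maximum_pages, with both defaulting to 10; check the attribute documentation for your version. The function name is hypothetical.

```python
# Illustrative arithmetic only; this is NOT ht://Dig source code.
def visible_hits(total_matches, matches_per_page=10, maximum_pages=10):
    """Return how many matches a user can page through.

    Assumes the ceiling is matches_per_page * maximum_pages and that
    both attributes default to 10.
    """
    return min(total_matches, matches_per_page * maximum_pages)


print(visible_hits(2500))          # assumed defaults
print(visible_hits(2500, 20, 50))  # raised limits
print(visible_hits(42))            # fewer matches than the ceiling
```

With the assumed defaults, a search that matches 2500 documents shows only 100 of them; raising either attribute raises the ceiling.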
> >because when I run rundig to build the database and then
> >search for any text, some pages
> >appear and some pages don't, even though the text is in ALL
> >files.
> >Why are some pages displayed and others not?
>
> If all else fails, you might want to try running htdig with -vv or -vvv
> in order to see what it is and isn't indexing.
By the way, Jim, thanks for all the questions you fielded while Geoff and I
were away. It sure cut down the number of e-mails I had to deal with on
my return, and for the most part your answers were bang-on.
--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930