Jose Julian Buda's bits of Wed, 15 Aug 2001 translated to:
>I want to know if the htdig index for default "ALL"
>word in a html document and ALL document specified in
>start_url.
In general this will be true. However, there are many cases where a
document, or a word in a document, may not be indexed. By default, a word
is skipped if it is shorter than three characters (minimum_word_length)
or is a number (allow_numbers), and a word longer than 12 characters
(maximum_word_length) is truncated rather than indexed whole. A document
can be excluded by settings such as limit_urls_to, bad_extensions,
exclude_urls and max_doc_size. Indexing can also be prevented by
robots.txt and by robots meta tags in the document itself. It is worth
browsing the full list of htdig config attributes.
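As a rough sketch, the word limits mentioned above correspond to the
following htdig.conf attributes. The values shown are only illustrative,
not recommendations, and defaults may differ between htdig versions, so
check the attribute documentation for your release. The database has to
be rebuilt (e.g. with rundig) before changes take effect.

```
# htdig.conf: word-filtering attributes (illustrative values only)
minimum_word_length:	2
maximum_word_length:	32
allow_numbers:		true
```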
On the search side, htsearch by default displays only the first 100 hits
(10 matches per page, up to 10 pages). So if you expect hundreds or
thousands of hits from some of your searches, you will never see them
all unless you adjust your htsearch config (e.g. matches_per_page,
maximum_pages).
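For instance, assuming you want up to 1000 hits to be reachable, a
fragment like this in htdig.conf would do it (20 matches per page times
50 pages). The numbers are arbitrary examples:

```
# htdig.conf: search-result limits read by htsearch (example values)
matches_per_page:	20
maximum_pages:		50
```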
>because when i run rundig to make de database, then
>when a make a searching for any text , some page
>appear and some page dont , and the text is in ALL
>files .
>Why some page are displayed and other one dont ?
If all else fails, try running htdig with -vv or -vvv to see what it is
and isn't indexing.
Jim