On Sunday, October 5, 2003, at 06:23 PM, C Bud wrote:

Yes, these .txt files are directly linked from various
.htm or .html files that are being accessed by the
search.  They open directly in the web browser as
if they were an HTML document, rather than in a plugin
for the word processing application in which they were
written.  They are linked via hypertext link <a href="
 "> </a> tags.

Several configuration attributes can affect whether the text files get indexed. Check the following:


http://www.htdig.org/attrs.html#bad_extensions
http://www.htdig.org/attrs.html#valid_extensions
http://www.htdig.org/attrs.html#exclude_urls
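If you want a quick way to see which of these three attributes might be rejecting a particular URL, here is a rough Python sketch. It is not htdig's own code. It assumes a simplified model: an exclude_urls pattern rejects any URL that contains it as a substring, the extension is taken from the URL path (query string ignored) and compared case-insensitively, and a non-empty valid_extensions rejects anything not listed. The parser only understands simple one-line "name: value" entries. The config values and URLs below are made up for illustration; substitute the real lines from your htdig.conf.

```python
import posixpath
from urllib.parse import urlsplit


def parse_conf(text: str) -> dict:
    """Parse simple one-line 'name: value' entries (a simplified subset of htdig.conf)."""
    attrs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values like URLs stay intact.
        name, sep, value = line.partition(":")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def why_rejected(url: str, attrs: dict):
    """Return a reason if the simplified checks would reject url, else None."""
    # Assumption: exclude_urls patterns are plain substrings.
    for pattern in attrs.get("exclude_urls", "").split():
        if pattern in url:
            return f"matches exclude_urls pattern {pattern!r}"
    # Assumption: extension comes from the path and is compared in lower case.
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    bad = {e.lower() for e in attrs.get("bad_extensions", "").split()}
    if ext in bad:
        return f"extension {ext!r} is in bad_extensions"
    valid = {e.lower() for e in attrs.get("valid_extensions", "").split()}
    if valid and ext not in valid:
        return f"extension {ext!r} is not in valid_extensions"
    return None


# Hypothetical excerpt of an htdig.conf file.
conf = """
start_url:        http://www.example.com/
exclude_urls:     /cgi-bin/ .cgi
bad_extensions:   .gif .jpg .zip
valid_extensions: .html .htm
"""

if __name__ == "__main__":
    attrs = parse_conf(conf)
    for url in ["http://www.example.com/docs/notes.txt",
                "http://www.example.com/index.html"]:
        print(url, "->", why_rejected(url, attrs) or "passes these checks")
```

With that sample config, the .txt file is reported as not being in valid_extensions, which is exactly the kind of setting worth double-checking in a real config.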

If that doesn't help, the next step is probably to run rundig with more verbose output (e.g. -vvv). If the text files are being dropped for some reason, the reason should show up in that output. If the files don't appear in the output at all, they most likely correspond to links that cannot be reached, in one or more hops, from the start_url. Speaking of which, you might also want to verify that you don't have max_hop_count set in your config file; it limits how many links htdig will follow along any given path from the start_url.
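To make the reachability point concrete, here is a minimal Python sketch. Again, this is a simplified model and not htdig's implementation: it walks a made-up link graph breadth-first from a start page, records each page's hop distance, and optionally stops following links once a max_hop_count-style limit is reached. The page names are hypothetical.

```python
from collections import deque


def reachable(links: dict, start_url: str, max_hop_count=None) -> dict:
    """Breadth-first walk of a link graph; returns {page: hops from start_url}."""
    hops = {start_url: 0}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        # Following links from this page would exceed the hop limit.
        if max_hop_count is not None and hops[page] >= max_hop_count:
            continue
        for target in links.get(page, []):
            if target not in hops:  # BFS: first visit is the shortest path
                hops[target] = hops[page] + 1
                queue.append(target)
    return hops


# Hypothetical site: each page maps to the pages it links to.
links = {
    "index.html": ["docs.html"],
    "docs.html": ["notes.txt"],
    "orphan.html": ["manual.txt"],  # nothing links to orphan.html
}

if __name__ == "__main__":
    print(reachable(links, "index.html"))
    print(reachable(links, "index.html", max_hop_count=1))
```

In this toy site, notes.txt is two hops from index.html, so a hop limit of 1 hides it, and manual.txt is never found because nothing reachable links to orphan.html.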

If htdig is seeing your text files, it should index them without any special effort on your part (unless your configuration has been modified to prevent that).

Jim



_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
