Original problem:
> 3.1.5 indexes site successfully,
> 3.1.6 skips over most files, ignoring them.
Thanks Giles for your reply. I did indeed study the full trace (up to
-vvvv!) and found no "rejected" URLs. Nor do I have exclude_url or any
rewrite rules in the config.
However, I eventually discovered that the problem had nothing to do with
the URLs. It turned out that many of the navigation files (which hold
the links to the rest of the site) contained the meta tag:
<meta name="htdig-noindex">
If I removed these tags, the site index proceeded successfully. Well,
nearly - now all the navigation files are in the DB...
My understanding is that this meta tag causes the file containing it
*not* to be indexed *but* that the file is still parsed for hrefs. So,
for example, if navbar.html has:
....
<meta name="htdig-noindex">
<a href="banana.html">Banana</a>
...
then the word "Banana" will *not* go in the DB and a search for "Banana"
will *not* return "navbar.html". However, banana.html *will* be pushed
onto the search stack and banana.html will be retrieved and indexed.
This is exactly what happens with 3.1.5.
However, with 3.1.6, navbar.html is parsed and "Banana" is not indexed,
but banana.html is never pushed onto the stack, so it is never retrieved.
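To make the behaviour I expect concrete, here is a minimal sketch in Python. It is only a toy model of the semantics described above, not ht://Dig's actual parser; the names NavPageParser and crawl_page are made up for illustration.

```python
from html.parser import HTMLParser


class NavPageParser(HTMLParser):
    """Toy model of the expected behaviour; not ht://Dig's real code."""

    def __init__(self):
        super().__init__()
        self.noindex = False  # set when <meta name="htdig-noindex"> is seen
        self.links = []       # hrefs that should be pushed onto the crawl stack
        self.words = []       # words that would normally go into the DB

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "htdig-noindex":
            self.noindex = True
        elif tag == "a" and attrs.get("href"):
            # Links are collected whether or not the page is marked noindex.
            self.links.append(attrs["href"])

    def handle_data(self, data):
        self.words.extend(data.split())


def crawl_page(html):
    """Return (words_to_index, links_to_follow) for a single page."""
    parser = NavPageParser()
    parser.feed(html)
    parser.close()
    # A noindex page contributes no words, but its links are still followed.
    words = [] if parser.noindex else parser.words
    return words, parser.links


navbar = '<meta name="htdig-noindex">\n<a href="banana.html">Banana</a>\n'
words, links = crawl_page(navbar)
print(words)  # []
print(links)  # ['banana.html']
```

In this model the page's words are dropped but its links survive, which matches what I see with 3.1.5. What I see with 3.1.6 would correspond to the links being dropped as well.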
So either my understanding of the meta tag is wrong, or there's a
bug...
Basically, I want htdig to find the hrefs in navbar.html, but I don't
want navbar.html to appear in the index. Can I still do this?
Rgds,
Owen Boyle.