We use ht://Dig on a client site. Until recently we have
been creating the index by building a single file that links
to every file we want to index, something like this:
<html>
<head><title></title></head>
<body>
<p><a href="/registry/0000001.html">.</a>
<p><a href="/registry/0000002.html">.</a>
<p><a href="/registry/0000003.html">.</a>
<p><a href="/registry/0000009.html">.</a>
<p><a href="/registry/0000111.html">.</a>
<p><a href="/registry/0000213.html">.</a>
<p><a href="/registry/0004571.html">.</a>
<p><a href="/registry/0007771.html">.</a>
<p><a href="/registry/0000778.html">.</a>
<p><a href="/registry/0000067.html">.</a>
...
</body>
</html>
and indexing it to a level of 1.
This list has been growing steadily, and has now reached
about 6,500 files. Recently we started to notice that things we
knew for sure existed weren't turning up in search results.
When I investigated, I found that only the first 4,500
files or so had actually been indexed - the rest had been skipped.
I have solved the problem for now - I now chunk the list up into
groups of 1000, then reference each group from a root file and
index to a level of 2.
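The workaround can be sketched roughly as follows in Python. This is
only an illustration of the chunking idea: the function names and the
output file names (root.html, group001.html, ...) are hypothetical, and
the page layout simply copies the link file shown above.

```python
from html import escape

# Same skeleton as the link file shown earlier in this message.
PAGE = "<html>\n<head><title></title></head>\n<body>\n{links}\n</body>\n</html>\n"


def link_page(hrefs):
    """Return an HTML page containing one link per href."""
    links = "\n".join(
        '<p><a href="{}">.</a>'.format(escape(h, quote=True)) for h in hrefs
    )
    return PAGE.format(links=links)


def build_index_pages(paths, chunk_size=1000, prefix="group"):
    """Split paths into groups and return {file name: HTML content}.

    Each group page links to at most chunk_size documents, and
    root.html (a hypothetical name) links to every group page, so the
    documents sit two hops away from the root.
    """
    pages = {}
    group_names = []
    for start in range(0, len(paths), chunk_size):
        name = "{}{:03d}.html".format(prefix, start // chunk_size + 1)
        pages[name] = link_page(paths[start:start + chunk_size])
        group_names.append(name)
    pages["root.html"] = link_page(group_names)
    return pages
```

With about 6,500 paths this gives seven group pages plus the root
file; the root is then the only page the indexer needs to start from.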
Is this a bug, a feature I don't know about or a misunderstanding
on my part about how things work?
Regards
Brian White
-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]