I am attempting to use htdig to index a large number (~100,000) of files, each
of which is pretty small (~500 bytes). Running htdig -vv and using strace seems
to indicate that htdig is spending most of its time opening and closing these
files rather than actually doing the indexing.
Is there a way (either in 3.1.5, which I'm using, or in 3.2) to concatenate
all of these individual files into one large file, with some delimiter between
them, and have htdig be aware of that delimiter to differentiate between the
files?
For example, instead of foo.html containing
<HEAD><TITLE>foo</TITLE></HEAD>
<BODY><H1>Nice pants</H1><H3>I really like a good pair of pants</H3></BODY>
and bar.html containing
<HEAD><TITLE>bar</TITLE></HEAD>
<BODY><H1>Turtle soup</H1><H3>Eating turtle soup is the cat's meow</H3></BODY>
I could have a single file that might look something like
<HTML>
<HEAD><TITLE>foo</TITLE></HEAD>
<BODY><H1>Nice pants</H1><H3>I really like a good pair of pants</H3></BODY>
</HTML>
<HTML>
<HEAD><TITLE>bar</TITLE></HEAD>
<BODY><H1>Turtle soup</H1><H3>Eating turtle soup is the cat's meow</H3></BODY>
</HTML>
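Building such a file is the easy part. Here is a rough Python sketch. It assumes (my own convention, not anything htdig defines) that each document is wrapped in its own <HTML>...</HTML> pair as in the combined example, and that the input files don't already contain <HTML> tags. combine_files, combine and split are hypothetical helpers. split only shows how a delimiter-aware reader could pull the documents back apart. Whether htdig can do that is exactly what I'm asking.

```python
import re

# Hypothetical delimiter convention (not an htdig feature): every document
# is wrapped in its own <HTML>...</HTML> pair inside the combined file.
DOC_RE = re.compile(r"<HTML>\s*(.*?)\s*</HTML>", re.DOTALL | re.IGNORECASE)


def combine(documents):
    """Join document bodies into one string, wrapping each in <HTML>...</HTML>."""
    return "".join("<HTML>\n" + doc.strip() + "\n</HTML>\n" for doc in documents)


def combine_files(paths, out_path):
    """Read each small file once and write them all to a single combined file."""
    texts = []
    for path in paths:
        with open(path) as f:
            texts.append(f.read())
    with open(out_path, "w") as out:
        out.write(combine(texts))


def split(combined_text):
    """Recover the individual document bodies by matching the delimiter pairs."""
    return DOC_RE.findall(combined_text)
```

For example, after `import glob`, calling `combine_files(sorted(glob.glob("docs/*.html")), "combined.html")` would produce the one large file. The directory and output names are placeholders.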
-dave