I am attempting to use htdig to index a large number (~100,000) of files, each of 
which is pretty small (~500 bytes). Running htdig -vv and using strace seems 
to indicate that htdig is spending most of its time opening and closing these 
files, and not actually doing the indexing.

Is there a way (either in 3.1.5, which I'm using, or in 3.2) to concatenate 
all of these individual files into one large file, with some delimiter between 
them, and have htdig be aware of that delimiter to differentiate between the 
files?

I.e., instead of foo.html containing

<HEAD><TITLE>foo</TITLE></HEAD>
<BODY><H1>Nice pants</H1><H3>I really like a good pair of pants</H3></BODY>

and bar.html containing

<HEAD><TITLE>bar</TITLE></HEAD>
<BODY><H1>Turtle soup</H1><H3>Eating turtle soup is the cat's meow</H3></BODY>


I could have one file that might look something like

<HTML>
<HEAD><TITLE>foo</TITLE></HEAD>
<BODY><H1>Nice pants</H1><H3>I really like a good pair of pants</H3></BODY>
</HTML>
<HTML>
<HEAD><TITLE>bar</TITLE></HEAD>
<BODY><H1>Turtle soup</H1><H3>Eating turtle soup is the cat's meow</H3></BODY>
</HTML>
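
For what it's worth, building such a combined file is the easy half. Below is a minimal sketch in Python, assuming the small files all sit in one directory and end in .html. The names concatenate_html, src_dir and out_path are made up for illustration and are not part of htdig. It only produces the combined file; whether htdig can be told to treat the <HTML>...</HTML> wrapper as a document boundary is exactly the open question.

```python
import sys
from pathlib import Path


def concatenate_html(src_dir, out_path):
    """Wrap every *.html file in src_dir in <HTML>...</HTML> and write
    them, in sorted filename order, to out_path.

    Hypothetical helper for illustration only. Returns the number of
    files written.
    """
    out_file = Path(out_path).resolve()
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.html")):
            # Skip the output file itself if it lives in src_dir.
            if path.resolve() == out_file:
                continue
            body = path.read_text(encoding="utf-8", errors="replace").strip()
            # The <HTML> wrapper is the delimiter proposed in this message.
            out.write("<HTML>\n" + body + "\n</HTML>\n")
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python concat.py SRC_DIR OUT_FILE
    n = concatenate_html(sys.argv[1], sys.argv[2])
    print("wrote", n, "documents to", sys.argv[2])
```

Note that this flattens the original filenames away, so something (a comment line, or a separate index) would have to record which chunk came from which file if search results need to point back at foo.html or bar.html.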


-dave




