Hi David,

On Tue, Aug 31, 2004 at 02:02:35PM +0100, David Adams wrote:
> Martin,
>
> This is a joke, yes?
>
> If not, please note that the Lucene FAQs make it clear that it is equally
> dependant on external parsers.

That's what a colleague recommended to me: to check the Lucene FAQ for
any alternative to the parsers already mentioned. Honestly, I didn't
look there before writing my email. :(

Fact is: indexing a web tree with the mentioned ppthtml, xlhtml or xpdf
takes ten times longer, at a load of 10 on a dual-processor Sun V480 with
4 GB RAM. When indexing only .doc and .html files, rundig completes in
about 30 minutes.

I find hanging ppthtml and xlhtml processes, each consuming nearly 95% CPU
and about 1 GB of RAM per document. Of course, those processes don't
come back and have to be killed... :(

> We use wp2html to convert Word documents and it's fine, but we bought it only
> because we needed to convert Wordperfect documents (not that we get many!)
>
> David Adams

Yours,
Martin

--
--------------------------------------------------------
arago AG, Institut fuer komplexes Datenmanagement
Am Niddatal 3, 60488 Frankfurt/Main, [EMAIL PROTECTED]
Tel. 069/405680, Fax 069/40568111, http://www.arago.de
--------------------------------------------------------