Hi David,


On Tue, Aug 31, 2004 at 02:02:35PM +0100, David Adams wrote:
> Martin,
> 
> This is a joke, yes?
> 
> If not, please note that the Lucene FAQs make it clear that it is equally
> dependent on external parsers.

That's what a colleague recommended to me: to check the Lucene FAQ for
alternatives to the parsers already mentioned.

Honestly, I didn't look before writing my email. :(

The fact is: indexing a web tree with ppthtml, xlhtml or xpdf enabled takes
ten times longer, at a load of 10 on a dual-processor Sun V480 with
4 GB RAM. When indexing only .doc and .html files, rundig completes in
about 30 minutes.

I find hanging ppthtml and xlhtml processes, each consuming nearly 95% CPU
and about 1 GB RAM per document. Of course, those processes never return
and have to be killed... :(
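
One possible workaround for the hanging converter processes, as a minimal
sketch only: run each external converter under a time limit and a memory
cap, and skip the document if the converter exceeds either. The names
run_converter, timeout_s and max_bytes are hypothetical, the limits are
arbitrary, and how such a wrapper would be hooked into the indexer is not
shown here. It assumes a Unix system with Python 3.

```python
import resource
import subprocess
import sys


def run_converter(cmd, timeout_s=60, max_bytes=512 * 1024 * 1024):
    """Run an external converter command (hypothetical helper).

    Returns the converter's stdout as bytes, or None if it timed out,
    could not be started, or exited with a non-zero status.
    """

    def limit_memory():
        # Runs in the child just before exec: lower the soft limit on
        # its address space, leaving the hard limit untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
            preexec_fn=limit_memory,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout


if __name__ == "__main__":
    # Example: python wrapper.py xlhtml some-file.xls
    output = run_converter(sys.argv[1:])
    if output is None:
        sys.exit(1)  # signal that this document should be skipped
    sys.stdout.buffer.write(output)
```

With something like this, a document that makes the converter hang would
cost at most timeout_s seconds instead of blocking the whole run.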

> We use wp2html to convert Word documents and it's fine, but we bought it only
> because we needed to convert WordPerfect documents (not that we get many!)
> 
> David Adams

Yours,

Martin

-- 

--------------------------------------------------------
 arago AG, Institut fuer komplexes Datenmanagement
 Am Niddatal 3, 60488 Frankfurt/Main, [EMAIL PROTECTED]
 Tel. 069/405680, Fax 069/40568111, http://www.arago.de
--------------------------------------------------------
