Fw: PDF / Word document parsers

Anita Srinivas Thu, 18 Apr 2002 23:11:56 -0700

Kelvin,
Thanks for your quick reply. I think it is built for the Linux/Unix platform,
and I am working on the Windows platform.


Anita
----- Original Message -----
From: "Kelvin Tan" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, April 19, 2002 10:25 AM
Subject: Re: PDF / Word document parsers


> Anita,
>
> I've had moderate success using Etymon for PDF parsing. It does consume
> quite a lot of memory for larger PDF documents, but otherwise it's OK.
> What difficulties are you facing?
>
> For MS Word parsing, the Jakarta POI project is working something out, but
> in the meantime I've managed to search MS Word documents by reading the
> file and stripping out nonsense characters. It's a hack, I think, but if I
> increase the IndexWriter's maxFieldLength to about a million, I can search
> 13-15 MB Word documents with ease.
>
> Kelvin
> ----- Original Message -----
> From: "Anita Srinivas" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Friday, April 19, 2002 2:13 PM
> Subject: PDF / Word document parsers
>
>
> Hi...
>
> I have been looking for PDF and Word document parsers. I have tried the
> contributions page on the Lucene site, as suggested by a Lucene user.
> PJEtymon does not have a Windows version, and XPDF does not do the
> parsing very well.
>
> Can someone suggest better Word document or PDF parsers than the ones I
> mentioned here?
>
> Thanks
>
> Anita Srinivas

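For reference, the Word workaround Kelvin describes in the quoted message (read the file, strip out the nonsense characters, then index what is left) might look roughly like the sketch below. This is only a guess at the approach, not Kelvin's actual code: the class name WordTextStripper, the method extractText, and the minimum run length are all made up for illustration. It keeps runs of printable ASCII bytes and drops everything else, so text that Word stores as 16-bit characters would need extra handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: a crude "strings"-style filter for binary .doc files.
final class WordTextStripper {

    /**
     * Returns the runs of printable ASCII bytes in raw that are at least
     * minRun characters long, joined by single spaces. All other bytes
     * are treated as separators and discarded.
     */
    static String extractText(byte[] raw, int minRun) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (byte b : raw) {
            int c = b & 0xFF;                 // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {     // printable ASCII range
                run.append((char) c);
            } else {
                flush(run, out, minRun);      // a non-printable byte ends the run
            }
        }
        flush(run, out, minRun);              // do not lose a trailing run
        return out.toString();
    }

    // Appends the current run to out if it is long enough, then resets it.
    private static void flush(StringBuilder run, StringBuilder out, int minRun) {
        if (run.length() >= minRun) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(run);
        }
        run.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordTextStripper <file.doc>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // Assumption: runs shorter than 4 characters are mostly binary noise.
        String text = extractText(raw, 4);
        // This text would then be handed to Lucene as a field value, with the
        // IndexWriter's maxFieldLength raised as Kelvin suggests so that long
        // documents are not cut off.
        System.out.println(text);
    }
}
```

The whole file is read into memory, which should be acceptable for the 13-15 MB documents mentioned, but a streaming version would be kinder to the heap for anything much larger.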

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
