Hi Victor

Thanks.

In the past I have used the Inso OutsideIn filters and found them very good;
however I'd like to come up with a pure Java solution, so if there is a Java
equivalent to the Inso filters I be grateful for any details.  Failing that,
I thought that I'd go for individual parsers initially using the file
extensions to select the correct parser but in the future adding a file type
recogniser for files without extensions.  Hence my request for anyone
knowing of good parsers particularly for the most common formats.

That being said, has anyone come across a Powerpoint parser?

Pete

----- Original Message -----
From: "Victor Hadianto" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, May 29, 2003 12:01 AM
Subject: Re: RE : Parsers


> > The www.textmining.org text extractors work very well for Word and pdf
> > documents.
> > They use both PDFBox and POI.
> >
> > For Excel, using POI directly is very easy. Tell me if you want to see
> > code samples.
> >
> > I'm looking myself for a Powerpoint text extractor, if you know one...
>
> Another solution is to use Microsoft Office itself. You can setup a server
> that serve request to convert Microsoft Office doc. There are many ways of
> doing this, for example using Python to directly call Office then put your
> python script in a webserver.
>
> Or you can set a .Net conversion server and you can call this .Net service
> using a Web Service, and many other interesting technique.
>
> victor
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to