Hi Victor Thanks.
In the past I have used the Inso OutsideIn filters and found them very good; however I'd like to come up with a pure Java solution, so if there is a Java equivalent to the Inso filters I be grateful for any details. Failing that, I thought that I'd go for individual parsers initially using the file extensions to select the correct parser but in the future adding a file type recogniser for files without extensions. Hence my request for anyone knowing of good parsers particularly for the most common formats. That being said, has anyone come across a Powerpoint parser? Pete ----- Original Message ----- From: "Victor Hadianto" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, May 29, 2003 12:01 AM Subject: Re: RE : Parsers > > The www.textmining.org text extractors work very well for Word and pdf > > documents. > > They use both PDFBox and POI. > > > > For Excel, using POI directly is very easy. Tell me if you want to see > > code samples. > > > > I'm looking myself for a Powerpoint text extractor, if you know one... > > Another solution is to use Microsoft Office itself. You can setup a server > that serve request to convert Microsoft Office doc. There are many ways of > doing this, for example using Python to directly call Office then put your > python script in a webserver. > > Or you can set a .Net conversion server and you can call this .Net service > using a Web Service, and many other interesting technique. > > victor > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]