> > Hm, this should be a FAQ. > > Maybe it should... ;-) It is now.
> > Check Lucene contributions page, there are some starting points > there, > > Well, this seems to be a very popular request... In fact I need > something like that also. Unfortunately, there seems to be no > authoritative answer as far as converting pdf files to text in a pure > > Java environment... Maybe I'm missing something here as usual? > > Also, on a related note, what would be a good approach to convert any > > random document into pdf? I was thinking to have a two steps process > for > document indexing in Lucene: > > - First, convert everything to pdf (with Acrobat or something) > - Second, convert pdf to text and index it. > > Any practical suggestions about how to do that in a pure Java > environment very welcome. Wouldn't you want to convert to XML instead and use XSLT to transform the XML representation to any desired format by just applying a style sheet? Sounds like less work with bigger document type coverage. Otis __________________________________________________ Do You Yahoo!? Yahoo! Health - your guide to health and wellness http://health.yahoo.com -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
