Thanks Adam, +1. In the latest 0.2 trunk version of OODT, we've upgraded Tika to 0.8, the latest released version. See:
https://issues.apache.org/jira/browse/OODT-89

We'll probably be cutting a 0.2 release in the near future. I posted a comment discussing it on this thread:

http://s.apache.org/li

HTH!

Cheers,
Chris

On Jan 9, 2011, at 9:39 AM, Adam Estrada wrote:

> All,
>
> I have been taking a look at OODT and noticed that you are still using some
> of the older versions of other projects. I highly recommend updating Tika
> and PDFBox to their latest versions. 0.7 and 1.2.1. The older versions of
> PDFBox won't parse PDFs that were OCRd from whatever the latest version of
> acrobat is. I know this from experience...
>
> Adam

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
