Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "PDFParser (Apache PDFBox)" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29?action=diff&rev1=8&rev2=9

  }}}

  == Optional Dependencies ==
- If you need to process TIFF or JPEG2000 images within PDFs, please consider adding the optional dependencies specified by [[https://pdfbox.apache.org/2.0/dependencies.html#optional-components||PDFBox]]. These dependencies are not compatible with ASL 2.0; please make sure that any third party licenses are suitable for your project.
+ If you need to process TIFF or JPEG2000 images within PDFs (either for inline image extraction or OCR), please consider adding the optional dependencies specified by [[https://pdfbox.apache.org/2.0/dependencies.html#optional-components|PDFBox]]. These dependencies are not compatible with ASL 2.0; please make sure that any third party licenses are suitable for your project.

- Finally, [[https://twitter.com/mcaruanagalizia/status/796097425446490114|M. Caruana Galizia]] alerted us to the need to use maven-shade's ServicesResourceTransformer because the third-party dependencies' services file will be overwritten unless you do transform the services. See an example: [[https://github.com/ICIJ/extract/blob/master/pom.xml|here]].
+ Finally, [[https://twitter.com/mcaruanagalizia/status/796097425446490114|M. Caruana Galizia]] alerted us to the need to use maven-shade's `ServicesResourceTransformer` because the third-party dependencies' services file will be overwritten unless you transform the services. See an example: [[https://github.com/ICIJ/extract/blob/master/pom.xml|here]].

  == OCR ==
  Note: the configuration of some of these features via the config file requires a nightly build of Tika after 11/8/2016 or Tika version >= 1.15.
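
For readers who want to see what the `ServicesResourceTransformer` remark in the Optional Dependencies change looks like in practice, here is a minimal sketch of a maven-shade configuration. It is not copied from the linked ICIJ pom.xml. The plugin version is deliberately left out, and binding the `shade` goal to the `package` phase is an assumption about a typical build. The transformer merges the `META-INF/services` files from all shaded jars instead of letting one overwrite the others.

```xml
<!-- Sketch only: goes inside <build><plugins> of your pom.xml. -->
<!-- Assumption: shading happens in the package phase; pin a plugin version suitable for your build. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services entries from every dependency -->
          <!-- so that service providers from third-party jars stay registered. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After building, one way to check the result is to list the `META-INF/services` entries in the shaded jar and confirm that providers from the optional image dependencies are still present.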
