Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "PDFParser (Apache PDFBox)" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29?action=diff&rev1=8&rev2=9

  }}}
  
  == Optional Dependencies ==
- If you need to process TIFF or JPEG2000 images within PDFs, please consider 
adding the optional dependencies specified by 
[[https://pdfbox.apache.org/2.0/dependencies.html#optional-components||PDFBox]].
  These dependencies are not compatible with ASL 2.0;  please make sure that 
any third party licenses are suitable for your project.
+ If you need to process TIFF or JPEG2000 images within PDFs (either for inline 
image extraction or OCR), please consider adding the optional dependencies 
specified by 
[[https://pdfbox.apache.org/2.0/dependencies.html#optional-components|PDFBox]]. 
 These dependencies are not compatible with ASL 2.0;  please make sure that any 
third party licenses are suitable for your project.
  
- Finally, [[https://twitter.com/mcaruanagalizia/status/796097425446490114|M. 
Caruana Galizia]] alerted us to the need to use maven-shade's 
ServicesResourceTransformer because the third-party dependencies' services file 
will be overwritten unless you do transform the services.  See an example: 
[[https://github.com/ICIJ/extract/blob/master/pom.xml|here]].
+ Finally, [[https://twitter.com/mcaruanagalizia/status/796097425446490114|M. 
Caruana Galizia]] alerted us to the need to use maven-shade's 
`ServicesResourceTransformer` because the third-party dependencies' services 
file will be overwritten unless you transform the services.  See an example: 
[[https://github.com/ICIJ/extract/blob/master/pom.xml|here]].
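
A minimal sketch of such a shade configuration follows. It is not taken from the linked pom.xml; it assumes the plugin version is managed elsewhere in your build (for example in `pluginManagement`) and that shading runs in the `package` phase. The transformer merges the `META-INF/services` entries from all shaded jars instead of letting one overwrite another.
{{{
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- version intentionally omitted: assumed to be managed elsewhere -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services files rather than overwriting them -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
}}}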
  
  == OCR ==
  Note: configuring some of these features via the config file 
requires a nightly build of Tika after November 8, 2016, or Tika version >= 1.15.
