Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "Troubleshooting Tika" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/Troubleshooting%20Tika?action=diff&rev1=10&rev2=11

  == PDF Text Problems ==
  If Tika isn't extracting the right text from a PDF, and/or is giving errors, 
the first thing to do is determine whether this is a Tika issue or an issue with 
the underlying Apache PDFBox library.
  
- To check, grab the latest [[http://pdfbox.apache.org/download.cgi|Apache 
PDFBox pdfbox-app jar]] and use the 
[[http://pdfbox.apache.org/2.0/commandline.html#extracttext|ExtractText command 
line tool]] on your problematic PDF. 
+ To check, grab the latest [[http://pdfbox.apache.org/download.cgi|Apache 
PDFBox pdfbox-app jar]] and use the 
[[http://pdfbox.apache.org/2.0/commandline.html#extracttext|ExtractText command 
line tool]] on your problematic PDF:
+ {{{
+ java -jar pdfbox-app.X.Y.jar ExtractText problematicPDF.pdf
+ }}}
  
  If that shows the same problem, it's a PDFBox bug. Please 
[[http://pdfbox.apache.org/support.html|file an Apache PDFBox bug report]] and 
attach at least one failing file to the bug. Once that is fixed, Tika will get 
the fix when it picks up the new PDFBox release.
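
The check described above can be scripted. The following is only a rough sketch, not part of the wiki page: it assumes PDFBox 2.0 command-line syntax and the tika-app `--text` option, and `PDFBOX_JAR`, `TIKA_JAR`, `extract_both` and `diagnose` are hypothetical names chosen for illustration. It looks for a phrase you expect to find in the PDF and reports which layer loses it.

```shell
#!/bin/sh
# Sketch only. PDFBOX_JAR and TIKA_JAR are placeholder paths (assumptions);
# point them at the jars you actually downloaded.
PDFBOX_JAR="${PDFBOX_JAR:-pdfbox-app.jar}"
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

# Extract text from one PDF with both tools, writing
# <pdf>.pdfbox.txt and <pdf>.tika.txt next to the input file.
extract_both() {
    pdf="$1"
    # PDFBox 2.0 syntax: ExtractText <inputfile> [output-text-file]
    java -jar "$PDFBOX_JAR" ExtractText "$pdf" "$pdf.pdfbox.txt" || return 1
    # tika-app's --text option writes plain text to stdout
    java -jar "$TIKA_JAR" --text "$pdf" > "$pdf.tika.txt" || return 1
}

# Given the two extracted text files and a phrase expected in the PDF,
# print which layer appears to lose the phrase:
#   pdfbox - missing from PDFBox's own output, so likely a PDFBox bug
#   tika   - PDFBox has it but Tika does not, so likely a Tika issue
#   ok     - present in both outputs
diagnose() {
    pdfbox_txt="$1"
    tika_txt="$2"
    phrase="$3"
    if ! grep -qF -- "$phrase" "$pdfbox_txt"; then
        echo "pdfbox"
    elif ! grep -qF -- "$phrase" "$tika_txt"; then
        echo "tika"
    else
        echo "ok"
    fi
}
```

After sourcing the script, a run might look like `extract_both problematicPDF.pdf && diagnose problematicPDF.pdf.pdfbox.txt problematicPDF.pdf.tika.txt "a phrase from the PDF"`. The match is line-based and literal, so pick a short phrase that does not wrap across lines.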
  
