Add Tika parsers for PDF and TTF
--------------------------------

                 Key: PDFBOX-1132
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1132
             Project: PDFBox
          Issue Type: New Feature
          Components: FontBox, Parsing
            Reporter: Jukka Zitting


The PDF and TTF parsers in Apache Tika rely more on improvements in PDFBox than 
on those in Tika, so it would make more sense for that code to reside inside 
Apache PDFBox.

Having the code inside PDFBox would allow for tighter integration with PDFBox 
internals and avoid the need to wait for an official PDFBox release before new 
features can be used inside the PDF and TTF parsers.

To do this, I'd migrate the PDF and TTF parser classes and related test 
cases and files from Tika to the PDFBox and FontBox components. We'd add an 
optional dependency on tika-core to these components, so people who don't use 
or need Tika wouldn't be affected.
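
As a rough illustration of what "optional" means for users at runtime, here is 
a minimal sketch of how code could check whether tika-core is present before 
touching any Tika-based parser. The class OptionalDependencyCheck and its 
method are hypothetical and not part of PDFBox, FontBox or Tika; the only 
real name used is Tika's org.apache.tika.parser.Parser interface.

```java
// Hypothetical helper for illustration only; not part of PDFBox or Tika.
final class OptionalDependencyCheck {

    // Fully qualified name of the parser interface shipped in tika-core.
    static final String TIKA_PARSER = "org.apache.tika.parser.Parser";

    private OptionalDependencyCheck() {
    }

    // Returns true if the named class can be located on the classpath.
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its
            // static initializers.
            Class.forName(className, false,
                    OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The optional dependency is missing or unusable.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("tika-core on classpath: "
                + isClassPresent(TIKA_PARSER));
    }
}
```

With a check like this, code paths that never reference the Tika-based parser 
classes would keep working when tika-core is absent.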

I'll attach a patch with the proposed changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
