[ https://issues.apache.org/jira/browse/TIKA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940981#comment-14940981 ]
Tim Allison commented on TIKA-1760:
-----------------------------------
Thank you for raising this issue. I'm not sure there's anything we can do at
the Tika level...or is there? I'd recommend opening an issue on PDFBox's JIRA
if you haven't already. Please link that issue to this one so that we can
track this.
> PDF index fulltext fails.
> -------------------------
>
> Key: TIKA-1760
> URL: https://issues.apache.org/jira/browse/TIKA-1760
> Project: Tika
> Issue Type: Bug
> Reporter: Arkady Zalkowitsch
> Priority: Critical
> Attachments: not_found.pdf
>
>
> Full-text indexing of a PDF fails when its font dictionary contains one entry
> for the font Helvetica and another entry, named Encoding, whose value does not
> represent a font at all.
> The AcroForm dictionary in the PDF looks like this:
> 4 0 obj
> <<
>   /Fields [ 12 0 R ]
>   /DA (/Helvetica 0 Tf 0 g )
>   /DR
>   <<
>     /Font
>     <<
>       /Helvetica 11 0 R
>       /Encoding << /PDFDocEncoding 10 0 R >>
>     >>
>   >>
>   /NeedAppearances true
> >>
> endobj
> PDFBox tries to parse that "font" called Encoding and fails, but
> PDResources.getFonts() only logs the resulting exception:
>     try
>     {
>         newFont = PDFontFactory.createFont( (COSDictionary)font );
>     }
>     catch (IOException exception)
>     {
>         LOG.error("error while creating a font", exception);
>     }
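
To make the reported condition concrete, here is a minimal sketch of the kind of check that would tell the two /Font entries apart before any font is created. It is an illustration only: plain Java maps stand in for PDF dictionaries, and the class FontEntryFilter and method fontNames are hypothetical names, not PDFBox or Tika API. It also assumes that the object behind "11 0 R" carries /Type /Font, which the report does not show.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: plain maps stand in for PDF dictionaries.
// Nothing here is PDFBox or Tika API.
public class FontEntryFilter {

    // Returns the keys of the font resource entries whose dictionary
    // declares itself as a font (a Type entry equal to "Font").
    public static List<String> fontNames(Map<String, Map<String, String>> fontResources) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : fontResources.entrySet()) {
            Map<String, String> dict = entry.getValue();
            if (dict != null && "Font".equals(dict.get("Type"))) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> resources = new LinkedHashMap<>();

        // Stand-in for "/Helvetica 11 0 R" (assumed to be a real font dictionary).
        Map<String, String> helvetica = new LinkedHashMap<>();
        helvetica.put("Type", "Font");
        helvetica.put("BaseFont", "Helvetica");
        resources.put("Helvetica", helvetica);

        // Stand-in for "/Encoding << /PDFDocEncoding 10 0 R >>": not a font.
        Map<String, String> encoding = new LinkedHashMap<>();
        encoding.put("PDFDocEncoding", "10 0 R");
        resources.put("Encoding", encoding);

        // Only the Helvetica entry passes the check.
        System.out.println(fontNames(resources));
    }
}
```

Whether a check like this belongs in PDFBox, or whether the logged exception should be handled differently, is a question for the PDFBox issue; the sketch only shows why the Encoding entry is not a usable font.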