Arkady Zalkowitsch created TIKA-1760:
----------------------------------------
Summary: PDF index fulltext fails.
Key: TIKA-1760
URL: https://issues.apache.org/jira/browse/TIKA-1760
Project: Tika
Issue Type: Bug
Reporter: Arkady Zalkowitsch
Priority: Critical
Full-text indexing of a PDF fails when the font dictionary in its AcroForm
default resources contains one entry for the font Helvetica and a second entry,
named Encoding, whose value is not a font at all.
The AcroForm dictionary in the PDF looks like this:
4 0 obj
<<
  /Fields [ 12 0 R ]
  /DA(/Helvetica 0 Tf 0 g )
  /DR
  <<
    /Font
    <<
      /Helvetica 11 0 R
      /Encoding<</PDFDocEncoding 10 0 R>>
    >>
  >>
  /NeedAppearances true
>>
endobj
PDFBox tries to parse the "font" named Encoding and fails, but
PDResources.getFonts() only logs the resulting exception:
try
{
    newFont = PDFontFactory.createFont( (COSDictionary)font );
}
catch (IOException exception)
{
    LOG.error("error while creating a font", exception);
}
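One possible way to avoid the failed parse is to check whether an entry is a font dictionary before handing it to the font factory. The following is only a sketch of that idea, not PDFBox code: it models the /DR /Font dictionary with plain Java maps, and the names FontResourceFilter, fontNames and sampleResources are hypothetical. It also assumes that object 11 0 R is an ordinary Type1 font dictionary carrying /Type /Font, which this report does not show.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the /DR /Font dictionary; this is not PDFBox code.
class FontResourceFilter {

    // Returns the names of entries whose value is a dictionary with /Type /Font.
    // Entries without that type (such as /Encoding above) are skipped.
    static List<String> fontNames(Map<String, Map<String, String>> fontResources) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : fontResources.entrySet()) {
            Map<String, String> dict = entry.getValue();
            if (dict != null && "Font".equals(dict.get("Type"))) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    // Models the font dictionary from this report.
    // ASSUMPTION: the content of object 11 0 R is not shown in the report;
    // a standard Type1 Helvetica font dictionary is used here for illustration.
    static Map<String, Map<String, String>> sampleResources() {
        Map<String, Map<String, String>> fonts = new LinkedHashMap<>();

        Map<String, String> helvetica = new LinkedHashMap<>();
        helvetica.put("Type", "Font");
        helvetica.put("Subtype", "Type1");
        helvetica.put("BaseFont", "Helvetica");
        fonts.put("Helvetica", helvetica);

        // /Encoding<</PDFDocEncoding 10 0 R>> has no /Type entry at all.
        Map<String, String> encoding = new LinkedHashMap<>();
        encoding.put("PDFDocEncoding", "10 0 R");
        fonts.put("Encoding", encoding);

        return fonts;
    }

    public static void main(String[] args) {
        // Prints the entries that would be passed on to font creation.
        System.out.println(fontNames(sampleResources()));
    }
}
```

With the dictionary from this report, only the Helvetica entry passes the check, so the Encoding entry would never reach font creation. Whether a type check like this is the right fix for PDFBox itself is an open question.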