[jira] [Commented] (TIKA-1760) PDF index fulltext fails.

Tim Allison (JIRA) Fri, 02 Oct 2015 03:01:21 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940981#comment-14940981
 ]


Tim Allison commented on TIKA-1760:
-----------------------------------

Thank you for raising this issue.  I'm not sure there is anything we can do at 
the Tika level, or is there?  I'd recommend opening an issue on PDFBox's JIRA 
if you haven't already.  Please link that issue to this one so that we can 
track it.

> PDF index fulltext fails.
> -------------------------
>
>                 Key: TIKA-1760
>                 URL: https://issues.apache.org/jira/browse/TIKA-1760
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Arkady Zalkowitsch
>            Priority: Critical
>         Attachments: not_found.pdf
>
>
> Full-text indexing of a PDF fails when its font dictionary contains one entry 
> for the font Helvetica and an entry named Encoding whose value does not 
> represent a font at all.
> The AcroForm dictionary in the PDF looks like this:
> 4 0 obj
> <<
>   /Fields [ 12 0 R ]
>   /DA(/Helvetica 0 Tf 0 g )
>   /DR
>   <<
>     /Font
>     <<
>       /Helvetica 11 0 R
>       /Encoding<</PDFDocEncoding 10 0 R>>
>     >>
>   >>
>   /NeedAppearances true
> >>
> endobj
> PDFBox tries to parse that "font" called Encoding and fails, but 
> PDResources.getFonts() only logs the resulting exception:
> try
> {
>     newFont = PDFontFactory.createFont( (COSDictionary)font );
> }
> catch (IOException exception)
> {
>     LOG.error("error while creating a font", exception);
> }
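
To illustrate the failure mode described above, here is a minimal sketch of 
one possible guard: skip entries in the /Font resource dictionary whose value 
is not actually a font dictionary.  This is not PDFBox code.  The class 
FontEntryFilter, the method fontsOnly, and the use of plain maps in place of 
COSDictionary are all hypothetical, and the contents of the Helvetica entry 
(object 11 0 R) are assumed, since the report does not show them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a PDF /Font resource dictionary.
// Plain maps are used instead of PDFBox's COSDictionary so the sketch is self-contained.
final class FontEntryFilter {

    // Returns only the entries whose value is a dictionary with /Type /Font.
    // Entries such as /Encoding << /PDFDocEncoding 10 0 R >> are dropped
    // instead of being handed to a font factory.
    static Map<String, Map<String, String>> fontsOnly(
            Map<String, Map<String, String>> fontResources) {
        Map<String, Map<String, String>> fonts = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> entry : fontResources.entrySet()) {
            Map<String, String> dict = entry.getValue();
            if (dict != null && "Font".equals(dict.get("Type"))) {
                fonts.put(entry.getKey(), dict);
            }
        }
        return fonts;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> resources = new LinkedHashMap<>();
        // Assumed shape of object 11 0 R; the report does not include it.
        resources.put("Helvetica",
                Map.of("Type", "Font", "Subtype", "Type1", "BaseFont", "Helvetica"));
        // The stray entry from the AcroForm /DR dictionary in the report.
        resources.put("Encoding", Map.of("PDFDocEncoding", "10 0 R"));

        System.out.println(fontsOnly(resources).keySet());
    }
}
```

Whether a check like this belongs in PDFBox, or whether the exception should 
be propagated rather than only logged, is a question for the PDFBox issue.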



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
