[
https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252336#comment-17252336
]
Hudson commented on TIKA-3246:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #102 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/102/])
TIKA-3246: call tailored fixup when getting AcroForm the first time to avoid
the creation of appearances which aren't needed in tika (newly needed in PDFBox
2.0.22) (tilman:
[https://github.com/apache/tika/commit/3a4c529a201c9c3d9b56cbdf8c2f8b702d74768e])
* (edit)
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* (edit)
tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
> IllegalArgumentException when generation of appearances fails
> -------------------------------------------------------------
>
> Key: TIKA-3246
> URL: https://issues.apache.org/jira/browse/TIKA-3246
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.25
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.0, 1.26
>
> Attachments: TIKA-3246.patch
>
>
> {noformat}
> java.lang.IllegalArgumentException: No glyph for U+0041 (A) in font
> BZZZZZ+Aladin-Regular
> at
> org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:372)
> at
> org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:422)
> at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:332)
> at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:363)
> at
> org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.calculateFontSize(AppearanceGeneratorHelper.java:859)
> at
> org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:494)
> at
> org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:422)
> at
> org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:232)
> at
> org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:264)
> at
> org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:327)
> at
> org.apache.pdfbox.pdmodel.fixup.processor.AcroFormGenerateAppearancesProcessor.process(AcroFormGenerateAppearancesProcessor.java:54)
> at
> org.apache.pdfbox.pdmodel.fixup.AcroFormDefaultFixup.apply(AcroFormDefaultFixup.java:56)
> at
> org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:132)
> at
> org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:113)
> at
> org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:267)
> {noformat}
> This is related to a change in PDFBox in {{PDDocumentCatalog.getAcroForm()}}:
> we try to "fix" fields when they exist as annotations but not as fields. I
> wonder if this is needed at all.
> It happens with several files, among them the two AML files of PDFBOX-4086.
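The idea behind the fix (pass a tailored fixup on the first AcroForm access so that the default appearance generation never runs) can be sketched without PDFBox on the classpath. This is only a model under stated assumptions: Form, Fixup, DefaultFixup, TailoredFixup and Catalog are hypothetical stand-ins, not PDFBox or Tika classes, and the "fixup is applied only on first access" behavior is inferred from the commit message rather than taken from the PDFBox source. In PDFBox 2.0.22 the corresponding hook appears to be the getAcroForm overload that takes a fixup argument, as the two getAcroForm frames in the stack trace suggest.

```java
import java.util.ArrayList;
import java.util.List;

public class AcroFormFixupSketch {

    /** Hypothetical stand-in for a form; text extraction only needs the fields. */
    static class Form {
        final List<String> fieldNames = new ArrayList<>();
        final boolean fontHasAllGlyphs;
        boolean appearancesGenerated = false;

        Form(boolean fontHasAllGlyphs) {
            this.fontHasAllGlyphs = fontHasAllGlyphs;
        }
    }

    /** Hypothetical stand-in for a fixup hook applied to the form. */
    interface Fixup {
        void apply(Form form);
    }

    /** Models the default: repair orphan fields, then generate appearances. */
    static class DefaultFixup implements Fixup {
        @Override
        public void apply(Form form) {
            form.fieldNames.add("orphanField");
            if (!form.fontHasAllGlyphs) {
                // Models the failure reported in the stack trace.
                throw new IllegalArgumentException("No glyph for U+0041 (A) in font");
            }
            form.appearancesGenerated = true;
        }
    }

    /** Models a tailored fixup: repair orphan fields, skip appearances. */
    static class TailoredFixup implements Fixup {
        @Override
        public void apply(Form form) {
            form.fieldNames.add("orphanField");
        }
    }

    /** Hypothetical catalog: the fixup runs only on the first form access. */
    static class Catalog {
        private final Form form;
        private boolean fixupApplied = false;

        Catalog(Form form) {
            this.form = form;
        }

        Form getAcroForm() {
            return getAcroForm(new DefaultFixup());
        }

        Form getAcroForm(Fixup fixup) {
            if (!fixupApplied) {
                if (fixup != null) {
                    fixup.apply(form);
                }
                fixupApplied = true;
            }
            return form;
        }
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog(new Form(false));
        // First access uses the tailored fixup, so no appearances are built.
        Form form = catalog.getAcroForm(new TailoredFixup());
        // Later default accesses reuse the form and do not re-run a fixup.
        Form again = catalog.getAcroForm();
        System.out.println(form.fieldNames);
        System.out.println(again.appearancesGenerated);
    }
}
```

In this model, calling the no-argument getAcroForm() first on a form with a broken font throws, while making the tailored call first avoids the exception for every later access. A text extractor does not render form fields, so skipping appearance generation loses nothing it needs.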