[
https://issues.apache.org/jira/browse/PDFBOX-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883906#action_12883906
]
Arjohn Kampman commented on PDFBOX-765:
---------------------------------------
The performance degradation seems to be related to resource files that cannot
be found. For example, with some PDF files, PDFBox tries to load
org/apache/pdfbox/resources/afm/MicrosoftSansSerif.afm over and over again.
Normally, the results of such load operations are cached in
PDTrueTypeFont.afmObjects, but not when the result is null, so each lookup of
a missing file repeats the failed load.
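
To illustrate the idea of caching failed lookups as well, here is a rough
sketch. It is not PDFBox code: the NegativeCache class, its loader function
and the loadCount() counter are all hypothetical names made up for this
example, and it uses newer Java APIs than PDFBox 1.2.0 targets. It only shows
how a "not found" result could be remembered so the loader runs once per name.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration of negative caching; not part of PDFBox. */
public class NegativeCache<V> {
    // Optional.empty() marks "looked up, nothing found", so a miss is stored too.
    private final Map<String, Optional<V>> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    public NegativeCache(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, or null if the resource is known to be missing. */
    public V get(String name) {
        return cache.computeIfAbsent(name, key -> {
            loads.incrementAndGet();                      // count real load attempts
            return Optional.ofNullable(loader.apply(key)); // wrap null as Optional.empty()
        }).orElse(null);
    }

    /** Number of times the underlying loader was actually invoked. */
    public int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // Assumed stand-in for a resource loader that never finds the file.
        NegativeCache<String> afmCache = new NegativeCache<>(name -> null);
        for (int i = 0; i < 1000; i++) {
            afmCache.get("org/apache/pdfbox/resources/afm/MicrosoftSansSerif.afm");
        }
        System.out.println("loads: " + afmCache.loadCount()); // prints "loads: 1"
    }
}
```

With a plain map that stores only non-null results, the same loop would call
the loader 1000 times, which matches the repeated loads described above.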
Here's (one of?) the relevant stack trace(s):
ResourceLoader.loadResource(String) line: 54
PDTrueTypeFont(PDFont).getAFM() line: 305
PDTrueTypeFont(PDSimpleFont).getFontHeight(byte[], int, int) line: 119
PDFTextStripper(PDFStreamEngine).processEncodedText(byte[]) line: 402
ShowTextGlyph.process(PDFOperator, List<COSBase>) line: 61
PDFTextStripper(PDFStreamEngine).processOperator(PDFOperator, List) line: 567
PDFTextStripper(PDFStreamEngine).processSubStream(PDPage, PDResources, COSStream) line: 250
PDFTextStripper(PDFStreamEngine).processStream(PDPage, PDResources, COSStream) line: 208
PDFTextStripper.processPage(PDPage, COSStream) line: 378
PDFTextStripper.processPages(List<COSObjectable>) line: 302
PDFTextStripper.writeText(PDDocument, Writer) line: 258
PDFTextStripper.getText(PDDocument) line: 184
> Performance regression in PDFBox 1.2.0
> --------------------------------------
>
> Key: PDFBOX-765
> URL: https://issues.apache.org/jira/browse/PDFBOX-765
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Priority: Critical
>
> Arjohn Kampman reported a notable performance drop in PDFBox 1.2.0, possibly
> caused by PDFBOX-754.