[ https://issues.apache.org/jira/browse/PDFBOX-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899916#comment-17899916 ]
Axel Howind commented on PDFBOX-5902:
-------------------------------------

When I saw the code that does the Unicode mapping, I assumed that a
mapping would not change once it had been registered. I ran the tests and
saw that the same mapping was overwritten several times, so I thought
that, instead of calculating the mapping each time, it might be better to
use Map.computeIfAbsent(), which only calls the rest of the code when no
mapping is registered yet.

But then I saw that, at least in the tests, the value changed, so using
computeIfAbsent() is not possible. There is also Map.compute(), where you
can check the old value and replace it only when the value changes, but
since you have to determine the new value each time anyway, there is
nothing to gain.

> The CPU usage of a PDF file with a size of 85.6 MB is abnormal
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5902
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5902
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: image-2024-11-15-17-07-17-802.png,
> image-2024-11-16-12-23-59-684.png, image-2024-11-16-12-38-54-861.png,
> image-2024-11-19-08-50-37-171.png, image-2024-11-19-08-55-59-315.png,
> image-2024-11-19-08-56-23-894.png, image-2024-11-19-08-56-49-755.png
>
>
> When I try to extract the text content from a pdf file with a size of 85.6MB,
> at this point the CPU usage is abnormal, the threshold of the alarm is
> reached, and the extraction speed is also very slow, didn't get results for a
> few minutes, not a memory problem, also tried to upgrade the version of the
> library, this problem still exists.
> !image-2024-11-15-17-07-17-802.png!
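
To illustrate the trade-off between Map.computeIfAbsent() and Map.compute() described in the comment, here is a minimal sketch. It is not PDFBox code: the class MappingSketch, the method lookupUnicode(), and the "source" parameter are hypothetical stand-ins for whatever determines the Unicode value of a character code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MappingSketch {
    // Counts how often the (hypothetical) expensive lookup runs.
    static int lookups = 0;

    // Hypothetical stand-in for the code that determines the Unicode value.
    static String lookupUnicode(int code, String source) {
        lookups++;
        return source + ":" + code;
    }

    // Variant 1: the lookup is skipped once a mapping exists.
    // Only correct if a registered mapping never changes.
    static String withComputeIfAbsent(Map<Integer, String> map, int code, String source) {
        return map.computeIfAbsent(code, k -> lookupUnicode(k, source));
    }

    // Variant 2: compute() exposes the old value, so the entry is replaced
    // only when it differs, but the new value is still determined on every call.
    static String withCompute(Map<Integer, String> map, int code, String source) {
        return map.compute(code, (k, old) -> {
            String fresh = lookupUnicode(k, source);
            return Objects.equals(old, fresh) ? old : fresh;
        });
    }

    public static void main(String[] args) {
        Map<Integer, String> a = new HashMap<>();
        withComputeIfAbsent(a, 65, "fontA");
        // The second call keeps the first value even though the input changed.
        System.out.println(withComputeIfAbsent(a, 65, "fontB") + ", lookups=" + lookups);

        lookups = 0;
        Map<Integer, String> b = new HashMap<>();
        withCompute(b, 65, "fontA");
        // The value is updated, at the cost of one lookup per call.
        System.out.println(withCompute(b, 65, "fontB") + ", lookups=" + lookups);
    }
}
```

In this sketch the first variant saves a lookup but returns a stale value when the mapping changes, while the second stays correct and saves nothing, which matches the conclusion in the comment.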