[ https://issues.apache.org/jira/browse/PDFBOX-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900370#comment-17900370 ]

Axel Howind commented on PDFBOX-5902:
-------------------------------------

[~lehmi] I added code that logged whether an entry was already present and
whether the new value differed (using equals()) from the old one, and ran the
test suite. It is possible that this happens only in some test code and not
with real files. I had an idea for reducing the creation overhead, but it would
only work if the mappings never change, so I stopped there and reverted the
code. If you are interested, I can restore the changes and post them here.

> The CPU usage of a PDF file with a size of 85.6 MB is abnormal
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5902
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5902
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: image-2024-11-15-17-07-17-802.png, 
> image-2024-11-16-12-23-59-684.png, image-2024-11-16-12-38-54-861.png, 
> image-2024-11-19-08-50-37-171.png, image-2024-11-19-08-55-59-315.png, 
> image-2024-11-19-08-56-23-894.png, image-2024-11-19-08-56-49-755.png
>
>
> When I try to extract the text content from a PDF file with a size of 85.6 MB,
> CPU usage becomes abnormal: the alarm threshold is reached and extraction is
> very slow, with no result after a few minutes. It is not a memory problem.
> I also tried upgrading the library, but the problem still exists.
> !image-2024-11-15-17-07-17-802.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
