[ 
https://issues.apache.org/jira/browse/PDFBOX-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899371#comment-17899371
 ] 

Andreas Lehmkühler commented on PDFBOX-5902:
--------------------------------------------

[~chain] To avoid comparing apples with oranges: what environment are you 
using for your tests? I'm using standard PC hardware (Ryzen 5 7600, 32 GB RAM) 
and get results similar to those of [~axh].

But I've observed some strange behaviour. In my environment the speed depends 
on the configured amount of memory, but not in the way I expected: it's 
faster when less memory is available! I've tried 256m, 512m, 1g and 2g.

Additionally, the configurations using more memory deliver non-deterministic 
results, e.g. with 2g I got run times of 24s, 82s and 148s. I also saw some 
peaks in the CPU usage which seem to be related to the garbage collector.

Those results support [~axh]'s theory that it might be related to the memory 
management/garbage collector.
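
One way to check the garbage collector theory would be to record the 
accumulated GC time next to the wall-clock time of each run. The following is 
only a minimal sketch using the standard JDK management beans; the class name 
GcTimingSketch, the Result type and the measure() helper are hypothetical and 
not part of PDFBox, and the workload in main() is just a placeholder for the 
actual text extraction.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of PDFBox: times a workload and reports
// how much GC activity the JVM accumulated while it ran.
public class GcTimingSketch {

    /** Wall-clock time, GC time (both in ms) and GC count for one run. */
    public static final class Result {
        public final long wallMillis;
        public final long gcMillis;
        public final long gcCount;

        Result(long wallMillis, long gcMillis, long gcCount) {
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
            this.gcCount = gcCount;
        }
    }

    // Sums collection count and time over all collectors of this JVM.
    private static long[] gcTotals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    /** Runs the workload once and returns the wall-clock and GC deltas. */
    public static Result measure(Runnable workload) {
        long[] before = gcTotals();
        long start = System.nanoTime();
        workload.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long[] after = gcTotals();
        return new Result(wallMillis, after[1] - before[1], after[0] - before[0]);
    }

    public static void main(String[] args) {
        // Placeholder workload that only produces short-lived garbage;
        // replace it with the text extraction of the problematic file.
        Result r = measure(() -> {
            for (int i = 0; i < 100_000; i++) {
                byte[] junk = new byte[10_000];
                junk[0] = (byte) i;
            }
        });
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.printf("max heap: %d MB, wall: %d ms, gc: %d ms in %d collections%n",
                maxHeapMb, r.wallMillis, r.gcMillis, r.gcCount);
    }
}
```

Assuming the memory sizes above were set with the standard -Xmx option, 
running this once per setting (e.g. -Xmx256m and -Xmx2g) should show whether 
the slow runs coincide with a large share of GC time. The reported GC time is 
the JVM's own approximate accumulated value, so it is only a rough indicator.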

> The CPU usage of a PDF file with a size of 85.6 MB is abnormal
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5902
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5902
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: image-2024-11-15-17-07-17-802.png, 
> image-2024-11-16-12-23-59-684.png, image-2024-11-16-12-38-54-861.png, 
> image-2024-11-19-08-50-37-171.png, image-2024-11-19-08-55-59-315.png, 
> image-2024-11-19-08-56-23-894.png, image-2024-11-19-08-56-49-755.png
>
>
> When I try to extract the text content from a PDF file with a size of 
> 85.6 MB, the CPU usage is abnormal and reaches the alarm threshold. The 
> extraction is also very slow; I didn't get results for several minutes. It 
> is not a memory problem. I also tried upgrading the library version, but 
> the problem still exists.
> !image-2024-11-15-17-07-17-802.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

