[jira] [Commented] (PDFBOX-5902) The CPU usage of a PDF file with a size of 85.6 MB is abnormal

ltzzZ (Jira) Mon, 18 Nov 2024 18:51:16 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899328#comment-17899328
 ]


ltzzZ commented on PDFBOX-5902:
-------------------------------

After re-running locally with the JVM parameters -XX:+UseG1GC 
-XX:+UseStringDeduplication, it looks a little better: at least the CPU usage 
is no longer abnormal. However, text extraction still takes 70-80 seconds, 
which is too slow and almost unacceptable. Is there any other way I can 
improve the speed of extracting text? Also, I cannot change my JDK version: it 
is set by the project and I don't have the right to change it.
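
One way to confirm that the JVM parameters mentioned above were actually picked 
up, and to measure a single step in isolation, is a small standalone check like 
the sketch below. It uses only the standard java.lang.management API. The class 
name JvmFlagCheck and its method names are hypothetical and are not part of 
PDFBox, and the workload in main is only a placeholder for the real 
text-extraction call.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of PDFBox.
final class JvmFlagCheck {

    /** Returns the -XX options found among the running JVM's input arguments. */
    static List<String> startupXxFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    /** Returns the names of the active collectors (with G1 these start with "G1"). */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("XX flags:   " + startupXxFlags());
        System.out.println("Collectors: " + collectorNames());
        // Placeholder workload; replace with the real text-extraction call.
        long ms = timeMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("Elapsed ms: " + ms);
    }
}
```

If the collector names do not start with "G1", the flags were probably not 
applied to the process that performs the extraction.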
 

> The CPU usage of a PDF file with a size of 85.6 MB is abnormal
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5902
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5902
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: image-2024-11-15-17-07-17-802.png, 
> image-2024-11-16-12-23-59-684.png, image-2024-11-16-12-38-54-861.png, 
> image-2024-11-19-08-50-37-171.png, image-2024-11-19-08-55-59-315.png, 
> image-2024-11-19-08-56-23-894.png, image-2024-11-19-08-56-49-755.png
>
>
> When I try to extract the text content from a PDF file with a size of 85.6 MB, 
> the CPU usage becomes abnormal and reaches the alarm threshold. Extraction is 
> also very slow: I got no results after a few minutes. It is not a memory 
> problem. I also tried upgrading the library version, but the problem still 
> exists.
> !image-2024-11-15-17-07-17-802.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
