[jira] [Commented] (PDFBOX-5902) The CPU usage of a PDF file with a size of 85.6 MB is abnormal

ltzzZ (Jira) Fri, 15 Nov 2024 21:23:21 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898826#comment-17898826
 ]


ltzzZ commented on PDFBOX-5902:
-------------------------------

I changed the configuration as you suggested and ran it again, and the 
extraction speed has improved. However, the CPU usage is still abnormal, and 
the extraction still takes more than 1 minute. Is there any difference between 
the extraction performed by the command-line tool and the 
PDFTextStripper.getText() method in the library dependency?

!image-2024-11-16-12-23-59-684.png!
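
To make the comparison between the two paths measurable, here is a minimal 
sketch of a timing harness that records wall-clock time and CPU time for a 
single extraction call. The class name ExtractionTimer and its Result type are 
hypothetical, not part of PDFBox. The workload in main() is a stand-in so the 
sketch runs without PDFBox on the classpath; the commented lines show how the 
PDFBox 3.x call would presumably be plugged in. Only the calling thread's CPU 
time is counted, so GC and other JVM threads are not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Callable;

public class ExtractionTimer {

    /** Wall-clock and CPU time of one extraction call, plus the result size. */
    public static final class Result {
        public final long wallNanos;
        public final long cpuNanos;   // -1 if the JVM cannot report thread CPU time
        public final int textLength;

        Result(long wallNanos, long cpuNanos, int textLength) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
            this.textLength = textLength;
        }
    }

    /** Runs the extraction once on the calling thread and times it. */
    public static Result measure(Callable<String> extraction) throws Exception {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // CPU timing is optional in the JVM, so check before relying on it.
        boolean cpuAvailable = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();

        long cpuStart = cpuAvailable ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();

        String text = extraction.call();

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = cpuAvailable
                ? threads.getCurrentThreadCpuTime() - cpuStart
                : -1L;
        return new Result(wallNanos, cpuNanos, text.length());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload (assumption: replace with the real extraction).
        // With PDFBox 3.x the callable would look roughly like:
        //   () -> {
        //       try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
        //           return new PDFTextStripper().getText(doc);
        //       }
        //   }
        Result r = measure(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append('x');
            }
            return sb.toString();
        });

        System.out.println("characters: " + r.textLength);
        System.out.println("wall ms:    " + r.wallNanos / 1_000_000);
        System.out.println("cpu ms:     " + r.cpuNanos / 1_000_000);
    }
}
```

If the CPU time is close to the wall-clock time, the call is compute-bound on 
one thread. Running the same file through the command-line tool and through 
this harness would show whether the two paths really differ.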

 

> The CPU usage of a PDF file with a size of 85.6 MB is abnormal
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5902
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5902
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: image-2024-11-15-17-07-17-802.png, 
> image-2024-11-16-12-23-59-684.png
>
>
> When I try to extract the text content from a PDF file with a size of 85.6 MB, 
> the CPU usage becomes abnormal and reaches the alarm threshold. The 
> extraction is also very slow: it did not produce a result within a few 
> minutes. This is not a memory problem. I also tried upgrading the library 
> version, but the problem persists.
> !image-2024-11-15-17-07-17-802.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
