[ 
https://issues.apache.org/jira/browse/PDFBOX-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899621#comment-17899621
 ] 

Axel Howind commented on PDFBOX-5902:
-------------------------------------

I am not one of the official contributors to this project, but I use it a 
lot, so I have an interest in it working well on any input. That's why I 
follow the mailing list, and if I see something interesting, I sometimes look 
into it and provide patches.

The maintainers here put a lot of effort into investigating and fixing 
issues, so if you find something in PDFBox that doesn't work, chances are 
good that it will get fixed. But the project needs your input because it is 
impossible to anticipate every use case.

It might be a good idea to include a CJK file in the benchmark to catch 
regressions that affect CJK files. But that's something the maintainers have 
to decide. I know that they regularly run extensive tests on a large number 
of inputs, but I do not know which input files are used.

> The CPU usage of a PDF file with a size of 85.6 MB is abnormal
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5902
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5902
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: image-2024-11-15-17-07-17-802.png, 
> image-2024-11-16-12-23-59-684.png, image-2024-11-16-12-38-54-861.png, 
> image-2024-11-19-08-50-37-171.png, image-2024-11-19-08-55-59-315.png, 
> image-2024-11-19-08-56-23-894.png, image-2024-11-19-08-56-49-755.png
>
>
> When I try to extract the text content from a PDF file with a size of 
> 85.6 MB, the CPU usage is abnormal and reaches the alarm threshold. The 
> extraction is also very slow: I didn't get results for a few minutes. It is 
> not a memory problem. I also tried upgrading the version of the library, but 
> the problem still exists.
> !image-2024-11-15-17-07-17-802.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

