[ https://issues.apache.org/jira/browse/PDFBOX-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899621#comment-17899621 ]
Axel Howind commented on PDFBOX-5902:
-------------------------------------

I am not one of the official contributors to this project, but I use it a lot, so I have an interest in it working well on any input. That's why I follow the mailing list, and if I see something interesting, I sometimes look into it and provide patches.

The maintainers here put a lot of effort into investigating and fixing issues, so if you find something in PDFBox that doesn't work, chances are good that it will get fixed. But the project needs your input, because it is impossible to anticipate every use case.

It might be a good idea to include a CJK file in the benchmark so that regressions affecting CJK files are noticed. But that's something the maintainers have to decide. I know that they regularly run extensive tests on a large number of inputs, but I do not know which input files are used.

> The CPU usage of a PDF file with a size of 85.6 MB is abnormal
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5902
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5902
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: image-2024-11-15-17-07-17-802.png,
> image-2024-11-16-12-23-59-684.png, image-2024-11-16-12-38-54-861.png,
> image-2024-11-19-08-50-37-171.png, image-2024-11-19-08-55-59-315.png,
> image-2024-11-19-08-56-23-894.png, image-2024-11-19-08-56-49-755.png
>
>
> When I try to extract the text content from a PDF file with a size of
> 85.6 MB, the CPU usage is abnormally high and reaches the alarm threshold.
> Extraction is also very slow: I did not get results after a few minutes.
> It is not a memory problem. I also tried upgrading the library version,
> but the problem still exists.
> !image-2024-11-15-17-07-17-802.png!