[ https://issues.apache.org/jira/browse/PDFBOX-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133289#comment-17133289 ]
Alfred commented on PDFBOX-4877:
--------------------------------

OK, that makes sense. Then the only thing that might still be worthwhile is to optimize the method that is actually used in the code. We can keep the other one too, the one that is only used in tests, if you want. But for the one that is used in the code, the "result" matrix is always a "new Matrix", so it is never equal to "this" and it is never null. We can probably skip those checks.

> Matrix class performance improvements
> -------------------------------------
>
>                 Key: PDFBOX-4877
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4877
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing, Text extraction
>    Affects Versions: 2.0.20, 3.0.0 PDFBox
>            Reporter: Alfred
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>              Labels: Optimization
>             Fix For: 2.0.21, 3.0.0 PDFBox
>
>         Attachments: PDFBOX-4877.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I am testing text extraction from PDF and profiling the execution.
> I found that the third major time consumer is with matrix multiplications.
> The Matrix class spends large amounts of time copying results to new
> instances.
> Also, the if statements are slowing down execution as they kill performance
> on modern CPUs.
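
As a rough illustration of the suggestion in the comment above, the sketch below contrasts two ways of multiplying 3x3 matrices. The general variant accepts a caller-supplied result and therefore has to handle both a null result and a result that aliases one of the operands. The second variant always allocates a fresh result, so neither check is needed. MatrixSketch, multiplyInto and multiplyFresh are hypothetical names invented for this example; this is not the PDFBox Matrix class and not the attached patch.

```java
// Hypothetical stand-in for a 3x3 transformation matrix (row-major float[9]).
// It only illustrates the idea from the comment; it is NOT the PDFBox Matrix class.
final class MatrixSketch {
    final float[] v;

    // Starts out as the identity matrix.
    MatrixSketch() {
        v = new float[] {1f, 0f, 0f, 0f, 1f, 0f, 0f, 0f, 1f};
    }

    // Builds a matrix from nine row-major values.
    static MatrixSketch of(float... values) {
        if (values.length != 9) {
            throw new IllegalArgumentException("expected 9 values");
        }
        MatrixSketch m = new MatrixSketch();
        System.arraycopy(values, 0, m.v, 0, 9);
        return m;
    }

    // General variant: the caller may pass null or one of the operands as result,
    // so both cases must be checked before writing.
    MatrixSketch multiplyInto(MatrixSketch other, MatrixSketch result) {
        if (result == null) {
            result = new MatrixSketch();
        }
        if (result == this || result == other) {
            // Writing straight into an operand would corrupt the inputs,
            // so compute into a temporary array and copy afterwards.
            float[] tmp = new float[9];
            product(v, other.v, tmp);
            System.arraycopy(tmp, 0, result.v, 0, 9);
        } else {
            product(v, other.v, result.v);
        }
        return result;
    }

    // Variant for the case described in the comment: the result is always a
    // newly allocated matrix, so it can be neither null nor an operand.
    MatrixSketch multiplyFresh(MatrixSketch other) {
        MatrixSketch result = new MatrixSketch();
        product(v, other.v, result.v);
        return result;
    }

    // out = a * b for row-major 3x3 arrays; out must not alias a or b.
    private static void product(float[] a, float[] b, float[] out) {
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                out[i * 3 + j] = a[i * 3] * b[j]
                        + a[i * 3 + 1] * b[3 + j]
                        + a[i * 3 + 2] * b[6 + j];
            }
        }
    }

    public static void main(String[] args) {
        MatrixSketch a = MatrixSketch.of(1f, 2f, 0f, 3f, 4f, 0f, 5f, 6f, 1f);
        MatrixSketch b = MatrixSketch.of(2f, 0f, 0f, 0f, 2f, 0f, 1f, 1f, 1f);
        System.out.println(java.util.Arrays.toString(a.multiplyFresh(b).v));
    }
}
```

Both variants produce the same product. Whether dropping the two checks gives a measurable gain would have to be confirmed by profiling the real text-extraction workload.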