[ 
https://issues.apache.org/jira/browse/PDFBOX-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfred updated PDFBOX-4877:
---------------------------
    Description: 
I am testing text extraction from PDF and profiling the execution.

I found that the third-largest time consumer is matrix multiplication.

The Matrix class spends a large amount of time copying results to new instances.

  was:
I am testing text extraction from PDF and profiling the execution.

I found that the second-biggest time consumer is the static code in 
Standard14Fonts that loads fonts from the PDFBox jar.

Looking at the code, I realized we don't have to load all fonts statically when 
the class loads.

Not all PDFs need all fonts, so if we lazy-loaded them only when needed, it 
would save some time and some memory.

The memory part in particular would be important when running on a tablet or a 
phone, where the entire memory space of the app is 80-160 MB.


> Matrix class performance improvements
> -------------------------------------
>
>                 Key: PDFBOX-4877
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4877
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing, Text extraction
>    Affects Versions: 2.0.20, 3.0.0 PDFBox
>            Reporter: Alfred
>            Priority: Major
>              Labels: Optimization
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I am testing text extraction from PDF and profiling the execution.
>  
> I found that the third-largest time consumer is matrix multiplication.
> The Matrix class spends a large amount of time copying results to new 
> instances.
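
To illustrate the kind of change the description hints at, here is a minimal sketch, not the actual PDFBox code. AffineMatrix, of, multiply and multiplyInto are hypothetical names, and the row-major float[9] layout is an assumption. It contrasts a multiply that allocates a new instance per call with one that writes into a caller-supplied destination, so a hot loop can reuse one object.

```java
// Hypothetical stand-in for a 3x3 transformation matrix.
// This is NOT the PDFBox Matrix API; names and layout are assumptions.
final class AffineMatrix {
    // Nine values in row-major order: index = row * 3 + column.
    final float[] v = new float[9];

    // Builds a matrix from exactly nine row-major values.
    static AffineMatrix of(float... values) {
        if (values.length != 9) {
            throw new IllegalArgumentException("expected 9 values");
        }
        AffineMatrix m = new AffineMatrix();
        System.arraycopy(values, 0, m.v, 0, 9);
        return m;
    }

    // Allocating variant: every call creates a new instance for the result.
    AffineMatrix multiply(AffineMatrix other) {
        AffineMatrix result = new AffineMatrix();
        multiplyArrays(v, other.v, result.v);
        return result;
    }

    // Non-allocating variant: writes this * other into dest.
    // dest may be this or other, because all inputs are read first.
    void multiplyInto(AffineMatrix other, AffineMatrix dest) {
        multiplyArrays(v, other.v, dest.v);
    }

    private static void multiplyArrays(float[] a, float[] b, float[] out) {
        // Copy inputs into locals so that out may alias a or b safely.
        float a0 = a[0], a1 = a[1], a2 = a[2];
        float a3 = a[3], a4 = a[4], a5 = a[5];
        float a6 = a[6], a7 = a[7], a8 = a[8];
        float b0 = b[0], b1 = b[1], b2 = b[2];
        float b3 = b[3], b4 = b[4], b5 = b[5];
        float b6 = b[6], b7 = b[7], b8 = b[8];

        out[0] = a0 * b0 + a1 * b3 + a2 * b6;
        out[1] = a0 * b1 + a1 * b4 + a2 * b7;
        out[2] = a0 * b2 + a1 * b5 + a2 * b8;
        out[3] = a3 * b0 + a4 * b3 + a5 * b6;
        out[4] = a3 * b1 + a4 * b4 + a5 * b7;
        out[5] = a3 * b2 + a4 * b5 + a5 * b8;
        out[6] = a6 * b0 + a7 * b3 + a8 * b6;
        out[7] = a6 * b1 + a7 * b4 + a8 * b7;
        out[8] = a6 * b2 + a7 * b5 + a8 * b8;
    }
}
```

A caller that concatenates transforms in a loop could keep one scratch AffineMatrix and pass it as dest on every iteration. Whether this pays off for PDFBox would need to be confirmed by profiling, since the JIT may already remove some short-lived allocations.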



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
