[ 
https://issues.apache.org/jira/browse/PDFBOX-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfred updated PDFBOX-4896:
---------------------------
    Description: 
One of the major performance bottlenecks in text extraction is the clone + push 
and pop + clone operations on the graphics state before and after the call to 
showGlyph.

Cloning is not only slow, it also consumes large amounts of memory, making the 
garbage collector work harder.

When extracting text, showGlyph does not modify the graphics state, so there is 
no need to save / restore the state.
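
As a rough illustration of the pattern described above, here is a 
self-contained sketch. It is not the PDFBox code and not the attached patch: 
GraphicsStateSketch, ToyGraphicsState and the saveAroundGlyph flag are 
hypothetical names, and the toy state is far smaller than a real graphics 
state. It only shows that when showGlyph never touches the stack, skipping the 
save / restore yields the same text with no clones at all.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GraphicsStateSketch {
    /** Hypothetical stand-in for a graphics state; a real one holds many more fields. */
    static final class ToyGraphicsState implements Cloneable {
        double[] ctm = {1, 0, 0, 1, 0, 0};
        double lineWidth = 1.0;

        @Override
        public ToyGraphicsState clone() {
            try {
                ToyGraphicsState copy = (ToyGraphicsState) super.clone();
                copy.ctm = ctm.clone(); // deep-copy the mutable array
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    private final Deque<ToyGraphicsState> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final boolean saveAroundGlyph;
    private int clones = 0;

    public GraphicsStateSketch(boolean saveAroundGlyph) {
        this.saveAroundGlyph = saveAroundGlyph;
        stack.push(new ToyGraphicsState());
    }

    /** In this sketch, showing a glyph only records its text; the stack is untouched. */
    private void showGlyph(char unicode) {
        text.append(unicode);
    }

    public void showText(String s) {
        for (char c : s.toCharArray()) {
            if (saveAroundGlyph) {
                stack.push(stack.peek().clone()); // one clone per glyph
                clones++;
            }
            showGlyph(c);
            if (saveAroundGlyph) {
                stack.pop(); // discard the copy, which was never modified
            }
        }
    }

    public String text() { return text.toString(); }

    public int clones() { return clones; }

    public static void main(String[] args) {
        GraphicsStateSketch withSave = new GraphicsStateSketch(true);
        GraphicsStateSketch withoutSave = new GraphicsStateSketch(false);
        withSave.showText("hello");
        withoutSave.showText("hello");
        System.out.println(withSave.text().equals(withoutSave.text()));
        System.out.println(withSave.clones() + " vs " + withoutSave.clones());
    }
}
```

The sketch depends on the assumption that showGlyph is read-only with respect 
to the state. A subclass that does modify the state inside showGlyph would 
still need the save / restore.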

The same could be true in general, not just for text extraction, but I do not 
understand the code well enough to decide.

To be safe, I have only modified the behavior for LegacyPDFStreamEngine.

showGlyph sounds like a read-only operation that should not modify anything.

I have the code ready and I will submit a patch and a review.


> Don't save and restore graphic states around showGlyph in 
> LegacyPDFStreamEngine
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4896
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4896
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.20, 3.0.0 PDFBox
>            Reporter: Alfred
>            Priority: Minor
>         Attachments: PDFBOX-4896.patch


