Alistair Oldfield created PDFBOX-5269:
-----------------------------------------
Summary: Consider making LegacyPDFStreamEngine a public class
Key: PDFBOX-5269
URL: https://issues.apache.org/jira/browse/PDFBOX-5269
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 2.0.24
Reporter: Alistair Oldfield
Please consider making LegacyPDFStreamEngine public.
This would allow other code to extend the class.
At the moment, anyone who wishes to extend it has to copy the entire class source into their own project and make the copy public.
This in turn makes it necessary to create a local copy of PDFTextStripper as well, so that it can inherit from the local copy of LegacyPDFStreamEngine.
One reason someone might want to extend it (my example): for my needs, I have had to change the implementation of
public void processPage(PDPage page)
as follows (this is particular to my needs, but it hopefully shows why being able to extend the class would be useful):
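{code:java}
import java.io.IOException;

import org.apache.pdfbox.contentstream.operator.MissingOperandException;
import org.apache.pdfbox.pdmodel.PDPage;

@Override
public void processPage(PDPage page) throws IOException {
    try {
        super.processPage(page);
    }
    catch (MissingOperandException e) {
        // We need to catch this because it is acceptable: we deal with this
        // particular error by cleaning the PDF.
        // PdfLoadingException is an application-specific exception, not part of PDFBox.
        throw new PdfLoadingException(e.getMessage(), e);
    }
    catch (Exception e) {
        // We ignore all other errors and keep going, because that is
        // acceptable for our purposes.
    }
}{code}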
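The error-handling pattern in the override above can be sketched in isolation. This is a minimal, self-contained sketch under stated assumptions: `BaseEngine`, `MissingOperandError` and `TolerantEngine` are hypothetical stand-ins (not PDFBox classes) for the stream engine, `MissingOperandException` and the reporter's subclass, and `PdfLoadingException` is assumed here to be unchecked. It only shows how one error type is escalated while every other error is swallowed so processing continues.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SelectiveErrorHandlingSketch {

    // Hypothetical stand-in for MissingOperandException (not the PDFBox class).
    static class MissingOperandError extends IOException {
        MissingOperandError(String message) { super(message); }
    }

    // Hypothetical application exception; assumed unchecked in this sketch.
    static class PdfLoadingException extends RuntimeException {
        PdfLoadingException(String message, Throwable cause) { super(message, cause); }
    }

    // Hypothetical base engine whose processPage can fail in different ways.
    static class BaseEngine {
        public void processPage(String page) throws IOException {
            if (page.equals("missing-operand")) {
                throw new MissingOperandError("Operator has too few operands");
            }
            if (page.equals("corrupt")) {
                throw new IOException("Unexpected token");
            }
        }
    }

    // Subclass overriding processPage, the kind of extension the issue asks for.
    static class TolerantEngine extends BaseEngine {
        final List<String> processed = new ArrayList<>();

        @Override
        public void processPage(String page) throws IOException {
            try {
                super.processPage(page);
                processed.add(page);
            } catch (MissingOperandError e) {
                // Escalate: the caller handles this case, e.g. by cleaning the PDF.
                throw new PdfLoadingException(e.getMessage(), e);
            } catch (Exception e) {
                // Ignore every other error and carry on with the next page.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        TolerantEngine engine = new TolerantEngine();
        engine.processPage("ok");
        engine.processPage("corrupt"); // swallowed by the generic catch
        try {
            engine.processPage("missing-operand");
        } catch (PdfLoadingException e) {
            System.out.println("needs cleaning: " + e.getMessage());
        }
        System.out.println("processed: " + engine.processed);
    }
}
```

The more specific catch must come before `catch (Exception e)`; an exception thrown from inside the first catch clause is not caught by the sibling clause, so the escalated error reaches the caller.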