[ https://issues.apache.org/jira/browse/PDFBOX-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-5097.
-----------------------------------
    Resolution: Not A Bug

> Rendered pdf image lacks all the text in this particular case
> -------------------------------------------------------------
>
>                 Key: PDFBOX-5097
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5097
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.22
>         Environment: Linux DamianPad 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Robert-Andrei Damian
>            Priority: Major
>              Labels: jbig2
>         Attachments: 0.png, 1.png, document(3).pdf
>
>
> Hello,
> I am using PDFBox to convert input PDF files to images, which are
> later fed to an OCR library. It works perfectly in most cases, but I
> stumbled upon this particular case, in which all text disappeared from the
> rendered image.
> My source code for the method that converts the PDF into images:
>
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> public List<BufferedImage> splitPdf(File pdfFile) throws IOException {
>     List<BufferedImage> result = new ArrayList<>();
>     // try-with-resources closes the document even if rendering throws
>     try (PDDocument document = PDDocument.load(pdfFile)) {
>         PDFRenderer pdfRenderer = new PDFRenderer(document);
>         for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
>             result.add(pdfRenderer.renderImage(pageIndex));
>             // debugPageImageInfo is the reporter's own helper (not shown)
>             debugPageImageInfo(result.get(result.size() - 1));
>         }
>     }
>     return result;
> }
> {code}
>
> I attached to this issue the PDF file for which I identified the problem and
> the resulting images.
>
> I hope this is helpful for anyone else encountering the same problem!
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
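
The message gives no reason for the "Not A Bug" resolution. The issue does carry the jbig2 label, so one possible explanation, assumed here and not confirmed by the message, is that the page text is stored as a JBIG2-compressed image and no JBIG2 ImageIO reader was available at render time. The sketch below uses only the standard javax.imageio API to check whether such a reader is registered. The class name Jbig2Check, the method hasReader, and the hint about the org.apache.pdfbox:jbig2-imageio artifact are illustrative choices, not taken from the issue.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper: reports whether ImageIO can decode a given format.
final class Jbig2Check {

    // Returns true if at least one ImageIO reader is registered for formatName.
    static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Pick up any ImageIO plugins that are on the classpath.
        ImageIO.scanForPlugins();
        if (hasReader("JBIG2")) {
            System.out.println("A JBIG2 ImageIO reader is available.");
        } else {
            // Assumption: adding the org.apache.pdfbox:jbig2-imageio artifact
            // to the classpath would register a reader for this format.
            System.out.println("No JBIG2 ImageIO reader found; JBIG2 images may render blank.");
        }
    }
}
```

Running this check once at startup, before calling splitPdf, would show whether a missing decoder is a plausible cause before looking further into the rendering code.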