[jira] [Closed] (PDFBOX-5097) Rendered pdf image lacks all the text in this particular case

Tilman Hausherr (Jira) Thu, 04 Feb 2021 10:26:06 -0800


     [ 
https://issues.apache.org/jira/browse/PDFBOX-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tilman Hausherr closed PDFBOX-5097.
-----------------------------------
    Resolution: Not A Bug

> Rendered pdf image lacks all the text in this particular case
> -------------------------------------------------------------
>
>                 Key: PDFBOX-5097
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5097
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.22
>         Environment: Linux DamianPad 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 
> 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Robert-Andrei Damian
>            Priority: Major
>              Labels: jbig2
>         Attachments: 0.png, 1.png, document(3).pdf
>
>
> Hello,
> I am using PDFBox to convert input PDF files to images, which are later 
> fed to an OCR library. It works perfectly in most cases, but I stumbled 
> upon this particular case in which all text disappeared from the rendered 
> image.
> Here is the method that converts the PDF into images:
>  
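> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> // Renders every page of the PDF to an image (PDFBox 2.0.x API).
> public List<BufferedImage> splitPdf(File pdfFile) throws IOException {
>     List<BufferedImage> result = new ArrayList<>();
>     // try-with-resources closes the document even if rendering throws
>     try (PDDocument document = PDDocument.load(pdfFile)) {
>         PDFRenderer pdfRenderer = new PDFRenderer(document);
>         for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
>             BufferedImage pageImage = pdfRenderer.renderImage(pageIndex);
>             result.add(pageImage);
>             debugPageImageInfo(pageImage); // reporter's own helper, not shown
>         }
>     }
>     return result;
> }
> {code}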
>  
> I have attached the PDF file that shows the problem, along with the 
> resulting images.
>  
> I hope this is helpful for anyone else encountering the same problem!
>  
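
The issue carries the label jbig2 and was closed as Not A Bug. The message
itself does not state the cause, so the following is only a sketch of one
check that may be worth running in a similar situation: whether any ImageIO
reader is registered for JBIG2 on the classpath. The class name
ImageReaderCheck, the helper hasReaderFor, and the format name "JBIG2" are
assumptions made for illustration; only the javax.imageio calls are standard
JDK API.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public class ImageReaderCheck {

    // Returns true if at least one ImageIO reader is registered for the
    // given format name (hypothetical helper, not part of PDFBox).
    static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers =
                ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Assumption: a JBIG2 plugin, if present, registers under "JBIG2".
        String format = "JBIG2";
        if (hasReaderFor(format)) {
            System.out.println("An ImageIO reader for " + format + " is available.");
        } else {
            System.out.println("No ImageIO reader for " + format
                    + " found; pages stored as JBIG2 images may not render.");
        }
    }
}
```

Running the check once at startup, before calling a method such as splitPdf,
would make a missing decoder visible early instead of surfacing later as
blank areas in the OCR input.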



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

