[jira] [Commented] (PDFBOX-2195) Missing text when converting PDF to image

A.D. Kent (JIRA) Tue, 08 Jul 2014 12:48:26 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055388#comment-14055388
 ]


A.D. Kent commented on PDFBOX-2195:
-----------------------------------

It would appear that the Tahoma,Italic warning (while still an issue) was a 
bit of a red herring.  I believe the missing text on p10 was instead caused by 
the following code, which was being used to scale the page content:

{code}
PDPage page = (PDPage) pages.get(i);

// Offset both corners of the crop box by half the difference between
// the original and the scaled width/height.
PDRectangle cropBox = page.findCropBox();
PDRectangle newCropBox = new PDRectangle();
newCropBox.setLowerLeftX(cropBox.getLowerLeftX() - (cropBox.getWidth() - cropBox.getWidth() * scale) / 2);
newCropBox.setLowerLeftY(cropBox.getLowerLeftY() - (cropBox.getHeight() - cropBox.getHeight() * scale) / 2);
newCropBox.setUpperRightX(cropBox.getUpperRightX() - (cropBox.getWidth() - cropBox.getWidth() * scale) / 2);
newCropBox.setUpperRightY(cropBox.getUpperRightY() - (cropBox.getHeight() - cropBox.getHeight() * scale) / 2);
page.setCropBox(newCropBox);

// Parse the existing content stream into tokens.
PDFStreamParser parser = new PDFStreamParser(page.getContents());
parser.parse();
List<Object> tokens = parser.getTokens();

// Prepend "scale 0 0 scale 0 0 cm" so the content is drawn scaled.
tokens.add(0, new COSFloat(scale));
tokens.add(1, COSInteger.ZERO);
tokens.add(2, COSInteger.ZERO);
tokens.add(3, new COSFloat(scale));
tokens.add(4, COSInteger.ZERO);
tokens.add(5, COSInteger.ZERO);
tokens.add(6, PDFOperator.getOperator("cm"));

// Write the modified tokens to a new compressed stream and attach it.
PDStream newContents = new PDStream(document);
ContentStreamWriter writer = new ContentStreamWriter(newContents.createOutputStream());
writer.writeTokens(tokens);
newContents.addCompression();
page.setContents(newContents);
{code}
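
For reference, here is a minimal standalone sketch (plain Java, no PDFBox) of 
the arithmetic in the snippet above: what a prepended "s 0 0 s 0 0 cm" does to 
a point, and what the crop box calculation does to a rectangle.  The class 
name, method names, the scale of 0.5 and the 600x800 box are all made up for 
illustration.  It only reproduces the numbers and is not a confirmed 
diagnosis of the missing text.

```java
public class CropBoxScaleSketch {

    // Applies a PDF "a b c d e f cm" matrix to a point:
    // x' = a*x + c*y + e,  y' = b*x + d*y + f.
    public static double[] applyCm(double a, double b, double c, double d,
                                   double e, double f, double x, double y) {
        return new double[] { a * x + c * y + e, b * x + d * y + f };
    }

    // Mirrors the crop box arithmetic from the snippet above.
    // box = { lowerLeftX, lowerLeftY, upperRightX, upperRightY }
    public static double[] shiftedBox(double[] box, double scale) {
        double width = box[2] - box[0];
        double height = box[3] - box[1];
        double dx = (width - width * scale) / 2;
        double dy = (height - height * scale) / 2;
        // The same offset is subtracted from both corners.
        return new double[] { box[0] - dx, box[1] - dy, box[2] - dx, box[3] - dy };
    }

    public static void main(String[] args) {
        double scale = 0.5;                 // assumed example value
        double[] box = { 0, 0, 600, 800 };  // assumed example crop box

        // Where the corners of the old box land once content is scaled by cm.
        double[] ll = applyCm(scale, 0, 0, scale, 0, 0, box[0], box[1]);
        double[] ur = applyCm(scale, 0, 0, scale, 0, 0, box[2], box[3]);

        // What the crop box calculation produces for the same input.
        double[] shifted = shiftedBox(box, scale);

        System.out.println("scaled content: (" + ll[0] + "," + ll[1] + ") to ("
                + ur[0] + "," + ur[1] + ")");
        System.out.println("new crop box:   (" + shifted[0] + "," + shifted[1]
                + ") to (" + shifted[2] + "," + shifted[3] + ")");
    }
}
```

With these assumed numbers, content drawn over the old box ends up in (0,0) to 
(300,400), while the computed crop box is (-150,-200) to (450,600): it is 
translated but keeps its original 600x800 size.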

> Missing text when converting PDF to image
> -----------------------------------------
>
>                 Key: PDFBOX-2195
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2195
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.0
>         Environment: Win8.1 (JRE 1.7)
>            Reporter: A.D. Kent
>         Attachments: Claim AA011332 Diagram and Estimates.pdf, Claim AA011332 
> Diagram and Estimates.tif, Claim AA011332 Diagram and Estimates_p10.pdf, 
> Claim AA011332 Diagram and Estimates_p10_jai.tif
>
>
> Attempting to convert a PDF to image using latest 2.0.0 from SVN.  PDF 
> utilizes Tahoma, Tahoma,Bold, and Tahoma,Italic (non-embedded).  Upon calling 
> PDFRenderer.renderImageWithDPI, I get the following output:
> Jul 08, 2014 9:50:01 AM org.apache.fontbox.util.SystemFontManager findTTFontname
> WARNING: Font not found: Tahoma,Italic
> Resultant image is missing text where Tahoma,Italic is used.  Have also 
> reverted to 1.8.6 and used PDPage.convertToImage with same results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
