[jira] [Comment Edited] (PDFBOX-955) Can't extract b/w images from PDF

Tilman Hausherr (JIRA) Fri, 09 Aug 2013 10:55:28 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735061#comment-13735061 ]


Tilman Hausherr edited comment on PDFBOX-955 at 8/9/13 5:53 PM:
----------------------------------------------------------------

PDF files with G4 (CCITT Group 4) images render blank again. This can be 
reproduced with the file d0000040.pdf. The reason seems to be that the pixels 
of the embedded TIFF images are inverted and then drawn onto a white image, so 
we get white on white, i.e. nothing. I "prove" my point with this change in 
pdfbox\pdfviewer\PageDrawer.java (this is not a fix, but it will hopefully give 
a hint):

    public void drawImage(Image awtImage, AffineTransform at)
    {
        graphics.setComposite(getGraphicsState().getStrokeJavaComposite());
        graphics.setClip(getGraphicsState().getCurrentClippingPath());

        // These two lines are mine (diagnostic only, not a fix): paint a
        // black background so that white image pixels become visible.
        graphics.setColor(Color.BLACK);
        graphics.fillRect(0, 0, 5000, 5000);

        graphics.drawImage(awtImage, at, null);
    }

Now the rendered file is no longer all white; it is white on black. I suspect 
that the problem is somehow related to transparent backgrounds / pixels.
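
The white-on-white effect can be reproduced without PDFBox. The following is only a minimal sketch using plain java.awt: the class WhiteOnWhiteDemo and its methods are hypothetical names, and the test image (one opaque white row on a fully transparent background) is an assumed stand-in for the inverted G4 image, not the actual decoder output.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.util.HashSet;
import java.util.Set;

final class WhiteOnWhiteDemo {

    /** Assumed stand-in for the inverted image: white "ink" on a transparent background. */
    static BufferedImage makeInvertedMask(int w, int h) {
        // TYPE_INT_ARGB pixels start out as 0x00000000, i.e. fully transparent.
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int x = 0; x < w; x++) {
            img.setRGB(x, h / 2, 0xFFFFFFFF); // one opaque white row as content
        }
        return img;
    }

    /** Fills a page with the given background, then draws the image on top of it. */
    static BufferedImage renderOn(Color background, BufferedImage image) {
        BufferedImage page = new BufferedImage(
                image.getWidth(), image.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(background);
        g.fillRect(0, 0, page.getWidth(), page.getHeight());
        g.drawImage(image, new AffineTransform(), null); // identity transform
        g.dispose();
        return page;
    }

    /** Counts the distinct RGB values in the image; 1 means the page looks blank. */
    static int countDistinctColors(BufferedImage img) {
        Set<Integer> colors = new HashSet<Integer>();
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                colors.add(img.getRGB(x, y));
            }
        }
        return colors.size();
    }

    public static void main(String[] args) {
        BufferedImage mask = makeInvertedMask(8, 8);
        System.out.println("distinct colors on white: "
                + countDistinctColors(renderOn(Color.WHITE, mask)));
        System.out.println("distinct colors on black: "
                + countDistinctColors(renderOn(Color.BLACK, mask)));
    }
}
```

On the white page the content row is indistinguishable from the background, while on the black page it shows up, which matches what the fillRect experiment above suggests.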
                
      was (Author: tilman):
    PDF Files with G4 images are blank again. It can be reproduced with the 
file d0000040.pdf of PDFBOX-955. The reason seems to be that the pixels of the 
embedded TIF files are reversed, and then drawn on a white image. So we get 
white on white, i.e. nothing. I "prove" my point with this change in 
pdfbox\pdfviewer\PageDrawer.java (this is not a fix, but it will hopefully give 
a hint):

    public void drawImage(Image awtImage, AffineTransform at)
    {
        graphics.setComposite(getGraphicsState().getStrokeJavaComposite());
        graphics.setClip(getGraphicsState().getCurrentClippingPath());
        
        //these two lines from me
        graphics.setColor(Color.BLACK);
        graphics.fillRect(0, 0, 5000, 5000);
        
        graphics.drawImage(awtImage, at, null);
    }

Now the rendered file is no longer all white; it is white on black. I suspect 
that the problem is somehow related to transparent backgrounds / pixels.
                  
> Can't extract b/w images from PDF
> ---------------------------------
>
>                 Key: PDFBOX-955
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-955
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.4.0
>         Environment: Windows XP prof, Java 1.6.0_22, Netbeans 6.9.1
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>              Labels: extract
>             Fix For: 1.6.0
>
>         Attachments: d0000040-01.png, d0000040.pdf, ExtractImages.java, 
> PDFBOX955-d00000401.png, PDFBOX955-photo1.png, photo.jpg, photo.pdf
>
>
> I wrote a test application using org.apache.pdfbox.ExtractImages to... 
> extract images as PNG. (This is the start of something bigger, which involves 
> compiling statistics about the content of over a million pages within PDF 
> files.) However, all the images I get are all black or all white when I test 
> on our own PDF files. I did get correct images from a file that had color 
> images. To extract, I tried page.convertToImage() followed by 
> ImageIO.write(), and I also tried PDFImageWriter; neither worked for b/w 
> images.
> The sample PDF is not confidential; it does give a warning "getRGBImage 
> returned NULL", but other PDFs that don't give the warning (but are 
> confidential) also fail.
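
When extracting from a large number of pages, the all-black or all-white symptom described above could be flagged automatically. The following is a rough sketch with plain java.awt only; BlankImageCheck and isSingleColor are hypothetical names and not part of PDFBox, and the 1-bit test images are made up for illustration.

```java
import java.awt.image.BufferedImage;

final class BlankImageCheck {

    /** Returns true when every pixel has the same RGB value, e.g. all black or all white. */
    static boolean isSingleColor(BufferedImage img) {
        int first = img.getRGB(0, 0);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (img.getRGB(x, y) != first) {
                    return false; // found a second color, so the image has content
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // TYPE_BYTE_BINARY is a 1-bit black/white image; it starts out all black.
        BufferedImage blank = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_BINARY);
        BufferedImage content = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_BINARY);
        content.setRGB(1, 1, 0xFFFFFFFF); // a single white pixel

        System.out.println("blank is single color: " + isSingleColor(blank));
        System.out.println("content is single color: " + isSingleColor(content));
    }
}
```

A check like this only detects the symptom; it says nothing about why an extracted image came out blank.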

