[jira] [Reopened] (PDFBOX-955) Can't extract b/w images from PDF

Tilman Hausherr (JIRA) Fri, 09 Aug 2013 10:55:28 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tilman Hausherr reopened PDFBOX-955:
------------------------------------

    Reproduced In: 2.0.0

PDF files with G4 (CCITT Group 4) images render blank again. This can be 
reproduced with the file d0000040.pdf attached to PDFBOX-955. The reason seems 
to be that the pixels of the embedded TIFF images are reversed and then drawn 
onto a white image, so we get white on white, i.e. nothing. The following 
change in pdfbox\pdfviewer\PageDrawer.java demonstrates this (it is not a fix, 
but it will hopefully give a hint):

    public void drawImage(Image awtImage, AffineTransform at)
    {
        graphics.setComposite(getGraphicsState().getStrokeJavaComposite());
        graphics.setClip(getGraphicsState().getCurrentClippingPath());
        
        // Diagnostic only: I added the next two lines to paint a black
        // background before the image is drawn.
        graphics.setColor(Color.BLACK);
        graphics.fillRect(0, 0, 5000, 5000);
        
        graphics.drawImage(awtImage, at, null);
    }

Now the rendered file is no longer all white; it is white on black. I suspect 
that the problem is somehow related to transparent backgrounds / pixels.
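
The suspected effect can be reproduced with plain Java2D, independent of 
PDFBox. The sketch below is only an illustration under assumptions: the class 
WhiteOnWhiteDemo and its methods are hypothetical names, and the 2x1 ARGB 
bitmap merely stands in for a decoded G4 image whose "ink" pixels came out 
opaque white and whose "paper" pixels came out transparent. Whether PDFBox 
actually produces such an image is the unconfirmed suspicion stated above.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class WhiteOnWhiteDemo
{
    // Hypothetical stand-in for a decoded b/w image with reversed pixels:
    // pixel 0 is opaque white ("ink"), pixel 1 is fully transparent ("paper").
    static BufferedImage makeReversedBitmap()
    {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(0, 0, 0xFFFFFFFF);
        img.setRGB(1, 0, 0x00000000);
        return img;
    }

    // Fill an opaque page with the background color, then draw the bitmap on it,
    // mirroring the fillRect + drawImage sequence of the diagnostic change.
    static BufferedImage renderOn(Color background, BufferedImage bitmap)
    {
        BufferedImage page = new BufferedImage(
                bitmap.getWidth(), bitmap.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(background);
        g.fillRect(0, 0, page.getWidth(), page.getHeight());
        g.drawImage(bitmap, 0, 0, null);
        g.dispose();
        return page;
    }

    // True if every pixel of the image has the same color.
    static boolean isUniform(BufferedImage img)
    {
        int first = img.getRGB(0, 0);
        for (int y = 0; y < img.getHeight(); y++)
        {
            for (int x = 0; x < img.getWidth(); x++)
            {
                if (img.getRGB(x, y) != first)
                {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        BufferedImage bitmap = makeReversedBitmap();
        System.out.println("uniform on white: " + isUniform(renderOn(Color.WHITE, bitmap)));
        System.out.println("uniform on black: " + isUniform(renderOn(Color.BLACK, bitmap)));
    }
}
```

On a white background the white pixels and the transparent pixels become 
indistinguishable, so the result is a blank page; on a black background the 
same bitmap shows up as white on black, which matches the behaviour seen with 
the diagnostic change.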
                
> Can't extract b/w images from PDF
> ---------------------------------
>
>                 Key: PDFBOX-955
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-955
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.4.0
>         Environment: Windows XP prof, Java 1.6.0_22, Netbeans 6.9.1
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>              Labels: extract
>             Fix For: 1.6.0
>
>         Attachments: d0000040-01.png, d0000040.pdf, ExtractImages.java, 
> PDFBOX955-d00000401.png, PDFBOX955-photo1.png, photo.jpg, photo.pdf
>
>
> I wrote a test application using org.apache.pdfbox.ExtractImages to... 
> extract images as PNG. (This is the start of something bigger, which involves 
> making a statistic about the content of over a million pages within PDF 
> files) However all images I get are all black or all white when I test on our 
> own PDF files. I did get correct images from a file that had color images. To 
> extract, I tried page.convertToImage() and then writing with ImageIO.write(), 
> but I also tried using PDFImageWriter, neither had success for b/w images.
> The sample PDF is not confidential; it does give a warning "getRGBImage 
> returned NULL" but other PDFs that don't give the warning (but are 
> confidential) also fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
