[ 
https://issues.apache.org/jira/browse/PDFBOX-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851622#comment-13851622
 ] 

Tilman Hausherr commented on PDFBOX-1810:
-----------------------------------------

@Frederik: Could you attach a PDF that has only black & white, i.e. 2 colors? 
Yours has 4 gray colors.

I don't know whether the problem is in PDFBox or in the PDF itself, but 
here's what I found out:
{code}
<<
/Type /XObject
/Subtype /Image
/Width 4960
/Height 6944
/ColorSpace /DeviceGray
/BitsPerComponent 2
/Length 282806
/Filter /FlateDecode
>>
{code}
Bytes needed, IMO (width * height * BitsPerComponent / 8):
4960 * 6944 * 2 / 8 = 8610560
which happens to be exactly the size of the first image of the PDF after 
FlateDecode decompression.

But in PDPixelMap, after the colorspace is chosen, this is copied into an array 
of size 34442240. That number is 4960 * 6944, i.e. now it's one gray value per 
byte.
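
The two sizes above can be reproduced with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration and are not part of PDFBox, and it assumes each image row is padded to a whole byte, as PDF sample data normally is.

```java
// Hypothetical helper, not PDFBox code: compares the packed size of the
// image data with the size of a one-byte-per-pixel copy.
final class SampleBufferSizes {

    // Bytes for packed samples, with every row rounded up to a whole byte.
    static long packedBytes(int width, int height, int bitsPerComponent) {
        long bytesPerRow = ((long) width * bitsPerComponent + 7) / 8;
        return bytesPerRow * height;
    }

    // Bytes when every gray sample is expanded to its own byte.
    static long unpackedBytes(int width, int height) {
        return (long) width * height;
    }

    public static void main(String[] args) {
        // Values taken from the image dictionary quoted above.
        System.out.println(packedBytes(4960, 6944, 2));  // 8610560
        System.out.println(unpackedBytes(4960, 6944));   // 34442240
    }
}
```

The unpacked copy is four times larger than the decoded stream, which matches 8 bits per pixel instead of 2.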

A bit of logging shows:
{code}
PDPixelMap:307 - ColorModel: ColorModel: #pixelBits = 8 numComponents = 1 color space = java.awt.color.ICC_ColorSpace@19c2ffca transparency = 1 has alpha = false isAlphaPre = false
PDPixelMap:308 - getPixelSize: 8
PDPixelMap:311 - getDataType: 0
{code}
The "#pixelBits = 8" is not what I would expect.

What we'd need is a colorspace that has four pixels in a byte. In PDDeviceGray, 
this code results in getting a correctly rendered image:
{code}
    public ColorModel createColorModel(int bpc) throws IOException
    {
        return ImageTypeSpecifier.createGrayscale(bpc, DataBuffer.TYPE_BYTE, false)
                .getColorModel();
    }
{code}
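
To illustrate what "four pixels in a byte" means on the Java side, here is a minimal standalone sketch that wraps packed 2-bit data in a BufferedImage without expanding it to one byte per pixel. It uses only java.awt and javax.imageio, not PDFBox; the class name, method name and the one-byte test image are assumptions made for the example, and this is not necessarily how PDPixelMap should do it.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import javax.imageio.ImageTypeSpecifier;

// Hypothetical demo class, not part of PDFBox.
final class PackedGrayDemo {

    // Wraps packed gray samples (bpc bits per pixel, rows padded to whole
    // bytes) in a BufferedImage that shares the given array.
    static BufferedImage wrapPacked(byte[] packed, int width, int height, int bpc) {
        ColorModel cm = ImageTypeSpecifier
                .createGrayscale(bpc, DataBuffer.TYPE_BYTE, false)
                .getColorModel();
        DataBufferByte buffer = new DataBufferByte(packed, packed.length);
        // A packed raster keeps several pixels in each byte.
        WritableRaster raster = Raster.createPackedRaster(buffer, width, height, bpc, null);
        return new BufferedImage(cm, raster, false, null);
    }

    public static void main(String[] args) {
        // One row of four 2-bit pixels: binary 00 01 10 11.
        byte[] packed = { 0x1B };
        BufferedImage img = wrapPacked(packed, 4, 1, 2);
        for (int x = 0; x < img.getWidth(); x++) {
            // Print the gray level (blue channel) of each pixel.
            System.out.println(x + ": " + (img.getRGB(x, 0) & 0xFF));
        }
    }
}
```

With this layout the image needs 8610560 bytes for the 4960 x 6944 page instead of 34442240.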
However, the trace with this change is also odd:
{code}
PDPixelMap:307 - ColorModel: IndexColorModel: #pixelBits = 2 numComponents = 3 color space = java.awt.color.ICC_ColorSpace@75247397 transparency = 1 transIndex   = -1 has alpha = false isAlphaPre = false
PDPixelMap:308 - getPixelSize: 2
PDPixelMap:311 - getDataType: 0
{code}
numComponents = 3 makes no sense.
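
A possible explanation, offered as an assumption and not verified against PDFBox itself: for bit depths below 8, ImageTypeSpecifier.createGrayscale appears to return an IndexColorModel with a gray palette, and an IndexColorModel reports the components of its RGB palette entries, not of the packed pixel. The small probe below (class and method names are hypothetical) prints what the JDK hands back so this can be checked locally.

```java
import java.awt.image.ColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.IndexColorModel;
import javax.imageio.ImageTypeSpecifier;

// Hypothetical probe class, not part of PDFBox.
final class GrayColorModelProbe {

    // Same call as in the createColorModel change quoted above.
    static ColorModel grayModel(int bpc) {
        return ImageTypeSpecifier
                .createGrayscale(bpc, DataBuffer.TYPE_BYTE, false)
                .getColorModel();
    }

    public static void main(String[] args) {
        ColorModel cm = grayModel(2);
        System.out.println("class         = " + cm.getClass().getSimpleName());
        System.out.println("pixelSize     = " + cm.getPixelSize());
        System.out.println("numComponents = " + cm.getNumComponents());
        if (cm instanceof IndexColorModel) {
            IndexColorModel icm = (IndexColorModel) cm;
            // Each palette entry is an RGB triple; for gray, R == G == B.
            for (int i = 0; i < icm.getMapSize(); i++) {
                System.out.println(i + " -> " + icm.getRed(i) + ","
                        + icm.getGreen(i) + "," + icm.getBlue(i));
            }
        }
    }
}
```

If the palette entries come out as equal R, G and B values, the 3 components are just the palette's view of gray and do not by themselves indicate a wrong color model.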

> PDFToImage: Image of pdf is resized and drawn multiple times at top of output 
> image
> -----------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1810
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1810
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.2, 1.8.3, 2.0.0
>         Environment: Debian testing, OpenJDK7
>            Reporter: Frederik Bertling
>         Attachments: K3.pdf, K31.jpg, K32.jpg
>
>
> Hi,
> all the pdfs created with simple scan (https://launchpad.net/simple-scan) are 
> not correctly converted into images.
> A single page is resized and drawn multiple times at the top of the output 
> image. 
> Using the pdfbox app on Windows with the newest Oracle Java 7 causes a Java 
> heap error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
