[jira] [Commented] (PDFBOX-2501) Page render without barcode

Tilman Hausherr (JIRA) Mon, 17 Nov 2014 07:59:00 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214757#comment-14214757
 ]


Tilman Hausherr commented on PDFBOX-2501:
-----------------------------------------

Your file is malformed:
- p215 of the 1.7 spec: "Unless the image uses ASCIIHexDecode or ASCII85Decode 
as one of its filters, the ID operator shall be followed by a single 
white-space character, and the next character shall be interpreted as the first 
byte of image data."
- p214 of the 1.7 spec: "Because the inline format gives the reader less 
flexibility in managing the image data, it shall be used only for small images 
(4 KB or less)." One of your inline images (which is blank, i.e. you're wasting 
space) has a size of 110 KB. The 4 others (the barcodes) have sizes of 5 KB and 
13 KB. Additionally, I don't think it is a good idea to encode barcodes as 
JPEG. Barcodes should be b/w, and yours have 34 different colors, so you are 
wasting space by using a color format instead of a 1-bit b/w format.
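
To check the color point on an extracted barcode image, something like the following could be used. It is only a sketch with plain java.awt classes; the class and method names (BarcodeColours, distinctColours, toOneBit) are made up for illustration and are not part of PDFBox.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.HashSet;
import java.util.Set;

class BarcodeColours {
    /** Counts how many different RGB values occur in the image. */
    static int distinctColours(BufferedImage img) {
        Set<Integer> seen = new HashSet<Integer>();
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                seen.add(img.getRGB(x, y));
            }
        }
        return seen.size();
    }

    /** Redraws the image into a 1-bit black/white buffer. */
    static BufferedImage toOneBit(BufferedImage src) {
        BufferedImage bw = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = bw.createGraphics();
        g.drawImage(src, 0, 0, null);
        g.dispose();
        return bw;
    }
}
```

After toOneBit() the image can hold at most two colors, so it can be stored with a 1-bit format rather than as a color JPEG.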

The second point is mostly harmless, but the first one is not. The PDF stream 
has this: ...49 44 0d 0a ff d8 ff e0... Here 49 44 = "ID", followed by "0d 0a", 
which is CR LF. The CR is the single white-space character the spec allows, so 
the LF already counts as the first byte of data :-(
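
To make the byte counting concrete, here is a small sketch of a strict reading of the spec applied to the bytes quoted above. The class and method names are made up for illustration; this is not PDFBox code.

```java
import java.util.Arrays;

class InlineImageIdDemo {
    // The bytes quoted above: "ID", CR, LF, then the start of the JPEG data.
    static final byte[] STREAM = {
        0x49, 0x44, 0x0d, 0x0a, (byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0
    };

    /** Strict reading: skip "ID" plus exactly one white-space byte. */
    static byte[] dataAfterId(byte[] stream) {
        // index 0..1 = "ID", index 2 = the single white-space character
        return Arrays.copyOfRange(stream, 3, stream.length);
    }

    public static void main(String[] args) {
        byte[] data = dataAfterId(STREAM);
        // prints 0x0a: the LF is handed to the filter as image data
        System.out.printf("first data byte: 0x%02x%n", data[0] & 0xff);
        boolean startsWithSoi = (data[0] & 0xff) == 0xff && (data[1] & 0xff) == 0xd8;
        // prints false: the data no longer begins with the JPEG marker ff d8
        System.out.println("starts with ff d8: " + startsWithSoi);
    }
}
```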

(You can't see this yourself when editing your file because the PDF stream is 
compressed; I have attached an uncompressed version.)

Now I could of course add a check that the second byte is a LF and then skip it 
(I tried it, and with that change your file can be rendered). However, what if 
this byte is really data? This could be the case with a different filter than 
the one you have, which is DCT (= JPEG).
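
A sketch of such a lenient check and of the risk it brings. The helper below is hypothetical and only illustrates the idea; it is not the code in PDFStreamParser.

```java
class LenientIdSkip {
    /**
     * Returns the index of the first data byte after "ID" at idPos.
     * Lenient variant: a CR LF pair is treated as one line ending.
     */
    static int dataStart(byte[] s, int idPos) {
        int ws = idPos + 2;      // the single white-space byte required by the spec
        int start = ws + 1;      // strict reading: data begins here
        if (s[ws] == 0x0d && start < s.length && s[start] == 0x0a) {
            start++;             // lenient: also skip the LF
        }
        return start;
    }

    public static void main(String[] args) {
        // Malformed JPEG case from this issue: skipping the LF lands on ff d8.
        byte[] jpeg = {0x49, 0x44, 0x0d, 0x0a, (byte) 0xff, (byte) 0xd8};
        System.out.println(dataStart(jpeg, 0));   // 4

        // Well-formed data whose first byte really is 0x0a: that byte is lost.
        byte[] raw = {0x49, 0x44, 0x0d, 0x0a, 0x01, 0x02};
        System.out.println(dataStart(raw, 0));    // 4, but the data started at 3
    }
}
```

The parser cannot tell the two cases apart from the bytes alone, which is why the check is risky at that level.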

So these are the options:
1) you contact those who created this file (Fast Reports, 
http://www.fast-report.com/en/support/), show them this issue, and ask them to 
fix it immediately. They can write here or contact me directly if they have 
more questions.
2) you build PDFBox from source and I tell you which change to make

There are two possibilities that come to my mind for (2). One is to add a check 
in PDFStreamParser.java near "case 'I'"; the second one is to change DCTFilter 
to use a PushbackInputStream to peek at the first byte and unread it if it is 
not 0x0A. The first possibility brings the risk I mentioned above. The second 
possibility, which I'd prefer, means slightly slower performance and slightly 
higher memory usage, which is why I'm reluctant to do it for 1.8.
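
The PushbackInputStream idea could look roughly like this. It is a standalone sketch using only java.io; the class and method names are made up and this is not the actual filter code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

class LeadingLfSkipper {
    /** Drops a single leading 0x0A; any other first byte is pushed back. */
    static InputStream skipLeadingLf(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 1);
        int first = pin.read();
        if (first != -1 && first != 0x0A) {
            pin.unread(first);   // real data, give it back to the reader
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] malformed = {0x0a, (byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0};
        InputStream in = skipLeadingLf(new ByteArrayInputStream(malformed));
        // prints 0xff 0xd8: the stray LF is gone
        System.out.printf("0x%02x 0x%02x%n", in.read(), in.read());
    }
}
```

Doing this inside the JPEG filter is safe because JPEG data always starts with ff d8, so a leading 0x0A there can never be genuine image data.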

(In the trunk the change is easy and I'll do it, because DCTFilter already uses 
ImageInputStream, which allows a seek. However, you're using 1.8.)


> Page render without barcode
> ---------------------------
>
>                 Key: PDFBOX-2501
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2501
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.7
>         Environment: Ubuntu Linux, Java 8
>            Reporter: Daniel Egea
>            Assignee: Tilman Hausherr
>         Attachments: image.png, iris2_4943659641078733308.pdf, 
> iris2_4943659641078733308_unc.pdf
>
>
> I have tried this code:
> {code}
>         try {
>             PDDocument doc = PDDocument.load(f);
>             // render the first page of the document to an image
>             PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(0);
>             BufferedImage image = page.convertToImage();
>             File outputfile = new File("/home/daniel/image.png");
>             ImageIO.write(image, "png", outputfile);
>             doc.close();
>         } catch (IOException ex) {
>             Logger.getLogger(Impresora.class.getName()).log(Level.SEVERE, null, ex);
>         }
> {code}
> I used the attached PDF and got the attached PNG.
> During rendering I get the following error in the 
> 'convertToImage()' call:
> {code}
> 2014-11-14 13:56:12,592 WARN [org.apache.pdfbox.util.PDFStreamEngine] - <java.lang.ArrayIndexOutOfBoundsException>
> java.lang.ArrayIndexOutOfBoundsException
>     at java.lang.System.arraycopy(Native Method)
>     at org.apache.pdfbox.pdmodel.graphics.xobject.PDInlinedImage.createImage(PDInlinedImage.java:218)
>     at org.apache.pdfbox.util.operator.pagedrawer.BeginInlineImage.process(BeginInlineImage.java:69)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:139)
>     at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>     at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:732)
> {code}
> This happens once for each of the 4 barcodes in the PDF file.
> As you can see, the page is rendered OK but without any barcode.
> How can I get it to render completely?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
