[
https://issues.apache.org/jira/browse/PDFBOX-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214757#comment-14214757
]
Tilman Hausherr commented on PDFBOX-2501:
-----------------------------------------
Your file is malformed:
- p215 of the 1.7 spec: "Unless the image uses ASCIIHexDecode or ASCII85Decode
as one of its filters, the ID operator shall be followed by a single
white-space character, and the next character shall be interpreted as the first
byte of image data."
- p214 of the 1.7 spec: "Because the inline format gives the reader less
flexibility in managing the image data, it shall be used only for small images
(4 KB or less)." One of your inline images (which is blank, so it only wastes
space) has a size of 110 KB. The 4 others (the barcodes) have sizes of 5 KB and
13 KB. Additionally, I don't think it is a good idea to encode barcodes as
JPEG. Barcodes should be b/w, and your barcodes have 34 different colors.
You're wasting space by using a color format instead of a 1-bit b/w
format.
The second point is mostly harmless, but the first one is not. The PDF stream
has this: ...49 44 0d 0a ff d8 ff e0... Here 49 44 = "ID". Then comes "0d 0a",
which is CR LF. The CR is the single white-space character, so the LF is
already the first byte of data :-(
(You can't see this yourself when editing your file because the PDF stream is
compressed; I have attached an uncompressed version.)
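
To make the off-by-one concrete, here is a minimal sketch that applies the quoted rule ("ID", exactly one white-space byte, then data) to the bytes shown above. The class and method names are made up for illustration and are not PDFBox code; the "well-formed" array is only an assumption about what a conforming writer might have emitted.

```java
public class InlineImageStart {

    /**
     * Returns the index of the first image data byte under the strict rule:
     * "ID", exactly one white-space byte, then data.
     * idOffset is the index of the 'I' of the ID operator.
     */
    public static int firstDataByteIndex(byte[] content, int idOffset) {
        if (content[idOffset] != 'I' || content[idOffset + 1] != 'D') {
            throw new IllegalArgumentException("no ID operator at offset " + idOffset);
        }
        return idOffset + 3; // 'I', 'D', one white-space byte
    }

    /** True if the two bytes at index are the JPEG start-of-image marker FF D8. */
    public static boolean startsWithJpegSoi(byte[] content, int index) {
        return index + 1 < content.length
                && (content[index] & 0xFF) == 0xFF
                && (content[index + 1] & 0xFF) == 0xD8;
    }

    public static void main(String[] args) {
        // Bytes from the comment: "ID" CR LF FF D8 FF E0
        byte[] malformed = {0x49, 0x44, 0x0D, 0x0A,
                (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        // Assumed conforming variant: "ID" LF FF D8 FF E0
        byte[] wellFormed = {0x49, 0x44, 0x0A,
                (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        int bad = firstDataByteIndex(malformed, 0);
        int good = firstDataByteIndex(wellFormed, 0);
        System.out.printf("malformed:   first data byte 0x%02X, JPEG SOI: %b%n",
                malformed[bad] & 0xFF, startsWithJpegSoi(malformed, bad));
        System.out.printf("well-formed: first data byte 0x%02X, JPEG SOI: %b%n",
                wellFormed[good] & 0xFF, startsWithJpegSoi(wellFormed, good));
    }
}
```

With the malformed bytes the data appears to start at 0x0A, so the JPEG decoder never sees the FF D8 marker where it expects it.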
Now I could of course add a check that the second byte is a LF and then skip it
(I tried it and now your file can be rendered). However, what if the LF really
is data? This could be the case with a filter other than the one you have,
which is DCT (= JPEG).
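
The risk can be illustrated with a small sketch of such a lenient check. This is not the actual PDFStreamParser code; the class and method names are hypothetical, and the "raw" sample bytes are invented to show the failure case.

```java
public class LenientIdSkip {

    /**
     * Lenient variant of the rule: consume the one white-space byte after
     * "ID", and if that byte was CR and the next one is LF, swallow the LF too.
     * idOffset is the index of the 'I' of the ID operator.
     */
    public static int lenientDataStart(byte[] content, int idOffset) {
        int pos = idOffset + 2;                 // first byte after "ID"
        boolean wasCr = content[pos] == 0x0D;
        pos++;                                  // the single white-space byte
        if (wasCr && pos < content.length && content[pos] == 0x0A) {
            pos++;                              // heuristic: treat CR LF as one separator
        }
        return pos;
    }

    public static void main(String[] args) {
        // The malformed JPEG case: the heuristic helps, data now starts at FF D8.
        byte[] jpeg = {0x49, 0x44, 0x0D, 0x0A, (byte) 0xFF, (byte) 0xD8};
        System.out.println("JPEG data starts at index " + lenientDataStart(jpeg, 0));

        // Invented unfiltered image whose first sample value really is 0x0A:
        // the same heuristic silently drops a genuine data byte.
        byte[] raw = {0x49, 0x44, 0x0D, 0x0A, 0x14, 0x1E};
        System.out.println("raw data starts at index " + lenientDataStart(raw, 0)
                + " (should be 3)");
    }
}
```

The parser cannot tell the two cases apart from the bytes alone, which is why a check at this level is a gamble.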
So these are the options:
1) you contact those who created this file (Fast Reports,
http://www.fast-report.com/en/support/), show them this issue, and ask them to
fix it immediately. They can write here or contact me directly if they have
more questions.
2) you build PDFBox from source and I tell you which change to make
Two possibilities come to mind for (2). One is to add a check
in PDFStreamParser.java near "case 'I'"; the other is to change DCTFilter
to use a PushbackInputStream to peek at the first byte and unread it if it is
not 0x0A. The first possibility carries the risk I mentioned above. The second
possibility, which I'd prefer, means slightly slower performance and
slightly higher memory usage, which is why I'm reluctant to do it for 1.8.
(In the trunk the change is easy and I'll do it, because DCTFilter already uses
ImageInputStream, which allows a seek. However, you're using 1.8.)
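
A rough sketch of the PushbackInputStream idea, using only java.io. This is not the real filter code; the class and method names are hypothetical. It relies on the fact that a JPEG stream starts with FF D8, so a leading 0x0A cannot be legitimate JPEG data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

public class SkipStrayLineFeed {

    /**
     * Wraps the encoded stream, peeks at the first byte, and pushes it back
     * unless it is a stray LF (0x0A), which is dropped.
     */
    public static InputStream skipLeadingLf(InputStream encoded) throws IOException {
        PushbackInputStream in = new PushbackInputStream(encoded, 1);
        int first = in.read();
        if (first != -1 && first != 0x0A) {
            in.unread(first);   // not a stray LF: put the byte back
        }
        return in;
    }

    /** Convenience helper for the demo: first byte seen after the skip. */
    public static int firstByte(byte[] data) {
        try (InputStream in = skipLeadingLf(new ByteArrayInputStream(data))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withLf = {0x0A, (byte) 0xFF, (byte) 0xD8};
        byte[] clean = {(byte) 0xFF, (byte) 0xD8};
        System.out.printf("with stray LF: 0x%02X%n", firstByte(withLf));
        System.out.printf("clean stream:  0x%02X%n", firstByte(clean));
    }
}
```

Because the check lives in the JPEG-specific filter, it does not affect other filters where a leading 0x0A could be real data; the cost is the extra wrapper stream around every DCT-encoded image.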
> Page render without barcode
> ---------------------------
>
> Key: PDFBOX-2501
> URL: https://issues.apache.org/jira/browse/PDFBOX-2501
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.8.7
> Environment: Ubuntu Linux, Java 8
> Reporter: Daniel Egea
> Assignee: Tilman Hausherr
> Attachments: image.png, iris2_4943659641078733308.pdf,
> iris2_4943659641078733308_unc.pdf
>
>
> I have tried this code:
> {code}
> try {
>     PDDocument doc = PDDocument.load(f);
>     try {
>         PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(0);
>         // Render the first page to an image (PDFBox 1.8 API)
>         BufferedImage image = page.convertToImage();
>         File outputfile = new File("/home/daniel/image.png");
>         ImageIO.write(image, "png", outputfile);
>     } finally {
>         doc.close(); // release the document even if rendering fails
>     }
> } catch (IOException ex) {
>     Logger.getLogger(Impresora.class.getName()).log(Level.SEVERE, null, ex);
> }
> {code}
> I ran it on the attached PDF and got the attached PNG.
> During rendering I get the following error in the
> 'convertToImage()' call:
> {code}
> 2014-11-14 13:56:12,592 WARN [org.apache.pdfbox.util.PDFStreamEngine] - <java.lang.ArrayIndexOutOfBoundsException>
> java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(Native Method)
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDInlinedImage.createImage(PDInlinedImage.java:218)
> at org.apache.pdfbox.util.operator.pagedrawer.BeginInlineImage.process(BeginInlineImage.java:69)
> at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:139)
> at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
> at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:732)
> {code}
> The error occurs once for each of the 4 barcodes in the PDF file.
> As you can see, the page is rendered OK but without any barcodes.
> How can I get it to render completely?