[
https://issues.apache.org/jira/browse/PDFBOX-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021313#comment-14021313
]
Tilman Hausherr commented on PDFBOX-362:
----------------------------------------
I get this when reading the file:
org.apache.pdfbox.filter.FlateFilter:81 - FlateFilter: stop reading corrupt stream due to a DataFormatException
IOException for file real-empty-page.pdf
java.io.IOException: java.util.zip.DataFormatException: too many length or distance symbols
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:84)
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:380)
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:278)
at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:189)
at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:109)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:227)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:160)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:109)
at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:422)
at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:212)
Caused by: java.util.zip.DataFormatException: too many length or distance symbols
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:259)
at java.util.zip.Inflater.inflate(Inflater.java:280)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:102)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:75)
... 13 more
So from what I see, the "empty" PDF page is not really empty: its content
stream (the one that is 246 bytes long) is corrupt. PDFDebugger also fails on it.
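To illustrate what "corrupt stream" means here, the following is a minimal sketch that uses only java.util.zip, not PDFBox. It compresses a few bytes, damages the first byte of the result, and shows that inflating the damaged copy fails with a DataFormatException. The class and method names (FlateCheck, deflate, inflatesCleanly) are made up for this example, and the exact exception message depends on which bytes are damaged, so it will not necessarily be "too many length or distance symbols".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class for illustration; not part of PDFBox.
final class FlateCheck {

    /** Compresses the input in zlib format (the format used by FlateDecode). */
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns true only if the bytes inflate without error up to the end of the stream. */
    public static boolean inflatesCleanly(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return false; // truncated stream, or a preset dictionary is required
                }
            }
            return true;
        } catch (DataFormatException e) {
            return false; // corrupt data, as in the stack trace above
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] good = deflate("q 1 0 0 1 0 0 cm Q".getBytes(StandardCharsets.US_ASCII));
        byte[] bad = good.clone();
        bad[0] = 0; // overwrite the zlib header byte to simulate corruption
        System.out.println("intact stream inflates:  " + inflatesCleanly(good));
        System.out.println("damaged stream inflates: " + inflatesCleanly(bad));
    }
}
```

A check like this could be run on the raw stream bytes extracted from the file to confirm that the data itself is damaged, independently of any PDF library.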
> ZipException occurring upon importing a page
> --------------------------------------------
>
> Key: PDFBOX-362
> URL: https://issues.apache.org/jira/browse/PDFBOX-362
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Reporter: Jukka Zitting
> Attachments: real-empty-page.pdf
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=2019925&group_id=78314&atid=552832
> When copying a page from a source document to another document, a
> ZipException occurs if the source page has no content.
> The following sample code exhibits the problem with such a source PDF:
> {code}
> import java.io.File;
> import java.io.IOException;
> import java.util.List;
>
> import org.apache.pdfbox.exceptions.COSVisitorException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
>
> public class LoadSaveSample
> {
>     /**
>      * Copies every page of a PDF into a new document via importPage().
>      *
>      * @param args holds the name of a file to copy
>      * @throws IOException if the file cannot be read or written
>      * @throws COSVisitorException if the copy cannot be saved
>      */
>     public static void main(String[] args) throws IOException, COSVisitorException
>     {
>         String name = args[0];
>         File file = new File(name);
>         System.out.println("loading file " + file.getPath());
>         PDDocument doc = PDDocument.load(file);
>         ClassLoader loader = doc.getClass().getClassLoader();
>         System.out.println("loader: " + loader);
>         PDDocument doc2 = new PDDocument();
>         try
>         {
>             List<?> all = doc.getDocumentCatalog().getAllPages();
>             for (Object o : all)
>             {
>                 PDPage page = (PDPage) o;
>                 // now do the copy through import...
>                 PDPage imported = doc2.importPage(page);
>                 // carry over inherited attributes that importPage does not resolve
>                 imported.setCropBox(page.findCropBox());
>                 imported.setMediaBox(page.findMediaBox());
>                 imported.setResources(page.findResources());
>                 imported.setRotation(page.findRotation());
>             }
>             String outName = file.getPath() + ".saved.pdf";
>             doc2.save(outName);
>             System.out.println("saved as " + outName);
>         }
>         finally
>         {
>             // close both documents, even if import or save fails
>             doc2.close();
>             doc.close();
>         }
>     }
> }
> {code}
> (Edited on 8.6.14 by [~tilman] for clarity)