[ https://issues.apache.org/jira/browse/PDFBOX-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021313#comment-14021313 ]

Tilman Hausherr commented on PDFBOX-362:
----------------------------------------

I get this when reading the file:

 org.apache.pdfbox.filter.FlateFilter:81 - FlateFilter: stop reading corrupt stream due to a DataFormatException

IOException for file real-empty-page.pdf
java.io.IOException: java.util.zip.DataFormatException: too many length or distance symbols
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:84)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:380)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:278)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:189)
        at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:109)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
        at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
        at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:227)
        at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:160)
        at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:109)
        at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:422)
        at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:212)
Caused by: java.util.zip.DataFormatException: too many length or distance symbols
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at java.util.zip.Inflater.inflate(Inflater.java:280)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:102)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:75)
        ... 13 more

So from what I see, the "empty" PDF page has a corrupt stream: the one that is
246 bytes long. PDFDebugger also fails on the file.
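The "Caused by" entry shows the failure comes from java.util.zip.Inflater rejecting the stream data itself, not from the page-import code. As a minimal sketch that uses only the JDK (no PDFBox classes), the hypothetical helper `FlateCheck` below inflates zlib data leniently: it keeps whatever was decoded before an error and reports the error instead of throwing, loosely mirroring the "stop reading corrupt stream" message in the log. The 5-byte input in `main` is an assumption, hand-made to declare 288 literal/length codes in a dynamic Huffman block header, which zlib rejects; the exact error text depends on the zlib build the JVM uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateCheck
{
    /** Bytes recovered from a Flate stream, plus the error text (null if the stream was clean). */
    public static final class Result
    {
        public final byte[] data;
        public final String error;

        Result(byte[] data, String error)
        {
            this.data = data;
            this.error = error;
        }

        public boolean isCorrupt()
        {
            return error != null;
        }
    }

    /** Compresses bytes into a complete zlib (Flate) stream. */
    public static byte[] deflate(byte[] input)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
        {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates zlib data, keeping the bytes decoded before any error instead of throwing. */
    public static Result inflateLeniently(byte[] compressed)
    {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        String error = null;
        try
        {
            while (!inflater.finished())
            {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary()))
                {
                    // Input ran out (or a preset dictionary is required) before the stream ended.
                    error = "truncated stream";
                    break;
                }
            }
        }
        catch (DataFormatException e)
        {
            // The same exception type that appears as "Caused by" in the trace above.
            error = e.getMessage() != null ? e.getMessage() : "DataFormatException";
        }
        finally
        {
            inflater.end();
        }
        return new Result(out.toByteArray(), error);
    }

    public static void main(String[] args)
    {
        byte[] content = "q 0 0 612 792 re W n Q".getBytes(StandardCharsets.US_ASCII);
        Result ok = inflateLeniently(deflate(content));
        System.out.println("valid stream corrupt? " + ok.isCorrupt());

        // Hand-made zlib data: header 78 9C, then a final dynamic-Huffman block whose
        // HLIT field declares 288 literal/length codes (the format allows at most 286).
        byte[] bad = { 0x78, (byte) 0x9C, (byte) 0xFD, 0x00, 0x00 };
        Result broken = inflateLeniently(bad);
        System.out.println("crafted stream error: " + broken.error);
    }
}
```

On a standard zlib build the crafted input should be reported with the same "too many length or distance symbols" text as in the trace. Running a check like this over the raw bytes of the 246-byte stream would show whether any content at all can be recovered from it.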

> ZipException occurring upon importing a page
> --------------------------------------------
>
>                 Key: PDFBOX-362
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-362
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>            Reporter: Jukka Zitting
>         Attachments: real-empty-page.pdf
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=2019925&group_id=78314&atid=552832
> When copying a page from a source document to another document, a
> ZipException occurs if the source page has no content.
> The following sample code exhibits the problem with such a source PDF:
> {code}
> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
>
> import org.apache.pdfbox.exceptions.COSVisitorException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
>
> public class LoadSaveSample
> {
>     /**
>      * Copies every page of a PDF into a new document via importPage().
>      *
>      * @param args holds the name of a file to copy
>      * @throws IOException if the file cannot be read or written
>      * @throws COSVisitorException if saving the new document fails
>      */
>     public static void main(String[] args) throws IOException, COSVisitorException
>     {
>         String name = args[0];
>         File file = new File(name);
>         System.out.println("loading file " + file.getPath());
>         PDDocument doc = PDDocument.load(file);
>         ClassLoader loader = doc.getClass().getClassLoader();
>         System.out.println("loader: " + loader);
>         PDDocument doc2 = new PDDocument();
>         try
>         {
>             List all = doc.getDocumentCatalog().getAllPages();
>             Iterator it = all.iterator();
>             while (it.hasNext())
>             {
>                 PDPage page = (PDPage) it.next();
>                 // now do the copy through import...
>                 PDPage imported = doc2.importPage(page);
>                 // copy attributes that may be inherited from the source page tree
>                 imported.setCropBox(page.findCropBox());
>                 imported.setMediaBox(page.findMediaBox());
>                 imported.setResources(page.findResources());
>                 imported.setRotation(page.findRotation());
>             }
>             String outName = file.getPath() + ".saved.pdf";
>             doc2.save(outName);
>             System.out.println("saved as " + outName);
>         }
>         finally
>         {
>             // close both documents, not only the source
>             doc2.close();
>             doc.close();
>         }
>     }
> }
> {code}
> (Edited on 8.6.14 by [~tilman] for clarity)



--
This message was sent by Atlassian JIRA
(v6.2#6252)