[ https://issues.apache.org/jira/browse/PDFBOX-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Liu updated PDFBOX-2690:
------------------------------
    Description: 
I am using PDFBox 1.8.8 to manipulate existing PDF files. After saving a 
document, the output file becomes several times larger than the original. This 
is undesirable.

**How to reproduce my problem:**

In the following code, PDFBox simply loads an existing PDF and then saves it. 
Nothing else is done, yet the output file is still several times larger than 
the input.
{code}
import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.exceptions.*;

class Test
{
    public static void main(String[] args) throws IOException, COSVisitorException
    {
        // Load the existing PDF and save it again without any modification.
        PDDocument document = PDDocument.load("input1.pdf");
        document.save("output.pdf");
        document.close();
    }
}
{code}
Below are links to two sample input files. For input1.pdf, file size increases 
from 6MB to 50MB. For input2.pdf, file size increases from 0.4MB to 1.3MB.

https://dl.dropboxusercontent.com/u/13566649/samplePDF/input1.pdf 
https://dl.dropboxusercontent.com/u/13566649/samplePDF/input2.pdf
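
For reference, the reported sizes correspond to growth factors of roughly 8.3x (6MB to 50MB) and 3.25x (0.4MB to 1.3MB). The following is a minimal sketch, using only the Java standard library, of how the growth could be measured on the actual files. The class name `SizeGrowth` and the default file names are assumptions made for illustration; they are not part of PDFBox.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): compares the size of a PDF
// before and after it has been re-saved.
class SizeGrowth {

    // Returns outputBytes / inputBytes as a growth factor.
    static double growthFactor(long inputBytes, long outputBytes) {
        if (inputBytes <= 0) {
            throw new IllegalArgumentException("input size must be positive");
        }
        return (double) outputBytes / inputBytes;
    }

    public static void main(String[] args) throws IOException {
        // File names are assumptions; pass the real paths as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "input1.pdf");
        Path out = Paths.get(args.length > 1 ? args[1] : "output.pdf");

        long inSize = Files.size(in);
        long outSize = Files.size(out);

        System.out.printf("%s: %d bytes%n", in, inSize);
        System.out.printf("%s: %d bytes%n", out, outSize);
        System.out.printf("growth: %.2fx%n", growthFactor(inSize, outSize));
    }
}
```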

**Possible reason**

Tilman Hausherr suggests that the input file contains an enormous amount of 
"structure" information stored in object streams, which are compressed in the 
input file but not in the output file.
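
One rough way to check this hypothesis without PDFBox is to count how often the name `/ObjStm` (the `/Type` value of an object stream dictionary in the PDF format) appears in the raw bytes of each file. The sketch below uses only the Java standard library; the class name `ObjStmCounter` and the default file names are assumptions for illustration. It is a heuristic, not a PDF parser: the byte sequence could in principle also occur inside unrelated content. If the hypothesis holds, one would expect a non-zero count for the input file and a count of zero for the re-saved file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): counts raw occurrences of the
// "/ObjStm" name in a file as a rough indicator of object stream usage.
class ObjStmCounter {

    private static final byte[] MARKER =
            "/ObjStm".getBytes(StandardCharsets.US_ASCII);

    // Counts non-overlapping occurrences of MARKER in data.
    static int countMarker(byte[] data) {
        int count = 0;
        outer:
        for (int i = 0; i + MARKER.length <= data.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            count++;
            i += MARKER.length - 1; // skip past the match just counted
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // File names are assumptions; pass the real paths as arguments.
        String[] files = args.length > 0
                ? args
                : new String[] {"input1.pdf", "output.pdf"};
        for (String name : files) {
            // Reads the whole file into memory, which is acceptable for
            // files of the sizes reported here.
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.printf("%s: %d occurrence(s) of /ObjStm%n",
                    name, countMarker(data));
        }
    }
}
```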


> Filesize becomes extremely large after saving
> ---------------------------------------------
>
>                 Key: PDFBOX-2690
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2690
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 1.8.8, 1.8.9, 2.0.0
>         Environment: PDFBox 1.8.8, Java8u25, Windows 8.1
>            Reporter: Brian Liu
>         Attachments: input2-after-save.pdf, input2.pdf


