[ 
https://issues.apache.org/jira/browse/PDFBOX-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276168#comment-15276168
 ] 

Bastian Preindl commented on PDFBOX-785:
----------------------------------------

This issue still exists in version 1.8.12. If I split a 6 MB PDF that contains 
scanned images (one per page), I get [number of pages] PDF files of 6 MB each, 
so each single-page PDF evidently contains the resources of all pages. When 
the pages are re-merged into one PDF file, the result (and this is our main 
issue) is [number of pages] times the original document size: a 6 MB PDF file 
with 10 pages becomes a 60 MB PDF file after being split and re-merged.
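
The arithmetic behind those numbers can be sketched as follows. This is only a 
rough model of the observed behaviour, not a measurement and not PDFBox code: 
it assumes every split part keeps the resources of all pages and that merging 
does not de-duplicate them. The class and method names are made up for 
illustration.

```java
// Back-of-the-envelope model of the reported sizes; it does not call PDFBox.
// SplitMergeSizeModel and its methods are hypothetical names.
final class SplitMergeSizeModel {

    // Size of one split part, assuming each part keeps the resources of all pages.
    static long splitPartSizeMb(long originalSizeMb) {
        return originalSizeMb;
    }

    // Size after re-merging, assuming the merge does not de-duplicate the copies.
    static long mergedSizeMb(long originalSizeMb, int pageCount) {
        return splitPartSizeMb(originalSizeMb) * pageCount;
    }

    public static void main(String[] args) {
        long originalMb = 6; // size of the scanned PDF from the report
        int pages = 10;      // page count used in the report's example
        System.out.println("each part: " + splitPartSizeMb(originalMb) + " MB");
        System.out.println("merged: " + mergedSizeMb(originalMb, pages) + " MB");
    }
}
```

With the figures from this report (6 MB, 10 pages) the model gives 6 MB per 
part and 60 MB for the re-merged file, which matches what we see.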

> Spliting a PDF creates unnecessarily large files
> ------------------------------------------------
>
>                 Key: PDFBOX-785
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-785
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.8.8, 2.0.0
>         Environment: Windows XP, openOffice3.0.0, pdfsam
>            Reporter: mathieu radiguet
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.8.9, 2.0.0
>
>         Attachments: fileSizeIssue.zip
>
>
> Using PDFBox 0.8.0 (also tried with 1.1.0 and 1.2.1) to split files results 
> in parts that are bigger than the original.
> The files concerned were created from OpenOffice 3.0.0 .odt documents using 
> the OpenOffice PDF export, and then by merging several copies with pdfsam 
> (http://www.pdfsam.org/).
> In the attached Eclipse project, the test file is 10 712 749 bytes for 2812 
> pages, and the result files after splitting in two at page 2300 are 
> 8 812 515 bytes and 10 701 142 bytes.
> Using pdfSplit on the command line, every single result file is bigger than 
> the original. An example is also attached. An error says the original file 
> is corrupted, but we tried a file (made both with and without pdfsam) that 
> gives no error and got a similar result, so I think it's not related.
> This issue seems similar to PDFBOX-28.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
