While the input PDF file is less than pristine, it appears that
the input PDF is NOT the root of this performance issue.  We
added some debugging output to the iText source code and found
that over 50% of the processing time is being spent in the
RandomAccessFileOrArray::reOpen() method.  As the objects are
processed, PdfReader.getStreamBytesRaw() eventually gets called,
and its finally block forces the input file to be closed.  When
the next object is processed, the library re-reads the entire
input file, which in our case is a 26 MB file.

So in the end, we see the input PDF file (26 MB) being re-read over
5000 times!  So while the input PDF could be better, re-reading
the input file for every object processed is what is causing the
poor performance.

Is there any way for us to use the library so that the file is
not re-read on each pass of the processing loop?
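
One thing we are considering trying is sketched below: read the input
file into a byte array once and hand that array to the reader, rather
than a file name.  This is only a sketch under assumptions.  The class
name PdfBytes and the helper readFully() are our own hypothetical
names, and we have not confirmed that the byte-array PdfReader
constructor avoids the reOpen() cost; it also needs enough heap to
hold the whole 26 MB file.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PdfBytes
{
   // Hypothetical helper: reads the whole file into memory exactly once.
   static byte[] readFully(String path) throws IOException
   {
      InputStream in = new FileInputStream(path);
      try
      {
         ByteArrayOutputStream out = new ByteArrayOutputStream();
         byte[] buf = new byte[8192];
         int n;
         // copy until end of file
         while ((n = in.read(buf)) != -1)
         {
            out.write(buf, 0, n);
         }
         return out.toByteArray();
      }
      finally
      {
         in.close();
      }
   }

   public static void main(String[] args) throws IOException
   {
      if (args.length == 0)
      {
         System.out.println("usage: java PdfBytes <input file>");
         return;
      }
      byte[] pdfBytes = readFully(args[0]);
      System.out.println("Read " + pdfBytes.length + " bytes into memory");
      // Assumption: in the merge program the array would then replace
      // the file name, e.g.  PdfReader reader = new PdfReader(pdfBytes);
   }
}
```

The rest of the merge loop would stay as it is; only the way the
reader is constructed changes.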




Paulo Soares wrote:
> 
> On my machine it takes 1 minute and that's already way too long. The
> reason is that the application that created the PDF is really lousy, it
> can't even get the syntax right in the creation date. Your PDF is
> composed of 2503 images, each one 2 pixels high. It looks like it was
> built from a striped TIFF without any effort to adapt the image to the
> new media.
> 
> Paulo
> 
> ----- Original Message ----- 
> From: "BorisTheCat" <[EMAIL PROTECTED]>
> To: <itext-questions@lists.sourceforge.net>
> Sent: Thursday, August 23, 2007 4:10 PM
> Subject: [iText-questions] One page PDF takes 7-8 minutes to merge ??
> 
> 
>>
>> We used the examples to create a pdf merge program.  We have a one-page
>> PDF, 26Mb in size that contains an image.  It is taking 7-8 minutes to
>> merge this PDF.  We have written a tiny java app that demonstrates the
>> problem.  We stripped the test app down so that it is only processing one
>> input file so that it is now really only "merging" one PDF into a new
>> PDF.  So... the app is only processing a one-page PDF (albeit big pdf)
>> and it is taking 7-8 minutes to complete.   Any help would be GREATLY
>> appreciated to determine why it is taking this long and whether there is
>> any way to fix this.
>>
>> The input PDF can be found at:  http://home.fuse.net/mikebrungs/test.pdf
>> and the source code is shown below:
>>
>>
>> import java.io.File;
>> import java.io.FileOutputStream;
>> import com.lowagie.text.Document;
>> import com.lowagie.text.pdf.PdfCopy;
>> import com.lowagie.text.pdf.PdfImportedPage;
>> import com.lowagie.text.pdf.PdfReader;
>> import java.util.Date;
>>
>> public class PdfTest
>> {
>>   public static void main(String[] args)
>>   {
>>      try
>>      {
>>         // input file
>>         File pdfFile = new File(args[0]);
>>
>>         // copied file being created
>>         File outFile = new File(args[0] + ".Copied.pdf");
>>
>>         System.out.println("Input file is: " + args[0]);
>>         System.out.println("Output file is: " + args[0] + ".Copied.pdf");
>>         System.out.println("starting at: " + new Date().toString());
>>
>>         PdfReader reader = new PdfReader(pdfFile.getPath());
>>
>>         reader.consolidateNamedDestinations();
>>         int numPages = reader.getNumberOfPages();
>>
>>         Document document = new Document(reader.getPageSizeWithRotation(1));
>>         PdfCopy writer = new PdfCopy(document, new FileOutputStream(outFile.getPath()));
>>
>>         document.open();
>>
>>         PdfImportedPage page;
>>
>>         for (int j = 1; j <= numPages; j++)
>>         {
>>            page = writer.getImportedPage(reader, j);
>>            writer.addPage(page);
>>         }
>>
>>         reader.close();
>>         writer.close();
>>         writer.freeReader(reader);
>>
>>         System.out.println("finishing at: " + new Date().toString());
>>      }
>>      catch (Exception t)
>>      {
>>         t.printStackTrace();
>>      }
>>   }
>> }
> 
> 
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >>  http://get.splunk.com/
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> Buy the iText book: http://itext.ugent.be/itext-in-action/
> 
> 

-- 
View this message in context: 
http://www.nabble.com/One-page-PDF-takes-7-8-minutes-to-merge----tf4318146.html#a12349431
Sent from the iText - General mailing list archive at Nabble.com.

