Paulo,
Thanks for the quick response and clarification. I will change it back to PdfCopy to ensure that the input data is cleaned up as quickly as possible. I believe I have only the one reference (a local variable) to PdfReader, so it could just be the VM being slow to run a GC sweep.
Regarding flushing the output, do you have any suggestions for me there? Is it possible to flush the output in chunks (i.e., after each document is added) while using PdfCopy, PdfWriter, etc.?
Thanks,
Mark
"Paulo Soares" <[EMAIL PROTECTED]>
04/17/2006 11:06 AM
PdfCopy frees each PdfReader as its pages are written to the output. PdfCopyFields keeps every PdfReader in memory until close. If you are seeing similar memory use with both, you are keeping a reference to a PdfReader somewhere.
Paulo
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mark Spitzer
Sent: Monday, April 17, 2006 3:40 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Managing Memory During Concatenation
I'm having some difficulty dealing with memory constraints while concatenating a large number of files. The heap on the VM spikes dramatically and I'm trying to determine what I can do to minimize the impact. I could have up to a few thousand pages from a few hundred documents. I've been using PdfCopy and recently switched to PdfCopyFields based on one of the responses to a similar post. I've noticed no difference. What can I do to ensure that the output stream is flushed after each PDF that is read is added to the new merged/concatenated document? Is there a smarter way to do what I'm doing?
The code I use and the research I performed are below. I've looked at two different postings and have looked at concat_pdf.java. I apologize in advance if I have missed the answer somewhere else.
Here are the postings I read to try to find an answer:
- http://thread.gmane.org/gmane.comp.java.lib.itext.general/16977/focus=16977
- http://article.gmane.org/gmane.comp.java.lib.itext.general/18809/match=itext+concatenate+large+files
Thanks in advance for your help,
Mark
// code starts here
OutputStream out = printService.getOutputStream();
// Document document = new Document();
// PdfCopy writer = new PdfCopy(document, out);
// document.open();
PdfCopyFields writer = new PdfCopyFields(out);
while (current != null) {
    Doc doc = current.getDoc();
    // get the doc flavor; if it's url then get a handle to the
    // stream
    // so we can close each doc when we're done with it
    if (doc.getDocFlavor().getRepresentationClassName().equals(
            URL.class.getName())) {
        in = doc.getStreamForBytes();
        // create a new doc with the input stream doc flavor
        doc = new SimpleDoc(in, new DocFlavor.INPUT_STREAM(doc
                .getDocFlavor().getMimeType()), doc.getAttributes());
    }
    Doc printedDoc = printToDoc(doc, attributes);
    byte[] docBytes = convertToByteArray(printedDoc.getStreamForBytes());
    PdfReader reader = new PdfReader(docBytes);
    int numberOfPages = reader.getNumberOfPages();
    // for(int i = 1; i <= numberOfPages; i++){
    //     PdfImportedPage page = writer.getImportedPage(reader, i);
    //     writer.addPage(page);
    // }
    writer.addDocument(reader);
    writer.getWriter().freeReader(reader);
    close(in);
    current = current.next();
    notifyJobListenersOnCompletion();
    writer.getWriter().flush();
}
writer.close();
// document.close();
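
The loop above calls convertToByteArray, which is not shown in the post. For readers following along, here is a minimal sketch of what such a helper might look like, assuming it simply drains the input stream into memory. The class name StreamBytes and the 8 KB chunk size are illustrative choices, not taken from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the convertToByteArray helper called in the loop.
final class StreamBytes {
    private StreamBytes() {}

    // Reads the stream to its end and returns everything as one array.
    // The caller remains responsible for closing the stream.
    static byte[] convertToByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192]; // assumed chunk size
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // toByteArray() returns a copy of the internal buffer, so the
        // document's bytes are briefly held twice on the heap.
        return buffer.toByteArray();
    }
}
```

If the helper works roughly like this, each input document is held fully in memory before PdfReader parses it, so part of the heap spike per document may come from this buffering rather than from the writer.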