[
https://issues.apache.org/jira/browse/PDFBOX-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338501#comment-15338501
]
Tilman Hausherr commented on PDFBOX-3280:
-----------------------------------------
I have now reverted most of the changes, because split was creating huge files,
and per the short discussion on the dev list:
{quote}
rev 1741295 in PDFBOX-3280 (deep clone) was a bad decision related to
splitting, or related to creating new documents from existing documents. What
should be done now?
my thought:
1) revert the change per this logic (ignore the insults)
http://article.gmane.org/gmane.linux.kernel/1369384
2) create a third method, in addition to addPage and importPage, e.g.
importPageClone, that does what importPage does now.
The alternative would be to fix the splitter so that it does what importPage
did in the past.
However I think it is better to revert the mess instead of correcting it.
{quote}
Maruan agreed with (1), and John mentioned:
{quote}
The original JIRA issue isn’t valid. The whole point is that the source PDF
*should* be kept open - we need to read from it. And multi-threaded writing
isn’t supported, so no surprise there.
{quote}
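The distinction at the heart of the discussion (a page that still reads from its origin document vs. a deep-cloned page that owns its data) can be illustrated outside PDFBox. The following is a minimal pure-JDK analogy, not PDFBox API: none of these classes exist in the library, and the boolean flag merely stands in for the difference between importPage (shared reference) and the proposed importPageClone (deep copy).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Analogy only: stands in for a COSStream-backed source document
// that pages read from on demand.
class SourceDoc implements AutoCloseable {
    private boolean closed = false;
    private final List<Integer> pageData = new ArrayList<>(List.of(1, 2, 3));

    List<Integer> readPage() throws IOException {
        if (closed) {
            throw new IOException("COSStream already closed"); // like PDFBox reports
        }
        return pageData;
    }

    @Override
    public void close() { closed = true; }
}

class TargetDoc {
    private final SourceDoc source;       // importPage-style: shared reference
    private final List<Integer> ownCopy;  // importPageClone-style: deep clone

    TargetDoc(SourceDoc source, boolean deepClone) throws IOException {
        this.source = source;
        this.ownCopy = deepClone ? new ArrayList<>(source.readPage()) : null;
    }

    List<Integer> save() throws IOException {
        return ownCopy != null ? ownCopy : source.readPage();
    }
}

public class CloneDemo {
    public static void main(String[] args) throws IOException {
        SourceDoc src = new SourceDoc();
        TargetDoc shallow = new TargetDoc(src, false);
        TargetDoc deep = new TargetDoc(src, true);
        src.close(); // close BEFORE saving, as in the original report

        System.out.println(deep.save()); // works: the deep clone owns its data
        try {
            shallow.save();
        } catch (IOException e) {
            System.out.println(e.getMessage()); // fails: the source is gone
        }
    }
}
```

The same shape explains John's point: as long as a target document only holds references into the source, the source *must* stay open until the target is saved.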
> PDDocument.importPage does not deep clone source page
> -----------------------------------------------------
>
> Key: PDFBOX-3280
> URL: https://issues.apache.org/jira/browse/PDFBOX-3280
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
> Reporter: Cornelis Hoeflake
> Attachments: 1.pdf
>
>
> The method PDDocument.importPage does not deep clone the source page. This
> causes two issues. First, closing the source document BEFORE saving the
> target document throws an "already closed" exception; placing the close
> after saving the target document works fine. Second, splitting a document
> into many small documents and then saving those documents from multiple
> threads causes random exceptions such as ArrayIndexOutOfBoundsException,
> "COSStream closed", etc.
> See, for example, the following code. The source document used is attached.
> {code:title=Test.java|borderStyle=solid}
> PDDocument doc = new PDDocument();
> PDDocument load = PDDocument.load(new File(SOURCE_DOC));
> for (int p = 0; p < 1000; p++) {
>     doc.importPage(load.getPage(0));
> }
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> doc.save(baos);
> doc.close();
> load.close();
> final PDDocument doc2 = PDDocument.load(baos.toByteArray());
> // ok, now we have a big document loaded as it normally would be loaded.
> ExecutorService es = Executors.newFixedThreadPool(4);
> final List<PDDocument> docs = Lists.newArrayList();
> for (int p = 0; p < doc2.getNumberOfPages(); p++) {
>     final PDDocument newDoc = new PDDocument();
>     newDoc.importPage(doc2.getPage(p));
>     docs.add(newDoc);
> }
> for (int p = 0; p < doc2.getNumberOfPages(); p++) {
>     final int page = p;
>     es.submit(new Runnable() {
>         @Override
>         public void run() {
>             try {
>                 PDDocument newDoc = docs.get(page);
>                 newDoc.save(new ByteArrayOutputStream());
>                 newDoc.close();
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>     });
> }
> es.shutdown();
> {code}
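Given that multi-threaded writing is not supported while documents share a source, one safe shape is to remove the sharing before fanning out: serialize each split document to its own independent buffer on a single thread, then hand those self-contained buffers to workers. The sketch below is pure JDK and hypothetical, not PDFBox API; `serialize` merely stands in for a PDDocument.save-style step that detaches a page from shared state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the safe ordering only. The point: do all reads from the
// shared source on one thread, then let workers touch only data they
// exclusively own.
public class SafeSplit {
    // Stand-in for a save step: turns shared page data into an
    // independent byte[] the caller owns outright.
    static byte[] serialize(int pageNumber) {
        return ("page-" + pageNumber).getBytes();
    }

    public static void main(String[] args) throws Exception {
        int pages = 8;

        // Phase 1 (single-threaded): detach every page from the shared
        // source while it is still open.
        List<byte[]> detached = new ArrayList<>();
        for (int p = 0; p < pages; p++) {
            detached.add(serialize(p));
        }
        // The shared source could be closed here; nothing refers to it.

        // Phase 2 (multi-threaded): workers only see private buffers,
        // so there is no shared stream state to race on.
        ExecutorService es = Executors.newFixedThreadPool(4);
        List<Future<Integer>> sizes = new ArrayList<>();
        for (byte[] buf : detached) {
            sizes.add(es.submit(() -> buf.length));
        }
        for (Future<Integer> f : sizes) {
            System.out.println(f.get());
        }
        es.shutdown();
    }
}
```

This keeps the single-threaded phase responsible for everything that touches the source document, which is the constraint the reverted change was trying (and failing) to lift.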
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]