[ 
https://issues.apache.org/jira/browse/PDFBOX-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810064#comment-16810064
 ] 

Tilman Hausherr commented on PDFBOX-45:
---------------------------------------

I played a bit with this thing and read my text from 2017. Sadly I didn't 
document all my failed attempts :-(
{code}
List<COSBase> objectsToWrite = new ArrayList<>();
COSDictionary obj = doc.getPage(0).getResources().getCOSObject();
objectsToWrite.add(obj);
doc.saveIncremental(fileOutputStream, objectsToWrite);
{code}
This works; the resources dictionary ends up in object 22.

I wrote in 2017:
{quote}
It sometimes creates orphan objects, possibly when a direct object is marked 
for update. Calling isDirect() to check doesn't help.
{quote}
So I tried this:
{code}
List<COSBase> objectsToWrite = new ArrayList<>();
PDPage page = doc.getPage(0);
PDRectangle mediaBox = page.getMediaBox();
((COSArray) mediaBox.getCOSObject()).setNeedToBeUpdated(true);
objectsToWrite.add(mediaBox.getCOSObject());
objectsToWrite.add(page.getCOSObject());
doc.saveIncremental(fileOutputStream, objectsToWrite);
{code}
The result:
{code}
22 0 obj
[0.0 0.0 595.2 841.92]
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 5 0 R
>>
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
>>
/MediaBox [0 0 595.2 841.92]
/Contents 4 0 R
/StructParents 0
>>
{code}
Object 22 is an orphan, and the media box rectangle exists twice!

Setting "direct" to false doesn't help, because {{PDRectangle.getCOSObject()}} 
doesn't return the real array; it returns its own COSArray, see the 
{{PDRectangle(COSArray array)}} constructor. A (minor) API flaw is that 
{{getCOSObject()}} is declared to return COSBase when it should return COSArray.
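
To make the copy problem concrete, here is a toy model in plain Java. It is not PDFBox code: {{ToyArray}} and {{ToyRectangle}} are hypothetical stand-ins for COSArray and PDRectangle, and the copying constructor is only my reading of what is described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical stand-ins, not PDFBox classes.
class RectangleCopySketch {

    /** Stand-in for a COSArray that carries an update flag. */
    static class ToyArray {
        final List<Float> values = new ArrayList<>();
        boolean needToBeUpdated;

        ToyArray(Float... v) {
            values.addAll(Arrays.asList(v));
        }
    }

    /** Stand-in for PDRectangle: builds its own array from the source values. */
    static class ToyRectangle {
        private final ToyArray own = new ToyArray();

        ToyRectangle(ToyArray source) {
            own.values.addAll(source.values);
        }

        ToyArray getArray() {
            return own;
        }
    }

    public static void main(String[] args) {
        ToyArray inPageDictionary = new ToyArray(0f, 0f, 595.2f, 841.92f);
        ToyRectangle mediaBox = new ToyRectangle(inPageDictionary);

        // The flag lands on the rectangle's private copy ...
        mediaBox.getArray().needToBeUpdated = true;

        // ... so the array that lives in the page dictionary is untouched.
        System.out.println(mediaBox.getArray() == inPageDictionary); // prints false
        System.out.println(inPageDictionary.needToBeUpdated);        // prints false
    }
}
```

Any flag set through the wrapper never reaches the array held by the page, which is why the flagged copy ends up written as a separate object.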

So let's get the COSArray directly:

{code}
List<COSBase> objectsToWrite = new ArrayList<>();
PDPage page = doc.getPage(0);
// Fetch the array that is actually stored in the page dictionary
COSArray mediaBoxArray = page.getCOSObject().getCOSArray(COSName.MEDIA_BOX);
mediaBoxArray.setNeedToBeUpdated(true);
mediaBoxArray.setDirect(false);
objectsToWrite.add(mediaBoxArray);
objectsToWrite.add(page.getCOSObject());
doc.saveIncremental(fileOutputStream, objectsToWrite);
{code}
Still doesn't work, despite {{setDirect(false)}}.

The cause is a flaw in COSWriter: it never checks whether the array is a 
direct object. This check is done in {{visitFromDictionary}} for dictionaries, 
but not for arrays. {{visitFromArray}} has the same flaw. Fixing this gives a 
good result, i.e. the array is written only once. But if we remove the 
{{setDirect(false)}} call, it no longer works.
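
The kind of check meant here can be sketched with a toy model in plain Java. This is not the COSWriter code or the actual fix: {{ToyArray}} and {{ToyWriter}} are hypothetical stand-ins, and the logic only illustrates the rule "direct values go inline, indirect values become a reference".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not PDFBox classes.
class DirectCheckSketch {

    /** Stand-in for a COSArray with a "direct" flag. */
    static class ToyArray {
        final List<String> items = new ArrayList<>();
        boolean direct = true; // assumption: arrays start out as direct objects

        ToyArray(String... values) {
            items.addAll(Arrays.asList(values));
        }

        String inline() {
            return "[" + String.join(" ", items) + "]";
        }
    }

    /** Stand-in for a writer that hands out object numbers to indirect objects. */
    static class ToyWriter {
        private final Map<ToyArray, Integer> objectNumbers = new LinkedHashMap<>();
        private int nextNumber;

        ToyWriter(int firstFreeNumber) {
            nextNumber = firstFreeNumber;
        }

        /** Writes one dictionary entry: inline if direct, otherwise as a reference. */
        String writeEntry(String key, ToyArray value) {
            if (value.direct) {
                return "/" + key + " " + value.inline();
            }
            int number = objectNumbers.computeIfAbsent(value, v -> nextNumber++);
            return "/" + key + " " + number + " 0 R";
        }
    }

    public static void main(String[] args) {
        ToyArray mediaBox = new ToyArray("0", "0", "595.2", "841.92");
        ToyWriter writer = new ToyWriter(22);

        System.out.println(writer.writeEntry("MediaBox", mediaBox)); // /MediaBox [0 0 595.2 841.92]
        mediaBox.direct = false;
        System.out.println(writer.writeEntry("MediaBox", mediaBox)); // /MediaBox 22 0 R
    }
}
```

With the check in place, an indirect array appears in the page dictionary only as a reference, so it is written once, as its own object.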

So an object that is to be updated must be an indirect object.

Back to my thoughts from 2017... I wanted to search for the objects, i.e. 
create {{objectsToWrite}} automatically. Per the above argument, this is too 
risky for now.

Another piece of code:
{code}
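List<COSBase> objectsToWrite = new ArrayList<>();
COSDictionary dict = doc.getDocumentInformation().getCOSObject();
// The author entry is a plain string stored directly in the info dictionary
COSString str = (COSString) dict.getDictionaryObject(COSName.AUTHOR);
objectsToWrite.add(str);
doc.saveIncremental(fileOutputStream, objectsToWrite);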
{code}
Now there's an orphan string "(Marco Monacelli)" in the PDF. We can't even call 
{{setDirect(false)}} because the method doesn't exist. So this suggests 
accepting only {{COSUpdateInfo}} objects in the objects list.
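
A toy sketch in plain Java of what restricting the list type would buy. None of this is PDFBox code: {{ToyUpdateInfo}}, {{ToyDictionary}}, {{ToyString}} and {{markAll}} are hypothetical names standing in for COSUpdateInfo, COSDictionary, COSString and whatever would consume the list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical stand-ins, not PDFBox classes.
class UpdateListSketch {

    /** Stand-in for COSUpdateInfo: objects that can carry an update flag. */
    interface ToyUpdateInfo {
        void setNeedToBeUpdated(boolean flag);
        boolean isNeedToBeUpdated();
    }

    /** Stand-in for a dictionary, which can be flagged for update. */
    static class ToyDictionary implements ToyUpdateInfo {
        private boolean needToBeUpdated;

        @Override
        public void setNeedToBeUpdated(boolean flag) {
            needToBeUpdated = flag;
        }

        @Override
        public boolean isNeedToBeUpdated() {
            return needToBeUpdated;
        }
    }

    /** Stand-in for a string; deliberately does NOT implement ToyUpdateInfo. */
    static class ToyString {
        final String text;

        ToyString(String text) {
            this.text = text;
        }
    }

    /** Flags every object in the list; the element type keeps strings out. */
    static void markAll(List<? extends ToyUpdateInfo> objectsToWrite) {
        for (ToyUpdateInfo object : objectsToWrite) {
            object.setNeedToBeUpdated(true);
        }
    }

    public static void main(String[] args) {
        List<ToyUpdateInfo> objectsToWrite = new ArrayList<>();
        objectsToWrite.add(new ToyDictionary());
        // objectsToWrite.add(new ToyString("Marco Monacelli")); // would not compile
        markAll(objectsToWrite);
        System.out.println(objectsToWrite.get(0).isNeedToBeUpdated()); // prints true
    }
}
```

The point of the design is that the mistake from the snippet above is rejected by the compiler instead of producing an orphan object at save time.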


> Support incremental save
> ------------------------
>
>                 Key: PDFBOX-45
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-45
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Writing
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: XFAFormFiller.java, fontbox-3.0.0-SNAPSHOT.jar, 
> fontbox-3.0.0-SNAPSHOT.jar, pdfbox-3.0.0-SNAPSHOT.jar
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1157431
> Originally submitted by purplish_cat on 2005-03-05 12:28.
> After opening a PDF file and changing objects out of it, 
> allow to save the changes incrementally to the same file 
> instead of creating a completely new file.
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> See forum thread at
> https://sourceforge.net/forum/message.php?msg_id=3032112



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
