[
https://issues.apache.org/jira/browse/PDFBOX-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364887#comment-17364887
]
Michael Klink commented on PDFBOX-5216:
---------------------------------------
[~chae],
{quote}Could you please tell me the reason?{quote}
The code I posted on Stack Overflow does only one thing: it checks whether
there are distinct objects with identical content in the PDF. If there are two
such objects, it removes one of them and replaces all references to the
removed one with references to the remaining one. It does not check the role of
the objects, though. In your example, for instance, it does not recognize that
it can drop one of two identical XObject resources if it replaces the
associated names in the related content streams.
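
To make the idea concrete, here is a minimal sketch of the approach in plain Java. It is not the Stack Overflow code and does not use PDFBox at all: {{DuplicateObjectMerger}}, {{Obj}} and {{mergeDuplicates}} are hypothetical names, and an indirect object is modelled as nothing more than a content string plus a list of referenced object numbers. Repeating the pass until nothing changes is a choice made for this sketch.

```java
import java.util.*;

// Toy model only: real PDF objects are dictionaries, arrays and streams.
final class DuplicateObjectMerger {

    /** Stand-in for an indirect object: raw content plus outgoing references. */
    static final class Obj {
        final String content;
        final List<Integer> refs;

        Obj(String content, Integer... refs) {
            this.content = content;
            this.refs = new ArrayList<>(Arrays.asList(refs));
        }

        // Two objects count as identical when content and references match.
        // (Naive key; good enough for a sketch.)
        String key() {
            return content + "|" + refs;
        }
    }

    /**
     * Removes objects whose content duplicates a lower-numbered object and
     * redirects all references to the survivor.
     * Returns a map from removed object number to surviving object number.
     */
    static Map<Integer, Integer> mergeDuplicates(Map<Integer, Obj> objects) {
        Map<Integer, Integer> replaced = new TreeMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<String, Integer> seen = new HashMap<>();
            Map<Integer, Integer> round = new HashMap<>();
            // Sorted iteration so the lowest object number always survives.
            for (Map.Entry<Integer, Obj> e : new TreeMap<>(objects).entrySet()) {
                Integer survivor = seen.putIfAbsent(e.getValue().key(), e.getKey());
                if (survivor != null) {
                    round.put(e.getKey(), survivor);
                }
            }
            if (!round.isEmpty()) {
                changed = true;
                objects.keySet().removeAll(round.keySet());
                for (Obj o : objects.values()) {
                    o.refs.replaceAll(r -> round.getOrDefault(r, r));
                }
                replaced.putAll(round);
            }
        }
        return replaced;
    }
}
```

If object 1 stands for a resource dictionary with two names referencing objects 5 and 6, and those two objects hold the same image data, the sketch removes object 6 and leaves object 1 referencing object 5 twice. Both names are still there, which is exactly the limitation described above: merging the two resource entries would additionally require rewriting the names used in the content streams.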
{quote}You mentioned that the new version of PDFBox has not been tested yet,
can it be used reliably in versions prior to PDFBox 3.0 pre-releases?{quote}
Not _the new version of PDFBox has not been tested_ but _my code has not been
tested with newer PDFBox versions_. I know that there have been some changes in
the {{equals}} checks... Testing the code against a number of real-life
documents should suffice to determine whether it can still be used. Consider
the _Words of warning_ at the end of the Stack Overflow answer, though!
> Is there a way to optimize by cleaning up duplicate objects?
> ------------------------------------------------------------
>
> Key: PDFBOX-5216
> URL: https://issues.apache.org/jira/browse/PDFBOX-5216
> Project: PDFBox
> Issue Type: Wish
> Reporter: yoonho
> Priority: Major
> Attachments: samepage.png, 스크린샷 2021-06-15 오후 2.02.21.png
>
>
> Is there a way to clean up duplicate objects using PDFBox?
> [http://gofile.me/4hSqO/Cis33w0Sa] - Original
> [http://gofile.me/4hSqO/7XKmWqUBB] - Clean version
> I applied Adobe DC's Optimize option (relevant in the attached file). As
> a result, a 48 MB PDF file was reduced to 19 MB. I think this is due to
> cleaning up duplicate objects in the PDF.
> Am I right? I would like to implement this process with PDFBox. How should I
> approach it?