[
https://issues.apache.org/jira/browse/PDFBOX-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366475#comment-17366475
]
Michael Klink commented on PDFBOX-5222:
---------------------------------------
{quote}I found pdfbox couldn't get rid of exif data embedded in the pdf file. I
want to ask if there is any way we can rely on pdfbox to remove the
data?{quote}
Well, Exif data are not embedded in the PDF directly; they are embedded in
files of another format which in turn are embedded in the PDF, namely JFIF
(JPEG) image streams or attached TIFF files. Thus, to remove such metadata you
have to iterate over all embedded JFIF images and attached TIFF files (and all
other attachments which in turn may directly or indirectly contain JFIF or TIFF
files), extract the file data, manipulate the data using a JFIF/TIFF library
that can remove Exif sections, and finally re-embed or re-attach the
manipulated file data.
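As a rough illustration of the "manipulate the data" step for JPEG images, here is a minimal sketch that drops Exif APP1 segments from a JPEG byte array using only the Java standard library. The class name ExifStripper is hypothetical, and it assumes a well-formed JPEG whose header segments all carry a length field; locating the JPEG image streams with PDFBox, reading their raw bytes and writing the cleaned bytes back is not shown, and TIFF attachments would need a TIFF-aware library instead.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper: removes Exif APP1 segments from a JPEG byte array.
// Assumes a well-formed JPEG; other APPn segments (e.g. JFIF APP0) are kept.
public class ExifStripper {
    // Exif APP1 payloads start with the identifier "Exif\0\0".
    private static final byte[] EXIF_ID = {'E', 'x', 'i', 'f', 0, 0};

    public static byte[] stripExif(byte[] jpeg) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG (missing SOI marker)");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(jpeg.length);
        out.write(jpeg, 0, 2); // copy the SOI marker
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("expected marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) { // fill byte before a marker, skip it
                pos++;
                continue;
            }
            if (marker == 0xDA) { // SOS: compressed data follows, copy the rest verbatim
                break;
            }
            // Segment length is big-endian and includes its own two bytes.
            int len = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int end = pos + 2 + len;
            if (len < 2 || end > jpeg.length) {
                throw new IllegalArgumentException("truncated segment at offset " + pos);
            }
            boolean isExif = marker == 0xE1
                    && len >= 2 + EXIF_ID.length
                    && Arrays.equals(Arrays.copyOfRange(jpeg, pos + 4, pos + 4 + EXIF_ID.length), EXIF_ID);
            if (!isExif) {
                out.write(jpeg, pos, end - pos); // keep every non-Exif segment
            }
            pos = end;
        }
        out.write(jpeg, pos, jpeg.length - pos); // SOS, image data and EOI
        return out.toByteArray();
    }
}
```

Only the header segments before the start-of-scan marker are inspected, so the compressed image data passes through unchanged and the image should still decode identically.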
This clearly is no *bug*, let alone a _major_ one.
> Is it possible to get rid of embedded image metadata from the pdf
> ------------------------------------------------------------------
>
> Key: PDFBOX-5222
> URL: https://issues.apache.org/jira/browse/PDFBOX-5222
> Project: PDFBox
> Issue Type: Bug
> Reporter: Jack
> Priority: Major
> Attachments: origin.pdf
>
>
> Hello, I found pdfbox couldn't get rid of exif data embedded in the pdf file.
> I want to ask if there is any way we can rely on pdfbox to remove the
> data? I attached the file. Thanks
> [^origin.pdf]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)