[ 
https://issues.apache.org/jira/browse/PDFBOX-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366475#comment-17366475
 ] 

Michael Klink commented on PDFBOX-5222:
---------------------------------------

{quote}I found pdfbox couldn't get rid of exif data embedded in the pdf file, I 
want to consult if there is any way we can rely on pdfbox to remove the 
data?{quote}

Well, Exif data are not embedded directly in the PDF. They are embedded in a 
file of another format which in turn is embedded in the PDF, namely a JFIF file 
or an attached TIFF file. Thus, to remove such metadata you have to iterate 
over all embedded JFIF files and attached TIFF files (and all other attachments, 
which may in turn contain JFIF or TIFF files, directly or indirectly), 
extract the file data, manipulate the data using a JFIF/TIFF library that can 
remove Exif sections, and finally re-embed or re-attach the manipulated file 
data.
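
To illustrate only the "manipulate the data" step, here is a minimal sketch in 
plain Java with no PDFBox calls. It assumes you have already extracted the raw 
JFIF bytes (for example from an image stream using the DCTDecode filter) and 
drops every APP1 segment whose payload starts with the "Exif" identifier. The 
class name ExifStripper and the method stripExif are hypothetical names chosen 
for this example. A real JFIF/TIFF library will cover more cases, for example 
thumbnails, XMP packets, or Exif inside TIFF attachments.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper for illustration; not part of PDFBox. */
public class ExifStripper {
    // An Exif APP1 payload starts with "Exif" followed by two zero bytes.
    private static final byte[] EXIF_ID = {'E', 'x', 'i', 'f', 0, 0};

    /** Returns a copy of the JPEG data without its APP1 "Exif" segments. */
    public static byte[] stripExif(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG stream (missing SOI marker)");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(jpeg.length);
        out.write(jpeg, 0, 2); // copy the SOI marker
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                break; // not at a marker: stop parsing, copy the rest unchanged
            }
            int marker = jpeg[pos + 1] & 0xFF;
            boolean hasNoLength = marker == 0xFF || marker == 0x01
                    || (marker >= 0xD0 && marker <= 0xD9);
            if (marker == 0xDA || hasNoLength) {
                break; // SOS (image data follows) or a marker this sketch does not parse
            }
            // The 16-bit big-endian length counts its own two bytes, not the marker.
            int len = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int end = pos + 2 + len;
            if (len < 2 || end > jpeg.length) {
                break; // truncated or corrupt segment: copy the rest unchanged
            }
            boolean isExif = marker == 0xE1 && len >= 8 && startsWith(jpeg, pos + 4, EXIF_ID);
            if (!isExif) {
                out.write(jpeg, pos, end - pos); // keep every non-Exif segment
            }
            pos = end;
        }
        out.write(jpeg, pos, jpeg.length - pos); // image data and trailer
        return out.toByteArray();
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (offset + prefix.length > data.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The sketch deliberately stops at the first structure it does not understand 
and copies the remainder untouched, so malformed input is never made worse. 
The result still has to be written back into the PDF in place of the original 
stream data.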

This clearly is no *bug*, let alone a _major_ one.

> Is it possible to get rid of embedded image metadata from the pdf 
> ------------------------------------------------------------------
>
>                 Key: PDFBOX-5222
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5222
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Jack
>            Priority: Major
>         Attachments: origin.pdf
>
>
> Hello, I found pdfbox couldn't get rid of exif data embedded in the pdf file, 
> I want to consult if there is any way we can rely on pdfbox to remove the 
> data? I attached the file. Thanks
> [^origin.pdf]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

