[ https://issues.apache.org/jira/browse/PDFBOX-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366475#comment-17366475 ]
Michael Klink commented on PDFBOX-5222:
---------------------------------------

{quote}I found pdfbox couldn't get rid of exif data embedded in the pdf file, I want to consult if there is any way we can rely on pdfbox to remove the data?{quote}

Well, Exif data are not embedded directly in the PDF. They are embedded in a file of another format which in turn is embedded in the PDF, namely a JFIF file or an attached TIFF file.

Thus, to remove such metadata you have to iterate over all embedded JFIF files and attached TIFF files (and all other attachments, which may in turn contain directly or indirectly embedded JFIF or TIFF files), extract the file data, manipulate the data using a JFIF/TIFF library that can remove Exif sections, and finally re-embed or re-attach the manipulated file data.

This clearly is no *bug*, let alone a _major_ one.

> Is it possible to get rid of embedded image metadata from the pdf
> ------------------------------------------------------------------
>
>                 Key: PDFBOX-5222
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5222
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Jack
>            Priority: Major
>         Attachments: origin.pdf
>
>
> Hello, I found pdfbox couldn't get rid of exif data embedded in the pdf file,
> I want to consult if there is any way we can rely on pdfbox to remove the
> data? I attached the file. Thanks
> [^origin.pdf]
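
The "manipulate the data" step from the comment can be sketched for the JFIF/JPEG case without any extra library. The following is only an illustration under stated assumptions: the class name `ExifStripper` and the method `stripExif` are hypothetical, the input is assumed to be the raw bytes of a JPEG stream already extracted from the PDF, and only APP1 segments that start with the `Exif` identifier are dropped. TIFF attachments, XMP packets, and the PDFBox side (locating the image streams and writing the cleaned bytes back) are not covered.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox.
final class ExifStripper {
    // An Exif APP1 payload starts with "Exif" followed by two zero bytes.
    private static final byte[] EXIF_HEADER = {'E', 'x', 'i', 'f', 0, 0};

    /** Returns a copy of the JPEG data without its Exif APP1 segments. */
    static byte[] stripExif(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Not a JPEG stream (missing SOI marker)");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(jpeg.length);
        out.write(jpeg, 0, 2); // copy the SOI marker
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("Expected a marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                break; // SOS or EOI: everything from here on is copied verbatim
            }
            // The segment length is big-endian and includes its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int end = pos + 2 + length;
            if (length < 2 || end > jpeg.length) {
                throw new IllegalArgumentException("Corrupt segment at offset " + pos);
            }
            boolean isExif = marker == 0xE1
                    && length >= 2 + EXIF_HEADER.length
                    && Arrays.equals(
                            Arrays.copyOfRange(jpeg, pos + 4, pos + 4 + EXIF_HEADER.length),
                            EXIF_HEADER);
            if (!isExif) {
                out.write(jpeg, pos, end - pos); // keep every non-Exif segment
            }
            pos = end;
        }
        out.write(jpeg, pos, jpeg.length - pos); // scan data and trailing bytes
        return out.toByteArray();
    }
}
```

A caller would pass in the bytes of each extracted JPEG and re-embed the returned array in place of the original. Files with unusual marker layouts (for example fill bytes between segments) are rejected by this sketch rather than repaired, so a dedicated JFIF/TIFF library, as suggested in the comment, remains the more robust choice.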