Hello, I am the developer of the previously cited MAT (https://mat.boum.org). I just want to add my 2 cents, based on what I learned while developing metadata-anonymisation processes.
Since visible metadata, like lines of text or pictures, can be detected visually and removed with the help of some pdfminer-fu, I would rather talk about hidden metadata/watermarks. PDF is a pretty complex format to process, so I render it onto a cairo [1] surface and then save this surface to a PDF file. Because this produces a completely new PDF, it strips a large part of (if not all) hidden watermarks/metadata, without transforming the text into pictures. The whole process is implemented in MAT [2]. This could be added to pdfparanoia to counter hidden threats.

1. http://www.cairographics.org/
2. https://gitweb.torproject.org/user/jvoisin/mat.git/blob/HEAD:/MAT/office.py#l141
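
To make the rendering idea above more concrete, here is a rough sketch of how it could look in Python. It is not the actual MAT code, only an illustration under some assumptions: that pycairo and the Poppler GObject-introspection bindings (with the cairo integration for PyGObject) are installed, and that Poppler is used to load and draw the pages. The names file_uri and rerender_pdf are made up for this example.

```python
import pathlib


def file_uri(path):
    """Return an absolute file:// URI; Poppler's GLib API loads documents by URI."""
    return pathlib.Path(path).resolve().as_uri()


def rerender_pdf(src, dst):
    """Draw every page of src onto a cairo PDF surface that is saved as dst.

    Hypothetical helper: the output is a brand new PDF written by cairo,
    so structures of the original file are not copied over as-is.
    """
    # Imported here so the module still loads when the bindings are missing.
    # Assumption: pycairo, PyGObject (with cairo support) and Poppler typelibs.
    import cairo
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Poppler

    document = Poppler.Document.new_from_file(file_uri(src), None)

    # Placeholder size (A4 in points); the real size is set for each page below.
    surface = cairo.PDFSurface(dst, 595, 842)
    context = cairo.Context(surface)

    for index in range(document.get_n_pages()):
        page = document.get_page(index)
        width, height = page.get_size()
        # The size must be set before anything is drawn on the current page.
        surface.set_size(width, height)
        context.save()
        # Vector rendering: text stays text, it is not turned into a picture.
        page.render_for_printing(context)
        context.restore()
        # Emit the current page and start a new one.
        context.show_page()

    surface.finish()
```

Usage would be something like rerender_pdf("input.pdf", "clean.pdf"). Note that cairo may write some metadata of its own (such as a producer string) into the new file, so the output is still worth inspecting afterwards.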
