On 08.11.2023 07:23, Andreas Lehmkühler wrote:
On 03.11.23 at 17:24, Tilman Hausherr wrote:
https://www.danisch.de/blog/2023/10/31/aktennotiz-zu-pdftotext-bei-vermurksten-zeichensaetzen/
The text is in German, but he says that he was able to extract
text from obfuscated PDFs by converting them to PostScript and then
back to PDF. I didn't test this myself, but I suspect that the
conversion to PostScript discards the /ToUnicode stream, and that it
is rebuilt from the font itself when the file is converted back to PDF.
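
For the record, the round trip described above could be scripted roughly
like this. This is only a sketch under assumptions: it uses pdftops and
pdftotext from Poppler and ps2pdf from Ghostscript, which may not be the
exact tools the blog author used, and the helper names and temporary file
names are made up for illustration. Whether the rebuilt PDF really yields
usable text depends on the file.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumed tool chain: Poppler (pdftops, pdftotext) and Ghostscript (ps2pdf).
TOOLS = ("pdftops", "ps2pdf", "pdftotext")


def build_roundtrip_commands(src_pdf, work_dir):
    """Return the command lines for PDF -> PostScript -> PDF -> text."""
    work = Path(work_dir)
    ps_file = work / "roundtrip.ps"        # hypothetical intermediate names
    rebuilt_pdf = work / "roundtrip.pdf"
    text_file = work / "roundtrip.txt"
    return [
        ["pdftops", str(src_pdf), str(ps_file)],
        ["ps2pdf", str(ps_file), str(rebuilt_pdf)],
        ["pdftotext", str(rebuilt_pdf), str(text_file)],
    ]


def extract_text_via_postscript(src_pdf):
    """Run the round trip and return whatever text pdftotext extracts."""
    missing = [tool for tool in TOOLS if shutil.which(tool) is None]
    if missing:
        raise RuntimeError("missing tools: " + ", ".join(missing))
    with tempfile.TemporaryDirectory() as work_dir:
        commands = build_roundtrip_commands(src_pdf, work_dir)
        for command in commands:
            # check=True stops the chain as soon as one step fails
            subprocess.run(command, check=True)
        text_file = Path(commands[-1][-1])
        return text_file.read_text(encoding="utf-8", errors="replace")
```

Comparing the result of extract_text_via_postscript("statement.pdf") with
the text extracted from the original file would show whether the round
trip helps for a given document ("statement.pdf" is a placeholder name).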
The information has to be somewhere, otherwise such a "conversion"
wouldn't work.
@Tilman did you try to contact the author to ask for an example?
No, because I didn't expect to get a PDF; he mentioned account
statements. It's just something to keep in mind the next time we hit
such a file.
Tilman
Andreas
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]