At 15:11 19/09/2017 +0000, William Bader wrote:

>It would be possible to write a tool which could reliably detect identical fonts in a PDF file,


There are already libraries that can read PDFs into a data structure and then write a new PDF, for example, pdfsizeopt in Python, poppler <https://poppler.freedesktop.org/> and PoDoFo <http://podofo.sourceforge.net/about.html> in C++, pdfclown <https://sourceforge.net/projects/clown/> in .NET, PDFBox <http://pdfbox.apache.org/> in Java, iText <https://itextpdf.com/> in Java and C#, pdfsam <http://www.pdfsam.org/> in Java. Maybe one of them would be suitable as a starting point for writing a font merging tool.

Indeed, and if I were going to do this I would use MuPDF. Note that it will likely be a slow job to run. You can't do the job until you have all the PDF files collected into one; then you need to check each instance of each font to see if it's the same as any other font, and if so remove the duplicate and update the relevant Resources dictionaries to point at the copy you keep. Fortunately you don't need to alter any of the content streams. Finally you'd need to rewrite the PDF file with a modified xref and the redundant font streams removed.
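
To make the steps concrete, here is a rough sketch of the bookkeeping only, in plain Python with no PDF library involved. Everything in it is an assumption made for illustration: the fonts are modelled as a dict from object number to stream bytes, the Resources dictionaries as a list of dicts from font name to object number, and "the same font" is taken to mean byte-identical stream data, which is a stricter test than a real tool would probably want. The names find_duplicate_fonts and merge_fonts are hypothetical, not part of MuPDF or any other library.

```python
import hashlib


def find_duplicate_fonts(font_streams):
    """Map each redundant font object number to the first object
    whose stream bytes are identical (assumed model, not a PDF API)."""
    first_seen = {}  # digest -> object number of the copy we keep
    remap = {}       # duplicate object number -> kept object number
    for objnum in sorted(font_streams):
        digest = hashlib.sha256(font_streams[objnum]).digest()
        if digest in first_seen:
            remap[objnum] = first_seen[digest]
        else:
            first_seen[digest] = objnum
    return remap


def merge_fonts(font_streams, page_resources):
    """Return (kept font streams, rewritten resource dictionaries).

    Only the references in the resource dictionaries change; the font
    names used by the content streams are left untouched.
    """
    remap = find_duplicate_fonts(font_streams)
    new_resources = [
        {name: remap.get(objnum, objnum) for name, objnum in fonts.items()}
        for fonts in page_resources
    ]
    kept = {n: data for n, data in font_streams.items() if n not in remap}
    return kept, new_resources


if __name__ == "__main__":
    # Hypothetical input: objects 4, 12 and 15 hold the same font data.
    streams = {4: b"AAA", 9: b"BBB", 12: b"AAA", 15: b"AAA"}
    resources = [{"F1": 4, "F2": 9}, {"F1": 12}, {"F3": 15, "F2": 9}]
    kept, updated = merge_fonts(streams, resources)
    print(sorted(kept))
    print(updated)
```

Because only the object references inside the resource dictionaries are remapped, the font names that the content streams use stay valid, which is why the content streams need no changes. A real tool would still have to walk the actual PDF object graph and write out a new xref, which this sketch does not attempt.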

Of course, because you have a fixed workflow you *could* simply look for the second and following instances of any font rather than checking them all exhaustively, but I think it would be better to do the job right. Firstly you'd be protected against any further changes in your workflow, and secondly you would have a genuinely useful tool in its own right.

Ghostscript is entirely the wrong tool for that job. It's possible, but I wouldn't want to write the PostScript program for it.


                Ken


_______________________________________________
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel
