Hi,

By way of investigating these questions, I wrote some Java code based on the 'pdfbox' library to spit out PDF file information, in particular the "ToUnicode" entry for each font on each page.
The code is really awful -- really an experiment into the 'pdfbox' interface. Please, no aesthetic comments. But it is useful to see just what text-conversion information is packaged in a PDF file.

=======================================

To use it, you need:
* a Java SDK
* the Java 'pdfbox' library, and the path to its jar files
* the Java 'commons-logging' library
etc.

To build:
  javac -classpath /usr/share/java/pdfbox.jar -Xlint PDFView.java

To run:
  java -classpath /usr/share/java/pdfbox.jar:/usr/share/java/commons-logging.jar:. PDFView pdf_file_path.pdf
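
=======================================

For anyone who wants a feel for what a "ToUnicode" entry holds without building the attachment, here is a small standalone sketch. It is not taken from PDFView.java and does not use 'pdfbox'. It assumes the ToUnicode stream has already been decompressed into CMap text, and it only reads the simple "beginbfchar ... endbfchar" sections, where each pair maps a font character code to a UTF-16BE string. It ignores "bfrange" sections entirely. The class and method names (ToUnicodeSketch, parseBfChars, decodeUtf16Be) and the sample CMap fragment are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFView.java or the pdfbox API.
class ToUnicodeSketch {
    // One "beginbfchar ... endbfchar" section of a ToUnicode CMap.
    private static final Pattern SECTION =
        Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);
    // One "<srcCode> <dstUtf16BE>" pair inside such a section.
    private static final Pattern PAIR =
        Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    // Returns character code -> Unicode text, in the order found.
    static Map<Integer, String> parseBfChars(String cmapText) {
        Map<Integer, String> map = new LinkedHashMap<>();
        Matcher section = SECTION.matcher(cmapText);
        while (section.find()) {
            Matcher pair = PAIR.matcher(section.group(1));
            while (pair.find()) {
                int code = Integer.parseInt(pair.group(1), 16);
                map.put(code, decodeUtf16Be(pair.group(2)));
            }
        }
        return map;
    }

    // Decodes a hex string as UTF-16BE code units, four hex digits each.
    static String decodeUtf16Be(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented sample: code 01 maps to "H", code 02 to the two letters "fi".
        String sample = "2 beginbfchar\n<01> <0048>\n<02> <00660069>\nendbfchar\n";
        parseBfChars(sample).forEach(
            (code, text) -> System.out.printf("%02X -> %s%n", code, text));
    }
}
```

Note that one character code can map to several Unicode characters (as with a ligature glyph), which is why the sketch stores strings rather than single chars.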
Attachment: PDFView.java
