What I cannot understand is why you wouldn't file a bug against hocr2pdf. As you discovered, Cuneiform exports bboxes for both lines and characters, so this shouldn't be Cuneiform's fault. So what can we do for you now? You are not going to get font metrics from us: they are very ambiguous and a lot of work to produce. The bboxes are more than enough to fit the characters approximately.
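To illustrate why bboxes alone should be enough, here is a minimal sketch of how a converter could size invisible text to fit an hOCR line bbox. It is not how hocr2pdf actually works. The helper names `parse_bbox` and `fit_text`, the assumed scan resolution of 300 dpi, and the assumed average glyph advance of 0.5 em are all hypothetical; a real converter would use the actual advance widths of the font it embeds.

```python
import re

# hOCR stores geometry in the title attribute, e.g. "bbox 10 20 110 40; x_wconf 90"
# (x0 y0 x1 y1 in image pixels).
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute."""
    m = BBOX_RE.search(title)
    if m is None:
        raise ValueError("no bbox in title: %r" % title)
    return tuple(int(v) for v in m.groups())


def fit_text(title, text, avg_char_width_em=0.5, dpi=300):
    """Estimate a font size (pt) and horizontal scaling (%) so that `text`
    roughly fills its bbox.

    Assumptions (hypothetical): the image was scanned at `dpi`, and every
    glyph advances by `avg_char_width_em` em instead of its real width.
    """
    x0, y0, x1, y1 = parse_bbox(title)
    # Use the bbox height as the font size, converted from pixels to points (1 pt = 1/72 in).
    font_size_pt = (y1 - y0) * 72.0 / dpi
    # Width the text would take at that size with the assumed average advance.
    natural_width_pt = len(text) * avg_char_width_em * font_size_pt
    # Width the text must actually cover on the page.
    target_width_pt = (x1 - x0) * 72.0 / dpi
    # Stretch or squeeze horizontally (as with the PDF Tz operator, a percentage).
    if natural_width_pt == 0:
        return font_size_pt, 100.0
    return font_size_pt, 100.0 * target_width_pt / natural_width_pt
```

For example, `fit_text("bbox 0 0 300 60", "abcde")` yields a 14.4 pt font stretched to 200%, so the selectable text covers the same area as the printed word. Doing the same per character bbox would give an even closer fit.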
And, as surprising as it might sound, not everybody is interested in creating sandwich PDFs; I, for one, don't care. So if you want this solved, you have to push it yourself.

--
Font size not correct in merged sandvich PDF
https://bugs.launchpad.net/bugs/623438
You received this bug notification because you are a member of Cuneiform Linux, which is the registrant for Cuneiform for Linux.

Status in Linux port of Cuneiform: Invalid

Bug description:
After processing with Cuneiform for Linux 1.0.0 and hOCR to PDF converter, version 0.7.4 (which should be the most current version), I get a sandwich PDF that looks nice until I select text. See the sample 5AADFEE1-0000.* files in the attachment and the result.pdf. The effect is shown in screen087.png.

For another file (Test10pages.pdf) the effect is even worse: I basically cannot select any more text to copy, because I can only guess where to move the mouse. It looks like the font size in the HTML is somehow not correct. I am not an expert, but this link might help you: http://www.emdpi.com/fontsize.html

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to     : cuneiform@lists.launchpad.net
Unsubscribe : https://launchpad.net/~cuneiform
More help   : https://help.launchpad.net/ListHelp