On Mon, Jun 21, 2021 at 3:18 PM Nicholas DiPiazza <[email protected]> wrote:
> Let's say we have a PDF with a bunch of custom encodings. They would look
> like this in your Font Properties:
>
> [image: image.png]
>
> Notice those with "encoding: custom".
>
> So even though the PDF has normal-looking Hebrew text such as:
>
> [image: image.png]
>
> When you copy it to the clipboard, it looks like this:
>
> ©°³ ž ³ž©¤³
>
> That's because the custom encoding does not actually map to UTF-8
> characters.
>
> Has anyone heard of a way to magically process these custom encodings to
> find a reasonable UTF-8 mapping?

I've done this by opening the font in FontForge, so you can see the glyph table, and mapping each glyph to the proper Unicode sequence. In your Hebrew case, you'd need additional processing beyond that: PDF glyphs are in visual order, but Unicode text needs to be in logical order. So if you just "map" and don't reorder, you will end up with backwards text. It is very annoying if there are a lot of ligatures, complex writing systems, or both.
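A minimal Python sketch of the two steps described above (map each glyph, then reorder), assuming you have already built a glyph-to-Unicode table by hand, e.g. while looking at the glyph table in FontForge, and that you process one extracted line at a time. The table entries and the names `GLYPH_TO_UNICODE`, `remap`, and `visual_to_logical` are hypothetical and made up for illustration; they are not the real mapping for the font in the question. The reordering is a rough heuristic for a right-to-left line (reverse it, then flip embedded runs of Latin letters and digits back), not an implementation of the full Unicode Bidirectional Algorithm, so mixed-direction text with nested punctuation can still come out wrong.

```python
import re
from typing import Mapping

# HYPOTHETICAL table, filled in by hand after inspecting the font's glyphs
# in FontForge. Keys are the characters the extractor emits for each
# custom-encoded glyph; values are the Unicode text that glyph depicts.
# These entries are made up for illustration only. A ligature glyph could
# map to a multi-character string.
GLYPH_TO_UNICODE = {
    "\u00a9": "\u05e9",  # "©" -> HEBREW LETTER SHIN
    "\u00b0": "\u05dc",  # "°" -> HEBREW LETTER LAMED
    "\u017e": "\u05d5",  # "ž" -> HEBREW LETTER VAV
    "\u00b3": "\u05dd",  # "³" -> HEBREW LETTER FINAL MEM
}

# Runs that should stay left-to-right inside an RTL line: ASCII letters and
# digits, optionally joined by internal punctuation such as "3.14" or "12/06".
LTR_RUN = re.compile(r"[0-9A-Za-z]+(?:[.,:/-][0-9A-Za-z]+)*")


def remap(text: str, table: Mapping[str, str]) -> str:
    """Replace each extracted character with its Unicode equivalent.

    Characters missing from the table are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in text)


def visual_to_logical(line: str) -> str:
    """Heuristic visual-to-logical reordering for one right-to-left line.

    The glyphs come out in visual (left-to-right on the page) order, so the
    Hebrew reads backwards. Reversing the line fixes the Hebrew but also
    reverses embedded numbers and Latin words, so those runs are flipped
    back. This is NOT the full Unicode Bidirectional Algorithm.
    """
    reversed_line = line[::-1]
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)


if __name__ == "__main__":
    # Illustrative extracted line in visual order: "2021", a space, then
    # the four custom-encoded glyphs as they appear left to right.
    extracted = "2021 \u00b3\u017e\u00b0\u00a9"
    print(visual_to_logical(remap(extracted, GLYPH_TO_UNICODE)))
```

With the made-up table, the visual-order input `"2021 ³ž°©"` comes out as the logical-order string "שלום 2021". Keeping the mapping and the reordering as separate functions makes it easy to reuse one hand-built table per font while swapping in a proper bidi implementation later.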
