Great, thanks everyone! I appreciate your responses. Yeah, sounds like this is definitely possible if we get really desperate. But very non-trivial.
-Nicholas

On Mon, Jun 21, 2021 at 9:35 PM Robert Muir <[email protected]> wrote:

> On Mon, Jun 21, 2021 at 3:18 PM Nicholas DiPiazza <[email protected]> wrote:
>
> > Let's say we have a PDF with a bunch of custom encodings. They would look
> > like this in your Font Properties:
> >
> > [image: image.png]
> >
> > Notice those with "encoding: custom".
> >
> > So even though the PDF has normal-looking Hebrew text such as:
> >
> > [image: image.png]
> >
> > when you copy it to the clipboard it looks like this:
> >
> > ©°³ ž ³ž©¤³
> >
> > That's because the custom encoding does not actually map to UTF-8
> > characters.
> >
> > Has anyone heard of a way to magically process these custom encodings to
> > find a reasonable UTF-8 mapping?
>
> I've done this by opening the font in FontForge, so you can see the glyph
> table, and mapping each glyph to proper Unicode sequences.
>
> In your Hebrew case, you'd need additional processing beyond that, because
> PDF glyphs will be in visual order but Unicode needs to be in logical
> order. So if you just "map" and don't reorder you will end up with
> backwards text.
>
> It is very annoying if there are a lot of ligatures, complex writing
> systems, or both.
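For the archives: Robert's two-step approach (remap each custom glyph code to its real Unicode character, then reorder the visual-order RTL run into logical order) can be sketched roughly like this. The glyph table below is entirely hypothetical; in practice you would build it by hand after inspecting the embedded font's glyph table in FontForge, and real mixed-direction text would need the full Unicode Bidi Algorithm rather than a naive reversal.

```python
# Hypothetical glyph-to-Unicode table, built by eyeballing the embedded
# font in FontForge. The custom codes and target letters here are made up.
GLYPH_MAP = {
    "\u00a9": "\u05ea",  # custom code -> Hebrew letter tav
    "\u00b0": "\u05d5",  # custom code -> Hebrew letter vav
    "\u00b3": "\u05e8",  # custom code -> Hebrew letter resh
}

def remap(text: str, glyph_map: dict) -> str:
    """Replace each custom-encoded character with its Unicode mapping,
    passing through anything not in the table (e.g. spaces)."""
    return "".join(glyph_map.get(ch, ch) for ch in text)

def visual_to_logical(run: str) -> str:
    """Naively reverse a visual-order run into logical order.

    This only handles the simple all-RTL case; mixed LTR/RTL text
    needs a real bidi implementation.
    """
    return run[::-1]

def extract(text: str, glyph_map: dict) -> str:
    """Remap custom glyph codes, then fix visual-to-logical ordering."""
    return visual_to_logical(remap(text, glyph_map))
```

Ligatures complicate this further because a single glyph may need to expand to a multi-character Unicode sequence, which this one-to-one table handles only if the map values are allowed to be multi-character strings (they are, since `remap` joins arbitrary strings).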
