https://bugs.documentfoundation.org/show_bug.cgi?id=62846
--- Comment #37 from [email protected] ---
I'm sorry. Yes, this is about extracting text from a PDF.

tl;dr: The basic way to fix this font is to get the associations correct. But this requires both a compiler change and a serious fixup of the GDL in the font.

OK. What's going on here is a combination of things: first, a font riddled with bugs; second, a compiler that is far too kind and therefore outputs a font that is less than ideal; and third, a PDF engine that doesn't give us any help.

The compiler currently has problems with glyph associations for deletion and insertion. We will fix that. But when we do, we will also make it an error for a rule to have a different number of slots on the left-hand side than on the right-hand side or in the context. You really, really need to fix those.

There is a workaround, but it will take a lot of work. If you make all the associations in deletions explicit, as in:

    gEscape ga gb > _ _ gab:(1 3) / ^ _ _ _;

then the compiler will output a font that the engine will accept. You should do this anyway. IOW, try seriously to get your font down to no warnings except ignored ones. The warnings are trying to tell you something that you really should listen to.

Why is it outputting a 1 all the time instead of nothing? As text is converted to PDF, each glyph's association with its underlying Unicode is tracked and stored as the glyph mapping in the font's ToUnicode table. Since the inserted narrow non-breaking space is associated with one of the digits in the underlying text, it gets associated with a digit in the ToUnicode table for the font. The last such association is taken, and since that falls in a line of 1s, the space glyph extracts as 1 everywhere. Ideally it should output nothing. This is why associations in a font are important and, to be honest, tricky.
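To make the "last association wins" behaviour concrete, here is a rough Python sketch of the effect. It is a toy model only, not the actual PDF export code: the glyph names, the `shape` function, and the rule that a separator glyph is inserted before each group of three digits and associated with the following digit are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical glyph name; a real font would use numeric glyph IDs.
NNBSP_GLYPH = "g_nnbsp"


def shape(text: str, associate_inserted: bool = True) -> List[Tuple[str, str]]:
    """Toy shaper for a string of digits.

    Returns (glyph, associated source text) pairs. Assumption: a narrow
    no-break space glyph is inserted before each trailing group of three
    digits. If associate_inserted is True, the inserted glyph is associated
    with the digit that follows it; otherwise it is associated with nothing.
    """
    out: List[Tuple[str, str]] = []
    n = len(text)
    for i, ch in enumerate(text):
        if i > 0 and (n - i) % 3 == 0:
            out.append((NNBSP_GLYPH, ch if associate_inserted else ""))
        out.append((f"g_{ch}", ch))
    return out


def build_to_unicode(runs: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Build one glyph -> text map for the font; later associations overwrite earlier ones."""
    table: Dict[str, str] = {}
    for run in runs:
        for glyph, source in run:
            table[glyph] = source
    return table


def extract(run: List[Tuple[str, str]], table: Dict[str, str]) -> str:
    """Recover text from glyphs alone, as a PDF text extractor would."""
    return "".join(table[glyph] for glyph, _ in run)


# The document ends with a line of 1s, so the inserted glyph ends up mapped to "1".
bad_runs = [shape("1234567"), shape("1111111")]
bad_table = build_to_unicode(bad_runs)
bad_text = extract(bad_runs[0], bad_table)    # "112341567"

# With the inserted glyph associated with nothing, extraction is clean.
good_runs = [shape("1234567", False), shape("1111111", False)]
good_table = build_to_unicode(good_runs)
good_text = extract(good_runs[0], good_table)  # "1234567"
```

The point of the sketch is that a single per-font glyph map cannot represent a glyph whose source text varies from place to place, so whichever association is written last leaks into every occurrence.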
Does anyone know a way to get the PDF writer to store ActualText elements in the generated PDF containing the actual Unicode for a string rather than trying to back infer it from a sequence of glyphs?
_______________________________________________ Libreoffice-bugs mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
