https://bugs.documentfoundation.org/show_bug.cgi?id=62846

--- Comment #37 from [email protected] ---
I'm sorry. Yes, this is about extracting text from a PDF.

tl;dr: The basic way to fix this font is to get the associations correct. But
this requires both a compiler change and a serious fixup of the GDL in the
font.

OK. What's going on here is a combination of things: first, a font riddled
with bugs; second, a compiler that is far too kind and therefore outputs a font
that is less than ideal; and third, a PDF engine that doesn't give us any help.

The compiler currently has problems with glyph associations for deletion and
insertion. We will fix that. But when we do, we will also make it an error for
a rule to have a different number of slots on its left-hand side than on its
right-hand side or in its context. You really, really need to fix those.
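
As a rough illustration of the slot-count rule described above, here is a
minimal Python sketch that counts slots in a simplified rule string. It is not
the compiler's actual check: the function names are hypothetical, and the
parsing assumes only whitespace-separated slots, a `^` marker that is not a
slot, and association suffixes like `:(1 3)`. Real GDL has more syntax than
this handles.

```python
import re


def slot_counts(rule):
    """Return (lhs, rhs, context) slot counts for a simplified rule string.

    Hypothetical helper: handles only whitespace-separated slots, the '^'
    marker, and association suffixes such as ':(1 3)' or ':2'.
    """
    body = rule.strip().rstrip(";")
    # Drop association suffixes so "gab:(1 3)" counts as a single slot.
    body = re.sub(r":(\([^)]*\)|\d+)", "", body)
    sides, _, context = body.partition("/")
    lhs, _, rhs = sides.partition(">")

    def count(part):
        # '^' marks a position in the context; it is not a slot itself.
        return len([tok for tok in part.split() if tok != "^"])

    return count(lhs), count(rhs), count(context)


def is_balanced(rule):
    """True if LHS, RHS and (when present) context have equal slot counts."""
    lhs, rhs, ctx = slot_counts(rule)
    return lhs == rhs and (ctx == 0 or ctx == lhs)


explicit = "gEscape ga gb > _ _ gab:(1 3) / ^ _ _ _;"
mismatched = "gEscape ga gb > gab / ^ _ _ _;"  # made-up example of a mismatch
print(slot_counts(explicit), is_balanced(explicit))
print(slot_counts(mismatched), is_balanced(mismatched))
```

Running something like this over the rules in a font would only flag the
simplest mismatches, but it shows the kind of rule that will become an error.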

There is a workaround, but it will take a lot of work. If you make all the
associations in deletions explicit, as in:

gEscape ga gb > _ _ gab:(1 3) / ^ _ _ _;

then the compiler will output a font that the engine will accept. You should do
this anyway. In other words, try seriously to get your font down to no warnings
except ignored ones. The warnings are trying to tell you something that you
really should listen to.

Why is it outputting a 1 all the time instead of nothing? As text is converted
to PDF, each glyph's association with its underlying Unicode is tracked and
stored as the glyph mapping in the font's ToUnicode table. Since the inserted
narrow non-breaking space is associated with one of the digits in the
underlying text, it gets associated with a digit in the ToUnicode table for the
font. The last such association is the one that is kept, and it occurs in a
line of 1s, hence the 1 everywhere. Ideally it should output nothing. This is
why associations in a font are important and, to be honest, tricky.
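
The "last association wins" effect can be sketched with a toy model in Python.
This is not LibreOffice's PDF export code: the glyph ids, the function names
and the sample numbers are all made up for illustration, and the only
assumption carried over from the explanation above is that the table keeps one
Unicode string per glyph and that later associations overwrite earlier ones.

```python
def build_to_unicode(glyph_runs):
    """Build a glyph-id -> text table from (glyph_id, associated_text) pairs.

    Toy model: one entry per glyph, later associations overwrite earlier ones.
    """
    table = {}
    for glyph_id, text in glyph_runs:
        table[glyph_id] = text
    return table


def extract_text(glyph_ids, table):
    """Recover text from glyph ids the way a text extractor would."""
    return "".join(table.get(g, "") for g in glyph_ids)


NNBSP = 900  # hypothetical glyph id for the inserted narrow no-break space
DIGIT = {str(d): 100 + d for d in range(10)}  # hypothetical digit glyph ids

# "25000" rendered with an inserted space that is associated with a '0'.
first = [(DIGIT["2"], "2"), (DIGIT["5"], "5"), (NNBSP, "0"),
         (DIGIT["0"], "0"), (DIGIT["0"], "0"), (DIGIT["0"], "0")]
# Later in the document, a line of 1s: the space is now associated with '1'.
second = [(DIGIT["1"], "1"), (DIGIT["1"], "1"), (NNBSP, "1"),
          (DIGIT["1"], "1"), (DIGIT["1"], "1"), (DIGIT["1"], "1")]

table = build_to_unicode(first + second)
extracted = extract_text([g for g, _ in first], table)
print(extracted)  # the space glyph in the first number comes back as "1"
```

In this model the inserted space in the first number is extracted as "1",
because the line of 1s was the last place the space glyph was associated with
anything.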

Does anyone know a way to get the PDF writer to store ActualText elements in
the generated PDF containing the actual Unicode for a string, rather than
trying to infer it back from a sequence of glyphs?
