On Fri, 20 Mar 2020 Jan-Marek Glogowski wrote:

> Hmm - I know fcitx uses some kind of tables for the direct mappings. My
> Debian has fcitx-table-emoji. Guess that would be the easiest starting
> point, if your languages typed letters don't depend already existing
> previous or next letters and just need some keys to code point mapping.
There are two separate issues here - keyboard input and display of the glyph. 
Leaving aside the input mechanism for the moment, and assuming that I have done 
what you suggest, I'd like to understand the code dealing with the display 
mechanism in LO. Even if some external method did the input mappings and the 
keycode came into LO as a result of those mappings, the problem is that 
everything works fine with copy-paste but not with keyboard input. 

In the case of keyboard input, keycodes with values above 65535 get truncated 
to a 16-bit type as they pass through the various layers of functions that 
handle them. The PUA values I use are all greater than 65535.
As an example, the values of keyval and aOrigCode in the arguments of 
GtkSalFrame::doKeyCallback are both 97 when you type the letter 'a' on the 
standard keyboard. Printing the individual elements of the array pStr in 
CommonSalLayout::LayoutText, you see the same value 97. Now change the 97 to a 
PUA value in doKeyCallback (e.g. 1051531) and the corresponding value printed 
in LayoutText is the truncated one: 2955, which is 1051531 mod 65536 - i.e. 
what you get when a 32-bit integer holding 1051531 is stored in a 16-bit type 
and printed.
I also see that sal_uInt16 is used in many places in the code. 

At this point, I just want to understand the flow; I'm not suggesting that LO 
make any change. Where in the code do the key values get handled as they are 
typed, and where do they get mapped to the value needed for displaying the 
glyph? I assume the value for display will be encoded in UTF-8, and I'd like 
to know where in the source code that happens as well.

> Yup. No LO changes needed, unless you find some bug.
I'm definitely not suggesting changes, but am trying to understand the code as 
I explained above. However, I would not rule out the possibility that the 
copy-paste part of the code works because it correctly reads the UTF-8-encoded 
values of the code points expected by the font file, while keyboard input 
results in these values becoming incorrect as they pass through the various 
layers of the program. I just want to know what these layers are.

> I'm not sure I understand you. Is this a Gtk-only problem, so qt5 or kf5
> works? I'm not aware of any restriction regarding file names. Sure Gtk+
> and Qt5 default to utf-8 encoding, but that should just work. Or do they
> reject PUA code points (which IMHO makes sense, because a filename has
> no font).

Not sure about other systems, but GNOME restricts filenames to valid Unicode. 
It does not reject PUA code points, but it does reject arbitrary 32-bit values 
encoded in UTF-8. I wrote my own UTF-8 encoding mechanism that takes 32-bit 
values, but some GNOME functions fail on its output, which is why I mapped my 
coding system to PUAs. As far as this discussion of LO's functionality is 
concerned, only PUA values are involved.

> From the filesystem POV it's all just bytes. 

This is not related to LO, but this is where many GNOME libraries impose the 
restriction; they do not follow the filesystem view of filenames as just 
bytes. If you pass a g_filesystem* function a filename containing a character 
not approved by the Unicode Consortium, it will fail. GNOME is not agnostic to 
the various standards out there but follows those set by certain 
organizations. Of course, in those cases I just use fopen or related calls.

-a

_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice