Re: UTF-8 in string literals and translation strings in particular

Guillaume Munch Tue, 06 Oct 2015 11:53:46 -0700

On 06/10/2015 18:28, Jean-Marc Lasgouttes wrote:

On 06/10/2015 18:38, Guillaume Munch wrote:

I'm trying to come up with examples where we actually "need"
Unicode in the interface. The ellipsis case seems to be trivial
indeed.


Is it trivial?  In LibreOffice on my Ubuntu (where I can write the same
string to compare), I would say that what is used is three dots and not
an ellipsis. I failed to find documentation on the subject, except:
http://stackoverflow.com/questions/3777072/in-menus-for-should-one-use-ellipsis-sign-or-just-three-dots


Note that the above comments are from 2010.



After more looking around, I see that Apple recommends the ellipsis
character:
https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/OSXHIGuidelines/TerminologyWording.html#//apple_ref/doc/uid/20000957-CH15-SW3


GNOME seems to prefer the ellipsis character too:
https://developer.gnome.org/hig/stable/writing-style.html.en#ellipses

Microsoft does not say anything, but their examples use ...:
https://msdn.microsoft.com/en-us/library/dn742392.aspx

In addition, Qt already uses … to elide strings so we currently have an
inconsistent UI.


Seriously, I think this is just going to annoy our translators.

Seriously :) as described in the rationale of my patch, this cannot be
more transparent for translators. Gettext pre-sets the previous
translation; translators just have to do a global search-replace. And
until they do, the old translation is still displayed. I took the time
to check that it's indeed the case. And you might be underestimating our
translators, some of whom might be lovers of proper typographic usage.

Seriously, who knows how to get an ellipsis on his/her keyboard?

Seriously :) I am sorry, ignorance is not an argument. A lot of bad
typographic usage is only the legacy of past technical limitations that
are long gone. It took me 10 seconds to learn that … is AltGr+Shift+, on
the French Linux keyboard and Option+; on Mac. (For proper French usage,
you can also input accented upper-case characters and I am sure that
your question marks are preceded by a space − though I would agree that
a thin space would appear as too formal in e-mails.)


I could propose to hack it into the menu code...

Please, no. Would you also hack my changes to
src/frontends/qt4/ui/HSpaceUi.ui into the dialog code?


In general, our source code is already UTF-8 (in particular author names
in .cpp files), but I am not sure that adding weird characters in the
source is always helpful. I would not swear that there is only one
character looking like an ellipsis in the Unicode standard.

Do you have an example where this might lead to a confusing situation?
If I see … in a translation string I would trust the author that he did
not write U+1D087 BYZANTINE MUSICAL SYMBOL TRIPLI without a good reason.
And I do not see how … is more confusing than 0x2026, developers are
free to explain with a comment in both cases.


 > Properly formatted text in general, not just proper ellipses…
 > A revamped IPA toolbar?
 > The math toolbar?
 > See also src/frontends/qt4/ui/HSpaceUi.ui in the patch.

Please note that we have to be very careful with Unicode characters. At
one time I advocated using the proper Unicode visible space character
in our documentation, but it turned out that several Windows fonts did
not have it. You do not want to force users to use such or such font
in their text editor.

As already discussed in the "newline char" thread, this is a separate
issue, easily fixed by providing a fallback font. This is standard
practice.

I hear hacks and half-solutions left and right, when Unicode provides a
standard solution. A fallback font would be a good investment for not
reinventing the wheel constantly. (And a custom portable font with a few
special characters taken from various free fonts is incredibly easy to
do.)

Of all programs, if LyX does not need a Unicode interface, which
program does? I am sure you will be able to come up with creative uses.


LyX needs an interface that blends well with the environment where it runs.


This repeats the earlier argument, so I will repeat my reply.

LyX already uses a ton of Unicode chars, defined by hand with their code
point. This patch makes it easier to use Unicode in the source, and
enables special chars in translation strings as well.


A code point is more precise than trying to recognize a character in a
Unicode table. I am sure that emacs can tell me what the code point
at cursor position is, but life is too short to try to find it.

First, the patch does not prevent you from using code points when
appropriate. But it now also allows you to use the \u and \U escape
sequences in string literals, e.g. for translation strings, if you find
it appropriate. Though this is C++11, so only starting from LyX 2.3.
(For ellipses though, I find it more useful to read … in the source
rather than \u2026.)
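
To make the comparison concrete, here is a minimal sketch (not taken
from the patch) of the same ellipsis written three ways in a narrow
string literal. It assumes the source file is saved as UTF-8 and that
the compiler's execution character set is UTF-8, which is the GCC and
Clang default. The helper hex_bytes and the function
check_ellipsis_forms are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of LyX): renders the raw bytes of a
// string as uppercase hex separated by spaces.
std::string hex_bytes(const std::string& s)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty())
            out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Assumption: UTF-8 source encoding and UTF-8 execution character set.
const std::string by_glyph  = "…";             // character typed directly
const std::string by_escape = "\u2026";        // C++11 universal character name
const std::string by_bytes  = "\xE2\x80\xA6";  // explicit UTF-8 bytes of U+2026

// Under the assumptions above, all three spellings hold the same bytes.
void check_ellipsis_forms()
{
    assert(by_glyph == by_bytes);
    assert(by_escape == by_bytes);
    assert(hex_bytes(by_escape) == "E2 80 A6");
}
```

Under those assumptions the three spellings are interchangeable for the
compiler, so the choice between them is only about what is easiest to
read in the source.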

And the patch is free.


:)

For the docstring part of the code, I am not sure what code like the
following does:

-    LASSERT(static_cast<unsigned char>(*c) < 0x80, return l);
-    s.push_back(*c);
+    if (static_cast<unsigned char>(*c) < 0x80)
+        s.push_back(*c);
+    else
+        return s += from_utf8(string(c));

There is nothing magic about from_utf8(string(c)), right? This is just
accepting latin1 characters, or am I blind?

I do not understand your remark. The naive version would be to return
l+from_utf8(string(r)) right away, but I kept the optimisation on the
ASCII subset.

I do not know how from_utf8 handles mis-encoded (e.g. Latin-1) strings
(one should have a look in support/unicode.cpp or iconv).
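
The shape of that change can be sketched in a self-contained way as
follows. This is an illustration only, not the LyX code: decode_utf8 is
a hypothetical stand-in for from_utf8, std::u32string stands in for
docstring, and append_c_string is an invented name for the surrounding
loop. Replacing malformed input with U+FFFD is an assumption of this
sketch; it says nothing about what the real from_utf8 does with Latin-1
bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for from_utf8(). Decodes UTF-8 into code points.
// Simplified: it checks lead and continuation bytes but does not reject
// overlong forms or surrogates. Malformed bytes become U+FFFD (a policy
// chosen for this sketch only).
std::u32string decode_utf8(const std::string& in)
{
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80) cp = b;
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }

        bool ok = i + extra < in.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char t = static_cast<unsigned char>(in[i + k]);
            if ((t & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (t & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(replacement); ++i; }
    }
    return out;
}

// Same structure as the patched loop: ASCII bytes are appended directly,
// and the first byte >= 0x80 hands the rest of the string to the decoder.
std::u32string append_c_string(std::u32string s, const char* c)
{
    for (; *c; ++c) {
        if (static_cast<unsigned char>(*c) < 0x80)
            s.push_back(static_cast<char32_t>(*c));
        else
            return s += decode_utf8(std::string(c));
    }
    return s;
}
```

With this sketch, append_c_string(U"", "Open\xE2\x80\xA6") yields the
four ASCII letters followed by U+2026, whereas a Latin-1 input such as
"caf\xE9" does not decode to é: the lone 0xE9 byte is not valid UTF-8
and ends up as U+FFFD here.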



Guillaume
