Re: UTF-8 in string literals and translation strings in particular

Georg Baum Thu, 08 Oct 2015 14:13:05 -0700

Jean-Marc Lasgouttes wrote:

> The problem with the patch is that it does not have a clear goal. The
> discussion would have been much easier if you had split it into 3 from
> the start:
> 
> 1/ easy use of utf8 in docstring
> 2/ allow utf8 in translatable strings
> 3/ use … instead of ... in UI


4) use of unicode string literals in C++ source files

This would have been easier indeed. For example, I have no real opinion 
about 2) and 3).

4) is not possible as long as we support C++98, because the source encoding 
is not standardized (and MSVC in particular has a horrible interpretation 
of it).
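To illustrate the distinction (a minimal sketch, not LyX code): C++11 added 
literal prefixes whose encoding is fixed by the standard, and writing the 
character as a universal character name such as `\u2026` sidesteps the 
question of how the compiler decodes the source file. A plain narrow 
literal, by contrast, is encoded in the implementation-defined execution 
character set, which is exactly what differs between compilers.

```cpp
#include <cstddef>

// U+2026 HORIZONTAL ELLIPSIS written as a universal character name, so the
// result does not depend on how the compiler decodes the source file.
// Since C++11 the encodings of these prefixed literals are fixed:
constexpr std::size_t utf8_units  = sizeof(u8"\u2026") - 1;                    // UTF-8
constexpr std::size_t utf16_units = sizeof(u"\u2026") / sizeof(char16_t) - 1;  // UTF-16
constexpr std::size_t utf32_units = sizeof(U"\u2026") / sizeof(char32_t) - 1;  // UTF-32

static_assert(utf8_units == 3, "U+2026 is three bytes in UTF-8");
static_assert(utf16_units == 1, "U+2026 is one UTF-16 code unit");
static_assert(utf32_units == 1, "U+2026 is one UTF-32 code unit");

// A plain narrow literal is encoded in the implementation-defined
// execution character set: typically 3 bytes with GCC/Clang (UTF-8),
// but a compiler using a legacy code page may produce a different
// value or substitute a replacement character.
constexpr std::size_t narrow_units = sizeof("\u2026") - 1;
```

Only the first three values are portable; `narrow_units` is the C++98-style 
case and varies by compiler and settings.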

Concerning 1) I have a strong opinion, which needs a bit of history 
explained: When unicode support was introduced in LyX, the idea was to 
replace all strings which can contain non-ASCII contents with docstring. The 
only exceptions would be interfaces to third-party libraries or 
import/export, where it is sometimes necessary to use std::string with a 
certain encoding. Unfortunately this conversion was never completely 
finished (this is the reason for all the "FIXME UNICODE" comments). 
Once this task is finished, all occurrences of std::string would 
contain ASCII contents, with very rare exceptions.
The alternative, which was also discussed, was to use docstring everywhere. 
This would have been less work, but the mixed docstring/std::string 
approach had several advantages: bugs were found during the transition 
process, memory and runtime efficiency are better, and (once completed) it 
gives a clear picture of where one can expect ASCII and where user-visible 
content is used.
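A minimal sketch of what the mixed approach means at a conversion boundary, 
assuming a UCS-4 string type (here a hypothetical `docstring` alias for 
`std::u32string`, not LyX's actual definition). The helper names 
`from_ascii` and `from_utf8` are illustrative: the point is that every 
crossing from `std::string` to `docstring` names the encoding it assumes, 
so a non-ASCII value in an "ASCII" string is caught instead of silently 
reinterpreted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UCS-4 "docstring" type (assumption).
using docstring = std::u32string;

// ASCII-only conversion: the caller promises 7-bit ASCII input, which is
// checked in debug builds.
docstring from_ascii(std::string const & s)
{
    docstring result;
    result.reserve(s.size());
    for (unsigned char c : s) {
        assert(c < 0x80 && "from_ascii called with non-ASCII input");
        result.push_back(static_cast<char32_t>(c));
    }
    return result;
}

// UTF-8 conversion. Malformed or truncated sequences become U+FFFD.
// Simplification: overlong forms and surrogates are not rejected.
docstring from_utf8(std::string const & s)
{
    docstring result;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char const c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        char32_t cp;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else { result.push_back(U'\uFFFD'); ++i; continue; }  // invalid lead byte

        if (i + len > s.size()) {            // sequence cut off at the end
            result.push_back(U'\uFFFD');
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char const cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok) { result.push_back(U'\uFFFD'); ++i; continue; }
        result.push_back(cp);
        i += len;
    }
    return result;
}
```

With this split, a `std::string` that is supposed to be ASCII can only enter 
user-visible text through `from_ascii`, and the debug assertion flags any 
place where that assumption is wrong.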

The proposed changes to docstring weaken the clear separation of ASCII and 
non-ASCII contents. They are not needed if the unicode transition is 
finished (i.e. all "FIXME UNICODE" comments are addressed). Nor are they 
needed if we change our mind and use the alternative approach of docstring 
everywhere instead. For me, the disadvantages far outweigh the advantages, 
therefore I would suggest either finishing the unicode transition or using 
docstring everywhere. The only exception would be unicode string literals in 
C++11 mode: supporting these in docstring is both safe and useful in any 
case.

> For the record, concerning these 3 problems:
> 1/ I would agree with extending docstring so that it considers that char
> const * and std::string represent UTF8. However, I wonder what is the
> best approach for that. Making this work only for some operators seems
> strange to me. Wouldn't it be possible to set up some implicit
> constructors?

Implicit constructors are absent on purpose, for the reason explained 
above. Qt has also learned that they are problematic (see 
QT_NO_CAST_FROM_ASCII, which is actually misnamed, since it disables casts 
from const char *, which are implemented using fromUtf8). Recently I also 
learned that some time ago a volunteer provided a huge patch for the Kate 
editor which made it use QT_NO_CAST_FROM_ASCII in order to avoid bugs.
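A sketch of the policy described here, using a hypothetical `DocString` 
wrapper (not LyX's actual docstring): implicit construction is allowed only 
from C++11 `U"..."` literals, whose encoding the standard fixes, while 
construction from `char const *` and `std::string` is deleted, similar in 
spirit to what QT_NO_CAST_FROM_ASCII does for QString.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical wrapper illustrating the policy; not LyX's real docstring.
class DocString {
public:
    DocString() = default;

    // Safe implicit conversion: U"..." literals are UTF-32 by definition.
    DocString(char32_t const * s) : str_(s) {}

    // Narrow strings carry no defined encoding, so implicit conversion is
    // disabled; callers must use an explicit, named conversion instead.
    DocString(char const *) = delete;
    DocString(std::string const &) = delete;

    std::u32string const & str() const { return str_; }
    std::size_t size() const { return str_.size(); }

private:
    std::u32string str_;
};

// Compile-time checks of the intended behaviour.
static_assert(std::is_convertible<char32_t const *, DocString>::value,
              "UTF-32 literals convert implicitly");
static_assert(!std::is_convertible<char const *, DocString>::value,
              "narrow literals do not convert implicitly");
static_assert(!std::is_convertible<std::string, DocString>::value,
              "std::string does not convert implicitly");
```

With this design `DocString d = U"\u2026";` compiles, while 
`DocString d = "...";` is rejected at compile time, so the encoding of every 
narrow string has to be stated explicitly at the call site.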
 

Georg
