Guenter Milde wrote:

> On 2015-06-26, Jürgen Spitzmüller wrote:
>> 2015-06-26 16:44 GMT+02:00 Guenter Milde <mi...@users.sf.net>:
> 
>>> Please don't check for unencodable characters in comments.
> 
>> It's still invalid encoding, since the output file contains invalid
>> glyphs (no matter if this line is processed by LaTeX or not).
> 
> I have a different view on this:
> 
> **Invalid** characters may only occur in utf-8 encoded files,
> for example in a file generated by LyX with default settings if
> * the document language defaults to utf8
> * a second language defaults to an 8-bit encoding:
> 
>   %% LyX 2.1.3 created this file.  For more info, see http://www.lyx.org/.
>   %% Do not edit unless you really know what you are doing.
>   \documentclass[a4paper]{article}
>   \usepackage{lmodern}
>   \renewcommand{\sfdefault}{lmss}
>   \renewcommand{\ttdefault}{lmtt}
>   \usepackage[T1]{fontenc}
>   \usepackage[latin9,utf8]{inputenc}
> 
>   \makeatletter
> 
>   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
>   \special{papersize=\the\paperwidth,\the\paperheight}
> 
> 
>   \makeatother
> 
>   \usepackage[ngerman,mongolian]{babel}
>   \begin{document}
>   Test
> 
>   \selectlanguage{ngerman}%
>   \inputencoding{latin9}%
>   Gr��e \selectlanguage{mongolian}%
> 
>   \end{document}
> 
> Here, the German word Grüße contains 2 invalid characters if you want to
> process/view/edit the whole file as Utf-8.

This is not invalid. Such a file is not a UTF-8 file; it is a file with 
mixed encodings, and every single character in it is valid. That most 
editors cannot display such a file correctly is a separate issue; Emacs, 
for example, can display this file correctly.
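
To illustrate the distinction, here is a minimal Python sketch (not LyX 
code; the helper name is_valid_utf8 and the sample strings are made up for 
this example). It assumes the German segment is written as latin9 
(ISO-8859-15) and the rest as UTF-8, as in the file above:

```python
# Sketch only: models a file with a UTF-8 part and a latin9 part.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Segment written while the input encoding is utf8 (hypothetical sample text).
utf8_part = "Монгол\n".encode("utf-8")

# Segment written after \inputencoding{latin9}.
latin9_part = "Grüße".encode("iso8859_15")

mixed_file = utf8_part + latin9_part

# Read as one UTF-8 stream, the file is rejected ...
valid_as_utf8 = is_valid_utf8(mixed_file)

# ... but decoding each segment with its own encoding recovers every character.
decoded_per_segment = (
    utf8_part.decode("utf-8") + latin9_part.decode("iso8859_15")
)
```

Decoding the whole file as UTF-8 fails, while decoding segment by segment 
loses nothing, which is the sense in which each character is valid.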

> With TeX, there is no problem, as the German text part is read using a
> different encoding.

Yes.
 
> Similarly, all text parts in a comment are uncritical if the file is
> processed by TeX, because comments are not decoded at all.

This does not matter. If the user enters a comment (remember that this is 
either in the preamble or in ERT) we must assume that he did that on purpose 
and wants the comment to be preserved. We should not silently throw away 
parts of the comment.

> This would mean I have to be very verbose and write 0218 LATIN CAPITAL
> LETTER S WITH COMMA BELOW (or some other unambiguous ASCII representation
> of the to-be-tested letter Ș) in the LaTeX preamble of my LyX file to
> be able to use LyX "unicodesymbols" export conversions.

You could also use an ASCII approximation (e.g. "S,").
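
A rough Python sketch of the trade-off (this is not how LyX implements the 
check; the function name unencodable_chars and the sample comment are 
assumptions for illustration, with latin9 standing in for the document 
encoding):

```python
# Sketch only: compares silently dropping a character with an ASCII fallback.

def unencodable_chars(text: str, encoding: str) -> list:
    """Return the characters of text that cannot be encoded in encoding."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

# "\u0218" is LATIN CAPITAL LETTER S WITH COMMA BELOW.
comment = "% test the letter \u0218 in the preamble"

# The check: which characters does latin9 not cover?
missing = unencodable_chars(comment, "iso8859_15")

# Ignoring the problem silently removes part of the user's comment.
silently_dropped = comment.encode("iso8859_15", errors="ignore").decode(
    "iso8859_15"
)

# An ASCII approximation keeps the comment readable and encodable.
ascii_variant = comment.replace("\u0218", "S,")
still_missing = unencodable_chars(ascii_variant, "iso8859_15")
```

With errors="ignore" the letter simply disappears from the comment, whereas 
the "S," variant passes the check unchanged.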

> In my view, LyX is overly restrictive here and the new feature stands in
> the way.

I agree with Jürgen here. Ignoring unencodable characters in comments means 
that you don't care about the contents of the comment. If you don't care, 
then why not omit the characters in the first place (or even the whole 
comment)?


Georg

