On Wed, Jun 24, 2020 at 10:20:48AM +0200, Hans Åberg wrote:
> > > The problem is that I haven't changed my environment variable.
> > > LC_ALL=UTF-8
> …
> > LC_ALL=fr_FR.UTF-8
>
> I pointed out that out: There is a double bug, locale dependent generation of
> the parser file, and relying on software that can't handle LC_CTYPE=UTF-8.

On (at least) Linux using glibc, LC_CTYPE requires a valid locale, and
UTF-8 on its own is not a valid locale.

A quick search on Google suggests that LC_CTYPE will, among other
things, control what is a valid letter, and lowercase|uppercase
conversions. Taking an easy case, with languages written in Latin
alphabets, what is the uppercase of 'i'? In Turkey it is İ (with a
dot), because in Turkish dotted-i and dotless-i are different letters.

ĸen
-- 
He died at the console, of hunger and thirst.
Next day he was buried, face-down, nine-edge first.
 - the perfect programmer
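
P.S. A rough Python sketch of both points, in case it helps. It uses only the standard library. The names is_valid_locale and turkish_upper are made up for illustration, and what the first function reports for a given name depends on the C library and on which locales are installed on the machine.

```python
import locale


def is_valid_locale(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Put the original LC_CTYPE back whatever happened above.
        locale.setlocale(locale.LC_CTYPE, saved)


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using the Turkish i rules.

    Dotted i (U+0069) maps to dotted capital I (U+0130), and
    dotless i (U+0131) maps to plain capital I.
    """
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


if __name__ == "__main__":
    # Results vary by system, so none are claimed here.
    for name in ("C", "UTF-8", "fr_FR.UTF-8"):
        print(name, is_valid_locale(name))

    # Default Unicode mapping versus the Turkish-specific one.
    print("i".upper(), turkish_upper("i"))
```

Note that Python's str.upper() does not consult LC_CTYPE at all, so the Turkish mapping has to be applied by hand in this sketch. A C program calling toupper() or towupper() under a Turkish locale is where the locale setting itself would come into play.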