Hi,

trying to clarify things a little...

To represent (nearly) all characters used in all languages of the world,
a table was established: Unicode.
This table is HUGE (obviously), and storing each entry at a fixed width
would use too many bytes per character.

This is where the "encoding" thing starts. UTF-8 (big shout-out to Ken
Thompson) is one of those encoding systems.

This encoding system describes which byte sequence of the encoding
corresponds to which character in the Unicode table.

Or in other words: there is a one-to-one relationship between valid
UTF-8 byte sequences and the entries of the Unicode table.
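
To make that mapping a bit more concrete, here is a small Python sketch
(my own illustration, nothing taken from mc). It prints the Unicode code
point and the UTF-8 bytes for a few characters and checks that decoding
gives the same character back. The helper name utf8_bytes is made up for
this example.

```python
# Illustration only: map characters to Unicode code points and UTF-8 bytes.

def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 byte sequence for a single character."""
    return ch.encode("utf-8")

for ch in ("A", "ß", "ä", "€"):
    raw = utf8_bytes(ch)
    # The round trip must give back the same character (one-to-one mapping).
    assert raw.decode("utf-8") == ch
    print(f"U+{ord(ch):04X} {ch!r} -> {raw.hex(' ')} ({len(raw)} byte(s))")
```

Plain ASCII characters stay one byte, the German ones take two, and the
euro sign takes three.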

If a UTF-8-encoded filename is displayed on a terminal that is unaware
of that encoding, this relation breaks and the terminal shows the "raw"
byte values, each one looked up in, for example, the ISO-8859-1 table.
The bytes of a multi-byte UTF-8 character are all above the ASCII range
(0x80 and up), but the range 0x80-0x9F holds control codes in the 8-bit
world, and those values are perfectly valid parts of a UTF-8 sequence.
So one may screw up the terminal settings totally, and even an ls -l
will output only gibberish. Type "reset" blindly, followed by RETURN,
and everything is back to normal.
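
A rough Python sketch of what such a terminal "sees" (again only an
illustration, assuming the terminal decodes every byte as ISO-8859-1):
the two bytes of the "ß" turn into two separate characters, and one of
them lands in the 0x80-0x9F range, which 8-bit terminals may treat as a
control code.

```python
# Illustration only: what a non-UTF-8 terminal "sees" in a UTF-8 filename.

name = "Straße"
raw = name.encode("utf-8")          # the bytes as stored in UTF-8

# A Latin-1 terminal treats every single byte as one character.
as_latin1 = raw.decode("latin-1")

# Bytes in 0x80-0x9F are control codes (C1) in the 8-bit world.
c1_bytes = [b for b in raw if 0x80 <= b <= 0x9F]

print(repr(as_latin1))
print([hex(b) for b in c1_bytes])
```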

But why are the boxes and borders OK with both UTF-8 AND non-UTF-8
aware terminals?

That is because the software is clever enough to SWITCH to different
characters when it senses a non-UTF-8 terminal.
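
The idea can be sketched in Python like this. Note that this is NOT how
mc does it internally; the function names frame_chars and draw_box and
the plain-ASCII fallback are my own assumptions, just to show the
principle of picking the frame characters by encoding.

```python
import codecs

# Hypothetical helper: pick frame characters the terminal can display.
def frame_chars(encoding: str) -> dict:
    try:
        # codecs.lookup normalizes aliases such as "UTF8" or "utf_8".
        is_utf8 = codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        is_utf8 = False          # unknown encoding: play it safe
    if is_utf8:
        return {"h": "─", "v": "│", "tl": "┌", "tr": "┐", "bl": "└", "br": "┘"}
    # Fallback assumed for this sketch: plain ASCII.
    return {"h": "-", "v": "|", "tl": "+", "tr": "+", "bl": "+", "br": "+"}

def draw_box(width: int, chars: dict) -> str:
    """Build a three-line box with the given inner width."""
    top = chars["tl"] + chars["h"] * width + chars["tr"]
    mid = chars["v"] + " " * width + chars["v"]
    bot = chars["bl"] + chars["h"] * width + chars["br"]
    return "\n".join((top, mid, bot))

print(draw_box(10, frame_chars("ISO-8859-1")))
```

A real program would take the encoding from the locale (LANG/LC_ALL)
instead of a hard-coded string.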

This is not possible for filenames containing UTF-8 characters...
otherwise the name would be a different one.

The underlying filesystem still stores the names in UTF-8, regardless
of what the terminal understands.

(The following is German... I am using it only because it contains
characters which can be represented in UTF-8 but not in ASCII.)

When a filename is, for example, "Straßennameundähnliches.txt"
("streetnamesandthelike.txt"), then the "weird" characters
will only be displayed correctly in a UTF-8-aware environment. You
cannot use "different characters" to replace those, because you
would then be using a different filename.
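
In Python terms (a sketch, assuming the name is stored as UTF-8; the
"ss"/"ae" replacement is only an example I picked): as soon as the
characters are swapped, the bytes differ, so it is simply another file
name.

```python
# Illustration only: replacing characters yields a different filename.

original = "Straßennameundähnliches.txt"
# Hypothetical ASCII replacement, chosen only for this example.
replaced = original.replace("ß", "ss").replace("ä", "ae")

on_disk = original.encode("utf-8")   # bytes of the real name
other = replaced.encode("utf-8")     # bytes of the "replaced" name

print(on_disk)
print(other)
print(on_disk == other)
```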

Whether spaces and non-ASCII characters should be part of filenames
or not is another discussion.

HTH!
Cheers!
mcc



On 11/03 01:49, wwp via mc wrote:
> Hello Yury,
> 
> 
> On Sat, 2 Nov 2024 18:06:29 +0100 "Yury V. Zaytsev via mc" 
> <mc@lists.midnight-commander.org> wrote:
> 
> > > On 2. Nov 2024, at 17:41, wwp via mc <mc@lists.midnight-commander.org> 
> > > wrote:
> > > 
> > > I'm trying to get mc with UTF-8 support to run in mrxvt, which has no
> > > support for it. Of course I know I can switch to another terminal.. but
> > > none bring what mrxvt does and the UTF-8 branch of mrxvt is not usable,
> > > thus asking here if that sounds possible or not.  
> > 
> > mc supports UTF-8 in terms of a) printing UTF-8 characters and b) 
> > converting UTF-8 characters to the terminal encoding, if it’s different 
> > from UTF-8 as long as you build with --enable-charset.
> > 
> > > Any tips here? The only thing I get if I run mc with LANG/LC_ALL set to
> > > en_US.UTF-8 from a mrxvt instance is a broken layout (frames) and UTF-8
> > > chars not shown correctly. Tried with slang or ncurses for screen
> > > support, no change.  
> > 
> > This is working exactly as expected. You lie to mc that your terminal 
> > supports UTF-8, it starts printing UTF-8 characters and this results in a 
> > broken layout.
> > 
> > You should tell mc which encoding is really supported by your terminal, and 
> > it will convert everything it can to this encoding, e.g.
> > 
> >   LANG=C LC_ALL=en_US.ISO8859-1 mc
> > 
> > Maybe you should have started by explaining what is not working and what 
> > you are trying to achieve in the first place.
> 
> Thanks for those explanations.
> 
> What I am trying to achieve by using mc sounded obvious to me,
> apologies for this: I'm trying to list/manipulate files and folders (mc
> does a great job at that) whose names contain UTF-8 characters. At
> least list and rename them, by running mc in mrxvt.
> 
> What's not working? In such conditions, UTF-8 filenames are not showing
> in a human readable manner, and (worst case), mc layout frames (main
> screen, dialogs) are shown as broken since every single bar char takes 1-n
> characters to display, breaking look and alignment, unless I use `mc -a`.
> 
> 
> >   LANG=C LC_ALL=en_US.ISO8859-1 mc
> 
> With this (or close to this, -15), I see proper mc frame layouts and
> dialogs, but filenames still show undecoded chars.
> 
> 
> Regards,
> 
> -- 
> wwp
> 



> -- 
> mc mailing list
> mc@lists.midnight-commander.org
> https://lists.midnight-commander.org/mailman/listinfo/mc

