On 6/9/07, Yen-Ju Chen <[EMAIL PROTECTED]> wrote:
> On 6/9/07, Quentin Mathé <[EMAIL PROTECTED]> wrote:
> > On 9 June 07 at 22:42, Yen-Ju Chen wrote:
> >
> > > On 6/9/07, Quentin Mathé <[EMAIL PROTECTED]> wrote:
> > >>> I need to know what's the output of 'locale' command.
> > >>
> > >> LANG=
> > >> LC_COLLATE="C"
> > >> LC_CTYPE="C"
> > >> LC_MESSAGES="C"
> > >> LC_MONETARY="C"
> > >> LC_NUMERIC="C"
> > >> LC_TIME="C"
> > >> LC_ALL="C"
> > >
> > > It is interesting that LC_CTYPE is 'C',
> > > which means it treats all characters as C encoding (ASCII ?).
> > >
> > >>> And what is your default C string encoding [NSString
> > >>> defaultCStringEncoding].
> > >>
> > >> NSMacOSRomanStringEncoding
> > >
> > > And all Cocoa (and probably Carbon) applications
> > > treat characters as Roman.
> > > Then I wonder how Unix commands, like 'more' and 'vi', see the
> > > characters.
> >
> > Badly like that:
> > $ more TestAccent.txt
> > <83>toil<8E>
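
As a side note, the bytes in the quoted 'more' output are consistent with a file saved in Mac OS Roman. Here is a minimal Python sketch of that. The sample word and the helper name escape_like_more are assumptions for illustration only (the real content of TestAccent.txt is not shown in this thread), and the helper only imitates the <XX> notation, it is not how 'more' is implemented.

```python
# Hypothetical sample word "Étoilé", written with escapes so this source stays ASCII.
# The real content of TestAccent.txt is not shown in the thread.
text = "\u00c9toil\u00e9"

mac_roman_bytes = text.encode("mac_roman")  # one byte per accented letter
utf8_bytes = text.encode("utf-8")           # two bytes per accented letter


def escape_like_more(data: bytes) -> str:
    """Show bytes >= 0x80 as <XX>, imitating a pager running in the 'C' locale."""
    return "".join(chr(b) if b < 0x80 else f"<{b:02X}>" for b in data)


print(escape_like_more(mac_roman_bytes))  # <83>toil<8E>
print(escape_like_more(utf8_bytes))       # <C3><89>toil<C3><A9>
print(len(text), len(utf8_bytes))         # 6 8
```

The same word in UTF-8 needs two bytes for each accented letter, so a tool that treats every byte as one character will draw two wrong glyphs per accent.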
I played with 'locale' and 'localedef' a little to show the situation on the Mac.
Terminal.app is set to UTF-8 and the test file is also in UTF-8 encoding.
'locale' shows "LC_CTYPE=C".

'cat utf8.txt' displays the right glyphs.
'more utf8.txt' shows <E9><B3> ... for bytes > 127
because it thinks your terminal is in 'C' encoding.
'vi utf8.txt' shows invalid glyphs (mostly ?) for bytes > 127
because it tries to interpret UTF-8 in 'C' encoding.
Therefore, a glyph which takes 2 bytes is interpreted as 2 glyphs.
It is similar to your 'ls' result.
(Therefore, I suspect HFS is not really in UTF-8,
or 'ls' does some conversion behind the scenes.)

You can change the default encoding by executing
'export LC_CTYPE=en_US.UTF-8'
The output of 'locale' should then show that your LC_CTYPE is en_US.UTF-8.
Now 'cat', 'more' and 'vi' should show 'utf8.txt' correctly.

This pretty much explains everything.
On the Mac, there is a discrepancy between the Unix environment
and Cocoa (probably also Carbon).
GNUstep checks the Unix locale to decide the default encoding,
so there should not be such a discrepancy there.

Yen-Ju

>
> With TermX using UTF-8 as the default encoding,
> I got different results with 'cat', 'more' and 'vi'.
> So I don't really know which one is the correct one.
>
> We are dealing with two issues here:
> 1. What is the default encoding used by the system ?
>    This encoding is probably the one for the file system.
>    It should be something we can solve.
> 2. What is the encoding for a text file ?
>    This one cannot be solved solely by the terminal emulator.
>    It involves the text editor and the tool you use to view it.
>    Only when both of them use the default encoding can you display
>    them correctly with the terminal emulator.
>    Otherwise, the viewer has to convert the encoding of a text file
>    to the default encoding.
>    So if you use vi, you have to know which
>    encoding it uses to save the text file.
>    Most Unix commands use LANG or LC_CTYPE for encoding.
>    But surprisingly your locale is 'C' even on a French system.
>    So I don't really know which encoding these Unix commands use.
>    If vi thinks your system is in 'C' encoding,
>    you can only save the file in UTF-8 without losing information, I think.
>    (Not 100% sure about that).
>
> >
> > That's why I always set my text encoding to UTF-8 in Save Panel.
> >
> > > It also raises the question what is the encoding of the file system
> > > (filename). Is it UTF8 (compatible with ASCII) or MacOSRoman ?
> >
> > iirc HFS+ uses UTF-8.
> > Here is 'ls' output example:
> > $ ls
> > E??toile??
> > Liens a?? trier
> > Ico??ne
> >
> > This looks like UTF-8 when you consider UTF-8 is ASCII compatible and
> > only accents are wrongly interpreted here (not the whole character as
> > with Roman).
> >
> > > A quick way to see is changing line 416 in TXTextView.m to
> > > a different encoding and see.
> > > It is where it decides how to convert characters into NSString.
> >
> > Do you want me to try that on Mac OS X (by compiling TermX on it) or on
> > Ubuntu/GNUstep?
>
> On Mac. Thanx.
> GNUstep has a font issue.
> So even if you get the encoding right,
> it might not have the right font to display it.
> Therefore, you cannot tell which one is wrong (encoding or font).
>
> Yen-Ju
>
> >
> > Quentin.
> >
> >
> > _______________________________________________
> > Etoile-discuss mailing list
> > [email protected]
> > https://mail.gna.org/listinfo/etoile-discuss
> >
>
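
A note on the quoted 'ls' output: one possible explanation, assuming HFS+ hands back file names in decomposed Unicode form (base letter followed by a combining accent) encoded as UTF-8, is that each accent becomes two extra bytes after a plain ASCII letter. The Python sketch below only illustrates that assumption. The file name is a guess reconstructed from the listing, and printing '?' per high byte is an imitation of what a 'C' locale 'ls' appears to do, not its actual code.

```python
import unicodedata

# Hypothetical file name "Étoilé", guessed from the quoted listing (escapes keep the source ASCII).
name = "\u00c9toil\u00e9"

# Decompose: each accented letter becomes a base letter plus a combining accent.
decomposed = unicodedata.normalize("NFD", name)
raw = decomposed.encode("utf-8")

# Imitate a tool that prints '?' for every byte it cannot treat as ASCII.
shown = "".join(chr(b) if b < 0x80 else "?" for b in raw)

print(raw)    # b'E\xcc\x81toile\xcc\x81'
print(shown)  # E??toile??
```

If that assumption holds, the listing is still UTF-8: the base letters survive as ASCII and only the two-byte combining accents turn into '??'.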
