Sorry, I should have said "Gabor's original question persists..."
Fabio Montoya

| -----Original Message-----
| From: [EMAIL PROTECTED]
| [mailto:[EMAIL PROTECTED] On Behalf Of Fabio Montoya [EMAIL PROTECTED]
| Sent: Monday, February 09, 2004 12:04 AM
| To: [EMAIL PROTECTED]; 'gabor'; [EMAIL PROTECTED]
| Subject: RE: [Mono-list] unicode trouble
|
| Gabor is right, Max! The Unicode standard defines code points up to
| U+10FFFF, which is more than 16 bits can hold; the fixed-width 32-bit
| form of that character space is UCS-4 (also known as UTF-32).
|
| For practical reasons, the Unicode standard defines transformation
| formats, e.g.:
|
|   UTF-8   Unicode transformation format for 8 bits
|   UTF-16  Unicode transformation format for 16 bits
|
| [Any transformation format with code units wider than 8 bits needs to
| handle byte-ordering issues.]
|
| The original Max's question persists...
|
| | > but what about unicode characters, that are simply above the 16-bit
| | > limit?
| | >
| | > for example:
| | > OLD ITALIC LETTER A (unicode code: 10300).
| | >
| | > how do you represent those in .net?
|
| Cheers!
|
| Fabio Montoya
|
| | -----Original Message-----
| | From: [EMAIL PROTECTED]
| | [mailto:[EMAIL PROTECTED] On Behalf Of max
| | Sent: Sunday, February 08, 2004 10:04 PM
| | To: gabor; [EMAIL PROTECTED]
| | Subject: Re: [Mono-list] unicode trouble
| |
| | Hi Gabor,
| | I think you're confused. Characters in .NET are 16 bits
| | BECAUSE they are unicode. 16 bits = 2 bytes = 65536 values.
| |
| | a way to check that is simple. here's some C# example code:
| |
| |   string s = "a";
| |   s += (char)10300;
| |
| |   Console.WriteLine("s = " + s);
| |   Console.WriteLine("len = " + s.Length);
| |
| |   for (int i = 0; i < s.Length; i++) {
| |     Console.WriteLine("s[" + i + "] = " + (int)s[i]);
| |   }
| |
| | max
| |
| | On Sunday 08 February 2004 15:19, gabor wrote:
| | > hi,
| | >
| | > as i understand, characters in .net are 16-bit values.
| | >
| | > but what about unicode characters, that are simply above the 16-bit
| | > limit?
| | >
| | > for example:
| | > OLD ITALIC LETTER A (unicode code: 10300).
| | >
| | > how do you represent those in .net?
| | >
| | > i tried to open a textfile containing this old-italic-a:
| | >
| | > - the length and indexing methods of string all said that old-italic-a
| | >   is actually 2 letters => it doesn't work
| | > - when writing the string back to an utf8 encoded textfile, then it
| | >   was correctly written.
| | >
| | > so for me it seems that dotnet (mono) uses utf16 as internal encoding
| | > format, but indexing (and length) doesn't use that information.
| | >
| | > am i correct?
| | >
| | > are there any ways to handle those characters in dotnet?
| | >
| | > for example the new java-1.5 contains some new string-methods that can
| | > handle these characters. it's not perfect in java, but at least there
| | > is something.
| | >
| | > if someone wants to play with it, i attached a text file containing
| | > the text "marrakesh", encoded in utf8, where i replaced the first "a"
| | > with old-italic-a (it's easy to do with a little iconv to-from ucs4
| | > and hexedit)
| | >
| | > thanks,
| | > gabor farkas
| |
| | _______________________________________________
| | Mono-list maillist - [EMAIL PROTECTED]
| | http://lists.ximian.com/mailman/listinfo/mono-list
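One way to answer Gabor's question, sketched in C# under the assumption of a .NET 2.0 or later runtime: the `char.ConvertFromUtf32`, `char.IsSurrogatePair` and `char.ConvertToUtf32` helpers, `StringInfo.LengthInTextElements` and `Encoding.UTF32` did not exist in the .NET 1.x / Mono versions this thread is about. The idea is that a string stays UTF-16 internally, so U+10300 occupies two `char`s (a surrogate pair), and you walk the string joining each pair into one code point. The `CodePoints` class and its `Enumerate`/`Count` methods are hypothetical names for illustration, not a framework API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Sketch only: assumes .NET 2.0+; CodePoints/Enumerate/Count are hypothetical helpers.
public static class CodePoints
{
    // Yields each Unicode code point, joining a UTF-16 surrogate pair into one value.
    public static IEnumerable<int> Enumerate(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                yield return char.ConvertToUtf32(s, i); // high + low surrogate -> one code point
                i++;                                    // skip the low surrogate we just consumed
            }
            else
            {
                yield return s[i];                      // BMP character (or a lone surrogate)
            }
        }
    }

    // Counts code points, unlike string.Length, which counts UTF-16 code units.
    public static int Count(string s)
    {
        int n = 0;
        foreach (int cp in Enumerate(s)) n++;
        return n;
    }

    public static void Main()
    {
        // U+10300 OLD ITALIC LETTER A: a hex literal, 0x10300 is 66304 decimal.
        string oldItalicA = char.ConvertFromUtf32(0x10300);
        // "marrakesh" with the first "a" replaced, as in Gabor's test file.
        string word = "m" + oldItalicA + "rrakesh";

        Console.WriteLine(word.Length);                               // 10 UTF-16 code units
        Console.WriteLine(Count(word));                               // 9 code points
        Console.WriteLine(new StringInfo(word).LengthInTextElements); // 9 text elements

        foreach (int cp in Enumerate(oldItalicA))
            Console.WriteLine("U+{0:X4}", cp);                        // U+10300

        // Bytes needed for this one character in each transformation format.
        Console.WriteLine(Encoding.UTF8.GetByteCount(oldItalicA));    // 4
        Console.WriteLine(Encoding.Unicode.GetByteCount(oldItalicA)); // 4 (two 16-bit units)
        Console.WriteLine(Encoding.UTF32.GetByteCount(oldItalicA));   // 4
    }
}
```

Note the hex literal: the `(char)10300` in max's test code is decimal 10300, i.e. U+283C, a character inside the 16-bit range, which is why his check saw it as a single `char`.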