Re: patch to gnustep-base (Unicode and others)

Richard Frith-Macdonald Mon, 08 Apr 2002 00:32:53 -0700


On Sunday, April 7, 2002, at 11:15 PM, Serg Stoyan wrote:


> Hello, Richard Frith-Macdonald.
>
>  RFM> > Here is a patch to the gnustep-base, with additions such as:
>  RFM> > - fixes NSString's initWithCString* methods' behaviour by
>  RFM> >   commenting out GSString's. Without it, initWithCString* methods
>  RFM> >   don't convert the C string into Unicode, and this is not
>  RFM> >   OpenStep compliant;
>  RFM>
>  RFM> Perhaps you can explain more ... as far as I can see the above is
>  RFM> simply wrong.  Certainly initWithCString* methods are not supposed
>  RFM> to convert to unicode (as a general rule), and OpenStep doesn't say
>  RFM> they should - so I'm guessing you have some meaning in mind that is
>  RFM> not immediately obvious to me.
>
>   Here is the citation from "OpenStep Specification" (c) 1994 NeXT
>   Computer Inc. Class NSString, page 2-127:
>   "- (id)initWithCString:(const char *)byteString
>
>   Initializes the receiver, a newly allocated NSString, by converting the
>   one-byte characters in byteString into Unicode characters. byteString
>   must be a null-terminated C string in the default C string encoding."

OK ... guess I was wrong about that ... it *does* seem to say strings
should be converted to unicode ... but that's incorrect/misleading
documentation.

If you look in the class description documentation, it tells you that -

'While the actual representation of character strings stored in NSString
and NSMutableString is independent of any particular implementation, you
can in general think of the contents of NSString and NSMutableString
objects as being, canonically, Unicode characters (defined by the unichar
data type)'

Really, this means that you should not take the method descriptions too
literally: they describe an API, not particular internal implementation
details.

>  RFM> > - adds 2 languages into Resources/Languages: Russian and
>  RFM> >   Ukrainian;
>  RFM>
>  RFM> Thanks, but I can't use them ... as I don't know what encoding you
>  RFM> have created them in.  I have added a README file to the
>  RFM> Resources/Languages subdirectory to say what format language files
>  RFM> *should* be in (and corrected some errors in the existing files).
>
>   It's ok. I've just updated from CVS and created these files by
>   cvtenc'ing them, just like the README says. But... when I start any
>   app I get this message:
>
>   File NSDictionary.m: 458. In [GSDictionary -initWithContentsOfFile:]
>   Contents of file
>   '/home/stoyan/GNUstep/System/Libraries/Resources/Languages/Russian'
>   does not contain a dictionary

All I can suggest here is making sure you have the latest code installed.
I fixed a bug in loading 16-bit unicode property lists a day or two ago.

>   Here are some of my environment vars:
>
>   [stoyan@localhost]$ echo $GNUSTEP_STRING_ENCODING; echo $LANG
>   NSKOI8RStringEncoding
>   ru_RU.KOI8-R
>
>   I've attached Russian and UkraineRussian (conforming to Locale.aliases)
>   files as well.

Thanks, I've added them (I converted to ascii with \u escapes for
consistency with the other files, but that should make no difference).
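
To illustrate what that kind of conversion amounts to, here is a minimal
Python sketch. It is not cvtenc or any GNUstep code: the function name
to_ascii_escaped, the KOI8-R default, and the choice of lowercase
four-digit \u escapes are all assumptions made for the example.

```python
def to_ascii_escaped(raw: bytes, encoding: str = "koi8_r") -> str:
    """Decode bytes in a legacy encoding and return pure-ASCII text.

    Every non-ASCII character is replaced by a \\uXXXX escape
    (hypothetical helper, for illustration only).
    """
    text = raw.decode(encoding)
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            # Plain ASCII passes through unchanged.
            pieces.append(ch)
        elif code_point <= 0xFFFF:
            # Characters in the 16-bit range become \uXXXX.
            pieces.append("\\u%04x" % code_point)
        else:
            # A four-digit escape cannot hold this character.
            raise ValueError("character outside the 16-bit range: %r" % ch)
    return "".join(pieces)


if __name__ == "__main__":
    koi8_bytes = "Привет".encode("koi8_r")
    print(to_ascii_escaped(koi8_bytes))
```

The output contains only ASCII characters, so it reads the same whatever
the default C string encoding happens to be.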

>   I guess we can use 2 types of language files -- a plain text property
>   list, with the encoding in its file name, and a non-printable unicode
>   file. For example, in the case of Russian:
>
>   Languages/Russian.KOI8-R         <-- plain proplist in KOI8-R encoding
>   Languages/Russian.WindowsCP1251  <-- plain proplist in Windows 1251
>                                        encoding
>   Languages/Russian                <-- Unicode file, created with
>                                        'cvtenc'

Property lists should be ascii ... so I prefer to keep an ascii property
list containing \u escape sequences for non-ascii characters, and create
the other files temporarily (for editing) using cvtenc.

>   In this case we use the Unicode file, and the proplist files remain
>   for editors.

But keeping multiple copies in different formats could let them get out
of sync with each other if you are not careful.

>   Or we can use proplist files with an appropriate encoding scheme, if
>   we have to use it (no unicode file for some reason).

Property list files are ascii.
Strictly speaking, anything non-ascii is not a legal property-list file,
so while unicode files are also portable, I'd still prefer to stick to
ascii files with \u escape sequences.  That is, if we are sticking to one
portable format for consistency, I'd prefer it to be the ascii.
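
A rough Python sketch of how one might check that a file is pure ascii and
expand its \u escapes into a temporary copy for editing. This stands in
for cvtenc, it does not reproduce it: the helper names are hypothetical,
KOI8-R is just an example target, and the regular expression is a
simplification that does not treat an escaped backslash followed by 'u'
specially.

```python
import re

# Matches \uXXXX or \UXXXX with exactly four hex digits (simplified).
_ESCAPE = re.compile(r"\\[uU]([0-9a-fA-F]{4})")


def is_ascii_only(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ascii range."""
    return all(byte < 0x80 for byte in data)


def unescape_for_editing(ascii_text: str, encoding: str = "koi8_r") -> bytes:
    """Expand \\uXXXX escapes and encode the result for a local editor."""
    expanded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), ascii_text)
    return expanded.encode(encoding)


if __name__ == "__main__":
    stored = '{ Greeting = "\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442"; }'
    assert is_ascii_only(stored.encode("ascii"))
    editable = unescape_for_editing(stored)
    print(editable.decode("koi8_r"))
```

Only the ascii file would be kept; the expanded copy is thrown away after
editing, which avoids having several versions drift out of sync.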


> PS: Another thing I've noticed (and I guess it should be somewhere in
> the Documentation) is about using non-ascii characters when initializing
> an NSString variable. I mean that using a definition such as:
>
> NSString  *some_string = @"some non-ascii characters";
>
> is deprecated. In this case the string is not converted into Unicode and
> the results are unpredictable, or something.

Well, the OpenStep spec simply tells you not to do it (I'd say that's
closer to 'illegal' than 'deprecated') in the NSString class description.

Where do you think this should be documented in GNUstep?

