On Tue, 20 Jul 2010 05:10:47 -0400, bearophile wrote:

> In that code, for further safety, I'd like to make it not possible
> (without a cast) code like this (here toStringz doesn't get called):
> strcmp(Cstring(s1.ptr), Cstring(s2.ptr));
>
> So I think this code is a bit better:
>
> import std.string: toStringz;
>
> struct Cstring {
>     const(char)* ptr; // const(ubyte)* ?
>     static Cstring opCall(string s) {
>         Cstring cs;
>         cs.ptr = toStringz(s);
>         return cs;
>     }
> }
>
> extern(C) Cstring strcmp(Cstring s1, Cstring s2);
>
> void main() {
>     auto s1 = "abba";
>     auto s2 = "red";
>     auto r2 = strcmp(Cstring(s1), Cstring(s2));
> }
>
> Lars T. Kyllingstad:
>
>> but I think it should wrap a ubyte*, not a char*. The reason for this
>> is that D's char is supposed to be a UTF-8 code unit, whereas C's char
>> can be anything.
>
> Right. But toStringz() returns a const(char)*, so do you want to change
> toStringz() first?

Yes. I think we should stop using char* when interfacing with C code
altogether. The "right" thing to do, if you can call it that, would be
to use char* only if you KNOW the C function expects text input encoded
as UTF-8 (or just plain ASCII), and ubyte* for other encodings and
non-textual data. But this rule requires knowledge of what each function
does with its input and must hence be applied on a case-by-case basis,
which makes automated translation of C headers to D difficult.

So I say make it simple, don't assume that your C functions handle
UTF-8, and use ubyte* everywhere.

(Actually, it's not that simple, either. I just remembered that C's
char is sometimes signed, sometimes unsigned...)

Maybe this should be discussed on the main NG. It's been bothering me
for a while. I think I'll start a topic on it later.

-Lars
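
As a rough illustration of the "ubyte* everywhere" rule described above
(a sketch only, not code from this thread): the hand-written strlen
prototype below takes const(ubyte)* instead of const(char)*, and
toCBytes is a hypothetical helper name, not something in Phobos. The
Latin-1 bytes stand in for C input that is not valid UTF-8.

```d
import std.string : toStringz;

// Hand-written binding for C's strlen. The parameter is declared as
// const(ubyte)* rather than const(char)*, so nothing on the D side
// claims that the bytes are UTF-8.
extern(C) size_t strlen(const(ubyte)* s);

// Hypothetical helper (an assumption for this sketch, not part of
// Phobos): zero-terminate a D string and pass it on as raw bytes.
const(ubyte)* toCBytes(string s) {
    return cast(const(ubyte)*) toStringz(s);
}

void main() {
    // Plain ASCII text, zero-terminated by toStringz.
    assert(strlen(toCBytes("abba")) == 4);

    // "café" encoded as Latin-1, zero-terminated. A lone 0xE9 byte is
    // not valid UTF-8, so this data does not fit D's notion of char.
    immutable ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9, 0x00];
    assert(strlen(latin1.ptr) == 4);
}
```

This only covers the encoding side of the argument. It does not settle
the signedness point, since ubyte is always unsigned while C's char may
be either.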