On Mon, 16 Nov 2009 12:36:30 +0300, Walter Bright
<[email protected]> wrote:
Denis Koroskin wrote:
I'd like to raise two issues for discussion.
First, Phobos calls different functions depending on the OS it is
running on (e.g. CreateFileA vs. CreateFileW), and I wonder if that's
*really* necessary, since Microsoft provides a Unicode layer (MSLU) for
those operating systems.
All an application needs to do to call the W API on those OSes is link
with unicows.lib (which could be made part of Phobos). It does nothing on
Win2k+ and only kicks in on the 9x OS family.
A very good overview of it is written here:
http://msdn.microsoft.com/en-us/goglobal/bb688166.aspx
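The runtime dispatch being questioned can be pictured with a small sketch; `useWfuncs` and `openForReading` are illustrative names made up for this example, not actual Phobos declarations:

```d
// Illustrative sketch (not Phobos source) of per-OS A/W dispatch.
import std.c.windows.windows;
import std.utf : toUTF16z;
import std.windows.charset : toMBSz;

bool useWfuncs;  // decided once at startup

static this()
{
    // On the Win9x family GetVersion() has its high bit set and the
    // W entry points are stubs, so fall back to the A functions there.
    useWfuncs = (GetVersion() & 0x80000000) == 0;
}

HANDLE openForReading(string name)
{
    if (useWfuncs)
        return CreateFileW(toUTF16z(name), GENERIC_READ, 0, null,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
    else
        return CreateFileA(toMBSz(name), GENERIC_READ, 0, null,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
}
```

With unicows linked in, the `else` branch and the flag would disappear: the code would always call CreateFileW, and the layer would translate on 9x.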
unicows doesn't do anything more than what Phobos does in attempting to
translate Unicode into the local code page. All that using unicows would
do is cause confusion and installation problems, as the user would have
to get a copy of unicows and install it; unicows doesn't exist on
default Windows 9x installations.
There is simply no advantage to unicows.
End-users don't have to worry about it at all. They would just use the W
functions all the time, and unicows would kick in and translate UTF-16
strings into ANSI strings automatically on those operating systems. The
change would be transparent to them. There is also a redistributable
version of unicows, so users who want to deploy their software on
Win9x could ship it rather than force a manual install of the .dll.
I was about to propose dropping Win9x support initially, but thought it
might get a hostile reception...
Second, the "A" API accepts ANSI strings as parameters, not UTF-8
strings. I think this should be reflected in the function signatures,
since D encourages distinguishing between UTF-8 and ANSI strings and not
storing the latter as char[].
LPCSTR currently resolves to char*/const(char)*, but it would be
better for it to be an alias for ubyte*/const(ubyte)* so that the user
couldn't pass a Unicode string to an API that doesn't expect one. The
same applies to other APIs, too; for example, how does the C stdlib
cooperate with Unicode? I.e., is core.stdc.stdio.fopen() Unicode-aware?
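A minimal sketch of what the proposed alias change would look like; the declarations below are illustrative, not the current Phobos/windows header contents:

```d
// Proposed: give ANSI string types their own byte type, distinct from
// D's UTF-8 char, so the compiler rejects accidental mixing.
alias const(ubyte)* LPCSTR;   // currently resolves to const(char)*
alias ubyte*        LPSTR;    // currently resolves to char*

extern (Windows) void* CreateFileA(LPCSTR lpFileName, uint dwDesiredAccess,
    uint dwShareMode, void* lpSecurityAttributes, uint dwCreationDisposition,
    uint dwFlagsAndAttributes, void* hTemplateFile);

void example(string path)
{
    // CreateFileA(path.ptr, ...);     // would no longer compile: char* vs. ubyte*
    // CreateFileA(toMBSz(path), ...); // fine, once toMBSz returns ubyte*
}
```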
Calling C functions means one needs to pass them what the host C system
expects. C itself doesn't define what character set char* holds. If you
use the Phobos functions, those are required to work with Unicode.
Since char*/char[] denotes a sequence of Unicode characters in D, I see no
reason for an API that works with ANSI characters to accept it. For
example, there is a std.windows.charset.toMBSz() function that returns an
ANSI variant of a Unicode string. I think it would be preferable for it to
return a ubyte sequence instead of a char sequence.
Ideally, I'd like to see all functions that aren't guaranteed to work
with UTF-8 strings accept ubyte*/ubyte[] instead.
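To make the toMBSz suggestion concrete, here is a hypothetical before/after of its signature (the "current" form is from memory and may differ in detail):

```d
// Current (ANSI bytes typed the same as UTF-8 text):
//     const(char)* toMBSz(in char[] s, uint codePage = 0);

// Proposed (ANSI bytes get their own type, so the result can't be
// mistaken for a UTF-8 string and passed back to UTF-8-expecting code):
const(ubyte)* toMBSz(in char[] s, uint codePage = 0);
```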