On Mon, 16 Nov 2009 12:36:30 +0300, Walter Bright
<[email protected]> wrote:
Denis Koroskin wrote:
I'd like to raise two issues for discussion.
First, Phobos calls different functions depending on the OS it is
running on (e.g. CreateFileA vs. CreateFileW), and I wonder if that's
*really* necessary, since Microsoft provides a Unicode layer (MSLU) for
those operating systems.
All an application needs to do to call the W API on those OSes is link
with unicows.lib (which could be made part of Phobos). It does nothing on
Win2k+ and only kicks in on the 9x OS family.
A very good overview of it is written here:
http://msdn.microsoft.com/en-us/goglobal/bb688166.aspx
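The runtime dispatch being questioned can be pictured with a small sketch; `useWfuncs` and `openForReading` are illustrative names made up for this example, not actual Phobos declarations:

```d
// Illustrative sketch (not Phobos source) of per-OS A/W dispatch.
import std.c.windows.windows;
import std.utf : toUTF16z;
import std.windows.charset : toMBSz;

bool useWfuncs;  // decided once at startup

static this()
{
    // On the Win9x family GetVersion() has its high bit set and the
    // W entry points are stubs, so fall back to the A functions there.
    useWfuncs = (GetVersion() & 0x80000000) == 0;
}

HANDLE openForReading(string name)
{
    if (useWfuncs)
        return CreateFileW(toUTF16z(name), GENERIC_READ, 0, null,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
    else
        return CreateFileA(toMBSz(name), GENERIC_READ, 0, null,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
}
```

With unicows linked in, the `else` branch and the flag would disappear: the code would always call CreateFileW, and the layer would translate on 9x.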
unicows doesn't do anything more than what Phobos does in attempting to
translate Unicode into the local code page. All that using unicows would
do is cause confusion and installation problems, as the user would have
to get a copy of unicows and install it; unicows doesn't exist on
default Windows 9x installations.
There is simply no advantage to unicows.
End-users don't have to worry about it at all. They would just use the W
functions all the time, and unicows would kick in and translate UTF-16
strings into ANSI strings automatically on those operating systems. The
change would be transparent to them. There is also a redistributable
version of unicows, so users who want to deploy their software on
Win9x could ship it rather than force a manual install of the .dll.
I was about to propose dropping Win9x support initially, but thought it
might get a hostile reception...
Second, the "A" API accepts ANSI strings as parameters, not UTF-8
strings. I think this should be reflected in the function signatures,
since D encourages distinguishing between UTF-8 and ANSI strings and not
storing the latter as char[].
LPCSTR currently resolves to char*/const(char)*, but it would be
better for it to be an alias for ubyte*/const(ubyte)* so that the user
couldn't pass a Unicode string to an API that doesn't expect one. The
same applies to other APIs, too; for example, how does the C stdlib
cooperate with Unicode? I.e., is core.stdc.stdio.fopen() Unicode-aware?
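A minimal sketch of what the proposed alias change would look like; the declarations below are illustrative, not the current Phobos/windows header contents:

```d
// Proposed: give ANSI string types their own byte type, distinct from
// D's UTF-8 char, so the compiler rejects accidental mixing.
alias const(ubyte)* LPCSTR;   // currently resolves to const(char)*
alias ubyte*        LPSTR;    // currently resolves to char*

extern (Windows) void* CreateFileA(LPCSTR lpFileName, uint dwDesiredAccess,
    uint dwShareMode, void* lpSecurityAttributes, uint dwCreationDisposition,
    uint dwFlagsAndAttributes, void* hTemplateFile);

void example(string path)
{
    // CreateFileA(path.ptr, ...);     // would no longer compile: char* vs. ubyte*
    // CreateFileA(toMBSz(path), ...); // fine, once toMBSz returns ubyte*
}
```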
Calling C functions means one needs to pass them what the host C system
expects. C itself doesn't define what character set char* holds. If you
use the Phobos functions, those are required to work with Unicode.
Since char*/char[] denotes a sequence of Unicode characters in D, I see no
reason for an API that works with ANSI characters to accept it. For
example, there is a std.windows.charset.toMBSz() function that returns an
ANSI variant of a Unicode string. I think it would be preferable for it to
return a ubyte sequence instead of a char sequence.
Ideally, I'd like to see all functions that aren't guaranteed to work
with UTF-8 strings accept ubyte*/ubyte[] instead.
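To make the toMBSz suggestion concrete, here is a hypothetical before/after of its signature (the "current" form is from memory and may differ in detail):

```d
// Current (ANSI bytes typed the same as UTF-8 text):
//     const(char)* toMBSz(in char[] s, uint codePage = 0);

// Proposed (ANSI bytes get their own type, so the result can't be
// mistaken for a UTF-8 string and passed back to UTF-8-expecting code):
const(ubyte)* toMBSz(in char[] s, uint codePage = 0);
```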