Re: how to localize console and GUI apps in Windows

Andrei via Digitalmars-d-learn Thu, 04 Jan 2018 02:02:02 -0800

On Friday, 29 December 2017 at 18:13:04 UTC, H. S. Teoh wrote:

On Fri, Dec 29, 2017 at 10:35:53AM +0000, Andrei via Digitalmars-d-learn wrote:
This may be endurable if you write an application where Russian is only one of rare options, and what if your whole environment is totally Russian?
You mean if your environment uses a non-UTF encoding? If your environment uses UTF, there is no problem. I have code with strings in Russian (and other languages) embedded, and it's no problem because everything is in Unicode, all input and all output.

No, I mean the difficulty of writing a program based on non-ASCII locales. Every programming language course since C starts with a "hello world" program, which every non-English programmer naturally tries to translate into their native language, and gets an unreadable mess on the screen. Thousands try, hundreds look for a solution, dozens find it, and a few continue with the new language. That's not because these programmers cannot read English textbooks; they can. It's because they want to write non-English programs for non-English people, and that's essential. And there are many programming languages (or rather their runtimes) which do not suffer from such a deficiency.

That's the reason for Unicode adoption all over the programming world, including the D language, but what good is it to me if I can write a UTF-8 string with my native-language text in a D program and still get the same unreadable mess on the screen?

Yes, a new language in development can lack support for some features, but this forum thread shows that a simple and handy solution exists, yet nobody cares to bring it to the first pages of every textbook for beginners, at least as a footnote. Thus thousands of potential new fans of the language are lost from the start.

But I understand that in Windows you may not have this luxury. So you have to deal with codepages and what-not.
Converting back and forth is not a big problem, and it actually also solves the problem of string comparisons, because std.uni provides utilities for collating strings, etc. But it only works for Unicode, so you have to convert to Unicode internally anyway. Also, for static strings, it's not hard to make the codepage mapping functions CTFE-able, so you can actually write string literals in a codepage and have the compiler automatically convert it to UTF-8.
The other approach, if you don't like the idea of converting codepages all the time, is to explicitly work in ubyte[] for all strings. Or, preferably, create your own string type with ubyte[] representation underneath, and implement your own comparison functions, etc., then use this type for all strings. Better yet, contribute this to code.dlang.org so that others who have the same problem can reuse your code instead of needing to write their own.
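To make the CTFE-able codepage mapping idea concrete, here is a minimal sketch of what such a function might look like for CP866. The name fromCP866 is hypothetical and not part of Phobos. As a simplification it maps only ASCII and the Russian letters; every other byte (box-drawing characters and so on) becomes U+FFFD, so a real implementation would need a full lookup table.

```d
/// Hypothetical helper, not part of Phobos: decodes CP866 bytes into a
/// UTF-8 string. It is pure and allocation-only, so it should also be
/// usable at compile time (CTFE).
string fromCP866(const(ubyte)[] bytes) pure
{
    string result;
    foreach (b; bytes)
    {
        dchar c;
        if (b < 0x80)
            c = b;                                // ASCII is unchanged
        else if (b <= 0xAF)
            c = cast(dchar)(0x0410 + (b - 0x80)); // 0x80..0xAF: А..Я, а..п
        else if (b >= 0xE0 && b <= 0xEF)
            c = cast(dchar)(0x0440 + (b - 0xE0)); // 0xE0..0xEF: р..я
        else if (b == 0xF0)
            c = 0x0401;                           // Ё
        else if (b == 0xF1)
            c = 0x0451;                           // ё
        else
            c = 0xFFFD;                           // simplification: not mapped
        result ~= c;                              // appending a dchar encodes it as UTF-8
    }
    return result;
}

// Forcing compile-time evaluation: the enum initializer is computed by CTFE.
enum greeting = fromCP866([0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2]);
static assert(greeting == "Привет");
```

The same function can be called at run time on bytes read from the console, so one mapping would serve both string literals and user input.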

I'd definitely try this if I decide to use the D language for my purposes (which is not settled yet). But to decide I need some experience, and for now it has stopped at reading the user's input (for training I intend to translate a recent, rather complex interactive C# program of mine into D).

Still this does not decide localized input problem: any localized input throws an exception “std.utf.UTFException... Invalid UTF-8 sequence”.
Is the exception thrown in readln() or in writeln()? If it's in writeln(), it shouldn't be a big deal, you just have to pass the data returned by readln() to fromKOI8 (or whatever other codepage you're using).
If the problem is in readln(), then you probably need to read the input in binary (i.e., as ubyte[]) and convert it manually. Unfortunately, there's no other way around this if you're forced to use codepages. The ideal situation is if you can just use Unicode throughout your environment. But of course, sometimes you have no choice.


It depends.

If I skip proper console code page initialization, I see in the debugger that the runtime reads the user's input as CP866 (MS-DOS) Cyrillic and then throws the exception "Invalid UTF-8 sequence" when trying to handle it as a UTF-8 string (in particular in the strip() or writeln() functions). This situation seems quite manageable with the code page conversions you've mentioned above. I tried the first library function I found (in std.windows.charset) and got a rather fanciful working statement:


response = fromMBSz((readln()~"\0").ptr, 1).strip();

which assigns the correct Latin/Cyrillic contents to the response variable.
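The statement above could be wrapped in a small helper, roughly as sketched below. The name readLocalizedLine is hypothetical. The sketch assumes the console is left at its default (OEM) input code page, asks Windows for that code page with GetConsoleCP() instead of hard-coding 1, and uses toStringz in place of the manual "\0" append. The non-Windows branch is only an assumption that the terminal already delivers UTF-8.

```d
import std.stdio : readln;
import std.string : strip;

version (Windows)
{
    import core.sys.windows.windows : GetConsoleCP;
    import std.string : toStringz;
    import std.windows.charset : fromMBSz;

    /// Hypothetical helper: reads one line in the console's current input
    /// code page and returns it as a stripped UTF-8 string.
    string readLocalizedLine()
    {
        // readln() returns the raw code page bytes; fromMBSz converts
        // them to UTF-8 using the code page reported by the console.
        return fromMBSz(toStringz(readln()), GetConsoleCP()).strip();
    }
}
else
{
    /// Assumption: outside Windows the terminal input is already UTF-8.
    string readLocalizedLine()
    {
        return readln().strip();
    }
}
```

With this in place, the statement becomes response = readLocalizedLine();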

And if I initialize the console with a SetConsoleCP(65001) statement, things get worse, as I've said above. Then the readln() statement returns an empty string and something gets broken inside the runtime, because any further readln() statements do not wait for user input and return empty strings immediately.
