Re: ICU D Wrapper

Trent Forkert via Digitalmars-d Sat, 13 Dec 2014 09:31:23 -0800

On Saturday, 13 December 2014 at 15:44:59 UTC, Sean Kelly wrote:

> On Friday, 12 December 2014 at 17:57:41 UTC, Trent Forkert wrote:
>> I've looked into writing a binding for ICU recently, but ultimately decided to abandon that idea in favor of writing a replacement for it in D.
>
> Wow... really? You're actually going to write transcoders for all available encodings? Plus the conversion and parsing tools, plus expand our calendar functionality to handle the things it doesn't do now, plus... I mean I'd love it, but the scope of the project can be measured in tens of man-years.


Running down the icu4c API listing:

* Basic Types and Constants - only as needed
* Strings and character iteration - Just use D strings, std.string
* Unicode character properties and names - I think std.uni handles this
* Sets of Unicode Code Points and Strings - ditto
* Codepage conversion - ignoring, at least for now. See below.
* Unicode text compression - again, I think std.uni handles this
* Locales - yes
* Resource Bundles - will offer equivalent functionality, just not identical
* Normalization - std.uni
* Calendars - see below
* Date and time formatting - yes
* Message formatting - yes
* Number formatting / spell-out - yes
* Transliteration - yes, but may be delayed until after initial release
* Bidirectional Algorithm - not at first, is this in std.uni?
* Arabic shaping - not at first, is this in std.uni?
* Collation - I'm delaying this until after the initial release to get it out faster
* String searching - depends on Collation
* Index characters - depends on Collation
* Text Boundary analysis - depends on Collation
* Regular Expression - use std.regex
* StringPrep - not initially, is this in std.uni?
* IDNA - not initially, is this in Phobos?
* Identifier spoofing and confusability - not initially
* Layout engine - delayed, looks like ICU is removing this and pointing to another library
* Universal Time Scale - see below
* ICU I/O - use Phobos
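
For the entries above that defer to std.uni, here is a minimal sketch of what "just use Phobos" looks like for normalization, code point sets, and character iteration. It assumes a reasonably recent Phobos; the sample strings are illustrative only.

```d
import std.range : walkLength;
import std.uni : byGrapheme, isAlpha, NFC, NFD, normalize, unicode;

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
    string decomposed = "e\u0301";

    // Normalization: compose to U+00E9 and decompose back again.
    string composed = normalize!NFC(decomposed);
    assert(composed == "\u00E9");
    assert(normalize!NFD(composed) == decomposed);

    // Character properties.
    assert(isAlpha('\u00E9'));

    // Sets of code points: membership test against a script set.
    auto cyrillic = unicode.Cyrillic;
    assert(cyrillic['\u0434']);  // CYRILLIC SMALL LETTER DE
    assert(!cyrillic['d']);

    // Character iteration: two code points, one user-perceived character.
    assert(decomposed.byGrapheme.walkLength == 1);
}
```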

There are very few things above that are not possible to generate from CLDR data. Of those, most are RFC-defined algorithms, several of which I believe are already part of Phobos.

If I add codepage conversion, it will likely be in terms of iconv on POSIX and MultiByteToWideChar and friends on Windows. Alternatively, I could "borrow" the IBM CDRA/UCM data the way I'm getting almost everything else from CLDR data.
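
As a rough illustration of the POSIX half of that idea, the sketch below wraps iconv to decode bytes from a named codepage into a D string. The helper name `toUtf8` and the buffer sizing are assumptions made for the example, not a design decision; the three iconv functions are declared by hand with their POSIX signatures. It is POSIX-only, and on systems where iconv is not part of the C library it may need to be linked against libiconv.

```d
import std.exception : enforce;
import std.string : toStringz;

// POSIX iconv API, declared by hand for this sketch.
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

/// Hypothetical helper: decode `input`, encoded in `codepage`, to UTF-8.
string toUtf8(const(ubyte)[] input, string codepage)
{
    auto cd = iconv_open("UTF-8", codepage.toStringz);
    enforce(cd != cast(iconv_t) -1, "unsupported codepage: " ~ codepage);
    scope (exit) iconv_close(cd);

    // Generous fixed-size output buffer; a real implementation would
    // grow the buffer and retry when iconv reports E2BIG.
    auto output = new char[](input.length * 4 + 4);

    // iconv advances the pointers and counters it is given,
    // so hand it a mutable copy of the input.
    auto inbuf = input.dup;
    char* inPtr = cast(char*) inbuf.ptr;
    size_t inLeft = inbuf.length;
    char* outPtr = output.ptr;
    size_t outLeft = output.length;

    auto rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    enforce(rc != cast(size_t) -1, "conversion failed");

    // Everything before the unused tail of the buffer is the result.
    return cast(string) output[0 .. $ - outLeft];
}

void main()
{
    // 0xE9 is "é" in ISO-8859-1.
    ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    assert(toUtf8(latin1, "ISO-8859-1") == "caf\u00E9");
}
```

A Windows counterpart would go through MultiByteToWideChar instead, which produces UTF-16 rather than UTF-8, so the two backends would need a common return type on top.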

Support of other calendar systems is up in the air at the moment. I had thought CLDR contained what I needed, but it looks like it might not. It has locale-specific formatting and display info for calendars, and mappings to when other calendars' eras begin in terms of the Gregorian calendar, but I don't see further breakdown of information. So, initially it looks like I'll only be supporting the Gregorian calendar, but I may add the others in the future.
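
To make the limitation concrete, here is a small sketch of what era-start data alone allows. The `Era` and `EraYear` types and the `eraYear` function are hypothetical, and the two Japanese era dates are typed in by hand for illustration rather than read from CLDR. It works only because the Japanese calendar reuses Gregorian months and days; a calendar with its own month and leap rules needs arithmetic that era-start dates do not provide.

```d
import std.datetime : Date;

/// Hypothetical record: an era name and its first day, in Gregorian terms.
struct Era
{
    string name;
    Date start;
}

/// Hypothetical result: the era a date falls in and the year within it.
struct EraYear
{
    string name;
    int year;
}

/// Looks up the era containing `date`. `eras` must be sorted by start date.
/// The calendar year in which an era begins counts as its year 1.
EraYear eraYear(const(Era)[] eras, Date date)
{
    foreach_reverse (ref era; eras)
    {
        if (date >= era.start)
            return EraYear(era.name, date.year - era.start.year + 1);
    }
    throw new Exception("date precedes the first known era");
}

void main()
{
    // Illustrative values only.
    auto eras = [
        Era("Showa", Date(1926, 12, 25)),
        Era("Heisei", Date(1989, 1, 8)),
    ];

    auto r = eraYear(eras, Date(2014, 12, 13));
    assert(r.name == "Heisei" && r.year == 26);

    // The day before Heisei began still belongs to Showa.
    auto s = eraYear(eras, Date(1989, 1, 7));
    assert(s.name == "Showa" && s.year == 64);
}
```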

It is a lot of work, yes, but the Unicode Consortium already does a significant chunk of it with CLDR.


 - Trent
