Hi,

> The problem isn't the number of the year, but that C/C++ and its
> standard libs neglected advanced string processing aside char (even
> wchar is kind of a step child even in todays C++ programming) for a
> long time, so you are always reliant on some advanced lib that
> supports this (non-trivial, if correctly done) encoding stuff (QString
> is an excellent example, and in my eyes still a reference for how
> string classes should be done), or you had to roll your own, using
> what little support C is able to give. Should get better with C++0x,
> but for source-highlight I wouldn't count on it, as it will take a
> while until it's available on most platforms and installations.
I abandoned C++ almost a decade ago, and nowadays I use mostly OCaml. Nevertheless, the situation in the two languages is similar in the sense that the string type in the core language is not encoding-aware. OCaml users have got used to relying on external libraries whenever they need encoding-aware handling of UTF-8.

> That may be the case, but you still need some non-standard
> infrastructure around it to make UTF-8 string processing work
> properly, and usually that's nothing that you do in one evening for
> your home-brew projects (not meant to slag you, Lorenzo ;-)).

Yes, I would not recommend that Lorenzo implement his own UTF-8 handling functions either. And even if he is reluctant to link against yet another library, perhaps he can just copy+paste the required code if the license allows it. This latter solution is feasible because for many applications all that is required is one or two UTF-8-specific functions, such as strlen.

> One problem, aside from strlen() (without which it's IMHO hard to
> write any string processing at all), is how to determine which type
> the string literal in your code is, or which encoding the file you're
> processing has.

Or you can simply expect the caller of Source-highlight's core functions to supply the encoding and/or to always provide UTF-8 strings. This simplifies things tremendously.

> Never heard of any environment using UTF-32 seriously. And UTF-16 I
> know mostly from VFAT and NTFS... However, my experience in this field
> is limited.

UTF-32 is used by some people who prefer to deal with fixed-length encodings. And if your application requires frequent access to arbitrary character positions, then it may be worth paying the price in extra memory in exchange for O(1) access.

Cheers,
Dario Teixeira

_______________________________________________
Help-source-highlight mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-source-highlight
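P.S. To illustrate how small the UTF-8-specific code can be: a code-point-counting strlen reduces to skipping continuation bytes. This is only a sketch (the name `utf8_strlen` is mine, and it assumes the input is already valid UTF-8, with no validation performed):

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string. In UTF-8, every
// continuation byte has the form 10xxxxxx, so counting the bytes
// that do NOT match that pattern counts the code points.
// Assumes valid UTF-8 input; does not validate the encoding.
std::size_t utf8_strlen(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80)  // not a continuation byte
            ++count;
    }
    return count;
}
```

For example, the 6-byte sequence "h\xC3\xA9llo" ("héllo") yields 5, since "\xC3\xA9" is a single two-byte code point.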
