Angus Leeming wrote:
> Peter Kümmel wrote:
>> Abdelrazak Younes wrote:
>>> Peter Kümmel wrote:
>>>> Peter Kümmel wrote:
>>>>
>>>>> for values which are not surrogates "if (ch >= UNI_SUR_HIGH_START &&
>>>>> ch <= UNI_SUR_LOW_END)" (2047 values)
>>>> read: only 2047 of the 65535 values are not allowed, and for the rest
>>>> a cast transforms from utf32 to utf16.
>>> I think QChar will automatically replace those with interrogation marks
>>> anyway.
>>>
>>> But I could also check for these values explicitely in my conversion
>>> routine and return this '?' characters for those unknown characters:
>>>
>>> char_type const UNI_SUR_HIGH_START 0xD800;
>>> char_type const UNI_SUR_LOW_END 0xDFFF;
>>>
>>> QChar const UnknownChar(...);
>>>
>>> QChar const ucs4_to_qchar(char_type const & ucs4)
>>> {
>>>     if (ucs4 >= 0xFFFE
>>>         || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END)
>>>         return UnknownChar;
>>>
>>>     return QChar(static_cast<unsigned short>(ucs4));
>>> }
>>>
>>> Abdel.
>>
>> Could we not replace the current implementation of
>>
>> unsigned short ucs4_to_ucs2(boost::uint32_t c)
>>
>> with such a inline implementation, because iconv must
>> in principle do the same.
>>
>> char_type const UNI_REPLACEMENT_CHAR 0x0000FFFD
>> char_type const UNI_SUR_HIGH_START 0xD800;
>> char_type const UNI_SUR_LOW_END 0xDFFF;
>>
>> unsigned short ucs4_to_ucs2(boost::uint32_t ucs4)
>> {
>>     if (ucs4 >= 0xFFFE || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <=
>>         UNI_SUR_LOW_END))
>>         return UnknownChar;
>>
>>     return static_cast<unsigned short>(ucs4);
>> }
>>
>> compare with
>> http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
>
> I think iconv does it already. See the "Unicode on Mac" thread and links
> to ucs4internal.h. If I know Lars, he's beavering away benchmarking all
> these ideas. Give him a little time to do his day job too ;-)
>
> Angus

I've tried ucs4_to_ucs2 with the -INTERNAL arguments, but that did not
resolve the problem. Maybe more needs to change than just the conversion
strings.

Peter
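
For reference, the inline conversion proposed in the quoted text could look
roughly like the sketch below. It is only a sketch under a few assumptions:
it uses <cstdint> types in place of boost::uint32_t and char_type so that it
compiles on its own, it adds the '=' and the closing parenthesis that the
quoted snippets leave out, and it returns UNI_REPLACEMENT_CHAR (0xFFFD) where
the quoted code returns the undefined UnknownChar. It is not the code that is
actually in the tree.

```cpp
#include <cstdint>

// Constants named as in the quoted snippets; the values are the ones given there.
const std::uint32_t UNI_REPLACEMENT_CHAR = 0x0000FFFD;
const std::uint32_t UNI_SUR_HIGH_START   = 0xD800;
const std::uint32_t UNI_SUR_LOW_END      = 0xDFFF;

// Map one UCS-4 code point to a single UCS-2 unit.
// Surrogate values and everything from 0xFFFE upwards cannot be
// represented, so they are mapped to the replacement character.
inline std::uint16_t ucs4_to_ucs2(std::uint32_t ucs4)
{
    if (ucs4 >= 0xFFFE
        || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
        return static_cast<std::uint16_t>(UNI_REPLACEMENT_CHAR);

    // All remaining values fit into 16 bits, so the cast is lossless.
    return static_cast<std::uint16_t>(ucs4);
}
```

With this version, ucs4_to_ucs2(0x41) stays 0x41, while 0xD800 and 0x1F600
both come back as 0xFFFD. Whether iconv with the -INTERNAL encodings already
gives the same results is exactly the open question in this thread.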