Angus Leeming wrote:
> Peter Kümmel wrote:
>> Abdelrazak Younes wrote:
>>> Peter Kümmel wrote:
>>>> Peter Kümmel wrote:
>>>>
>>>>> for values which are not surrogates "if (ch >= UNI_SUR_HIGH_START &&
>>>>> ch <= UNI_SUR_LOW_END)" (2048 values)
>>>> read: only 2048 of the 65536 values are not allowed, and for the rest
>>>> a cast transforms from UTF-32 to UTF-16.
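A plain cast from UTF-32 to UTF-16 is only exact for code points in the BMP (up to U+FFFF). Code points above that need a surrogate pair, which is what ConvertUTF.c does. The following is a minimal sketch in standard C++ of that scheme for a single code point. The helper `ucs4_to_utf16` is hypothetical, not part of LyX or Qt, and returning U+FFFD for invalid input is an assumed policy.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from LyX or Qt): encode one UCS-4 code point as
// UTF-16. Lone surrogates and values above U+10FFFF become U+FFFD.
std::vector<std::uint16_t> ucs4_to_utf16(std::uint32_t cp)
{
    const std::uint32_t sur_high_start = 0xD800;
    const std::uint32_t sur_low_start = 0xDC00;
    const std::uint32_t sur_low_end = 0xDFFF;
    const std::uint16_t replacement = 0xFFFD;

    if (cp >= sur_high_start && cp <= sur_low_end)
        return {replacement};                      // surrogate code points are not characters
    if (cp <= 0xFFFF)
        return {static_cast<std::uint16_t>(cp)};   // BMP: the cast is exact
    if (cp > 0x10FFFF)
        return {replacement};                      // outside the Unicode code space

    cp -= 0x10000;                                 // 20 significant bits remain
    return {static_cast<std::uint16_t>(sur_high_start + (cp >> 10)),   // high 10 bits
            static_cast<std::uint16_t>(sur_low_start + (cp & 0x3FF))}; // low 10 bits
}
```

For example, U+1F600 encodes as the pair 0xD83D 0xDE00. A UCS-2-only routine such as the `ucs4_to_ucs2` discussed in this thread has no way to express that pair, so it has to fall back to a replacement character for those values.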
>>> I think QChar will automatically replace those with question marks
>>> anyway.
>>>
>>> But I could also check for these values explicitly in my conversion
>>> routine and return a '?' character for those unknown characters:
>>>
>>> char_type const UNI_SUR_HIGH_START = 0xD800;
>>> char_type const UNI_SUR_LOW_END = 0xDFFF;
>>>
>>> QChar const UnknownChar(...);
>>>
>>> QChar const ucs4_to_qchar(char_type const & ucs4)
>>> {
>>>     if (ucs4 >= 0xFFFE
>>>         || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
>>>         return UnknownChar;
>>>
>>>     return QChar(static_cast<unsigned short>(ucs4));
>>> }
>>>
>>> Abdel.
>>>
>>>
>>
>>
>> Could we not replace the current implementation of
>>
>> unsigned short ucs4_to_ucs2(boost::uint32_t c)
>>
>> with such an inline implementation, because iconv must
>> in principle do the same?
>>
>> char_type const UNI_REPLACEMENT_CHAR = 0x0000FFFD;
>> char_type const UNI_SUR_HIGH_START = 0xD800;
>> char_type const UNI_SUR_LOW_END = 0xDFFF;
>>
>> unsigned short ucs4_to_ucs2(boost::uint32_t ucs4)
>> {
>>      if (ucs4 >= 0xFFFE
>>          || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
>>          return UNI_REPLACEMENT_CHAR;
>>
>>      return static_cast<unsigned short>(ucs4);
>> }
>>
>> compare with
>> http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
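A self-contained sketch of the proposed inline `ucs4_to_ucs2`, free of Qt and Boost so it can be compiled and compared on its own. It assumes `char_type` is a 32-bit unsigned type (the `typedef` is an assumption, not LyX's definition) and uses `std::uint16_t` in place of `unsigned short`. It has not been benchmarked against iconv.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: char_type is a 32-bit UCS-4 code unit.
typedef std::uint32_t char_type;

char_type const UNI_REPLACEMENT_CHAR = 0x0000FFFD;
char_type const UNI_SUR_HIGH_START = 0xD800;
char_type const UNI_SUR_LOW_END = 0xDFFF;

// Map a UCS-4 value to a single UCS-2 unit. Anything UCS-2 cannot hold
// (U+FFFE, U+FFFF, values beyond the BMP, surrogate code points) becomes U+FFFD.
inline std::uint16_t ucs4_to_ucs2(char_type ucs4)
{
    if (ucs4 >= 0xFFFE
        || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
        return static_cast<std::uint16_t>(UNI_REPLACEMENT_CHAR);
    return static_cast<std::uint16_t>(ucs4);  // BMP value: the cast is lossless
}
```

With this sketch, `ucs4_to_ucs2(0xD7FF)` passes through unchanged, while `ucs4_to_ucs2(0xD800)` and `ucs4_to_ucs2(0x1F600)` both yield 0xFFFD. Returning the replacement character as a `uint16_t`, rather than a `QChar`, keeps the function usable outside the Qt frontend.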
> 
> I think iconv does it already. See the "Unicode on Mac" thread and links
> to ucs4internal.h. If I know Lars, he's beavering away benchmarking all
> these ideas. Give him a little time to do his day job too ;-)
> 
> Angus
> 
> 

I've tried ucs4_to_ucs2 with the -INTERNAL arguments, but that did not resolve
the problem. Maybe more needs to change than just the conversion strings.

Peter
