Re: [Lcms-user] Getting the correct unicode name from a v2 profile

Paul Miller Mon, 18 Mar 2013 06:16:38 -0700

On 18 Mar 2013, at 09:06, Lee Badham wrote:

> Hi,
> 
> I'm trying to get a correctly encoded unicode name from an ICC v2 profile.
> 
> The name as displayed in ColorSync Utility is correct - Japan Color コート紙
> 
> (Using the LCMS2MBS plugin)
> 
>      n= p.ReadTag(LCMS2MBS.kcmsSigProfileDescriptionTag)
> 
> either
>      se = n.getUnicode("en", n.kNoCountry)
> or
>      st = n.getUnicode("", n.kNoCountry)
> 
> I need a UTF8 encoded string from the description.
> 
> I've also tried using a LCMS1 description tag but that is wrong too.
> 
> What encoding do the Unicode descriptions have?
The answer to that is somewhat complicated.


It depends on the definition of wchar_t on your system.

According to the ICC spec (v4.3), the Unicode strings are UTF-16BE encoded.  
There is no BOM. 

lcms reads the UTF-16BE data from the profile, converts it to native byte order 
and then stores the result in an array of wchar_t.
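
As a rough illustration of that conversion (this is not lcms's actual code, and 
utf16be_to_wchar_units is a hypothetical helper name), a minimal sketch in C 
could look like this. It assumes the input is raw UTF-16BE bytes with no BOM 
and puts one 16-bit code unit in each wchar_t element:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of lcms: decode `nbytes` bytes of UTF-16BE
 * (no BOM) into `out`, one 16-bit code unit per wchar_t element.
 * Returns the number of code units written. Surrogate pairs are left as two
 * separate elements; nothing is combined into a single code point. */
static size_t utf16be_to_wchar_units(const unsigned char *bytes, size_t nbytes,
                                     wchar_t *out, size_t out_cap)
{
    assert(bytes != NULL && out != NULL);

    size_t n = nbytes / 2;              /* a trailing odd byte is ignored */
    if (n > out_cap) n = out_cap;       /* clip to the output capacity */

    for (size_t i = 0; i < n; i++) {
        /* High byte first: this is the big-endian to native-order step. */
        out[i] = (wchar_t)(((unsigned)bytes[2 * i] << 8) | bytes[2 * i + 1]);
    }
    return n;
}
```

Because the value is assembled arithmetically from the two bytes, the same code 
works on little-endian and big-endian hosts, and with 16-bit or 32-bit wchar_t.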

If wchar_t is 16 bits wide on your system, the result is UTF-16 data in the 
native byte order.

If wchar_t is 32 bits wide (as it is on Mac OS X), the result is UTF-16 data 
stored one code unit per 32-bit word.  This is almost UTF-32, except that 
characters outside the Basic Multilingual Plane don't fit into a single UTF-16 
unit and are still encoded as surrogate pairs: two 16-bit values, each in its 
own 32-bit word.
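
Since the original question asks for UTF-8, here is a minimal sketch in plain C 
that turns such a wchar_t array into UTF-8, whatever the width of wchar_t. 
wchar_utf16_to_utf8 is a hypothetical helper, not an lcms function. It assumes 
each element holds one UTF-16 code unit, combines surrogate pairs, and replaces 
unpaired surrogates with U+FFFD:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper, not part of lcms: convert `n` UTF-16 code units, stored
 * one per wchar_t element, to UTF-8. Writes at most `cap` bytes including the
 * terminating NUL and returns the number of bytes written, excluding the NUL.
 * Output is truncated at a character boundary if `out` is too small. */
static size_t wchar_utf16_to_utf8(const wchar_t *in, size_t n,
                                  char *out, size_t cap)
{
    assert(in != NULL && out != NULL && cap > 0);

    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = (uint32_t)in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            (uint32_t)in[i + 1] >= 0xDC00 && (uint32_t)in[i + 1] <= 0xDFFF) {
            /* Combine a high and a low surrogate into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10)
                         + ((uint32_t)in[i + 1] - 0xDC00);
            i++;
        } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;    /* unpaired surrogate or out-of-range value */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + len >= cap) break;      /* keep room for the final NUL */

        if (len == 1) {
            out[o++] = (char)cp;
        } else if (len == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

The input would be the buffer filled by cmsMLUgetWide, with the element count 
derived from the returned byte length. A buffer of four bytes per input element 
plus one for the NUL is always large enough.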

On Mac OS X you can get the description out with:

static NSString* NSStringFromMLU( const cmsMLU* mlu, const char languageCode[3], const char countryCode[3] ) {
    
    // With a NULL buffer, cmsMLUgetWide returns the required size in bytes.
    cmsUInt32Number bufferSize = cmsMLUgetWide( mlu, languageCode, countryCode, NULL, 0);
    
    wchar_t* wcharBuffer = (wchar_t*)malloc(bufferSize);
    cmsUInt32Number len = cmsMLUgetWide( mlu, languageCode, countryCode, wcharBuffer, bufferSize);
    size_t numEntries = len / sizeof(wchar_t);
    
    // Drop the terminating NUL if the reported length includes it,
    // so it does not end up as a character in the NSString.
    if (numEntries > 0 && wcharBuffer[numEntries - 1] == 0) {
        numEntries--;
    }
    
    // lcms returns UTF-16 data in an array of wchar_t.  Unfortunately on Mac OS X,
    // wchar_t is 32 bits, so convert the buffer to UTF-16 (or convert the buffer
    // contents to UTF-32, which may shorten the buffer).

    // Copy wchar_t elements to a uint16_t array to make UTF-16 data:
    uint16_t* utf16Buffer = (uint16_t*)malloc(numEntries * sizeof(uint16_t));
    for (size_t i = 0; i < numEntries; i++) {
        utf16Buffer[i] = (uint16_t)wcharBuffer[i];
    }
    
    // Specify endianness explicitly: NSUTF16StringEncoding expects a BOM or
    // big-endian data.  The units are in native byte order, which is assumed
    // to be little-endian here.
    NSString* s = [[NSString alloc] initWithBytes: utf16Buffer length: numEntries*sizeof(uint16_t) encoding: NSUTF16LittleEndianStringEncoding];
    
    free( utf16Buffer);
    free( wcharBuffer);
    
    return s;
}

(real code would check for errors)

- Paul



> 
> Lee Badham
> 
> www.bodoni.co.uk | www.presssign.com
> 
> 
> 
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_mar
> _______________________________________________
> Lcms-user mailing list
> Lcms-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/lcms-user
