On 18 Mar 2013, at 09:06, Lee Badham wrote:
> Hi,
>
> I'm trying to get a correctly encoded unicode name from an ICC v2 profile.
>
> The name as displayed in ColorSync Utility is correct - Japan Color コート紙
>
> (Using the LCMS2MBS plugin)
>
> n= p.ReadTag(LCMS2MBS.kcmsSigProfileDescriptionTag)
>
> either
> se = n.getUnicode("en", n.kNoCountry)
> or
> st = n.getUnicode("", n.kNoCountry)
>
> I need a UTF8 encoded string from the description.
>
> I've also tried using a LCMS1 description tag but that is wrong too.
>
> What encoding does the Unicode descriptions have?
The answer to that is somewhat complicated: it depends on the definition of
wchar_t on your system.
According to the ICC spec (V4.3), the Unicode strings are UTF-16BE encoded,
with no BOM.
lcms reads the UTF-16BE data from the profile, converts it to native byte
order, and then stores the result in an array of wchar_t.
If wchar_t is 16 bits wide on your system, the result is UTF-16 data in the
native byte order.
If wchar_t is 32 bits wide (as it is on Mac OS X), the result is UTF-16 code
units stored one per 32-bit word. This is almost UTF-32, except that
characters outside the Basic Multilingual Plane don't fit into a single
UTF-16 code unit and are encoded as a surrogate pair, i.e. two 16-bit values,
each sitting in its own 32-bit word.
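Since you asked for UTF-8: if the wchar_t array really does hold one UTF-16
code unit per element as described above, the conversion can be done by hand
in plain C. The following is only a sketch under that assumption.
utf16_in_wchar_to_utf8 is a hypothetical helper name, not an lcms function,
and replacing unpaired surrogates with U+FFFD is my own choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper (not part of lcms). Converts `count` UTF-16 code units,
   stored one per wchar_t, into NUL-terminated UTF-8 in `out`.
   Returns the number of bytes written (excluding the NUL), or (size_t)-1 if
   `out` is too small. Conversion stops at the first zero code unit. */
size_t utf16_in_wchar_to_utf8(const wchar_t *in, size_t count,
                              char *out, size_t outSize)
{
    size_t o = 0;
    if (outSize == 0) return (size_t)-1;
    for (size_t i = 0; i < count; i++) {
        /* Keep only the low 16 bits: each element is assumed to be one
           UTF-16 code unit, whatever the width of wchar_t. */
        uint32_t cp = (uint32_t)in[i] & 0xFFFFu;
        if (cp == 0) break;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the following low surrogate. */
            uint32_t lo = (i + 1 < count) ? ((uint32_t)in[i + 1] & 0xFFFFu) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            } else {
                cp = 0xFFFD;   /* unpaired high surrogate */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;       /* unpaired low surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need + 1 > outSize) return (size_t)-1;
        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

An output buffer of 3 bytes per input element plus one for the NUL is always
enough, since a surrogate pair uses two elements and yields 4 bytes.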
On Mac OS X you can get the description out with:
static NSString* NSStringFromMLU( const cmsMLU* mlu, const char languageCode[3],
                                  const char countryCode[3] ) {
    // Call once with a NULL buffer to get the required size in bytes.
    cmsUInt32Number bufferSize = cmsMLUgetWide( mlu, languageCode, countryCode,
                                                NULL, 0 );
    if (bufferSize == 0) return nil;
    wchar_t* wcharBuffer = (wchar_t*)malloc( bufferSize );
    if (wcharBuffer == NULL) return nil;
    cmsUInt32Number len = cmsMLUgetWide( mlu, languageCode, countryCode,
                                         wcharBuffer, bufferSize );
    size_t numEntries = len / sizeof(wchar_t);
    // The returned byte count appears to include the terminating NUL; drop it
    // so that it does not end up as a character in the NSString.
    if (numEntries > 0 && wcharBuffer[numEntries - 1] == 0) numEntries--;

    // lcms returns UTF16 data in an array of wchar_t. Unfortunately on Mac OS X,
    // wchar_t is 32 bits, so convert the buffer to UTF16 (or convert the buffer
    // contents to UTF32, which may shorten the buffer).
    // Copy wchar_t elements to a uint16_t array to make UTF16 data:
    uint16_t* utf16Buffer = (uint16_t*)malloc( numEntries * sizeof(uint16_t) );
    if (utf16Buffer == NULL) { free( wcharBuffer ); return nil; }
    for (size_t i = 0; i < numEntries; i++) {
        utf16Buffer[i] = (uint16_t)wcharBuffer[i];
    }

    // Specify endianness explicitly: plain UTF16 expects a BOM or big-endian
    // data, but lcms hands back native byte order (little-endian on Intel Macs).
    NSString* s = [[NSString alloc] initWithBytes: utf16Buffer
                                           length: numEntries * sizeof(uint16_t)
                                         encoding: NSUTF16LittleEndianStringEncoding];
    free( utf16Buffer );
    free( wcharBuffer );
    return s;
}
(real code would check for errors)
- Paul
>
> Lee Badham
>
> www.bodoni.co.uk | www.presssign.com
> _______________________________________________
> Lcms-user mailing list
> Lcms-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/lcms-user