Are you running MapServer on Windows or Linux?
If you are on Linux, try setting the LC_ALL environment variable to
something reasonable for your setup, like en_US.UTF-8, and see if
things improve. (Note that Linux locales use UTF-8, not UTF-16.) On
Windows, do something similar in the regional settings in Control Panel.

On Linux you can get more detail by looking at the locale man page.
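
A quick way to see which locale is actually in effect is a minimal C
check like the one below (a sketch only, assuming a POSIX system; this
is not part of MapServer):

#include <locale.h>
#include <stdio.h>

int main(void)
{
  /* Passing "" makes setlocale() pick the locale up from the
     LC_ALL / LC_CTYPE / LANG environment variables. */
  const char *loc = setlocale(LC_ALL, "");
  printf("active locale: %s\n", loc ? loc : "(could not be set)");
  return 0;
}

If this prints "C" or "POSIX", the environment is not supplying a
UTF-8 locale.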

HTH,
Umberto


On 2/20/07, Russell de Grove <[EMAIL PROTECTED]> wrote:
I have map layers in ArcSDE on SQL Server 2005 and I have been trying to
label features from a field with Unicode data (type nvarchar).

To get around the "Unknown SDE column type" error I had to add the
following to the sdeGetRecord method in mapsde.c, in the
"switch(itemdefs[i].sde_type)" block:

#ifdef SE_NSTRING_TYPE
    case SE_NSTRING_TYPE:
      shape->values[i] = (char *)malloc((itemdefs[i].size + 1) *
                                        sizeof(unsigned short));
      status = SE_stream_get_nstring(sde->stream,
                                     (short) (i+1),
                                     (unsigned short *)shape->values[i]);
      if(status == SE_NULL_VALUE)
        ((unsigned short *)shape->values[i])[0] = (unsigned short)0; /* empty string */
      else if(status != SE_SUCCESS) {
        sde_error(status, "sdeGetRecord()", "SE_stream_get_nstring()");
        return(MS_FAILURE);
      }
      break;
#endif

So far, so good, but I only see the first character of each label.  If I
explicitly include a Unicode "preamble" (a byte-order mark), I see two
garbage characters followed by the first expected character.  As it
happens, my data is in UTF-16 and my characters are all ASCII-range
characters that use only the low byte.  I believe what is causing my
problem is the "msGetEncodedString" method in mapgd.c:

char *msGetEncodedString(const char *string, const char *encoding)
{
#ifdef USE_ICONV
  iconv_t cd = NULL;
  char *in, *inp;
  char *outp, *out = NULL;
  size_t len, bufsize, bufleft, status;
  cd = iconv_open("UTF-8", encoding);
  if(cd == (iconv_t)-1) {
    msSetError(MS_IDENTERR, "Encoding not supported by libiconv (%s).",
               "msGetEncodedString()", encoding);
    return NULL;
  }
  len = strlen(string);

  /* Problem point: strlen will return the count up to the first null
     byte, so "Shape #0" as Unicode will return 1 for the "S" stored
     little-endian, or 3 if a Unicode "preamble" is used. */

  bufsize = len * 4;
  in = strdup(string);
  inp = in;
  out = (char*) malloc(bufsize);
  if(in == NULL || out == NULL){
    msSetError(MS_MEMERR, NULL, "msGetEncodedString()");
    msFree(in);
    iconv_close(cd);
    return NULL;
  }
  strcpy(out, in);
  outp = out;

  bufleft = bufsize;
  status = -1;
  while (len > 0){
    status = iconv(cd, (const char**)&inp, &len, &outp, &bufleft);

    /* Problem point: since this expects byte pairs, a byte length of
       1 or 3 is going to cause problems. */

    if(status == -1){
      msFree(in);
      msFree(out);
      iconv_close(cd);
      return strdup(string);

      /* Problem point: since there was a problem, strdup returns the
         original "string" up to the first null byte... so I get "S",
         possibly with a couple of preceding garbage characters if I
         used a preamble. */

    }
  }
  out[bufsize - bufleft] = '\0';

  msFree(in);
  iconv_close(cd);

  return out;
#else
  msSetError(MS_MISCERR, "Not implemented since Iconv is not enabled.",
             "msGetEncodedString()");
  return NULL;
#endif
}

Has anyone else encountered similar problems? Does anyone know how I can
determine the correct width of characters based on the "encoding" parameter?
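
One possible direction (a sketch only, not tested against MapServer;
the helper name msGetRawStringLength and the matching on the encoding
name are mine): compute the input length in units that match the
encoding before calling iconv(), since strlen() stops at the first
zero byte and UTF-16 text is full of them.

#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp() */

/* Hypothetical helper: length of "string" in bytes, counting 16-bit
   units for UTF-16/UCS-2 so embedded zero bytes don't truncate the
   count. The result is what iconv() expects for inbytesleft. */
static size_t msGetRawStringLength(const char *string, const char *encoding)
{
  if(strncasecmp(encoding, "UTF-16", 6) == 0 ||
     strncasecmp(encoding, "UCS-2", 5) == 0) {
    const unsigned short *p = (const unsigned short *) string;
    size_t units = 0;
    while(p[units] != 0)  /* scan for a 16-bit null terminator */
      units++;
    return units * 2;     /* bytes, not code units */
  }
  return strlen(string);  /* byte-oriented encodings are fine as-is */
}

With something like this, "len = strlen(string);" in msGetEncodedString
would become "len = msGetRawStringLength(string, encoding);", and the
encoding passed in would have to name the byte order explicitly (e.g.
"UTF-16LE") so iconv() doesn't have to guess from a preamble.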
