Hi
As per the code documentation in the method msGetEncodedString (as shown
below), the characters are assumed to be UTF-8 by default.
char *msGetEncodedString(const char *string, const char *encoding)
{
---
if (len == 0 || (encoding && strcasecmp(encoding, "UTF-8")==0))
return strdup(string); /* Nothing to do: string already in UTF-8 */
Whereas the ‘values’ property of shapeObj.cs in C# uses
System.Runtime.InteropServices.Marshal.PtrToStringAnsi (to marshal characters
from C to C#). Shouldn’t it be using the
System.Runtime.InteropServices.Marshal.PtrToStringUni method, since the
characters are held in UTF-8 encoding by default?
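For reference, PtrToStringUni expects UTF-16 data, so it would not decode a UTF-8 buffer either. A minimal sketch of one alternative is to copy the raw bytes and decode them explicitly as UTF-8. The class and method names below are hypothetical and not part of MapScript, and the sketch assumes the native string is NUL-terminated UTF-8:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class Utf8Marshal
{
    // Hypothetical helper (not part of MapScript): reads a NUL-terminated
    // UTF-8 C string from native memory and decodes it to a .NET string.
    public static string PtrToStringUtf8(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero) return null;

        // Walk the buffer until the terminating NUL byte to find its length.
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0) length++;

        // Copy the raw bytes out of native memory and decode them as UTF-8.
        byte[] buffer = new byte[length];
        Marshal.Copy(ptr, buffer, 0, length);
        return Encoding.UTF8.GetString(buffer);
    }
}
```

Whether this could replace the PtrToStringAnsi call in the generated wrapper depends on how the SWIG typemaps are set up, which I have not checked.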
Thanks
Murty
From: Tamas Szekeres [mailto:[email protected]]
Sent: Wednesday, March 04, 2009 5:25 PM
To: Murty Maganti
Cc: [email protected]
Subject: Re: [mapserver-users] Encoding issues
Hi,
I don't know much about the Hindi character sets.
I guess you could extend that byte-array-to-string copy function to arbitrary
character sizes; for double bytes, something like:
for (int i = 0; i < bytes.Length; i=i+2)
s.Append(Convert.ToChar(bytes[i] + 256*bytes[i+1]));
Best regards,
Tamas
2009/3/4 Murty Maganti <[email protected]>
Hi Tamas
This is still not working for some of the Asian languages.
I suspect the issue could be in this line of your sample code below
s.Append(Convert.ToChar(bytes[i]));
Here, a single byte is converted to a character. But my understanding
is that UTF-8 can use from 1 to 4 bytes to represent one character code
point. It worked fine for Arabic, maybe because all Arabic characters can be
represented using a single byte.
When I tried the same code below with Hindi, an Indian language, some of the
characters (but not all) are shown as junk. I guess the characters
that occupy more than one byte turned out to be junk.
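As a quick check of the byte widths involved, here is a minimal sketch (the class name is hypothetical) that counts the UTF-8 bytes for a piece of text. Note that in UTF-8 an Arabic letter takes two bytes; the single-byte representation only holds for a code page such as CP1256, which is what the earlier sample converted to.

```csharp
using System;
using System.Text;

static class Utf8Widths
{
    // Number of bytes UTF-8 needs to encode the given text.
    public static int ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}

// Usage:
//   Utf8Widths.ByteCount("A")       returns 1 (ASCII)
//   Utf8Widths.ByteCount("\u0642")  returns 2 (Arabic letter qaf)
//   Utf8Widths.ByteCount("\u0915")  returns 3 (Devanagari letter ka)
```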
I am also trying the opposite of the sample code below, i.e. reading a field value
from MapServer (shapeObj.values), which is in Hindi, and showing it on a web page;
again it turns out to be junk. I tried to look at the byte values of the string
in VS by using
Byte[] bites = Encoding.Unicode.GetBytes(shapeObj.values[0]);
I notice that they are actually UTF-8 byte values but interpreted as UTF-16,
which may be the reason I see junk characters on the web page. But I don’t know how
to extract those UTF-8 byte values from UTF-16.
I am just giving sample code here to explain
// The text is in Hindi, an Indian language
byte[] utf16 = Encoding.Unicode.GetBytes("कीचनर");
byte[] utf8 = Encoding.UTF8.GetBytes("कीचनर");
shapeObj shape = layer.getFeature(result.shapeindex, result.tileindex);
// This contains the same text (in Hindi) as above, read from the shapefile
string value = shape.values[1];
// These are the byte values of the characters, encoded as UTF-16;
// .NET internally stores all strings in UTF-16
byte[] bytes = Encoding.Unicode.GetBytes(value);
Now if I examine the values of the ‘utf8’, ‘bytes’ and ‘utf16’ arrays:
utf8 – 224,164,149,224,165,128,224,164,154,224,164,168,224,164,176
bytes – 224,0,164,0,34,32,224,0,165,0,172,32,224,0,164,0,97,1,224,0,164,0,168,0,224,0,164,0,176,0
utf16 – 21,9,64,9,26,9,40,9,48,9
The first byte value is the same as in UTF-8. The second byte value is 0, as UTF-16 takes
at least 2 bytes per character. This gives me the impression that the byte values
are in UTF-8 and are not converted to UTF-16 by .NET.
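One possible reading of the ‘bytes’ array: each UTF-8 byte was decoded on its own through a single-byte ANSI code page. The pairs that are not "byte, 0" fit Windows-1252 (149 becomes U+2022, 128 becomes U+20AC, 154 becomes U+0161). Assuming that is what happened, a minimal sketch of undoing it is to re-encode the string with the same code page and decode the result as UTF-8. The helper name is hypothetical, and code page 1252 is an assumption inferred from the sample values, not something confirmed from the MapScript source:

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: 'ansi' must be the code page that was used when
    // the native bytes were (wrongly) decoded one byte per character.
    public static string RedecodeAsUtf8(string misdecoded, Encoding ansi)
    {
        // Recover the original bytes by reversing the ANSI decoding...
        byte[] original = ansi.GetBytes(misdecoded);
        // ...then decode those bytes as what they really are: UTF-8.
        return Encoding.UTF8.GetString(original);
    }
}

// Usage (assumes the ANSI code page is 1252; on newer .NET runtimes the
// code pages encoding provider must be registered before GetEncoding(1252)):
//   string text = MojibakeRepair.RedecodeAsUtf8(shape.values[1],
//                                               Encoding.GetEncoding(1252));
```

This only works if every byte survives the round trip through the code page, so it is a workaround rather than a replacement for marshaling the value as UTF-8 in the first place.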
I would appreciate it if you could let me know of any solution for this.
Thanks
Murty
From: Tamas Szekeres [mailto:[email protected]]
Sent: Friday, February 06, 2009 6:59 PM
To: Murty Maganti
Cc: [email protected]
Subject: Re: [mapserver-users] Encoding issues
You might have to do an explicit conversion manually, something like:
// I actually get this (in Arabic) through user input
string value = "لققافعععىىةةونه";
byte[] bytes = Encoding.Convert(Encoding.Unicode,
    Encoding.GetEncoding(1256), Encoding.Unicode.GetBytes(value));
StringBuilder s = new StringBuilder();
for (int i = 0; i < bytes.Length; i++)
    s.Append(Convert.ToChar(bytes[i]));
shpObj.text = s.ToString();
Best regards,
Tamas
2009/2/6 Murty Maganti <[email protected]>
Hi
I am doing a simple thing. I have a map file and am trying to show some static
text in Arabic on the map. You can try this with any map file, as it has nothing to
do with the layers in the map.
At run time (like on a button click), please add this:
layerObj lyr = new layerObj(mapObj);
lyr.name = "TextAcetate";
lyr.status = mapscript.MS_ON;
lyr.type = MS_LAYER_TYPE.MS_LAYER_ANNOTATION;
lyr.labelcache = mapscript.MS_TRUE;
double locationX = 50;
double locationY = 50;
lyr.transform = (int)mapscript.MS_FALSE;
classObj layerClass = new classObj(lyr);
//All label properties
layerClass.label.size = 15;
layerClass.label.type = MS_FONT_TYPE.MS_TRUETYPE;
…
…
layerClass.label.encoding = "CP1256";
shapeObj shpObj = new shapeObj((int)MS_SHAPE_TYPE.MS_SHAPE_POINT);
lineObj lnObj = new lineObj();
pointObj pt = new pointObj(locationX, locationY, 0, 0);
lnObj.add(pt);
shpObj.add(lnObj);
// I actually get this (in Arabic) through user input
shpObj.text = "لققافعععىىةةونه";
lyr.addFeature(shpObj);
mapObj.draw(); //Onto a picture box or save as file
(In the map file, my output format is set to GD/PNG)
Please let me know if you need more information.
Thanks
Murty
From: [email protected]
[mailto:[email protected]] On Behalf Of Tamas Szekeres
Sent: Friday, February 06, 2009 4:12 PM
To: Murty Maganti
Cc: [email protected]
Subject: Re: [mapserver-users] Encoding issues
Please send me your example so that I could examine what's going on.
Best regards,
Tamas
2009/2/6 Murty Maganti <[email protected]>
Hi
I tried with the suggested encoding but still no success.
From the output below, I guess ICONV support is included.
E:\Utils\MapServer\Map Server 5.2 RC\ms4w\Apache\cgi-bin>mapserv -v
MapServer version 5.2.0 OUTPUT=GIF OUTPUT=PNG OUTPUT=JPEG OUTPUT=WBMP OUTPUT=PDF
OUTPUT=SWF OUTPUT=SVG SUPPORTS=PROJ SUPPORTS=AGG SUPPORTS=FREETYPE
SUPPORTS=ICONV SUPPORTS=FRIBIDI SUPPORTS=WMS_SERVER SUPPORTS=WMS_CLIENT
SUPPORTS=WFS_SERVER SUPPORTS=WFS_CLIENT SUPPORTS=WCS_SERVER SUPPORTS=SOS_SERVER
SUPPORTS=FASTCGI SUPPORTS=THREADS SUPPORTS=GEOS SUPPORTS=RGBA_PNG INPUT=JPEG
INPUT=POSTGIS INPUT=OGR INPUT=GDAL INPUT=SHAPEFILE
Where can I get some details on how to build the C# mapscript (managed assembly
only) from Visual Studio, keeping all the unmanaged DLLs from the ms4w binaries? I
just want to try using MarshalAsAttribute.
Thanks
Murty
From: Tamas Szekeres [mailto:[email protected]]
Sent: Friday, February 06, 2009 3:02 PM
To: Murty Maganti
Cc: [email protected]
Subject: Re: [mapserver-users] Encoding issues
Hi,
You might want to try encoding="ISO-8859-6", assuming you have libiconv
compiled in.
The C# mapscript doesn't specify an explicit conversion during marshaling. In
this case I assume a Unicode to CharSet.Ansi conversion automatically
takes place by default.
Best regards,
Tamas
2009/2/6 Murty Maganti <[email protected]>
Hello
I am having some issues using Arabic text as labels. I am using C# MapScript.
I am setting the following at run time:
labelObj label = classObj.label;
label.encoding = "CP1256";
label.text = "some text in Arabic"; (At run time in VS, I can see the text is
actually in Arabic)
But the labels are displayed as '?????'.
Is there any conversion I need to do before setting the text value? How are
strings represented in the underlying mapscript DLL (ASCII or Unicode)? As
I was reading in the MSDN, the default marshalling uses LPStr, which is
single-byte ASCII. Does it mean that I first need to convert from Unicode to ASCII
in C# before setting the value?
Appreciate any help.
Thanks
Murty
_______________________________________________
mapserver-users mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/mapserver-users