Hello, list.

Recently, I noticed that ArcGIS software (at least since version 10.3) can 
produce shapefiles where the DBF file is encoded with UTF-16.
https://desktop.arcgis.com/en/arcmap/latest/extensions/production-mapping/converting-a-geodatabase-to-shapefiles.htm

But they have made it difficult to do so, since you need the "Production
Mapping" license. Without that license, shapefiles are written in UTF-8 by
default; one can select some other code page by modifying the system registry
setting dbfDefault, but there doesn't seem to be any setting that produces
UTF-16.

I have never encountered a shapefile in UTF-16, but I am beginning to wonder
whether we ought to support them. I guess they would be more space-efficient
for languages like Chinese and Japanese, where most characters need three bytes
in UTF-8 but only two in UTF-16. This could be important since DBF reserves
only 10 bytes for field names: room for three such characters in UTF-8, but
five in UTF-16.
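To make the space argument concrete, here is a small Python sketch (not part of
any GDAL or ArcGIS API) that counts encoded bytes and how many leading
characters of a name fit in a 10-byte field. It assumes UTF-16 is stored
without a byte order mark, i.e. as UTF-16LE. The Japanese field name is a
made-up example.

```python
# Compare how many bytes a CJK string needs in UTF-8 vs UTF-16.
# Assumption: UTF-16 is written without a BOM ("utf-16-le"); Python's plain
# "utf-16" codec would prepend 2 extra bytes.

def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded length of `text` in UTF-8 and in BOM-less UTF-16LE."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
    }

def max_chars_fitting(text: str, encoding: str, limit: int = 10) -> int:
    """Return how many leading characters of `text` fit in `limit` bytes
    without splitting a character's byte sequence."""
    count = 0
    used = 0
    for ch in text:  # iterates over code points, so surrogate pairs stay whole
        size = len(ch.encode(encoding))
        if used + size > limit:
            break
        used += size
        count += 1
    return count

name = "東京都人口密度"  # hypothetical field name: 7 BMP CJK characters
print(encoded_sizes(name))
print(max_chars_fitting(name, "utf-8"), max_chars_fitting(name, "utf-16-le"))
```

For this name the sizes are 21 bytes in UTF-8 and 14 in UTF-16LE. Within a
10-byte limit, 3 characters fit in UTF-8 and 5 fit in UTF-16LE. For ASCII the
trade-off reverses: 10 characters fit in UTF-8 but only 5 in UTF-16.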

Some questions:

Can the OGR Shape driver handle UTF-16?
More generally, are there many GIS systems that can handle UTF-16 in shapefiles?
Or perhaps I should just ask: has anyone ever seen a shapefile in UTF-16?
If so, would the content of the CPG file always be UTF-16LE, always UTF-16BE,
or just UTF-16?
I suppose the only things encoded in UTF-16 would be the values of String
fields, plus the field names?
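On the CPG question, there is one practical difference between the labels. In
Python, the explicit "UTF-16LE" and "UTF-16BE" codecs write no byte order mark.
Plain "UTF-16" prepends a 2-byte BOM in the machine's native byte order, and
that BOM would eat into a fixed-width DBF field. The sketch below uses a
hypothetical helper, encode_field, which just looks up the CPG label as a
Python codec name. This is an assumption for illustration only, not how ArcGIS
or OGR is known to interpret the file.

```python
import codecs

def encode_field(value: str, cpg_label: str) -> bytes:
    """Hypothetical helper: encode a DBF string value with the codec named by
    a CPG label. Treating the label as a Python codec name is an assumption."""
    codec = codecs.lookup(cpg_label.strip())  # LookupError if the label is unknown
    encoded, _consumed = codec.encode(value)
    return encoded

for label in ("UTF-16LE", "UTF-16BE", "UTF-16"):
    data = encode_field("Å", label)
    # Plain "UTF-16" output starts with a BOM, so it is 2 bytes longer.
    print(label, data.hex(), len(data))
```

The two explicit labels each give 2 bytes for "Å" (c500 and 00c5). Plain
"UTF-16" gives 4 bytes, and its byte order depends on the platform. That
ambiguity is presumably one reason a shapefile writer would prefer to state
the byte order in the CPG file.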

(I also wonder whether shapefiles in UTF-16 are a good idea, or whether the GIS
community ought to just forget about them, but I guess there is no definite
answer to that!)

Kind regards,

Mikael Rittri
Carmenta Geospatial Technologies
_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
