Yes, character sets should have been unified from the beginning of the computer
age...
Codepages are essentially an IBM PC / Windows thing, as described by
Wikipedia. Fairly complete lists are available in the external links, e.g.
http://msdn.microsoft.com/en-us/library/ms776446.aspx.
For us the important issue is being able to use the shapefiles correctly and
easily in different software packages. To my knowledge, the codepage file
('.cpg') is read by all ESRI software and by many other products as well, and
it can easily be updated if needed...
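To make the '.cpg' idea concrete, here is a minimal sketch of writing such a sidecar file from Java. It assumes the file holds nothing but an encoding name on a single line and that the reader accepts Java's canonical charset names (e.g. "UTF-8"); both are assumptions that should be checked against the ESRI article. The class and method names (CpgWriter, writeCpg) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CpgWriter {
    /**
     * Writes <basename>.cpg next to the given .shp path and returns the
     * path of the sidecar file. The content is the charset's canonical
     * Java name (an assumption about what readers expect).
     */
    static Path writeCpg(Path shp, Charset charset) throws IOException {
        String name = shp.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Strip the extension, if any, to get the shapefile base name.
        String base = dot > 0 ? name.substring(0, dot) : name;
        Path cpg = shp.resolveSibling(base + ".cpg");
        Files.write(cpg, charset.name().getBytes(StandardCharsets.US_ASCII));
        return cpg;
    }
}
```

Calling writeCpg(Paths.get("roads.shp"), StandardCharsets.UTF_8) would leave a roads.cpg containing "UTF-8" beside the shapefile.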
Best Regards
________________________________
Andreas Oxenstierna
Direct phone 040-16 70 17
Mobile 0734-12 80 17
[EMAIL PROTECTED]
SWECO Position AB
Hans Michelsensgatan 2
Box 286
201 22 Malmö
Phone 040-16 70 00
www.sweco.se
-----Original Message-----
From: Andrea Aime [mailto:[EMAIL PROTECTED]
Sent: 19 November 2008 10:14
To: Oxenstierna Andreas
Cc: Andrea Aime (JIRA); [email protected]
Subject: Re: [Geoserver-devel] [jira] Created: (GEOS-2399) Need a way to specify
the encoding of shapefiles generated with SHAPE-ZIP output format
Oxenstierna Andreas wrote:
> Great enhancement for all non-A-Z languages.
>
> How will the encoding be stored in the DBF-file?
> ESRI has two ways of doing this, either storing LDID in the DBF header
> or creating a textfile <filename>.cpg which stores the codepage.
> See
> http://support.esri.com/index.cfm?fa=knowledgebase.techarticles.articleShow&d=26015
Hum, not sure we can use either of these... In particular, Java has no notion
of what a codepage is; it only knows about Locale and Charset, and these
basically go with the standard encoding names such as ISO-8859-xx or the
UTF-8/16/32 family.
For reading shapefiles with foreign characters we already allow the user to
specify the encoding that way, and for writing we would do the same, but how to
turn a java.nio.charset.Charset into a codepage number is something I don't know.
By quickly looking around with Google I've found this library
(http://cpdetector.sourceforge.net/) that does the opposite: it guesses the
encoding based on the file contents. It's called Code Page detector, but in
fact it returns a java.nio.charset.Charset.
Looking further, I've found this post
(http://forums.sun.com/thread.jspa?messageID=10372122) where someone states
that the codepage concept is not supported by Java, as it's something Windows
specific.
There was some discussion about codepage support in OGR, not sure how it turned
out:
http://article.gmane.org/gmane.comp.gis.gdal.devel/8710
So it seems that to pull this off we'd first need to build a conversion table
from codepages to encodings, provided that is even possible.
Seems like quite a bit of long boring work...
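To give an idea of what such a table might look like, here is a rough sketch going from a Charset to a Windows code page identifier. The handful of entries are taken from Microsoft's published code page identifier list and are far from complete; the class and method names (CodePages, toCodePage) are hypothetical, and the DBF LDID byte would need its own, separate table that is not covered here.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

public class CodePages {
    // Canonical Java charset name -> Windows code page identifier.
    // Deliberately partial: just enough entries to show the idea.
    private static final Map<String, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put("US-ASCII", 20127);
        TABLE.put("ISO-8859-1", 28591);
        TABLE.put("ISO-8859-2", 28592);
        TABLE.put("ISO-8859-15", 28605);
        TABLE.put("UTF-8", 65001);
        TABLE.put("IBM437", 437);
        TABLE.put("IBM850", 850);
    }

    /** Returns the code page for the charset, or empty if it is not in the table. */
    static OptionalInt toCodePage(Charset cs) {
        String name = cs.name();
        Integer cp = TABLE.get(name);
        if (cp != null) {
            return OptionalInt.of(cp);
        }
        // windows-1250 .. windows-1258 carry the code page number in the name.
        if (name.matches("windows-125[0-8]")) {
            return OptionalInt.of(Integer.parseInt(name.substring("windows-".length())));
        }
        return OptionalInt.empty();
    }
}
```

Returning an empty OptionalInt for unknown charsets leaves it to the caller to decide whether to skip the codepage hint or refuse to write the file.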
Cheers
Andrea
PS: more info about code pages here:
http://en.wikipedia.org/wiki/Code_page
--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.