Re: [Geoserver-devel] [jira] Created: (GEOS-2399) Need a way to specify the encoding of shapefiles generated with SHAPE-ZIP output format

Oxenstierna Andreas Wed, 19 Nov 2008 01:48:31 -0800

Yes, character sets should have been unified from the beginning of the computer 
age... 
Code pages are essentially an IBM PC / Windows thing, as described by
Wikipedia. Fairly complete lists are available in the article's external links, e.g.
http://msdn.microsoft.com/en-us/library/ms776446.aspx.
For us, the important issue is being able to use the shapefiles correctly in
different software packages. To my knowledge, the code page file (.cpg) is read by
all ESRI software and by many other programs as well, and it can easily be edited if needed...
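A minimal Java sketch of what writing such a .cpg sidecar could look like, assuming the file holds a single code page token (for example "UTF-8" or "1252") and shares the base name of the .shp/.dbf files. The class and method names (CpgSidecar, write, read) are hypothetical and are not part of GeoServer or GeoTools.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for the ".cpg" sidecar file that sits next to a shapefile.
// Assumption: the .cpg contains one token such as "UTF-8" or "1252".
final class CpgSidecar {
    private CpgSidecar() {}

    // Derive "<basename>.cpg" from the path of the .shp (or .dbf) file.
    static Path cpgPathFor(Path shpOrDbf) {
        String name = shpOrDbf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return shpOrDbf.resolveSibling(base + ".cpg");
    }

    // Write the token; it is plain ASCII, so US-ASCII is enough for the file itself.
    static Path write(Path shpOrDbf, String codepageToken) throws IOException {
        Path cpg = cpgPathFor(shpOrDbf);
        Files.write(cpg, codepageToken.getBytes(StandardCharsets.US_ASCII));
        return cpg;
    }

    // Read the token back, ignoring surrounding whitespace or a trailing newline.
    static String read(Path shpOrDbf) throws IOException {
        byte[] raw = Files.readAllBytes(cpgPathFor(shpOrDbf));
        return new String(raw, StandardCharsets.US_ASCII).trim();
    }
}
```

Because the sidecar is plain text, a user can fix a wrong value by hand without touching the .dbf itself.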
 
Best Regards
 
________________________________


Andreas Oxenstierna

Direct phone 040-16 70 17
Mobile 0734-12 80 17
[EMAIL PROTECTED]
SWECO Position AB

Hans Michelsensgatan 2
Box 286
201 22 Malmö
Phone 040-16 70 00
www.sweco.se

-----Original Message-----
From: Andrea Aime [mailto:[EMAIL PROTECTED]
Sent: 19 November 2008 10:14
To: Oxenstierna Andreas
Cc: Andrea Aime (JIRA); [email protected]
Subject: Re: [Geoserver-devel] [jira] Created: (GEOS-2399) Need a way to specify
the encoding of shapefiles generated with SHAPE-ZIP output format

Oxenstierna Andreas wrote:
> Great enhancement for all non-A-Z languages.
> 
> How will the encoding be stored in the DBF-file?
> ESRI has two ways of doing this: either storing an LDID in the DBF header
> or creating a text file <filename>.cpg which stores the code page.
> See
> http://support.esri.com/index.cfm?fa=knowledgebase.techarticles.articleShow&d=26015
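For the LDID option, a sketch of reading the language driver ID from a .dbf header, assuming the standard dBASE layout in which byte offset 29 of the 32-byte fixed header holds the LDID. The class name DbfLdid is hypothetical, and its LDID-to-code-page table is deliberately partial (three common IDs only), not the full ESRI list.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: read the language driver ID (LDID) from a dBASE (.dbf) header.
final class DbfLdid {
    private DbfLdid() {}

    // In the dBASE header, byte offset 29 holds the language driver ID.
    static final int LDID_OFFSET = 29;

    static int readLdid(Path dbf) throws IOException {
        try (InputStream in = Files.newInputStream(dbf)) {
            byte[] header = new byte[32];
            int read = 0;
            // InputStream.read may return fewer bytes than asked, so loop.
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    throw new IOException("DBF header shorter than 32 bytes");
                }
                read += n;
            }
            return header[LDID_OFFSET] & 0xFF; // treat the byte as unsigned
        }
    }

    // Partial, illustrative mapping of a few LDIDs to Windows code page numbers.
    // Returns 0 for IDs not covered by this table.
    static int toCodepage(int ldid) {
        switch (ldid) {
            case 0x01: return 437;  // DOS USA
            case 0x02: return 850;  // DOS Multilingual
            case 0x03: return 1252; // Windows ANSI
            default:   return 0;
        }
    }
}
```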

Hmm, not sure we can use either of these... In particular, Java has no notion of
what a code page is; it only knows about Locale and Charset, and both basically
use the standard encoding names such as ISO-8859-xx or the UTF-8/16/32 family.

For reading shapefiles with foreign characters we already allow the user to
specify the encoding that way, and for writing we would do the same, but I don't
know how to turn a java.nio.charset.Charset into a code page number.
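One possible answer to the Charset-to-code-page question is a hand-maintained table keyed on Java's canonical charset names. This is a sketch under stated assumptions: the numbers are the usual Windows code page identifiers (e.g. 1252, 28591 for ISO-8859-1, 65001 for UTF-8), and the class name CharsetToCodepage and the table contents are illustrative, not an existing GeoTools API.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Sketch: hand-built table from Java canonical charset names to Windows code page numbers.
final class CharsetToCodepage {
    private CharsetToCodepage() {}

    private static final Map<String, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put("IBM437", 437);
        TABLE.put("IBM850", 850);
        TABLE.put("windows-1250", 1250);
        TABLE.put("windows-1251", 1251);
        TABLE.put("windows-1252", 1252);
        TABLE.put("ISO-8859-1", 28591);
        TABLE.put("ISO-8859-2", 28592);
        TABLE.put("UTF-8", 65001);
    }

    // Charset.name() returns the canonical name, so aliases such as "latin1"
    // or "Cp1252" map to the same key once resolved with Charset.forName.
    static OptionalInt codepageOf(Charset cs) {
        Integer cp = TABLE.get(cs.name());
        return cp == null ? OptionalInt.empty() : OptionalInt.of(cp);
    }
}
```

Keying on the canonical name keeps the table small, since every alias a user types collapses to one entry.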

Looking around quickly with Google, I found this library
(http://cpdetector.sourceforge.net/) that does the opposite: it guesses the
encoding from the file contents. It's called Code Page detector, but in fact
it returns a java.nio.charset.Charset.
Looking further, I found this post
(http://forums.sun.com/thread.jspa?messageID=10372122) where someone states
that the code page concept is not supported by Java because it is Windows
specific.

There was some discussion about code page support in OGR; I'm not sure how it
turned out:
http://article.gmane.org/gmane.comp.gis.gdal.devel/8710

So it seems that to pull this off we'd first need to build a conversion table
from code pages to encodings, provided that is even possible.
That looks like quite a bit of long, boring work...
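Part of that table may already be implied by the names the JDK registers for its charsets, such as "windows-1252" or aliases like "cp850". A sketch, assuming those naming patterns (availability varies by JDK and installed charset modules), that probes for a matching Charset and special-cases UTF-8 (65001) and the ISO-8859 range (28591-28599). The class name CodepageToCharset is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Optional;

// Sketch: resolve a Windows code page number to a Charset by probing
// names the JDK commonly registers, instead of a full hand-written table.
final class CodepageToCharset {
    private CodepageToCharset() {}

    static Optional<Charset> charsetFor(int codepage) {
        // Identifiers that do not follow the "windows-NNNN" / "cpNNN" pattern.
        if (codepage == 65001) {
            return Optional.of(Charset.forName("UTF-8"));
        }
        if (codepage >= 28591 && codepage <= 28599) {
            return probe("ISO-8859-" + (codepage - 28590));
        }
        // Assumption: ANSI pages are known as "windows-NNNN",
        // OEM pages such as 437 or 850 as "cpNNN" aliases.
        Optional<Charset> cs = probe("windows-" + codepage);
        return cs.isPresent() ? cs : probe("cp" + codepage);
    }

    private static Optional<Charset> probe(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalArgumentException e) { // includes IllegalCharsetNameException
            return Optional.empty();
        }
    }
}
```

An empty result signals a code page this JDK cannot map, which a writer could report instead of silently falling back to a default.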

Cheers
Andrea

PS: more info about code pages here:
http://en.wikipedia.org/wiki/Code_page

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.


_______________________________________________
Geoserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geoserver-devel
