Hi,

As part of a AWEOC (*), following a recent email thread about year-long issues 
when 
importing shapefiles, I've committed a first version of a OSRFindMatches() 
function that 
iterates through the EPSG database to check for one or several matches between 
the input 
SRS and the catalog.

The shapefile driver now uses this function, instead of the more simple 
AutoIdentifyEPSG(), 
which only worked on a tiny subset of SRS. When a single match is found by 
OSRFindMatches() with an excellent match quality (meaning IsSame() returning 
TRUE), it is 
automatically picked up by the Shapefile driver.

OSRFindMatches() returns both a list of candidate SRS and a confidence value: 
0=bad, 
100=perfect . Currently if several candidates pass IsSame(), they will be 
returned with 90% 
confidence. If a SRS doesn't pass IsSame() but is the single one to have a 
PROJCS/GEOGCS 
name matching the input SRS name (name matching is done with a few hardcoded 
approximation rules), it is returned with 50% confidence.

OSRFindMatches() execution time is :
- first execution ever: around 500 ms. It will then create files in
~/.gdal/$(GDAL_VERSION_MAJOR).$(GDAL_VERSION_MINOR)/srs_cache to speed-up later 
executions
- first execution in a process, using the above cache: around 100ms
- later executions in the process: around 10 ms per SRS (55s to identify the 
5216 entries of 
GEOGCS and PROJCS from the EPSG database that are valid for GDAL)

So I feel this is good enough to be enabled by default in the shapefile driver.

I've tested & tuned the function:
- the 5216 above mentionned SRS exported from OGC WKT to ESRI WKT, and 
reimported 
from ESRI WKT to OGC WKT. 100% confidence single match is obtained for most 
SRS. Except 
occurences using Polar_Stereographic where ESRI WKT lose the scale_factor and 
other using 
Hotine_Oblique_Mercator where the rectified_grid_angle parametr is lost. But 
those are 
probably issues with morphToESRI() / morphFromESRI()
- the SRS definitions at
http://help.arcgis.com/en/arcgisserver/10.0/apis/rest/pcs.html . Results are 
less good there. 
Sometimes for good reason: no entry in the EPSG code, different projection or 
datum 
parameters, etc.

It could certainly be improved. For example by using instead of the back&white 
IsSame() 
method a (to-be-created) GetSimilarityLevel() method that would compare how 
different 2 
SRS are. More relaxed SRS name comparisons would also help (as the 'tokens' 
making SRS 
names are separated with underscore once 'massaged', you could split on that, 
and then 
compare how many tokens are in common (or have partial match), and sum the 
number of 
identical characters, etc..).


The gdalsrsinfo utility is also enhanced to use the new OSRFindMatches() 
function if you use 
the -e switch (for epsg).

For example, with some.prj containing: 
PROJCS["NAD_1983_StatePlane_Texas_Central_FIPS_4203_Feet",GEOGCS["GCS_North_Ame
rican_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",
6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",
0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easti
ng",2296583.333333333],PARAMETER["False_Northing",
9842500.0],PARAMETER["Central_Meridian",-100.3333333333333],PARAMETER["Standard_P
arallel_1",30.11666666666667],PARAMETER["Standard_Parallel_2",
31.88333333333333],PARAMETER["Latitude_Of_Origin",
29.66666666666667],UNIT["Foot_US",0.3048006096012192]]

$ gdalsrsinfo -e ESRI::some.prj

EPSG:2277

PROJ.4 : +proj=lcc +lat_1=31.88333333333333 +lat_2=30.11666666666667 
+lat_0=29.66666666666667 +lon_0=-100.3333333333333 +x_0=699999.9998983998 
+y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs 

OGC WKT :
PROJCS["NAD83 / Texas Central (ftUS)",
    GEOGCS["NAD83",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6269"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4269"]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",31.88333333333333],
    PARAMETER["standard_parallel_2",30.11666666666667],
    PARAMETER["latitude_of_origin",29.66666666666667],
    PARAMETER["central_meridian",-100.3333333333333],
    PARAMETER["false_easting",2296583.333],
    PARAMETER["false_northing",9842500.000000002],
    UNIT["US survey foot",0.3048006096012192,
        AUTHORITY["EPSG","9003"]],
    AXIS["X",EAST],
    AXIS["Y",NORTH],
    AUTHORITY["EPSG","2277"]]


It can also return several occurences if there isn't a single candidate

For example:
$ gdalsrsinfo  -e 
'PROJCS["undefined",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_
1989",SPHEROID["GRS 1980",6378137,298.257222101]],PRIMEM["Greenwich",
0],UNIT["degree",
0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origi
n",0],PARAMETER["central_meridian",27],PARAMETER["scale_factor",
0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",
1]]'

Confidence in this match: 90 %

EPSG:3047

PROJ.4 : +proj=utm +zone=35 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m 
+no_defs 

OGC WKT :
PROJCS["ETRS89 / UTM zone 35N (N-E)",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Reply via email to