Hello

On Wed, Nov 16, 2011 at 5:09 PM, Even Rouault <[email protected]> wrote:
> Etienne,
>
>> It seems that setting source srs is needed when using shapefiles, as
>> you said. This should be documented somewhere (probably on the
>> ogr2ogr page and/or shapefile driver page).
>
> Feel free to add a warning. Logically, this should be more in the shapefile
> driver page. But this assumes that people actually read docs, which is dubious
> ;-)
It would be nice to put it in the ogr2ogr page too, but I understand you
wouldn't want to put format-specific stuff in there. I'll update the
shapefile driver docs.

>
>> The more I use shapefiles the more I see the limitation in this file
>> format, and am quite puzzled as to why it is still so widespread...
>
> Yes, shapefiles suffer from a lot of deficiencies (limitations of the dbf
> format, no native - documented - spatial indexing, prj files, ...). You might
> experiment with spatialite, which is far more capable, but still less
> widespread.
>
>> Any other ideas on how we can fix this?
>> Here is how I think it could be done:
>>
>> 1- for all EPSG projections, generate its ESRI WKT (and perhaps a few
>> variations)
>> 2- make a mapping from ESRI WKT (or its hash) to EPSG codes
>> 3- use the hash mapping to find the EPSG code from a given WKT.
>>
>> Does this make sense?
>>
>> An obvious hurdle is that WKTs can have small variations.
>>
>> For example,
>>
>> EPSG:4618 as output by GDAL:
>> $ gdalsrsinfo -o wkt_esri EPSG:4618
>> GEOGCS["SAD69",DATUM["D_South_American_1969",SPHEROID["GRS_1967_Truncated",6378160,298.25]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
>>
>> whereas an example file (brazil.prj) has:
>> GEOGCS["SAD69",DATUM["D_South_American_1969",SPHEROID["GRS_1967_Modified",6378160,298.25]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
>>
>> however, GDAL can deal with these variations:
>> $ gdalsrsinfo -o wkt_esri ESRI::brazil.prj
>> GEOGCS["SAD69",DATUM["D_South_American_1969",SPHEROID["GRS_1967_Truncated",6378160,298.25]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
>
> The conversion between GDAL WKT and ESRI WKT belongs to the field of
> experimental science certainly. There are some known rules, but a lot of
> particular cases, some still remaining to be unearthed. The version of
> ogr_srs_esri.cpp in the 1.8-esri branch is far more complicated than the one
> in trunk.
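A minimal Python sketch of the three-step hash mapping quoted above, assuming the ESRI WKT strings have already been generated (for example with GDAL's osr bindings, which are left out so the sketch runs on its own). The function names are hypothetical, and the whitespace stripping plus rounding of numbers to 10 significant digits are added assumptions aimed at the significant-digit differences between ESRI and OGC WKT, not part of the original proposal. The usage at the end also illustrates Even's objection: an exact-key lookup cannot tell that GRS_1967_Modified and GRS_1967_Truncated describe the same spheroid.

```python
from __future__ import annotations

import hashlib
import re

# Numeric literals that are not part of a name such as "GRS_1967_Truncated".
_NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def normalize_wkt(wkt: str) -> str:
    """Remove whitespace and round every number to 10 significant digits."""
    compact = "".join(wkt.split())
    return _NUMBER.sub(lambda m: format(float(m.group()), ".10g"), compact)


def wkt_key(wkt: str) -> str:
    """Hash of the normalized WKT, used as the dictionary key (step 2)."""
    return hashlib.sha1(normalize_wkt(wkt).encode("utf-8")).hexdigest()


def build_index(esri_wkt_by_epsg: dict[int, str]) -> dict[str, int]:
    """Steps 1-2: map hash(ESRI WKT) -> EPSG code."""
    return {wkt_key(wkt): code for code, wkt in esri_wkt_by_epsg.items()}


def lookup_epsg(index: dict[str, int], wkt: str) -> int | None:
    """Step 3: look up a WKT; None when no entry has the same key."""
    return index.get(wkt_key(wkt))


# ESRI WKT for EPSG:4618 as printed by gdalsrsinfo in the message above.
SAD69 = ('GEOGCS["SAD69",DATUM["D_South_American_1969",'
         'SPHEROID["GRS_1967_Truncated",6378160,298.25]],'
         'PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]')
# The brazil.prj variant differs only in the spheroid name.
BRAZIL_PRJ = SAD69.replace("GRS_1967_Truncated", "GRS_1967_Modified")

index = build_index({4618: SAD69})
print(lookup_epsg(index, SAD69))       # 4618
print(lookup_epsg(index, BRAZIL_PRJ))  # None
```

The hash only pays off when the index is built once and queried many times; for a single query, a plain scan over the table does the same amount of work.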
Is this going to be merged into trunk eventually?

>
> As far as your above algorithm is concerned, I'm wondering how it could work
> with the variations you gave above. Perhaps a statistical approach with fuzzy
> string matching would give better results than something based on hashing ;-)
> More seriously, I think that a campaign of collecting a lot of .PRJ files
> (ideally coming from ESRI software, and not produced by GDAL) would be needed
> first to see which rules can work in practice.

I have been playing around a bit, and here is what I did that works (first
try):

- take a given CRS definition (say from EPSG or a .prj file) and find its
  ESRI WKT or "simple" WKT;
- for all the EPSG codes in pcs.csv and gcs.csv, get their ESRI (or simple)
  WKT and compare it to the target WKT;
- if a WKT matches, get the full WKT corresponding to the matching EPSG code.

The problem is that this is pretty inefficient, as you can imagine: it takes a
few seconds to find a single target.

A second iteration:

- generate the full WKT, ESRI WKT and "simple" (StripCT) WKT for all EPSG
  codes in pcs.csv and gcs.csv;
- save these to a flat (gzipped) CSV file;
- use this table to find the EPSG code that matches a given WKT (in whichever
  WKT flavor you need).

This is rather efficient in terms of processing time. I thought that a hashing
method could decrease the time to find a matching string, but probably not:
you have to load the entire dataset anyway, and building a hash index doesn't
make sense when you only scan it once.

This works for all EPSG codes I tried (think of it as a reverse EPSG lookup),
and also for a few .prj files. One problem I encountered was the difference in
significant digits between ESRI WKT and OGC WKT, so for now it works best when
morphing to ESRI WKT. I will file a bug about this concerning the shapefile
driver, and also incorporate this into the gdalsrsinfo utility (with a new
"EPSG" output).
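A sketch of the second iteration in Python, with a fuzzy fallback for the case where no exact match exists. The table layout (one row per EPSG code holding the full, ESRI and stripped WKT) follows the description above, but the column names, file name and functions are hypothetical; in practice the rows would presumably be filled from GDAL's osr bindings (ImportFromEPSG, ExportToWkt, MorphToESRI, StripCTParms), which is omitted so the sketch runs without GDAL. The fallback uses difflib.SequenceMatcher from the Python standard library as a stand-in for the "fuzzy string" approach discussed in this thread; it is not one of the algorithms mentioned there, and the 0.9 cutoff is an arbitrary assumption.

```python
from __future__ import annotations

import csv
import difflib
import gzip
import os
import tempfile
from typing import Iterable

# Hypothetical column layout: one row per EPSG code, one column per WKT flavor.
FIELDS = ["epsg", "wkt", "esri_wkt", "simple_wkt"]


def compact(wkt: str) -> str:
    """Drop all whitespace so line breaks in .prj files do not matter."""
    return "".join(wkt.split())


def write_table(path: str, rows: Iterable[dict[str, str]]) -> None:
    """Save the precomputed WKT flavors to a gzipped CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def find_epsg(path: str, target_wkt: str, flavor: str = "esri_wkt",
              cutoff: float = 0.9) -> tuple[list[int], bool]:
    """Return (codes, exact). Exact string matches win; otherwise fall back
    to the single closest row by difflib ratio, if it reaches cutoff."""
    target = compact(target_wkt)
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    exact = [int(r["epsg"]) for r in rows if compact(r[flavor]) == target]
    if exact:
        return exact, True
    # Fallback: one SequenceMatcher comparison per row, much slower.
    best_code, best_score = None, 0.0
    for r in rows:
        score = difflib.SequenceMatcher(None, target, compact(r[flavor])).ratio()
        if score > best_score:
            best_code, best_score = int(r["epsg"]), score
    if best_code is not None and best_score >= cutoff:
        return [best_code], False
    return [], False


# ESRI WKT for EPSG:4618 as printed by gdalsrsinfo earlier in the thread.
SAD69 = ('GEOGCS["SAD69",DATUM["D_South_American_1969",'
         'SPHEROID["GRS_1967_Truncated",6378160,298.25]],'
         'PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]')
# Typical ESRI .prj text for WGS 84, used only as a second illustrative row.
WGS84 = ('GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
         'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
         'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]')

with tempfile.TemporaryDirectory() as tmp:
    table = os.path.join(tmp, "epsg_wkt.csv.gz")
    # Only the ESRI flavor is filled in for this illustration.
    write_table(table, [
        {"epsg": "4618", "wkt": "", "esri_wkt": SAD69, "simple_wkt": ""},
        {"epsg": "4326", "wkt": "", "esri_wkt": WGS84, "simple_wkt": ""},
    ])
    exact_hit = find_epsg(table, SAD69)
    brazil_hit = find_epsg(table, SAD69.replace("Truncated", "Modified"))
    no_hit = find_epsg(table, 'LOCAL_CS["unknown"]')

print(exact_hit)   # ([4618], True)
print(brazil_hit)  # ([4618], False)
print(no_hit)      # ([], False)
```

Exact matches are returned as a list because several EPSG codes may share the same ESRI WKT. The fuzzy pass only runs when no exact match exists, since scoring every row is far more expensive than a string comparison, which matches the idea of using it only as a backup.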
Should I create a sandbox for an experimental gdalsrsinfo utility implementing
this idea?

I found a few "fuzzy string" algorithms floating around. The idea is not bad,
but it could be computationally expensive. It could serve as a fallback if
direct string matching fails.

>
> Another point to keep in mind is that the TOWGS84 parameters proposed by GDAL
> do not always make consensus. The GRASS developers are not particularly happy
> with that: they would prefer that a list of possible transformations be
> proposed when EPSG lists several of them, instead of just one being picked.
> See http://lists.osgeo.org/pipermail/gdal-dev/2011-September/030280.html

That's interesting too. So which is better: using the TOWGS84 parameters that
GDAL chooses, or using none at all (as happens in this case)?

Thanks,
Etienne

>
> Best regards,
>
> Even

_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
