Hi Sandro,

> I'm here to submit to your attention an alternative gml:xlink
> resolving method (GML driver) I've implemented for Tuscany Region.
As usual, very in-depth analysis and high quality patch. Here are my comments :

* #include "sqlite3.h" in ogr/ogrsf_frmts/gml/gmlreaderp.h is not appropriate
in that file, and should only be found in hugefileresolver.cpp. SQLite is a
very common dependency in most GDAL/OGR builds, but not a compulsory one. So
some adjustments will be needed in the GNUmakefile/makefile.vc to detect
whether the sqlite dependency is really available before trying to build
hugefileresolver.cpp and calling it from ogrgmldatasource.cpp.

* Did you consider just using the OGR API to abstract the access to the
temporary database, instead of using the sqlite API directly ? This would
allow other DB backends to be used, such as postgresql for example. Not a
strong enough objection, however, to justify reworking your patch. (A nice
side-effect would be that it would make build integration easy.)

* The GML_GFS_TEMPLATE config option is general purpose, and could be made
available even when xlink:href resolving is not requested.

* I feel you could completely remove the GML_HUGE_SQLITE_UNSAFE config option
and just make it TRUE. I don't see any reason why one would want recovery
guarantees for a temporary file. If the process is interrupted, I suppose the
full resolving must start again and the temp file will be recreated : correct ?

* Same for GML_HUGE_TEMPFILE. Why would one want to keep the temporary sqlite
DB ? Or perhaps keep it undocumented, for debugging purposes only, and make it
default to YES.

* A bit correlated with the above : the use of int* pbOutIsTempFile in
hugeFileResolver() is confusing, because (contrary to the resolvexlinks.cpp
case) it is not used as an output parameter, but as an input one. And I am not
certain it only does what is documented (i.e. deleting the temporary .sqlite).
I somehow feel it will also delete the .resolved.gml file. See
OGRGMLDataSource::~OGRGMLDataSource(). Perhaps I'm just confused...
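To illustrate the build-detection point, here is a rough GNUmakefile sketch.
The variable names (HAVE_SQLITE, SQLITE_INC) are assumptions for illustration;
the actual names exported by GDAL's configure may differ:

```make
# Hypothetical fragment for ogr/ogrsf_frmts/gml/GNUmakefile :
# only compile the huge-file resolver when configure detected SQLite.
# HAVE_SQLITE / SQLITE_INC are assumed names, not necessarily GDAL's.

ifeq ($(HAVE_SQLITE),yes)
OBJ := $(OBJ) hugefileresolver.o
CPPFLAGS := $(CPPFLAGS) -DHAVE_SQLITE $(SQLITE_INC)
endif
```

The call site in ogrgmldatasource.cpp would then be guarded with a matching
#ifdef HAVE_SQLITE, so that builds without sqlite still link.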
My understanding of how resolvexlinks.cpp works is the following : if for some
reason we cannot create a .resolved.gml file in the same directory as the
.gml, we try to generate a temporary file somewhere else, in which case
ResolveXlinks() will set *pbOutIsTempFile to TRUE, so that the destructor of
OGRGMLDataSource can clean it up when no longer used.

* I'm wondering if the HUGE resolver could be selected automatically when the
GML file is bigger than a threshold ( 100 MB ? ) (and if sqlite is available,
of course). The naming GML_SKIP_RESOLVE_ELEMS=HUGE sounds a bit awkward for
selecting a resolving algorithm, and is not very consistent with the currently
allowed values (ALL / NONE / comma separated values), but I also find the
current logic a bit twisted (GML_SKIP_RESOLVE_ELEMS=NONE means resolving all
elements...).

* In hugefileresolver.cpp, I think the hand-parsing of XML in
gmlHugeFileCheckXrefs() is unnecessary. The only caller of
gmlHugeFileCheckXrefs() does gmlText = CPLSerializeXMLTree((CPLXMLNode
*)pszNode) just before. Why not just explore the nodes through the CPLXMLNode
objects ? This would be much cleaner and more robust (in case the XML
serialization changes a bit), and likely a bit faster too.

* There might be security issues due to the use of char XXX[1024]. I see
quite a few strcpy() calls with no size checking. I'd suggest using
CPLStrlcpy().

* A few naming inconsistencies. The z in papszGeomList =
poFeature->GetGeometryList() should be removed because no strings are
involved. Same for pszNode = papszGeomList[i].

* At line 1793 of hugefileresolver.cpp, in VSIFPrintfL ( fp, " <%s>\%s</%s>\n",
the \ before %s looks wrong. And I see no XML escaping of the attribute value
(use CPLEscapeString()).

* I see special cases for GML topology geometries in your code. Do you have
small GML samples that illustrate that and could be used for regression
testing ?

Please open a ticket on GDAL Trac.

You mentioned the limitations of the XSD parser.
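To sketch the CPLXMLNode suggestion : GDAL's minixml represents a parsed tree
as nodes with a type (element / attribute / text), a value, a first child and
a next sibling. The snippet below uses a simplified stand-in struct of the
same shape (not GDAL's actual cpl_minixml.h) to show how xlink:href values
could be collected by walking the tree instead of re-parsing serialized text,
plus a minimal escaping helper in the spirit of CPLEscapeString(CPLES_XML) :

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for GDAL's CPLXMLNode (same shape, not the real header).
enum NodeType { CXT_Element, CXT_Attribute, CXT_Text };

struct XMLNode
{
    NodeType    eType;
    std::string osValue;   // tag name, attribute name, or text content
    XMLNode    *psChild;   // first child (attributes come first in GDAL)
    XMLNode    *psNext;    // next sibling
};

// Recursively collect the values of every xlink:href attribute in the tree.
// With the real CPLXMLNode the walk would look the same (psChild / psNext).
static void CollectXlinkHrefs(const XMLNode *psNode,
                              std::vector<std::string> &aosHrefs)
{
    for (; psNode != nullptr; psNode = psNode->psNext)
    {
        if (psNode->eType == CXT_Attribute && psNode->osValue == "xlink:href"
            && psNode->psChild != nullptr)
            aosHrefs.push_back(psNode->psChild->osValue);
        CollectXlinkHrefs(psNode->psChild, aosHrefs);
    }
}

// Minimal XML escaping, in the spirit of CPLEscapeString(..., CPLES_XML) :
// values written back with VSIFPrintfL() should pass through something
// like this first.
static std::string EscapeXML(const std::string &osIn)
{
    std::string osOut;
    for (char ch : osIn)
    {
        switch (ch)
        {
            case '<':  osOut += "&lt;";   break;
            case '>':  osOut += "&gt;";   break;
            case '&':  osOut += "&amp;";  break;
            case '"':  osOut += "&quot;"; break;
            default:   osOut += ch;
        }
    }
    return osOut;
}
```

This is only a sketch of the idea; the real code would of course keep using
CPLXMLNode and CPLEscapeString() rather than these stand-ins.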
That could be an interesting area for improvement... Downloading XSD via URLs
or supporting <xs:include> doesn't look like the most difficult part. The real
challenge is dealing with complex schemas (complex being everything that
doesn't fit into the Simple Feature Model of "flat" attributes).

Best regards,

Even

_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
