Thanks, Frank. I am pleased to hear that I have probably not missed much in my
analysis of the current GDAL/OGR facilities. I have not yet moved to 1.7, so it
may be that there is a little more in the driver now than I have seen in 1.6.3.
I started out thinking that I could concatenate these multiple objects into
simple strings ... until I discovered how long these could be. Admittedly I
opted to include the date along with the reason in the same string, but at 20 or
so characters per entry - and some of the other multiples involve rather more
characters per entry - I realised this was not realistic. My sample dataset
(and, eventually, I need to work with national coverage) has objects
comprising several thousand 20-character entries - yes, they could be processed
to become long long integers, but that is the same as holding the numeric part
as text! I decided that it would be easier to write a parser than to meddle with
the GML driver ... I think it probably was.
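For readers weighing the same choice, a streaming parse is one way to cope with files of this size. The following is only a minimal sketch using Python's standard library, not the parser described above: the namespace URI, the Collection/Feature layout and the function name iter_change_dates are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Placeholder namespace URI and element layout: assumptions for illustration,
# not taken from the real OSGB schema.
NS = "http://example.invalid/osgb"

SAMPLE = f"""<osgb:Collection xmlns:osgb="{NS}">
  <osgb:Feature>
    <osgb:changeDate>2004-12-19</osgb:changeDate>
    <osgb:changeDate>2002-09-07</osgb:changeDate>
  </osgb:Feature>
  <osgb:Feature>
    <osgb:changeDate>2001-03-12</osgb:changeDate>
  </osgb:Feature>
</osgb:Collection>""".encode("utf-8")


def iter_change_dates(stream):
    """Yield the list of changeDate values for each feature, one at a time."""
    feature_tag = f"{{{NS}}}Feature"
    date_tag = f"{{{NS}}}changeDate"
    # "end" events fire once an element and all of its children are complete.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == feature_tag:
            yield [d.text for d in elem.iter(date_tag)]
            # Drop the feature's children so memory use stays bounded.
            elem.clear()


per_feature = list(iter_change_dates(io.BytesIO(SAMPLE)))
print(per_feature)
```

Clearing each feature after use keeps its children out of memory, although the emptied elements stay attached to the root, so a real multi-gigabyte run would probably need to detach them as well.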
I do not mind admitting that, over the past couple of years, I have spent
several months' effort evaluating apparently promising solutions for ingesting
Mastermap. Snowflake's products are out of our budget. The crunch came this
spring when advice from Safe Software's support team suggested that I could only
solve the problems by using the FME API to extend the FME product's capability
to handle these multiple objects. It seemed to me that if I had to code my own
solution, I might as well use an API with which I was already familiar: writing
interfaces is not my main job!
I think that we need a strategic assessment of whether the OSGB usage of GML is
likely to be a 'one off' or whether others may also exploit the power of GML in
a similar fashion. Certainly in Europe where the INSPIRE Directive is beginning
to take effect and requires the delivery of interoperable data services there is
the possibility of others following suit: I am not familiar with the German NAS
work, so do not know how that might relate, if at all. If OSGB is a 'one off',
then the amount of work that would appear to be involved for GDAL/OGR is
possibly a show stopper; if, however, there is a need for a generic solution
which will solve the OSGB issues and support a range of other products, such
that there is a reasonably widespread application, then it is probably worth at
least a proper assessment of the effort involved.
Best wishes,
Peter
PS: my sample GML dataset covers just 50 km square and comes in at 2.8 GB: editing
constructs to avoid their being processed is not easy with files of that size!
Frank Warmerdam wrote:
On Thu, Jul 1, 2010 at 4:55 PM, Peter J Halls <[email protected]> wrote:
Jez, Even,
there are actually several issues relating to using GDAL/OGR to read
Ordnance Survey of Great Britain (OSGB) GML files distributed as their Mastermap
product. One of these I reported as bug #1604 - I now find that this was
against GDAL-1.4.0 - which concerns handling 'duplicate' tokens: OGR ignores
Namespace and so treats <osgb:point> as the starter and fouls on the following
<gml:point> which contains the geometry. There is a similar problem with
polygon objects.
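The collision described above can be illustrated with a small Python sketch. It is not OGR code: the fragment, the namespace URIs and the helper names are assumptions chosen only to show why two distinct elements become indistinguishable once the namespace is dropped.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the namespace URIs and layout are assumptions,
# not copied from a real Mastermap file.
SAMPLE = """
<osgb:Feature xmlns:osgb="http://example.invalid/osgb"
              xmlns:gml="http://example.invalid/gml">
  <osgb:point>
    <gml:point>
      <gml:coordinates>451000.0,452000.0</gml:coordinates>
    </gml:point>
  </osgb:point>
</osgb:Feature>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on each tag."""
    return tag.rsplit("}", 1)[-1]


def qualified_tags(xml_text):
    """Return every tag in document order, namespace included."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


def stripped_tags(xml_text):
    """Return every tag in document order with the namespace removed."""
    return [local_name(t) for t in qualified_tags(xml_text)]


qualified = qualified_tags(SAMPLE)
stripped = stripped_tags(SAMPLE)
print(qualified)  # four distinct namespace-qualified names
print(stripped)   # 'point' now appears twice
```

With the namespace kept, the wrapper and the geometry element have different names; with it stripped, a reader keyed on the bare name cannot tell which 'point' it has met.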
Peter,
Ouch, the namespace stripping issue is unfortunate. I'm not sure
of a cheap fix.
The data Jez describes below is simpler than much of the data in the file
I did implement some degree of "complex structure flattening" when
I worked on the custom NAS (German GML profile) reader. I thought
perhaps it had made it into the mainline GML reader, but perhaps not.
If so, I think it could be ported.
If someone files a ticket specifically on this issue I can try to address
this or perhaps more likely have Chaitanya do it as he is now getting
quite familiar with the GML driver.
Perhaps this next point is not an issue for him. Several of the tokens are
described in the schema snippet as 'unbounded': this means that there can be
several instances:
<osgb:changeHistory>
<osgb:changeDate>2004-12-19</osgb:changeDate>
<osgb:reasonForChange>Revised</osgb:reasonForChange>
<osgb:changeDate>2002-09-07</osgb:changeDate>
<osgb:reasonForChange>Revised</osgb:reasonForChange>
<osgb:changeDate>2001-03-12</osgb:changeDate>
<osgb:reasonForChange>New</osgb:reasonForChange>
</osgb:changeHistory>
Hmm, that is also somewhat ugly. OGR has the concept of a
string list field type, so in theory this could be reduced to two
string list fields:
changeHistory_changeDate: 2004-12-19, 2002-09-07,...
changeHistory_reasonForChange: Revised, Revised
I also thought I had done something like this for the NAS driver,
but perhaps it did not make it back into the mainstream GML
driver.
Likewise, if a focused ticket is filed, I'll turn this over to
Chaitanya.
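As a rough illustration of the string list idea, the sketch below collects the repeated children of a changeHistory element into one list per child name, using only Python's standard library. It is not the GML or NAS driver logic: the namespace URI and the function name flatten_repeats are assumptions, and in OGR itself the resulting lists would presumably be held in fields of type OFTStringList.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OSGB_NS = "http://example.invalid/osgb"  # placeholder URI (assumption)

SAMPLE = f"""
<osgb:changeHistory xmlns:osgb="{OSGB_NS}">
  <osgb:changeDate>2004-12-19</osgb:changeDate>
  <osgb:reasonForChange>Revised</osgb:reasonForChange>
  <osgb:changeDate>2002-09-07</osgb:changeDate>
  <osgb:reasonForChange>Revised</osgb:reasonForChange>
  <osgb:changeDate>2001-03-12</osgb:changeDate>
  <osgb:reasonForChange>New</osgb:reasonForChange>
</osgb:changeHistory>
"""


def flatten_repeats(parent):
    """Collect repeated child elements into one list per '<parent>_<child>' name."""
    parent_name = parent.tag.rsplit("}", 1)[-1]
    fields = defaultdict(list)
    for child in parent:
        child_name = child.tag.rsplit("}", 1)[-1]
        fields[f"{parent_name}_{child_name}"].append((child.text or "").strip())
    return dict(fields)


fields = flatten_repeats(ET.fromstring(SAMPLE))
for name, values in fields.items():
    print(f"{name}: {', '.join(values)}")
```

Note that the date and reason are paired only by their position in the two lists, so a missing element in the source would silently misalign them.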
I do not know whether the GDAL/OGR GML driver was designed primarily for
writing gml rather than for reading: maybe.
Where does this leave us? As I mentioned, there are also problems with most
other gml readers: this is not solely an issue with GDAL/OGR. I have an
immediate need for the ITN data and have written my own parser to extract the
information from the gml source: so far, so good. However, as I mentioned,
there is now the problem of how to store these data: shapefiles use the dBaseIV
format and have no structure for handling these multiple attributes. In a
sample dataset, I have a record with 49 changeHistory records, for example; some
other multiple constructs have several thousand entries. I happen to have
access to Oracle, although to use GDAL/OGR to write to it requires that I do
some significant work on the oci driver: I've been trying to understand the code
of that to assess what I can reasonably do. Alternatively, I could use oci
directly and bypass GDAL/OGR entirely. All this, however, is non-trivial and
holding me back from doing what I am supposed to be doing ... but does seem to
be the only way forward, having exhausted FME, etc.
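One conventional way to hold such repeating attributes in a relational store is a child table keyed on the feature identifier, with one row per entry. The sketch below uses Python's built-in sqlite3 purely as a stand-in for Oracle; the table names, column names and the feature identifier are hypothetical and do not come from the OSGB schema or the oci driver.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE feature (
        fid TEXT PRIMARY KEY
    );
    CREATE TABLE change_history (
        fid         TEXT    NOT NULL REFERENCES feature(fid),
        seq         INTEGER NOT NULL,   -- position within the source element
        change_date TEXT    NOT NULL,
        reason      TEXT    NOT NULL,
        PRIMARY KEY (fid, seq)
    );
    """
)

# Values taken from the changeHistory example in this thread.
history = [
    ("2004-12-19", "Revised"),
    ("2002-09-07", "Revised"),
    ("2001-03-12", "New"),
]
fid = "feature-0001"  # hypothetical identifier

conn.execute("INSERT INTO feature (fid) VALUES (?)", (fid,))
conn.executemany(
    "INSERT INTO change_history (fid, seq, change_date, reason) VALUES (?, ?, ?, ?)",
    [(fid, i, date, reason) for i, (date, reason) in enumerate(history)],
)

rows = conn.execute(
    "SELECT change_date, reason FROM change_history WHERE fid = ? ORDER BY seq",
    (fid,),
).fetchall()
print(rows)
```

A record with 49 entries, or several thousand, simply becomes that many child rows, which sidesteps the fixed-width limits of a dBase attribute table.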
Even if we do the stringlist and related stuff you are quite
right that there aren't many output formats that will support
the esoteric arrangement well. OGR was really intended
to read *simplistic* GML files that match existing GIS
type conventions (flat, non-repeating). I am interested in
extending it somewhat to read important GML profiles
reasonably well, but there are limits to how much of this
can be done without a fundamental rewrite.
I really do try to discourage GML generators from using
some of these more esoteric practices.
Best regards,
--
--------------------------------------------------------------------------------
Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806 Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication
--------------------------------------------------------------------------------
_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev