Thanks, Frank. I am pleased to hear that I have probably not missed much in my analysis of the current GDAL/OGR facilities. I have not yet moved to 1.7, so it may be that there is a little more in the driver now than I have seen in 1.6.3.

I started out thinking that I could concatenate these multiple objects into simple strings ... until I discovered how long these could be. Admittedly I opted to include the date along with the reason in the same string, but at 20 or so characters per entry - and some of the other multiples involve rather more characters per entry - I realised this was not realistic. In my sample dataset (and eventually I need to work with national coverage) I have objects comprising several thousand 20-character entries - yes, they could be processed to become long long integers, but that is the same as holding the numeric part as text! I decided that it would be easier to write a parser than to meddle with the GML driver ... I think it probably was.

I do not mind admitting that, over the past couple of years, I have spent several months' effort evaluating apparently promising solutions for ingesting Mastermap. Snowflake's products are out of our budget. The crunch came this spring when advice from Safe Software's support team suggested that I could only solve the problems by using the FME API to extend the FME product's capability to handle these multiple objects. It seemed to me that if I had to code my own solution, I might as well use an API with which I was already familiar: writing interfaces is not my main job!

I think that we need a strategic assessment of whether the OSGB usage of GML is likely to be a 'one off' or whether others may also exploit the power of GML in a similar fashion. Certainly in Europe where the INSPIRE Directive is beginning to take effect and requires the delivery of interoperable data services there is the possibility of others following suit: I am not familiar with the German NAS work, so do not know how that might relate, if at all. If OSGB is a 'one off', then the amount of work that would appear to be involved for GDAL/OGR is possibly a show stopper; if, however, there is a need for a generic solution which will solve the OSGB issues and support a range of other products, such that there is a reasonably widespread application, then it is probably worth at least a proper assessment of the effort involved.

Best wishes,

Peter

PS my sample GML dataset covers just a 50 km square and comes in at 2.8 GB: editing constructs to avoid their being processed is not easy with files of that size!

Frank Warmerdam wrote:
On Thu, Jul 1, 2010 at 4:55 PM, Peter J Halls <[email protected]> wrote:
Jez, Even,

  there are actually several issues relating to using GDAL/OGR to read
Ordnance Survey of Great Britain (OSGB) GML files distributed as their
Mastermap product.  One of these I reported as bug #1604 - I now find that
this was against GDAL-1.4.0 - which concerns handling 'duplicate' tokens:
OGR ignores Namespace and so treats <osgb:point> as the starter and fouls
on the following <gml:point> which contains the geometry.  There is a
similar problem with polygon objects.

Peter,

Ouch, the namespace stripping issue is unfortunate.  I'm not sure
of a cheap fix.
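
A minimal sketch of the clash, using Python's standard ElementTree rather than the OGR reader. The namespace URIs, the RoadNode wrapper, and the coordinates are assumptions made up for illustration (check the xmlns declarations in the real file), and the sample uses the usual GML capitalisation, gml:Point. It only shows why dropping the prefix makes the wrapper and the geometry element indistinguishable, and how a namespace-qualified lookup avoids that; it says nothing about how the GML driver is implemented.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the xmlns declarations in the data.
NS = {
    "osgb": "http://www.ordnancesurvey.co.uk/xml/namespaces/osgb",
    "gml": "http://www.opengis.net/gml",
}

# Made-up fragment shaped like the construct described above.
SAMPLE = """\
<osgb:RoadNode xmlns:osgb="http://www.ordnancesurvey.co.uk/xml/namespaces/osgb"
               xmlns:gml="http://www.opengis.net/gml">
  <osgb:point>
    <gml:Point>
      <gml:coordinates>460000.0,452000.0</gml:coordinates>
    </gml:Point>
  </osgb:point>
</osgb:RoadNode>
"""


def local_name(tag):
    """Strip the '{uri}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)

# With namespaces stripped (and case ignored, to mimic the reported clash),
# the wrapper and the geometry element end up with the same name.
stripped = [local_name(e.tag).lower() for e in root.iter()]
clashes = stripped.count("point")

# A namespace-qualified path addresses the geometry unambiguously.
geom = root.find("osgb:point/gml:Point/gml:coordinates", NS)
x, y = (float(v) for v in geom.text.split(","))

print(clashes, x, y)
```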

  The data Jez describes below is simpler than much of the data in the file

I did implement some degree of "complex structure flattening" when
I worked on the custom NAS (German GML profile) reader.  I thought
perhaps it had made it into the mainline GML reader, but perhaps not.
If it has not, I think it could be ported.

If someone files a ticket specifically on this issue I can try to address
this or perhaps more likely have Chaitanya do it as he is now getting
quite familiar with the GML driver.

perhaps this next point is not an issue for him.  Several of the tokens are
described in the schema snippet as 'unbounded': this means that there can be
several instances

       <osgb:changeHistory>
               <osgb:changeDate>2004-12-19</osgb:changeDate>
               <osgb:reasonForChange>Revised</osgb:reasonForChange>
               <osgb:changeDate>2002-09-07</osgb:changeDate>
               <osgb:reasonForChange>Revised</osgb:reasonForChange>
               <osgb:changeDate>2001-03-12</osgb:changeDate>
               <osgb:reasonForChange>New</osgb:reasonForChange>
       </osgb:changeHistory>

Hmm, that is also somewhat ugly.  OGR has the concept of a
string list field type, so in theory this could be reduced to two
string list fields:

changeHistory_changeDate: 2004-12-19, 2002-09-07,...
changeHistory_reasonForChange: Revised, Revised

I also thought I had done something like this for the NAS driver,
but perhaps it did not make it back into the mainstream GML
driver.

Likewise, if a focused ticket is filed, I'll turn this over to
Chaitanya.
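
The reduction to two string list fields described above can be sketched in plain Python as follows. This is not the NAS or GML driver code: flatten_repeats is a hypothetical helper, the namespace URI is an assumption, and the field naming simply follows the parent_child pattern from the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; verify against the real file.
OSGB = "http://www.ordnancesurvey.co.uk/xml/namespaces/osgb"

SAMPLE = f"""\
<osgb:changeHistory xmlns:osgb="{OSGB}">
  <osgb:changeDate>2004-12-19</osgb:changeDate>
  <osgb:reasonForChange>Revised</osgb:reasonForChange>
  <osgb:changeDate>2002-09-07</osgb:changeDate>
  <osgb:reasonForChange>Revised</osgb:reasonForChange>
  <osgb:changeDate>2001-03-12</osgb:changeDate>
  <osgb:reasonForChange>New</osgb:reasonForChange>
</osgb:changeHistory>
"""


def flatten_repeats(parent):
    """Collect repeated child elements into one list per child name.

    Hypothetical helper: keys are '<parent>_<child>' with namespaces dropped.
    """
    prefix = parent.tag.rsplit("}", 1)[-1]
    fields = {}
    for child in parent:
        name = child.tag.rsplit("}", 1)[-1]
        value = (child.text or "").strip()
        fields.setdefault(f"{prefix}_{name}", []).append(value)
    return fields


fields = flatten_repeats(ET.fromstring(SAMPLE))
for name, values in fields.items():
    print(f"{name}: {', '.join(values)}")
```

Note that the two lists are tied together only by position: if a changeDate ever appeared without its reasonForChange, the entries would silently fall out of step.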

 I do not know whether the GDAL/OGR GML driver was designed primarily for
writing gml rather than for reading: maybe.

  Where does this leave us?  As I mentioned, there are also problems with
most other gml readers: this is not solely an issue with GDAL/OGR.  I have
an immediate need for the ITN data and have written my own parser to
extract the information from the gml source: so far, so good.  However, as
I mentioned, there is now the problem of how to store these data:
shapefiles use the dBaseIV format and have no structure for handling these
multiple attributes.  In a sample dataset, I have a record with 49
changeHistory records, for example; some other multiple constructs have
several thousand entries.  I happen to have access to Oracle, although to
use GDAL/OGR to write to it requires that I do some significant work on the
oci driver: I've been trying to understand the code of that to assess what
I can reasonably do.  Alternatively, I could use oci directly and bypass
GDAL/OGR entirely.  All this, however, is non-trivial and holding me back
from doing what I am supposed to be doing ... but does seem to be the only
way forward, having exhausted FME, etc.
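
One way to picture the storage problem: in a relational database the repeating attributes would normally go into a child table keyed on the feature identifier, which a flat dBase attribute table cannot express. The sketch below uses Python's built-in sqlite3 purely as a stand-in for Oracle; the table names, column names, and feature identifier are all hypothetical, and nothing here goes through GDAL/OGR or the oci driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database
conn.executescript(
    """
    CREATE TABLE feature (
        fid TEXT PRIMARY KEY
    );
    -- One row per repeated entry; seq preserves the order in the GML.
    CREATE TABLE change_history (
        fid         TEXT    NOT NULL REFERENCES feature(fid),
        seq         INTEGER NOT NULL,
        change_date TEXT    NOT NULL,
        reason      TEXT    NOT NULL,
        PRIMARY KEY (fid, seq)
    );
    """
)

fid = "osgb0000000000000001"  # hypothetical feature identifier
history = [
    ("2004-12-19", "Revised"),
    ("2002-09-07", "Revised"),
    ("2001-03-12", "New"),
]

conn.execute("INSERT INTO feature (fid) VALUES (?)", (fid,))
conn.executemany(
    "INSERT INTO change_history (fid, seq, change_date, reason) "
    "VALUES (?, ?, ?, ?)",
    [(fid, seq, date, reason) for seq, (date, reason) in enumerate(history)],
)
conn.commit()

# Any number of entries per feature comes back in document order.
rows = conn.execute(
    "SELECT change_date, reason FROM change_history "
    "WHERE fid = ? ORDER BY seq",
    (fid,),
).fetchall()
print(rows)
```

The same layout copes with 49 entries or several thousand per feature, since the row count in the child table is unbounded while the parent table stays flat.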

Even if we do the stringlist and related stuff you are quite
right that there aren't many output formats that will support
the esoteric arrangement well.  OGR was really intended
to read *simplistic* GML files that match existing GIS
type conventions (flat, non-repeating).  I am interested in
extending it somewhat to read important GML profiles
reasonably well, but there are limits to how much of this
can be done without a fundamental rewrite.

I really do try to discourage GML generators from using
some of these more esoteric practices.

Best regards,

--
--------------------------------------------------------------------------------
Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806     Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication
--------------------------------------------------------------------------------
_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
