Jukka,

Data type guessing implemented in the OGR GeoJSON driver is quite natural hopefully. A whole scan of the GeoJSON file is made and the following rules are applied:
- if an attribute has integer-only content --> Integer
- if an attribute has an array of integer-only content --> IntegerList
- if an attribute has integer or floating point content --> Real
- if an attribute has an array of integer or floating point content --> RealList
- if an attribute has an array of anything else content --> StringList
- otherwise --> String
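
To make the rules above concrete, here is a rough Python sketch of the same decision table. It is only an illustration, not the driver's actual code: the function names (classify_value, merge_kinds, guess_schema) and the sample attributes are made up for this example, and the handling of nulls (skipped), booleans (treated as "anything else") and empty arrays is an assumption, since the rules do not cover those cases.

```python
import json


def classify_value(value):
    """Return the type one JSON value suggests, or None for null (assumption: nulls are skipped)."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so it must be tested first.
    # Assumption: booleans count as "anything else" under the rules above.
    if isinstance(value, bool):
        return "String"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Real"
    if isinstance(value, list):
        element_kinds = {classify_value(v) for v in value}
        # Note: an empty array has no element kinds and falls into IntegerList here (assumption).
        if element_kinds <= {"Integer"}:
            return "IntegerList"
        if element_kinds <= {"Integer", "Real"}:
            return "RealList"
        return "StringList"
    return "String"  # strings, objects


def merge_kinds(kinds):
    """Combine the kinds seen for one attribute over the whole file."""
    if not kinds:
        return "String"
    if kinds == {"Integer"}:
        return "Integer"
    if kinds == {"IntegerList"}:
        return "IntegerList"
    if kinds <= {"Integer", "Real"}:
        return "Real"
    if kinds <= {"IntegerList", "RealList"}:
        return "RealList"
    if kinds <= {"IntegerList", "RealList", "StringList"}:
        return "StringList"
    return "String"


def guess_schema(collection):
    """Scan every feature of a FeatureCollection dict and guess one type per attribute."""
    seen = {}
    for feature in collection.get("features", []):
        for name, value in (feature.get("properties") or {}).items():
            kinds = seen.setdefault(name, set())
            kind = classify_value(value)
            if kind is not None:
                kinds.add(kind)
    return {name: merge_kinds(kinds) for name, kinds in seen.items()}


# Hypothetical sample data, only to exercise each rule once.
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null,
   "properties": {"name": "A", "population": 10, "area": 1.5,
                  "codes": [1, 2], "tags": ["x"]}},
  {"type": "Feature", "geometry": null,
   "properties": {"name": "B", "population": 20, "area": 2,
                  "codes": [3], "tags": ["y", 1], "ratios": [1, 2.5]}}
]}
""")

print(guess_schema(sample))
```

Note how "area" ends up as Real because one feature has 1.5 and another has 2: a single floating point value anywhere in the file is enough to widen the whole attribute.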

With RFC 50 and other pending improvements in the driver:
- if an attribute has boolean-only content --> Integer(Boolean)
- if an attribute has an array of boolean-only content --> IntegerList(Boolean)
- if an attribute has date-only content --> Date
- if an attribute has time-only content --> Time
- if an attribute has datetime or date content --> DateTime

I'm not sure we want to invent a .jsont format, but if you download
http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/ogr2vrt.py and run:

python ogr2vrt.py "http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" test.vrt

this will create a VRT with the default schema, which you can easily edit.

Note: as with OGR SQL CAST, this is post-processing. So if the guess made by the GeoJSON driver leads to a loss of information, you cannot recover it. Hopefully the implemented rules will not lead to information loss.

A better approach would be to have the schema embedded, in a JSON way, in the GeoJSON file itself. That could be an evolution of the format, but I'm not sure this would be really popular, given that JSON/GeoJSON is heavily used by NoSQL approaches...

Hmm, doing a quick search, I just found http://json-schema.org/, which appears to be an IETF draft. It doesn't look like the schema is embedded in the data file itself.

There's also GeoJSON-LD, which might be a bit related: https://github.com/geojson/geojson-ld
CC'ing Sean in case he has thoughts on this.

Even

> Hi,
>
> I wonder if GDAL could have some simple and relatively user-friendly way
> of defining a schema for GeoJSON data. The GeoJSON driver seems to guess
> the data types of attributes in some undocumented way, but users could
> have better knowledge about the desired schema.
>
> I know I can control the data type by using OGR SQL and CAST as in
> ogrinfo -sql "select cast(EMPLOYED as float) from OGRGeojson" states.json -so
>
> However, perhaps GeoJSON is popular enough to deserve an easier way of
> writing a schema. At first I thought that it would be enough to copy the
> "csvt" text file mechanism from the GDAL CSV driver
> http://www.gdal.org/drv_csv.html. However, the csvt file is a plain list of
> types which will be applied to the attributes in the same order as they
> appear in the text file
> "Integer(5)","Real(10.7)","String(15)"
>
> For GeoJSON it would feel more user-friendly to include the attribute names
> in the list, somewhat like
> "population;Integer(5)","area;Real(10.7)","name;String(15)".
>
> This would make it easier for users to write a valid "jsont" file. A list
> with attribute names could perhaps also help GDAL, because the
> features in a GeoJSON file do not necessarily have the same attributes.
>
> As an example, this is the right schema for a WFS feature type, which is
> captured from
> http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=describefeaturetype&typename=topp:states
>
> name="the_geom" type="gml:MultiPolygonPropertyType"/>
> name="STATE_NAME" type="xsd:string"/>
> name="STATE_FIPS" type="xsd:string"/>
> name="SUB_REGION" type="xsd:string"/>
> name="STATE_ABBR" type="xsd:string"/>
> name="LAND_KM" type="xsd:double"/>
> name="WATER_KM" type="xsd:double"/>
> name="PERSONS" type="xsd:double"/>
> name="FAMILIES" type="xsd:double"/>
> name="HOUSHOLD" type="xsd:double"/>
> name="MALE" type="xsd:double"/>
> name="FEMALE" type="xsd:double"/>
> name="WORKERS" type="xsd:double"/>
> name="DRVALONE" type="xsd:double"/>
> name="CARPOOL" type="xsd:double"/>
> name="PUBTRANS" type="xsd:double"/>
> name="EMPLOYED" type="xsd:double"/>
> name="UNEMPLOY" type="xsd:double"/>
> name="SERVICE" type="xsd:double"/>
> name="MANUAL" type="xsd:double"/>
> name="P_MALE" type="xsd:double"/>
> name="P_FEMALE" type="xsd:double"/>
> name="SAMP_POP" type="xsd:double"/>
>
> This is what GDAL is guessing:
> STATE_NAME: String (0.0)
> STATE_FIPS: String (0.0)
> SUB_REGION: String (0.0)
> STATE_ABBR: String (0.0)
> LAND_KM: Real (0.0)
> WATER_KM: Real (0.0)
> PERSONS: Real (0.0)
> FAMILIES: Integer (0.0)
> HOUSHOLD: Real (0.0)
> MALE: Real (0.0)
> FEMALE: Real (0.0)
> WORKERS: Real (0.0)
> DRVALONE: Integer (0.0)
> CARPOOL: Integer (0.0)
> PUBTRANS: Integer (0.0)
> EMPLOYED: Real (0.0)
> UNEMPLOY: Integer (0.0)
> SERVICE: Integer (0.0)
> MANUAL: Integer (0.0)
> P_MALE: Real (0.0)
> P_FEMALE: Real (0.0)
> SAMP_POP: Integer (0.0)
> bbox: RealList (0.0)
>
> -Jukka Rahkonen-
>
> _______________________________________________
> gdal-dev mailing list
> [email protected]
> http://lists.osgeo.org/mailman/listinfo/gdal-dev

--
Spatialys - Geospatial professional services
http://www.spatialys.com
