On Mon, Apr 09, 2012 at 09:04:51AM -0700, J. Andrew Rogers wrote:
> On Mon, Apr 9, 2012 at 6:56 AM, Stefan Keller <[email protected]> wrote:
> >
> > A while ago I proposed the idea of SQLite as "The Shapefile of the
> > future?" - and I'm still supporting it, especially the Spatialite
> > extension.
>
>
> The idea of SQLite as a "shapefile" has merit but SQLite per se has
> limits that would make it a suboptimal choice and is missing some
> useful features. As a self-contained database engine it was not
> designed for the use case of being a geospatial storage format (e.g.
> R-tree is a poor index choice for this purpose).
>
> We implement our own import/export format, which we will freeze and
> open source at some point. We convert some of the other popular
> formats to this format before doing anything with the data. There are
> two things that we needed as a practical operational matter that are
> difficult to find in common geospatial data file formats:
>
> - The ability to deal with really huge data sets. Routinely wrangling
> countless terabytes of spatial data import/export requires a format
> that is amenable to slicing, dicing, concatenating, etc giant files
> with minimal muss and fuss in order to comply with various limits of
> systems and to parallelize processing. This puts some design
> requirements on the internal structure of the files.
>
> - Read and write I/O throughput. Many storage formats are badly CPU
> bound on any decent storage system or have poor I/O access patterns
> that limit I/O throughput. Few if any common geospatial data formats
> were designed with this in mind but it is a major bottleneck. Not a
> big deal if you are dealing with a few gigabytes of data but it
> approaches intractability once you start dealing with really large
> quantities of data.
>
>
> In short, current formats are not designed to scale to practical use
> cases. For us, the easiest and most efficient solution was to roll our
> own without much consideration for existing standards. Most of the
> standards seem to be designed for, either by legacy or intent, trivial
> amounts of geo data that is infrequently processed.
>
> I'd be interested in a practical and scalable standard for moving geo
> data around if one is actively being developed.
>
>
> --
> J. Andrew Rogers
>
> _______________________________________________
> Geowanking mailing list
> [email protected]
> http://geowanking.org/mailman/listinfo/geowanking_geowanking.org

My efforts were guided by some of these same ideas. I wanted it to be built around a key-value store, and I wanted it to be aimed at large-scale, eventually-consistent management system architectures (along the lines of Mongo, Couch, etc.) yet still designed for feature and attribute data and useful without a DBMS (via cli tools, etc.).

Could you talk about some of the alternatives you explored? I'd be interested to hear what you tried and learned.

-R.
