Damian,

> I'm trying to speed up processing of OSM data by opening an OSM file into
> multiple datasets in multiple threads. One dataset per thread. Each thread
> is processing a separate section of data, basically tiling the data.
>
> I've however run into a scaling issue with the amount of memory allocated
> per dataset.
>
> The Open in the OSM driver seems to allocate a lot of memory for buffers
> for processing regardless of the size of the data loaded.
>
> So I have a couple of questions:
>
> 1. is there a way of reducing the memory load when reading OSM in multiple
> threads?

You may play with the OSM_MAX_TMPFILE_SIZE config option, which defaults to
100 (MB) per dataset. If you are brave enough, you can edit
ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp and reduce the values of the
#defines MAX_DELAYED_FEATURES, MAX_ACCUMULATED_NODES and
HASHED_INDEXES_ARRAY_SIZE (and possibly disable
ENABLE_NODE_LOOKUP_BY_HASHING in ogr_osm.h).

> 2. Could I convert the OSM data into a different format that can be read
> efficiently from multiple threads? and what would that format be?
> My thought for (2) would be to load the data into a database and read from
> the database using ogr. If this is the correct way forward which database
> would be recommended (PostGIS, SpatialLite,...) ?

Reading the same OSM file from multiple threads is indeed probably an
inefficient approach: OSM files have no spatial index, so you'll end up
reading the whole file for each tile. Converting the data beforehand would
probably scale better. SpatiaLite/GPKG are probably good choices.

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
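
For what it's worth, here is a rough Python sketch of the convert-first,
then-tile idea discussed above. It is only an illustration under some
assumptions: the helper names (conversion_command, read_tile) and the file
names are hypothetical, read_tile needs the GDAL Python bindings, and the
layer name "lines" assumes the OSM driver's layer names carry over into the
GeoPackage.

```python
def conversion_command(osm_path, gpkg_path, max_tmpfile_mb=100):
    """Build an ogr2ogr command line that converts an OSM file to GeoPackage.

    OSM_MAX_TMPFILE_SIZE is passed through the generic --config option;
    the value is in MB.
    """
    return [
        "ogr2ogr",
        "--config", "OSM_MAX_TMPFILE_SIZE", str(max_tmpfile_mb),
        "-f", "GPKG",
        gpkg_path,   # destination comes first for ogr2ogr
        osm_path,    # source comes second
    ]


def read_tile(gpkg_path, layer_name, bbox):
    """Yield the features of one layer that intersect bbox.

    bbox is (minx, miny, maxx, maxy). Each thread should call this with its
    own tile, so that every thread opens its own dataset.
    """
    # Imported lazily so conversion_command() works without GDAL installed.
    from osgeo import ogr

    ds = ogr.Open(gpkg_path)
    if ds is None:
        raise RuntimeError("could not open " + gpkg_path)
    layer = ds.GetLayerByName(layer_name)
    if layer is None:
        raise RuntimeError("no layer named " + layer_name)
    # The rectangle filter lets the GeoPackage spatial index do the work.
    layer.SetSpatialFilterRect(*bbox)
    for feature in layer:
        yield feature


# Possible usage (hypothetical file names):
#   import subprocess
#   subprocess.run(conversion_command("region.osm.pbf", "region.gpkg"), check=True)
#   for feat in read_tile("region.gpkg", "lines", (2.0, 48.0, 2.5, 48.5)):
#       ...
```

The conversion is done once up front; afterwards each thread only touches the
features inside its own bounding box instead of re-reading the whole OSM file.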
