I think you could try increasing your swap space. On Linux this is fairly straightforward, and on an SSD or NVMe drive it should perform reasonably well. You will need free disk space for this to work: about 10 to 20 GB of swap, based on Even's calculation, on top of the 8 GB of RAM you already have. It is not as fast as real RAM, but it may be enough to get the job done.
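
For what it's worth, here is the arithmetic behind that swap figure as a small Python sketch. The 40 bytes per feature and the possible doubling are taken from Even's message quoted below; the function names and the 8 GiB value for physical RAM are only my own illustration. It ignores whatever the OS and other processes use, so treat the result as a rough lower bound rather than a measurement.

```python
GIB = 1024 ** 3

def fgb_index_ram_estimate(feature_count, bytes_per_feature=40, tree_factor=2):
    """Return (lower, upper) RAM estimates in bytes for writing a .fgb index.

    bytes_per_feature=40 and tree_factor=2 are the rough figures quoted in
    the thread (Even was "not totally sure" about the doubling).
    """
    lower = feature_count * bytes_per_feature
    upper = lower * tree_factor
    return lower, upper

def swap_shortfall(peak_bytes, physical_ram_bytes):
    """Bytes by which the estimated peak exceeds physical RAM (0 if it fits)."""
    return max(0, peak_bytes - physical_ram_bytes)

# Feature count reported by ogrinfo for USA.fgb in this thread.
lower, upper = fgb_index_ram_estimate(145_459_485)
shortfall = swap_shortfall(upper, 8 * GIB)

print(f"index lower bound: {lower / GIB:.2f} GiB")
print(f"index upper bound: {upper / GIB:.2f} GiB")
print(f"shortfall vs 8 GiB RAM: {shortfall / GIB:.2f} GiB")
```

With these numbers the index alone could exceed physical RAM by roughly 3 GiB, so 10 to 20 GB of swap leaves comfortable headroom for everything else running on the machine.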

On Thu, Sep 28, 2023 at 3:18 PM Scott via gdal-dev <gdal-dev@lists.osgeo.org> wrote:

> Thanks for digging into that Even!
>
> Can I create my new .fgb in sections?
>
> If I limit the number of source rows with -sql, doing that multiple
> times with -update, will it still build the entire R-tree when writing
> to the destination?
>
> I'm looking for a way to get the desired results.
>
> On 9/28/23 11:04, Even Rouault wrote:
> > ok, that now makes sense. Writing a .fgb files comes into those
> > exceptions where RAM consumption might be important, as it involves
> > building a packed Hilbert R-Tree in memory. With the current
> > implementation, you need at least the number of features times some
> > constant amount of RAM, at least to store the list of each feature
> > bounding box + their offset in a temporary file. From what I can see
> > this constant is at least 40 bytes. So in your particular case this
> > requires at least 145459485 * 40 = 5.5 GB of RAM. And probably (not
> > totally sure) twice that to store this initial list and the tree itself.
> > I guess the implementation could be made smarter and use on-disk
> > temporary memory, but that would likely involve serious implementation
> > complications. I let Björn comment more on this if he follows this
> > discussion.
> >
> > I've submitted a doc enhancement to mention this requirement:
> > https://github.com/OSGeo/gdal/pull/8490
> >
> > On 28/09/2023 at 19:17, Scott wrote:
> >> USA.fgb is 36 GB. I've renamed it from its original source which can
> >> be found here:
> >> https://beta.source.coop/vida/google-microsoft-open-buildings
> >>
> >> ogr2ogr -sql "select area_in_meters from bfp_USA" -nln footprints
> >> footprints.fgb ~/Downloads/USA.fgb
> >>
> >> GDAL 3.7.1
> >> OS Debian Buster
> >>
> >> Output from ogrinfo -ro -al USA.fgb
> >>
> >> Layer name: bfp_USA
> >> Geometry: Unknown (any)
> >> Feature Count: 145459485
> >> Extent: (-160.221701, 17.677691) - (-64.583428, 71.360579)
> >> Layer SRS WKT:
> >> GEOGCRS["WGS 84",
> >>     DATUM["World Geodetic System 1984",
> >>         ELLIPSOID["WGS 84",6378137,298.257223563,
> >>             LENGTHUNIT["metre",1]]],
> >>     PRIMEM["Greenwich",0,
> >>         ANGLEUNIT["degree",0.0174532925199433]],
> >>     CS[ellipsoidal,2],
> >>         AXIS["geodetic latitude (Lat)",north,
> >>             ORDER[1],
> >>             ANGLEUNIT["degree",0.0174532925199433]],
> >>         AXIS["geodetic longitude (Lon)",east,
> >>             ORDER[2],
> >>             ANGLEUNIT["degree",0.0174532925199433]],
> >>     USAGE[
> >>         SCOPE["unknown"],
> >>         AREA["World"],
> >>         BBOX[-90,-180,90,180]],
> >>     ID["EPSG",4326]]
> >> Data axis to CRS axis mapping: 2,1
> >> boundary_id: Integer64 (0.0)
> >> bf_source: String (0.0)
> >> confidence: Real (0.0)
> >> area_in_meters: Real (0.0)
> >> OGRFeature(bfp_USA):0
> >>   boundary_id (Integer64) = 116
> >>   bf_source (String) = google
> >>   confidence (Real) = 0.906
> >>   area_in_meters (Real) = 187.4652
> >>   POLYGON ((-64.6399621676723 17.7225504518464,-64.6400377660957
> >> 17.722583049763,-64.6400238635835 17.7226126625647,-64.6400901719124
> >> 17.7226412545727,-64.640104074415 17.722611641767,-64.6401239848718
> >> 17.7226202271066,-64.6401528522526 17.7225587385527,-64.6400955687758
> >> 17.7225340380511,-64.6401051288881 17.7225136746756,-64.6400401136221
> >> 17.7224856402151,-64.640030553504 17.7225060035881,-64.6399910351014
> >> 17.7224889633119,-64.6399621676723 17.7225504518464))
> >>
> >> OGRFeature(bfp_USA):1
> >>   boundary_id (Integer64) = 116
> >>   bf_source (String) = microsoft
> >>   area_in_meters (Real) = 51.0777955237376
> >>   POLYGON ((-64.6398677811851 17.7219759840792,-64.6397939789141
> >> 17.7219853127982,-64.6398020235506 17.7220430591893,-64.6398758258215
> >> 17.7220337304732,-64.6398677811851 17.7219759840792))
> >>
> >> OGRFeature(bfp_USA):2
> >>   boundary_id (Integer64) = 116
> >>   bf_source (String) = google
> >>   confidence (Real) = 0.8323
> >>   area_in_meters (Real) = 178.5448
> >>   POLYGON ((-64.6397672401299 17.7220665249078,-64.6397654280552
> >> 17.722041016034,-64.6395789582891 17.7220531822569,-64.6395832735872
> >> 17.7221139302758,-64.6396967374623 17.7221065273415,-64.639698399651
> >> 17.7221299263498,-64.6398064310524 17.7221228777942,-64.6398022655579
> >> 17.7220642396531,-64.6397672401299 17.7220665249078))
> >>
> >>
> >> On 9/28/23 10:03, Even Rouault wrote:
> >>>
> >>> On 28/09/2023 at 18:47, Scott via gdal-dev wrote:
> >>>>
> >>>> I should have been more specific.
> >>>>
> >>>> One particular machine has 8GB of memory. When I try to do the most
> >>>> simple ogr2ogr command on large files, the host runs out of memory
> >>>> (vmstat shows this) and ogr2ogr terminates with 'Killed', nothing more.
> >>>>
> >>>> The data formats I have experienced this with are .fgb, .parquet and
> >>>> .gpkg. The data files are 10's of GB.
> >>>
> >>> As input ? as output? Which operating system ? Which GDAL version ?
> >>> The output of "ogrinfo -al -so the_input" might also be helpful. An
> >>> exact ogr2ogr command line invocation that triggers the issue would
> >>> certainly be useful. In general, most GDAL drivers and ogr2ogr
> >>> itself operate in streaming mode with low RAM requirements, but there
> >>> might be exceptions (some configurations of GeoJSON file may require
> >>> full ingestion on reading for example). I'm also aware of issues
> >>> with RAM fragmentation due to how some memory allocators work, but
> >>> they seem to be restricted to multithreaded uses
> >>> (https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading),
> >>> which current ogr2ogr shouldn't trigger
> >>>
> >>> Even
> >>>
> >>>>
> >>>> Thanks for the responses!
> >>>> _______________________________________________
> >>>> gdal-dev mailing list
> >>>> gdal-dev@lists.osgeo.org
> >>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev