Hi,

Currently rebuilding our infrastructure on new servers, I'm contemplating updating our stack to the state of the art (to be defined?).

So far we're using MapServer 7.6/GDAL 2.4 on Debian buster, with MapProxy 1.12 in front of it for some layers (not all). Our 25cm imagery is mostly stored in 4000px TIFFs (YCbCr, TILED, JPEG 90%, 3/4 levels of overviews, about 6-7MB per file); depending on datasets/layers/areas we have between 6,000 and 600,000 files, all stored locally. Many datasets are between 50 and 300GB.
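
For reference, tiles with those characteristics are produced with something along these lines (paths and exact parameters here are placeholders, not our actual scripts):

  gdal_translate -of GTiff \
    -co TILED=YES -co COMPRESS=JPEG -co JPEG_QUALITY=90 -co PHOTOMETRIC=YCBCR \
    source_tile.tif ortho_tile.tif
  gdaladdo -r average \
    --config COMPRESS_OVERVIEW JPEG --config PHOTOMETRIC_OVERVIEW YCBCR \
    --config INTERLEAVE_OVERVIEW PIXEL \
    ortho_tile.tif 2 4 8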

In MapServer, we use a layer GROUP to 'merge' 3 layers (a rough mapfile sketch follows the list):
* a layer using TILEINDEX (pointing at a PostGIS table generated with gdaltindex) below 1:25000, thus directly hitting the original tiles
* for upper scales, two layers pointing at 6m & 24m resamples of the same dataset over the complete area, stored in single-file TIFFs (with the same compression params; those resamples are between 200MB and a few GB per file)
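
For illustration, a minimal sketch of that structure (layer/table names, paths and the upper scale breakpoint are made up, this is not our actual mapfile):

  LAYER
    NAME "ortho_idx"
    TYPE POLYGON
    STATUS OFF
    CONNECTIONTYPE POSTGIS
    CONNECTION "host=localhost dbname=gis"
    # tile index generated with gdaltindex, the 'location' field holds the tile paths
    DATA "wkb_geometry from ortho_tileindex using unique ogc_fid using srid=2154"
  END

  LAYER
    NAME "ortho_tiles"
    GROUP "ortho"
    TYPE RASTER
    STATUS ON
    TILEINDEX "ortho_idx"
    TILEITEM "location"
    MAXSCALEDENOM 25000   # below 1:25000, hit the original 25cm tiles
  END

  LAYER
    NAME "ortho_6m"
    GROUP "ortho"
    TYPE RASTER
    STATUS ON
    DATA "/data/ortho_6m.tif"   # single-file 6m resample
    MINSCALEDENOM 25000
    MAXSCALEDENOM 100000        # made-up breakpoint to the 24m layer
  END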

So far performance is quite acceptable for end users (mostly QGIS consuming MapServer or MapProxy as WMS), but I'd like to eventually get rid of MapProxy (fewer cache handling/recompression/resampling issues, less storage, etc.).

I've of course looked at COG, as I'm able to convert most of my datasets to it. From my limited testing with GDAL 3.1.0 (now available in Debian testing), it only 'reorders' the existing metadata/overviews in a file if it's already compressed as JPEG (and rebuilds the overviews with 512px blocks instead of the default 128px I had so far), so from my understanding it wouldn't lossily 'recompress already-compressed data'.
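
For the record, the kind of conversion I've been testing is roughly this (file names are placeholders; the COG driver defaults to 512px blocks):

  gdal_translate -of COG -co COMPRESS=JPEG -co QUALITY=90 \
    ortho_tile.tif ortho_tile_cog.tif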

But I fail to see in which direction to go for MapServer:
- I've tried keeping the same mechanism with TILEINDEX; it still works and doesn't seem to have an impact on perf. I don't know if it would squeeze some extra perf out of reading the files, as GDAL might read 'less' from the TIFF if the metadata is COG-optimized, even when stored locally?
- I've tried building a huge (7MB) VRT for the dataset and pointing MapServer at it via DATA /path/to/vrt. That works too, and perf seems to be the same. Is it 'cleverer' than using TILEINDEX? I don't know.
- Should I rather build/use a huge single-file COG for the dataset, at its original resolution (25cm), and point MapServer at it like the upper-scale resamples? For a 5800km2 area, a regular JPEG-in-TIFF single file is about 17GB, with 6GB of external overviews.
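
Roughly, the commands behind those options look like this (table/layer/file names and paths are placeholders; with hundreds of thousands of tiles you'd feed a file list rather than a shell glob):

  # 1) regenerate the PostGIS tile index over the tiles
  gdaltindex -f PostgreSQL -lyr_name ortho_tileindex "PG:dbname=gis" /data/ortho/*.tif

  # 2) or build a single VRT over the whole dataset and point DATA at it
  gdalbuildvrt -input_file_list tile_list.txt /data/ortho.vrt

  # 3) or flatten everything into one big COG at native resolution
  gdal_translate -of COG -co COMPRESS=JPEG -co QUALITY=90 /data/ortho.vrt /data/ortho_cog.tif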

And of course, the same questions also apply to a similar dataset at 5cm resolution, hence much larger sizes.

As COG was meant to be used (among other things) via /vsicurl/, is there a point/improvement in pointing MapServer (or the VRT file) at the same files via /vsicurl/ (with a webserver in between, of course) rather than at local files? I.e. is GDAL as efficient at reading a local file header as it is at getting chunks from a /vsicurl/ URL? I've played with that scheme and it works, but I don't know if it really brings an improvement for users.
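
For clarity, the scheme I played with boils down to something like this (hostname/path made up; the CONFIG lines are the GDAL knobs usually recommended for /vsicurl/, not something I've benchmarked):

  MAP
    CONFIG "GDAL_DISABLE_READDIR_ON_OPEN" "EMPTY_DIR"
    CONFIG "CPL_VSIL_CURL_ALLOWED_EXTENSIONS" ".tif .vrt"
    ...
    LAYER
      NAME "ortho_cog_remote"
      TYPE RASTER
      STATUS ON
      DATA "/vsicurl/https://imagery.example.org/ortho/ortho_cog.tif"
    END
  END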

I get that COG + /vsicurl/ allows separating the storage from the actual MapServer process, but in my situation I'm in no hurry to change my infra in that direction unless it really brings perf improvements.

Sure, serving COG files via a webserver also allows nifty things like opening a remote VRT/TIF in QGIS and natively using files on a remote web server, which would be somewhat of an alternative to WMS (bringing all the shinies of having native files in the client), but not all users are ready yet for such modern concepts... and it doesn't allow setting scale limits server-side: if you open a VRT which points at 6000 images and zoom to the dataset extent, you get as many requests as there are files just to fetch their metadata, which is not very efficient.

All that to say: how are people handling large aerial datasets, with many files, served over WMS (because that's still the lowest common denominator) in 2020? Still using tile caches in front of MapServer?

--
Landry Breuil