Re: [gdal-dev] IO Overhead when reading small subsets from Global Files

Julian Zeidler Mon, 08 Dec 2014 03:21:54 -0800

On 08.12.2014 11:26, Even Rouault wrote:

On Monday, 8 December 2014 11:14:04, Julian Zeidler wrote:

Hi Even,


Thanks for the quick reply.

I should have mentioned that I also tried converting it to a compressed,
tiled TIFF.
There I can see the same kind of overhead. I extract a 1 MB subset from
the file and, depending on the tiles, I read between 9 and 12 MB via the network.
The process reads only one 500x500 block from every single file.
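
One part of the overhead that can be estimated without touching the file is how many blocks a requested window intersects, since a tiled reader has to fetch and decode whole blocks. The following is a minimal sketch under that assumption only; the function names are hypothetical, it counts uncompressed pixels rather than compressed bytes, and it ignores partial blocks at the raster edge.

```python
def blocks_touched(xoff, yoff, xsize, ysize, block_w, block_h):
    """Number of blocks intersected by a window (all values in pixels)."""
    first_bx = xoff // block_w
    last_bx = (xoff + xsize - 1) // block_w
    first_by = yoff // block_h
    last_by = (yoff + ysize - 1) // block_h
    return (last_bx - first_bx + 1) * (last_by - first_by + 1)


def read_amplification(xoff, yoff, xsize, ysize, block_w, block_h):
    """Pixels in the touched blocks divided by pixels actually requested."""
    n = blocks_touched(xoff, yoff, xsize, ysize, block_w, block_h)
    return n * block_w * block_h / (xsize * ysize)


# Window aligned to 100x100 blocks: 25 blocks, no extra pixels decoded.
print(blocks_touched(6000, 6000, 500, 500, 100, 100))      # 25
# Same window shifted by half a block: 36 blocks.
print(blocks_touched(6050, 6050, 500, 500, 100, 100))      # 36
print(read_amplification(6050, 6050, 500, 500, 100, 100))  # 1.44
```

By this estimate alone, block alignment accounts for well under a factor of 2 with 100x100 blocks, so it would not by itself explain a 9-12x difference.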

For GeoTIFF, I would expect the overhead to be the size of the "tags" that
store the offset and size of each block, so for a 40000x20000 dataset with
100x100 blocks:
(40000 / 100) * (20000 / 100) * 2 * 4 = 640 000 bytes
I can't explain how it could reach 9-12 MB.
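
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name is hypothetical, and it assumes one offset entry and one byte-count entry per block, each 4 bytes wide, as in the calculation above (the entry width is a parameter so that other sizes, such as 8-byte entries, can be tried).

```python
def tiff_block_index_bytes(width, height, block_w, block_h, bytes_per_entry=4):
    """Rough size of the per-block offset and byte-count arrays, in bytes."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    # Two arrays (offsets and byte counts), one entry per block in each.
    return blocks_x * blocks_y * 2 * bytes_per_entry


print(tiff_block_index_bytes(40000, 20000, 100, 100))  # 640000
```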

And if you use GDAL >= 1.11 compiled with its internal libtiff, there's a trick
that avoids reading the full tags, and should only read them by "pages" of 4 KB,
resulting in a negligible overhead.

I just rechecked the versions:

GDAL: 1.11.0 with internal libtiff

netCDF: 4.2

I was quite surprised by the overhead myself. I ran some quick tests with different block sizes and output windows. The overhead decreased with size (1000x1000, 2000x2000) but was still significant at 5x the output size.

I guess we will have to live with the overhead.

Thanks
Julian

Cheers Julian

On 08.12.2014 11:10, Even Rouault wrote:

On Monday, 8 December 2014 10:44:41, Julian Zeidler wrote:

Dear GDAL mailing list,

I am currently trying to optimize a global model.
The model reads small chunks (500x500) from many global datasets
(40000x20000), one for each day.
These datasets are compressed NetCDFs with tiling activated (100x100).
(See the attached gdalinfo output.)
However, when I measure the file I/O via NFS, I get a factor of ~10
compared to the uncompressed output image when testing with GDAL. Inside
the model, using the netCDF library directly, I measure an even worse
factor of ~60 (compared to compressed outputs). This is better than using
untiled inputs, where the overhead was ~80x, but still a larger overhead
than I expected.
I tested it using gdal_translate in.nc out.tif -srcwin 6000 6000 500 500

Julian,

I'm not sure how chunk indexing works internally in netCDF, but there
might be an overhead when reading the "index" the first time. So perhaps
if you do your reads from the same GDAL dataset object, without closing
it between different requests, the overhead will decrease. If you were
already doing that, then I'm not sure what you can do, except converting
into another format, like GTiff.
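
A minimal sketch of what reading from the same dataset object could look like with the GDAL Python bindings. The helper names, the file name and the window list are hypothetical; whether this actually reduces the netCDF overhead is not confirmed here and would need to be measured.

```python
def open_dataset(path):
    """Open a raster once; the caller keeps the handle for all later reads."""
    from osgeo import gdal  # requires the GDAL Python bindings

    ds = gdal.Open(path)
    if ds is None:
        raise RuntimeError("could not open " + path)
    return ds


def read_windows(ds, windows, band_index=1):
    """Read several (xoff, yoff, xsize, ysize) windows from one open dataset."""
    band = ds.GetRasterBand(band_index)
    # The dataset stays open across all requests, so any index or header
    # information read for the first window can be reused for the next ones.
    return [band.ReadAsArray(xoff, yoff, xsize, ysize)
            for (xoff, yoff, xsize, ysize) in windows]


# Hypothetical usage:
#   ds = open_dataset("in.nc")
#   tiles = read_windows(ds, [(6000, 6000, 500, 500), (12000, 6000, 500, 500)])
#   ds = None  # close only after all requests are done
```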

Even

_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
