On 2012-04-11 19:17:27 +0000, David Shean said:

Michael,
Scott is right.  Not sure if this is the preferred approach, but I accomplished this for large datasets by specifying buffer sizes for ReadAsArray.  The doc I consulted is here: http://gdal.org/python/osgeo.gdal_array-module.html#BandReadAsArray.   I used masked arrays to exclude nodata values - you may not need to worry about with this.
-David

Excerpt from my script:

src_ds = gdal.Open(src_fn, gdal.GA_ReadOnly)
b = src_ds.GetRasterBand(1)
ndv = b.GetNoDataValue()
ns = src_ds.RasterXSize
nl = src_ds.RasterYSize

#Don't want to load the entire dataset for stats computation
#This is maximum dimension for reduced resolution array
max_dim = 1024.

scale_ns = ns/max_dim
scale_nl = nl/max_dim
scale_max = max(scale_ns, scale_nl)

if scale_max > 1:
    nl = round(nl/scale_max)
    ns = round(ns/scale_max)

#The buf_size parameters determine the final array dimensions
bm = numpy.ma.masked_equal(numpy.array(b.ReadAsArray(buf_xsize=ns, buf_ysize=nl)), ndv)


Holy excrements. I didn't know about this functionality, most likely because I only have one-banded datasets and the ds.ReadAsArray does not offer buf_xsize and buf_ysize! I just tried it out and it's so endlessly fast? My compliments to the GDAL team!
And muchas gracias to you and Scott for pointing it out to me.

This will be a happy summer when I finally implement my private fast image viewer… ;)

Best regards,
Michael



On Apr 11, 2012, at 11:17 AM, Scott Arko wrote:
Hi Michael,


I may be missing your question, but why aren't you just using ReadAsArray?  It has an option to return a smaller array from the input array.  Now, I'm not sure how it does the resampling (you could look to see), but you can make a call like

data = banddata.ReadAsArray(0,0,filehandle.RasterXSize,filehandle.RasterYSize,xsize,ysize)

where xsize and ysize are smaller than the true RasterXSize or RasterYSize.  I haven't looked at this in a while, but I'm pretty sure this will work.  Did I miss the point of what you were asking?


Thanks,
Scott


On Wed, Apr 11, 2012 at 6:31 AM, K.-Michael Aye <[email protected]> wrote:
Dear all,

is there a Python API for downsampling a huge dataset?
What I would like to do:

* get my dataset
* read out RasterXSize and RasterYSize
* calculate how many lines and rows I need to skip to get a quick overview image, e.g. 10 lines to skip.
* Have a ReadAsArray interface where I can say something like this:
** data = ds.ReadAsArray(xoffset, yoffset, 10000, 10000, skipping=10)

which in numpy terms would give me every 10nth line like this: array[:,:,10]

I really don't need quality at all, just speed, for a rough overview for further zooming in with lassos, as the images I deal with sometimes have more than 200 MPixels.

Is this possible in Python?
I was thinking now, maybe one could use numpy's memmap somehow for this, don't know much about it, though…

Thanks for any hints!

Best regards,
Michael


_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev



_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev

_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev



_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Reply via email to