------------------------------
Message: 7
Date: Mon, 13 Oct 2008 09:12:54 +0200
From: Markus Metz <[EMAIL PROTECTED]>
Subject: [GRASS-dev] Re: big region r.watershed
To: [EMAIL PROTECTED], [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1
Hamish wrote:
Markus Metz wrote:
The original version uses very little memory, so assuming that GRASS
runs today on systems where at least 500 MB of RAM is available, I
changed the parameters for the seg mode: more data are kept in memory,
which speeds up the seg mode. Looking at other modules using the
segment library (e.g. v.surf.contour, r.cost), it seems that there is
no single universally used setting; instead the segment parameters are
tuned to each module. The new settings work for me, but not
necessarily for others, and maybe using 500 MB is a bit much.
fwiw r.terraflow has a memory= option, the default is 300mb.
AFAIU, the bigger you make that, the smaller the on-disk temp files
need to be (ie work-around to keep tmp files <2gb for 32bit
filesystems).
a number of modules like r.in.poly have a rows= option, which I didn't
really understand until I got into the code (hold at most that many
region rows, all columns, in memory at once). Interestingly the
default value has scaled quite well over the years.
and other modules like r.in.xyz have percent= (0-100) for how much of
the map to keep in memory at once.
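
To compare the three kinds of limit mentioned above (memory= in MB,
rows=, percent=), it can help to convert each into rows or bytes held
in memory. The following is only an arithmetic sketch, not how any of
these modules computes its buffers; the function names, the region
size, and the 8-byte cell size are all assumptions made up for the
illustration.

```python
def bytes_for_rows(rows, cols, cell_bytes):
    """Bytes needed to hold `rows` full region rows (all columns)."""
    return rows * cols * cell_bytes


def rows_for_memory(memory_mb, cols, cell_bytes):
    """Whole region rows that fit in a budget of `memory_mb` MB.

    Assumes 1 MB = 1024 * 1024 bytes.
    """
    return (memory_mb * 1024 * 1024) // (cols * cell_bytes)


def rows_for_percent(percent, total_rows):
    """Rows held at once when `percent` (0-100) of the map is in memory.

    At least one row is always kept.
    """
    return max(1, (total_rows * percent) // 100)


# Hypothetical region: 20000 rows x 30000 columns, 8 bytes per cell.
TOTAL_ROWS, COLS, CELL_BYTES = 20000, 30000, 8

print("rows that fit in 300 MB:", rows_for_memory(300, COLS, CELL_BYTES))
print("bytes for 4096 rows:", bytes_for_rows(4096, COLS, CELL_BYTES))
print("rows for percent=10:", rows_for_percent(10, TOTAL_ROWS))
```

A fixed rows= value costs more memory as regions get wider, while a
percent= value costs more as the whole map grows; a memory= budget
stays constant but covers fewer rows on big regions.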
A default value that scales well over the years would be preferable,
but performance of r.watershed.fast -m is really poor if whole columns
(or rows) are kept in memory and much better if segments have equal
dimensions in rows and columns. Interestingly, segments of 200 rows
and 200 columns are processed fastest, faster than e.g. 150 rows and
columns or 250 rows and columns. The more segments are kept in memory
the better.
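
One way to see why whole-row segments can hurt while square ones do
not is to count how many distinct segments a path through the grid
touches. The sketch below is a toy model, not the segment library: the
grid size, the strip layout, and the helper names are assumptions, and
it does not explain why 200 beats 150 or 250. Both layouts hold the
same number of cells per segment (40000).

```python
def segment_id(row, col, seg_rows, seg_cols, grid_cols):
    """Index of the segment containing cell (row, col)."""
    segs_per_row = -(-grid_cols // seg_cols)  # ceiling division
    return (row // seg_rows) * segs_per_row + (col // seg_cols)


def segments_touched(path, seg_rows, seg_cols, grid_cols):
    """Number of distinct segments visited by a list of (row, col) cells."""
    return len({segment_id(r, c, seg_rows, seg_cols, grid_cols)
                for r, c in path})


N = 4000  # hypothetical square grid, N x N cells

paths = {
    "along a row": [(0, i) for i in range(N)],
    "along a column": [(i, 0) for i in range(N)],
    "diagonal": [(i, i) for i in range(N)],
}

# Same cell count per segment, different shapes.
layouts = {
    "square 200x200": (200, 200),
    "strips 10x4000": (10, 4000),
}

for path_name, path in paths.items():
    for layout_name, (seg_rows, seg_cols) in layouts.items():
        n = segments_touched(path, seg_rows, seg_cols, N)
        print(f"{path_name:15s} {layout_name:15s} {n}")
```

In this model the strips are ideal for a path along a row but touch
twenty times as many segments as the squares for any path that crosses
rows, which is what flow paths in a watershed analysis generally do.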
Right now I don't want to introduce a new option to give the user
control over how much memory is used (be it MB of memory, number of
rows or percent of the map) because I want to keep all options of
r.watershed.fast identical to the original version. I'm still not
happy with the speed of the segmented version of r.watershed.fast, but
at least it is orders of magnitude faster than the in-memory version
of the original r.watershed. Maybe the iostream library that came with
r.terraflow can be used for r.watershed -m as well.
Markus
To use the Iostream library you need to change the underlying
algorithm of watershed. Iostream implements streams (files on disk)
and sorting of streams. If you use Iostream you need to store the
grids in streams on disk, rather than in 2D arrays in memory. On
streams, random access is very expensive, so you need a way to express
the computation as a sequence of stream sorts followed by sequential
passes over the streams. This usually requires a complete rewrite of
the algorithm.
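
The sort-then-scan idea can be illustrated with a small example. This
is not Iostream or r.terraflow code: Python lists stand in for on-disk
streams, sorted() stands in for an external sort, and the function and
variable names are made up. The task is to fetch, for every cell, the
value of the cell it points to, without any random access into the
value stream.

```python
def downstream_values(values, target):
    """Return result[i] = values[target[i]] using sorts and sequential scans.

    values[i] is the value of cell i; target[i] is the index of the cell
    that cell i points to (assumed to be a valid index into `values`).
    """
    if not values:
        return []

    # Step 1: one request per cell, sorted by the cell it wants to read.
    requests = sorted((t, s) for s, t in enumerate(target))

    # Step 2: a single sequential pass over the value stream answers all
    # requests, because they arrive in the same order as the stream.
    answers = []
    stream = enumerate(values)
    idx, val = next(stream)
    for t, s in requests:
        while idx < t:
            idx, val = next(stream)
        answers.append((s, val))

    # Step 3: sort the answers back into cell order.
    answers.sort()
    return [v for _, v in answers]


values = [10, 20, 30, 40]
target = [1, 3, 0, 3]  # cell 0 points to cell 1, cell 1 to cell 3, ...
print(downstream_values(values, target))
```

Every step is either a sort or a front-to-back pass, which is the
access pattern streams are good at; a direct values[target[i]] lookup
per cell would instead seek all over the file.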
-Laura