Dylan Beaudette wrote:
> > Would it be a big mess to implement r.regression.line a C module?
> >
> > Markus
>
> If written as a C module, could it take advantage of any stats library
> functions lying around ?

The aggregate functions in lib/stats require that the entire sample is
held in memory. This makes them unsuitable for computing an aggregate
over a substantial proportion of a map's values.

For r.regression.line, r.univar, r.statistics, etc, you need to use an
incremental approach. For some aggregates (count, sum, mean), this is
relatively straightforward.

For variance and standard deviation, there's the issue of a one-pass or
two-pass algorithm. A two-pass approach (calculating the mean on the
first pass) is more accurate, but it has to read the data twice, which
rules out reading data from a pipe.

For quantiles, you don't want to sort vast amounts of data at
O(n log n) complexity just to obtain specific quantiles. It's more
efficient to compute successive histograms, refining the interval(s)
containing the desired quantile(s) on each pass, and only sorting once
you've reduced the data to a manageable size. This could require
several passes, depending upon the amount of data, the amount of memory
available, and the distribution of the data.

-- 
Glynn Clements
_______________________________________________
grass-dev mailing list
http://lists.osgeo.org/mailman/listinfo/grass-dev
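
To make the incremental approach for r.regression.line concrete, here is a minimal sketch of a one-pass accumulator for the mean, variance and covariance that a linear regression needs. It uses a Welford-style update, which the post above does not name; it is one common way to get a one-pass result without the cancellation problems of the plain sum-of-squares formula. The names reg_acc, reg_add, reg_slope, reg_intercept and reg_variance_x are hypothetical and are not part of lib/stats or any GRASS API.

```c
#include <math.h>

/* Running state for a one-pass regression of y on x.
 * Hypothetical illustration only, not a GRASS data structure. */
struct reg_acc {
    long n;          /* number of (x, y) pairs seen so far */
    double mean_x;   /* running mean of x */
    double mean_y;   /* running mean of y */
    double m2_x;     /* sum of squared deviations of x from its mean */
    double m2_y;     /* sum of squared deviations of y from its mean */
    double c_xy;     /* sum of products of deviations of x and y */
};

/* Fold one pair of values into the accumulator (Welford-style update). */
static void reg_add(struct reg_acc *a, double x, double y)
{
    double dx = x - a->mean_x;   /* deviation from the old mean of x */
    double dy = y - a->mean_y;   /* deviation from the old mean of y */

    a->n++;
    a->mean_x += dx / (double)a->n;
    a->mean_y += dy / (double)a->n;

    /* old-mean deviation times new-mean deviation */
    a->m2_x += dx * (x - a->mean_x);
    a->m2_y += dy * (y - a->mean_y);
    a->c_xy += dx * (y - a->mean_y);
}

/* Slope b of the fitted line y = a + b*x; needs at least two distinct x. */
static double reg_slope(const struct reg_acc *a)
{
    return a->c_xy / a->m2_x;
}

/* Intercept a of the fitted line y = a + b*x. */
static double reg_intercept(const struct reg_acc *a)
{
    return a->mean_y - reg_slope(a) * a->mean_x;
}

/* Sample variance of x (divides by n - 1); needs n >= 2. */
static double reg_variance_x(const struct reg_acc *a)
{
    return a->m2_x / (double)(a->n - 1);
}
```

A module built this way would zero-initialise one struct reg_acc, call reg_add() once per pair of non-null cells while reading the two maps row by row, and only evaluate the slope and intercept at the end. Memory use is constant and the data is read once, so it would also work on input from a pipe.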

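
The histogram refinement described for quantiles can be sketched roughly as follows. This is only an illustration under several assumptions: the data is finite (no NaN or nulls), its range does not overflow a double, and it can be re-read on every pass; an in-memory array stands in for the raster here. The function name kth_smallest and the constants NBINS and SMALL are hypothetical. Each pass counts the values of the current interval into bins, picks the bin that holds the wanted rank, and narrows the interval to that bin's observed minimum and maximum. Sorting only happens once few enough values remain.

```c
#include <stdlib.h>

#define NBINS 64   /* histogram bins per pass (arbitrary choice) */
#define SMALL 32   /* sort directly once this few values remain */

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Return the k-th smallest value (0-based, k < n) of data[0..n-1]. */
static double kth_smallest(const double *data, size_t n, size_t k)
{
    double lo = data[0], hi = data[0];
    double buf[SMALL];
    size_t m = n;   /* number of values inside [lo, hi] */
    size_t i, j = 0;

    /* Initial pass: find the range of the data. */
    for (i = 1; i < n; i++) {
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }

    /* Refinement passes: each one shrinks [lo, hi] to a single bin. */
    while (m > SMALL && lo < hi) {
        size_t count[NBINS] = {0};
        double bmin[NBINS], bmax[NBINS];
        size_t b;

        for (i = 0; i < n; i++) {
            double v = data[i];
            if (v < lo || v > hi)
                continue;               /* outside the current interval */
            b = (size_t)((v - lo) / (hi - lo) * NBINS);
            if (b >= NBINS)
                b = NBINS - 1;          /* v == hi lands in the last bin */
            if (count[b] == 0 || v < bmin[b]) bmin[b] = v;
            if (count[b] == 0 || v > bmax[b]) bmax[b] = v;
            count[b]++;
        }

        /* Find the bin holding rank k; k becomes a rank within that bin. */
        for (b = 0; k >= count[b]; b++)
            k -= count[b];

        lo = bmin[b];
        hi = bmax[b];
        m = count[b];
    }

    if (lo == hi)
        return lo;   /* every remaining value is identical */

    /* Few enough values left: collect them and sort. */
    for (i = 0; i < n; i++)
        if (data[i] >= lo && data[i] <= hi)
            buf[j++] = data[i];
    qsort(buf, j, sizeof buf[0], cmp_double);
    return buf[k];
}
```

For the median of n values you would ask for k = n / 2 (or average two ranks when n is even). The number of passes depends on how evenly the values spread across the bins, which matches the point above about the distribution of the data.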