Glynn Clements wrote:
> So far as memory usage is concerned: if we think that people might
> want to compute quantiles on data sets which are large enough that we
> need to worry about memory consumption, we should probably be looking
> for a more efficient algorithm. Sorting the entire data set then
> pulling out quantiles is less than ideal if you're dealing with that
> much data.

I've added a new module, r.quantile, which computes quantiles without loading the entire map into memory.

Apart from not being limited by available memory, it should have better asymptotic performance. Sorting large amounts of data is O(n log n), while r.quantile is mostly O(n). The final step still involves sorting, but only the bins containing the requested quantiles are sorted (one bin per quantile). The size of each bin is roughly inversely proportional to the total number of bins, which is user-selectable and defaults to 1,000,000.

It has only had brief testing, but it processes a map of ~30 million cells (elevation.dem resampled to 3m resolution, plus some noise to smooth the distribution) in ~1 minute on a P3/800. For comparison, I tried running r.univar on the same map, but it crashed while computing the percentile (the other statistics were computed correctly).

-- 
Glynn Clements <[EMAIL PROTECTED]>
_______________________________________________
grass-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-dev
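For readers curious how a histogram-then-sort-one-bin approach stays mostly O(n), here is a minimal sketch in Python. It is not the r.quantile source. It assumes uniform-width bins between the data's minimum and maximum, and a quantile q taken as the value at 0-based rank floor(q * (n - 1)) in sorted order. The function name `binned_quantiles` and its `read_data` and `num_bins` parameters are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Tuple


def binned_quantiles(read_data: Callable[[], Iterable[float]],
                     quantiles: List[float],
                     num_bins: int = 1000) -> List[float]:
    """Hypothetical sketch: quantiles via counting bins, then sorting only
    the bins that contain the requested ranks. read_data() must return a
    fresh iterator over the same values on every call (e.g. re-reading a
    raster row by row)."""
    # Pass 1: count the values and find their range.
    n = 0
    lo, hi = math.inf, -math.inf
    for x in read_data():
        n += 1
        lo = min(lo, x)
        hi = max(hi, x)
    if n == 0:
        raise ValueError("no data")
    width = (hi - lo) / num_bins  # assumption: uniform-width bins

    def bin_of(x: float) -> int:
        # Monotone in x, so every value in bin b is <= every value in bin b+1.
        if width == 0:
            return 0
        # Clamp so that x == hi falls into the last bin.
        return min(int((x - lo) / width), num_bins - 1)

    # Pass 2: histogram. Memory use is O(num_bins), not O(n).
    counts = [0] * num_bins
    for x in read_data():
        counts[bin_of(x)] += 1

    # Map each requested rank to (bin index, offset within that bin).
    targets: List[Tuple[int, int]] = []
    for q in quantiles:
        if not 0.0 <= q <= 1.0:
            raise ValueError("quantile must lie in [0, 1]")
        k = int(q * (n - 1))  # 0-based rank in sorted order
        cum = 0
        for b, c in enumerate(counts):
            if cum + c > k:
                targets.append((b, k - cum))
                break
            cum += c

    # Pass 3: keep only the values that fall into the needed bins.
    kept = {b: [] for b, _ in targets}
    for x in read_data():
        b = bin_of(x)
        if b in kept:
            kept[b].append(x)

    # The only sorting step: one (small) bin per requested quantile.
    for values in kept.values():
        values.sort()
    return [kept[b][off] for b, off in targets]
```

Passing a function that re-reads the data stands in for re-scanning the map, so only the histogram and the contents of the selected bins are held in memory. On a reasonably smooth distribution each selected bin holds roughly n / num_bins values, which matches the point above about bin size shrinking as the number of bins grows.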
