Adrian Custer wrote:
> Hey all,
>
> Wherein we discover that stats are hard, even for the simple
> questions...
>
>
> On Tue, 2008-05-20 at 10:18 +0200, Andrea Aime wrote:
>> Jody Garnett wrote:
>>> What a difficult question; is there a strict definition of the quantile
>>> function we could grab from statistics or something?
>
> I'm not sure the use of "Quantile" for this function is correct
> terminology but don't have time to explore it rigorously. So far all
> I've learned is that I've now forgotten how to use R.
>
>
> As ever, wikipedia is our friend these days:
>     By a quantile, we mean the fraction (or percent) of points below
>     the given value. That is, the 0.3 (or 30%) quantile is the point
>     at which 30% of the data fall below and 70% fall above
>     that value.

Right, but that is not a good definition for what the so-called quantile
classification aims to do, which is to generate a set of rules to paint a
map. It breaks down in the case I'm trying to handle, where a wide range
of the data holds the same value.

> Since the key footnote points us to R, we can start to trust this as an
> authoritative source.
>
>     http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
>
>
> In R, it seems you want a type=3 method of quantification
>     " Type 3 SAS definition: nearest even order statistic"
> but, again, I don't have the time to answer this rigorously today.
>
>
>> Quantile( {-1 -2 0 0 0 0 3 5 7 9}, 2) ==> ?
>> Quantile( {-1 -2 0 0 0 0 3 5 7 9}, 3) ==> ?
>
> eratosthenes:~> R
> ...
>
>> x <- c(-1,-2,0,0,0,0,3,5,7,9)
>> n <- 2
>> quantile(x,probs=seq(0,1,1/n))
>   0%  50% 100%
>   -2    0    9
>> n <-3
>> quantile(x,probs=seq(0,1,1/n))
>        0% 33.33333% 66.66667%      100%
>        -2         0         3         9
>
> with the value shown being the rightmost in the original vector and
> defining the breaks which can be applied to the vector to yield the
> resulting classes. (You don't care about the leftmost value).
>
>
>> Quantile( {-10 -9 -2 0 0 0 1 2 4 9 9 9}, 3) ==> what now?
>
>> x2 <- c(-10,-9,-2,0,0,0,1,2,4,9,9,9)
>> n <- 3
>> quantile(x2,probs=seq(0,1,1/n))
>         0%  33.33333%  66.66667%       100%
> -10.000000   0.000000   2.666667   9.000000
>> quantile(x2,probs=seq(0,1,1/n),type=3)
>        0% 33.33333% 66.66667%      100%
>       -10         0         2         9

Again, not very useful... It tells you that the 33% break falls on a 0, so
applying it gives you one class that ends with 0 and another that starts
with 0. That is something the layman using the application does not
understand; it makes no sense to him. That's why I was suggesting that the
classes avoid breaks on flat areas... So I'm back at square one: the
current method is mathematically sound, but it makes no sense to the
normal user. What now?
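
To make "avoid breaks on flat areas" concrete, here is a rough sketch in Python. It is only an illustration, not the current GeoTools code: the function name `tie_aware_classes` and the rule of snapping a break to the nearer end of a run of equal values are assumptions made for the example.

```python
def tie_aware_classes(values, n_classes):
    """Split values into up to n_classes groups of roughly equal size,
    never placing a break inside a run of equal values.

    Hypothetical helper for illustration; not a GeoTools API.
    """
    data = sorted(values)
    size = len(data)
    cuts = []
    for k in range(1, n_classes):
        # Ideal cut for equal counts: class k ends just before data[cut].
        cut = (k * size) // n_classes
        if 0 < cut < size and data[cut - 1] == data[cut]:
            # The cut falls inside a flat area: find the run's boundaries.
            tied = data[cut]
            lo = cut
            while lo > 0 and data[lo - 1] == tied:
                lo -= 1
            hi = cut
            while hi < size and data[hi] == tied:
                hi += 1
            # Snap to the nearer end of the run (assumed tie-breaking rule).
            cut = lo if (cut - lo) < (hi - cut) else hi
        # Drop cuts that would create an empty class.
        if 0 < cut < size and (not cuts or cut > cuts[-1]):
            cuts.append(cut)
    bounds = [0] + cuts + [size]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


x = [-1, -2, 0, 0, 0, 0, 3, 5, 7, 9]
x2 = [-10, -9, -2, 0, 0, 0, 1, 2, 4, 9, 9, 9]

for sample, n in ((x, 2), (x, 3), (x2, 3)):
    classes = tie_aware_classes(sample, n)
    print(n, [(c[0], c[-1]) for c in classes])
```

With this rule the three classes for the second data set cover -10..-2, 0..2 and 4..9, so no value appears in two classes. The price is that the classes no longer hold equal counts, and a data set dominated by one value can end up with fewer classes than requested.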

Cheers
Andrea

_______________________________________________
Geotools-devel mailing list
https://lists.sourceforge.net/lists/listinfo/geotools-devel
