Re: [Geotools-devel] Quantile classification oddities

Andrea Aime Tue, 20 May 2008 10:34:22 -0700

Adrian Custer wrote:
> Hey all,
> 
>         Wherein we discover that stats are hard, even for the simple
>         questions...
> 
> 
> On Tue, 2008-05-20 at 10:18 +0200, Andrea Aime wrote:
>> Jody Garnett wrote:
>>> What a difficult question; is there a strict definition of the quantile 
>>> function we could grab from statistics or something?
> 
> I'm not sure the use of "Quantile" for this function is correct
> terminology but don't have time to explore it rigorously. So far all
> I've learned is that I've now forgotten how to use R.
> 
> 
> As ever, wikipedia is our friend these days:
>         By a quantile, we mean the fraction (or percent) of points below
>         the given value. That is, the 0.3 (or 30%) quantile is the point
>         at which 30% of the data fall below and 70% fall above
>         that value.


Right, but that is not a good definition for what the so-called
quantile classification is aiming at, which is to generate a set of
rules to paint a map. It breaks down in the case I'm trying to handle,
namely when a large share of the data has the same value.

> Since the key footnote points us to R, we can start to trust this as an
> authoritative source.
> 
> http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
> 
> 
> In R, it seems you want a type=3 method of quantification 
>   " Type 3 SAS definition: nearest even order statistic"
> but, again, I don't have the time to answer this rigorously today.
> 
> 
>> Quantile(  {-1 -2 0 0 0 0 3 5 7 9}, 2) ==> ?
>> Quantile(  {-1 -2 0 0 0 0 3 5 7 9}, 3) ==> ?
> 
> eratosthenes:~> R
> ...
> 
>> x <- c(-1,-2,0,0,0,0,3,5,7,9)
>> n <- 2
>> quantile(x,probs=seq(0,1,1/n))
>   0%  50% 100% 
>   -2    0    9 
>> n <-3
>> quantile(x,probs=seq(0,1,1/n))
>        0% 33.33333% 66.66667%      100% 
>        -2         0         3         9 
> 
> with the value shown being the rightmost in the original vector and
> defining the breaks which can be applied to the vector to yield the
> resulting classes. (You don't care about the leftmost value).
> 
> 
>> Quantile(  {-10 -9 -2 0 0 0 1 2 4 9 9 9}, 3) ==> what now?
> 
>> x2 <- c(-10,-9,-2,0,0,0,1,2,4,9,9,9)
>> n <- 3
>> quantile(x2,probs=seq(0,1,1/n))
>         0%  33.33333%  66.66667%       100% 
> -10.000000   0.000000   2.666667   9.000000 
>> quantile(x2,probs=seq(0,1,1/n),type=3)
>        0% 33.33333% 66.66667%      100% 
>       -10         0         2         9 

Again, not very useful... it's telling you that at the 33% break there
is a 0, and by applying it you'd get one class that ends with 0 and
another that starts with 0. That is something the layman using the
application does not understand; it does not make sense to him.
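
To make the problem concrete, here is a minimal sketch in Python
(standard library only) that rebuilds the three-class breaks for the
second data set and turns them into class ranges. The helper name
quantile_breaks is hypothetical, and the sketch assumes that
statistics.quantiles with method="inclusive" interpolates the same way
as R's default; it does reproduce the numbers in the session above.

```python
from statistics import quantiles

def quantile_breaks(values, n_classes):
    """Return n_classes + 1 break values: minimum, interior cuts, maximum."""
    data = sorted(values)
    # "inclusive" interpolates linearly between order statistics,
    # treating the data as the whole population.
    cuts = quantiles(data, n=n_classes, method="inclusive")
    return [data[0], *cuts, data[-1]]

x2 = [-10, -9, -2, 0, 0, 0, 1, 2, 4, 9, 9, 9]
breaks = quantile_breaks(x2, 3)

# Consecutive breaks become the class ranges a map legend would show.
ranges = list(zip(breaks, breaks[1:]))

# Number of data values that sit exactly on the first interior break,
# i.e. on the edge shared by the first and second class.
on_edge = x2.count(breaks[1])

for low, high in ranges:
    print(f"{low:g} .. {high:g}")
print("values on the shared edge:", on_edge)
```

The first range ends at 0 and the second starts at 0, so which class
the zeros land in depends only on how the rule treats its end points.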

That's why I was suggesting that the classes avoid breaks on flat
areas... so I'm back at square one: the current method is
mathematically sound, but does not make any sense to the normal
user. What now?
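
One possible reading of "avoid breaks on flat areas", sketched in
Python purely as an illustration: split the sorted data by count, and
whenever a split would land inside a run of equal values, move it to
the nearer end of that run. The function name tie_aware_classes and
the nearest-end rule are assumptions made for this sketch; this is not
what GeoTools does, and the classes it returns are no longer
guaranteed to hold equal counts.

```python
def tie_aware_classes(values, n_classes):
    """Split values into up to n_classes groups without splitting ties."""
    data = sorted(values)
    n = len(data)
    cuts = []
    for i in range(1, n_classes):
        # Ideal equal-count split: data[:pos] | data[pos:]
        pos = round(i * n / n_classes)
        if 0 < pos < n and data[pos - 1] == data[pos]:
            # The split falls inside a flat run; find the run's extent.
            lo = pos
            while lo > 0 and data[lo - 1] == data[pos]:
                lo -= 1
            hi = pos
            while hi < n and data[hi] == data[pos]:
                hi += 1
            # Move the split to whichever end of the run is closer.
            pos = lo if pos - lo <= hi - pos else hi
        cuts.append(pos)
    # Drop duplicate cuts and cuts at the extremes (they would give
    # empty classes), then slice the sorted data.
    edges = [0] + sorted({c for c in cuts if 0 < c < n}) + [n]
    return [data[a:b] for a, b in zip(edges, edges[1:])]

x = [-1, -2, 0, 0, 0, 0, 3, 5, 7, 9]
x2 = [-10, -9, -2, 0, 0, 0, 1, 2, 4, 9, 9, 9]

for group in tie_aware_classes(x2, 3):
    print(group[0], "..", group[-1], f"({len(group)} values)")
```

With this rule every run of equal values ends up in a single class, so
no two legend entries share an end point, at the price of unequal
class sizes and possibly fewer classes than requested.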


Cheers
Andrea
