Mike wrote:
> 
> Greetings all,
> 
> I'd like to estimate the 95th percentile of a distribution p(X) by
> making N independent measurements of X. I'm assuming that the 95th
> percentile of measurements is the best estimate of the 95th percentile
> for the distribution.
> 
> Is this halfway reasonable?
> Do I need to make strong assumptions about the shape of p(X)?
> How would I arrive at the s.d. of the estimate, or some other
> indicator of quality?


        (1) In the absence of distributional assumptions, I don't think you
have any alternative but to use the 95th percentile of the data.  You
will have no way to decide if the resulting estimator is unbiased or to
determine its standard deviation. Consider, for instance, the prizes in
a fair lottery with 1000 one-dollar tickets and one $1000 prize; and
suppose your sample consists of 10 tickets.

        The true 95th percentile prize is $0. However, one time in 100 your
sample will contain the winning ticket and you will estimate the 95th
percentile at $1000; thus your mean estimate will be $10.
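
        The arithmetic in the lottery example can be checked with a short
sketch. It assumes the "nearest-rank" convention for a sample percentile
(the ceil(p*n)-th smallest value), under which the 95th percentile of ten
values is the largest one; other conventions, such as linear
interpolation, would give a different figure in the winning case. The
function and constant names are illustrative only.

```python
import math
import random

N_TICKETS = 1000   # tickets in the lottery
PRIZE = 1000       # the single prize, in dollars
SAMPLE_SIZE = 10   # tickets drawn without replacement


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: the ceil(p*n)-th smallest value (1-indexed).

    This convention is an assumption; the example does not name one.
    """
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered))
    return ordered[rank - 1]


def simulate_mean_estimate(trials, seed=0):
    """Average the sample 95th percentile over repeated 10-ticket samples."""
    rng = random.Random(seed)
    prizes = [PRIZE] + [0] * (N_TICKETS - 1)
    total = 0
    for _ in range(trials):
        sample = rng.sample(prizes, SAMPLE_SIZE)  # without replacement
        total += nearest_rank_percentile(sample, 0.95)
    return total / trials


# Exact calculation: the sample contains the winner with probability 10/1000,
# and in that case the nearest-rank estimate is the full prize.
p_win = SAMPLE_SIZE / N_TICKETS
exact_mean = p_win * PRIZE
```

Calling simulate_mean_estimate(20000) should give a value scattered around
exact_mean, while the 95th percentile of the full set of prizes is 0.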

        Consideration of such extreme distributions also shows that you have no
useful way of estimating the SD from your sample.

        (2) If you have a one-parameter model, estimating the 95th
percentile is normally equivalent to estimating the parameter.
(Pathological counterexamples exist, such as families in which the 95th
percentile is the same for every distribution in the family.  [E.g.,
Unif[-19A, A], if you want a simple one!]) That is: estimating the 95th
percentile is essentially the same task as estimating the mean, or the
median, or...
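
        To see why Unif[-19A, A] works as a counterexample, here is the
short calculation, assuming A > 0 and writing q for the 95th percentile
(the symbol q is introduced here only for illustration):

```latex
\[
F(x) = \frac{x + 19A}{20A}, \qquad -19A \le x \le A,
\]
\[
F(q) = 0.95
\;\Longrightarrow\;
q = 0.95 \cdot 20A - 19A = 19A - 19A = 0 .
\]
```

So the 95th percentile is 0 whatever the value of A, and knowing it tells
you nothing about the parameter.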

        In this case it is highly unlikely that the 95th percentile of the
sample will be an optimal estimator for the 95th percentile of the
distribution. 
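
        A small simulation can illustrate this point. The sketch below
assumes, purely as an example, an exponential model with unknown mean mu,
for which the 95th percentile is mu * ln(20); the model-based estimator
simply plugs in the sample mean. The choice of family, the nearest-rank
percentile convention, and all names are assumptions made for the
illustration, not part of the argument above.

```python
import math
import random


def sample_percentile(values, p=0.95):
    """Nearest-rank sample percentile (an assumed convention)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered)) - 1]


def model_percentile(values):
    """95th percentile under an assumed exponential model.

    For an exponential distribution with mean mu the 95th percentile is
    mu * ln(20), so we plug in the sample mean as the estimate of mu.
    """
    return math.log(20) * (sum(values) / len(values))


def compare_mse(n=50, reps=2000, seed=0):
    """Mean squared error of both estimators when the model is correct."""
    rng = random.Random(seed)
    true_q = math.log(20)  # true 95th percentile when mu = 1
    sq_err_sample = 0.0
    sq_err_model = 0.0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        sq_err_sample += (sample_percentile(xs) - true_q) ** 2
        sq_err_model += (model_percentile(xs) - true_q) ** 2
    return sq_err_sample / reps, sq_err_model / reps
```

With the model correctly specified, the second value returned by
compare_mse() should come out noticeably smaller than the first. If the
data are not in fact exponential, the comparison can go the other way,
which is the situation described in (1).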

        (3) Somewhere between these extremes there are presumably
semiparametric families of distributions (perhaps symmetric
distributions, or distributions obeying entropy constraints, or
distributions within a certain distance in probability of normal
distributions, or...) for which other answers to your question are
appropriate. Just as a guess, I'd say that this looks like a serious
research problem (or cottage industry) if it hasn't already been done.

        -Robert Dawson
