Re: AW: [R] Rank and extract data from a series

Tony Plate Tue, 23 Sep 2003 10:45:56 -0700

Using Thomas Unternährer's handy example, one could also do:

> X <- c(1, 4.5, 2.3, 1, 7.3)
> mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5
>

I think this will give the same results as Thomas Unternährer's suggested code in almost all cases, but it is perhaps more concise and direct (provided that you don't actually need the values of the top items).

(Of course, you will have to change the 1:2 to 1:10 for your needs.)
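As a minimal sketch of that generalisation, the one-liner can be wrapped in a small function that takes the number of top items as an argument. The name mean_top_positions and its argument n are invented here for illustration and do not come from the thread. head() is used in place of [1:n] on the assumption that a series shorter than n should not contribute NA positions.

```r
# Hypothetical helper (not from the thread): mean position of the n largest values.
mean_top_positions <- function(x, n = 10) {
  # order() returns positions sorted by value; decreasing = TRUE puts the largest first.
  # head() keeps at most n of them, so short series do not yield NA positions.
  top <- head(order(x, decreasing = TRUE), n)
  mean(top)
}

X <- c(1, 4.5, 2.3, 1, 7.3)
mean_top_positions(X, n = 2)   # 3.5, matching the example above
```

For the original question the call would be mean_top_positions(X, n = 10) on the full series.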

Note that this question gets tricky if there are ties such that there is no unique set of row numbers that identify N "top" items.

For example, consider the following data:

> X <- c(1,3,2,3,4)

Taking the "top two", should the answer be 3.5 (avg of row numbers 2 and 5), 4.5 (avg of row numbers 4 and 5), or 3.666667 (avg of row numbers 2, 4 and 5)?

> mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5
> order(X, decreasing=TRUE)[1:2]
[1] 5 2
> # Andy Liaw's suggestion:
> mean(which(X %in% sort(X, decreasing=TRUE)[1:2]))
[1] 3.666667
> which(X %in% sort(X, decreasing=TRUE)[1:2])
[1] 2 4 5
> # Thomas Unternährer's suggestion:
> mean(match(sort(X, decreasing=TRUE)[1:2], X))
[1] 3.5
> match(sort(X, decreasing=TRUE)[1:2], X)
[1] 5 2
>
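
The three candidate answers can each be requested explicitly by choosing a tie-breaking rule in rank(), which may be easier to read than relying on how order() or match() happen to treat ties. This is only a sketch: the helper name top_positions is invented for illustration, and ties.method = "last" assumes a version of R recent enough to support it.

```r
# Hypothetical helper (not from the thread): positions of the top n items
# under an explicit tie-breaking rule passed to rank().
top_positions <- function(x, n, ties = "first") {
  # rank(-x) assigns rank 1 to the largest value
  which(rank(-x, ties.method = ties) <= n)
}

X <- c(1, 3, 2, 3, 4)
mean(top_positions(X, 2, ties = "first"))  # rows 2 and 5     -> 3.5
mean(top_positions(X, 2, ties = "last"))   # rows 4 and 5     -> 4.5
mean(top_positions(X, 2, ties = "min"))    # rows 2, 4 and 5  -> 3.666667

# With n = 3 the tie falls inside the top set; match() then returns
# the first position of the tied value twice, unlike order().
order(X, decreasing = TRUE)[1:3]           # 5 2 4
match(sort(X, decreasing = TRUE)[1:3], X)  # 5 2 2
```

The last two lines show one case where the order() and match() approaches disagree: when a tied value appears more than once among the top N items.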

hope this helps,

Tony Plate

At Tuesday 02:23 PM 9/23/2003 +0200, Unternährer Thomas, uth wrote:

Hi,

>I would like to rank a time-series of data, extract the top ten data items from this series, determine the corresponding row numbers for each value in the sample, and take a mean of these *row numbers* (not the data).

>I would like to do this in R, rather than pre-process the data on the UNIX command line if possible, as I need to calculate other statistics for the series.

>I understand that I can use 'sort' to order the data, but I am not aware of a function in R that would allow me to extract a given number of these data and then determine their positions within the original time series.

>e.g.

>Time series:
>1.0 (row 1)
>4.5 (row 2)
>2.3 (row 3)
>1.0 (row 4)
>7.3 (row 5)
>Sort would give me:
>1.0
>1.0
>2.3
>4.5
>7.3
>I would then like to extract the top two data items:
>4.5
>7.3
>and determine their positions within the original (unsorted) time series:
>4.5 = row 2
>7.3 = row 5
>then take a mean:

>2 and 5 = 3.5

>Thanks in advance.

>James Brown

X <- c(1, 4.5, 2.3, 1, 7.3)
X1 <- sort(X, decreasing=TRUE)[1:2]   # the two largest values
X2 <- match(X1, X)                    # first position of each value in X
mean(X2)

Hope this helps

Thomas

___________________________________________

James Brown
Cambridge Coastal Research Unit (CCRU)
Department of Geography
University of Cambridge
Downing Place
Cambridge
CB2 3EN, UK
Telephone: +44 (0)1223 339776
Mobile: 07929 817546
Fax: +44 (0)1223 355674
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]
http://www.geog.cam.ac.uk/ccru/CCRU.html
___________________________________________
On Wed, 10 Sep 2003, Jerome Asselin wrote:
> On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
> >
> > Your method looks like a naive reimplementation of integration, and
> > won't work so well for distributions that have the great majority of
> > the probability mass concentrated in a small fraction of the sample
> > space.  I was hoping for something that would retain the
> > adaptability of integrate().
>
> Yesterday I suggested using approxfun(). Did you consider my
> suggestion? Below is an example.
>
> N <- 500
> x <- rexp(N)
> y <- rank(x)/(N+1)
> empCDF <- approxfun(x,y)
> xvals <- seq(0,4,.01)
> plot(xvals,empCDF(xvals),type="l",
> xlab="Quantile",ylab="Cumulative Distribution Function")
> lines(xvals,pexp(xvals),lty=2)
> legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)
>
>
> It's possible to tune some parameters in approxfun() to better
> match your personal preferences. Have a look at help(approxfun) for
> details.
>
> HTH,
> Jerome Asselin
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>

Tony Plate [EMAIL PROTECTED]
