Hi:

Here's one approach (not the only one), spelled out step by step to
illustrate its different components.

1. Create a list object, something like
   l <- vector('list', 600)
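2. Populate it. There are several ways to do this, but one is to first
create a vector of file names and then populate the list by looping over it.
If your file names follow a simple pattern (dat001 - dat600, say), then it's
easy to create the file name vector with paste() (or sprintf() if the numbers
are zero-padded); otherwise, you may need to do more work. Then run a loop
that assigns the corresponding data frame to each list component. The version
below uses get(), which assumes the data frames already exist in your
workspace as objects with those names; if they still have to be read from
disk, use something like read.table() instead: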

for(i in seq_along(filenames)) l[[i]] <- get(filenames[i])

3. Create a function for one of the data sets, under the obvious proviso
that you intend to process each data frame in the list the same way. To
return only the p-values from a binomial test applied to each row of your
input data frame, the following works for me (explanation below):

f <- function(df)
  do.call(c, with(df, mapply(binom.test, x = X, n = N))[3, ])

4. Use lapply() to map the function to each component data frame in your
list; the result will also be a list.
 pvlist <- lapply(l, f)

5. *IF* each of your data frames has the same number of rows, you can use
the following to slurp together all the p-values into a matrix (one row per
data frame):

do.call(rbind, pvlist)

OTOH, if the number of rows varies from one data frame to another, it may be
best to keep the p-value results in list form, or perhaps you could flatten
them into a numeric vector with unlist(), depending on your purposes.
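
To see steps 1-5 run end to end, here is a small sketch on three toy data
frames instead of 600. The object names dat001 to dat003 follow the
hypothetical naming pattern mentioned in step 2; dat001 reuses the rows from
your example, while the counts in dat002 and dat003 are made up purely for
illustration.

```r
# Stand-ins for the real data: three small data frames in the workspace.
# dat001 holds the rows from the original question; the others are invented.
dat001 <- data.frame(X = c(11, 9, 13, 13, 18), N = c(19, 26, 21, 27, 30))
dat002 <- data.frame(X = c(5, 12, 7, 20, 3),   N = c(10, 30, 15, 40, 9))
dat003 <- data.frame(X = c(8, 8, 14, 2, 25),   N = c(16, 20, 28, 12, 50))

# Steps 1-2: build the (zero-padded) names, then collect the objects in a list.
filenames <- sprintf("dat%03d", 1:3)
l <- vector("list", length(filenames))
for (i in seq_along(filenames)) l[[i]] <- get(filenames[i])
names(l) <- filenames

# Step 3: p-values for every row of one data frame.
f <- function(df)
  do.call(c, with(df, mapply(binom.test, x = X, n = N))[3, ])

# Step 4: apply f to each data frame in the list.
pvlist <- lapply(l, f)

# Step 5: all data frames have 5 rows here, so rbind gives a 3 x 5 matrix.
pvmat <- do.call(rbind, pvlist)
dim(pvmat)  # 3 5
```

Naming the list components is optional, but it carries the data set names
through to the row names of the final matrix.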

----
The function f:

 mapply() allows you, in this case, to apply the non-vectorized function
binom.test() elementwise to a pair of vector arguments supplied from the
input data frame. The result is a 9 x n matrix of mode list, in which each
column holds the output of one of the n calls to binom.test() [where n =
number of rows of the input data frame]. Since you wanted the p-values
(component/row 3, named p.value), we pull out the third row of the matrix.
This returns a list, so applying the concatenation function c() through
do.call() collapses it into a numeric vector for output.
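
If it helps to see that structure directly, the sketch below takes f apart
on the five rows from the original question. The second version, which pulls
out p.value by name inside an anonymous function, is my own variation rather
than anything required; it should give the same numbers and may be easier to
read.

```r
# Rows from the original question.
df <- data.frame(X = c(11, 9, 13, 13, 18), N = c(19, 26, 21, 27, 30))

# One binom.test() call per row; the results are simplified to a list-matrix.
m <- with(df, mapply(binom.test, x = X, n = N))
dim(m)       # 9 components by 5 tests
rownames(m)  # names of the components; the third is "p.value"

# Row 3 is a list of five p-values; c() via do.call() makes it a numeric vector.
p1 <- do.call(c, m[3, ])

# Variation: extract the p-value by name inside the mapped function.
p2 <- with(df, mapply(function(x, n) binom.test(x, n)$p.value, X, N))

all.equal(p1, p2)
```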

The lapply() call maps the function f to each component of the list of data
frames created in (2).
-----

An alternative approach to this problem would be to use the plyr (and
perhaps reshape, too) package, since it was designed to handle this
'split-apply-combine' strategy.
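
As a rough sketch of what that could look like (not tested against your
data): plyr's ldply() takes a list and returns a single data frame. The
helper pv_df and the toy list below are hypothetical names and values of my
own, and the base R branch is only a fallback with the same output shape in
case plyr isn't installed.

```r
# Two toy data frames standing in for the real ones. 'a' and 'b' together
# reuse the five rows from the original question.
l <- list(
  a = data.frame(X = c(11, 9, 13), N = c(19, 26, 21)),
  b = data.frame(X = c(13, 18),    N = c(27, 30))
)

# Hypothetical helper: one output row per input row.
# binom.test() defaults to p = 0.5 and a two-sided alternative.
pv_df <- function(df) {
  data.frame(
    row     = seq_len(nrow(df)),
    p.value = mapply(function(x, n) binom.test(x, n)$p.value, df$X, df$N)
  )
}

if (requireNamespace("plyr", quietly = TRUE)) {
  # ldply(): list in, data frame out; list names go into an '.id' column.
  res <- plyr::ldply(l, pv_df)
} else {
  # Base R fallback producing the same columns.
  res <- do.call(rbind, lapply(names(l), function(nm)
    cbind(.id = nm, pv_df(l[[nm]]))))
}
print(res)
```

Because the result is a long data frame rather than a matrix, this form also
copes with data frames that have different numbers of rows.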

HTH,
Dennis

On Thu, Jul 29, 2010 at 1:05 AM, Wilson, Andrew <a.wil...@lancaster.ac.uk> wrote:

> I need to run binomial tests (binom.test) on a large set of data, stored
> in a table - 600 tests in total.
>
> The values of x are stored in a column, as are the values of n.  The
> data for each test are on a separate row.
>
> For example:
>
> X       N
> 11      19
> 9       26
> 13      21
> 13      27
> 18      30
>
> It is a two-tailed test, and P in all cases is 0.5.
>
> My question is:  Is there a quicker way of running these tests without
> having to type an individual command for each test - and ideally also to
> store the resulting p-values in a single data vector?
>
> Many thanks for any pointers,
>
> Andrew Wilson
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
