
Here's one approach (not the only one), spelled out step by step to
illustrate its different components.

1. Create a list object, something like
   l <- vector('list', 600)
2. Populate it. There are several ways to do this, but one is to first
create a vector of file names and then populate the list by looping over the
file names. If your file names have a simple format (dat001 - dat600, say),
then it's easy to create the file name vector with paste(); otherwise, you
may need to do more work. Then run a loop that assigns to each list
component the corresponding data frame. Note that get() looks up objects
that already exist in your workspace under those names; if the data are
still in files on disk, read each one in with read.table() instead.
Something like

for(i in seq_along(filenames)) l[[i]] <- get(filenames[i])

3. Create a function for one of the data sets, under the obvious proviso
that you intend to process each data frame in the list the same way. To
return only the p-values from a binomial test applied to each row of your
input data frame, the following works for me (explanation below):

f <- function(df)
  do.call(c, with(df, mapply(binom.test, x = X, n = N))[3, ])

4. Use lapply() to map the function to each component data frame in your
list; the result will also be a list.
 pvlist <- lapply(l, f)

5. *IF* each of your data frames has the same number of rows, you can use
the following to slurp together all the p-values into a matrix:

do.call(rbind, pvlist)

OTOH, if the number of rows varies from one data frame to another, it may be
best to keep the p-value results in list form, or perhaps you could flatten
them into a numeric vector with unlist(), depending on your purposes.
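
The five steps above can be sketched end to end as follows. This is only a minimal sketch: the three small data frames, their names (dat001 to dat003) and most of their values are made up for illustration, and it assumes the data frames already exist in the workspace so that get() can find them.

```r
# Hypothetical data frames standing in for the real ones; the names
# dat001..dat003 and the values are assumptions made for this sketch.
dat001 <- data.frame(X = c(11, 9, 13), N = c(19, 26, 21))
dat002 <- data.frame(X = c(13, 18, 7), N = c(27, 30, 15))
dat003 <- data.frame(X = c(5, 12, 20), N = c(10, 40, 25))

# Steps 1-2: build the vector of names, then fetch each data frame by name.
filenames <- sprintf("dat%03d", 1:3)   # "dat001" "dat002" "dat003"
l <- vector("list", length(filenames))
for (i in seq_along(filenames)) l[[i]] <- get(filenames[i])

# Step 3: p-values from a two-sided binomial test (default p = 0.5) per row.
f <- function(df)
  do.call(c, with(df, mapply(binom.test, x = X, n = N))[3, ])

# Step 4: apply f to every data frame in the list.
pvlist <- lapply(l, f)

# Step 5: equal row counts, so the results stack into a matrix with
# one row per data frame and one column per test.
pvmat <- do.call(rbind, pvlist)

# With unequal row counts, flatten to a single numeric vector instead.
pvvec <- unlist(pvlist)
```

Here pvmat has one row per data frame, and pvvec holds the same p-values in one long vector, ordered data frame by data frame.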

The function f:

mapply() allows you, in this case, to apply the non-vectorized function
binom.test() to a pair of vector arguments supplied from the input data
frame, one call per row. The result is a 9 x n matrix of mode list, where
each column holds the nine output components of one of the n calls to
binom.test() [where n = number of rows of the input data frame]. Since you
wanted the p-values (component/row 3), we pull out the third row of the
matrix. This returns a list, so passing it to the concatenation function c()
via do.call() collapses it into a numeric vector for output.
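
To make the shape of the mapply() result concrete, here is a small sketch using the five rows from the original question. The object names (df, res, pv, pv2) are chosen for illustration only, and extracting the row by its name "p.value" rather than by position 3 is a variation on f, not f as written above.

```r
# The five rows quoted in the original question.
df <- data.frame(X = c(11, 9, 13, 13, 18),
                 N = c(19, 26, 21, 27, 30))

# One binom.test() call per row; results are simplified to a list-matrix.
res <- with(df, mapply(binom.test, x = X, n = N))

dim(res)        # number of result components by number of rows
rownames(res)   # component names; "p.value" is among them

# Row "p.value" is a list with one element per test; unlist() (or
# do.call(c, ...)) turns it into a plain numeric vector.
pv <- unlist(res["p.value", ])

# The same p-values via an explicit loop over rows, as a cross-check.
pv2 <- vapply(seq_len(nrow(df)),
              function(i) binom.test(df$X[i], df$N[i])$p.value,
              numeric(1))
all.equal(pv, pv2)
```

Indexing by name is a little more robust than relying on the p-value being the third component, at the cost of being slightly longer to type.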

The lapply() call maps the function f to each component of the list of data
frames created in (2).

An alternative approach to this problem would be to use the plyr (and
perhaps reshape, too) package, since it was designed to handle this
'split-apply-combine' strategy.


On Thu, Jul 29, 2010 at 1:05 AM, Wilson, Andrew <a.wil...@lancaster.ac.uk> wrote:

> I need to run binomial tests (binom.test) on a large set of data, stored
> in a table - 600 tests in total.
> The values of x are stored in a column, as are the values of n.  The
> data for each test are on a separate row.
> For example:
> X       N
> 11      19
> 9       26
> 13      21
> 13      27
> 18      30
> It is a two-tailed test, and P in all cases is 0.5.
> My question is:  Is there a quicker way of running these tests without
> having to type an individual command for each test - and ideally also to
> store the resulting p-values in a single data vector?
> Many thanks for any pointers,
> Andrew Wilson
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
