Re: [R] summary statistics into table/data base, many factors to analyse

2008-11-22 Thread Gabor Grothendieck
On Fri, Nov 21, 2008 at 5:50 AM, Gerit Offermann [EMAIL PROTECTED] wrote:
 Dear list,

 thanks to your help I managed to find means of analysing my data.

 However, the whole data set contains 264 variables, some of which are
 factors and some of which are not. The factors tend to be grouped, e.g.
 data$f1304 to data$f1484 and data$f3204 to data$f5408.

 But there are other types of variables in the data set as well,
 e.g. data$f1504.

 Not every number is used, i.e. data$f1345 to data$f1399 might not exist
 in the data set.

We can compute on the names like this (using the built-in anscombe
data set to get just the columns y1, x1, x2, x3, x4). Try this:

# display anscombe data set
anscombe

# names.x are the names that start with x
names.x <- grep("^x", names(anscombe), value = TRUE)
anscombe[, c("y1", names.x)]
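The same idea carries over to Gerit's naming scheme. A minimal sketch, assuming columns named f followed by a number (the toy data frame mydata.demo and its columns are invented for illustration): the numeric part of each name is extracted, so gaps such as a missing f1345 to f1399 do no harm, and a non-factor column such as f1504 simply falls outside the chosen blocks.

```r
# Hypothetical toy data: a response plus columns named "f" + number,
# with gaps in the numbering (invented for illustration only)
mydata.demo <- data.frame(
  resp  = c(1.2, 3.4, 2.2, 5.1),
  f1304 = factor(c("a", "b", "a", "b")),
  f1305 = factor(c("u", "u", "v", "v")),
  f1484 = factor(c("p", "q", "q", "p")),
  f1504 = c(10, 20, 30, 40),
  f3204 = factor(c("m", "n", "m", "n"))
)

# Names that look like "f" followed by digits
f.names <- grep("^f[0-9]+$", names(mydata.demo), value = TRUE)

# Numeric part of each name, e.g. "f1304" -> 1304
f.num <- as.numeric(sub("^f", "", f.names))

# Keep the blocks f1304..f1484 and f3204..f5408; numbers that do not
# exist in the data are simply never matched
in.blocks <- (f.num >= 1304 & f.num <= 1484) | (f.num >= 3204 & f.num <= 5408)
block <- f.names[in.blocks]

reduced <- mydata.demo[, c("resp", block)]
names(reduced)
```

Selecting by the numeric suffix rather than by position keeps working if columns are added, removed, or reordered.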


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] summary statistics into table/data base, many factors to analyse

2008-11-21 Thread Gerit Offermann
Dear list,

thanks to your help I managed to find means of analysing my data.

However, the whole data set contains 264 variables, some of which are
factors and some of which are not. The factors tend to be grouped, e.g.
data$f1304 to data$f1484 and data$f3204 to data$f5408.

But there are other types of variables in the data set as well, 
e.g. data$f1504. 

Not every number is used, i.e. data$f1345 to data$f1399 might not exist
in the data set. 

The summaryBy solution works for the cross analyses, of which there are
only a handful, so I am not worried about those.

Jorge's solution is fine.
However, I am trying to get my head around how to efficiently
reduce my data set to the dependent variable and the factors so that
his solution can be applied.

Having to type each variable into
my.reduced.data <- cbind(my.data$f1001, my.data$f1002, my.data$f1003, ...)
is an obvious option, but does not seem to be the most efficient one.

Are there better ways to go about this?

Thanks,
Gerit


Re: [R] summary statistics into table/data base, many factors to analyse

2008-11-21 Thread Petr PIKAL
Hi

[EMAIL PROTECTED] wrote on 21.11.2008 11:50:52:

 Having to type each variable into
 my.reduced.data <- cbind(my.data$f1001, my.data$f1002, my.data$f1003, ...)
 is an obvious option, but does not seem to be the most efficient one.

Maybe not so obvious.
How did you get your data into R? With one of the read.* commands? Then it
should already be a data frame with appropriate column types.

Check its structure with

str(mydata)

and then choose only the columns you really want with

mydata[, select.some.columns]
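A small sketch of choosing columns by type instead of by name (the data frame below is invented for illustration): sapply() applies is.factor() to every column, and the resulting logical vector picks out the factor columns, which can then be combined with the response.

```r
# Invented example data: one numeric response, two factors, one character column
mydata <- data.frame(
  x    = c(1, 4, 2, 6),
  y    = factor(c(2, 2, 1, 1)),
  z    = factor(c(1, 2, 2, 1)),
  note = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Inspect the column types
str(mydata)

# TRUE for every column that is a factor
is.fac <- sapply(mydata, is.factor)

# Keep the numeric response x plus every factor column
selected <- mydata[, c("x", names(mydata)[is.fac])]
names(selected)
```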

If your data is a list (see the "An Introduction to R" manual for data types
and their properties), then the transformation to a data frame depends partly
on what it looks like and whether all components have the same number of values.

do.call(cbind, mydata) will combine all vectors in mydata; however, it
will convert them to a single type, because cbind produces a matrix, which
can hold only one type of data.

If all variables have the same length,

do.call(data.frame, mydata)

will produce a data frame in which all variables keep their
respective types.
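The difference can be seen on a small invented list: do.call(cbind, ...) coerces everything to one common type, while do.call(data.frame, ...) keeps each column's own type. stringsAsFactors = FALSE is passed so that the character column stays character on both older and newer versions of R.

```r
# Invented example: a list of equal-length vectors of different types
mylist <- list(x = c(1, 4, 2), g = c("a", "b", "a"), ok = c(TRUE, FALSE, TRUE))

# cbind() returns a matrix, so everything is coerced to one common type
m <- do.call(cbind, mylist)
m  # a character matrix: the numbers and logicals are now strings

# data.frame() keeps each column's own type
df <- do.call(data.frame, c(mylist, stringsAsFactors = FALSE))
sapply(df, class)
```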

Regards
Petr


 


[R] summary statistics into table/data base, many factors to analyse

2008-11-20 Thread Gerit Offermann
Dear list,

I reduced my data to the following:

x <- c(1,4,2,6,8,3,4,2,4,5,1,3)
y <- as.factor(c(2,2,1,1,1,2,2,1,1,2,1,2))
z <- as.factor(c(1,2,2,1,1,2,2,3,3,3,3,3))

I can produce the statistical summary just fine.
s1 <- tapply(x, y, summary)
d1 <- tapply(x, y, sd)
s2 <- tapply(x, z, summary)
d2 <- tapply(x, z, sd)

First thing:
I have more than 100 factors to analyse. Their names run from f1001 to about f1381.
Is there a way to avoid having to write these lines more than 100 times?

Second thing:
How can I put the standard deviation and the summary statistics into one output?
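One base-R way to get both into a single output, sketched with the x and y vectors above (the helper name stats.sd is made up for illustration): a function that returns the summary() values together with sd() lets tapply() deliver one vector per level, and do.call(rbind, ...) stacks those vectors into a matrix with one row per level.

```r
x <- c(1, 4, 2, 6, 8, 3, 4, 2, 4, 5, 1, 3)
y <- as.factor(c(2, 2, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2))

# Hypothetical helper: six-number summary plus standard deviation
stats.sd <- function(w) c(summary(w), SD = sd(w))

# One list element per level of y; rbind them into a single matrix
tab.y <- do.call(rbind, tapply(x, y, stats.sd))
round(tab.y, 3)
```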

Third thing:
In the end I want to write the summary statistics into a database (Access). It
would be fantastic if I could achieve a table such as:

factor  level  Min.   1st Qu.  Median  Mean   3rd Qu.  Max.   SDev.
y       1      1.000  2.000    3.000   3.833  5.500    8.000  2.714160
y       2      1.000  3.000    3.500   3.333  4.000    5.000  1.366260
z       1      1.0    3.5      6.0     5.0    7.0      8.0    3.6055513
...

I tried to unlist the matrices, but it did not help much.
it <- list() # it = iterations

for (i in seq_along(s1)) {
  it[[i]] <- unlist(s1[[i]])
}
 

Help to any of the three points is greatly appreciated.

Cheers,
Gerit


Re: [R] summary statistics into table/data base, many factors to analyse

2008-11-20 Thread Gabor Grothendieck
Look at summaryBy in the doBy package.
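For comparison, a rough base-R counterpart using aggregate() rather than summaryBy() (so the layout differs from what doBy produces): with a FUN that returns several values, the formula interface stores them in a matrix column, which can then be flattened into ordinary columns.

```r
x <- c(1, 4, 2, 6, 8, 3, 4, 2, 4, 5, 1, 3)
y <- as.factor(c(2, 2, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2))
mydata <- data.frame(x, y)

# FUN returns 7 numbers, so aggregate() stores them in a matrix column "x"
agg <- aggregate(x ~ y, data = mydata,
                 FUN = function(w) c(summary(w), SD = sd(w)))

# Flatten the matrix column into ordinary columns
flat <- cbind(agg["y"], agg$x)
flat
```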



Re: [R] summary statistics into table/data base, many factors to analyse

2008-11-20 Thread Jorge Ivan Velez
Dear Gerit,
Here is a start using a data set whose first column is numeric and whose
remaining columns are the factors 'f1', 'f2', ..., 'f1381' (I'm using only 3 columns):

# Data set
x <- c(1,4,2,6,8,3,4,2,4,5,1,3)
y <- as.factor(c(2,2,1,1,1,2,2,1,1,2,1,2))
z <- as.factor(c(1,2,2,1,1,2,2,3,3,3,3,3))
mydata <- data.frame(x, y, z)
mydata

# Function: summary statistics plus SD of x for each level of FACTOR
foo <- function(FACTOR) do.call(rbind, tapply(x, FACTOR, function(w)
  c(summary(w), SD = sd(w))))

# Calculations: lapply() always returns a list of per-factor matrices
res <- lapply(mydata[, -1], foo)
res2 <- do.call(rbind, res)
rnames <- rownames(res2)
rownames(res2) <- NULL

# Output
final <- data.frame(Factor = rep(names(res), sapply(res, nrow)),
                    Level = rnames, res2)
colnames(final) <- c('Factor', 'Level', 'Min.', '1st.Qu.', 'Median',
                     'Mean', '3rd.Qu.', 'Max.', 'SD')
final

See ?tapply and ?do.call for details.
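A minimal sketch that wraps the same steps into a function taking the data, the response and the factor names as arguments; the name summarize.factors is hypothetical. Because the factor names are passed as a character vector, they could come from grep() on names(mydata) instead of being typed by hand.

```r
# Hypothetical helper: long table of summary() + SD of `response`
# for every level of every factor named in `factors`
summarize.factors <- function(data, response, factors) {
  one <- function(f) {
    # one row per level of factor f
    m <- do.call(rbind, tapply(data[[response]], data[[f]],
                               function(w) c(summary(w), SD = sd(w))))
    lev <- rownames(m)
    rownames(m) <- NULL
    data.frame(Factor = f, Level = lev, m,
               check.names = FALSE, stringsAsFactors = FALSE)
  }
  # stack the per-factor tables into one data frame
  do.call(rbind, lapply(factors, one))
}

x <- c(1, 4, 2, 6, 8, 3, 4, 2, 4, 5, 1, 3)
y <- as.factor(c(2, 2, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2))
z <- as.factor(c(1, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3, 3))
mydata <- data.frame(x, y, z)

final <- summarize.factors(mydata, "x", c("y", "z"))
final
```

With the example data the result has one row per level (two for y, three for z), in the long layout Gerit asked for.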

HTH,

Jorge


