Re: [R] bootstrapping quantile regression
A possible solution might be to use the survey package. You could specify that the data are clustered using the svydesign function, then specify the replicate weights using the as.svrepdesign function, and finally use the withReplicates function to bootstrap over the clusters. A copy of Thomas Lumley's book, Complex Surveys, would probably help with this. Hope this helps -- View this message in context: http://r.789695.n4.nabble.com/bootstrapping-quantile-regression-tp4647948p4648032.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
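As a sketch of how those pieces might fit together (the data frame `dat` and the column names `psu`, `y`, and `x` are placeholders invented here, and the quantile regression itself is assumed to come from quantreg's `rq`):

```r
# Sketch only - dat, psu, y, and x are made-up names, not from the original post
library(survey)    # svydesign, as.svrepdesign, withReplicates
library(quantreg)  # rq for the quantile regression

set.seed(1)
dat <- data.frame(psu = rep(1:20, each = 5), x = rnorm(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)

des  <- svydesign(ids = ~psu, data = dat)                  # declare the clustering
rdes <- as.svrepdesign(des, type = "bootstrap",
                       replicates = 100)                   # bootstrap replicate weights
# withReplicates refits the model once per set of replicate weights
res  <- withReplicates(rdes, function(w, data)
          coef(rq(y ~ x, tau = 0.5, weights = w, data = data)))
res  # coefficients with replication-based standard errors
```

The key idea is that withReplicates resamples whole clusters (via the replicate weights), so the standard errors respect the clustering.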
Re: [R] weighted mean by week
The plyr package is very helpful for this:

library(plyr)
ddply(x, .(myweek), summarize,
      m1 = weighted.mean(var1, myweight),
      m2 = weighted.mean(var2, myweight))

-- View this message in context: http://r.789695.n4.nabble.com/weighted-mean-by-week-tp4636814p4636816.html
Re: [R] weighted mean by week
If there are many variables, I'd then suggest the data.table package:

library(data.table)
dt = data.table(x)
dt[, lapply(.SD, function(x) weighted.mean(x, myweight)),
   keyby = c('group', 'myweek')]

The '.SD' is an abbreviation for all the variables in the data table (excluding the grouping variables). There's an .SDcols = 'variables of interest' option if you want to limit the dozens of variables to only some of them. Or, in the data.table(x) statement, you could limit the created data table to only the variables you're interested in. As an added benefit, the data.table approach is amazingly fast, particularly when there are numerous grouping categories. -- View this message in context: http://r.789695.n4.nabble.com/weighted-mean-by-week-tp4636814p4636825.html
Re: [R] weighted mean by week
Honestly, I wasn't sure what you wanted to do with 'group'. Here it is with the 'group' variable deleted:

library(data.table)
dt = data.table(x[, -1])
dt[, lapply(.SD, function(x) weighted.mean(x, myweight)), keyby = 'myweek']

-- View this message in context: http://r.789695.n4.nabble.com/weighted-mean-by-week-tp4636814p4636828.html
Re: [R] Amelia error
I've had a little experience using the Amelia package. Are you sure that your nominal variables (race, south, etc.) are in your ad04 data frame? David Freedman -- View this message in context: http://r.789695.n4.nabble.com/Amelia-error-tp4568455p4568600.html
Re: [R] Trojan in R 2.13.0?
2.13.0 looks fine with VIPRE. david freedman atlanta -- View this message in context: http://r.789695.n4.nabble.com/Trojan-in-R-2-13-0-tp3450084p3450784.html
Re: [R] for loop to merge .csvs
you might try merge_recurse(list(DF1,DF2,DF3,DF4,DF17)) in the reshape package -- View this message in context: http://r.789695.n4.nabble.com/for-loop-to-merge-csvs-tp3298549p3298565.html
Re: [R] Using summaryBy with weighted data
You might use the plyr package to get group-wise weighted means:

library(plyr)
ddply(mydata, ~group, summarise,
      b = mean(weights),
      c = weighted.mean(response, weights))

hth david freedman -- View this message in context: http://r.789695.n4.nabble.com/Using-summaryBy-with-weighted-data-tp3220761p3221212.html
Re: [R] How to this SAS transport file in R?
I've had good success with the read.xport function in the SASxport (not foreign) package. But couldn't you just use something like

mydata <- read.xport('http/demo_c.xpt')

The transport file looks like it's one of the NHANES demographic files. -- View this message in context: http://r.789695.n4.nabble.com/How-to-this-SAS-transport-file-in-R-tp3073969p3074009.html
[R] data.frame and formula classes of aggregate
Hi - I apologize for the 2nd post, but I think my question from a few weeks ago may have been overlooked on a Friday afternoon. I might be missing something very obvious, but is it widely known that the aggregate function handles missing values differently depending on whether a data frame or a formula is the first argument? For example,

(d <- data.frame(sex = rep(0:1, each = 3),
                 wt = c(100, 110, 120, 200, 210, NA),
                 ht = c(10, 20, NA, 30, 40, 50)))
x1 <- aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] <- c('mean.dfcl.wt', 'mean.dfcl.ht')
x2 <- aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] <- c('mean.formcl.wt', 'mean.formcl.ht')
cbind(x1, x2)[, c(2, 3, 6, 4, 7)]

The output from the data.frame class has an NA if there are missing values in the group for the variable with missing values. But the formula class output seems to delete the entire row (missing and non-missing values) if there are any NAs. Wouldn't one expect the 2 forms (data frame vs formula) of aggregate to give the same result? thanks very much david freedman, atlanta -- View this message in context: http://r.789695.n4.nabble.com/data-frame-and-formula-classes-of-aggregate-tp3063668p3063668.html
Re: [R] data.frame and formula classes of aggregate
Thanks for the information. There was a discussion of different results obtained with the formula and data.frame methods for a paired t-test -- there are many threads, but one is at http://r.789695.n4.nabble.com/Paired-t-tests-td2325956.html#a2326291 david freedman -- View this message in context: http://r.789695.n4.nabble.com/data-frame-and-formula-classes-of-aggregate-tp3063668p3064177.html
[R] aggregate with missing values, data.frame vs formula
It seems that the formula and data.frame forms of aggregate handle missing values differently. For example,

(d = data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50)))
x1 = aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] = c('list.wt', 'list.ht')
x2 = aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] = c('form.wt', 'form.ht')
cbind(x1, x2)

  Group.1 sex list.wt list.ht sex form.wt form.ht
1       0   0     110      NA   0     105      15
2       1   1      NA      40   1     205      35

So, the data.frame form gives an NA if there are missing values in the group for the variable with missing values. But the formula form deletes the entire row (missing and non-missing values) if any of the values are missing. Is this what was intended, or the best option? thanks, david freedman -- View this message in context: http://r.789695.n4.nabble.com/aggregate-with-missing-values-data-frame-vs-formula-tp3041198p3041198.html
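For what it's worth, one way to make the formula form keep the incomplete rows and drop NAs variable-by-variable (matching the per-variable means) is to combine na.action = na.pass with an na.rm mean - a sketch using the same toy data:

```r
d <- data.frame(sex = rep(0:1, each = 3),
                wt  = c(100, 110, 120, 200, 210, NA),
                ht  = c(10, 20, NA, 30, 40, 50))

# Default formula behavior: listwise deletion of any row with an NA
aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean)

# Keep incomplete rows; remove NAs within each variable instead
x3 <- aggregate(cbind(wt, ht) ~ sex, data = d,
                FUN = function(v) mean(v, na.rm = TRUE),
                na.action = na.pass)
x3  # wt: 110 and 205; ht: 15 and 40
```

With na.pass, model.frame keeps the rows containing NAs, and the na.rm mean then handles each variable separately, reproducing the data.frame-method means without the NAs.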
Re: [R] Svy function doesn't work nested in user-defined function
I'm not sure that this is the problem, but are you certain that the variable 'con' is in audit? Your check outside the function really just tells you that X and SEX are in audit. hth, david freedman (good to see another CDCer using R) -- View this message in context: http://r.789695.n4.nabble.com/Svy-function-doesn-t-work-nested-in-user-defined-function-tp2224732p2224843.html
Re: [R] percent by subset
how about:

d = data.frame(ht = rnorm(20, 60, 20), sex = sample(0:1, 20, rep = TRUE)); d
with(d, by(ht, sex, mean))
with(d, by(ht, sex, function(x) mean(x >= 60)))

hth, david freedman -- View this message in context: http://r.789695.n4.nabble.com/percent-by-subset-tp2123057p2123079.html
Re: [R] binary logistic regression taking account clustering
The bootcov function for models fit using lrm in the rms package might also be an option. hth -- View this message in context: http://r.789695.n4.nabble.com/binary-logistic-regression-taking-account-clustering-tp2122255p2122311.html
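A sketch of what that might look like (the data frame `mydata` and the column names `y`, `x`, and `id` are all made-up placeholders):

```r
# Sketch only - mydata, y, x, and id are invented names for illustration
library(rms)

set.seed(1)
mydata <- data.frame(id = rep(1:50, each = 4), x = rnorm(200))
mydata$y <- rbinom(200, 1, plogis(mydata$x))

# x=TRUE, y=TRUE store the design matrix so bootcov can refit the model
fit  <- lrm(y ~ x, data = mydata, x = TRUE, y = TRUE)
# cluster= makes bootcov resample whole clusters, not individual rows
bfit <- bootcov(fit, cluster = mydata$id, B = 200)
bfit  # same coefficients, cluster-bootstrap covariance matrix
```

The point estimates are unchanged; only the variance-covariance matrix (and hence the standard errors and Wald tests) reflects the clustering.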
Re: [R] help wiht model.matrix -- how to keep missing values?
I'm not sure what to do with model.matrix, but you might look at some of the code for ols in the rms package. The rms package (from Frank Harrell) has several options for keeping the NAs in the output from regression models so that residuals are correctly aligned. hth david freedman -- View this message in context: http://r.789695.n4.nabble.com/help-wiht-model-matrix-how-to-keep-missing-values-tp2072248p2073573.html
[R] get means of elements of 5 matrices in a list
I've got a list of 5 matrices that are each 5 x 6. I'd like to end up with a 5 x 6 matrix that contains the mean value of the 5 original matrices. I can do this by brute force, but there must be a better way than making each matrix into a vector and then remaking a matrix. thanks very much for any help david freedman

ll = list(structure(c(9.7, 17.6, 20.8, 24.1, 33.8, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 40.9, 53.3, 16.7, 26.1, 29.5, 32.7, 42.6, 26.2, 34.3, 37, 39.8, 47.1, 31.9, 40.3, 43.3, 46.2, 54.1), .Dim = 5:6),
          structure(c(9.4, 17.7, 20.7, 24.1, 33.7, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.7, 37.5, 40.9, 53.1, 16, 26, 29.5, 32.7, 42.7, 25.6, 34.1, 37, 39.8, 47.1, 31.9, 40.3, 43.3, 46.1, 54.1), .Dim = 5:6),
          structure(c(9.6, 17.7, 20.8, 24.4, 34.3, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 40.9, 53.2, 16.7, 26.1, 29.5, 32.8, 42.8, 26.2, 34.2, 36.8, 39.9, 47.1, 31.9, 40.3, 43.3, 46.1, 54.8), .Dim = 5:6),
          structure(c(9.7, 17.6, 20.7, 24.1, 33.8, 14.5, 25.7, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 41.1, 52.6, 16.7, 26.1, 29.5, 32.8, 42.8, 26.1, 34.3, 37, 40, 47.1, 31.9, 40.3, 43.3, 46.2, 54.9), .Dim = 5:6),
          structure(c(8.6, 17.6, 20.7, 24.1, 33.8, 14.5, 25.6, 29.8, 33.6, 44.8, 21.8, 32.6, 37.5, 40.9, 52.5, 16, 26, 29.4, 32.8, 42.8, 25.6, 34.2, 37, 40.1, 47.1, 31.9, 40.3, 43.1, 46.1, 54.1), .Dim = 5:6))
ll

x = rbind(as.vector(ll[[1]]), as.vector(ll[[2]]), as.vector(ll[[3]]),
          as.vector(ll[[4]]), as.vector(ll[[5]])); x
x2 = apply(x, 2, mean); matrix(x2, byrow = FALSE, nrow = 5)

-- View this message in context: http://r.789695.n4.nabble.com/get-means-of-elements-of-5-matrices-in-a-list-tp2067722p2067722.html
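One base-R alternative to the vector round-trip is to add the matrices element-wise with Reduce and divide by the list length; a sketch (with a small stand-in list, since any list of equal-dimension matrices works the same way):

```r
# Toy stand-in for the ll above: two 5 x 6 matrices
ll <- list(matrix(1:30, nrow = 5), matrix(31:60, nrow = 5))

# Element-wise sum of all matrices, divided by how many there are
m <- Reduce(`+`, ll) / length(ll)   # still a 5 x 6 matrix

# Equivalent via a 3-d array - handy if you want medians etc. instead of means
a  <- array(unlist(ll), dim = c(dim(ll[[1]]), length(ll)))
m2 <- apply(a, 1:2, mean)
all.equal(m, m2)  # TRUE
```

Reduce never flattens the matrices, so the dimensions (and any dimnames) survive automatically.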
Re: [R] get means of elements of 5 matrices in a list
thanks very much for the help - all of these suggestions were much better than what I was doing -- View this message in context: http://r.789695.n4.nabble.com/get-means-of-elements-of-5-matrices-in-a-list-tp2067722p2068329.html
Re: [R] Ellipse that Contains 95% of the Observed Data
for a picture of the bagplot, try going to http://www.statmethods.net/graphs/boxplot.html -- View this message in context: http://n4.nabble.com/Ellipse-that-Contains-95-of-the-Observed-Data-tp1694538p1695236.html
Re: [R] tapply syntax
sorry - I use many abbreviations and I try to remove them before I post questions/answers - 'set' is my abbreviation for subset. david

On 3/28/2010 8:27 PM, Jeff Brown [via R] wrote: What is the function set()? Is that a typo? When I type ?set I get nothing, and when I try to evaluate that code R tells me it can't find the function.

-- View this message in context: http://n4.nabble.com/tapply-syntax-tp1692503p1694626.html
Re: [R] tapply syntax
how about:

d1 = data.frame(pat = c(rep('a', 3), 'b', 'c', rep('d', 2), rep('e', 2), 'f'),
                var = c(1, 2, 3, 1, 2, 2, 3, 2, 4, 4))
ds = set(d1, var %in% c(2, 3))
with(ds, tapply(var, pat, FUN = length))

hth, David Freedman, CDC, Atlanta -- View this message in context: http://n4.nabble.com/tapply-syntax-tp1692503p1692553.html
Re: [R] for() loop
Hi - do you want sum(datjan$V4) OR sum(unique(datjan$V4))? there's also a cumsum function that might be useful hth, David Freedman, Atlanta

Schmidt Martin wrote: Hello I've been working with R for a few months and still have many trivial questions - I guess! Here is what I want: I have this matrix:

dim(datjan)
[1] 899   4

The first 10 rows look as follows:

datjan[1:10,]
     V1 V2 V3 V4
1  1961  1  1 24
2  1961  1  2 24
3  1961  1  3 24
4  1961  1  4 24
5  1961  1  5 24
6  1961  1  6 27
7  1961  1  7 27
8  1961  1  8 27
9  1961  1  9 27
10 1961  1 10 27

I tried now to create a for() loop, which gives me the sum of the 30 different classes (1:30!) in [,4].

for(i in 1:30){
  sum(datjan[,4]==i)
}

R is then actually calculating the sum of i which certainly doesn't exist and results in a 0 value

t1 <- sum(datjan[,4]==1)
t2 <- sum(datjan[,4]==2)
... until '30'

This way it's working, but I won't find an end by doing all this by hand, because there are many other matrices waiting. So, how can I make that loop work?? thanks for helping me

-- View this message in context: http://n4.nabble.com/for-loop-tp1592713p1593235.html
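To get the per-class counts without writing t1 through t30 by hand, a couple of sketches (with made-up data standing in for datjan; only column V4 matters here):

```r
# Stand-in for datjan - replace with the real matrix/data frame
datjan <- data.frame(V4 = c(24, 24, 24, 27, 27, 1))

# sapply collects the count for each class, which the bare for() loop discarded
counts <- sapply(1:30, function(i) sum(datjan$V4 == i))

# table() does the same in one call; factor(levels = 1:30) keeps empty classes as 0
counts2 <- table(factor(datjan$V4, levels = 1:30))
```

The original loop computed each sum but never stored or printed it; wrapping the computation in sapply (or using table) keeps all 30 results.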
Re: [R] ordering columns in a data frame
I'm not sure what's incorrect about your result, but the following works:

d = data.frame(a = sample(letters[1:5], 10, rep = TRUE), b = rnorm(10), c = sample(1:10, 10)); d
d[order(d$a, d$c), ]

or, you can use orderBy:

library(doBy)
orderBy(~a + b, data = d)  # use a minus sign to sort in descending sequence

Did you leave off the tilde in your orderBy example? hth David Freedman, CDC Atlanta -- View this message in context: http://n4.nabble.com/ordering-columns-in-a-data-frame-tp1587294p1587318.html
Re: [R] Help with Hmisc, cut2, split and quantile
try as.numeric(read_data$DEC) - this should turn it into a numeric variable that you can work with. hth David Freedman CDC, Atlanta

Guy Green wrote: Hi Peter and others, Thanks (Peter) - that gets me really close to what I was hoping for. The one problem I have is that the cut approach breaks the data into intervals based on the absolute value of the Target data, rather than their frequency. In other words, if the data ranged from 0 to 50, the data would be separated into 0-5, 5-10 and so on, regardless of the frequency within those categories. However I want to get the data into deciles. The code that does this (incorporating Peter's) is:

read_data = read.table("C:/Sample table.txt", head = T)
read_data$DEC <- with(read_data, cut(Target, breaks = 10, labels = 1:10))
L <- split(read_data, read_data$DEC)

This means that I can get separate data frames, such as L$'10', which comes out tidy, but only containing 2 data items (the sample has 63 rows, so each decile should have 6+ data items):

   Actual    Target DEC
9   0.572 0.3778386  10
31  0.299 0.3546606  10

If I try to adjust this to get deciles using cut2(), I can break the data into deciles as follows:

read_data = read.table("C:/Sample table.txt", head = T)
read_data$DEC <- with(read_data, cut2(read_data$Target, g = 10), labels = 1:10)
L <- split(read_data, read_data$DEC)

However this time, while the data is broken into even data frames, the labels for the separate data frames are unusable, e.g.:

$`[ 0.26477, 0.37784]`
   Actual    Target                 DEC
6   0.243 0.2650960 [ 0.26477, 0.37784]
9   0.572 0.3778386 [ 0.26477, 0.37784]
10 -0.049 0.3212681 [ 0.26477, 0.37784]
15  0.780 0.2778518 [ 0.26477, 0.37784]
31  0.299 0.3546606 [ 0.26477, 0.37784]
33  0.105 0.2647676 [ 0.26477, 0.37784]

Could anyone suggest a way of rearranging this to make the labels useable again? Sample data is reattached: http://n4.nabble.com/file/n1585427/Sample_table.txt
Thanks, Guy

Peter Ehlers wrote: On 2010-03-08 8:47, Guy Green wrote: Hello, I have a set of data with two columns: Target and Actual. A sample table is attached ( http://n4.nabble.com/file/n1584647/Sample_table.txt ) but the data looks like this:

 Actual       Target
-0.125  0.016124906
 0.135  0.120799865
   ...          ...

I want to be able to break the data into tables based on quantiles in the Target column. I can see (using cut2, and also quantile) how to get the barrier points between the different quantiles, and I can see how I would achieve this if I was just looking to split up a vector. However I am trying to break up the whole table based on those quantiles, not just the vector. I would like to be able to break the table into ten separate tables, each with both Actual and Target data, based on the Target data deciles:

top_decile = ...(top decile of read_data, based on Target data)
next_decile = ...and so on...
bottom_decile = ...

I would just add a factor variable indicating to which decile a particular observation belongs:

dat$DEC <- with(dat, cut(Target, breaks = 10, labels = 1:10))

If you really want to have separate data frames you can then split on the decile:

L <- split(dat, dat$DEC)

-Peter Ehlers -- Peter Ehlers University of Calgary -- View this message in context: http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585503.html
Re: [R] How to add a variable to a dataframe whose values are conditional upon the values of an existing variable
there's a recode function in the Hmisc package, but it's difficult (at least for me) to find documentation for it:

library(Hmisc)
week <- c('SAT', 'SUN', 'MON', 'FRI')
recode(week, c('SAT', 'SUN', 'MON', 'FRI'), 1:4)

HTH -- View this message in context: http://n4.nabble.com/How-to-add-a-variable-to-a-dataframe-whose-values-are-conditional-upon-the-values-of-an-existing-vare-tp1571214p1572261.html
Re: [R] Plot of odds ratios obtained from a logistic model
You might want to look at the plot.Predict function in the rms package - it allows you to plot the logits or probabilities vs the predictor variable at specified levels of other covariates (if any) in the model. There are many examples in http://cran.r-project.org/web/packages/rms/rms.pdf David Freedman -- View this message in context: http://n4.nabble.com/Plot-of-odds-ratios-obtained-from-a-logistic-model-tp1471496p1471566.html
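A sketch of the basic pattern (the data and the variable names `age` and `y` are simulated here, purely for illustration):

```r
# Sketch with simulated data - in a real analysis, replace d, y, and age
library(rms)

set.seed(1)
d   <- data.frame(age = rnorm(200, 50, 10))
d$y <- rbinom(200, 1, plogis((d$age - 50) / 10))

dd <- datadist(d); options(datadist = "dd")  # rms needs a datadist for Predict
fit <- lrm(y ~ rcs(age, 4), data = d)

plot(Predict(fit, age))                 # log odds vs age, with confidence band
plot(Predict(fit, age, fun = plogis))   # same curve on the probability scale
```

The fun= argument is what switches between the logit and probability scales; additional covariates would be held at the datadist reference values unless you name them in the Predict call.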
[R] replace a for loop with lapply or relative
Dear helpers. I often need to make dichotomous variables out of continuous ones (yes, I realize the problems with throwing away much of the information), but I then like to check the min and max of each category. I have the following simple code to do this that cuts each variable (x1, x2, x3) at the 90th percentile, and then prints the min and max of each category:

d = data.frame(x1 = rnorm(100), x2 = runif(100)); d = transform(d, x3 = x1 - x2)
d[, 4:6] = data.frame(sapply(d, function(v) as.numeric(v >= quantile(v, 0.9))))
names(d)[4:6] = c('x1high', 'x2high', 'x3high')
head(d)
for (i in 1:3) {print(do.call(rbind, by(d[, i], d[, i + 3], function(x) c(min(x), max(x)))))}

Is there a way to replace the ugly for loop in the last line with some type of apply function that would know that my continuous and indicator variables are 3 columns apart in the data frame? Thanks very much David Freedman -- View this message in context: http://n4.nabble.com/replace-a-for-loop-with-lapply-or-relative-tp1469453p1469453.html
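One possibility (a sketch, not the only way): since the continuous and indicator columns are paired positionally, Map can walk the two sets of columns together, pairing column i with column i + 3:

```r
# Rebuild a small version of d so the sketch is self-contained
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = runif(100))
d <- transform(d, x3 = x1 - x2)
d[, 4:6] <- data.frame(sapply(d, function(v) as.numeric(v >= quantile(v, 0.9))))
names(d)[4:6] <- c('x1high', 'x2high', 'x3high')

# Map pairs d[[1]] with d[[4]], d[[2]] with d[[5]], etc.
# range(x) is c(min(x), max(x)), so each result is a 2 x 2 matrix
res <- Map(function(v, ind) do.call(rbind, by(v, ind, range)),
           d[1:3], d[4:6])
res$x1  # min and max within the low (0) and high (1) groups of x1
```

Map returns a named list (names taken from d[1:3]), so the results stay labeled by variable instead of scrolling past in a loop.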
Re: [R] tapply for function taking of 1 argument?
also,

library(plyr)
ddply(d, ~grp, function(df) weighted.mean(df$x, df$w))

-- View this message in context: http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1461428.html
Re: [R] Quartiles and Inter-Quartile Range
please do some reading - I *think* the main difference is that the x and y axes are reversed, but I really don't know what SAS prints out. there are many 'defaults' that are rather arbitrary - sometimes SAS uses one, while R uses another.

??qqnorm brings up a list of functions, including stats::qqnorm
?stats::qqnorm brings up the help page for the function

On Mon, Jan 25, 2010 at 12:41 PM, eeramalho [via R] wrote: Thank you David. I got it now. Maybe you can help again. I am typing the following code to draw a QQ plot but the line does not look right?

qqnorm(cbiomass)
qqline(cbiomass)

the graph looks very different from the graph generated by SAS. Thank you again. Emiliano.

-- Natalia and/or David -- View this message in context: http://n4.nabble.com/Quartiles-and-Inter-Quartile-Range-tp1145817p1289653.html
Re: [R] Quartiles and Inter-Quartile Range
and SAS gives one a choice of 5 options, and I'm fairly sure that its default differs from R's (although one of R's quantile types corresponds to the SAS default) - see pctldef on http://www.technion.ac.il/docs/sas/proc/z0146803.htm my simple brain thinks of the problem as 'how does one calculate the median of 4 values?' david freedman

Girish A.R. [via R] wrote: Interestingly, Hmisc::describe() and summary() seem to be using one Type, and stats::fivenum() seems to be using another Type.

fivenum(cbiomass)
[1]  910.0 1039.0 1088.5 1156.5 1415.0

summary(cbiomass)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    910    1048    1088    1104    1139    1415

describe(cbiomass)$counts
      n missing  unique    Mean     .05     .10     .25     .50     .75
     12       0      12    1104   920.5   938.3  1047.5  1088.5  1138.8
    .90     .95
 1248.7  1327.0

cheers, -Girish -- View this message in context: http://n4.nabble.com/Quartiles-and-Inter-Quartile-Range-tp1145817p1264043.html
Re: [R] Quartiles and Inter-Quartile Range
It looks like you think that type=2 are the 'true' quantiles, but the default method in R is type=7. You might want to look at ?stats::quantile hth david freedman -- View this message in context: http://n4.nabble.com/Quartiles-and-Inter-Quartile-Range-tp1145817p1213199.html
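A tiny illustration of the difference (type = 7 is R's default; I believe type = 2 corresponds to SAS's default PCTLDEF=5, but check ?quantile to confirm):

```r
x <- c(1, 2, 3, 4)

quantile(x, 0.25, type = 7)  # 1.75: linear interpolation, R's default
quantile(x, 0.25, type = 2)  # 1.5: averaging at discontinuities
quantile(x, 0.50, type = 7)  # 2.5: the median agrees across types here
quantile(x, 0.50, type = 2)  # 2.5
```

So an "IQR" computed under the two definitions can differ even on the same data; the choice of type (1 through 9 in R) is the whole story behind SAS-vs-R discrepancies like this.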
Re: [R] Mutliple sets of data in one dataset....Need a loop?
You'll probably want to look at the 'by' function:

d = data.frame(sex = rep(1:2, 50), x = rnorm(100))
d$y = d$x + rnorm(100)
head(d)
cor(d)
by(d[, -1], d['sex'], function(df) cor(df))

You might also want to look at the doBy package. -- View this message in context: http://n4.nabble.com/Mutliple-sets-of-data-in-one-dataset-Need-a-loop-tp1018503p1018616.html
Re: [R] SPLUS Seqtrial vs. R Packages for sequential clinical trials designs
You'll probably want to take a look at the CRAN Task View, http://cran.r-project.org/web/views/ClinicalTrials.html david freedman Paul Miller wrote: Hello Everyone,  Iâm a SAS user who has recently become interested in sequential clinical trials designs. Iâve discovered that the SAS based approaches for these designs are either too costly or are âexperimental.â So now Iâm looking for alternative software. Two programs that seem promising are SPLUS Seqtrial and R.  I recently obtained a 30 day trial for the SPLUS Seqtrial add-on and have worked my way through most of the examples in the manual. Iâve also gotten access to R, installed a package called gsDesign, and worked through most of the examples in its documentation.  Although I donât yet have a good understanding of the various approaches to sequential clinical trials designs, I thought that the gsDesign package seemed very impressive. I also understand that there are several other R packages that relate to sequential clinical trials designs, such as AGSDest , GroupSeq, ldbounds, MChtest, PwrGSD, and Seqmon. Some of these seem fairly comprehensive while others seem to focus on a single approach.  My questions center on the adequacy of SPLUS Seqtrial and the R Packages. I was wondering if there is anyone out there who would be familiar enough with these to comment on their relative merits. Will SPLUS Seqtrial or the R packages allow me to do all the designs Iâm ever likely to need? If I pay for SPLUS Seqtrial, will I get anything that I canât get using the various R packages? Are any of the R packages comprehensive? Or would it at least be possible to cover all the types of designs that are commonly used by employing a variety of R packages? What kind of validation work generally goes into an R package and how would this likely compare to the sort of validation work that has gone into Seqtrial?  There may be other questions that I should be asking but havenât thought of.  
At any rate, if some of you would be willing to share some advice or insights I'd greatly appreciate it. Thanks, Paul [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://n4.nabble.com/SPLUS-Seqtrial-vs-R-Packages-for-sequential-clinical-trials-designs-tp966071p966780.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Partial correlations and p-values
you might look at partial.r in the psych package dadrivr wrote: I'm trying to write code to calculate partial correlations (along with p-values). I'm new to R, and I don't know how to do this. I have searched and come across different functions, but I haven't been able to get any of them to work (for example, pcor and pcor.test from the ggm package). In the following example, I am trying to compute the correlation between x and y, while controlling for z (partial correlation): x <- c(1,20,14,7,9) y <- c(5,6,7,9,10) z <- c(13,27,16,5,4) What function can I append to this to find this partial correlation? Many thanks! -- View this message in context: http://n4.nabble.com/Partial-correlations-and-p-values-tp908641p949283.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
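For reference, the partial correlation can also be checked in base R by correlating the residuals from regressing each variable on z (a sketch, not from the original reply; psych::partial.r remains the packaged route):

```r
# Partial correlation of x and y controlling for z, via residuals
x <- c(1, 20, 14, 7, 9)
y <- c(5, 6, 7, 9, 10)
z <- c(13, 27, 16, 5, 4)
rxy.z <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
rxy.z
```

A p-value would then come from cor.test() on those residuals, bearing in mind the loss of one degree of freedom for the controlled variable.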
Re: [R] How to duplicate each row in a data.frame?
I *think* this is from 'StatsRUs' - how about as.data.frame(lapply(df,function(x)rep(x,each=n))) hth, david freedman pengyu.ut wrote: I want to duplicate each line in 'df' 3 times. But I'm confused why 'z' is a 6 by 4 matrix. Could somebody let me know what the correct way is to duplicate each row of a data.frame? df=expand.grid(x1=c('a','b'),x2=c('u','v')) n=3 z=apply(df,1 ,function(x){ result=do.call(rbind,rep(list(x),n)) result } ) z __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://n4.nabble.com/How-to-duplicate-each-row-in-a-data-frame-tp948830p948839.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
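A plain indexing alternative (not from the thread) keeps each row's copies adjacent and preserves the factor columns:

```r
# Duplicate each row of a data frame n times with row indexing
df <- expand.grid(x1 = c('a', 'b'), x2 = c('u', 'v'))
n <- 3
z <- df[rep(seq_len(nrow(df)), each = n), ]
nrow(z)  # 4 rows * 3 copies = 12
```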
Re: [R] Weighted descriptives by levels of another variables
In addition to using the survey package (and the svyby function), I've found that many of the 'weighted' functions, such as wtd.mean, work well with the plyr package. For example, wtdmean <- function(df) wtd.mean(df$obese, df$sampwt); ddply(mydata, ~cut2(age,c(2,6,12,16)), 'wtdmean') hth, david freedman Andrew Miles-2 wrote: I've noticed that R has a number of very useful functions for obtaining descriptive statistics on groups of variables, including summary {stats}, describe {Hmisc}, and describe {psych}, but none that I have found is able to provide weighted descriptives of subsets of a data set (ex. descriptives for both males and females for age, where accurate results require use of sampling weights). Does anybody know of a function that does this? What I've looked at already: I have looked at describe.by {psych}, which will give descriptives by levels of another variable (e.g. mean ages of males and females), but does not accept sample weights. I have also looked at describe {Hmisc}, which allows for weights, but has no functionality for subdivision. I tried using a by() function with describe {Hmisc}: by(cbind(my, variables, here), division.variable, describe, weights=weight.variable) but found that this returns an error message stating that the variables to be described and the weights variable are not the same length: Error in describe.vector(xx, nam[i], exclude.missing = exclude.missing, : length of weights must equal length of x In addition: Warning message: In present <- !is.na(weights) : longer object length is not a multiple of shorter object length This comes because the by() function passes down a subset of the variables to be described to describe(), but not a subset of the weights variable. describe() then searches whatever data set is attached in order to find the weights variable, but this is in its original (i.e. not subsetted) form. Here is an example using the ChickWeight dataset that comes in the datasets package.
data(ChickWeight) attach(ChickWeight) library(Hmisc) #this gives descriptive data on the variables Time and Chick by levels of Diet by(cbind(Time, Chick), Diet, describe) #trying to add weights, however, does not work for reasons described above wgt=rnorm(length(Chick), 12, 1) by(cbind(Time, Chick), Diet, describe, weights=wgt) Again, my question is, does anybody know of a function that combines the ability to provide weighted descriptives with the ability to subdivide by the levels of some other variable? Andrew Miles Department of Sociology Duke University [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://old.nabble.com/Weighted-descriptives-by-levels-of-another-variables-tp26354665p26355885.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using something like the by command, but on rows instead of columns
Some variation of the following might be what you want: df=data.frame(sex=sample(1:2,100,replace=T),snp.1=rnorm(100),snp.15=runif(100)) df$snp.1[df$snp.1 > 1.0] <- NA #put some missing values into the data x=grep('^snp',names(df)); x #which columns begin with 'snp' apply(df[,x],2,summary) #or apply(df[,x],2,FUN=function(x)mean(x,na.rm=TRUE)) hth, david Josh B-3 wrote: Hello R Forum users, I was hoping someone could help me with the following problem. Consider the following toy dataset:
Accession SNP_CRY2 SNP_FLC Phenotype
1         NA       A       0.783143079
2         BQ       A       0.881714811
3         BQ       A       0.886619488
4         AQ       B       0.416893034
5         AQ       B       0.621392903
6         AS       B       0.031719125
7         AS       NA      0.652375037
Accession = individual plants, arbitrarily identified by unique numbers SNP_ = individual genes. SNP_CRY2 = the CRY2 gene. The plants either have the BQ, AQ, or AS genotype at the CRY2 gene. NA = missing data. SNP_FLC = the FLC gene. The plants either have the A or B genotype at the FLC gene. NA = missing data. Phenotype = a continuous variable of interest. I have a much larger number of columns corresponding to genes (i.e., more columns with the SNP_ prefix) in my real dataset. For each gene in turn (i.e., each SNP_ column), I would like to find the phenotypic variance for all of the plants with non-missing data. Note that the plants with missing genotype data (NA) differ for each gene (each SNP_ column). Would one of you be able to offer some specific code that could do this operation? Please rest assured that I am not a student trying to elicit help with a homework assignment. I am a post-doc with limited R skills, working with a large genetic dataset. Thanks very much in advance to a wonderful online community. Sincerely, Josh [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
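As a sketch of the variance calculation Josh actually asked for (the data frame and column names below just mirror his toy example): for each SNP_ column, take the variance of Phenotype over the plants whose genotype is not missing.

```r
geno <- data.frame(
  SNP_CRY2  = c(NA, 'BQ', 'BQ', 'AQ', 'AQ', 'AS', 'AS'),
  SNP_FLC   = c('A', 'A', 'A', 'B', 'B', 'B', NA),
  Phenotype = c(0.783, 0.882, 0.887, 0.417, 0.621, 0.032, 0.652))
snp.cols <- grep('^SNP_', names(geno))
# variance of Phenotype among plants with non-missing genotype, per gene
v <- sapply(geno[snp.cols], function(g) var(geno$Phenotype[!is.na(g)]))
v
```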
-- View this message in context: http://old.nabble.com/Using-something-like-the-%22by%22-command%2C-but-on-rows-instead-of-columns-tp26273840p26274373.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Within-group correlation confidence intervals
you should save your variables (plus sex) into a new *dataframe* d <- mydata[, c('iq','education','achievement','sex')] and then the command would be by(d, d$sex, function(df) cor.test(df$educ, df$achiev)) but you could also just use by(mydata, mydata$sex, function(df) cor.test(df$educ, df$achiev)) david freedman jlwoodard wrote: I'm trying to obtain within-group correlations on a subset of variables. I first selected my variables using the following command: mydata$x <- mydata[c('iq','education','achievement')] I'd like to look at correlations among those variables separately for men and women. My gender variable in mydata is coded 1 (women) and 0 (men). I have successfully used the following to get within group correlations and p values: by(x,gender,function(x) rcorr(as.matrix(x))) However, I'm also interested in getting confidence intervals for the correlations as well, using cor.test. I tried the following without success. by(x,gender,function(x) cor.test(as.matrix(x))) Even if I just use 2 variables (e.g., IQ and education), I get exactly the same output for men and women with this command: by(x,gender,function(x) cor.test(iq,education)) I'm still in the learning stages with the by and cor.test functions, so I assume I'm using it incorrectly. Is it possible to get the correlation confidence intervals for each group using this approach? Many thanks in advance! John -- View this message in context: http://www.nabble.com/Within-group-correlation-confidence-intervals-tp25509629p25530117.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
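A self-contained sketch (simulated data; the variable names are invented) showing how by() with cor.test() yields a separate confidence interval per group:

```r
set.seed(1)
mydata <- data.frame(sex = rep(0:1, each = 50), educ = rnorm(100))
mydata$achiev <- mydata$educ + rnorm(100)
# one cor.test (with its conf.int) per level of sex
res <- by(mydata, mydata$sex, function(df) cor.test(df$educ, df$achiev)$conf.int)
res
```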
Re: [R] could not find function Varcov after upgrade of R?
I've had the same problem with predict.Design, and have sent an email to the maintainer of the Design package at Vanderbilt University. I wasn't even able to run the examples given on the help page of predict.Design - I received the same error about Varcov that you did. I *think* it's a problem with the package, rather than R 2.9.2, and I hope the problem will soon be fixed. I was able to use predict.Design with 2.9.2 until I updated the Design package a few days ago. david freedman zhu yao wrote: I use the Design library. Take this example: library(Design) n <- 1000 set.seed(731) age <- 50 + 12*rnorm(n) label(age) <- 'Age' sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) cens <- 15*runif(n) h <- .02*exp(.04*(age-50)+.8*(sex=='Female')) dt <- -log(runif(n))/h label(dt) <- 'Follow-up Time' e <- ifelse(dt <= cens, 1, 0) dt <- pmin(dt, cens) units(dt) <- 'Year' dd <- datadist(age, sex) options(datadist='dd') Srv <- Surv(dt,e) f <- cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE) cox.zph(f, 'rank') # tests of PH anova(f) # Error in anova.Design(f) : could not find function Varcov Yao Zhu Department of Urology Fudan University Shanghai Cancer Center No. 270 Dongan Road, Shanghai, China 2009/9/12 Ronggui Huang ronggui.hu...@gmail.com I cannot reproduce the problem you mentioned.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2,10,20, labels=c('Ctl','Trt')) weight <- c(ctl, trt) anova(lm.D9 <- lm(weight ~ group)) sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese (Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936 attached base packages: [1] stats graphics grDevices utils datasets methods base 2009/9/12 zhu yao mailzhu...@gmail.com: After upgrading R to 2.9.2, I can't use the anova() function. It says could not find function Varcov. What's wrong with my computer? Help needed, thanks! Yao Zhu Department of Urology Fudan University Shanghai Cancer Center No. 270 Dongan Road, Shanghai, China [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- HUANG Ronggui, Wincent Doctoral Candidate Dept of Public and Social Administration City University of Hong Kong Home page: http://asrr.r-forge.r-project.org/rghuang.html [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/could-not-find-function-%22Varcov%22-after-upgrade-of-R--tp25412881p25414017.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] could not find function Varcov after upgrade of R?
Thanks for the reminder - actually I think I've become sloppy about the 'T' (and the order of loading the Hmisc and Design packages), but that doesn't seem to be the problem. Also, all of my packages have been updated david Carlos Alzola wrote: Did you type library(Hmisc,T) before loading Design? Carlos -- From: David Freedman 3.14da...@gmail.com Sent: Saturday, September 12, 2009 8:26 AM To: r-help@r-project.org Subject: Re: [R] could not find function Varcov after upgrade of R?
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/could-not-find-function-%22Varcov%22-after-upgrade-of-R--tp25412881p25416444.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] (no subject)
A better subject for your question might have been helpful. There are many options for hist, and for truehist in the MASS package, but this might help: x=rnorm(100, mean=5, sd=3000) hist(x, prob=T) x2=density(x) lines(x2$x,x2$y) KABELI MEFANE wrote: Dear All I hope you can help me with this small problem. I want to draw a normal distribution line to this data: p <- rnorm(100, mean=5, sd=3000) hist(p) Kabeli [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/%28no-subject%29-tp25418417p25418513.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
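If the goal is the fitted *normal* curve itself rather than a kernel density estimate, dnorm() with the sample mean and sd can be overlaid instead (a sketch along the same lines):

```r
p <- rnorm(100, mean = 5, sd = 3000)
hist(p, prob = TRUE)                                   # density scale, not counts
curve(dnorm(x, mean = mean(p), sd = sd(p)), add = TRUE)
```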
Re: [R] Calculate weighted mean for each group
After you fix your data frame, and if you don't mind using 2 packages, you might try something like: library(plyr) #for 'by' processing library(Hmisc) # for its wtd.mean function d=data.frame(x=c(15,12,3,10,10),g=c(1,1,2,2,3),w=c(2,1,5,2,5)) ; d ddply(d,~g,function(df) wtd.mean(df$x,df$w)) milton ruser wrote: try your first reproducible line first :-) On Thu, Jul 23, 2009 at 5:18 PM, Alexis Maluendas avmaluend...@gmail.comwrote: Hi R experts, I need know how calculate a weighted mean by group in a data frame. I have tried with aggregate() function: data.frame(x=c(15,12,3,10,10),g=c(1,1,1,2,2,3,3),w=c(2,3,1,5,5,2,5)) -> d aggregate(d$x,by=list(d$g),weighted.mean,w=d$w) Generating the following error: Error en FUN(X[[1L]], ...) : 'x' and 'w' must have the same length Thanks in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Calculate-weighted-mean-for-each-group-tp24634873p24635837.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
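For completeness, the same grouped weighted means can be had in base R alone, using split() on the corrected data frame:

```r
# weighted mean of x within each level of g, no extra packages
d <- data.frame(x = c(15, 12, 3, 10, 10), g = c(1, 1, 2, 2, 3), w = c(2, 1, 5, 2, 5))
wm <- sapply(split(d, d$g), function(s) weighted.mean(s$x, s$w))
wm
#  1  2  3
# 14  5 10
```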
Re: [R] Can I use mcnemar.test for 3*3 tables (or is there a bug in the command?)
There is a function mh_test in the coin package. library(coin) mh_test(tt) The documentation states, "The null hypothesis of independence of row and column totals is tested. The corresponding test for binary factors x and y is known as McNemar test. For larger tables, Stuart's W0 statistic (Stuart, 1955, Agresti, 2002, page 422, also known as Stuart-Maxwell test) is computed." hth, david freedman Tal Galili wrote: Hello all, I wish to perform a mcnemar test for a 3 by 3 matrix. By running the standard R command I am getting a result but I am not sure I am getting the correct one. Here is an example code: (tt <- as.table(t(matrix(c(1,4,1, 0,5,5, 3,1,5), ncol = 3)))) mcnemar.test(tt, correct=T) #And I get: McNemar's Chi-squared test data: tt McNemar's chi-squared = 7.6667, df = 3, p-value = *0.05343* Now I was wondering if the test I just performed is the correct one. From looking at the Wikipedia article on mcnemar (http://en.wikipedia.org/wiki/McNemar's_test), it is said that: The Stuart-Maxwell test http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm is a different generalization of the McNemar test, used for testing marginal homogeneity in a square table with more than two rows/columns. From searching for a Stuart-Maxwell test in google, I found an algorithm here: http://www.m-hikari.com/ams/ams-password-2009/ams-password9-12-2009/abbasiAMS9-12-2009.pdf From running this algorithm I am getting a different P value; here is the (somewhat ugly) code I produced for this: get.d <- function(xx) { length1 <- dim(xx)[1] ret1 <- margin.table(xx,1) - margin.table(xx,2) return(ret1) } get.s <- function(xx) { the.s <- xx for( i in 1:dim(xx)[1]) { for(j in 1:dim(xx)[2]) { if(i == j) { the.s[i,j] <- margin.table(xx,1)[i] + margin.table(xx,2)[i] - 2*xx[i,i] } else { the.s[i,j] <- -(xx[i,j] + xx[j,i]) } } } return(the.s) } chi.statistic <- t(get.d(tt)[-3]) %*% solve(get.s(tt)[-3,-3]) %*% get.d(tt)[-3] paste('the P value:', pchisq(chi.statistic, 2)) #and the result was: the P value: 0.268384371053358 So to summarize my questions: 1) can I use mcnemar.test for 3*3 (or more) tables? 2) if so, what test is being performed (Stuart-Maxwell)? 3) Do you have a recommended link to an explanation of the algorithm employed? Thanks, Tal -- -- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Can-I-use-%22mcnemar.test%22-for-3*3-tables-%28or-is-there-a-bug-in-the-command-%29-tp24556414p24556693.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help With Fleiss Kappa
If you apply the function to a simple dataframe or show your code, you might be able to get more accurate help. I've used the irr package in the past and haven't noticed any problems (maybe I overlooked them?) david freedman mehdi ebrahim wrote: Hi All, I am using Fleiss kappa for inter rater agreement. Are there any known issues with the Fleiss kappa calculation in R? Even when I supply mock data with total agreement among the raters I do not get a kappa value of 1; instead I am getting negative values. I am using the irr package version 0.70 Any help is much appreciated. Thanks and Regards M [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Help-With-Fleiss-Kappa-tp24456963p24461821.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
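A quick check on packaged data (a sketch; assumes the irr package and its bundled 'diagnoses' data set are installed): kappam.fleiss expects subjects in rows and raters in columns, and a transposed layout is a common source of odd kappa values.

```r
library(irr)
data(diagnoses)             # 30 subjects rated by 6 raters
kappam.fleiss(diagnoses)    # rows = subjects, columns = raters
```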
Re: [R] Stratified data summaries
You might also want to look at the doBy package - one function is summaryBy: summaryBy(var1 + var2 ~ patient_type, data=d, FUN=summary) david freedman Hayes, Rachel M wrote: Hi All, I'm trying to automate a data summary using summary or describe from the Hmisc package. I want to stratify my data set by patient_type. I was hoping to do something like: describe(myDataFrame ~ patient_type) I can create data subsets and run the describe function one at a time, but there's got to be a better way. Any suggestions? Rachel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Stratified-data-summaries-tp24419184p24449565.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiplication of data frame with vector
Would the scale function work for this? Something like new=scale(df, center=T) HTH, david freedman cir p wrote: Dear group: sorry for my beginners question, but I'm rather new to R and was searching high and low without success: I have a data frame (df) with variables in the rows and observations in the columns like (the actual data frame has 15 columns and 1789 rows):
          early1 early2 early3 early4 early5
M386T1000  57056  55372  58012  55546  57309
M336T90    11063  10312  10674  10840  11208
M427T91    12064  11956  12692  12340  11924
M429T91     4594   3890   4096   4019   4204
M447T90    26553  27647  26889  26751  26929
Now I'm trying to transform each value column-wise to make all columns have the same mean with: df * mean(mean(df)) / mean(df). I just can't get my head around this: mean(df) gives me the correct column means vector, and mean(mean(df)) gives me the correct total mean. The above operation works correctly for individual rows, i.e. if I do df[1,]*mean(mean(df))/mean(df) Can someone tell me what I am doing wrong?? Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Multiplication-of-data-frame-with-vector-tp24368878p24372764.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
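The one-liner fails because dividing a data frame by a vector recycles down the rows, not across the columns. A column-wise version with sweep() (a sketch with made-up numbers) gives every column the same mean:

```r
df <- data.frame(early1 = c(57056, 11063, 12064, 4594, 26553),
                 early2 = c(55372, 10312, 11956, 3890, 27647))
grand <- mean(as.matrix(df))              # overall mean of all values
out <- sweep(df, 2, colMeans(df), '/') * grand
colMeans(out)                             # each column now equals the grand mean
```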
Re: [R] SAS-like method of recoding variables?
Frank, would you feel comfortable giving us the reference to the NEJM article with the 'missing vs <' error? I'm sure that things like this happen fairly often, and I'd like to use this example in teaching thanks, david freedman Frank E Harrell Jr wrote: Dieter Menne wrote: P.Dalgaard wrote: IF TYPE='TRUCK' and count=12 THEN VEHICLES=TRUCK+((CAR+BIKE)/2.2); vehicles <- ifelse(TYPE=='TRUCK' & count==12, TRUCK+((CAR+BIKE)/2.2), NA) Read both versions to an audience, and you will have to admit that this is one of the cases where SAS is superior. Here's a case where SAS is clearly not superior: IF type='TRUCK' AND count<12 THEN vehicles=truck+(car+bike)/2.2; If count is missing, the statement is considered TRUE and the THEN is executed. This is because SAS considers a missing as less than any number. This resulted in a significant error, never corrected, in a widely cited New England Journal of Medicine paper. Frank Dieter -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University -- View this message in context: http://www.nabble.com/SAS-like-method-of-recoding-variables--tp24152845p24165879.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
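The contrast with R is easy to demonstrate: an NA in the same comparison propagates instead of silently passing the test (a small made-up vector):

```r
# In SAS a missing count sorts below any number, so 'count < 12' is TRUE;
# in R the comparison returns NA and ifelse() carries the NA through.
count <- c(5, 20, NA)
count < 12                         # TRUE FALSE NA
ifelse(count < 12, "low", "ok")    # "low" "ok" NA
```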
[R] long format - find age when another variable is first 'high'
Dear R, I've got a data frame with children examined multiple times and at various ages. I'm trying to find the first age at which another variable (LDL-Cholesterol) is >= 130 mg/dL; for some children, this may never happen. I can do this with transformBy and ddply, but with 10,000 different children, these functions take some time on my PCs - is there a faster way to do this in R? My code on a small dataset follows. Thanks very much, David Freedman d <- data.frame(id=c(rep(1,3),rep(2,2),3), age=c(5,10,15,4,7,12), ldlc=c(132,120,125,105,142,160)) d$high.ldlc <- ifelse(d$ldlc >= 130, 1, 0) d library(plyr) d2 <- ddply(d, ~id, transform, plyr.minage=min(age[high.ldlc==1])); library(doBy) d2 <- transformBy(~id, data=d2, doby.minage=min(age[high.ldlc==1])); d2 -- View this message in context: http://www.nabble.com/long-format---find-age-when-another-variable-is-first-%27high%27-tp23706393p23706393.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
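A base-R alternative (no plyr or doBy) is to subset to the qualifying rows first and then take the per-id minimum with aggregate(); children who never reach 130 mg/dL simply drop out of the result:

```r
# First age at which ldlc >= 130, one row per id that ever qualifies.
d <- data.frame(id   = c(rep(1, 3), rep(2, 2), 3),
                age  = c(5, 10, 15, 4, 7, 12),
                ldlc = c(132, 120, 125, 105, 142, 160))
minage <- aggregate(age ~ id, data = d[d$ldlc >= 130, ], FUN = min)
minage   # id 1 -> age 5, id 2 -> age 7, id 3 -> age 12
```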
Re: [R] data summary and some automated t.tests.
You might want to try using a non-parametric test, such as wilcox.test. How about some modification of the following: d=data.frame(grp=rep(1:2,e=5),replicate(10,rnorm(100))); head(d) lapply(d[,-1],function(.column)wilcox.test(.column~grp,data=d)) David Freedman stephen sefick wrote: Up and down are the treatments. These are replicates within date for percent cover of habitat. This is habitat data for a stream restoration - up is the unrestored and dn is the restored. I have looked at the density plots and they do not look gaussian - you are absolutely right. Even log(n+1) transformed they do not look Gaussian. Is there some other way that I would test for a difference that you can think of? My thoughts were to run a Permutation t.test, but I am very new to permutations, and don't know if this applies. The other thing that I was thinking was to use a npmanova (adonis in vegan) to test if the centroids of the habitat classifications were different. I am in the process of working up my thesis data for publication in a journal (there are other very interesting pieces to the data set that I am working with, and this is one of the last things that I need to wrap up before I can start editing/rewriting my masters work). Any thoughts would be greatly appreciated. thanks, Stephen Sefick 2009/5/16 Uwe Ligges lig...@statistik.tu-dortmund.de: stephen sefick wrote: I would like to perform a t.test on each of the measured variables (sand.silt etc.) I am a big fan of applying t.test()s, but in this case: Are you really sure? The integers and particularly boxplot(x) do not indicate very well that the variables are somehow close to Gaussian ... with a mean and sd for each of the treatments And what is the treatment??? Best, Uwe Ligges (up or down), and output this as a table I am having a hard time starting- maybe it is too close to lunch. Any suggestions would be greatly appreciated. Stephen Sefick x <- (structure(list(sample. 
= structure(c(1L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 5L, 6L, 1L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 5L, 6L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 27L, 25L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 26L, 15L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 16L, 15L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 16L, 36L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 37L, 36L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 37L, 38L), .Label = c(0805-r1, 0805-r10, 0805-r11, 0805-r12, 0805-r13, 0805-r14, 0805-r2, 0805-r3, 0805-r4, 0805-r5, 0805-r6, 0805-r7, 0805-r8, 0805-r9, 0805-u1, 0805-u10, 0805-u2, 0805-u3, 0805-u4, 0805-u5, 0805-u6, 0805-u7, 0805-u8, 0805-u9, 1005-r1, 1005-r10, 1005-r11, 1005-r2, 1005-r3, 1005-r4, 1005-r5, 1005-r6, 1005-r7, 1005-r8, 1005-r9, 1005-u1, 1005-u10, 1005-u11, 1005-u2, 1005-u3, 1005-u4, 1005-u5, 1005-u6, 1005-u7, 1005-u8, 1005-u9 ), class = factor), date = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c(10/1/05, 8/29/05), class = factor), Replicate = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L ), site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(dn, up ), class = factor), sand.silt = c(20L, 45L, 90L, 21L, 80L, 77L, 30L, 80L, 36L, 9L, 62L, 71L, 20L, 65L, 10L, 70L, 50L, 80L, 90L, 97L, 94L, 82L, 30L, 10L, 65L, 80L, 90L, 70L, 10L, 50L, 60L, 40L, 10L, 45L, 10L, 10L, 15L, 10L, 8L, 35L, 10L, 40L, 10L, 10L, 28L, 5L, 45L, 35L, 2L, 10L, 40L, 2L, 70L, 40L, 20L, 30L, 50L, 60L, 10L, 100L, 98L, 98L, 90L, 87L, 87L, 40L, 97L, 92L, 70L, 50L, 81L, 35L, 70L, 89L, 28L, 28L, 82L, 81L, 33L, 80L, 40L, 40L, 60L, 30L, 5L, 50L, 70L, 75L, 85L, 95L, 93L, 80L, 80L, 60L, 82L, 60L, 5L, 70L, 80L, 40L), gravel = c(8L, 45L, 7L, 5L, 10L, 5L, 35L, 7L, 45L, 60L, 0L, 0L, 5L, 8L, 25L, 0L, 45L, 15L, 0L, 1L, 2L, 5L, 6L, 15L, 10L, 5L, 3L, 10L
Re: [R] Help with loops
I'm not quite sure what you want to do, but this might help: d=data.frame(replicate(40, rnorm(20))) d$sample=rep(c('a','b','c','d'),each=5) library(doBy) summaryBy(.~sample,data=d) David Freedman Amit Patel-7 wrote: Hi I am trying to create a loop which averages replicates in my data. The original data has many rows and consists of 40 columns zz[,2:41] plus row headings in zz[,1] I am trying to average each set of values (i.e. zz[1,2:3] averaged and placed in average_value[1,2], and so on). below is my script but it seems to be stuck in an endless loop Any suggestions?? for (i in 1:length(average_value[,1])) { average_value[i] <- i^100; print(average_value[i]) #calculates Means #Sample A average_value[i,2] <- rowMeans(zz[i,2:3]) average_value[i,3] <- rowMeans(zz[i,4:5]) average_value[i,4] <- rowMeans(zz[i,6:7]) average_value[i,5] <- rowMeans(zz[i,8:9]) average_value[i,6] <- rowMeans(zz[i,10:11]) #Sample B average_value[i,7] <- rowMeans(zz[i,12:13]) average_value[i,8] <- rowMeans(zz[i,14:15]) average_value[i,9] <- rowMeans(zz[i,16:17]) average_value[i,10] <- rowMeans(zz[i,18:19]) average_value[i,11] <- rowMeans(zz[i,20:21]) #Sample C average_value[i,12] <- rowMeans(zz[i,22:23]) average_value[i,13] <- rowMeans(zz[i,24:25]) average_value[i,14] <- rowMeans(zz[i,26:27]) average_value[i,15] <- rowMeans(zz[i,28:29]) average_value[i,16] <- rowMeans(zz[i,30:31]) #Sample D average_value[i,17] <- rowMeans(zz[i,32:33]) average_value[i,18] <- rowMeans(zz[i,34:35]) average_value[i,19] <- rowMeans(zz[i,36:37]) average_value[i,20] <- rowMeans(zz[i,38:39]) average_value[i,21] <- rowMeans(zz[i,40:41]) } thanks -- View this message in context: http://www.nabble.com/Help-with-loops-tp23558647p23560599.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
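The twenty hand-written rowMeans() lines in the question collapse to a single sapply() over the starting column of each replicate pair (mock data standing in for the poster's zz, which has IDs in column 1 and 40 data columns):

```r
# Average adjacent column pairs (2:3, 4:5, ..., 40:41) in one pass.
zz <- data.frame(id = letters[1:5], matrix(1:200, nrow = 5))   # mock data
pair.starts <- seq(2, 40, by = 2)
average_value <- data.frame(id = zz[, 1],
                            sapply(pair.starts,
                                   function(j) rowMeans(zz[, j:(j + 1)])))
dim(average_value)   # 5 rows: 1 id column + 20 averaged columns = 21 columns
```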
Re: [R] by-group processing
sorry about the mistake - the -data$Type doesn't work: the '-' sign isn't valid for factors. I *thought* I had checked this before submitting a response ! HufferD wrote: On Thursday, May 07, 2009 7:45 PM, David Freedman wrote: ...how about: d=data[order(data$ID,-data$Type),] d[!duplicated(d$ID),] Does the -data$Type argument to the order function work? -- David - David Huffer, Ph.D. Senior Statistician CSOSA/Washington, DC david.huf...@csosa.gov __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/by-group-processing-tp23417208p23453688.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] by-group processing
how about: d=data[order(data$ID,-data$Type),] d[!duplicated(d$ID),] Max Webber wrote: Given a dataframe like data ID Type N 1 45900 A 1 2 45900 B 2 3 45900 C 3 4 45900 D 4 5 45900 E 5 6 45900 F 6 7 45900 I 7 8 49270 A 1 9 49270 B 2 10 49270 E 3 18 46550 A 1 19 46550 B 2 20 46550 C 3 21 46550 D 4 22 46550 E 5 23 46550 F 6 24 46550 I 7 containing an identifier (ID), a variable type code (Type), and a running count of the number of records per ID (N), how can I return a dataframe of only those records with the maximum value of N for each ID? For instance, data ID Type N 7 45900 I 7 10 49270 E 3 24 46550 I 7 I know that I can use tapply(data$N, data$ID, max) 45900 46550 49270 7 7 3 to get the values of the maximum N for each ID, but how is it that I can find the index of these values to subsequently use to subscript data? -- maxine-webber -- View this message in context: http://www.nabble.com/by-group-processing-tp23417208p23437592.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
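ave() gives a one-line answer that avoids the sort entirely: keep the rows where N equals the per-ID maximum (a fragment of the posted data):

```r
# ave() repeats each ID's maximum N alongside every row of that ID,
# so a simple logical comparison selects the winning rows.
data <- data.frame(ID   = c(45900, 45900, 45900, 49270, 49270),
                   Type = c("A", "B", "I", "A", "E"),
                   N    = c(1, 2, 7, 1, 3))
data[data$N == ave(data$N, data$ID, FUN = max), ]
```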
Re: [R] Problems producing a simple plot
you might also look at matplot d=data.frame(x=1:10,y1=sort(rnorm(10)),y2=sort(rnorm(10))) matplot(d[,1],d[,2:3],type='l',lty=1:2) David Freedman Steve Murray-3 wrote: Dear R Users, I have a data frame of the nature: head(aggregate_1986) Latitude Mean Annual Simulated Runoff per 1° Latitudinal Band 1 -55 574.09287 2 -54 247.23078 3 -53 103.40756 4 -52 86.1 5 -51 45.21980 6 -50 55.92928 Mean Annual Observed Runoff per 1° Latitudinal Band 1 491.9525 2 319.6592 3 237.8222 4 179.5391 5 105.2968 6 136.6124 I am hoping to plot columns 2 and 3 against Latitude. I understand that you have to do this by plotting one column at a time, so I have been starting by attempting the following, but receiving errors: plot(aggregate_1986[1] ~ aggregate_1986[2], type="l") Error in model.frame.default(formula = aggregate_1986[1] ~ aggregate_1986[2], : invalid type (list) for variable 'aggregate_1986[1]' Or... plot(aggregate_1986[1],aggregate_1986[2], type="l") Error in stripchart.default(x1, ...) : invalid plotting method I'm obviously doing something fundamentally wrong (!). I'd be grateful for any suggestions as to how I should go about this. Many thanks, Steve -- View this message in context: http://www.nabble.com/Problems-producing-a-simple-plot-tp23347296p23348181.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sequence number for 'long format'
Dear R-help list, I've got a data set in long format - each subject can have several (varying in number) measurements, with each record representing one measurement. I want to assign a sequence number to each measurement, starting at 1 for a person's first measurement. I can do this with the by function, but there must be an easier way. Here's my code - id is id number, age is the age of the person, and seq is the sequence variable that I've created. Thanks very much for the help. david freedman, atlanta ds=data.frame(list(id = c(1L, 1L, 1L, 1L, 8L, 8L, 16L, 16L, 16L, 16L, 16L, 19L, 32L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 79L, 79L, 80L, 80L, 80L, 80L, 85L, 86L, 96L, 96L, 96L, 103L, 103L, 106L, 106L, 106L, 106L, 106L, 106L, 106L, 140L, 140L, 144L, 144L, 144L, 144L, 144L, 144L, 144L, 146L, 146L, 146L, 146L, 160L, 160L, 160L, 160L, 160L, 160L, 164L, 164L, 176L, 176L, 176L, 176L, 176L, 176L, 176L, 176L, 181L, 190L, 192L, 192L, 192L, 192L, 192L, 192L, 197L, 197L, 197L, 224L, 224L, 224L, 229L, 232L, 232L, 232L, 232L, 232L, 232L, 232L, 249L, 249L), age = c(6.6054794521, 9.301369863, 22.638356164, 31.961670089, 17.15890411, 25.106091718, 8.197260274, 11.295890411, 14.191780822, 22.43394935, 28.6, 6.6794520548, 10.824657534, 10.479452055, 13.432876712, 15.408219178, 17.643835616, 19.268493151, 22.624657534, 26.139726027, 35.493497604, 37.6, 15.895890411, 23.351129363, 13.810958904, 16.783561644, 17.95890411, 22.430136986, 12.021902806, 14.904859685, 7.4219178082, 10.060273973, 15.802739726, 17.328767123, 31.028062971, 8.3945205479, 10.350684932, 13.783561644, 17.843835616, 21.816438356, 27.901437372, 34.3, 10.517808219, 18.18630137, 11.378082192, 14.794520548, 16.77260274, 23.101369863, 27.912328767, 34.316221766, 40.2, 8.6054794521, 11.561643836, 14.863013699, 17.835616438, 8.0219178082, 9, 9.9726027397, 10.690410959, 13.032876712, 30.138261465, 7.0602739726, 10.438356164, 8.9232876712, 9.9589041096, 10.915068493, 12.263013699, 14.257534247, 17.326027397, 
18.454794521, 21.334246575, 45.190965092, 8.5643835616, 12.197260274, 15.405479452, 17.106849315, 27.843835616, 34.417522245, 39.9, 6.7890410959, 10.21369863, 15.857534247, 10.147945205, 13.473972603, 36.06844627, 17.331506849, 14.980821918, 15.939726027, 16.939726027, 17.619178082, 18.698630137, 37.084188912, 43.3, 7.7068493151, 10.726027397))) head(ds,10) x=with(ds,by(ds,list(id),FUN=function(dc)1:length(dc$age))); x[1:20]; ds$seq=unlist(x); head(ds,20) -- View this message in context: http://www.nabble.com/sequence-number-for-%27long-format%27-tp23338043p23338043.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
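ave() with seq_along builds the within-id counter directly and skips the by()/unlist() round trip (assuming, as in the posted data, that rows are already ordered by age within id); a sketch on a few of the posted rows:

```r
# Sequence number restarts at 1 for each id.
ds <- data.frame(id  = c(1, 1, 1, 8, 8, 16),
                 age = c(6.6, 9.3, 22.6, 17.2, 25.1, 8.2))
ds$seq <- ave(ds$age, ds$id, FUN = seq_along)
ds$seq   # 1 2 3 1 2 1
```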
[R] piecewise linear regression
Hi - I'd like to construct and plot the percents by year in a small data set (d) that has values between 1988 and 2007. I'd like to have a breakpoint (but no discontinuity) at 1996. Is there a better way to do this than in the code below? d year percent se 1 1988 30.6 0.32 2 1989 31.5 0.31 3 1990 30.9 0.30 4 1991 30.6 0.28 5 1992 29.3 0.25 6 1994 30.3 0.26 7 1996 29.9 0.24 8 1998 28.4 0.22 9 2000 27.8 0.22 10 2001 26.1 0.20 11 2002 25.1 0.19 12 2003 24.4 0.19 13 2004 23.7 0.19 14 2005 25.1 0.18 15 2006 23.9 0.20 dput(d) structure(list(year = c(1988L, 1989L, 1990L, 1991L, 1992L, 1994L, 1996L, 1998L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L), percent = c(30.6, 31.5, 30.9, 30.6, 29.3, 30.3, 29.9, 28.4, 27.8, 26.1, 25.1, 24.4, 23.7, 25.1, 23.9, 23.9), se = c(0.32, 0.31, 0.3, 0.28, 0.25, 0.26, 0.24, 0.22, 0.22, 0.2, 0.19, 0.19, 0.19, 0.18, 0.2, 0.18)), .Names = c("year", "percent", "se"), class = "data.frame", row.names = c(NA, -16L)) with(d,plot(year,percent,pch=16,xlim=c(1988,2007))) m=lm(percent ~ year + I(year-1996):I(year >= 1996), weights=1/se, subset=year >= 1988, data=d); points(d$year, predict(m, data.frame(year=d$year)), type='l', lwd=2, col='red') thanks very much David Freedman -- View this message in context: http://www.nabble.com/piecewise-linear-regression-tp22388118p22388118.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
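An equivalent, arguably more readable way to code the 1996 breakpoint is a hinge term: pmax(year - 1996, 0) is zero before 1996 and rises linearly after it, so the slope can change with no jump. This sketch uses a subset of the posted values and inverse-variance weights (1/se^2, rather than the 1/se in the post):

```r
# Piecewise-linear (broken-stick) fit with a slope change at 1996.
d <- data.frame(year    = c(1988, 1990, 1992, 1996, 2000, 2004, 2006),
                percent = c(30.6, 30.9, 29.3, 29.9, 27.8, 23.7, 23.9),
                se      = c(0.32, 0.30, 0.25, 0.24, 0.22, 0.19, 0.20))
m <- lm(percent ~ year + pmax(year - 1996, 0), weights = 1/se^2, data = d)
coef(m)   # intercept, pre-1996 slope, and the post-1996 change in slope
```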
Re: [R] beginner's question: group of regressors by name vector?
The predictors and outcomes in lm can be matrices, so you could use something like the following: x.mat=cbind(x1=rnorm(20),x2=rnorm(20)) y.mat=cbind(y1=rnorm(20),y2=rnorm(20)) lm(y.mat~x.mat) David Freedman ivowel wrote: dear r-experts: there is probably a very easy way to do it, but it eludes me right now. I have a large data frame with, say, 26 columns named a through z. I would like to define sets of regressors from this data frame. something like myregressors=c("b", "j", "x") lm( l ~ myregressors, data=... ) is the best way to create new data frames that contain all the variables I want, then use ., and then destroy them again? or am I overlooking something obvious? sincerely, /iaw -- View this message in context: http://www.nabble.com/beginner%27s-question%3A-group-of-regressors-by-name-vector--tp21979180p21979495.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R equivalent of SAS Cochran-Mantel-Haenszel tests?
How about 'cmh_test' in the coin package? From the PDF: The null hypothesis of the independence of y and x is tested; block defines an optional factor for stratification. chisq_test implements Pearson's chi-squared test, cmh_test the Cochran-Mantel-Haenszel test and lbl_test the linear-by-linear association test for ordered data. David Freedman Michael Friendly wrote: In SAS, for a two-way (or 3-way, stratified) table, the CMH option in SAS PROC FREQ gives 3 tests that take ordinality of the factors into account, for both variables, just the column variable or neither. Is there an equivalent in R? The mantelhaen.test in stats gives something quite different (a test of conditional independence for *nominal* factors in a 3-way table). e.g. I'd like to reproduce: *-- CMH tests; proc freq data=sexfun order=data; weight count; tables husband * wife / cmh chisq nocol norow; run; The FREQ Procedure Summary Statistics for Husband by Wife Cochran-Mantel-Haenszel Statistics (Based on Table Scores) Statistic Alternative Hypothesis DF Value Prob 1 Nonzero Correlation 1 10.0142 0.0016 2 Row Mean Scores Differ 3 12.5681 0.0057 3 General Association 9 16.7689 0.0525 -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html Toronto, ONT M3J 1P3 CANADA -- View this message in context: http://www.nabble.com/R-equivalent-of-SAS-Cochran-Mantel-Haenszel-tests--tp21914012p21918306.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Roc curves confidence intervals
Would you be interested in the cross-validation that's on pp 3 and 4 of the ROCR package PDF? plot(perf, lwd=3, avg="vertical", spread.estimate="boxplot", add=TRUE) there are various options for the 'spread.estimate' David Freedman marc bernard-2 wrote: Dear all, I am looking for an R package that allows me to calculate and plot the confidence intervals for the roc curve using for example some bootstrapping. I tried ROCR, which seems to do such work, but I couldn't find the right option in it. Many thanks Bests Marc -- View this message in context: http://www.nabble.com/Roc-curves-confidence-intervals-tp21850816p21851991.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] overlay plot question
You could change the second 'plot' to 'points' David Freedman David Kaplan-2 wrote: Greetings all, I have two logistic plots coming from two calls to plogis. The code is .x <- seq(-7.6, 7.6, length=100) plot(.x, plogis(.x, location=0, scale=1), xlab="x", ylab="Density", main="Logistic Distribution: location = 0, scale = 1", type="l") abline(h=0, col="gray") .y <- seq(-7.6, 7.6, length=100) plot(.x, plogis(.x, location=2, scale=4), xlab="x", ylab="Density", main="Logistic Distribution: location = 2, scale = 4", type="l") abline(h=0, col="gray") remove(.x) remove(.y) I would like to overlay these on one plot. Notice here the y-axis is different. But I would like the axis to be 0 to 1 as in the first plot. Any suggestions would be greatly appreciated. Thanks in advance, David -- === David Kaplan, Ph.D. Professor Department of Educational Psychology University of Wisconsin - Madison Educational Sciences, Room 1061 1025 W. Johnson Street Madison, WI 53706 email: dkap...@education.wisc.edu homepage: http://www.education.wisc.edu/edpsych/default.aspx?content=kaplan.html Phone: 608-262-0836 -- View this message in context: http://www.nabble.com/overlay-plot-question-tp21841060p21843564.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem with subsetting in compating one column with a vector
Do you really want to use '==' ? How about '%in%', as in findings1 <- subset(findings, SUBJECTSID %in% SUBJECTS1$SUBJECTSID, select=c(SUBJECTSID,ORGNUMRES)) David Freedman dvkirankumar wrote: Hi all, I got one problem with the subset() function. Here is what I executed: findings1 <- subset(findings, SUBJECTSID==SUBJECTS1$SUBJECTSID, select=c(SUBJECTSID,ORGNUMRES)) Here the SUBJECTS1$SUBJECTSID vector contains nearly 65 values. The problem is that after comparing and subsetting it's not giving all the values related to that; instead it's giving random values and giving the warning: Warning message: In SUBJECTSID == SUBJECTS1$SUBJECTSID : longer object length is not a multiple of shorter object length can any one suggest what I can do to retrieve all the related data of the subset and store it in one object thanks in advance regards; kiran -- View this message in context: http://www.nabble.com/problem-with-subsetting-in-compating-one-column-with-a-vector-tp21805448p21808625.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lapply and aggregate function
You might want to look at the doBy package. For (1), you could use summaryBy(value~Light+Feed, data=myD, FUN=mean) and for (2), the transformBy function would be helpful David Freedman Patrick Hausmann wrote: Dear list, I have two things I am struggling... # First set.seed(123) myD <- data.frame( Light = sample(LETTERS[1:2], 10, replace=T), Feed = sample(letters[1:5], 20, replace=T), value=rnorm(20) ) # Mean for Light myD$meanLight <- unlist( lapply( myD$Light, function(x) mean( myD$value[myD$Light == x]) ) ) # Mean for Feed myD$meanFeed <- unlist( lapply( myD$Feed, function(x) mean( myD$value[myD$Feed == x]) ) ) myD # I would like to get a new Var meanLightFeed # holding the Group-Mean for each combination (eg. A:a = 0.821581) # by(myD$value, list(myD$Light, myD$Feed), mean)[[1]] # Second set.seed(321) myD <- data.frame( Light = sample(LETTERS[1:2], 10, replace=T), value=rnorm(20) ) w1 <- tapply(myD$value, myD$Light, mean) w1 # w1 # A B # 0.4753412 -0.2108387 myfun <- function(x) (myD$value > w1[x] & myD$value < w1[x] * 1.5) I would like to have a TRUE/FALSE-Variable depending on the constraint in myfun for each level in Light... As always - thanks for any help!! Patrick -- View this message in context: http://www.nabble.com/lapply-and-aggregate-function-tp21811834p21812057.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
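The combination mean the poster asks for (meanLightFeed) can also be attached with ave() over interaction(Light, Feed), with no lapply at all; a self-contained sketch (a 20-row version of the posted frame):

```r
# Per-(Light, Feed) group mean attached back to every row.
set.seed(123)
myD <- data.frame(Light = sample(LETTERS[1:2], 20, replace = TRUE),
                  Feed  = sample(letters[1:5], 20, replace = TRUE),
                  value = rnorm(20))
myD$meanLightFeed <- ave(myD$value, interaction(myD$Light, myD$Feed), FUN = mean)
```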
Re: [R] how to prevent duplications of data within a loop
or how about lm(myData$response~as.matrix(myData[2:4])) hth, david Juliet Hannah wrote: Hi All, I had posted a question on a similar topic, but I think it was not focused. I am posting a modification that I think better accomplishes this. I hope this is ok, and I apologize if it is not. :) I am looping through variables and running several regressions. I have reason to believe that the data is being duplicated because I have been monitoring the memory use on unix. How can I avoid this? Here is an example of how I am going about this. For lm, I also tried model=FALSE, but this did not seem to do the job. Any ideas? Thanks for your time. Regards, Juliet # create data set.seed(1) response <- rnorm(50) var1 <- rnorm(50) var2 <- rnorm(50) var3 <- rnorm(50) myData <- data.frame(response,var1,var2,var3) var.names <- names(myData)[2:4] numVars <- length(var.names) betas <- rep(-1,numVars) names(betas) <- var.names #run regression on var1 through var3. for (Var_num in 1:numVars) { col.name <- var.names[Var_num] mylm <- lm(response ~ get(col.name),data=myData,model=FALSE) betas[Var_num] <- coef(mylm)[2] } -- View this message in context: http://www.nabble.com/how-to-prevent-duplications-of-data-within-a-loop-tp21637431p21642232.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there any function can be used to compare two probit models made from same data?
Hi - wouldn't it be possible to bootstrap the difference between the fit of the 2 models? For example, if one had a *linear* regression problem, the following script could be used (although I'm sure that it could be improved): library(MASS); library(boot) #create intercorrelated data Sigma <- matrix(c(1,.5,.4, .5,1,.8, .4,.8,1),3,3) Sigma dframe <- as.data.frame(mvrnorm(n <- 200, rep(0, 3), Sigma)) names(dframe) <- c('disease','age','ht') #age and ht are predictors of 'disease' head(dframe); cor(dframe) #bootstrap the difference between models containing the 2 predictors model.fun <- function(data, indices) { dsub <- dframe[indices,] m1se <- summary(lm(disease~age,data=dsub))$sigma; m2se <- summary(lm(disease~ht,data=dsub))$sigma; diff <- m1se - m2se; #diff is the difference in the SEs of the 2 models } eye <- boot(dframe, model.fun, R=200); class(eye); names(eye); des(an(eye$t)) boot.ci(eye,conf=c(.95,.99),type=c('norm')) Ben Bolker wrote: jingjiang yan jingjiangyan at gmail.com writes: hi, people How can we compare two probit models brought out from the same data? Let me use the example used in An Introduction to R. Consider a small, artificial example, from Silvey (1970). On the Aegean island of Kalythos the male inhabitants suffer from a congenital eye disease, the effects of which become more marked with increasing age. Samples of islander males of various ages were tested for blindness and the results recorded. The data is shown below: Age: 20 35 45 55 70 No. tested: 50 50 50 50 50 No. blind: 6 17 26 37 44 now, we can use the age and the blind percentage to produce a probit model and get their coefficients by using glm function as was did in An Introduction to R My question is, let say there is another potential factor instead of age affected the blindness percentage. for example, the height of these males. Using their height, and their relevant blindness we can introduce another probit model. 
If I want to determine which is significantly better, which function can I use to compare both models? And, in addition, how do I compare each model with the null hypothesis (i.e., the same blindness for all ages/heights) to show that the model is effective? You can use a likelihood ratio test (i.e., anova(model0, model1)) to compare either model to the null model (blindness is independent of both age and height). The age model and height model are non-nested, and of equal complexity. You can tell which one is *better* by comparing log-likelihoods/deviances, but you cannot test a null hypothesis of significance. Most (but not all) statisticians would say you can compare non-nested models by using AIC, but you don't get a hypothesis test/p-value this way. Ben Bolker
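Ben Bolker's advice can be sketched with the Kalythos data from the question; note that the 'ht' column below is a made-up height variable, invented purely for illustration:

```r
# Probit fits to the Kalythos blindness data; m0 is the null model,
# m.age and m.ht are the two competing non-nested single-predictor models.
kalythos <- data.frame(age   = c(20, 35, 45, 55, 70),
                       n     = c(50, 50, 50, 50, 50),
                       blind = c(6, 17, 26, 37, 44),
                       ht    = c(1.60, 1.68, 1.65, 1.72, 1.75))  # hypothetical heights

m0    <- glm(cbind(blind, n - blind) ~ 1,   family = binomial(link = "probit"), data = kalythos)
m.age <- glm(cbind(blind, n - blind) ~ age, family = binomial(link = "probit"), data = kalythos)
m.ht  <- glm(cbind(blind, n - blind) ~ ht,  family = binomial(link = "probit"), data = kalythos)

anova(m0, m.age, test = "Chisq")  # nested: a likelihood-ratio test vs. the null is valid
AIC(m.age, m.ht)                  # non-nested, equal complexity: compare AICs, no p-value
```

With the real age data, the age model improves hugely on the null; the AIC table for the two non-nested models only ranks them, as the reply says.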
Re: [R] should I use rbind in my example?
Are you sure that you want to loop over variables for each subset, and do this within each subset? A simple way to generate summary statistics (across variables and within categories) is the summaryBy function in the doBy package:

d <- data.frame(wt = rnorm(100, 100, 10), ht = rnorm(100, 2, 1), sex = gl(2, 1, 100))
library(doBy)
summaryBy(wt + ht ~ sex, data = d, FUN = c(mean, sd))

David Freedman

SNN wrote: Hi, This is just for printout so it looks nice. What I have is a loop that loops over my variables and calculates the mean and the sd for these variables. Then I need to rbind them so I can stack them up in one file. The problem is that only the first header appears and the rest do not; this is because I used rbind. I am new to R - can you tell me how to put them in a 'list'? Thanks

jholtman wrote: What do you want to do with it? Is this just for printing out? What other types of transformations are you intending to do? Why not just put them in a 'list' and then write your own specialized print routine. On Wed, Jan 21, 2009 at 10:30 AM, SNN s.nan...@yahoo.com wrote: Hi, I need to rbind two data frames. Each one has a header. After the rbind I would like to keep the header for each and have the two data frames separated by a line. Is this possible to do in R? For example:

  weight_mean weight_sd.dev
F        14.3      4.932883
M        34.7     10.692677

  hight_mean hight_sd.dev
F       35.0     7.071068
M       34.7    10.692677

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
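The 'list' suggestion can be sketched in base R (the helper name and simulated data below are my own, for illustration): one mean/sd table per variable, stored in a named list so each table keeps its own header when printed.

```r
# Build a per-variable mean/sd table by sex and collect them in a named list;
# printing the list shows each table under its own name, as the poster wanted.
set.seed(2)
d <- data.frame(wt = rnorm(100, 100, 10), ht = rnorm(100, 2, 1),
                sex = gl(2, 1, 100, labels = c("F", "M")))

stats.by.sex <- function(v) {
  data.frame(mean   = tapply(d[[v]], d$sex, mean),
             sd.dev = tapply(d[[v]], d$sex, sd))
}
out <- lapply(c("wt", "ht"), stats.by.sex)
names(out) <- c("wt", "ht")
out
```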
Re: [R] conditional weighted quintiles
You might want to look at the 'quantreg' package, written by Roger Koenker, on CRAN. The associated vignette has many examples.

Abuzer Bakis wrote: Dear All, I am an economist working on poverty / income inequality. I need descriptive statistics like the ratio of education expenditures between different income quintiles, where each household has a different weight. After a bit of Google searching I found the 'Hmisc' and 'quantreg' libraries for weighted quantiles. The problem is that these packages give me only weighted quintiles; what I need is conditional weighted quintiles. The example below illustrates what I mean:

x <- data.frame(id = c(1:5), income = c(10, 10, 20, 30, 50),
                education = c(0, 5, 5, 0, 0), weight = c(3, 2, 3, 1, 1))
x
library(Hmisc)
wtd.quantile(x$income, weights = x$weight)
wtd.quantile(x$education, weights = x$weight)

I would like to see the expenditure of each quintile conditional on income, i.e. the education expenditures of the 5th quintile equal to zero. Thanks in advance, -- ozan bakis

- David Freedman Atlanta
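One possible route to the 'conditional' part in base R (a sketch; the helper name and the cumulative-weight quantile definition below are my own, not from the thread or from Hmisc): compute weighted quintile cutpoints for income, then summarize education within each income group.

```r
# Weighted quantiles from the cumulative weight shares (an illustrative
# definition; Hmisc::wtd.quantile uses a related approach).
wtd.q <- function(v, w, probs) {
  o  <- order(v)
  cw <- cumsum(w[o]) / sum(w)  # cumulative weight share
  approx(cw, v[o], xout = probs, rule = 2, ties = "ordered")$y
}

x <- data.frame(id = 1:5, income = c(10, 10, 20, 30, 50),
                education = c(0, 5, 5, 0, 0), weight = c(3, 2, 3, 1, 1))

brks <- wtd.q(x$income, x$weight, seq(0, 1, .2))
x$inc.q <- cut(x$income, breaks = unique(brks), include.lowest = TRUE)

# weighted mean of education within each income quintile group
res <- tapply(seq_len(nrow(x)), x$inc.q,
              function(i) weighted.mean(x$education[i], x$weight[i]))
res
```

With these toy data the heavy weights collapse several quintile boundaries, so unique() is needed before cut(); the top income group's education expenditure comes out as zero, as the poster expects.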
Re: [R] tapply within a data.frame: a simpler alternative?
You might take a look at the transformBy function in the doBy package. For example:

new.df <- transformBy(~group, data = my.df, new = y / max(y))

David Freedman

baptiste auguie-2 wrote: Dear list, I have a data.frame with x, y values and a 3-level factor, group, say. I want to create a new column in this data.frame with the values of y scaled to 1 by group. Perhaps the example below describes it best:

x <- seq(0, 10, len = 100)
my.df <- data.frame(x = rep(x, 3),
                    y = c(3 * sin(x), 2 * cos(x), cos(2 * x)),  # note how the y values have a different maximum depending on the group
                    group = factor(rep(c("sin", "cos", "cos2"), each = 100)))
library(reshape)
df.melt <- melt(my.df, id = c("x", "group"))  # make a long format
df.melt <- df.melt[order(df.melt$group), ]  # order the data.frame by the group factor
df.melt$norm <- do.call(c, tapply(df.melt$value, df.melt$group,
                                  function(.v) .v / max(.v)))  # calculate the normalised value per group and assign it to a new column
library(lattice)
xyplot(norm + value ~ x, groups = group, data = df.melt, auto.key = TRUE)  # check that it worked

This procedure works, but it feels like I'm reinventing the wheel using hammer and saw. I tried to use aggregate, by, and ddply (plyr package), but I couldn't find anything straightforward. I'll appreciate any input, Baptiste

_ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag

- David Freedman Atlanta
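A base-R alternative worth noting (an aside, not from the thread): ave() applies a function within groups and returns a vector aligned with the original rows, so no melting or reordering is needed.

```r
# Scale y to a maximum of 1 within each group, directly in the original frame.
x <- seq(0, 10, len = 100)
my.df <- data.frame(x = rep(x, 3),
                    y = c(3 * sin(x), 2 * cos(x), cos(2 * x)),
                    group = factor(rep(c("sin", "cos", "cos2"), each = 100)))
my.df$norm <- ave(my.df$y, my.df$group, FUN = function(v) v / max(v))
tapply(my.df$norm, my.df$group, max)  # each group now peaks at 1
```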
Re: [R] partial correlation
Not sure which library 'pcor' is in, but why don't you just use the ranks of the variables and then perform the correlation on the ranks:

x <- sample(1:10, 10, rep = TRUE)
y <- x + sample(1:10, 10, rep = TRUE)
cor(x, y)
cor(rank(x), rank(y))

HTH, david freedman

Jürg Brendan Logue wrote: Hi! I have the following problem: I would like to do partial correlations on non-parametric data. I checked pcor (computes the partial correlation between two variables given a set of other variables), but I do not know how to change to a Spearman rank correlation method [pcor(c(BCDNA,ImProd,A365),var(PCor))]. Here's a glimpse of my data (raw data):

 A436  A365     Chla   ImProd    BCDNA
0.001 0.003 0.624889 11.73023 0.776919
0.138 0.126 0.624889 27.29432 0.357468
0.075 0.056 0.624889 105.3115 0.429785
0.009 0.008 0.312444  55.2929 0.547752
0.005 0.002 0.624889  26.9638 0.738775
0.018 0.006 0.312444 31.14836 0.705814
0.02  0.018 2.22E-16 11.90303 0.755003
0.002 0.003 0.624889 7.829781 0.712091
0.047 0.044 1.523167 1.423823 0.710939
0.084 0.056 13.7085  1.533703 0.280171

I'm really grateful for any help since I only recently started using R. Best regards, JBL

- David Freedman Atlanta
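The rank idea extends to the partial-correlation case. Below is a hedged sketch of a Spearman-type partial correlation (the function name is mine, not from any package): rank-transform everything, regress the two variables of interest on the ranked controls, and correlate the residuals.

```r
# Spearman-style partial correlation of a and b, controlling for 'controls'.
spearman.pcor <- function(a, b, controls) {
  rc <- apply(as.matrix(controls), 2, rank)        # ranked control variables
  cor(resid(lm(rank(a) ~ rc)), resid(lm(rank(b) ~ rc)))
}

set.seed(3)
z <- rnorm(50)        # a shared 'control' variable
a <- z + rnorm(50)
b <- z + rnorm(50)
cor(rank(a), rank(b))  # inflated by the shared z
spearman.pcor(a, b, z) # shrinks once z is partialled out
```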
Re: [R] unlist dataframes
Try:

newdata <- do.call(rbind, l)

David Freedman, Atlanta

Naira wrote: Dear all, I would like to know whether it is possible to unlist elements and keep the original format of the data. To make it more clear, let me give an example: I have a list l of data frames that I created with apply, but which looks like this:

x1 <- data.frame(Name = LETTERS[1:2], Age = 1:2)
x2 <- data.frame(Name = LETTERS[3:4], Age = 3:4)
l <- list(x1, x2)
l
[[1]]
  Name Age
1    A   1
2    B   2

[[2]]
  Name Age
1    C   3
2    D   4

I would like to unlist l to create a data frame with 2 columns and 4 rows, but keeping the format of Name (character) and Age (numeric). Now when I unlist l, I obtain:

unlist(l)
Name1 Name2  Age1  Age2 Name1 Name2  Age1  Age2
    1     2     1     2     1     2     3     4

Is there a way to at least obtain something like

A 1
B 2
C 3
D 4

as the result from unlist? Thanks a lot for your replies, Naira

- David Freedman Atlanta
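A quick sketch contrasting the two approaches with the example data: do.call(rbind, l) stacks the data frames row-wise and keeps each column's type, whereas unlist() flattens everything into one vector.

```r
# Stack the list of data frames; columns keep their original types.
x1 <- data.frame(Name = LETTERS[1:2], Age = 1:2)
x2 <- data.frame(Name = LETTERS[3:4], Age = 3:4)
l <- list(x1, x2)
newdata <- do.call(rbind, l)
newdata                # 4 rows, columns Name and Age
sapply(newdata, class) # Age stays numeric
```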
Re: [R] ggplot2 help
You might want to take a look at http://had.co.nz/ggplot2/ There are many examples of how to use this package. And, there's a book coming out soon. hope this helps david freedman

Edna Bell wrote: Hi yet again! Thank you for being patient with me. Is there a how-to for ggplot2, please? I would like to look at it, but have no idea where to start. Thanks, Edna Bell

- David Freedman Atlanta
[R] fill in area between 2 lines with a color
Hi - I'd like to have the area between 2 lines on an x-y plot filled with grey, but I haven't had any luck using polygon or rect. (In reality, I'd like to do this twice - once for a low group and once for a high group - and then I'd like to plot a set of data points for a 'normal' group together with these 2 grey areas.) Here's a simple example of the 2 lines:

age <- 1:10
y.low <- rnorm(length(age), 150, 25) + 10 * age
y.high <- rnorm(length(age), 250, 25) + 10 * age
plot(age, y.high, type = 'n', ylim = c(100, 400), ylab = 'Y Range', xlab = 'Age (years)')
lines(age, y.low, col = 'grey')
lines(age, y.high, col = 'grey')

Is it possible to fill the area between the 2 lines with, for example, 'grey30'? Thanks very much in advance, David Freedman

- David Freedman Atlanta
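One way to do this (a sketch, with a seed added for reproducibility): trace the upper curve left to right and the lower curve back right to left, so a single polygon() call encloses the band between the two lines.

```r
# Fill the region between y.low and y.high with grey30 using polygon().
set.seed(4)  # seed is my addition, so the sketch is reproducible
age <- 1:10
y.low  <- rnorm(length(age), 150, 25) + 10 * age
y.high <- rnorm(length(age), 250, 25) + 10 * age

xx <- c(age, rev(age))      # out along the top, back along the bottom
yy <- c(y.high, rev(y.low))

pdf(NULL)  # throwaway device so this runs non-interactively
plot(age, y.high, type = 'n', ylim = c(100, 400),
     ylab = 'Y Range', xlab = 'Age (years)')
polygon(xx, yy, col = 'grey30', border = NA)  # the filled band
lines(age, y.low); lines(age, y.high)
dev.off()
```

Drawing the polygon before the lines keeps the line colour visible on top of the fill.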