[R] Need help to split a given matrix is a sequential way

2010-03-30 Thread Megh

I need to split a given matrix in sequential order. Suppose my matrix is:

dat <- cbind(sample(c(100, 200), 10, TRUE),
             sample(c(50, 100, 150, 180), 10, TRUE),
             sample(seq(20, 200, by = 20), 10, TRUE)); dat
      [,1] [,2] [,3]
 [1,]  200  100   80
 [2,]  100  180   80
 [3,]  200  150  180
 [4,]  200   50  140
 [5,]  100  150   60
 [6,]  100   50   60
 [7,]  100  100  100
 [8,]  200  150  100
 [9,]  100   50  120
[10,]  200   50  180

Now I need to split the above matrix according to the unique values in the 1st
column. So I have the following:

dat1 <- dat[which(dat[,1] == unique(dat[,1])[1]),]
dat2 <- dat[-which(dat[,1] == unique(dat[,1])[1]),]; dat1; dat2
     [,1] [,2] [,3]
[1,]  200  100   80
[2,]  200  150  180
[3,]  200   50  140
[4,]  200  150  100
[5,]  200   50  180
     [,1] [,2] [,3]
[1,]  100  180   80
[2,]  100  150   60
[3,]  100   50   60
[4,]  100  100  100
[5,]  100   50  120

Now each of dat1 and dat2 needs to be split according to its 2nd
column, i.e.

dat11 <- dat1[which(dat1[,2] == unique(dat1[,2])[1]),]
dat12 <- dat1[which(dat1[,2] == unique(dat1[,2])[2]),]
dat13 <- dat1[which(dat1[,2] == unique(dat1[,2])[3]),]; dat11; dat12; dat13
[1] 200 100  80
     [,1] [,2] [,3]
[1,]  200  150  180
[2,]  200  150  100
     [,1] [,2] [,3]
[1,]  200   50  140
[2,]  200   50  180

and similarly for dat2.

This kind of sequential splitting would continue
(number_of_columns_of_original_matrix - 1) times. It would be great if I could
then put all those matrices into a list object for further calculations.

If the original matrix is small, this can be handled manually. For a
moderately large matrix, however, the task would be very cumbersome, so I am
looking for an automated way to do this for an arbitrary matrix.

Can anyone here help me with this?

Thank you so much for your kind attention.

-- 
View this message in context: 
http://n4.nabble.com/Need-help-to-split-a-given-matrix-is-a-sequential-way-tp1744803p1744803.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help to split a given matrix is a sequential way

2010-03-30 Thread Ivan Calandra

Hi,
Not sure exactly how, but I think using a combination of unique() and 
split() could do what you're looking for.

I hope it will help you
Ivan


   


--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php



Re: [R] Need help to split a given matrix is a sequential way

2010-03-30 Thread Dennis Murphy
Hi:

Does this work for you?

dat <- as.data.frame(dat)
lapply(split(dat, dat$V1), function(x) split(x, x$V2))

The result contains two components for 100 and 200, and subcomponents
within each component.
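
For an arbitrary number of columns, the nested split can also be written recursively. The following is only a sketch under stated assumptions: `split_seq` is a made-up helper name (it is not part of base R or of this thread), it works on the matrix directly rather than on a data frame, and it orders the groups by sorted value, as split() does, rather than by first appearance, as unique() does.

```r
# Hypothetical helper: recursively split a matrix by columns 1, 2, ..., depth.
split_seq <- function(m, depth = ncol(m) - 1, col = 1) {
  if (col > depth) return(m)                    # deepest level: return the block itself
  groups <- split(seq_len(nrow(m)), m[, col])   # row indices for each unique value
  lapply(groups, function(idx)
    split_seq(m[idx, , drop = FALSE], depth, col + 1))  # drop = FALSE keeps 1-row blocks as matrices
}

# The example matrix from the original post, typed in by hand
dat <- cbind(c(200, 100, 200, 200, 100, 100, 100, 200, 100, 200),
             c(100, 180, 150,  50, 150,  50, 100, 150,  50,  50),
             c( 80,  80, 180, 140,  60,  60, 100, 100, 120, 180))

res <- split_seq(dat)
res[["200"]][["150"]]   # rows with 200 in column 1 and 150 in column 2
```

The result is a nested list, so `res[["200"]]` plays the role of dat1 and `res[["200"]][["150"]]` the role of dat12 in the original post.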

HTH,
Dennis





[R] A link to a collection of tutorials and videos on R.

2010-03-30 Thread datakid

A link to a collection of tutorials and videos on R.
Tutorials: http://www.dataminingtools.net/browsetutorials.php?tag=rdmt
Videos: http://www.dataminingtools.net/videos.php?id=8
-- 
View this message in context: 
http://n4.nabble.com/A-link-to-a-collection-of-tutorials-and-videos-on-R-tp1744835p1744835.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] error bars

2010-03-30 Thread Iasonas Lamprianou
Dear friends,
I have a statistical question. Sometimes, when I compare boys to girls on a 
specific variable, the error bars (confidence intervals of the means) seem to 
overlap slightly. Still, when I run a t-test, I find statistically significant 
differences. The rule is clear: if the confidence intervals do not overlap, 
then there is a statistically significant difference. But if they overlap 
slightly, we have to use a t-test to know for sure whether the two means differ 
significantly. The point is: is there a rule of thumb to say, for example, that if 
the overlap is less than 20% of the length of the standard error, then a t-test 
would give significant results?
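
The situation described here can be reproduced from summary statistics alone. This is a minimal sketch with made-up numbers (two groups of 50, equal standard deviation 1, means 0 and 0.5); it only illustrates that overlapping 95% intervals and a significant pooled t-test can coexist, and it does not propose a rule of thumb.

```r
n  <- 50      # made-up group size (same in both groups)
s  <- 1       # made-up common standard deviation
m1 <- 0       # made-up mean, group 1
m2 <- 0.5     # made-up mean, group 2

se   <- s / sqrt(n)                       # standard error of each mean
half <- qt(0.975, df = n - 1) * se        # half-width of each 95% confidence interval
ci1  <- c(m1 - half, m1 + half)
ci2  <- c(m2 - half, m2 + half)
overlap <- ci1[2] > ci2[1]                # TRUE if the two intervals overlap

# Pooled two-sample t statistic computed from the same summaries
t_stat <- (m2 - m1) / sqrt(2 * s^2 / n)
p_val  <- 2 * pt(-abs(t_stat), df = 2 * n - 2)

overlap
p_val
```

The intervals overlap because each one is built from the standard error of a single mean, while the test uses the standard error of the difference, which is only sqrt(2) times larger, not twice as large.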

thank you for your time

P.S.1 is there an easy way to plot error bars in R?
P.S.2 an interesting discussion about this - highly recommended to read it - 
can be found at 
http://scienceblogs.com/cognitivedaily/2007/03/ill_bet_you_dont_understand_er.php
 

jason

Dr. Iasonas Lamprianou


Assistant Professor (Educational Research and Evaluation)
Department of Education Sciences
European University-Cyprus
P.O. Box 22006
1516 Nicosia
Cyprus 
Tel.: +357-22-713178
Fax: +357-22-590539


Honorary Research Fellow
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044  161 275 3485
iasonas.lampria...@manchester.ac.uk








Re: [R] use logical in cor.test

2010-03-30 Thread pgseye

Thanks for the replies.

In response to Erik:
What does
 Both[,1]
show you?

 Both[,1]
   [1] 3.36   NA   NA   NA   NA   NA   NA 3.92 3.50   NA   NA   NA   NA 3.76
3.19 3.83   NA 3.66..

What does
 Both[,1] > 2.5
show you? 

 Both[,1] > 2.5
   [1] TRUE   NA   NA   NA   NA   NA   NA TRUE TRUE   NA   NA   NA   NA TRUE
TRUE


I understand a logical variable is binary, but don't know how to select a
subset of the data (have tried the subset function, but can't seem to get it
to work)

Bill, when I run what you suggested, I get:

 tBoth <- Both 
 is.na(tBoth[tBoth < 2.5]) <- TRUE 
Error in is.na(tBoth[tBoth < 2.5]) <- TRUE : 
  NAs are not allowed in subscripted assignments
 R <- cor(tBoth, use = "complete.obs") 
 R[1,2]
[1] 0.7750889

Any idea what the error message means?
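
The message comes from the NA entries in the logical subscript: R refuses a subscripted assignment of this kind when the index itself contains NA. One common workaround is to index with which(), which drops the NAs. A minimal sketch on a made-up vector follows; the direction of the comparison (values below 2.5 are set to NA) is an assumption about what was intended.

```r
x <- c(3.36, NA, 2.1, 3.92, NA, 1.8)   # made-up values, some below 2.5, some missing

idx <- x < 2.5                         # logical index that itself contains NA
idx

y <- x
y[which(y < 2.5)] <- NA                # which() keeps only the positions that are TRUE
y

# Equivalent: make the condition NA-free before indexing
z <- x
z[!is.na(z) & z < 2.5] <- NA
```

After this step, cor(..., use = "complete.obs") can be applied as before.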

Thanks again,

Paul
-- 
View this message in context: 
http://n4.nabble.com/use-logical-in-cor-test-tp1744701p1744896.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] getting CI's for certain y of nls fitted curve

2010-03-30 Thread Kay Cichini

...it is of course simply a matter of passing the desired x to the predict function, 
in this case: predict(mod1, data.frame(press = x_tenth[1])).

A trivial syntax error must have been the reason this didn't work in the first
place.
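
For readers who land here without the earlier posts, the pattern looks roughly like this. It is a sketch on made-up data: the exponential model, the start values and `x_new` are assumptions for illustration, and only the names `mod1` and `press` are taken from the post.

```r
# Made-up data roughly following y = 2 * exp(0.3 * press), with small deterministic noise
d <- data.frame(press = 1:10)
d$y <- 2 * exp(0.3 * d$press) + rep(c(0.1, -0.1), 5)

# Fit a non-linear model with nls(); start values are chosen close to the truth
mod1 <- nls(y ~ a * exp(b * press), data = d, start = list(a = 2.2, b = 0.28))

# Predict at a new x: newdata must be a data frame whose column name
# matches the predictor used in the model formula
x_new <- 5.5
pred <- predict(mod1, newdata = data.frame(press = x_new))
pred
```

The column name in the data frame has to be `press` exactly; with any other name, predict() cannot match the new value to the predictor in the formula.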

kay
-- 
View this message in context: 
http://n4.nabble.com/getting-CI-s-for-certain-y-of-nls-fitted-curve-tp1695025p1744909.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Confusing concept of vector and matrix in R

2010-03-30 Thread Barry Rowlingson
On Tue, Mar 30, 2010 at 2:42 AM, Rolf Turner r.tur...@auckland.ac.nz wrote:

 Well then, why don't you go away and design and build your own statistics and
 data analysis language/package to replace R?  You can then make whatever
 design decisions you like, and you won't have to live with the design 
 decisions
 made by such silly and inept people as John Chambers and Rick Becker and 
 their ilk.

 Aah, argument by (ironic) reference to learned authority!

 Even Einstein was wrong (God does not play dice). He was also
right, thought he was wrong, and then we've discovered he may have
been right all along (The Cosmological Constant, Dark Energy etc).

 How many of us have _never_ interfaced our foreheads with the
keyboard when something breaks because we didn't put ,drop=FALSE in
a matrix subscript?

 There is no doubt that R plays fast and loose with many concepts of
type and structure that Computer Scientists would turn their nose up
at. I would love to go away and redesign it, but I'd just end up with
python. Truth is that R's statistical power is what makes it great
because of the vast wealth of CRAN, not the R language per se with its
features that so fluster my comp-sci friends. And many a beginner.

We work round them by bashing our heads on the keyboards, typing
,drop=FALSE, and vowing never to do it again. And writing more unit
tests.
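
For anyone who has not yet met the behaviour in question, here is a minimal sketch on a made-up matrix: selecting a single row silently turns the result into a plain vector unless drop = FALSE is given.

```r
m <- matrix(1:6, nrow = 2)      # made-up 2 x 3 matrix

r1 <- m[1, ]                    # dimensions are dropped: a plain integer vector
r2 <- m[1, , drop = FALSE]      # dimensions are kept: a 1 x 3 matrix

dim(r1)
dim(r2)
```

Code that later calls nrow(), ncol() or apply() on the result works with `r2` but not with `r1`.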

Barry



Re: [R] help in matlab - r code

2010-03-30 Thread marta rufino
Dear Susanne,

Thank you for your answer :-) and for the other people that helped
privately.
I have been running the code with a friend, and we reached a similar
conclusion. Matlab apparently flattens the matrices into vectors
automatically and computes the correlation between the vectors, thus
returning a single value. This also differs from Octave, which, like R,
computes the correlations between the columns of the two matrices. I
sorted this out by converting the matrices into vectors, simply by
wrapping them in c().

We also found this aspect of 0-1 and 1-255 intriguing. In the end, I used a
formula from a friend to do the transformation, getting similar values in
both programs.

Here's the code:

ImageWidth = dim(x)[2]  # number of columns
MaxOffset = 99  # defined variable
ImageWidthToProcess = ImageWidth - MaxOffset  # columns minus the maximum offset
# Unlike in Matlab, in R we should first create the vector that will store
# the values produced by the loop
AutoCData = numeric(MaxOffset)

for (Offset in 1:MaxOffset) {
  # columns shifted to the right by Offset
  OffsetPlaquette = x[, (1 + Offset):(ImageWidthToProcess + Offset)]
  # c() flattens each matrix into a vector, so cor() returns a single value
  AutoCData[Offset] = cor(c(x[, 1:ImageWidthToProcess]),
                          c(OffsetPlaquette))
  print(Offset)
}
AutoCData
plot(AutoCData)

### COOL :-)

The results were very similar to matlab.
I still have many lines to go :-)

Using the function you kindly wrote (thank you so much), the
difference between the two sets of results is small:
summary(AutoCData2-AutoCData)

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-2.581e-15 -2.741e-16 -2.082e-17  5.723e-17  2.637e-16  5.329e-15

which is good!

All the best,
Marta



2010/3/29 Susanne Schmidt s.schm...@bham.ac.uk

 Dear Marta,

 I did it in Matlab, and fiddled around with R code until I had *almost* the
 same result. The almost is probably due to R handling the picture values
 (ranging from 0 to 1) differently than Matlab (ranging from 0 to 255), and
 simply multiplying the R picture values by 255 did NOT result in exactly the
 same values as the Matlab values. [what seems white in the picture is 245 in
 Matlab, although values potentially range to 255, and white is 0.9642549 in
 R, which multiplied by 255 gives 245.12, e.g.]
 But maybe the precision of this solution is good enough for you ..

 The corr2 command from matlab is a 2D correlation coefficient - the R
 command cor works column by column on matrices, and is not the solution here.
 Below I tried to implement the formula given in the following matlab page:
 http://www.mathworks.com/access/helpdesk/help/toolbox/images/corr2.html

 Maybe somebody on the list has a nice idea how to make the code more
 elegant

 This is the complete code in R

 setwd("D:/   wherever ")
 library(ReadImages)
 x <- read.jpeg("  whichever  .jpg") #open image

 plot(x) #plot image
 x <- rgb2grey(x) #convert to greyscale
 plot(x) # check ;-) the image is in grey scale

 ImageWidth = dim(x)[2] #number of col
 MaxOffset = 99; #defined variable
 ImageWidthToProcess = ImageWidth-MaxOffset; #col-defined variable

 ## this one does NOT work because matrices not square
 ## (commented out so that the rest of the script runs):
 # for(k in 1: MaxOffset) {
 #   OffsetPlaquette <- x[  , c((1+ k)   :  (ImageWidthToProcess + k))]
 #   dataToProcess <- x[,c(1:ImageWidthToProcess)]
 #   AutoCData[k] <-  mantel(OffsetPlaquette, dataToProcess)
 # }
 # AutoCData
 ## END this one does not work because matrices not square


 AutoCData <- rep(0, MaxOffset)
 sumBothM <- rep(0, MaxOffset)
 sum1stMsq <- rep(0, MaxOffset)
 sum2ndMsq <- rep(0, MaxOffset)
 for(k in 1: MaxOffset) {
   OffsetPlaquette <- x[  ,(1+k) : (ImageWidthToProcess + k)]
   dataToProcess <- x[,c(1:ImageWidthToProcess)]
   meanM <- mean(OffsetPlaquette); meanM2 <- mean(dataToProcess)
   for(j in 1:dim(dataToProcess)[2]){
     for(i in 1:dim(OffsetPlaquette)[1]){
       sumBothM[k]  <- sumBothM[k]  +
         (OffsetPlaquette[i,j]-meanM)*(dataToProcess[i,j]-meanM2)
       sum1stMsq[k] <- sum1stMsq[k] +  (OffsetPlaquette[i,j]-meanM)^2
       sum2ndMsq[k] <- sum2ndMsq[k] +  (dataToProcess[i,j]-meanM2)^2
     }
   }
   AutoCData[k] <-  sumBothM[k]/(sqrt(sum1stMsq[k] *  sum2ndMsq[k]))
 }

 AutoCData
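
The double loop above can be written without explicit loops, since the formula is just a sum over all matrix elements. The sketch below uses a made-up helper name, `corr2_r` (not an existing R function), and two small made-up matrices; it also checks that the result agrees with cor() applied to the flattened matrices.

```r
# Hypothetical helper: 2-D correlation coefficient of two equally sized matrices
corr2_r <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  a <- A - mean(A)                          # centre each matrix on its overall mean
  b <- B - mean(B)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))    # same ratio as accumulated in the loops
}

A <- matrix(c(1, 4, 2, 8, 5, 7), nrow = 2)  # made-up data
B <- matrix(c(2, 3, 1, 9, 4, 8), nrow = 2)  # made-up data

corr2_r(A, B)
cor(c(A), c(B))                             # flattening both matrices gives the same number
```

Because the two expressions are algebraically identical, any difference between them should be at the level of floating-point rounding.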


 Best wishes,
 Susanne




[R] S3 vs S4

2010-03-30 Thread Ivan Calandra
Dear R users,

I'm still a beginner and I'm wondering whether S3 or S4 methods really 
differ for my use.

I understand more or less the distinction between the 2 classes from the 
documentation I've read but the big question is: does it make a 
difference in practice?

Up to now, I've worked without noticing anything, but it might be 
important to differentiate and to know which one to use and how.
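
As a concrete point of comparison, here is the same toy class written both ways. This is only a sketch: the class and generic names (`circle`, `Circle`, `area`, `area4`) are made up for illustration.

```r
library(methods)   # S4 machinery; attached by default in interactive sessions

## S3: a class is just an attribute, and methods are found by naming convention
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2
c3 <- structure(list(r = 1), class = "circle")
area(c3)

## S4: classes and generics are declared formally and slots are type-checked
setClass("Circle", representation(r = "numeric"))
setGeneric("area4", function(shape) standardGeneric("area4"))
setMethod("area4", "Circle", function(shape) pi * shape@r^2)
c4 <- new("Circle", r = 1)
area4(c4)

# S4 rejects a slot of the wrong type at construction time; S3 checks nothing
bad <- tryCatch(new("Circle", r = "one"), error = function(e) "rejected")
bad
```

For everyday use of existing functions the difference is mostly invisible; it shows up when writing new classes, where S4 trades extra declarations for checks of this kind.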

Thank you for your help
Regards,
Ivan

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php




Re: [R] Confusing concept of vector and matrix in R

2010-03-30 Thread Mario Valle
Reframe the problem. Rethink why you need to keep dimensions. I never ever had 
to use drop.
My .02 something
mario


-- 
Ing. Mario Valle
Data Analysis and Visualization Group| http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)  | Tel:  +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82



[R] R package licences

2010-03-30 Thread Uwe Behrens
Hi,


please, can SOMEBODY help me to find the right characters to fit into the
field

\details{
...
...
License: \tab  \cr
}

of the description file for a new R-package?

When building the package I always get:

* checking DESCRIPTION meta-information ... WARNING
Non-standard license specification:
  What license is it under?

Last time I just used GPL and it worked, this time it doesn't ...
I tried the following character strings:
GPL  GPL-2  GPL-3  LGPL-2  LGPL-2.1  LGPL-3  AGPL-3  Artistic-1.0
Artistic-2.0
all with the same results.

Thanks in advance, Ove



[R] remove from the R mailing list

2010-03-30 Thread zoe zhang
I would like to be removed from the R mailing list.

Thanks.



Re: [R] error bars

2010-03-30 Thread Markus Schmotz

you can write a function for yourself using arrows:

# x, y: dataset as vectors
# xe, ye: errors per entry as vectors, if the errors are symmetric
arrbar <- function(x, y, xe, ye) {
  for (i in seq_along(x)) {
    # horizontal bars: half of xe on each side of the point
    arrows(x[i], y[i], x[i] - xe[i]/2, y[i], angle = 90, length = 0.05)
    arrows(x[i], y[i], x[i] + xe[i]/2, y[i], angle = 90, length = 0.05)
    # vertical bars: half of ye on each side of the point
    arrows(x[i], y[i], x[i], y[i] - ye[i]/2, angle = 90, length = 0.05)
    arrows(x[i], y[i], x[i], y[i] + ye[i]/2, angle = 90, length = 0.05)
  }
}
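
Since arrows() is vectorized and can draw a head at both ends (code = 3), vertical error bars can also be drawn in a single call. The sketch below uses a made-up helper name, `add_error_bars`, and made-up group means and half-widths; pdf(NULL) is only there so that it also runs without a display.

```r
# Hypothetical helper: vertical bars from y - e to y + e, flat caps at both ends
add_error_bars <- function(x, y, e) {
  arrows(x, y - e, x, y + e, angle = 90, code = 3, length = 0.05)
  invisible(NULL)
}

pdf(NULL)                        # null device; omit this line to draw on screen
x <- 1:2
y <- c(10, 12)                   # made-up group means
e <- c(1.5, 1.2)                 # made-up half-widths of the intervals
plot(x, y, xlim = c(0.5, 2.5), ylim = range(y - e, y + e),
     xaxt = "n", xlab = "", ylab = "mean")
axis(1, at = x, labels = c("boys", "girls"))
res <- add_error_bars(x, y, e)
invisible(dev.off())
```

Note that here `e` is the full half-width on each side, whereas arrbar() above draws half of the supplied error on each side.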





--



___
Dipl.-Phys. Markus Schmotz
Universität Konstanz
Fachbereich Physik, Lehrstuhl Leiderer
Postfach M 676
D-78457 Konstanz
Tel.: +49 7531 88 3803, Fax: 3127
Mail: markus.schm...@uni-konstanz.de



Re: [R] remove from the R mailing list

2010-03-30 Thread Rubén Roa

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
behalf of zoe zhang
Sent: Tuesday, 30 March 2010 12:18
To: r-help@r-project.org
Subject: [R] remove from the R mailing list

I would like to be removed from the R mailing list.

Thanks.


---

Hi,

Would you like me to remove you?

Rubén 



 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN



Re: [R] R package licences

2010-03-30 Thread Barry Rowlingson
On Tue, Mar 30, 2010 at 10:15 AM, Uwe Behrens ubehre...@gmail.com wrote:
 Hi,


 please, can SOMEBODY help me to find the right characters to fit into the
 field

 \details{
 ...
 ...
 License: \tab  \cr
 }

 What file is this? Because \details belongs in a .Rd documentation
file, but the license is specified in the DESCRIPTION file, which
doesn't have \details... Are you editing a .Rd and not the DESCRIPTION
file?

 of the description file for a new R-package?

 When building the package I always get:

 * checking DESCRIPTION meta-information ... WARNING
 Non-standard license specification:
  What license is it under?

 Last time I just used GPL and it worked, this time it doesn't ...
 I tried the following character strings:
 GPL  GPL-2  GPL-3  LGPL-2  LGPL-2.1  LGPL-3  AGPL-3  Artistic-1.0
 Artistic-2.0
 all with the same results.

 Most of those (if not all) should be valid.
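
One way to see what R actually reads from the DESCRIPTION file is to parse it with read.dcf(). The sketch below writes a throwaway DESCRIPTION to a temporary file; the package name and version are made up, and GPL-2 is just one of the strings from the list above.

```r
# Write a minimal, made-up DESCRIPTION file to a temporary location
desc <- tempfile("DESCRIPTION")
writeLines(c("Package: mypkg",      # hypothetical package name
             "Version: 0.1",
             "License: GPL-2"),
           desc)

# read.dcf() parses "Field: value" lines into a character matrix
fields <- read.dcf(desc, fields = c("Package", "License"))
fields[1, "License"]
```

Running the same read.dcf() call on the real DESCRIPTION file of the package shows whether a License field is present there at all, independently of what is written in any .Rd file.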

Barry



Re: [R] SHLIB not working (Win Vista)

2010-03-30 Thread Duncan Murdoch

On 29/03/2010 11:50 PM, Remko Duursma wrote:

Dear R-helpers,

I tried to build a DLL as I have done many times before, but this time
on my new machine, and it gives the error
(from the cmd window):


R CMD SHLIB Boxcnt.f

MAKE Version 5.2  Copyright (c) 1987, 2000 Borland
Error c:/PROGRA~1/R/R-210~1.1/share/make/winshlib.mk 4: Command syntax error
*** 1 errors during make ***


You're using the wrong make, presumably because your path is wrong.  You 
should put the Rtools/bin directory first on your path, but you have a 
Borland make ahead of it.
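
A quick way to check which make is picked up is to ask from within R. This is a small sketch using base functions only; what it prints depends entirely on the machine it is run on.

```r
# Full path of the first "make" found on the search path ("" if none is found)
make_path <- Sys.which("make")
make_path

# The search path itself, one directory per element, in the order it is searched
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
head(path_dirs)
```

If `make_path` points into a Borland directory rather than Rtools, the order of the entries in `path_dirs` shows why.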


Duncan Murdoch


The error is not in my Fortran file, because I also tried other files
or even without any arguments (it gives the same error msg
regardless).

System:
Windows Vista
R 2.10.1
Rtools installed (version 2.11)


thanks,
Remko


-
Remko Duursma
Research Lecturer

Centre for Plants and the Environment
University of Western Sydney
Hawkesbury Campus
Richmond NSW 2753

Dept of Biological Science
Macquarie University
North Ryde NSW 2109
Australia

Mobile: +61 (0)422 096908
www.remkoduursma.com



Re: [R] Error when checking a package.

2010-03-30 Thread Duncan Murdoch

On 30/03/2010 1:59 AM, Jim Lemon wrote:

On 03/30/2010 04:39 PM, Dong H. Oh wrote:

...
* checking R code for possible problems ... NOTE
Found possibly global 'T' or 'F' in the following function:
   ar.dual.dea



Error in ar.dual.dea(ar.dat, noutput = 1, orientation = 1, rts = 1, ar.l =
matrix(c(0,  :
   F used instead of FALSE
Execution halted


Hi Dong-hyun,
It looks like the R core team is getting serious about the TRUE/FALSE 
business. I would suggest that you replace all occurrences of T or F 
in your code with TRUE and FALSE respectively and see what happens.


That test has been around at least since 2003:  it applies in package 
testing, not to people typing in the console.
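
The reason for the check is that T and F are ordinary variables that happen to hold TRUE and FALSE, whereas TRUE and FALSE are reserved words. A minimal sketch, run inside local() so that nothing in the workspace is overwritten:

```r
check <- local({
  T <- 0                                   # legal: T is just a variable name
  reassigned <- !isTRUE(T)                 # T no longer means TRUE here

  # Assigning to TRUE itself is an error, so TRUE can always be trusted
  blocked <- tryCatch({
    eval(parse(text = "TRUE <- 0"))
    FALSE
  }, error = function(e) TRUE)

  c(reassigned = reassigned, blocked = blocked)
})
check
```

Package code that writes F where it means FALSE therefore depends on nobody having defined a variable called F.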


Duncan Murdoch



[R] library(): load library from a specified location

2010-03-30 Thread Jannis

Dear list members,


I would like to load an R library from a specified folder with library()
and need help on how to call the command.

The reason is that I am loading this library on a remote machine where I
have no admin rights. Furthermore, a library with the same name is
already installed on that machine. I have modified this library slightly
by modifying the source code and created a personal version of this
library that I now want to load instead of the standard one.

By running

R CMD INSTALL *path*/*packagename* --library=*path*/Software/R-packages


I managed to compile my modified version on the remote machine and save
the library in a folder on that machine.


When I now start R and run

library(packagename)


R still seems to load the version installed on the remote computer, even 
though I added my folder to the library path of R by running:

.libPaths("*path*/Software/R-Packages")

This is probably because the package is also available in 
the standard library of R.
Is there any way of loading the package from only one specified path? I 
read the help for library() and imagine that lib.loc could be the 
key to success, but I am not sure what argument it needs.
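
For reference, lib.loc takes a character vector of library directories to search. The sketch below uses a temporary directory as a stand-in for the personal library (`my_lib` is a made-up name) and the base package stats as a stand-in for the modified package, so that it runs anywhere.

```r
# Stand-in for the personal library folder; .libPaths() only accepts existing directories
my_lib <- file.path(tempdir(), "R-packages")
dir.create(my_lib, showWarnings = FALSE)

# Put the personal library first on the search path, keeping the existing entries
.libPaths(c(my_lib, .libPaths()))
.libPaths()

# lib.loc restricts the search to the given directories; .Library is R's own library
pkg_path <- find.package("stats", lib.loc = .Library)
pkg_path

# For the real case the call would be of the form (not run here):
# library(packagename, lib.loc = my_lib)
```

The directory names are matched literally, so on a case-sensitive file system the folder passed to .libPaths() or lib.loc has to be spelled exactly as the one used at installation.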



Thanks a lot
Jannis



[R] Error singular gradient matrix at initial parameter estimates in nls

2010-03-30 Thread Corrado

I am using nls to fit a non linear function to some data.

The non linear function is:

y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

I have chosen the "port" algorithm, with a lower bound of 0 for all of the 
ki parameters, and I have tried many start values for the parameters ki 
(including generating them at random).


If I fit the non linear function to the same data using an external 
algorithm, it fits perfectly and finds the parameters.


As soon as I come to my R installation (2.10.1 on Kubuntu Linux 910 64 
bit), I keep getting the error:


Error in nlsModel(formula, mf, start, wts, upper) :   singular gradient 
matrix at initial parameter estimates


I have read all the previous postings and the documentation, but to no 
avail: the error is there to stay. I am sure the problem is with nls, 
because the external fitting algorithm perfectly fits it in less than a 
second. Also, if my n is 4, then the nls works perfectly (but that
excludes all the k5 ... kn).


Can anyone help me with suggestions? Thanks in advance.

Alternatively, what do you suggest I should do? Shall I abandon nls in 
favour of optim?


Regards

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] library(): load library from a specified location

2010-03-30 Thread Duncan Murdoch

Is the package loaded before you make the change to .libPaths?  The base 
packages are loaded at startup, but this can be suppressed:  see ?Startup.


If that's not it, then I think we need more specific information, 
because what you're doing should work.  Show us the result of


sessionInfo()
.libPaths("*path*/Software/R-Packages")
.libPaths()
library(packagename)
sessionInfo()
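
For readers who land on this thread later, here is a minimal sketch of the two usual routes: putting a private library first on the search path with .libPaths(), and pointing library() at one directory with lib.loc. The directory used here (a folder under tempdir()) is only a placeholder, and packagename stands in for the real package, so the lines that need it are left commented out.

```r
# Hypothetical private library directory; substitute your own path.
my_lib <- file.path(tempdir(), "R-packages")
dir.create(my_lib, showWarnings = FALSE, recursive = TRUE)

# Prepend the private library so it is searched before the system libraries.
.libPaths(c(my_lib, .libPaths()))
print(.libPaths())

# Load from that directory only; this fails if the package is not installed there:
# library(packagename, lib.loc = my_lib)

# Afterwards, check which copy was actually picked up:
# find.package("packagename")
```

If the private directory is listed first in .libPaths(), a plain library(packagename) should already find the modified copy, which matches what was reported later in this thread.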



Duncan Murdoch



Re: [R] Error singular gradient matrix at initial parameter estimates in nls

2010-03-30 Thread Gabor Grothendieck
You could try the "brute-force" algorithm in the nls2 package to find starting values.
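
As a point of comparison, here is a self-contained sketch of a bounded fit of this model form with plain nls() on simulated data. The data, the two predictors and the "true" values (0.2, 0.8, 0.5) are invented for illustration and are not Corrado's. When the error only appears for larger n, one thing worth checking is whether some of the p columns are constant or collinear, since that can make the gradient matrix singular whatever the start values are; the rank check at the top is one rough way to look for that.

```r
set.seed(1)
n  <- 200
p1 <- runif(n)
p2 <- runif(n)
# Simulated response with a little noise (nls is not meant for zero-residual data)
y  <- 1 - exp(-(0.2 + 0.8 * p1 + 0.5 * p2)) + rnorm(n, sd = 0.01)
d  <- data.frame(y, p1, p2)

# Rank of the linear predictor's design matrix; less than 3 would signal collinearity
design_rank <- qr(cbind(1, d$p1, d$p2))$rank

fit <- nls(y ~ 1 - exp(-(k0 + k1 * p1 + k2 * p2)), data = d,
           start = list(k0 = 0.1, k1 = 0.1, k2 = 0.1),
           algorithm = "port", lower = c(0, 0, 0))
coef(fit)
```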



[R] How to recode variables using base R

2010-03-30 Thread johannes rara
Hi,

Is there an efficient way to recode variables in a data.frame using
base R? My purpose is to create
new variables and add them to the existing data.frame. The basic idea is
shown below, but how can I create the recodings for A, B and C and assign
them to new variables?

df <- data.frame(A = c(1:5),
                 B = c(3,6,2,8,10),
                 C = c(0,15,5,9,12))

df$A[df$A <= 3] <- "x"
df$A[df$A > 3 & df$A <= 8] <- "y"
df$A[df$A <= 16] <- "z"

Thanks,
-J



Re: [R] library(): load library from a specified location

2010-03-30 Thread Jannis

Sorry folks!

My way worked already! I was just too blind to realize.
Treat this post as solved. Anybody trying to achieve the same as me is
advised to try the way I described in my earlier post!

And thanks a lot for the advice I already received.

Cheers
Jannis



Re: [R] How to recode variables using base R

2010-03-30 Thread John Fox
Dear Johannes,

You can use cascading ifelse()s:

> df$A <- with(df, ifelse(A <= 3, "x", ifelse(A > 3 & A <= 8, "y", "z")))
> df$A
[1] "x" "x" "x" "y" "y"

This command assumes that you want all values that don't map into "x"s and
"y"s to be "z"s, but you could adapt it if that's not what you want (and no
values in your example become "z"s anyway).
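
Since the question asks for new variables rather than overwriting A, a small variation on the same idea might look like the sketch below. The column name A.rec is made up for illustration, and the second comparison can drop the A > 3 test because ifelse() only reaches it for values that failed the first one.

```r
df <- data.frame(A = 1:5,
                 B = c(3, 6, 2, 8, 10),
                 C = c(0, 15, 5, 9, 12))

# Store the recoded values in a new column (hypothetical name) and keep A numeric
df$A.rec <- with(df, ifelse(A <= 3, "x", ifelse(A <= 8, "y", "z")))
df
```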

I hope this helps,
 John


John Fox
Senator William McMaster 
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox




Re: [R] How to recode variables using base R

2010-03-30 Thread Henrique Dallazuanna
You could try this also:

cut(df$A, c(-Inf, 3, 8), labels = c('x', 'y'))

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



[R] Paik, et al., NEJM, 2004, Fig. 4, rate of event at 10 years as a function of covariate

2010-03-30 Thread Wittner, Ben, Ph.D.
Does anyone know how to make a plot like Fig. 4 of Paik, et al., New England
Journal of Medicine, Dec. 30, 2004?

Given survival data and a covariate, they plot a curve giving "Rate of Distant
Recurrence at 10 Yr (% of patients)" on the y-axis versus the covariate on the
x-axis. They also plot curves giving a 95% confidence interval.

Thanks very much.

-Ben






[R] Finding positions in array

2010-03-30 Thread Romildo Martins
Hello,

I need a function that checks which positions of the array are greater than y
and returns those positions in another array z.

> x <- array(E(gaux)$weight)
> x
[1]  3  8 10  6

If y = 7

z
[1] 2 3


Thanks a lot!

Romild



Re: [R] error bars

2010-03-30 Thread William Revelle

Iasonas,

In response to PS.1
try error.bars in the psych package.
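
If installing a package is not an option, error bars can also be drawn with base graphics alone. The sketch below is only an illustration on simulated scores (the group means, sample sizes and seed are invented): it computes a 95% confidence interval per group with t.test() and draws the bars with arrows().

```r
set.seed(42)
grp <- list(boys  = rnorm(30, mean = 50, sd = 10),
            girls = rnorm(30, mean = 56, sd = 10))

m  <- sapply(grp, mean)
# 2 x 2 matrix: row 1 = lower limits, row 2 = upper limits, one column per group
ci <- sapply(grp, function(v) t.test(v)$conf.int)

plot(1:2, m, xlim = c(0.5, 2.5), ylim = range(ci), xaxt = "n", pch = 19,
     xlab = "", ylab = "Mean with 95% CI")
axis(1, at = 1:2, labels = names(m))
# Flat-capped arrows in both directions serve as the error bars
arrows(1:2, ci[1, ], 1:2, ci[2, ], angle = 90, code = 3, length = 0.1)

# The two-sample test to set beside the picture
t.test(grp$boys, grp$girls)$p.value
```

Comparing the plotted intervals with the p-value on your own data is a quick way to see how loosely "overlap" and "significance" are related.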

Bill



At 1:04 AM -0700 3/30/10, Iasonas Lamprianou wrote:

Dear friends,
I have a statistical question. Sometimes, if I compare boys to girls 
on a specific variable, the error bars (confidence interval of 
means) seem to overlap slightly. Still, when I run a t-test, I find 
statistically significant differences. The rule is clear: if the
confidence intervals do not overlap, then there is a statistically
significant difference. But if they overlap slightly, we have to use
a t-test to know for sure if the two means differ significantly.
The point is: is there a rule of thumb to say, for example, if the 
overlap is less than 20% of the length of the standard error, then a 
t-test would give significant results?


thank you for your time

P.S.1 is there an easy way to plot error bars in R?
P.S.2 an interesting discussion about this - highly recommended to 
read it - can be found at 
http://scienceblogs.com/cognitivedaily/2007/03/ill_bet_you_dont_understand_er.php


jason

Dr. Iasonas Lamprianou


Assistant Professor (Educational Research and Evaluation)
Department of Education Sciences
European University-Cyprus
P.O. Box 22006
1516 Nicosia
Cyprus
Tel.: +357-22-713178
Fax: +357-22-590539


Honorary Research Fellow
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044  161 275 3485
iasonas.lampria...@manchester.ac.uk









--
William Revelle http://revelle.net/revelle.html
2815 Lakeside Court http://revelle.net/lakeside
Evanston, Illinois
It is 6 minutes to midnight http://www.thebulletin.org



Re: [R] error bars

2010-03-30 Thread Iasonas Lamprianou
thank you

Dr. Iasonas Lamprianou


Assistant Professor (Educational Research and Evaluation)
Department of Education Sciences
European University-Cyprus
P.O. Box 22006
1516 Nicosia
Cyprus 
Tel.: +357-22-713178
Fax: +357-22-590539


Honorary Research Fellow
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044  161 275 3485
iasonas.lampria...@manchester.ac.uk




Re: [R] Finding positions in array

2010-03-30 Thread Henrique Dallazuanna
Try:

which(x > y)
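
Applied to the numbers from the question (typed in by hand here, since the graph object gaux is not available), that would be:

```r
x <- c(3, 8, 10, 6)   # the weights shown in the question
y <- 7

# which() returns the indices where the condition is TRUE
z <- which(x > y)
z   # 2 3
```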



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] How to recode variables using base R

2010-03-30 Thread johannes rara
Thanks John and Henrique. My intention is to do this for A, B and C
(all at once), so will I have to wrap your solution in lapply or a for
loop?

-J


[R] simple loop iteration

2010-03-30 Thread Niklaus Hurlimann
Hi R mailing list,

probably a very basic problem here, I try to do the following:

> Q <- c(1,2,3)
> P <- c(4,5,6)
> A <- data.frame(Q,P)
> A
  Q P
1 1 4
2 2 5
3 3 6

This is my simplified data.frame (matrix). Now I am trying to write the
following loop to subtract consecutive elements within the data.frame:

> for(i in length(A[,"P"])-1){
  delta[i] <- A[i,"P"]-A[i+1,"P"]
}

All I get is a vector of the correct length but with no readings.
Thanks for any help on this.



-- 
Niklaus Hürlimann

Université de Lausanne  
Institut de Minéralogie et Géochimie 
L'Anthropole 
CH-1015 Lausanne 
Suisse

E-mail: niklaus.hurlim...@unil.ch
Tel:+41(0)21 692 4452 



Re: [R] Paik, et al., NEJM, 2004, Fig. 4, rate of event at 10 years as a function of covariate

2010-03-30 Thread Frank E Harrell Jr

Such a plot is easy to do with the rms package if using a Cox or 
accelerated failure time model, e.g.


require(rms)
dd <- datadist(mydata); options(datadist='dd')
# restricted cubic spline with 4 knots
f <- cph(Surv(rtime, event) ~ rcs(covariate,4) + sex + ..., x=TRUE, y=TRUE)
# separate curves for male and female; omit sex to make one curve;
# add age=50 to predict for a 50 year old
plot(Predict(f, covariate, sex, time=10))




--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University



Re: [R] How to recode variables using base R

2010-03-30 Thread Henrique Dallazuanna
Using lapply:

as.data.frame(lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels =
c('x', 'y', 'z')))
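
To keep the original columns and add the recoded ones next to them, the same call can be combined with cbind(). This is only a sketch: the ".rec" suffix is an arbitrary naming choice, and the breaks are the ones used above, so anything above 16 would become NA.

```r
df <- data.frame(A = 1:5,
                 B = c(3, 6, 2, 8, 10),
                 C = c(0, 15, 5, 9, 12))

# Recode every column with the same breaks; intervals are (-Inf,3], (3,8], (8,16]
rec <- lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels = c("x", "y", "z"))
names(rec) <- paste(names(df), "rec", sep = ".")   # hypothetical suffix

# Original numeric columns followed by the new factor columns
df2 <- cbind(df, as.data.frame(rec))
str(df2)
```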


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] simple loop iteration

2010-03-30 Thread Henrique Dallazuanna
Try this:

Reduce("-", as.data.frame(embed(A$P, 2)))

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Competing with SPSS and SAS: improving code that loops through rows (data manipulation)

2010-03-30 Thread Dimitri Liakhovitski
Below is the script that was based on your help - it works very fast:

### Creating the example data set:
set.seed(123)
MyData <- data.frame(group = c(rep("first", 10), rep("second", 10)),
                     a = abs(round(rnorm(20, mean = 0, sd = .55), 2)),
                     b = abs(round(rnorm(20, mean = 0, sd = .55), 2)))
MyData

### Specifying parameters used in the code below:
vars <- names(MyData)[2:3]      # names of variables to be transformed
nr.vars <- length(vars)         # number of variables to be transformed
group.var <- names(MyData)[1]   # name of the grouping variable

### For EACH subgroup: indexing variables a and b to their maximum in
### that subgroup;
### These indexed variables will be used to build the new ones:
system.time({
temp <- cbind(MyData, do.call(cbind, lapply(vars, function(x){
  unlist(by(MyData, MyData[[group.var]], function(y) y[,x] / max(y[,x])))
})))
colnames(temp)[(length(MyData)+1):(length(MyData)+nr.vars)] <-
  paste(vars, 'IndToMax', sep = '.')
})

# Grabbing names of the newly created variables that end with "IndToMax"
# (variables indexed to subgroup max)
indexed.vars <- names(temp)[grep("IndToMax$", names(temp))]

# Specifying parameters used for transformation below:
old.length <- length(temp)
hl <- c(.3,.6,1:5)
hrf <- seq(.15,.90,.15)

### Actual Transformation:
# Reduce() is part of base R, so no extra package needs to be loaded
system.time({
constants <- expand.grid(vars = indexed.vars, HL = hl, HRF = hrf)
results <- lapply(seq(nrow(constants)), function(x){
  dat <- temp[, as.character(constants[x, 1])]
  D <- exp(log(0.5) / constants[x, 2])
  L <- -10 * log(1 - constants[x, 3])
  unlist(by(dat, temp[[group.var]], function(y)  # function Reduce
    Reduce(function(u, v) 1 - ((1 - u * D) / (exp(v * L))), y,
           accumulate = TRUE, init = 0)[-1]))
  })
final <- cbind(temp, do.call(cbind, results))
colnames(final)[-(1:old.length)] <- paste(vars, constants$HL,
  100*constants$HRF, '.transformed', sep = '.')
})
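
The core of the speed-up is replacing the row-by-row loop with Reduce(..., accumulate = TRUE). A stripped-down sketch of that idea on a plain numeric vector is below; the input values and the two constants (a half-life of 2 and a rate of 0.3) are made up for illustration, and the step function simply copies the formula used above.

```r
x <- c(0.2, 0.5, 0.1, 0.9)        # illustrative input for one subgroup
D <- exp(log(0.5) / 2)            # assumed constant, analogous to HL = 2
L <- -10 * log(1 - 0.3)           # assumed constant, analogous to HRF = 0.3

# One step of the recursion: u is the previous result, v the current input
step <- function(u, v) 1 - (1 - u * D) / exp(v * L)

# Explicit loop over rows, for reference
out_loop <- numeric(length(x))
prev <- 0
for (i in seq_along(x)) {
  prev <- step(prev, x[i])
  out_loop[i] <- prev
}

# Same recursion with Reduce(); [-1] drops the initial value 0
out_reduce <- Reduce(step, x, accumulate = TRUE, init = 0)[-1]

all.equal(out_loop, out_reduce)
```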


Thanks again for all your help!
Dimitri




On Mon, Mar 29, 2010 at 4:16 PM, Dimitri Liakhovitski ld7...@gmail.com wrote:
 Would like to thank everyone once more for your great help.
 I was able to reduce the time from god knows how many hours to about 2
 minutes!
 Really appreciate it!
 Dimitri

 On Sat, Mar 27, 2010 at 11:43 AM, Martin Morgan mtmor...@fhcrc.org wrote:
 On 03/26/2010 06:40 PM, Dimitri Liakhovitski wrote:
 My sincere apologies if it looked large. Let me try again with less code.
 It's hard to do less than that. In fact - there is nothing in this
 code but 1 formula and many loops, which is the problem I am not sure
 how to solve.
 I also tried to be as clear as possible with the comments.
 Dimitri

 ## START OF THE CODE TO PRODUCE SMALL DATA EXAMPLE
 set.seed(123)
 data <- data.frame(group = c(rep("first", 10), rep("second", 10)),
                    a = abs(round(rnorm(20, mean = 0, sd = .55), 2)),
                    b = abs(round(rnorm(20, mean = 0, sd = .55), 2)))
 data                   # "data" is the data frame to work with
 ## END OF THE CODE TO PRODUCE SMALL DATA EXAMPLE. In real life "data"
 ## would contain up to 150-200 rows PER SUBGROUP

 ### Specifying useful parameters used in the slow code below:
 vars <- names(data)[2:3]                 # names of variables used in
                                          # transformation; in real life - up to 50-60 variables
 group.var <- names(data)[1]              # name of the grouping variable
 subgroups <- levels(data[[group.var]])   # names of subgroups; in real
                                          # life - up to 30 subgroups

 # OBJECTIVE:
 # Need to create new variables based on the old ones (a & b)
 # For each new variable, the value in a given row is a function of (a)
 # 2 constants (that have several levels each),
 # (b) value of the original variable (e.g., a.ind.to.max), and the
 # value in the previous row on the same new variable
 # Plus - it has to be done by subgroup (variable "group")

 # Defining 2 constants:
 constant1 <- c(1:3)              # constant 1 used in transformation -
                                  # has 3 levels, in real life - up to 7 levels
 constant2 <- seq(.15,.45,.15)    # constant 2 used in transformation - has
                                  # 3 levels, in real life - up to 7 levels

 ### CODE THAT IS SLOW. Reason - too many loops with the inner-most
 ### loop being very slow - as it is looping through rows:

 for(var in vars){                          # looping through variables
   for(c1 in 1:length(constant1)){          # looping through values of constant1
     for(c2 in 1:length(constant2)){        # looping through values of constant2
       d = log(0.5)/constant1[c1]
       l = -log(1-constant2[c2])
       name <- paste(var,constant1[c1],constant2[c2]*100,".transf",sep=".")
       data[[name]] <- NA
       for(subgroup in subgroups){          # looping through subgroups
         data[data[[group.var]] %in% subgroup, name][1] =
           1-((1-0*exp(1)^d)/(exp(1)^(data[data[[group.var]] %in% subgroup,
           var][1]*l*10)))

         ### THIS SECTION IS THE SLOWEST - BECAUSE I AM LOOPING THROUGH ROWS:
         for(case in 2:nrow(data[data[[group.var]] %in% subgroup,
           ])){ # looping through rows
           data[data[[group.var]] %in% subgroup, name][case]=


Re: [R] Reshaping a data frame with a series of factors and 23 repeated measures

2010-03-30 Thread wclapham

Ista,

I have looked at the reshape package and have used 'melt' successfully on
simpler tables.  I tried it here, but have not been successful.  I think I
just need to gain experience.  I am loving R and am having a difficult time
with data structure issues.

I am attaching the data set that I am trying to manipulate.  Ultimately, I
would like to be able to analyze these data with ANOVA and repeated
measures, and also be able to plot  growth,  wt * days.  I appreciate any
help or guidance on references to read that will help me solve my problems.

The data set represents wt over time (starting with days=0, birth day) for
steers.  Factors include Stockering and Finishing treatments.

Regards,

Bill


On 3/29/10 3:39 PM, Ista Zahn wrote:

 Hi Bill, 
 Without an example dataset it's hard to see exactly what you need to
 do. But you can get started by looking at the documentation for the
 reshape function (?reshape), and by looking at the reshape package.
 The reshape package has an associated web page
 (http://had.co.nz/reshape/) with links to papers and other information
 to help you get started.
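
 As a rough illustration of what the base reshape() call can look like for this kind of layout, here is a sketch on a tiny invented table with two repeated measurements instead of 23. The column names Days.1, Wt.1 and so on, the values, and the "visit" index are all assumptions made for the example, and the Date columns are left out to keep it short; the real file will need its own column names in `varying`.

```r
# Invented wide table: one row per steer, repeated Days/Wt pairs side by side
wide <- data.frame(
  Steer.ID    = factor(c("s1", "s2")),
  stocker.trt = factor(c("A", "B")),
  Finish.trt  = factor(c("X", "Y")),
  Days.1 = c(0, 0),   Wt.1 = c(40, 42),
  Days.2 = c(30, 31), Wt.2 = c(70, 75)
)

# One list element per long-format variable, each naming its wide columns in order
long <- reshape(wide, direction = "long",
                varying = list(c("Days.1", "Days.2"), c("Wt.1", "Wt.2")),
                v.names = c("Days", "Wt"),
                timevar = "visit", times = 1:2,
                idvar   = "Steer.ID")

# Sort so that each steer's measurements are together
long <- long[order(long$Steer.ID, long$visit), ]
long
```

 The factor columns are repeated automatically for every measurement, which is the long format that ANOVA-style modelling and growth plots usually expect.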
 
 Best, 
 Ista 
 
 On Mon, Mar 29, 2010 at 3:15 PM, wclapham wrote:
 
  
  I have a data frame that I created using read.table on a csv spreadsheet.
  The data look like the following:
  
  Steer.ID   stocker.trt   Finish.trt  Date   Days Wt ..
  
  Steer.Id, stocker.trt, Finish.trt are factors-- Date, Days, Wt are data
  that are repeated 23 times (wide format).
  
  I want to reshape the data such that I have the correct Steer.ID,
  stocker.trt, Finish.trt identifying all of the repeated measures data in a
  long  format. 
  
  I am a newbie at R and need to develop the skill in reshaping data, so that
  I can handle routine problems like described above.
  
  Thanks so much in advance for help or advice.
  
  Bill 
 
 


 


[R] Easy to use R interface into Macroeconomics data at the Fed, the IMF and Eurostat

2010-03-30 Thread Tolga I Uzuner
Dear R Users,

Does anyone know if there is an easy to use interface for the macroeconomics 
databases of the Fed, Eurostat and the IMF ?

Thanks in advance,
Tolga Uzuner




Re: [R] simple loop iteration

2010-03-30 Thread Peter Ehlers

Perhaps you're just looking for the diff() function?
See ?diff.
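
As a minimal sketch of that suggestion, using the data frame from the question: diff() returns x[i+1] - x[i], so negating it gives the A[i,"P"] - A[i+1,"P"] differences the loop was apparently meant to compute. The explicit loop is shown only for comparison, and the names delta and delta_loop are illustrative.

```r
Q <- c(1, 2, 3)
P <- c(4, 5, 6)
A <- data.frame(Q, P)

# diff() computes x[i + 1] - x[i]; negate to get x[i] - x[i + 1]
delta <- -diff(A$P)

# equivalent explicit loop: preallocate, then iterate over 1..(n - 1)
n <- nrow(A)
delta_loop <- numeric(n - 1)
for (i in seq_len(n - 1)) {
  delta_loop[i] <- A[i, "P"] - A[i + 1, "P"]
}
```

Note that the loop runs over seq_len(n - 1), not over the single value length(...) - 1, which may be why the original loop filled in only one element.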

 -Peter Ehlers

On 2010-03-30 7:15, Niklaus Hurlimann wrote:

Hi R mailing list,

probably a very basic problem here, I try to do the following:


Q <- c(1,2,3)
P <- c(4,5,6)
A <- data.frame(Q,P)
A

   Q P
1 1 4
2 2 5
3 3 6

this is my simplified data.frame (matrix) now I try to create following
loop for subtraction of element within the data.frame:


for(i in length(A[,"P"]-1){

   delta[i] <- A[i,"P"]-A[i+1,"P"]
}

All I get is a vector of  the correct length but with no readings.

Thanks for any help on this.








--
Peter Ehlers
University of Calgary



Re: [R] Reshaping a data frame with a series of factors and 23 repeated measures

2010-03-30 Thread Gabor Grothendieck
Try this:

cattle <- read.csv("http://n4.nabble.com/attachment/1745223/0/CattleGrowth.csv")

long <- reshape(cattle, dir = "long", idvar = "Steer.ID",
  varying = list(grep("Date", names(cattle)), grep("Days",
names(cattle)), grep("Wt", names(cattle))))
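
For readers without access to the attachment, here is a small self-contained sketch of the same reshape() call. The data frame wide and its values are made up for illustration (two steers, two measurement occasions); only the column naming pattern is taken from the thread. The v.names argument is added here so the long columns get clean names.

```r
# made-up wide data: one row per steer, Date/Days/Wt repeated per occasion
wide <- data.frame(
  Steer.ID = c("A1", "B2"),
  Date     = c("2009-01-01", "2009-01-03"),
  Days     = c(0, 0),
  Wt       = c(80, 85),
  Date.1   = c("2009-02-01", "2009-02-03"),
  Days.1   = c(31, 31),
  Wt.1     = c(120, 130),
  stringsAsFactors = FALSE
)

# each list element names the wide columns that collapse into one long column
long <- reshape(wide, direction = "long", idvar = "Steer.ID",
  varying = list(grep("Date", names(wide)),
                 grep("Days", names(wide)),
                 grep("Wt",   names(wide))),
  v.names = c("Date", "Days", "Wt"))

long
```

The result has one row per steer per occasion, with a time column indexing the occasion.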



On Tue, Mar 30, 2010 at 9:34 AM, wclapham william.clap...@ars.usda.gov wrote:

 Ista,

 I have looked at the reshape package and have used 'melt' successfully on
 simpler tables.  I tried it here, but have not been successful.  I think I
 just need to gain experience.  I am loving R and am having a difficult time
 with data structure issues.

 I am attaching the data set that I am trying to manipulate.  Ultimately, I
 would like to be able to analyze these data with ANOVA and repeated
 measures, and also be able to plot  growth,  wt * days.  I appreciate any
 help or guidance on references to read that will help me solve my problems.

 The data set represents wt over time (starting with days=0, birth day) for
 steers.  Factors include Stockering and Finishing treatments.

 Regards,

 Bill


 On 3/29/10 3:39 PM, Ista Zahn [via R]
 ml-node+1695531-1043721504-210...@n4.nabble.com wrote:

 Hi Bill,
 Without an example dataset it's hard to see exactly what you need to
 do. But you can get started by looking at the documentation for the
 reshape function (?reshape), and by looking at the reshape package.
 The reshape package has an associated web page
 (http://had.co.nz/reshape/) with links to papers and other information
 to help you get started.

 Best,
 Ista



[R] Multivariate hypergeometric distribution version of phyper()

2010-03-30 Thread Karl Brand

Dear R Users,

I employed the phyper() function to estimate the likelihood that the 
number of genes overlapping between 2 different lists of genes is due to 
chance. This appears to work appropriately.


Now i want to try this with 3 lists of genes which phyper() does not 
appear to support.


Some googling suggests i can utilize the Multivariate hypergeometric 
distribution to achieve this. eg.:


http://en.wikipedia.org/wiki/Hypergeometric_distribution

But when i try to do this manually using the choose() function (see 
attempt below example with just two gene lists) i'm unable to perform 
the calculations- the numbers hit infinity before getting an answer.


Searching CRAN archives for "Multivariate hypergeometric" shows this term 
in the vignettes of the packages ‘combinat’ and ‘forward’. But i'm unable 
to make sense of these package functions in the context of my 
aforementioned application.


Can someone suggest a function, script or method to achieve my goal of 
estimating the likelihood of overlap between 3 lists of genes, ideally 
using the multivariate hypergeometric, or anything else for that matter?


cheers in advance,

Karl



#example attempt with two gene lists m & n
N <- 45101 # total number balls in urn
m <- 720   # number of 'white' or 'special' balls in urn, aka 'success'
n <- 801   # number balls drawn or number of samples
k <- 40    # number of 'white' or 'special' balls DRAWN

a <- choose(m,k)
b <- choose((N-m),(n-k))
z <- choose(N,n)
prK <- (a*b)/z #'the answer'
print(prK)
[1] NaN

> a
[1] 7.985852e+65
> b
[1] Inf
> z
[1] Inf
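
The overflow above comes from forming the binomial coefficients before dividing. A minimal sketch of one way around it is to work on the log scale with lchoose() and exponentiate only at the end; for the two-list case this should agree with the built-in dhyper(). The helper dmvhyper_log() below is hypothetical (not from any package): it applies the standard multivariate hypergeometric formula on the log scale, and whether that distribution is the right model for three overlapping gene lists is a separate question this sketch does not settle.

```r
N <- 45101; m <- 720; n <- 801; k <- 40

# log of choose(m, k) * choose(N - m, n - k) / choose(N, n)
log_prK <- lchoose(m, k) + lchoose(N - m, n - k) - lchoose(N, n)
prK <- exp(log_prK)

# the same point probability from the built-in hypergeometric density
prK_builtin <- dhyper(k, m, N - m, n)

# hypothetical helper: log-pmf of the multivariate hypergeometric,
# K = balls of each colour in the urn, x = balls of each colour drawn
dmvhyper_log <- function(x, K) {
  stopifnot(length(x) == length(K), all(x >= 0), all(x <= K))
  sum(lchoose(K, x)) - lchoose(sum(K), sum(x))
}

# with two colours it reduces to the ordinary hypergeometric
two_colour <- exp(dmvhyper_log(c(k, n - k), c(m, N - m)))
```

For a tail probability (an overlap of k or more) rather than a point probability, phyper(k - 1, m, N - m, n, lower.tail = FALSE) avoids the manual sum.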


--
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268



[R] Calling R from Perl

2010-03-30 Thread Ayush Raman
Hi all,

I would like to know how to call R from Perl. I would
like to read the file in Perl, store it in a data structure, and then
pass the data structure to R so that I can do the mathematical operations
easily.

Thanks.

-- 
Regards,
Ayush Raman



[R] Substitute of a For Loop

2010-03-30 Thread Ayush Raman
Hi,

I am trying to permute a vector 1000 times, for which I am using a for
loop. Within the for loop, I am doing some matrix operations, which take
a lot of time. I am looking for a way to permute the vector 1000 times and
do the matrix operations without using a for loop. This is a snippet of my
code:

for (i in 2:1000){
    y.permute = permute(y.permute)  ### permute the vector
    ## call the function which does some matrix calculation and
    ## calculates a pseudo statistic
    F.stats = calPseudoStat(y.permute, table.Gij)
    ## add the pseudo statistic to a vector
    F.stats.vec = append(F.stats.vec, F.stats)
}
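
One way the loop above could be written without explicit bookkeeping is replicate(), sketched below under stated assumptions: pseudo_stat() is a made-up stand-in for calPseudoStat() (here just an absolute difference of group means), y and g are toy data, and base sample() stands in for permute().

```r
set.seed(1)

# toy data: a response vector and a grouping vector (assumptions for illustration)
y <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3)
g <- rep(c("a", "b"), each = 3)

# hypothetical stand-in for calPseudoStat(): absolute difference of group means
pseudo_stat <- function(y, g) {
  m <- tapply(y, g, mean)
  unname(abs(m[1] - m[2]))
}

# 999 permutations; replicate() collects the results into one numeric vector,
# so nothing is grown with append() inside a loop
F.stats.vec <- replicate(999, pseudo_stat(sample(y), g))
```

This mainly removes the repeated append(); if most of the time is spent inside the statistic itself, replicate() alone is unlikely to make the code much faster.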

Thanks.

-- 
Regards,
Ayush Raman



[R] number of clusters for k-means

2010-03-30 Thread Sean M. Lucey
I am currently working on a clustering project and would like to obtain 
statistics for the number of clusters to include.  In SAS you get a 
pseudo-F statistic and a cubic clustering criterion.  Has anyone 
developed a function to get these values?
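
As a rough sketch only: the pseudo-F statistic (often called the Calinski-Harabasz index) can be computed from the sums of squares that kmeans() returns, as (between SS / (k - 1)) / (within SS / (n - k)). The function name pseudo_F and the simulated data are assumptions for illustration; the cubic clustering criterion is not covered here.

```r
set.seed(42)

# simulated data: two well-separated groups of 50 points in 2 dimensions
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# hypothetical helper: pseudo-F for a k-means solution with k clusters
pseudo_F <- function(x, k) {
  fit <- kmeans(x, centers = k, nstart = 10)
  n <- nrow(x)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# compare candidate numbers of clusters; larger values suggest better separation
ch <- sapply(2:5, function(k) pseudo_F(x, k))
names(ch) <- 2:5
ch
```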


Sean



Re: [R] use logical in cor.test

2010-03-30 Thread Peter Ehlers

On 2010-03-30 2:41, pgseye wrote:


Thanks for the replies.

In response to Erik:
What does

Both[,1]

show you?


Both[,1]

[1] 3.36   NA   NA   NA   NA   NA   NA 3.92 3.50   NA   NA   NA   NA 3.76
3.19 3.83   NA 3.66..

What does

Both[,1] > 2.5

show you?


Both[,1] > 2.5

[1]  TRUE    NA    NA    NA    NA    NA    NA  TRUE  TRUE    NA    NA
   NA    NA  TRUE  TRUE


I understand a logical variable is binary, but don't know how to select a
subset of the data (have tried the subset function, but can't seem to get it
to work)

Bill, when I run what you suggested, I get:


tBoth <- Both
is.na(tBoth[tBoth < 2.5]) <- TRUE

Error in is.na(tBoth[tBoth < 2.5]) <- TRUE :
   NAs are not allowed in subscripted assignments

R <- cor(tBoth, use = "complete.obs")
R[1,2]

[1] 0.7750889

Any idea with the error message?


This happens because your 'Both' already has missing values.
You can replace the line

 is.na(tBoth[tBoth < 2.5]) <- TRUE

with

 tBoth[tBoth < 2.5] <- NA

and the rest should work.
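
A small self-contained sketch of that replacement, with made-up numbers standing in for 'Both' (a two-column matrix that already contains NAs). A logical index containing NA is accepted here because the replacement value has length one.

```r
# made-up stand-in for 'Both': two columns, some values already missing
Both <- cbind(a = c(3.36, NA, 3.92, 3.50, 2.10, 3.76, 3.19),
              b = c(3.10, 2.90, NA, 3.45, 3.00, 3.80, 2.40))

tBoth <- Both
# values below 2.5 become NA; existing NAs stay NA
tBoth[tBoth < 2.5] <- NA

# correlation over the rows that are complete in both columns
R <- cor(tBoth, use = "complete.obs")
R[1, 2]
```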



Thanks again,

Paul


--
Peter Ehlers
University of Calgary



Re: [R] BaselR

2010-03-30 Thread Charles Roosen
Dear Swiss R Users,

The Basel R meeting has moved to Wed, Apr 28 based on user feedback.  We
now have a lineup of speakers:

* Andreas Krause, Actelion Pharmaceuticals Ltd., on Graphics of
Clinical Data

* Yann Abraham, Novartis Pharma AG, on Graphics with ggplot2

* Charles Roosen, Mango Solutions AG, on Web-based R Reporting

I'm pretty excited about the first two presentations myself.  I'm also
looking forward to seeing old friends and meeting new ones.

Details are on the new Basel R web site at:

http://www.baselr.org/

Warm regards,
Charlie Roosen
croo...@mango-solutions.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Sarah Lewis
Sent: 26 March 2010 17:20
To: r-help@r-project.org
Subject: [R] BaselR

BaselR - The new R meeting

We are pleased to announce the new R meeting to be held in Basel,
Switzerland.

BaselR will be held from 6:30-9:30pm on Tues, Apr 27 at TransBARent:

http://transbarent.business.sv-group.ch

Doors open at 6:30pm, with the presentations starting at 7:00pm.

Introduction: What is Basel R?

Andreas Krause:... Graphing Pharma Data

Yann Abraham: Graphics

Charles Roosen: Web based R reporting

(This agenda is yet to be finalised. We will notify any changes)

For further information or to register, please contact:
bas...@mango-solutions.com



Please also visit  - www.mango-solutions.com and www.londonr.org

 

Sarah Lewis



mangosolutions

T: +44 (0)1249 767700
F: +44 (0)1249 767707


Unit 2 Greenways Business Park
Bellinger Close
Chippenham
Wilts
SN15 1BN
UK 

 



Re: [R] Calling R from Perl

2010-03-30 Thread Jonathan Baron
One way to do this is with Rscript.  If you want Perl because it can
handle cgi, then CGIwithR is also useful.  You can call Rscript from
Perl, but you can also do the reverse (with system()) and have your
Rscript be the main routine.
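
A minimal sketch of the Rscript route, assuming the Perl side writes its numbers to a plain text file and then runs something like system("Rscript", "summarize.R", "data.txt"). The script name summarize.R, the file name, and the function summarize_file() are all hypothetical.

```r
# summarize.R (hypothetical script name)
# Reads whitespace-separated numbers from a file and returns basic statistics.
summarize_file <- function(path) {
  x <- scan(path, quiet = TRUE)
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# When run as `Rscript summarize.R data.txt`, the file name arrives as the
# first trailing command-line argument; the result is printed to stdout,
# where the calling Perl program can capture it.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1 && file.exists(args[1])) {
  print(summarize_file(args[1]))
}
```

Passing data through a file keeps the two programs loosely coupled; for richer data structures a delimited file read with read.table() would follow the same pattern.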

Jon

On 03/30/10 10:19, Ayush Raman wrote:
 Hi all,
 
 I am interested to know that how it is possible to call R from Perl. I would
 like to read the file in Perl, store it in a data structure and would like
 to pass the data structure to R so that I can do the mathematical operations
 easily.
 
 Thanks.
 
 -- 
 Regards,
 Ayush Raman

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron



[R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Corrado

Dear friends,

I am testing glm as at page 514/515 of THE R BOOK by M.Crawley, that is 
on proportion data.


I use glm(y~x1+,family=binomial)

y is a proportion in (0,1), and x is a real number.

I get the error:

In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

But that is exactly what was suggested in the book, where there is no 
mention of a similar warning. Where am I going wrong?


Here is the output:

> glm(response.prepared~x,data=,family=binomial)

Call:  glm(formula = response.prepared ~ x, family = binomial, data = )

Coefficients:
(Intercept)            x
    -0.3603       0.4480

Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
Null Deviance:      24420
Residual Deviance:  23240    AIC: 700700
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!




Regards
--

Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Error singular gradient matrix at initial parameter

2010-03-30 Thread Corrado

Hi Gabor,

same problem even using nls2 with method="brute-force" to calculate the 
initial parameters.


Best,

Gabor Grothendieck wrote:

You could try method="brute-force" in the nls2 package to find starting values.

On Tue, Mar 30, 2010 at 7:03 AM, Corrado ct...@york.ac.uk wrote:
  

I am using nls to fit a non linear function to some data.

The non linear function is:

y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

I have chosen algorithm "port", with a lower bound of 0 for all of the ki
parameters, and I have tried many start values for the parameters ki
(including generating them at random).

If I fit the non linear function to the same data using an external
algorithm, it fits perfectly and finds the parameters.

As soon as I come to my R installation (2.10.1 on Kubuntu Linux 9.10 64 bit),
I keep getting the error:

Error in nlsModel(formula, mf, start, wts, upper) :   singular gradient
matrix at initial parameter estimates

I have read all the previous postings and the documentation, but to no
avail: the error is there to stay. I am sure the problem is with nls,
because the external fitting algorithm perfectly fits it in less than a
second. Also, if my n is 4, then the nls works perfectly (but that excludes
all the k5 ... kn).

Can anyone help me with suggestions? Thanks in advance.

Alternatively, what do you suggest I should do? Shall I abandon nls in
favour of optim?
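
On the optim question, a minimal sketch of what that route could look like for a two-predictor version of the model, minimising the residual sum of squares with box constraints (lower bound 0, as in the "port" setup described above). The data are simulated and the names sse, k_true, p1 and p2 are assumptions for illustration; this does not diagnose why nls reports a singular gradient on the real data.

```r
set.seed(1)

# simulated data from y = 1 - exp(-(k0 + k1*p1 + k2*p2)) plus small noise
n  <- 200
p1 <- runif(n)
p2 <- runif(n)
k_true <- c(0.2, 1.0, 0.5)
y <- 1 - exp(-(k_true[1] + k_true[2] * p1 + k_true[3] * p2)) + rnorm(n, sd = 0.01)

# residual sum of squares as a function of the parameter vector k = (k0, k1, k2)
sse <- function(k) {
  fitted <- 1 - exp(-(k[1] + k[2] * p1 + k[3] * p2))
  sum((y - fitted)^2)
}

# box-constrained minimisation: all parameters kept at or above 0
fit <- optim(c(0.5, 0.5, 0.5), sse, method = "L-BFGS-B", lower = rep(0, 3))
fit$par
```

Unlike nls, optim does not need the gradient matrix to be of full rank at the start values, but it also gives no warning when the predictors are nearly collinear, so the estimates deserve a sanity check.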

Regards

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk






--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



[R] Value-at-Risk Portfolio(both equity and option)

2010-03-30 Thread zhang

Hello All,

I am working on risk measures for a portfolio that contains equity
futures, equity options and currency options. There are many packages
for portfolios that contain only equities; I wonder whether
there is any available package that can also handle options.

Thank you.
-- 
View this message in context: 
http://n4.nabble.com/Value-at-Risk-Portfolio-both-equity-and-option-tp1745179p1745179.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Calling R functions into C# or C++

2010-03-30 Thread Fayssal El Moufatich

The zip file actually works fine for me. Anyhow, here is the code snippet
that you need:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Interop.STATCONNECTORSRVLib;

namespace RFromCsharp
{
  class RConnector
  {

    private StatConnectorClass rdcom = null;
    private string rcmd;

    public StatConnectorClass RConnection
    {
      get
      {
        return rdcom;
      }
      set
      {
        rdcom = value;
      }
    }

    private bool initR()
    {
      try
      {
        rdcom = new StatConnectorClass();
        rdcom.Init("R");
        return true;
      }
      catch(Exception e)
      {
        string errmsg = "R Init failed: " + rdcom.GetErrorText() + " Other: "
          + e.Message.ToString();
        Console.WriteLine(errmsg);
        return false;
      }
    }

    private bool loadR(string table, string filename, bool stripwhite, bool
      header, string separator)
    {
      try
      {
        // Build an R command of the form: table<-read.delim('file',...)
        rcmd = table.ToString() + "<-read.delim('" + filename.ToString() +
          "',strip.white=" + stripwhite.ToString().ToUpper() + ",header=" +
          header.ToString().ToUpper() + ",sep='" + separator.ToString() + "')";
        rdcom.EvaluateNoReturn(rcmd);
        return true;
      }
      catch(Exception e)
      {
        string errmsg = rcmd.ToString() + " " + rdcom.GetErrorText() +
          " Other: " + e.Message.ToString();
        Console.WriteLine(errmsg);
        return false;
      }
    }

    private bool closeR()
    {
      try
      {
        rcmd = "graphics.off()";
        rdcom.EvaluateNoReturn(rcmd);
        rcmd = "rm(list=ls(all=TRUE))";
        rdcom.EvaluateNoReturn(rcmd);
        rdcom.Close();
        return true;
      }
      catch(Exception e)
      {
        string errmsg = "R Close failed: " + rdcom.GetErrorText() + " Other: "
          + e.Message.ToString();
        Console.WriteLine(errmsg);
        return false;
      }
    }

    static void Main(string[] args)
    {
      RConnector conn = new RConnector();

      // Initialize the instance to be used with R
      conn.initR();

      // Create an R variable named abc and assign it the value of 5
      conn.RConnection.SetSymbol("abc", 5);

      // Retrieve the value of the R variable named abc and assign that
      // value to the C# variable valueForabc
      var valueForabc = conn.RConnection.GetSymbol("abc");

      // Evaluate an expression in R and assign that value to the C#
      // variable aTestEvaluation
      var aTestEvaluation = conn.RConnection.Evaluate("8 * sin(4)");

      // Close the R connection
      conn.closeR();

      Console.BackgroundColor = ConsoleColor.Gray;
      Console.ForegroundColor = ConsoleColor.Blue;
      Console.WriteLine("Value of abc: " + valueForabc);
      Console.WriteLine("Value of 8 * sin(4): " + aTestEvaluation);
      Console.WriteLine("Press any key to continue ...");
      Console.ReadKey();
    }
  }
}

You would also need to reference the Interop.STATCONNECTORSRVLib.dll
assembly. Here is snapshot of my references list:
http://n4.nabble.com/file/n1744914/RFromCsharpReferences.png 

Best regards,
Fayssal El Moufatich
-- 
View this message in context: 
http://n4.nabble.com/Calling-R-functions-into-C-or-C-tp904267p1744914.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Error singular gradient matrix at initial parameter

2010-03-30 Thread Gabor Grothendieck
Sorry, it's algorithm = "brute-force"

On Tue, Mar 30, 2010 at 10:29 AM, Corrado ct...@york.ac.uk wrote:
 Hi Gabor,

 same problem even using nls2 with method=brute-force to calculate the
 initial parameters.

 Best,



[R] update.packages() and install.packages() does not work more because of Error in read.dcf

2010-03-30 Thread Juergen Rose
Hi,

On all my systems, update.packages() and install.packages() now fail. I
get the following message:

r...@orca:/root(28)# R

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> update.packages(checkBuilt=T)
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
Error in read.dcf(file = tmpf) : Line starting 'Li ...' is malformed!
> update.packages()
Error in read.dcf(file = tmpf) : Line starting 'Li ...' is malformed!
> install.packages("e1071")
Error in read.dcf(file = tmpf) : Line starting 'Li ...' is malformed!


All systems are gentoo systems with R-2.10.1. 
Also reinstalling of R from sources did not solve the problem.
Any hint is appreciated.

Regards
Juergen

-- 
Juergen Rose r...@rz.uni-potsdam.de
Uni-Potsdam



Re: [R] Error singular gradient matrix at initial parameter

2010-03-30 Thread Corrado

Yes, of course. The problem still stays.

Gabor Grothendieck wrote:

Sorry, its algorithm=brute-force

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Reshaping a data frame with a series of factors and 23 repeated measures

2010-03-30 Thread Ista Zahn
Hi Bill,
Here is a reshape package version. The key thing to notice is that you
have multiple pieces of information in your original column names.
These can be split out using the colsplit() function:

# Load the reshape package (provides melt, colsplit and cast)
library(reshape)
# Read in the data
cattle <- read.csv("http://n4.nabble.com/attachment/1745223/0/CattleGrowth.csv",
colClasses="character")
# make the naming scheme consistent
names(cattle)[names(cattle) %in% c("Date", "Days", "Wt")] <-
c("Date.0", "Days.0", "Wt.0")
# melt the data
m.cattle <- melt(cattle, id = c("Steer.ID", "stocker.trt", "Finish.trt"))
# split out variable and time info
m.cattle <- as.data.frame(cbind(colsplit(m.cattle$variable,
split="\\.", names=c("Var", "time")), m.cattle))
# get rid of the now-redundant variable column
m.cattle$variable <- NULL
# cast the data to put variables back in the columns
long.cattle <- cast(m.cattle, ... ~ Var)

Best,
Ista

On Tue, Mar 30, 2010 at 9:34 AM, wclapham william.clap...@ars.usda.gov wrote:

 Ista,

 I have looked at the reshape package and have used 'melt' successfully on
 simpler tables.  I tried it here, but have not been successful.  I think I
 just need to gain experience.  I am loving R and am having a difficult time
 with data structure issues.

 I am attaching the data set that I am trying to manipulate.  Ultimately, I
 would like to be able to analyze these data with ANOVA and repeated
 measures, and also be able to plot  growth,  wt * days.  I appreciate any
 help or guidance on references to read that will help me solve my problems.

 The data set represents wt over time (starting with days=0, birth day) for
 steers.  Factors include Stockering and Finishing treatments.

 Regards,

 Bill







-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread David Winsemius
A) It is not an error, only a warning. Wouldn't it seem reasonable to  
issue such a warning if you have data that violates the distributional  
assumptions?


B) You did not include any of the data

C) Wouldn't this be more appropriate to the author of the book if this  
is exactly what was suggested there?


--
David,


On Mar 30, 2010, at 10:51 AM, Corrado wrote:


Dear friends,

I am testing glm as at page 514/515 of THE R BOOK by M.Crawley, that  
is on proportion data.


I use glm(y~x1+,family=binomial)

y is a proportion in (0,1), and x is a real number.

I get the error:

In eval(expr, envir, enclos) : non-integer #successes in a binomial  
glm!


But that is exactly what was suggested in the book, where there is  
no mention of a similar warning. Where am I going wrong?


Here is the output:

 glm(response.prepared~x,data=,family=binomial)

Call:  glm(formula = response.prepared ~ x, family = binomial, data  
= )


Coefficients:
(Intercept)            x
    -0.3603       0.4480

Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
Null Deviance:      24420
Residual Deviance: 23240        AIC: 700700
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial  
glm!





Regards
--

Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



David Winsemius, MD
West Hartford, CT



Re: [R] Error singular gradient matrix at initial parameter

2010-03-30 Thread Gabor Grothendieck
What do you mean by "the problem still stays"? If you are using brute
force, it's not a problem to have it fail on some of the evaluations
since each one is separate. How large a grid are you using? Are you
claiming that every single point on the grid fails? Please provide
reproducible code showing what you are doing.

On Tue, Mar 30, 2010 at 10:56 AM, Corrado ct...@york.ac.uk wrote:
 Yes, of course. The problem still stays.

 Gabor Grothendieck wrote:

 Sorry, it's algorithm="brute-force"

 On Tue, Mar 30, 2010 at 10:29 AM, Corrado ct...@york.ac.uk wrote:


 Hi Gabor,

 same problem even using nls2 with method="brute-force" to calculate the
 initial parameters.

 Best,

 Gabor Grothendieck wrote:


 You could try method="brute-force" in the nls2 package to find starting
 values.

 On Tue, Mar 30, 2010 at 7:03 AM, Corrado ct...@york.ac.uk wrote:



 I am using nls to fit a non linear function to some data.

 The non linear function is:

 y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

 I have chosen algorithm "port", with a lower bound of 0 for all of the ki
 parameters, and I have tried many start values for the parameters ki
 (including generating them at random).

 If I fit the non linear function to the same data using an external
 algorithm, it fits perfectly and finds the parameters.

 As soon as I come to my R installation (2.10.1 on Kubuntu Linux 9.10 64 bit),
 I keep getting the error:

 Error in nlsModel(formula, mf, start, wts, upper) :
   singular gradient matrix at initial parameter estimates

 I have read all the previous postings and the documentation, but to no
 avail: the error is there to stay. I am sure the problem is with nls,
 because the external fitting algorithm perfectly fits it in less than a
 second. Also, if my n is 4, then nls works perfectly (but that excludes
 all the k5 ... kn).

 Can anyone help me with suggestions? Thanks in advance.

 Alternatively, what do you suggest I should do? Shall I abandon nls in
 favour of optim?
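
On the optim question, a minimal sketch on simulated data: the same model form fitted by minimising the residual sum of squares with optim() and box constraints. The data, the true coefficients, the two predictors p1 and p2 and the starting values are all made up for illustration; whether this behaves better than nls on the real data is not something the thread establishes.

```r
set.seed(42)
n  <- 200
p1 <- runif(n)
p2 <- runif(n)
k_true <- c(k0 = 0.2, k1 = 0.5, k2 = 0.3)   # assumed values for the simulation
y <- 1 - exp(-(k_true[1] + k_true[2] * p1 + k_true[3] * p2)) + rnorm(n, sd = 0.01)

# Residual sum of squares for y = 1 - exp(-(k0 + k1*p1 + k2*p2))
rss <- function(k) {
  eta <- k[1] + k[2] * p1 + k[3] * p2
  sum((y - (1 - exp(-eta)))^2)
}

# L-BFGS-B supports lower bounds, mirroring the "port" setup with ki >= 0
fit <- optim(par = c(0.1, 0.1, 0.1), fn = rss, method = "L-BFGS-B", lower = 0)
fit$par
```

Unlike nls, optim does not rely on a Gauss-Newton step, so it will not stop with a singular-gradient error; a poorly identified model can still show up as unstable estimates, though.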

 Regards

 --
 Corrado Topi
 PhD Researcher
 Global Climate Change and Biodiversity
 Area 18,Department of Biology
 University of York, York, YO10 5YW, UK
 Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk





 --
 Corrado Topi
 PhD Researcher
 Global Climate Change and Biodiversity
 Area 18,Department of Biology
 University of York, York, YO10 5YW, UK
 Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk





 --
 Corrado Topi
 PhD Researcher
 Global Climate Change and Biodiversity
 Area 18,Department of Biology
 University of York, York, YO10 5YW, UK
 Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk





Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Rubén Roa
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
behalf of Corrado
Sent: Tuesday, 30 March 2010 16:52
To: r-help@r-project.org
Subject: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : 
non-integer #successes in a binomial glm!

Dear friends,

I am testing glm as at page 514/515 of THE R BOOK by M.Crawley, that is on 
proportion data.

I use glm(y~x1+,family=binomial)

y is a proportion in (0,1), and x is a real number.

I get the error:

In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

But that is exactly what was suggested in the book, where there is no mention 
of a similar warning. Where am I going wrong?

Here is the output:

  glm(response.prepared~x,data=,family=binomial)

Call:  glm(formula = response.prepared ~ x, family = binomial, data = )

Coefficients:
(Intercept)            x
    -0.3603       0.4480

Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
Null Deviance:      24420
Residual Deviance: 23240        AIC: 700700
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
 



Regards
-- 

Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk

---

Perhaps you are misreading Crawley's book?
A proportion would usually be modeled with the Beta distribution, not the 
binomial, which is for counts.
If you are modeling a proportion, try the betareg function in the betareg package.
HTH
Ruben
 



 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN



[R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Dimitri Liakhovitski
Dear R-ers,

I have  a large data frame (several thousands of rows and about 2.5
thousand columns). One variable (group) is a grouping variable with
over 30 levels. And I have a lot of NAs.
For each variable, I need to divide each value by variable mean - by
subgroup. I have the code but it's way too slow - takes me about 1.5
hours.
Below is a data example and my code that is too slow. Is there a
different, faster way of doing the same thing?
Thanks a lot for your advice!

Dimitri


# Building an example frame - with groups and a lot of NAs:
set.seed(1234)
frame<-data.frame(group=rep(paste("group",1:10),10),a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
frame<-frame[order(frame$group),]
names.used<-names(frame)[2:length(frame)]
set.seed(1234)
for(i in names.used){
   i.for.NA<-sample(1:100,60)
   frame[[i]][i.for.NA]<-NA
}
frame

### Code that does what's needed but is too slow:
Start<-Sys.time()
frame <- do.call(cbind, lapply(names.used, function(x){
  unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
}))
Finish<-Sys.time()
print(Finish-Start) # Takes too long

-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Yihui Xie
In a Binomial GLM, typically y is a factor with two levels (indicating
success/failure) instead of a numeric vector on [0, 1]. Perhaps the
description in the book is not so clear. You should interpret data on
proportions as observations from a Binomial distribution (rather
than as observed proportions that merely happen to fall in [0, 1]). E.g.

y=rbinom(10, size = 1, prob = .3); x=rnorm(y)
# or y = factor(y)
glm(y~x, family = binomial)
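
To make the point about proportions concrete, here is a small sketch on simulated data (the sample size, the number of trials per observation and the coefficients are made up). When the proportions come from known numbers of trials, glm() accepts either a two-column matrix of successes and failures, or the proportion together with weights giving the number of trials. Fitting the bare proportion is what triggers the warning discussed in this thread.

```r
set.seed(1)
n_obs     <- 50
n_trials  <- rep(20, n_obs)                 # assumed: 20 trials per observation
x         <- rnorm(n_obs)
successes <- rbinom(n_obs, size = n_trials, prob = plogis(-0.4 + 0.5 * x))
prop      <- successes / n_trials

# (a) successes/failures as a two-column response
fit_counts <- glm(cbind(successes, n_trials - successes) ~ x, family = binomial)

# (b) proportion as response, number of trials as prior weights
fit_prop <- glm(prop ~ x, family = binomial, weights = n_trials)

# (c) bare proportion, same call shape as in the question: this one warns
msg <- tryCatch(glm(prop ~ x, family = binomial),
                warning = function(w) conditionMessage(w))
```

Forms (a) and (b) describe the same likelihood, so their coefficients agree; form (c) has no information about the number of trials behind each proportion.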


Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-6609 Web: http://yihui.name
Department of Statistics, Iowa State University
3211 Snedecor Hall, Ames, IA



On Tue, Mar 30, 2010 at 9:51 AM, Corrado ct...@york.ac.uk wrote:
 Dear friends,

 I am testing glm as at page 514/515 of THE R BOOK by M.Crawley, that is on
 proportion data.

 I use glm(y~x1+,family=binomial)

 y is a proportion in (0,1), and x is a real number.

 I get the error:

 In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

 But that is exactly what was suggested in the book, where there is no
 mention of a similar warning. Where am I going wrong?

 Here is the output:

 glm(response.prepared~x,data=,family=binomial)

 Call:  glm(formula = response.prepared ~ x, family = binomial, data = )

 Coefficients:
 (Intercept)            x
     -0.3603       0.4480

 Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
 Null Deviance:      24420
 Residual Deviance: 23240        AIC: 700700
 Warning message:
 In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!




 Regards
 --

 Corrado Topi
 PhD Researcher
 Global Climate Change and Biodiversity
 Area 18,Department of Biology
 University of York, York, YO10 5YW, UK
 Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Corrado

Dear David,

David Winsemius wrote:
A) It is not an error, only a warning. Wouldn't it seem reasonable to 
issue such a warning if you have data that violates the distributional 
assumptions?
I am not questioning the approach. I am only trying to understand why a 
(rather expensive) source of documentation and the behaviour of a 
function are not aligned.



B) You did not include any of the data

Data attached as R object.
C) Wouldn't this be more appropriate to the author of the book if this 
is exactly what was suggested there?


I think it will be definitively appropriate, but only when I am certain 
I am not doing anything wrong.


Regards

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Corrado

Dear Ruben

I am afraid not. The paragraph's title is a bit of a giveaway:

Proportion Data and Binomial Errors

The sentence reads:

  are dealt with by using a generalised linear model with a 
binomial error structure.


with the example:

glm(y~x,family=binomial)

You can check at page 514/515.

Rubén Roa wrote:

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
behalf of Corrado
Sent: Tuesday, 30 March 2010 16:52
To: r-help@r-project.org
Subject: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : 
non-integer #successes in a binomial glm!

Dear friends,

I am testing glm as at page 514/515 of THE R BOOK by M.Crawley, that is on 
proportion data.

I use glm(y~x1+,family=binomial)

y is a proportion in (0,1), and x is a real number.

I get the error:

In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

But that is exactly what was suggested in the book, where there is no mention 
of a similar warning. Where am I going wrong?

Here is the output:

  glm(response.prepared~x,data=,family=binomial)

Call:  glm(formula = response.prepared ~ x, family = binomial, data = )

Coefficients:
(Intercept)            x
    -0.3603       0.4480

Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
Null Deviance:      24420
Residual Deviance: 23240        AIC: 700700
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
 



Regards
  



--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] How to recode variables using base R

2010-03-30 Thread johannes rara
Thanks, you're a lifesaver.

-J

2010/3/30 Henrique Dallazuanna www...@gmail.com:
 Using lapply:

 as.data.frame(lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels =
 c('x', 'y', 'z')))
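
A short sketch of how the lapply/cut result could be attached to the original data frame as new variables, which is what the original question asked for. The "_rec" suffix for the new column names is an arbitrary choice made here, not something from the thread.

```r
df <- data.frame(A = 1:5,
                 B = c(3, 6, 2, 8, 10),
                 C = c(0, 15, 5, 9, 12))

# Recode every column with the same breaks:
# (-Inf,3] -> x, (3,8] -> y, (8,16] -> z
recoded <- lapply(df, cut, breaks = c(-Inf, 3, 8, 16), labels = c("x", "y", "z"))

# Attach the results as new columns next to the originals
df[paste0(names(df), "_rec")] <- recoded
```

Values above 16 would become NA with these breaks; adding Inf as a final break (and a matching label) would avoid that.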


 On Tue, Mar 30, 2010 at 10:14 AM, johannes rara johannesr...@gmail.com 
 wrote:
 Thanks John and Henrique, my intention is to do this for A, B and C
 (all at once), so I'll have to wrap your solution into lapply or for
 loop?

 -J

 2010/3/30 Henrique Dallazuanna www...@gmail.com:
 You could try this also:

 cut(df$A, c(-Inf, 3, 8), labels = c('x', 'y'))

 On Tue, Mar 30, 2010 at 8:30 AM, johannes rara johannesr...@gmail.com 
 wrote:
 Hi,

 Is there an efficient way of recoding variables in a data.frame using
 base R? My purpose is to create new variables and attach them to the
 old data.frame. The basic idea is shown below, but how do I create the
 recoding for A, B and C and assign the results to new variables?

 df <- data.frame(A = c(1:5),
                  B = c(3,6,2,8,10),
                  C = c(0,15,5,9,12))

 df$A[df$A <= 3] <- "x"
 df$A[df$A > 3 & df$A <= 8] <- "y"
 df$A[df$A <= 16] <- "z"

 Thanks,
 -J





 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O





 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




Re: [R] Multivariate hypergeometric distribution version of phyper()

2010-03-30 Thread Charles C. Berry

On Tue, 30 Mar 2010, Karl Brand wrote:


Dear R Users,

I employed the phyper() function to estimate the likelihood that the number 
of genes overlapping between 2 different lists of genes is due to chance. 
This appears to work appropriately.

Now I want to try this with 3 lists of genes, which phyper() does not appear 
to support.

Some googling suggests I can use the multivariate hypergeometric 
distribution to achieve this, e.g.:

http://en.wikipedia.org/wiki/Hypergeometric_distribution

But when I try to do this manually using the choose() function (see the 
attempt below, an example with just two gene lists) I'm unable to perform 
the calculations: the numbers hit infinity before getting an answer.

Searching the CRAN archives for "multivariate hypergeometric" shows this 
term in the vignettes of the packages 'combinat' and 'forward'. But I'm 
unable to make sense of these package functions in the context of my 
aforementioned application.

Can someone suggest a function, script or method to achieve my goal of 
estimating the likelihood of overlap between 3 lists of genes, ideally using 
the multivariate hypergeometric, or anything else for that matter?



Two suggestions:

1) Don't! Likely the theory is unsuited for the application. In
   most applications that generate lists of genes, the genes are
   not iid realizations and the hypergeometric gives results that
   are astonishingly anticonservative. As an alternative, the
   block bootstrap may be suitable. See
http://171.66.122.45/cgi/content/abstract/17/6/760

   and Google (scholar) 'genomic block bootstrap' for some
  starting points.


2) Take this thread to the bioconductor list. You are much
   more likely to get pointers to useful packages and functions
   for genomic statistical software there.

HTH,

Chuck




cheers in advance,

Karl



#example attempt with two gene lists m & n
N <- 45101 # total number balls in urn
m <- 720   # number of 'white' or 'special' balls in urn, aka 'success'
n <- 801   # number balls drawn or number of samples
k <- 40    # number of 'white' or 'special' balls DRAWN

a <- choose(m,k)
b <- choose((N-m),(n-k))
z <- choose(N,n)
prK <- (a*b)/z #'the answer'
print(prK)
[1] NaN

> a
[1] 7.985852e+65
> b
[1] Inf
> z
[1] Inf
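
The overflow itself (Inf/Inf giving NaN) can be sidestepped by working on the log scale with lchoose(), or by calling dhyper()/phyper() directly. A minimal sketch for the two-list case with the numbers from the post follows; it only addresses the arithmetic, not the concern raised above about whether the hypergeometric model suits gene lists.

```r
N <- 45101  # total number of balls in the urn
m <- 720    # number of 'special' balls in the urn
n <- 801    # number of balls drawn
k <- 40     # number of 'special' balls drawn

# Same formula as in the attempt, but on the log scale so nothing overflows
log_prK <- lchoose(m, k) + lchoose(N - m, n - k) - lchoose(N, n)
prK <- exp(log_prK)

# Built-in equivalents: point probability and upper tail P(X >= k)
prK_builtin <- dhyper(k, m, N - m, n)
p_upper     <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)
```

A multivariate hypergeometric point probability could be written the same way, as a sum of lchoose() terms for each category minus the lchoose() of the total.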


--
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread David Winsemius


On Mar 30, 2010, at 11:19 AM, Corrado wrote:


Dear David,

David Winsemius wrote:
A) It is not an error, only a warning. Wouldn't it seem reasonable  
to issue such a warning if you have data that violates the  
distributional assumptions?
I am not questioning the approach. I am only trying to understand  
why a (rather expensive) source of documentation and the behaviour  
of a function are not aligned.



B) You did not include any of the data

Data attached as R object.
C) Wouldn't this be more appropriate to the author of the book if  
this is exactly what was suggested there?


I think it will be definitively appropriate, but only when I am  
certain I am not doing anything wrong.


I don't understand this perspective. You bought Crawley's book so he  
is in some minor sense in debt to you.  Why should you think it is  
more appropriate to send your message out to thousands of readers of  
r-help around the world (some of whom have written books that you did  
not buy) before sending Crawley a question about his text?





Regards

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



David Winsemius, MD
West Hartford, CT



Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Dimitri Liakhovitski
I wrote a different code - but it takes twice as long as my original code. :(
However, I thought I should share it as well - because the second part
of the code is fast - it's the first part that's slow. Maybe there is
a way to fix the first part...
Thank you!


group.var<-"group"
subgroups<-levels(frame[[group.var]])

system.time({
means.no.zeros<-list()
for(i in 1:length(subgroups)){  # SLOW part of the code
  row.of.means<-as.data.frame(t(colMeans(frame[frame[[group.var]] %in%
    subgroups[i],names.used],na.rm=T)))
  nr.of.rows<-(dim(frame[frame[[group.var]] %in% subgroups[i],])[1])
  means.no.zeros[[i]]<-as.data.frame(matrix(nrow=nr.of.rows,ncol=length(names.used)))
  means.no.zeros[[i]]<-row.of.means
  for(z in 1:nr.of.rows){ #z<-1
    means.no.zeros[[i]][z,] = row.of.means
  }
 }
means.no.zeros<-do.call(rbind,means.no.zeros)
})

system.time({#FAST part of the code
frame[names.used]<-frame[names.used]/means.no.zeros
})
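
One possible way to avoid the slow loop that builds means.no.zeros is to compute all group means in one pass with rowsum() and expand them by indexing. This is only a sketch on a small made-up frame; the data and the helper names sums, counts and grp.means are illustrative, not from the post.

```r
set.seed(1)
frame <- data.frame(group = rep(paste("group", 1:3), each = 4),
                    a = rnorm(12), b = rnorm(12))
frame$a[c(2, 7)] <- NA
names.used <- c("a", "b")

mat <- as.matrix(frame[names.used])
grp <- as.character(frame$group)

# Per-group sums and non-missing counts for every column at once
sums   <- rowsum(replace(mat, is.na(mat), 0), grp)
counts <- rowsum((!is.na(mat)) * 1, grp)
grp.means <- sums / counts            # one row per group, named by group

# Expand to one row of means per observation, then divide
scaled <- mat / grp.means[grp, , drop = FALSE]
```

Working on a numeric matrix rather than a data frame is the main design choice here: row-by-row assignment into a data frame is typically what makes loops like the one above slow.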



On Tue, Mar 30, 2010 at 11:04 AM, Dimitri Liakhovitski ld7...@gmail.com wrote:
 Dear R-ers,

 I have  a large data frame (several thousands of rows and about 2.5
 thousand columns). One variable (group) is a grouping variable with
 over 30 levels. And I have a lot of NAs.
 For each variable, I need to divide each value by variable mean - by
 subgroup. I have the code but it's way too slow - takes me about 1.5
 hours.
 Below is a data example and my code that is too slow. Is there a
 different, faster way of doing the same thing?
 Thanks a lot for your advice!

 Dimitri


 # Building an example frame - with groups and a lot of NAs:
 set.seed(1234)
 frame<-data.frame(group=rep(paste("group",1:10),10),a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
 frame<-frame[order(frame$group),]
 names.used<-names(frame)[2:length(frame)]
 set.seed(1234)
 for(i in names.used){
       i.for.NA<-sample(1:100,60)
       frame[[i]][i.for.NA]<-NA
 }
 frame

 ### Code that does what's needed but is too slow:
 Start<-Sys.time()
 frame <- do.call(cbind, lapply(names.used, function(x){
  unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
 }))
 Finish<-Sys.time()
 print(Finish-Start) # Takes too long

 --
 Dimitri Liakhovitski
 Ninah.com
 dimitri.liakhovit...@ninah.com




-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com



Re: [R] Finding positions in array

2010-03-30 Thread Erich Neuwirth
which(x > 7)

z <- which(x > 7)
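
A self-contained version of the same idea, using the values from the question in a plain vector (the igraph object gaux is not reproduced here). The matrix part is an extra illustration, not something the question asked for.

```r
x <- c(3, 8, 10, 6)
y <- 7

# Positions of the elements of x that exceed y
z <- which(x > y)

# For a matrix, arr.ind = TRUE returns row/column positions instead
m <- matrix(c(3, 8, 10, 6), nrow = 2)
pos <- which(m > y, arr.ind = TRUE)
```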


On 3/30/2010 2:54 PM, Romildo Martins wrote:
 Hello,
 
 I need a function to check which positions of the array are greater than y
 and return those positions in another array z.
 
 x <- array(E(gaux)$weight)
 x
 [1]  3  8 10  6
 
 If y = 7
 
 z
 [1] 2 3
 
 
 Thanks a lot!
 
 Romild
 
   [[alternative HTML version deleted]]
 
 
 

-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459



Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Charles C. Berry

On Tue, 30 Mar 2010, Dimitri Liakhovitski wrote:


Dear R-ers,

I have  a large data frame (several thousands of rows and about 2.5
thousand columns). One variable (group) is a grouping variable with
over 30 levels. And I have a lot of NAs.
For each variable, I need to divide each value by variable mean - by
subgroup. I have the code but it's way too slow - takes me about 1.5
hours.
Below is a data example and my code that is too slow. Is there a
different, faster way of doing the same thing?
Thanks a lot for your advice!

Dimitri


# Building an example frame - with groups and a lot of NAs:
set.seed(1234)
frame<-data.frame(group=rep(paste("group",1:10),10),a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))



Use model.matrix and crossprod to do this in a vectorized fashion:


mat <- as.matrix(frame[,-1])
mm <- model.matrix(~0+group,frame)
col.grp.N <- crossprod( !is.na(mat), mm )
mat[is.na(mat)] <- 0.0
col.grp.sum <- crossprod( mat, mm )
mat <- mat / ( t(col.grp.sum/col.grp.N)[ frame$group,] )
is.na(mat) <- is.na(frame[,-1])



mat is now a matrix whose columns each correspond to the columns in 
'frame' as you have it after do.call(...)



Are you sure you want to divide the values by their (possibly negative) 
means??
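
For comparison, a shorter (though not necessarily faster) base-R formulation of the same per-group scaling uses ave(), which returns the group statistic aligned with each row. The small frame below is made up, and the helper name group.mean is hypothetical; this is a sketch, not a benchmarked replacement for the crossprod approach above.

```r
set.seed(1)
frame <- data.frame(group = rep(paste("group", 1:3), each = 4),
                    a = rnorm(12), b = rnorm(12))
frame$b[c(1, 5, 6)] <- NA
names.used <- names(frame)[-1]

# Group mean of v, ignoring NAs, repeated for every row of the group
group.mean <- function(v, g) ave(v, g, FUN = function(z) mean(z, na.rm = TRUE))

scaled <- frame
scaled[names.used] <- lapply(frame[names.used],
                             function(v) v / group.mean(v, frame$group))
```

Missing values stay missing, and each group's non-missing scaled values average to 1 by construction.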


HTH,

Chuck




frame<-frame[order(frame$group),]
names.used<-names(frame)[2:length(frame)]
set.seed(1234)
for(i in names.used){
  i.for.NA<-sample(1:100,60)
  frame[[i]][i.for.NA]<-NA
}
frame

### Code that does what's needed but is too slow:
Start<-Sys.time()
frame <- do.call(cbind, lapply(names.used, function(x){
 unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
}))
Finish<-Sys.time()
print(Finish-Start) # Takes too long

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



[R] about the possible errors in Rgraphviz Package

2010-03-30 Thread HU,ZHENGJUN

Hi All,

     I installed the Rgraphviz package in the following two 
ways, both of which appeared to succeed:


source("http://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")

install.packages(pkgs="C:/Progra~1/R/lib_download/Rgraphviz_1.24.0.zip", 
lib="C:/Progra~1/R/R-2.10.1/library", repos=NULL)


but when I loaded the package through library(Rgraphviz) or 
library("Rgraphviz"), I got the same error message below:


Error in inDL(x, as.logical(local), as.logical(now), ...) :
 unable to load shared library 
'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll':

 LoadLibrary failure:  The specified module could not be found.

I think that it is an error in the package because it should go 
to 'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll' 
instead of 
'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'


Could anyone help me to solve the problem?
Thank you very much for the help. Howard



[R] dot plot

2010-03-30 Thread kayj

Hi All,

I need to make a dot plot where the points of the plot are connected with
lines. Is this possible to do in R?
Also, I do not know how to combine two plots into one plot.

Thanks, and I appreciate your help.

-- 
View this message in context: 
http://n4.nabble.com/dot-plot-tp1745415p1745415.html
Sent from the R help mailing list archive at Nabble.com.



[R] list index rules evaluation behavior

2010-03-30 Thread Dgnn

I have what may be a simple/foolish question, but I've done the due diligence
and looked through pages of posts here as well as several of the PDFs on the
CRAN site, but haven't been able to find what I'm after.

I am working with a list of say 3 histogram objects A, B & C, and each
histogram is a list of 7 elements. I would like to access $name, the 6th
element, of histograms A, B and C.


Trial and error yielded some results that told me I clearly don't understand
how R interprets index commands. For the histogram list above:
a[1:2] gives histograms A and B as expected.
a[[1:2]] gives the second element of histogram 1, but a[[1:1]] gives all
elements of histogram 1, while a[[1:3]] gives NULL?!
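
A small sketch of the indexing rules in question, using three made-up histograms. One assumption to flag: in current versions of R the component holding the variable name is called xname, and the position of components within a histogram object can differ between R versions, so extracting by name is safer than by position.

```r
set.seed(1)
samples <- list(A = rnorm(50), B = runif(50), C = rexp(50))
a <- lapply(samples, hist, plot = FALSE)   # list of 3 histogram objects

sub   <- a[1:2]      # single bracket: a shorter list holding histograms A and B
first <- a[[1]]      # double bracket with one index: the histogram A itself
deep  <- a[[1:2]]    # double bracket with a vector indexes recursively,
                     # i.e. the same as a[[1]][[2]]

# Pull one named component out of every histogram
all.counts <- lapply(a, "[[", "counts")
all.names  <- sapply(a, function(h) h$xname)
```

So a[[1:3]] means a[[1]][[2]][[3]], the third element of the second component of the first histogram, not "elements 1 to 3".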

If anyone could help with an explanation of indexing rules, or a source that
does so, I would very much appreciate it. Oh and an answer to the first
question!

Thanks All

Jason


-- 
View this message in context: 
http://n4.nabble.com/list-index-rules-evaluation-behavior-tp1745398p1745398.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Dgnn

I posted a similar problem last week (but with an uninformative subject
header). See if this helps:
http://n4.nabble.com/a-vectorized-solution-to-some-simple-dataframe-math-td1692810.html#a1710410

-- 
View this message in context: 
http://n4.nabble.com/Code-is-too-slow-mean-centering-variables-in-a-data-frame-by-subgroup-tp1745335p1745434.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] dot plot

2010-03-30 Thread Ivan Calandra

?lines
?plot (with type= argument)

Ivan

On 3/30/2010 17:55, kayj wrote:

Hi All,

I need to make a dot plot where the points of the plot are connected with
lines. Is this possible to do in R?
Also, I do not know how to combine two plots into one plot.

thanks and I appreciate your help

   


--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php



[R] weighted.median function from package R.basic

2010-03-30 Thread Joris Meys
Dear all,

I want to apply a weighted median on a huge dataset, and I remember a
function from the package R.basic that could do this using an internal
sorting algorithm qsort. This speeded things up quite a bit. Alas, I can't
find that package anywhere anymore. There is a weighted.median function in
the package limma too, but I didn't use that before.

Anybody who knows what happened to  R.basic?

Cheers
Joris

-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]



Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Dimitri Liakhovitski
Thanks a lot, Charles - I'll try your approach.
Yes - don't worry about dividing by negative means - in real data all
values are positive.
Dimitri

On Tue, Mar 30, 2010 at 12:24 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote:
 On Tue, 30 Mar 2010, Dimitri Liakhovitski wrote:

 Dear R-ers,

 I have  a large data frame (several thousands of rows and about 2.5
 thousand columns). One variable (group) is a grouping variable with
 over 30 levels. And I have a lot of NAs.
 For each variable, I need to divide each value by variable mean - by
 subgroup. I have the code but it's way too slow - takes me about 1.5
 hours.
 Below is a data example and my code that is too slow. Is there a
 different, faster way of doing the same thing?
 Thanks a lot for your advice!

 Dimitri


 # Building an example frame - with groups and a lot of NAs:
 set.seed(1234)

 frame<-data.frame(group=rep(paste("group",1:10),10),a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))


 Use model.matrix and crossprod to do this in a vectorized fashion:

 mat <- as.matrix(frame[,-1])
 mm <- model.matrix(~0+group,frame)
 col.grp.N <- crossprod( !is.na(mat), mm )
 mat[is.na(mat)] <- 0.0
 col.grp.sum <- crossprod( mat, mm )
 mat <- mat / ( t(col.grp.sum/col.grp.N)[ frame$group,] )
 is.na(mat) <- is.na(frame[,-1])


 mat is now a matrix whose columns each correspond to the columns in 'frame'
 as you have it after do.call(...)
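
For comparison, here is a minimal base-R sketch of the same per-group scaling using ave(). It is not the model.matrix/crossprod method described above, only a shorter alternative that may be easier to check on a small frame. The three-column example frame and the name `scaled` are made up for illustration.

```r
# Small stand-in for the data described in the thread (assumed layout:
# one grouping column plus numeric columns containing NAs).
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100), c = rnorm(100))
frame$a[sample(100, 60)] <- NA
frame$b[sample(100, 60)] <- NA

vars <- setdiff(names(frame), "group")

# ave() evaluates FUN within each level of the grouping variable and
# returns a vector aligned with the original rows, so every value can be
# divided by the mean of its own group in one step.
scaled <- as.data.frame(lapply(frame[vars], function(v) {
  v / ave(v, frame$group, FUN = function(z) mean(z, na.rm = TRUE))
}))
```

After scaling, each group's mean of the non-missing values is 1 and the NA pattern of the input is unchanged. Note that indexing a matrix by `frame$group` behaves differently for factors and for character vectors, so the crossprod code may need `frame$group` to be a factor.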


 Are you sure you want to divide the values by their (possibly negative)
 means??

 HTH,

 Chuck



 frame<-frame[order(frame$group),]
 names.used<-names(frame)[2:length(frame)]
 set.seed(1234)
 for(i in names.used){
      i.for.NA<-sample(1:100,60)
      frame[[i]][i.for.NA]<-NA
 }
 frame

 ### Code that does what's needed but is too slow:
 Start<-Sys.time()
 frame <- do.call(cbind, lapply(names.used, function(x){
  unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
 }))
 Finish<-Sys.time()
 print(Finish-Start) # Takes too long

 --
 Dimitri Liakhovitski
 Ninah.com
 dimitri.liakhovit...@ninah.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive
 Medicine
 E mailto:cbe...@tajo.ucsd.edu               UC San Diego
 http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901






-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com



[R] Create a new variable

2010-03-30 Thread Thomas Jensen
Dear R-list,

Sorry for spamming the list lately, I am just learning the more advanced
aspects of R! 

I have some data that looks like this:

Out Country1 Country 2 Country 3 ... CountryN
1   1   1   1   1
0   1   1   0   1
1   1   0   1   0

I want to create a new variable that counts the number of zeros in every
row whenever Out is equal to 1, and else it is a zero, so it would look
like this:

new_var
0 
0
2

I have tried the following:

for (i in length(Out)){
if (Out == 1) {new_var <- sum(dat[i,] != 1)}
else {new_var <- 0}
}

but this gives me an error message.

Best, Thomas



Re: [R] list index rules evaluation behavior

2010-03-30 Thread Joris Meys
Hi Jason,

try using commas instead of colons, e.g. a[[c(1,6)]], a[[c(3,6)]] etc.

If you use a[[1:3]], this is equivalent to a[[c(1,2,3)]]. As the list only
contains 2 levels, this will give an error or NULL, depending on your R
version.

You can find more info with ?"[["
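
To make the a[[c(i, 6)]] idea concrete, here is a small sketch. The list `a` below is a hypothetical stand-in for the three histogram objects (the original post gave no code), and the assumption that the 6th element is called `name` is taken from the question.

```r
# Hypothetical stand-in for a list of three histogram objects; each inner
# list has 7 elements and the 6th one is called "name" (an assumption).
make_h <- function(label) {
  list(breaks = 0:3, counts = c(2, 5, 1), density = c(0.25, 0.625, 0.125),
       mids = c(0.5, 1.5, 2.5), equidist = TRUE, name = label, n = 8)
}
a <- list(A = make_h("first"), B = make_h("second"), C = make_h("third"))

# Recursive index: element 6 of list element 1, i.e. a[[1]][[6]]
first_name <- a[[c(1, 6)]]

# The same element from every histogram, by position or by name
by_position <- sapply(seq_along(a), function(i) a[[c(i, 6)]])
by_name     <- sapply(a, function(h) h$name)
```

Extracting by name is usually the safer choice, since the position of an element inside an object can change between R versions.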

Cheers
Joris



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]



Re: [R] Create a new variable

2010-03-30 Thread Erik Iverson

Hello,

Thomas Jensen wrote:

Dear R-list,

Sorry for spamming the list lately, I am just learning the more advanced
aspects of R! 


I have some data that looks like this:

Out Country1 Country 2 Country 3 ... CountryN
1   1   1   1   1
0   1   1   0   1
1   1   0   1   0



Don't paste data like this to the list. Use ?dput to create an easy to 
use data.frame that users of the list can input with one R command.  You 
will most likely get help very quickly at that point since our data will 
match yours exactly.



I want to create a new variable that counts the number of zeros in every
row whenever Out is equal to 1, and else it is a zero, so it would look
like this:

new_var
0
0
2

I have tried the following:

for (i in length(Out)){
if (Out == 1) {new_var <- sum(dat[i,] != 1)}
else {new_var <- 0}
}

but this gives me an error message.


I have not tested any of this, but I'm guessing something like the 
following would work.  Assume your data.frame is called df.


#NOT TESTED
tmp <- apply(df, 1, function(x) sum(x == 0))
df$new_var <- ifelse(df$Out == 1, tmp, 0)

See ?apply and ?ifelse .
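
A runnable variant of the same idea, sketched on the three rows shown in the question. The data frame `dat` and its column names are reconstructed for illustration, and rowSums() is used in place of apply(); the zeros are counted only in the country columns.

```r
# Example data reconstructed from the three rows shown in the question;
# the column names are placeholders.
dat <- data.frame(Out      = c(1, 0, 1),
                  Country1 = c(1, 1, 1),
                  Country2 = c(1, 1, 0),
                  Country3 = c(1, 0, 1),
                  CountryN = c(1, 1, 0))

country_cols  <- setdiff(names(dat), "Out")

# Count the zeros per row, then keep the count only where Out is 1
zeros_per_row <- rowSums(dat[country_cols] == 0)
dat$new_var   <- ifelse(dat$Out == 1, zeros_per_row, 0)
```

For these rows new_var comes out as 0, 0, 2, matching the result asked for.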






Best, Thomas



Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Dimitri Liakhovitski
Dear Charles, thank you so much!
On my example data frame your code takes 0 sec and mine 0.05 sec - a
huge difference even if 0 = 0.04 sec.
Dimitri



-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com



Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Dimitri Liakhovitski
I meant - even if 0 = 0.004
D.


-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com



Re: [R] Create a new variable

2010-03-30 Thread Joris Meys
Easy using rowSums :

> x <- data.frame(X1=c(0,0,1),X2=c(1,1,1),X3=c(0,1,0))

> x$Nulls <- rowSums(x==0)
> x
  X1 X2 X3 Nulls
1  0  1  0     2
2  0  1  1     1
3  1  1  0     1
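
For reference, the loop from the question can also be repaired directly. The original has three problems: `for (i in length(Out))` runs only once (with i equal to the length), `if (Out == 1)` tests the whole vector rather than one value (a warning or an error, depending on the R version), and `new_var` is overwritten instead of filled element by element. The sketch below uses a reconstructed `dat` with placeholder column names.

```r
# Example data reconstructed from the question; names are placeholders.
dat <- data.frame(Out = c(1, 0, 1), Country1 = c(1, 1, 1),
                  Country2 = c(1, 1, 0), Country3 = c(1, 0, 1),
                  CountryN = c(1, 1, 0))
country_cols <- names(dat) != "Out"

new_var <- numeric(nrow(dat))          # pre-allocate one slot per row
for (i in seq_len(nrow(dat))) {        # visit every row, not just the last
  if (dat$Out[i] == 1) {               # test a single value
    new_var[i] <- sum(dat[i, country_cols] == 0)
  } else {
    new_var[i] <- 0
  }
}
```

The vectorised rowSums() form is shorter and faster, but the loop shows where the original attempt went wrong.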

Cheers



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]



Re: [R] weighted.median function from package R.basic

2010-03-30 Thread Henrik Bengtsson
Hi,

Good memory.  weightedMedian() is now available in the aroma.light
package (it was moved there from R.basic in Feb 2006).

/Henrik
(author of both packages)
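
For readers without the package at hand, a plain base-R sketch of one common definition (the "lower" weighted median) is given below. It is only an illustration of the idea: `weighted_median_sketch` is a hypothetical helper, it is not the weightedMedian() implementation, it does a full sort rather than a partial one, and its handling of ties may differ from the package's.

```r
# Lower weighted median: the smallest x whose cumulative weight reaches
# half of the total weight. Hypothetical helper, not the aroma.light code.
weighted_median_sketch <- function(x, w) {
  stopifnot(length(x) == length(w))
  keep <- !is.na(x) & !is.na(w)        # drop incomplete pairs
  x <- x[keep]
  w <- w[keep]
  stopifnot(all(w >= 0), sum(w) > 0)
  ord <- order(x)                      # sort values, carry weights along
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}

wm_equal  <- weighted_median_sketch(c(3, 1, 2), c(1, 1, 1))
wm_skewed <- weighted_median_sketch(c(1, 2, 3), c(1, 1, 10))
```

With equal weights and an odd number of values this agrees with median(); a large weight on one value pulls the result towards it.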




Re: [R] about the possible errors in Rgraphviz Package

2010-03-30 Thread Duncan Murdoch

On 30/03/2010 10:44 AM, HU,ZHENGJUN wrote:

Hi All,

  I tried to install the package of Rgraphviz in the following two 
ways successfully:


source("http://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")

install.packages(pkgs="C:/Progra~1/R/lib_download/Rgraphviz_1.24.0.zip", 
lib="C:/Progra~1/R/R-2.10.1/library", repos=NULL)


but when I loaded the package through library(Rgraphviz) or 
library("Rgraphviz"), I got the same error message below:


Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library 
'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll':

  LoadLibrary failure:  The specified module could not be found.
  


Most likely the problem is that you haven't followed the installation 
instructions.  (They are pretty hard to find, but I think you can find 
them on the Bioconductor site.)  It is not enough to install the 
Rgraphviz package, you also need to install Graphviz.
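
One quick way to see whether a Graphviz installation is visible from R is to look for its `dot` executable on the search path. This is only a rough check under the assumption that Graphviz puts `dot` on the PATH; whether that is sufficient for the Rgraphviz DLL to load depends on the Graphviz and Rgraphviz versions involved.

```r
# Look for the Graphviz "dot" executable on the search path.
# Sys.which() returns "" when the program cannot be found.
dot_path <- Sys.which("dot")

if (nzchar(dot_path)) {
  message("Graphviz found at: ", dot_path)
} else {
  message("Graphviz 'dot' not found on the PATH; ",
          "install Graphviz or add its bin directory to the PATH.")
}
```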


Duncan Murdoch

I think that the error is in the package, because it should go 
to 'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll' 
instead of 
'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'


Could anyone help me to solve the problem?
Thank you very much for the help. Howard



Re: [R] list index rules evaluation behavior

2010-03-30 Thread David Winsemius


On Mar 30, 2010, at 11:47 AM, Dgnn wrote:



I have what may be a simple/foolish question, but I've done the due diligence
and looked through pages of posts here as well as several of the PDFs on the
CRAN site, but haven't been able to find what I'm after.

I am working with a list of say 3 histogram objects A, B & C, and each
histogram is a list of 7 elements. I would like to access $name, the 6th
element, of histograms A, B and C.



If you want better answers, you should provide better examples ...   
with _CODE_.




Trial and error yielded some results that told me I clearly don't understand
how R interprets index commands. For the histogram list above:
a[1:2] give histograms A and B as expected.
a[[1:2]] gives the second element of histogram 1, but a[[1:1]] gives all
elements of histogram 1, while a[[1:3]] gives null?!

If anyone could help with an explanation of indexing rules, or a source that
does so, I would very much appreciate it. Oh and an answer to the first
question!


?"[["

"[[" always returns a single vector or list, and so its arguments will
be coerced to a single value. When passed an argument that has
multiple values, it is interpreted as serial application of "[[" with
the serial values.  The construction [[1:1]] gets turned into [[1]]
(since 1:1 is just 1), while the construction [[1:2]] gets turned into
[[1]][[2]]:


> list(a=list(aa=5, bb=6),b=2,c=3)[[1:2]]
[1] 6

"[" may return a more complex object and so may accept multiple
arguments:


> list(a=1,b=2,c=3)[c(1,3)]
$a
[1] 1

$c
[1] 3
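
The rules above can be checked directly. The sketch below reuses the small list from the examples (named `x` here for convenience) and wraps the too-deep index in tryCatch(), since the outcome of that case depends on the R version.

```r
x <- list(a = list(aa = 5, bb = 6), b = 2, c = 3)

# A vector index inside [[ ]] is applied one level at a time
stopifnot(identical(x[[1:2]], x[[1]][[2]]))   # both give 6
stopifnot(identical(x[[1:1]], x[[1]]))        # 1:1 is just 1

# [ ] keeps the list wrapper and may select several elements at once
sub <- x[c(1, 3)]
sub_names <- names(sub)

# Asking for a third level that does not exist fails; depending on the
# R version this is an error or NULL, so catch it rather than assume.
too_deep <- tryCatch(x[[1:3]], error = function(e) e)
```

This also explains the original puzzle: a[[1:3]] asks for a[[1]][[2]][[3]], which only exists if the second element of the first histogram itself has at least three elements.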






--

David Winsemius, MD
West Hartford, CT



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Berwin A Turlach
G'day all,

On Tue, 30 Mar 2010 16:19:46 +0100
Corrado ct...@york.ac.uk wrote:

 David Winsemius wrote:
  A) It is not an error, only a warning. Wouldn't it seem reasonable
  to issue such a warning if you have data that violates the
  distributional assumptions?
 I am not questioning the approach. I am only trying to understand why
 a (rather expensive) source of documentation and the behaviour of a 
 function are not aligned.

1) Also expensive books have typos in them.
2) glm() is from a package that is part of R and the author of this
   book is AFAIK not a member of R core, hence has no control on
   whether his documentation and the behaviour of a function are
   aligned.
   a) If he were documenting a function that was part of a package he
      wrote as support for his book, as some authors do, there might be
      a reason to complain.  But then 1) would still apply.
   b) Even books written by members of R core occasionally have
      misalignments between the behaviour of a function and the
      documentation contained in such books.  This can be due to them
      documenting a function over whose implementation they do not have
      control (e.g. a function in a contributed package) or the fact
      that R is improving/changing from version to version while books
      are rather static.

For these reasons it is always worthwhile to check the errata page for
a book, if such exists.

The source of the warning is the fact that you do not provide
all necessary information about your response.  If your response is
binomial (with a mean dependent on some explanatory variables), then
each response consists of two numbers, the number of trials and the
number of successes.  If you calculate the observed proportion of
successes from these two numbers and feed this into glm as the
response, you are omitting necessary information.  In this case, you
should provide the number of trials on which each proportion is based
as prior weights.  For example:

R> x <- seq(from=-1,to=1,length=41)
R> px <- exp(x)/(1+exp(x))
R> nn <- sample(8:12, 41, replace=TRUE)
R> yy <- rbinom(41, size=nn, prob=px)
R> y <- yy/nn
R> glm(y~x, family=binomial, weights=nn)

Call:  glm(formula = y ~ x, family = binomial, weights = nn) 

Coefficients:
(Intercept)            x  
      0.246        1.124  

Degrees of Freedom: 40 Total (i.e. Null);  39 Residual
Null Deviance:      91.49 
Residual Deviance: 50.83        AIC: 157.6 
R> glm(y~x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial) 

Coefficients:
(Intercept)            x  
     0.2143       1.1152  

Degrees of Freedom: 40 Total (i.e. Null);  39 Residual
Null Deviance:      9.256 
Residual Deviance: 5.229        AIC: 49.87 
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
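
An equivalent way to supply the missing information is to give glm() a two-column response of successes and failures instead of a proportion plus weights. The sketch below simulates data in the same way as above (the seed is arbitrary and the object names `fit_prop` and `fit_count` are made up) and compares the two fits.

```r
set.seed(1)                                  # arbitrary seed, for repeatability
x  <- seq(from = -1, to = 1, length.out = 41)
px <- exp(x) / (1 + exp(x))
nn <- sample(8:12, 41, replace = TRUE)       # trials per observation
yy <- rbinom(41, size = nn, prob = px)       # successes per observation
y  <- yy / nn                                # observed proportions

# Proportion as response, number of trials as prior weights
fit_prop  <- glm(y ~ x, family = binomial, weights = nn)

# Two-column response: successes and failures
fit_count <- glm(cbind(yy, nn - yy) ~ x, family = binomial)

same_coef <- all.equal(unname(coef(fit_prop)), unname(coef(fit_count)),
                       tolerance = 1e-6)
```

Both calls describe the same likelihood, so the coefficients should agree and neither should trigger the non-integer warning.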

HTH,

Cheers,

Berwin

== Full address ==
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                     e-mail: ber...@maths.uwa.edu.au
Australia                   http://www.maths.uwa.edu.au/~berwin



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Graham Smith
Corrado


 I am afraid not  the paragraph's title is a bit of a give away:

 Proportion Data and Binomial Errors

 The sentence reads:

   are dealt with by using a generalised linear model with a binomial
 error structure.

 with the example:

 glm(y~x,family=binomial)

 You can check at page 514/515.

It would be better to check Chapter 16 (from page 569) on Proportions.
The pages you cite don't come across to me as an example of how this
procedure should be carried out, but rather a trivial example on the
changes in syntax between a linear model and a GLM.

Graham



Re: [R] large dataset

2010-03-30 Thread Thomas Lumley

KeithC,

If you're arguing that there should be more documentation and examples 
explaining how to use very large data sets with R, then I agree. Feel free to 
write some.

I've been giving tutorials on this for years now.  I wrote the first netCDF 
interface package for R because I needed to use data that wouldn't fit on a 
64Mb system. I wrote the biglm package to handle out-of-core regression. My 
presentation at the last useR meeting was on how to automatically load 
variables on demand from a SQL connection.

It's still true that you can't treat large data sets and small data sets the 
same way, and I still think that it's even more important to point out that 
nearly everyone doesn't have large data and doesn't need to worry about these 
issues.

   -thomas


On Mon, 29 Mar 2010, kMan wrote:


Dear Thomas,

While it may be true that R (and S) are *accused* of being slow,
memory-hungry, and able to handle only
small data sets (emphasis added), the accusation is  false, rendering the
*accusers* misinformed. Transparency is another, perhaps more interesting
matter. R-users can *experience* R as limited in the ways described above (a
functional limitation) while making a false technical assertion, without
generating a dichotomy. It is a bit like a cell phone example from
human-computer interaction circles in the 90s. The phone could technically
work, provided one is an engineer so as to make sense out of its interface,
while for most people, it may *functionally* be nothing more than a
paperweight. R is not technically limited in the way the accusation reads
(the point I was making), though many users are functionally limited so (the
point you seem to have made or at least passed along).

An R user can get far more data into memory as single objects with R than
with other stats packages; including matlab, JMP, and, obviously, excel.
This is just a simple comparison of the programs' documented environment
size and object limits. The difference in the same read/scan operation
between R and JMP on 600 Mb of data could easily be 25+ minutes (R perhaps
taking 5-7 minutes, with JMP taking 30+ minutes, assuming 1.8GHz & 3GB RAM I
used back when I made the comparison that sold me on R). R can do formal
operations with all that data in memory, assuming the environment is given
enough space to work with, while JMP will do the same operation in several
smaller chunks, reference the disk several times, AND on windows machines,
cause the OS to page. In that case, the differences can be upwards of a day.
With the ability to handle larger chunks at once, and direct control over
preventing one's OS from paging, R users should be able to crank out
analyses on very large datasets faster than other programs.

I am perfectly willing to accept that consumers of statistical software may
*experience* R as more limiting, in keeping with the accusations, that the
effect may be larger for newcomers, and even larger for newcomers after
controlling for transparency. I'd expect the effect  to reverse at around 3
years of experience, controlling for transparency or not. Large scale data
may present technical problems many users choose simply to avoid using R
for, so the effect may not reverse for these issues. Even when R is more
than capable of outperforming other programs, its usability (or access to
suitable documentation/training material) apparently isn't currently up to
the challenge. This is something the R community should be champing at the
bit to address.

I'd think a consortium of sorts showcasing large-scale data support in R
would be a stellar contribution, and perhaps an issue of R-journal devoted
to the topic, say, of near worst-case scenario - 10Gb of data containing
different data types (categorical, numeric,  embedded matrices), in a .csv
file, header information somewhere else. Now how do the authors explain to
the beginner (say, 1 year experience with I/O) how to tackle getting the
data into a more suitable format, and then how did they analyze it 300Mb at
a time, all using R, in a non-cluster/single user environment, 32 bit, while
controlling for the environment size, missing data, and preventing paging?
How was their solution different when moving to 64 bit? Moving to a cluster?
One of the demos would certainly have to use scan() exclusively for I/O,
perhaps also demonstrating why the 'bad practice' part of working with raw
text files is something more than mere prescription.
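
As a rough illustration of the chunk-at-a-time idea above, here is a minimal
sketch that reads a headerless .csv through an open connection with scan(),
a few lines per call, and keeps only running totals in memory. The temporary
file, the column names x and y, and the chunk size of 4 lines are assumptions
made up for the example; a real 10Gb file would need a much larger chunk and
a `what` list matching its actual columns.

```r
# Build a small headerless .csv to stand in for a large file (made-up data).
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(x = 1:10, y = (1:10) / 2), tmp,
            sep = ",", row.names = FALSE, col.names = FALSE)

con <- file(tmp, open = "r")   # an open connection keeps its position between reads
total <- 0
rows <- 0
repeat {
  # Read at most 4 lines per call; column types are declared up front.
  chunk <- scan(con, what = list(x = integer(), y = numeric()),
                sep = ",", nlines = 4, quiet = TRUE)
  if (length(chunk$x) == 0) break      # nothing left to read
  total <- total + sum(chunk$y)        # update running summaries only
  rows <- rows + length(chunk$x)
}
close(con)
unlink(tmp)

mean_y <- total / rows
```

Only one chunk is held in memory at any time, which is the point of the
exercise; anything that needs all rows at once would have to be reformulated
as an update on running summaries.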

Sincerely,
KeithC.

-Original Message-
From: Thomas Lumley [mailto:tlum...@u.washington.edu]
Sent: Monday, March 29, 2010 2:56 PM
To: Gabor Grothendieck
Cc: kMan; r-help; n.via...@libero.it
Subject: Re: [R] large dataset

On Mon, 29 Mar 2010, Gabor Grothendieck wrote:


On Mon, Mar 29, 2010 at 4:12 PM, Thomas Lumley tlum...@u.washington.edu

wrote:

On Sun, 28 Mar 2010, kMan wrote:


This was *very* useful for me when I dealt with a 1.5Gb text file



http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_la



Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2010-03-30 Thread Robert A LaBudde

At 12:08 PM 3/30/2010, David Winsemius wrote:

snip
I don't understand this perspective. You bought Crawley's book so he
is in some minor sense in debt to you.  Why should you think it is
more appropriate to send your message out to thousands of readers of
r-help around the world (some of whom have written books that you did
not buy) before sending Crawley a question about his text?


In fairness to Michael Crawley, whose books are useful and very clear 
(although not well-liked on this list for some reason):


1. The example quoted by Corrado Topi is not an actual example. 
Instead, it is an isolated line of code given to illustrate the 
simplicity of glm() syntax and its relation to lm() syntax. This is 
in a short general topic overview chapter on GLMs meant to introduce 
concepts and terminology, not runnable code.


2. The example chapter is followed in the book by individual 
chapters on each type of GLM covered (count data, count data in 
tables, proportion data, binary response variables). If Corrado Topi 
had looked in the relevant chapter, he would find numerous worked out 
examples with runnable code.


Corrado Topi made an error in trying to run an isolated line of code 
without antecedent definitions, which almost never works in any 
programming system. Michael Crawley made a mistake in judgment in 
assuming that detail later will suffice for generality now.


My advice to Corrado Topi is to engage in some forward referencing, and 
read chapters 16 and 17 before deciding which example code to run.



Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weighted.median function from package R.basic

2010-03-30 Thread Brian S Cade
While perhaps not the solution you were looking for, you might consider 
estimating weighted medians with linear quantile regression (just specify 
an intercept for single sample analysis, tau=0.50, and weights = your 
weights) in the quantreg package. Quantile regression does not require 
sorting to estimate medians (it minimizes an objective function) and thus 
might require less computing time on a large data set. 
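
A minimal sketch of the two routes mentioned here, on made-up data. The
helper name weighted_median_sort is hypothetical and simply implements the
usual sort-and-accumulate definition; the quantile regression call uses
rq() from the quantreg package and only runs if that package is installed.
No claim is made about which route is faster on a given dataset.

```r
# Weighted median by sorting: the smallest x whose cumulative share of the
# total weight reaches 0.5.
weighted_median_sort <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) / sum(w) >= 0.5)[1]]
}

x <- c(1, 2, 3, 4)   # made-up values
w <- c(1, 1, 1, 5)   # made-up weights
wm_sort <- weighted_median_sort(x, w)

# Intercept-only quantile regression at tau = 0.5 with the same weights.
if (requireNamespace("quantreg", quietly = TRUE)) {
  fit <- quantreg::rq(x ~ 1, tau = 0.5, weights = w)
  wm_rq <- unname(coef(fit))
}
```

When the cumulative weight hits exactly one half at some observation, the
weighted median is not unique and the two approaches may legitimately return
different values from the same interval.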

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  brian_c...@usgs.gov
tel:  970 226-9326



From: Joris Meys jorism...@gmail.com
To: R mailing list r-help@r-project.org
Date: 03/30/2010 10:39 AM
Subject: [R] weighted.median function from package R.basic
Sent by: r-help-boun...@r-project.org



Dear all,

I want to apply a weighted median on a huge dataset, and I remember a
function from the package R.basic that could do this using an internal
sorting algorithm, qsort. This sped things up quite a bit. Alas, I can't
find that package anywhere anymore. There is a weighted.median function in
the package limma too, but I haven't used that before.

Does anybody know what happened to R.basic?

Cheers
Joris

-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



[R] for loop; lm() regressions; list of vectors

2010-03-30 Thread Driss Agramelal
##  Hello everyone,
##
##  I am trying to execute 150 times a lm regression using the 'for' loop,
##  with 150 vectors for y,
##
##  and always the same vector for x.
##
##  I have an object with 150 elements named a,
##
##  and a vector of 60 values named b.
##
##  Each element in a has 60 values plus a header.
##
##  When I type:

r <- lm(i ~ b)

for(i in a) print(r)

##  I get 150 times the lm results of the first element of a regressed
##  with b,
##
##  whereas I would like to have 150 different regression results from each
##  element in a...
##
##  Can someone please help me with the syntax of my loop?
##
##  Many Thanks,
##
##  Driss Agramelal
##
##  Switzerland
##
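
One possible reading of the question, sketched on simulated data: the call to
lm() has to sit inside the loop (or inside lapply()), so that it is evaluated
once per element of a rather than once before the loop starts. The objects
below are made up to match the description (150 series of 60 values, one
common regressor of length 60); the name fits is hypothetical.

```r
set.seed(1)
b <- rnorm(60)                                  # common regressor, 60 values
a <- as.data.frame(replicate(150, rnorm(60)))   # 150 columns of 60 values each

# Fit one regression per column of a; each result is kept in a list.
fits <- lapply(a, function(y) lm(y ~ b))

# Collect intercept and slope of every fit in a 150 x 2 matrix.
coefs <- t(sapply(fits, coef))
head(coefs, 3)
```

Keeping the fitted models in a list makes it easy to pull out other
quantities later, for example sapply(fits, function(f) summary(f)$r.squared).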



Re: [R] Multivariate hypergeometric distribution version of phyper()

2010-03-30 Thread Peter Ehlers

Karl,

I strongly support Chuck's recommendations.
If you do still want to compute such probabilities 'by hand',
you could consider the lchoose() function which does work
for your example.

 -Peter Ehlers
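
A minimal sketch of the lchoose() route, using the numbers from the example
quoted further down in this message. Working with logarithms avoids the Inf
values that choose() produces; the result is checked against dhyper(). The
helper log_dmvhyper is a hypothetical name for the standard multivariate
hypergeometric probability on the log scale, and whether that model suits
gene lists at all is a separate question raised in the quoted reply.

```r
N <- 45101  # total number of balls in the urn
m <- 720    # number of 'white' balls in the urn
n <- 801    # number of balls drawn
k <- 40     # number of 'white' balls drawn

# Hypergeometric probability computed on the log scale.
log_p <- lchoose(m, k) + lchoose(N - m, n - k) - lchoose(N, n)
p <- exp(log_p)
p_check <- dhyper(k, m, N - m, n)   # built-in density for comparison

# Multivariate version: K = counts per colour in the urn, x = counts drawn.
log_dmvhyper <- function(x, K) {
  sum(lchoose(K, x)) - lchoose(sum(K), sum(x))
}
p_mv <- exp(log_dmvhyper(x = c(k, n - k), K = c(m, N - m)))
```

With only two colours the multivariate form reduces to the ordinary
hypergeometric, which gives a simple sanity check before adding a third.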

On 2010-03-30 9:55, Charles C. Berry wrote:

On Tue, 30 Mar 2010, Karl Brand wrote:


Dear R Users,

I employed the phyper() function to estimate the likelihood that the
number of genes overlapping between 2 different lists of genes is due
to chance. This appears to work appropriately.

Now I want to try this with 3 lists of genes, which phyper() does not
appear to support.

Some googling suggests I can utilize the Multivariate hypergeometric
distribution to achieve this, e.g.:

http://en.wikipedia.org/wiki/Hypergeometric_distribution

But when I try to do this manually using the choose() function (see
attempt below example with just two gene lists) I'm unable to perform
the calculations: the numbers hit infinity before getting an answer.

Searching CRAN archives for Multivariate hypergeometric shows this
term in the vignettes of the packages ‘combinat’ and ‘forward’. But I'm
unable to make sense of these package functions in the context
of my aforementioned application.

Can someone suggest a function, script or method to achieve my goal
of estimating the likelihood of overlap between 3 lists of genes,
ideally using the multivariate hypergeometric, or anything else for
that matter?



Two suggestions:

1) Don't! Likely the theory is unsuited for the application. In
most applications that generate lists of genes, the genes are
not iid realizations and the hypergeometric gives results that
are astonishingly anticonservative. As an alternative, the
block bootstrap may be suitable. See
http://171.66.122.45/cgi/content/abstract/17/6/760

and Google (scholar) 'genomic block bootstrap' for some
starting points.


2) Take this thread to the bioconductor list. You are much
more likely to get pointers to useful packages and functions
for genomic statistical software there.

HTH,

Chuck




cheers in advance,

Karl



# example attempt with two gene lists, m and n
N <- 45101 # total number balls in urn
m <- 720 # number of 'white' or 'special' balls in urn, aka 'success'
n <- 801 # number balls drawn or number of samples
k <- 40 # number of 'white' or 'special' balls DRAWN

a <- choose(m,k)
b <- choose((N-m),(n-k))
z <- choose(N,n)
prK <- (a*b)/z # 'the answer'
print(prK)
[1] NaN
a
[1] 7.985852e+65
b
[1] Inf
z
[1] Inf


--
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268




Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901





--
Peter Ehlers
University of Calgary



Re: [R] about the possible errors in Rgraphviz Package

2010-03-30 Thread Duncan Murdoch

On 30/03/2010 1:24 PM, HU,ZHENGJUN wrote:

Hi Duncan,

 (They are pretty hard to find, but I think you can find them on 
 the Bioconductor site.)  It is  not enough to install the 
 Rgraphviz package, you also need to install Graphviz.


  Yes I did. Before installing the Rgraphviz package successfully, 
(1) I downloaded graphviz-2.26.3.msi for MS Windows (XP) and 
installed it successfully and (2) I also installed the packages 
from Bioconductor by: (Note: I use MS Windows XP and R 2.10.1 
version)


  


From the instructions:

The right version of Graphviz for Bioconductor 2.5 is version
2.20.3.1.

Duncan Murdoch
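
A quick, hedged check that may help here: a "LoadLibrary failure" for a
package DLL on Windows is often a sign that a DLL it depends on cannot be
found, so it is worth confirming that the Graphviz executables are visible
from inside R. The sketch below only uses base R; it assumes that the
Graphviz program `dot` lives in the same directory as the Graphviz DLLs,
which is the usual layout but is not guaranteed.

```r
# Look for the Graphviz 'dot' executable on the search path seen by R.
dot_path <- Sys.which("dot")

if (nzchar(dot_path)) {
  message("Graphviz found at: ", dot_path)
  # 'dot -V' reports the installed Graphviz version.
  system2(dot_path, "-V")
} else {
  message("Graphviz 'dot' is not on the PATH seen by R.")
}
```

If nothing is found, adding the Graphviz bin directory to PATH before
starting R, and checking that the reported version matches the one named in
the installation instructions, would be the next things to try.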


source("http://www.bioconductor.org/biocLite.R")
biocLite()

  I got those error messages:

Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library
 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll':
   LoadLibrary failure:  The specified module could not be 
found.


  It seems to be a problem with the package, because it should 
go to 
'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll' 
instead of

'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'

  Thank you for the reply. Howard


On Tue Mar 30 12:50:44 EDT 2010, Duncan Murdoch 
murd...@stats.uwo.ca wrote:


 On 30/03/2010 10:44 AM, HU,ZHENGJUN wrote:
 Hi All,
 
   I tried to install the package Rgraphviz in the following 
 two ways successfully:
 
 source("http://bioconductor.org/biocLite.R")
 biocLite("Rgraphviz")
 
 install.packages(pkgs="C:/Progra~1/R/lib_download/Rgraphviz_1.24.0.zip", 
 lib="C:/Progra~1/R/R-2.10.1/library", repos=NULL)
 
 but when I loaded the package through library(Rgraphviz) or 
 library("Rgraphviz"), I got the same error message below:
 
 Error in inDL(x, as.logical(local), as.logical(now), ...) :
   unable to load shared library 
 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll':

   LoadLibrary failure:  The specified module could not be found.
 
 
 Most likely the problem is that you haven't followed the 
 installation instructions.  (They are pretty hard to find, but I 
 think you can find them on the Bioconductor site.)  It is not 
 enough to install the Rgraphviz package, you also need to install 
 Graphviz.
 
 Duncan Murdoch
 I think that it is the error in the package because it should go 
 to 'C:/PROGRA~1/R/R-2.10.1/library/Rgraphviz/libs/Rgraphviz.dll' 
 instead of 
 'C:/PROGRA~1/R/R-210~1.1/library/Rgraphviz/libs/Rgraphviz.dll'
 
 Could anyone help me to solve the problem?

 Thank you very much for the help. Howard
 
 
 
 
 




--
HU,ZHENGJUN





Re: [R] MySQL and RODBC - limitations

2010-03-30 Thread jorgusch

I found the solution.
The problem was indeed R. 

There is a simple way to solve the problem, but it just needs a bit more
time. 
If you download large integers from a database, convert them on the fly with

SELECT CONVERT(yourcolumn,char)

That is it. This is no problem, as long as you do NO comparisons within these
columns. If you compare two entries, say entry10 with entry11 ('13' with '2'),
then the result will be wrong if both values do not have the same number of
characters. Hence, if you have numbers, you must fill up the empty slots
with zeros. So it would look like: '13' compared with '02'.
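
The comparison pitfall described above can be reproduced in R itself. This
is a small sketch with the two values from the message; the variable names
are made up, and sprintf() is just one way to do the zero padding on the R
side.

```r
# As character strings, "13" sorts before "2" because "1" < "2".
as_text <- "13" > "2"

# As numbers the comparison is the expected one.
as_number <- 13 > 2

# Zero-padding to a common width restores the numeric ordering for strings.
padded <- sprintf("%02d", c(13L, 2L))   # "13" "02"
as_padded <- padded[1] > padded[2]

c(as_text = as_text, as_number = as_number, as_padded = as_padded)
```

The padding width has to be at least the length of the longest number in the
column, otherwise the same problem reappears for the longer values.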
-- 
View this message in context: 
http://n4.nabble.com/MySQL-and-RODBC-limitations-tp1692743p1745570.html
Sent from the R help mailing list archive at Nabble.com.



[R] dataframe in loop

2010-03-30 Thread Muting Zhang

hello all:

I would like to thank those who helped me out of the string
problem, but now I have another problem.

I used R to query from SQL and got a list of crsp_fundno of G-style mutual
funds that are still alive. I used the following code and got what I want:

library(RODBC)
channel <- odbcConnect("CRSPFUND")
g.crspfundno <- sqlQuery(channel, "select crsp_fundno from Fund_style where wbrger_obj_cd = 'G' order by crsp_fundno")
g.crspfundno  # got crsp_fundno of G-style funds from the Fund_style table
y.crspfundno <- sqlQuery(channel, "select crsp_fundno from Fund_hdr where dead_flag = 'N' and end_dt=20091231 order by crsp_fundno")
y.crspfundno  # got crsp_fundno of still-alive funds from the Fund_hdr table
g$key <- paste(g.crspfundno$crsp_fundno)
y$key <- paste(y.crspfundno$crsp_fundno)
v.fundno <- intersect(g$key, y$key)  # intersect gives the crsp_fundno of G-style
                                     # mutual funds that are still alive
v.fundno

What I need to do next is use the v.fundno I got to query another table,
Monthly_returns, to get the mret corresponding to every v.fundno.
I have only a basic idea of the code:
for (i in 1:length(v.fundno)){
gmret <- sqlQuery(channel, paste("select mret from Monthly_returns where crsp_fundno =", test[i], "and caldt > 19900630 order by caldt"))
}

The loop doesn't work :( I realize the problem might be that I didn't
define the data frame, but my limited knowledge can't help me find out
how.


I will give you guys an example of my data:
head(v.fundno)
test <- head(v.fundno)
test
[1] 2899 2903 2960 3094 3095 3211
If I don't do the loop and query for one fund, say 2899,
gmret.2899 <- sqlQuery(channel, "select caldt, mret from Monthly_returns where crsp_fundno = 2899 and caldt > 19900630 order by caldt")
gmret.2899
it will give me what I want:
 sample2899 <- head(gmret.2899)
 sample2899
 caldt mret
1 19900731  0.014204546
2 19900831 -0.050420168
3 19900928 -0.039823009
4 19901031  0.006144393
5 19901130  0.054961832
6 19901231  0.019632639

Can anybody help me with the loop?
Thanks a lot

Muting
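
A sketch of one way the loop could keep every result instead of overwriting
gmret on each pass: store each query result in a list element named after
the fund. Because no database is available here, fake_query() is a made-up
stand-in for sqlQuery(channel, sql) that returns two fixed rows; the fund
numbers and the query text are taken from the message, everything else is
an assumption.

```r
# Hypothetical stand-in for sqlQuery(channel, sql); returns made-up rows.
fake_query <- function(sql) {
  data.frame(caldt = c(19900731, 19900831), mret = c(0.0142, -0.0504))
}

fund_ids <- c("2899", "2903", "2960")

# One list slot per fund, filled inside the loop.
results <- vector("list", length(fund_ids))
names(results) <- fund_ids
for (id in fund_ids) {
  sql <- paste("select caldt, mret from Monthly_returns where crsp_fundno =",
               id, "and caldt > 19900630 order by caldt")
  results[[id]] <- fake_query(sql)
}

# Stack everything into one data frame with the fund number as a column.
all_returns <- do.call(rbind, lapply(fund_ids, function(id) {
  data.frame(crsp_fundno = id, results[[id]])
}))
```

With a live connection, replacing fake_query(sql) by sqlQuery(channel, sql)
should be the only change needed, assuming the query itself is valid for the
database.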



Re: [R] MySQL and RODBC - limitations

2010-03-30 Thread Duncan Murdoch

On 30/03/2010 1:35 PM, jorgusch wrote:

I found the solution.
The problem was indeed R. 


There is a simple way to solve the problem, but it just needs a bit more
time. 
If you download large integers from a database, convert them on the fly with


SELECT CONVERT(yourcolumn,char)

That is it. This is no problem, as long as you do NO comparisons within these
columns. If you compare two entries, say entry10 with entry11 ('13' with '2'),
then the result will be wrong if both values do not have the same number of
characters. Hence, if you have numbers, you must fill up the empty slots
with zeros. So it would look like: '13' compared with '02'.
  


If your longest integer is 10 digits (as mentioned earlier), you might 
do better to convert them to doubles rather than char.  I don't know how 
to say double in mySQL, but if you can figure that out, you should be 
good to about 15 digits.
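
The "about 15 digits" figure can be checked from the R side. This is a small
sketch, with made-up variable names, showing that a 10-digit integer is
stored exactly as a double even though it exceeds R's integer range, and
where exactness of whole numbers stops.

```r
big <- 9999999999                    # 10 digits, stored as a double in R

exceeds_int <- big > .Machine$integer.max   # too large for R's integer type
still_exact <- (big + 1) - big == 1          # neighbouring integers stay distinct

# Doubles represent every whole number exactly only up to 2^53 (about 9e15).
limit <- 2^53
lost_precision <- (limit + 1) == limit
```

So values of up to 15 digits round-trip safely as doubles, while 16-digit
identifiers may not and would still need to travel as character strings.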


Duncan Murdoch



[R] open help files in browser

2010-03-30 Thread Martin Batholdy
Hi,


Is there a way to open help files in the default web browser instead of a new 
R-window
when I use the help-functions (like ?, help.search() etc.)?



thanks!



Re: [R] Error singular gradient matrix at initial parameter estimates in nls

2010-03-30 Thread Bert Gunter
Your model is almost certainly over-parameterized (given the data that you
have to fit it), and the asymptotic correlation matrix of the parameters
that you should get from the solutions that converged will probably have
some large off diagonal elements. In other words, your model is essentially
non-identifiable.

If you don't know what the above means, you shouldn't be using nls.

Bert Gunter
Genentech Nonclinical Biostatistics
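
To make the point about the correlation matrix concrete, here is a minimal
sketch on simulated data. It fits the model form from the quoted question
with only two predictors, using nls() with algorithm = "port" and lower
bounds of 0, then prints the estimated parameter correlations. The data,
the true parameter values and the starting values are all made up; with
many more parameters or less informative data, off-diagonal entries close
to +1 or -1 would be the warning sign.

```r
set.seed(42)
n  <- 200
p1 <- runif(n)
p2 <- runif(n)

# Simulated response with k0 = 0.2, k1 = 1, k2 = 0.5 plus a little noise.
y <- 1 - exp(-(0.2 + 1 * p1 + 0.5 * p2)) + rnorm(n, sd = 0.02)
dat <- data.frame(y, p1, p2)

fit <- nls(y ~ 1 - exp(-(k0 + k1 * p1 + k2 * p2)),
           data = dat,
           start = list(k0 = 0.1, k1 = 0.5, k2 = 0.5),
           algorithm = "port",
           lower = c(0, 0, 0))

# Correlation matrix of the parameter estimates.
cor_mat <- summary(fit, correlation = TRUE)$correlation
round(cor_mat, 2)
```

Note that nls() is documented as unsuitable for artificial zero-residual
data, which is why noise is added to the simulated response.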
 
 
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Gabor Grothendieck
Sent: Tuesday, March 30, 2010 4:25 AM
To: Corrado
Cc: r-help@r-project.org
Subject: Re: [R] Error singular gradient matrix at initial
parameter estimates in nls

You could try method="brute-force" in the nls2 package to find starting
values.

On Tue, Mar 30, 2010 at 7:03 AM, Corrado ct...@york.ac.uk wrote:
 I am using nls to fit a non linear function to some data.

 The non linear function is:

 y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

 I have chosen algorithm "port", with lower boundary 0 for all of the ki
 parameters, and I have tried many start values for the parameters ki
 (including generating them at random).

 If I fit the non linear function to the same data using an external
 algorithm, it fits perfectly and finds the parameters.

 As soon as I come to my R installation (2.10.1 on Kubuntu Linux 9.10 64 bit),
 I keep getting the error:

 Error in nlsModel(formula, mf, start, wts, upper) :   singular gradient
 matrix at initial parameter estimates

 I have read all the previous postings and the documentation, but to no
 avail: the error is there to stay. I am sure the problem is with nls,
 because the external fitting algorithm perfectly fits it in less than a
 second. Also, if my n is 4, then the nls works perfectly (but that excludes
 all the k5 ... kn).

 Can anyone help me with suggestions? Thanks in advance.

 Alternatively, what do you suggest I should do? Shall I abandon nls in
 favour of optim?

 Regards

 --
 Corrado Topi
 PhD Researcher
 Global Climate Change and Biodiversity
 Area 18,Department of Biology
 University of York, York, YO10 5YW, UK
 Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk





Re: [R] Code is too slow: mean-centering variables in a data frame by subgroup

2010-03-30 Thread Bert Gunter
?scale

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Dimitri Liakhovitski
Sent: Tuesday, March 30, 2010 8:05 AM
To: r-help
Subject: [R] Code is too slow: mean-centering variables in a data frame
by subgroup

Dear R-ers,

I have  a large data frame (several thousands of rows and about 2.5
thousand columns). One variable (group) is a grouping variable with
over 30 levels. And I have a lot of NAs.
For each variable, I need to divide each value by variable mean - by
subgroup. I have the code but it's way too slow - takes me about 1.5
hours.
Below is a data example and my code that is too slow. Is there a
different, faster way of doing the same thing?
Thanks a lot for your advice!

Dimitri


# Building an example frame - with groups and a lot of NAs:
set.seed(1234)
frame <- data.frame(group=rep(paste("group",1:10),10), a=rnorm(1:100),
  b=rnorm(1:100), c=rnorm(1:100), d=rnorm(1:100), e=rnorm(1:100),
  f=rnorm(1:100), g=rnorm(1:100))
frame <- frame[order(frame$group),]
names.used <- names(frame)[2:length(frame)]
set.seed(1234)
for(i in names.used){
   i.for.NA <- sample(1:100,60)
   frame[[i]][i.for.NA] <- NA
}
frame

### Code that does what's needed but is too slow:
Start <- Sys.time()
frame <- do.call(cbind, lapply(names.used, function(x){
  unlist(by(frame, frame$group, function(y) y[,x] / mean(y[,x],na.rm=T)))
}))
Finish <- Sys.time()
print(Finish-Start) # Takes too long

-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com
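
One alternative that may be faster than calling by() once per column, offered
as a sketch rather than a benchmarked solution: ave() applies a function
within groups and returns a vector in the original row order, so it can be
mapped over the columns directly. The example uses a smaller made-up frame
with two variables; by_group_ratio is a hypothetical helper name.

```r
set.seed(1234)
frame <- data.frame(group = rep(paste("group", 1:10), 10),
                    a = rnorm(100), b = rnorm(100))
frame$a[sample(100, 60)] <- NA          # many NAs in one column

vars <- c("a", "b")

# Divide each value by the mean of its own group, ignoring NAs in the mean.
by_group_ratio <- function(x, g) {
  ave(x, g, FUN = function(v) v / mean(v, na.rm = TRUE))
}

scaled <- frame
scaled[vars] <- lapply(frame[vars], by_group_ratio, g = frame$group)

# Each group of a complete column now has mean 1.
group_means_b <- tapply(scaled$b, scaled$group, mean)
```

Unlike the by() version, the rows keep their original order and the group
column is preserved, so no re-sorting or re-binding is needed afterwards. A
group with no observed values in a column ends up as NaN for that column.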



Re: [R] open help files in browser

2010-03-30 Thread Henrik Bengtsson
If you're not already using R v2.10.0 or newer, try that first.

My $.02

/Henrik
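
For reference, a minimal sketch of the setting involved in R 2.10.0 and
later: the help_type option selects the HTML help, which is shown in the
default browser. Whether a browser actually opens depends on the platform
and front end, so the calls that would display a page are left commented
out; the variable names are made up.

```r
# Ask for HTML help for the rest of the session, remembering the old setting.
old <- options(help_type = "html")
current <- getOption("help_type")

# With the option set, these would open in the browser:
# ?mean
# help.search("median")

# The same choice can be made for a single call:
# help("mean", help_type = "html")

# Restore the previous setting.
options(old)
```

To make the choice permanent, the options() line can go into a startup file
such as .Rprofile.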

On Tue, Mar 30, 2010 at 7:46 PM, Martin Batholdy
batho...@googlemail.com wrote:
 Hi,


 Is there a way to open help files in the default web browser instead of a new 
 R-window
 when I use the help-functions (like ?, help.search() etc.)?



 thanks!
