Re: [R] Can't find -lg2c when installing randomForest

2009-01-20 Thread Prof Brian Ripley

How did you install R?

This looks like an issue where R was built with g77 but g77 is no
longer present.


A system using gcc 4.1.2 should be using gfortran, not g77 (for
compatibility).


R 2.7.1 is obsolete: I suggest you install a current R from the 
sources (or a source RPM).


On Tue, 20 Jan 2009, Richard Yanicky wrote:


I have searched the help archives and can't find a direct reference to the
following issue:



When installing randomForest under CentOS 5.2 (R version 2.7.1 with gcc
4.1.2):



we receive the following error (see below: cannot find -lg2c), even though it
is in the path!



r...@abcsci12 ~]# R CMD INSTALL
/scisys/home/yanicrk/randomForest_4.5-28.tar.gz

* Installing to library '/usr/lib64/R/library'

* Installing *source* package 'randomForest' ...

** libs

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c classTree.c -o classTree.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c regrf.c -o regrf.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c regTree.c -o regTree.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c rf.c -o rf.o

g77   -fpic  -O2 -g -c rfsub.f -o rfsub.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c rfutils.c -o rfutils.o

gcc -shared -Wl,-O1 -o randomForest.so classTree.o regrf.o regTree.o rf.o
rfsub.o rfutils.o  -lg2c -lm -L/usr/lib64/R/lib -lR

/usr/bin/ld: cannot find -lg2c

collect2: ld returned 1 exit status

make: *** [randomForest.so] Error 1

ERROR: compilation failed for package 'randomForest'

** Removing '/usr/lib64/R/library/randomForest'



Any assistance would be greatly appreciated.



Rich

[[alternative HTML version deleted]]




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] A question on histogram (hist): coordinates on x-axis are too sparse

2009-01-20 Thread Li, Hua
Dear R helpers:
 Let's say I have some data  X,   
 X <- runif(1000, 1, 100)
 pdf('X.pdf', width=100,height=5)
 hist(X, breaks=1000)
 dev.off()
 I find that, on the x-axis, the tick labels are 0e+00, 2e+09, 4e+09, 6e+09, 
8e+09, 1e+10.  Only five numbers, which is too sparse in a 100x5 pdf file.  I 
want the x-axis tick marks to be denser, e.g. 0e+00, 1e+09, 2e+09, 
3e+09, ..., 8e+09, 9e+09, 1e+10.  What argument (or function) should I change 
to make this happen?
 Thanks a lot!!
Best, Hua
 
***
HUA LI
Graduate student
Biomathematics & Biostatistics
The University of Texas Health Science Center At Houston
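
A minimal sketch of one common approach (the tick positions below are only
illustrative): suppress the default x-axis with xaxt = "n" and then draw a
denser axis yourself with axis().

  X <- runif(1000, 1, 100)
  pdf('X.pdf', width = 100, height = 5)
  hist(X, breaks = 1000, xaxt = "n")   # histogram without the default x-axis
  axis(1, at = pretty(X, n = 20))      # add as many tick marks as you like
  dev.off()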

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with rbind

2009-01-20 Thread SNN

Hi All,

I have a problem with rbind.
I have data consisting of weight, height, etc. for 1000 patients. I would
like to find the mean and the standard deviation (for the weight, height,
etc.) for each gender.


data<-read.table("data.txt", header=T, sep='\t')
fdata=NULL

for (i in 1:50){

nn<-names(X)[i]

m<-tapply(X[,i],data$gender,mean,na.rm=T)
s<-tapply(X[,i], data$gender, sd,na.rm=T)

p<-cbind(mean=m,sd.dev=s)

cn<-paste(nn,colnames(p),sep="_")
 
colnames(p)<-cn


fdata<-rbind(fdata,p)
} 
write.table(fdata, "results.txt", sep='\t', quote=FALSE, col.names=T)


Here is the problem:
1.  I have a header for each table, but only the first one is printed.
2.  The weight_mean is supposed to be at the top of the means, but it appears
at the top of the first column (with no tab before the header).

weight_meanweight_sd.dev 
F  14.3  4.932883 
M  34.7 10.692677 
F  35.0  7.071068 
M  34.7 10.692677 
.
.
.

I want the result to look like this, with a line separating the tables and
each table having a header:

   weight_meanweight_sd.dev 
 F 14.3  4.932883 
M 34.7 10.692677 

   hight_meanhight_sd.dev
F 35.0  7.071068 
M 34.7 10.692677 

3. Is there a way to make a title for each table? For example:
   
weight
weight_meanweight_sd.dev 
 F 14.3  4.932883 
M 34.7 10.692677 



I appreciate your help,


-- 
View this message in context: 
http://www.nabble.com/problem-with-rbind-tp21577241p21577241.html
Sent from the R help mailing list archive at Nabble.com.
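
A minimal sketch of one way to get a separate, titled table per variable
(assuming X holds the numeric measurement columns of 'data'; the file name is
just an example): write each small table with its own title and header instead
of rbind()-ing everything into one matrix.

  con <- file("results.txt", open = "w")
  for (nn in names(X)) {
    m <- tapply(X[[nn]], data$gender, mean, na.rm = TRUE)
    s <- tapply(X[[nn]], data$gender, sd,   na.rm = TRUE)
    p <- cbind(mean = m, sd.dev = s)
    colnames(p) <- paste(nn, colnames(p), sep = "_")
    cat(nn, "\n", file = con)                              # title line
    write.table(p, con, sep = "\t", quote = FALSE, col.names = NA)
    cat("\n", file = con)                                  # blank line between tables
  }
  close(con)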

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] trouble converting an array to a dataframe

2009-01-20 Thread Christopher W. Ryan
I start with a dataframe called xrays. It contains scores on films from
each of two radiologists. It is in "long" format. I used the reshape
package to melt a data frame and then cast it into "wide" format, one
line for each patient (identified by redlognumb) with scores from both
radiologists for a given patient on the same line.

I named the result of the casting xrays.data. It is an array. I'd like
it to be a three-variable dataframe, with one column for scores from
each of two radiologists, and one column for redlognumb (because I will
then need to merge it with another dataframe that has a column named
redlognumb.) As you can see below, the data.frame() function turns
xrays.data into a two-variable dataframe. How can I get three columns
(or variables) into my final dataframe?

Thanks.


> head(xrays)
  redlognumb radiologis barrtotal
1          3          2        13
2          4          2        16
3          5          2        10
4          6          2        11
5          9          2        NA
6         10          2        NA

>melted.xrays <- melt(xrays, id=c("redlognumb","radiologis"))
>
> head(melted.xrays <- na.omit(melt(xrays,
  id=c("redlognumb","radiologis"))))

  redlognumb radiologis  variable value
1          3          2 barrtotal    13
2          4          2 barrtotal    16
3          5          2 barrtotal    10
4          6          2 barrtotal    11
7          1          1 barrtotal    11
8          2          1 barrtotal     2
>

> cast(melted.xrays.2, redlognumb~radiologis~variable)

          radiologis
redlognumb  1  2
         1 11 NA
         2  2 NA
         3 12 13
         4 16 16
         5 12 10
   . .  . cut off for brevity . . .

> str(xrays.data)
 int [1:42, 1:2, 1] 11 2 12 16 12 13 18 8 19 14 ...
 - attr(*, "dimnames")=List of 3
  ..$ redlognumb: Named chr [1:42] "1" "2" "3" "4" ...
  .. ..- attr(*, "names")= chr [1:42] "1" "2" "3" "5" ...
  ..$ radiologis: Named chr [1:2] "1" "2"
  .. ..- attr(*, "names")= chr [1:2] "1" "80"
  ..$ variable  : Named chr "barrtotal"
  .. ..- attr(*, "names")= chr "1"
> data.frame(xrays.data)
  X1.barrtotal X2.barrtotal
1           11           NA
2            2           NA
3           12           13
4           16           16
5           12           10
6           13           11
7           18           NA
8            8           NA
. . . . cut off for brevity . . .
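
A minimal sketch of one way to get the three-column data frame (the column
names rad1 and rad2 are just examples): drop the length-one 'variable'
dimension and carry the redlognumb dimnames over as a real column.

  mat <- xrays.data[, , 1]                  # 42 x 2 matrix, one row per patient
  xrays.wide <- data.frame(redlognumb = rownames(mat),
                           rad1 = mat[, "1"],
                           rad2 = mat[, "2"],
                           row.names = NULL,
                           stringsAsFactors = FALSE)
  str(xrays.wide)    # three columns; redlognumb is character here, ready for merge()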


-- 
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
40 Arch Street, Johnson City, NY  13790
cryanatbinghamtondotedu
PGP public keys available at http://home.stny.rr.com/ryancw/

"If you want to build a ship, don't drum up the men to gather wood,
divide the work and give orders. Instead, teach them to yearn for the
vast and endless sea."  [Antoine de St. Exupery]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Concave Hull

2009-01-20 Thread roger koenker

Actually,  I think that the survey on "alpha shapes" available from:

http://www.cs.duke.edu/~edels/Surveys/

would be more closely aligned with what Michael was interested in...


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   rkoen...@uiuc.edu                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jan 20, 2009, at 6:06 PM, David Winsemius wrote:

The OP was asking whether concave hulls have been implemented. He  
wasn't very helpful with his link giving the example, since it was  
to the "outside" of a frame-based website. Perhaps this link (see  
the bottom of that page) will be more helpful:


http://get.dsi.uminho.pt/local/results.html

It has been discussed (briefly) in r-help:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/75574.html

Some of the material in Loader's "Local Regression and Likelihood"  
regarding classification looks potentially applicable.


When first I saw this question I expected that one of the r-sig-Geo  
folks would have a ready answer. Perhaps a follow-up there would be  
a reasonable next step?


--
David Winsemius

On Jan 20, 2009, at 6:37 PM, Charles Geyer wrote:


Message: 64
Date: Mon, 19 Jan 2009 15:14:34 -0700
From: Greg Snow 
Subject: Re: [R] Concave Hull
To: Michael Kubovy , r-help

Message-ID:

Content-Type: text/plain; charset="us-ascii"

I don't know if it is the same algorithm or not, but there is the  
function "chull" that finds the convex hull.


Also the R function "redundant" in the contributed package "rcdd"  
efficiently
finds convex hulls in d-dimensional space for arbitrary d (chull  
only does

d = 2).  See Sections 4.2 and 5.2 of the rcdd package vignette.


Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Michael Kubovy
Sent: Saturday, January 17, 2009 9:49 AM
To: r-help
Subject: [R] Concave Hull

Dear Friends,

Here is an algorithm for finding concave hulls:
http://get.dsi.uminho.pt/local/

Has anyone implemented such an algorithm in R?

RSiteSearch('concave hull') didn't reveal one (I think).

_
Professor Michael Kubovy
University of Virginia
Department of Psychology
Postal Address:
P.O.Box 400400, Charlottesville, VA 22904-4400
Express Parcels Address:
Gilmer Hall, Room 102, McCormick Road, Charlottesville, VA 22903
Office:B011;Phone: +1-434-982-4729
Lab:B019;   Phone: +1-434-982-4751
WWW:http://www.people.virginia.edu/~mk9y/
Skype name: polyurinsane





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Serious difference between the result of ADF test in R and Eviews

2009-01-20 Thread RON70

I found a serious difference between the results of the ADF test in R and in
Eviews for the following data:

> dat
 V1
1  -0.075851693
2  -0.046125504
3  -0.009117161
4   0.025569817
5   0.034882743
6   0.073671497
7   0.063805297
8   0.062306796
9   0.072343820
10  0.058354121
11 -0.007635359
12  0.086790779
13  0.085487789
14  0.113577103
15  0.021293381
16  0.089423068
17  0.090485998
18  0.128847827
19  0.011859335
20  0.058794744
21  0.065909368
22  0.020887431
23  0.085387467
24  0.097375525
25  0.108981417
26  0.044289044
27  0.071428571
28  0.052430556
29  0.056307049
30  0.041957314

The R result says the unit root hypothesis cannot be rejected:

> adf.test(dat, k=1)

Augmented Dickey-Fuller Test

data:  dat 
Dickey-Fuller = -3.5458, Lag order = 1, p-value = 0.0554
alternative hypothesis: stationary 


However, below is the Eviews result:

Null Hypothesis: SERIES02 has a unit root
Exogenous: Constant
Lag Length: 1 (Fixed)

                                          t-Statistic   Prob.*

Augmented Dickey-Fuller test statistic    -3.992560     0.0048
Test critical values:    1% level         -3.689194
                         5% level         -2.971853
                        10% level         -2.625121

*MacKinnon (1996) one-sided p-values.


Augmented Dickey-Fuller Test Equation
Dependent Variable: D(SERIES02)
Method: Least Squares
Date: 01/21/09   Time: 09:16
Sample (adjusted): 3 30
Included observations: 28 after adjustments

Variable            Coefficient   Std. Error   t-Statistic   Prob.

SERIES02(-1)        -0.728468     0.182456     -3.992560     0.0005
D(SERIES02(-1))     -0.182826     0.154154     -1.185993     0.2468
C                    0.046094     0.012152      3.793043     0.0008

R-squared            0.510689    Mean dependent var       0.003146
Adjusted R-squared   0.471545    S.D. dependent var       0.047487
S.E. of regression   0.034520    Akaike info criterion   -3.793578
Sum squared resid    0.029791    Schwarz criterion       -3.650842
Log likelihood      56.11010     F-statistic             13.04615
Durbin-Watson stat   2.127111    Prob(F-statistic)        0.000132

Therefore Eviews rejects the null of a unit root.

Have you faced this kind of problem before, or am I missing something?
-- 
View this message in context: 
http://www.nabble.com/Serious-difference-between-the-result-of-ADF-test-in-R-and-Eviews-tp21576706p21576706.html
Sent from the R help mailing list archive at Nabble.com.
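
One thing worth checking (a sketch, not from the original post): if adf.test()
here comes from the tseries package, its test regression includes both a
constant and a linear trend, whereas the Eviews run above uses a constant only
("Exogenous: Constant").  The constant-only specification with one fixed lag
can be fitted by hand and compared with the Eviews table; note that the
p-value lm() reports for the lagged level is an ordinary regression p-value,
not a MacKinnon unit-root p-value.

  y  <- dat$V1
  dy <- diff(y)
  ## Delta y_t regressed on a constant, y_{t-1} and Delta y_{t-1}, t = 3..30
  fit <- lm(dy[-1] ~ y[2:(length(y) - 1)] + dy[-length(dy)])
  summary(fit)   # the t statistic on the lagged level is the ADF statistic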

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Handling of factors

2009-01-20 Thread Stavros Macrakis
I'm rather confused by the semantics of factors.

When applied to factors, some functions (whose results are elements of
the original factor argument) return results of class factor, some
return integer vectors, some return character vectors, some give
errors.  I understand some but not all of this.  Consider:

Preserve factors: `[`, `[[`, sort, unique, subset, head, tapply, rep, rev, by,
  sample, expand.grid,
as.matrix(structure(factor(1:3),dim=c(1,3))), data.frame, list
Convert to integers: c, ifelse, cbind/rbind
Convert to characters: intersect, union, setdiff, matrix, array,
matrix(factor(1:3),1,3),
  as.matrix(factor(1:3))
Gives error: rle
No error (output of some other type): <, ==, etc.

In the case of ordered factors:

Preserve factors: quantile (for exact quantiles only)
Gives error: min, cut, range
No error: which.min, pmin, rank
(But some operations which are meaningful only on ordered factors also
give results on unordered factors, without even a warning: which.min,
pmin, rank, quantile.)

The general principle seems to be that if the result can contain only
elements of a single factor, then a factor is returned.  I understand
this: it may not be meaningful to mingle factors with different level
sets.  But I don't understand what the problem is with rle.

If the result can contain elements from more than one factor, it is
still not clear to me what the principle is for determining whether
the factors are converted to the integers representing them, or to the
characters naming them, or that the operation gives an error.

I also don't understand what is going on with min. min is well-defined
for any class supporting a < operator, but though < works on ordered
factors as do pmin, rank, etc., min does not.  And equally strangely,
which.min and rank blithely convert *un*ordered factors to the
integers which happen to represent them, returning what are presumably
meaningless results without giving an error; while pmin appropriately
gives an error.

It is all very confusing.  Of course, most of this behavior is
documented and is easily determined by experimentation, but it would
be easier to learn and teach the language if there were some clear
principle underlying all this.  What am I missing?

  -s
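
A few lines illustrating the mixture of behaviours described above (as seen in
the R versions current when this was written; c() on factors behaves
differently in much newer releases):

  f <- factor(c("a", "b", "a"))
  unique(f)               # stays a factor
  union(f, factor("c"))   # coerced to character: "a" "b" "c"
  c(f, f)                 # coerced to the underlying integer codes
  try(rle(f))             # error: rle() insists on an atomic vector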

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Text Outside Lattice Plot

2009-01-20 Thread jimdare

Dear R users

I created the graph at the bottom using xyplot in the lattice package.  I
added a title using the main="Title"  command in xyplot, however it is
plotted too close to the legend for my liking.  To remedy this I increased
the upper margin of the plot using plot(data, position = c(0,0,1,.9)) and
attempted to move "SNA" upwards and to the right.  I have tried using a
variety of text functions such as:

trellis.focus("panel", 1, 1) 
panel.text(x=11, y=10, labels="SNA") 
trellis.unfocus() 

panel.xyplot(...) 
panel.text(x=11, y=10, labels="SNA") 

library(grid) 
ltext(grid.locator(), label='SNA') 

The first two of these functions work but the text disappears once I specify
a y coordinate > ymax.  The last function appears to work but requires me to
click on the plot to specify the location (I need this to be pre-defined). 
Does anyone know how I can do this?

Regards,
James
http://www.nabble.com/file/p21575690/SNA.gif 
-- 
View this message in context: 
http://www.nabble.com/Text-Outside-Lattice-Plot-tp21575690p21575690.html
Sent from the R help mailing list archive at Nabble.com.
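
A minimal sketch of one way to place text above a panel (the data and
coordinates are only illustrative): turn clipping off when focusing on the
panel, so panel.text() is allowed to draw outside the panel limits.

  library(lattice)
  p <- xyplot(1:10 ~ 1:10, main = "Title")
  plot(p, position = c(0, 0, 1, 0.9))
  trellis.focus("panel", 1, 1, clip.off = TRUE)   # do not clip to the panel
  panel.text(x = 9, y = 11, labels = "SNA")       # y > ymax is now drawn
  trellis.unfocus()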

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tclarray with embedded spaces in data

2009-01-20 Thread Peter Dalgaard

Ruth M. Ripley wrote:

I would like to use a tclArray:

mytkarray <- tclArray()

as the variable for a table:

table1 <- tkwidget(f1, 'table', variable= mytkarray)

but if I include character strings with embedded spaces, I get braces
appearing in the table.

I can remove them using a non-R tclarray, (the difference between the
first example of

http://www.sciviews.org/_rgui/tcltk/Tktable.html

and

http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/tktable.html

- the result of the second of which includes the braces (although not in
the image provided!) and that of the first does not - but this seems
undesirably complicated. Am I missing something simple?

Any insights would be gratefully received,


(Didn't anyone tell you to cook up a self-contained example?)

This is a right-honourable pain with Tcl, and it took me quite some time 
to reconstruct what I did several years ago on this matter. The short 
answer is that you need to assign things like


as.tclObj("foo bar   baz", drop=TRUE)

into your array. The longer story involves the ambiguity in Tcl between 
lists of words separated by whitespace and strings with spaces inside. I 
suspect that the sciviews approach doesn't actually cover all cases, 
paste()'ing raw tcl commands together usually leaves you burning in 
Quoting Hell.
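
A minimal self-contained sketch of that assignment (it assumes the Tcl
'Tktable' extension is installed, as in the original example):

  library(tcltk)
  tt <- tktoplevel()
  mytkarray <- tclArray()
  ## a single Tcl string object: the spaces survive, no braces appear
  mytkarray[[0, 0]] <- as.tclObj("foo bar   baz", drop = TRUE)
  table1 <- tkwidget(tt, "table", variable = mytkarray, rows = 1, cols = 1)
  tkpack(table1)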





Regards,

Ruth
--
Ruth M. Ripley, Email:r...@stats.ox.ac.uk
Dept. of Statistics,http://www.stats.ox.ac.uk/~ruth/
University of Oxford,   Tel:   01865 282851
1 South Parks Road, Oxford OX1 3TG, UK  Fax:   01865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Concave Hull

2009-01-20 Thread David Winsemius
The OP was asking whether concave hulls have been implemented. He  
wasn't very helpful with his link giving the example, since it was to  
the "outside" of a frame-based website. Perhaps this link (see the  
bottom of that page) will be more helpful:


http://get.dsi.uminho.pt/local/results.html

It has been discussed (briefly) in r-help:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/75574.html

Some of the material in Loader's "Local Regression and Likelihood"  
regarding classification looks potentially applicable.


When first I saw this question I expected that one of the r-sig-Geo  
folks would have a ready answer. Perhaps a follow-up there would be a  
reasonable next step?


--
David Winsemius

On Jan 20, 2009, at 6:37 PM, Charles Geyer wrote:


Message: 64
Date: Mon, 19 Jan 2009 15:14:34 -0700
From: Greg Snow 
Subject: Re: [R] Concave Hull
To: Michael Kubovy , r-help

Message-ID:

Content-Type: text/plain; charset="us-ascii"

I don't know if it is the same algorithm or not, but there is the  
function "chull" that finds the convex hull.


Also the R function "redundant" in the contributed package "rcdd"  
efficiently
finds convex hulls in d-dimensional space for arbitrary d (chull  
only does

d = 2).  See Sections 4.2 and 5.2 of the rcdd package vignette.


Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Michael Kubovy
Sent: Saturday, January 17, 2009 9:49 AM
To: r-help
Subject: [R] Concave Hull

Dear Friends,

Here is an algorithm for finding concave hulls:
http://get.dsi.uminho.pt/local/

Has anyone implemented such an algorithm in R?

RSiteSearch('concave hull') didn't reveal one (I think).

_
Professor Michael Kubovy
University of Virginia
Department of Psychology
Postal Address:
P.O.Box 400400, Charlottesville, VA 22904-4400
Express Parcels Address:
Gilmer Hall, Room 102, McCormick Road, Charlottesville, VA 22903
Office:B011;Phone: +1-434-982-4729
Lab:B019;   Phone: +1-434-982-4751
WWW:http://www.people.virginia.edu/~mk9y/
Skype name: polyurinsane





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with subset() function?

2009-01-20 Thread Steven McKinney

D'oh!  My apologies for the noise.

I thought I had verified class
from the str() output the user was 
showing me.  

> class(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
[1] "data.frame"
> class(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age), drop = TRUE))
[1] "integer"
> class(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
[1] "integer"
> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age), drop = TRUE))

Call:
density.default(x = subset(mydf, ht >= 150 & wt <= 150, select = 
c(age), drop = TRUE))

Data: subset(mydf, ht >= 150 & wt <= 150, select = c(age), drop = TRUE) (29 
obs.);  Bandwidth 'bw' = 5.816

       x                 y          
 Min.   : 4.553   Min.   :3.781e-05  
 1st Qu.:22.776   1st Qu.:3.108e-03  
 Median :41.000   Median :1.775e-02  
 Mean   :41.000   Mean   :1.370e-02  
 3rd Qu.:59.224   3rd Qu.:2.128e-02  
 Max.   :77.447   Max.   :2.665e-02  
> 



It's the "drop" arg that differs between
 density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
and
 density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

so it is
 subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age), drop = TRUE)
that is equivalent to
 mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"]


Apologies and thanks for setting me straight.


Best

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: Marc Schwartz [mailto:marc_schwa...@comcast.net]
Sent: Tue 1/20/2009 3:20 PM
To: Steven McKinney
Cc: R-help@r-project.org
Subject: Re: [R] Problem with subset() function?
 
on 01/20/2009 05:02 PM Steven McKinney wrote:
> Hi all,
> 
> Can anyone explain why the following use of
> the subset() function produces a different
> outcome than the use of the "[" extractor?
> 
> The subset() function as used in
> 
>  density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

Here you are asking density to be run on a data frame, which is what
subset returns, even when you select a single column. Thus, you get an
error since density() expects a numeric vector.

No bug in either subset() or the documentation.

You could do this:

  density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = age)[[1]])


> appears to me from documentation to be equivalent to
> 
>  density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

Here you are running density on a vector, so it works. This is because
the default behavior for "[.data.frame" has 'drop = TRUE', which means
that the returned result is coerced to the lowest possible dimension.
Thus, rather than a single data frame column, a vector is returned.

The result from subset() would be equivalent to using 'drop = FALSE'.

HTH,

Marc Schwartz


> (modulo exclusion of NAs) but use of the former yields an 
> error from density.default() (shown below).
> 
> 
> Is this a bug in the subset() machinery?  Or is it
> a documentation issue for the subset() function
> documentation or density() documentation?
> 
> I'm seeing issues such as this with newcomers to R
> who initially seem to prefer using subset() instead
> of the bracket extractor.  At this point these functions
> are clearly not exchangeable.  Should code be patched
> so that they are, or documentation amended to show
> when use of subset() is not appropriate?
> 
>> ### Bug in subset()?
> 
>> set.seed(123)
>> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
> +wt = 150 + 10 * rnorm(100),
> +age = sample(20:60, size = 100, replace = TRUE)
> +)
> 
> 
>> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
> Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = 
> c(age))) : 
>   argument 'x' must be numeric
> 
> 
>> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
> 
> Call:
>   density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])
> 
> Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.); Bandwidth 'bw' 
> = 5.816
> 
>        x                 y          
>  Min.   : 4.553   Min.   :3.781e-05  
>  1st Qu.:22.776   1st Qu.:3.108e-03  
>  Median :41.000   Median :1.775e-02  
>  Mean   :41.000   Mean   :1.370e-02  
>  3rd Qu.:59.224   3rd Qu.:2.128e-02  
>  Max.   :77.447   Max.   :2.665e-02  
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Poisson GLM

2009-01-20 Thread bbbnc

I thought that the quasipoisson family was used to model overdispersion, since
the dispersion parameter isn't fixed at one. Could you please elaborate a
little on why quasipoisson is more suitable for these non-integer Poisson
data?

Also, is it significant that vcov() shows a difference?



-- 
View this message in context: 
http://www.nabble.com/Poisson-GLM-tp21567460p21573048.html
Sent from the R help mailing list archive at Nabble.com.
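
A small sketch of the relationship (y and x below are hypothetical stand-ins
for the poster's data): quasipoisson gives the same coefficient estimates as
poisson, but its variance-covariance matrix is scaled by the estimated
dispersion, which is why vcov() differs between the two fits.

  set.seed(1)
  x <- runif(50)
  y <- rpois(50, exp(1 + x))
  fit.p  <- glm(y ~ x, family = poisson)
  fit.qp <- glm(y ~ x, family = quasipoisson)
  coef(fit.p) - coef(fit.qp)      # identical point estimates
  summary(fit.qp)$dispersion      # estimated, not fixed at 1
  vcov(fit.qp) / vcov(fit.p)      # every entry scaled by that dispersion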

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] trouble switching to 'plm' from 'xtabond' and Stata

2009-01-20 Thread Aaron M. Swoboda

Hello,

I am switching to R from Stata and I am having particular trouble with  
the transition from Stata's 'xtabond' and 'ivreg' commands to the  
"plm" package. I am trying to replicate some of the dynamic panel data  
work using the UK Employment data in Arellano and Bond (1991) and  
available as 'EmplUK' under the 'plm' package.


I have been reading "Panel Data Econometrics in R: The plm Package" by  
Croissant and Millo available at
http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf and "How  
to Do xtabond2: An Introduction to 'Difference' and 'System' GMM in  
Stata" by David Roodman available at http://www.cgdev.org/content/publications/detail/11619 
.  Roodman provides a very clear exposition of how to use Stata to  
analyze the UK Employment Data. I am trying to replicate Roodman's  
results for the UK Employment data using R instead of Stata but I am  
having limited success.


Using:
>library('plm')
>data("EmplUK", package = "plm")
>emp.plm <- plm(dynformula(emp ~ wage + capital + output, lag =  
list(2, 1, 2, 2), log = TRUE), EmplUK, effect = "time")

>summary(emp.plm)

I am able to perfectly replicate Roodman's "naive model" (on page 17)  
regressing Log(Employment) on its own first and second lags as well as  
current and first lags of log(wages) and current/first/second lags of  
capital and output. Roodman uses the Stata command "regress n nL1 nL2  
w wL1 k kL1 kL2 ys ysL1 ysL2 yr*" (n=employment, w=wages, k=capital,  
ys=output, yr*=year dummy variables, and nL1=first Lag of employment).


I am unable to replicate other results. Specifically, I cannot even  
replicate the Least Squares Dummy Variable model with effects for both  
time and firm (in Stata: xi: regress n nL1 nL2 w wL1 k kL1 kL2 ys ysL1  
ysL2 yr* i.id)


In R I tried:
>emp.lsdv <- plm(dynformula(emp ~ wage + capital + output, lag =  
list(2, 1, 2, 2), log = TRUE), EmplUK, model="within", effect =  
"twoways")

>summary(emp.lsdv)

but the coefficients do not match up with results shown on p 18 of  
Roodman. Can someone help point out what I am doing incorrectly?


Can anyone help me implement a First Differences model that also  
includes Year specific effects? First Differencing eliminates the  
individual effects, but I should still be able to add year specific  
effects, no? When I run the commands:


>emp.fd <- plm(dynformula(emp ~ wage + capital + output, lag =  
list(2, 1, 2, 2), log = TRUE),  EmplUK, model="fd", effect = "time")

>summary(emp.fd)

the output says it is running a "time" effect First-Difference Model,  
but I am unable to extract any time effects, nor can I find any

differences with the output from:

>emp.fdid <- plm(dynformula(emp ~ wage + capital + output, lag =  
list(2, 1, 2, 2), log = TRUE),  EmplUK, model="fd", effect =  
"individual")

>summary(emp.fdid)

What am I missing? Even the degrees of freedom appear the same to me.

Eventually, I would like to understand how to implement instrumental  
variables in the dynamic panel setting using General Method of Moments  
using R rather than Stata, but it seems I have quite a ways to go to  
better understand how 'plm' works. Any other resources anyone could  
point me to would be appreciated.


Thanks,

Aaron

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Can't find -lg2c when installing randomForest

2009-01-20 Thread Richard Yanicky
I have searched the help archives and can't find a direct reference to the
following issue:



When installing randomForest under CentOS 5.2 (R version 2.7.1 with gcc
4.1.2):



we receive the following error (see below: cannot find -lg2c), even though it
is in the path!



r...@abcsci12 ~]# R CMD INSTALL
/scisys/home/yanicrk/randomForest_4.5-28.tar.gz

* Installing to library '/usr/lib64/R/library'

* Installing *source* package 'randomForest' ...

** libs

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c classTree.c -o classTree.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c regrf.c -o regrf.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c regTree.c -o regTree.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c rf.c -o rf.o

g77   -fpic  -O2 -g -c rfsub.f -o rfsub.o

gcc -I/usr/lib64/R/include  -I/usr/local/include  -fpic  -O2 -g -std=gnu99
-c rfutils.c -o rfutils.o

gcc -shared -Wl,-O1 -o randomForest.so classTree.o regrf.o regTree.o rf.o
rfsub.o rfutils.o  -lg2c -lm -L/usr/lib64/R/lib -lR

/usr/bin/ld: cannot find -lg2c

collect2: ld returned 1 exit status

make: *** [randomForest.so] Error 1

ERROR: compilation failed for package 'randomForest'

** Removing '/usr/lib64/R/library/randomForest'



Any assistance would be greatly appreciated.



Rich

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] arima.sim help

2009-01-20 Thread granitemathematics
I am trying to simulate time series data for an AR(1) and an MA(1) process. I
want the error term to have either a t distribution with 1 degree of freedom
or a normal distribution with mean=0 and sd=1. Here is my code:

error.model=function(n){rnorm(n,mean=0, sd=1)}



data<-arima.sim(model=list(ar=c(0.1)), n=1000,
n.start=200, start.innov=rnorm(200,mean=0, sd=1),
rand.gen=error.model )

data

error.model=function(n){rnorm(n,mean=0, sd=1)}



data<-arima.sim(model=list(ma=c(0.1)), n=1000,
n.start=200, start.innov=rnorm(200,mean=0, sd=1),
rand.gen=error.model )

data


error.model=function(n){rt(n,1)}



data<-arima.sim(model=list(ma=c(0.1)), n=1000,
n.start=200, start.innov=rt(200,1),
rand.gen=error.model )

data


error.model=function(n){rt(n,1)}



data<-arima.sim(model=list(ar=c(0.1)), n=1000,
n.start=200, start.innov=rt(200,1),
rand.gen=error.model )

data

My question is: am I actually accomplishing my goal?

Thank you, 
Neill
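
One way to sanity-check whether the innovations really follow the intended
distribution (a sketch with illustrative settings): compare the tails of the
two simulated series; the rt(n, 1) innovations should produce far more extreme
values than the N(0,1) ones.

  set.seed(1)
  x.norm <- arima.sim(model = list(ar = 0.1), n = 1000,
                      rand.gen = function(n) rnorm(n, mean = 0, sd = 1))
  x.t1   <- arima.sim(model = list(ar = 0.1), n = 1000,
                      rand.gen = function(n) rt(n, 1))
  c(norm = max(abs(x.norm)), t1 = max(abs(x.t1)))   # the t(1) series is far wilder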

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] tclarray with embedded spaces in data

2009-01-20 Thread Ruth M. Ripley
I would like to use a tclArray:

mytkarray <- tclArray()

as the variable for a table:

table1 <- tkwidget(f1, 'table', variable= mytkarray)

but if I include character strings with embedded spaces, I get braces
appearing in the table.

I can remove them using a non-R tclarray, (the difference between the
first example of

http://www.sciviews.org/_rgui/tcltk/Tktable.html

and

http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/tktable.html

- the result of the second of which includes the braces (although not in
the image provided!) and that of the first does not - but this seems
undesirably complicated. Am I missing something simple?

Any insights would be gratefully received,

Regards,

Ruth
--
Ruth M. Ripley, Email:r...@stats.ox.ac.uk
Dept. of Statistics,http://www.stats.ox.ac.uk/~ruth/
University of Oxford,   Tel:   01865 282851
1 South Parks Road, Oxford OX1 3TG, UK  Fax:   01865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with apply(m, 1, min, na.rm=T)

2009-01-20 Thread Marc Schwartz
on 01/20/2009 05:26 PM Czerminski, Ryszard wrote:
> Passing extra arguments to FUN=mean or median in apply
> seems fine, but when FUN=min warnings are generated?
> See below.
> 
> Any ideas why?
> 
> Best regards,
> Ryszard
> 
> Ryszard Czerminski
> AstraZeneca Pharmaceuticals LP
>  
>> m
>  [,1] [,2]
> [1,]12
> [2,]3   NA
> [3,]   NA   NA
>> apply(m, 1, median, na.rm=T)
> [1] 1.5 3.0  NA
>> apply(m, 1, mean, na.rm=T)
> [1] 1.5 3.0 NaN
>> apply(m, 1, min, na.rm=T)
> [1]   1   3 Inf
> Warning message:
> In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf

Not a problem with min(), it is an issue with the last row being an
empty set after removing the 2 NAs since na.rm = TRUE.

> min(NA, NA, na.rm = TRUE)
[1] Inf
Warning message:
In min(NA, NA, na.rm = TRUE) :
  no non-missing arguments to min; returning Inf


You are effectively doing:

> min(numeric(0))
[1] Inf
Warning message:
In min(logical(0)) : no non-missing arguments to min; returning Inf


See:

http://wiki.r-project.org/rwiki/doku.php?id=tips:surprises:emptysetfuncs

for more information on empty sets.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Concave Hull

2009-01-20 Thread Charles Geyer
> Message: 64
> Date: Mon, 19 Jan 2009 15:14:34 -0700
> From: Greg Snow 
> Subject: Re: [R] Concave Hull
> To: Michael Kubovy , r-help
>   
> Message-ID:
>   
> Content-Type: text/plain; charset="us-ascii"
> 
> I don't know if it is the same algorithm or not, but there is the function 
> "chull" that finds the convex hull.

Also the R function "redundant" in the contributed package "rcdd" efficiently
finds convex hulls in d-dimensional space for arbitrary d (chull only does
d = 2).  See Sections 4.2 and 5.2 of the rcdd package vignette.
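
For the convex case mentioned above, a two-line illustration with base R's
chull() (the points are random, purely for illustration):

  set.seed(42)
  xy <- matrix(runif(40), ncol = 2)
  hull <- chull(xy)               # indices of the points on the convex hull
  plot(xy); polygon(xy[hull, ])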

> Hope this helps,
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
> 
> 
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > project.org] On Behalf Of Michael Kubovy
> > Sent: Saturday, January 17, 2009 9:49 AM
> > To: r-help
> > Subject: [R] Concave Hull
> > 
> > Dear Friends,
> > 
> > Here is an algorithm for finding concave hulls:
> > http://get.dsi.uminho.pt/local/
> > 
> > Has anyone implemented such an algorithm in R?
> > 
> > RSiteSearch('concave hull') didn't reveal one (I think).
> > 
> > _
> > Professor Michael Kubovy
> > University of Virginia
> > Department of Psychology
> > Postal Address:
> > P.O.Box 400400, Charlottesville, VA 22904-4400
> > Express Parcels Address:
> > Gilmer Hall, Room 102, McCormick Road, Charlottesville, VA 22903
> > Office:B011;Phone: +1-434-982-4729
> > Lab:B019;   Phone: +1-434-982-4751
> > WWW:http://www.people.virginia.edu/~mk9y/
> > Skype name: polyurinsane
> > 
> > 
> > 
> > 
> > 
> > [[alternative HTML version deleted]]
> > 
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with apply(m, 1, min, na.rm=T)

2009-01-20 Thread Rolf Turner


On 21/01/2009, at 12:26 PM, Czerminski, Ryszard wrote:


Passing extra arguments to FUN=mean or median in apply
seems fine, but when FUN=min warnings are generated?
See below.

Any ideas why?

Best regards,
Ryszard

Ryszard Czerminski
AstraZeneca Pharmaceuticals LP


m

     [,1] [,2]
[1,]    1    2
[2,]    3   NA
[3,]   NA   NA

apply(m, 1, median, na.rm=T)

[1] 1.5 3.0  NA

apply(m, 1, mean, na.rm=T)

[1] 1.5 3.0 NaN

apply(m, 1, min, na.rm=T)

[1]   1   3 Inf
Warning message:
In FUN(newX[, i], ...) : no non-missing arguments to min; returning  
Inf


RTFM:

 The minimum and maximum of a numeric empty set are '+Inf' and
 '-Inf' (in this order!) which ensures _transitivity_, e.g.,
 'min(x1, min(x2)) == min(x1, x2)'.  For numeric 'x' 'max(x) ==
 -Inf' and 'min(x) == +Inf' whenever 'length(x) == 0' (after
 removing missing values if requested).

cheers,

Rolf Turner

##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with cyrillic in postscript

2009-01-20 Thread Alexander Nakhabov
Hi all,
When I plot a graph with Cyrillic (namely Russian) titles it looks fine,
but after saving the figure as an EPS file the title fonts come out damaged.
The command dev.copy2eps was used in the following manner:
dev.copy2eps("test.eps")
or, for example
dev.copy2eps("test.eps",family='NimbusSan')

What is wrong? I use R 2.6.0 under Windows. Any help will be appreciated.

Regards,
Alexander Nakhabov
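
A minimal sketch of one thing to try (assuming the titles are in a Windows
CP1251 Cyrillic locale; the family and sizes are only examples): write the EPS
directly with postscript() and an explicit encoding instead of copying the
screen device.

  postscript("test.eps", width = 6, height = 4, paper = "special",
             horizontal = FALSE, encoding = "CP1251", family = "NimbusSan")
  plot(1:10, main = "Заголовок")   # an example Russian title
  dev.off()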

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with apply(m, 1, min, na.rm=T)

2009-01-20 Thread Czerminski, Ryszard
Passing extra arguments to FUN=mean or median in apply
seems fine, but when FUN=min warnings are generated?
See below.

Any ideas why?

Best regards,
Ryszard

Ryszard Czerminski
AstraZeneca Pharmaceuticals LP
 
> m
     [,1] [,2]
[1,]    1    2
[2,]    3   NA
[3,]   NA   NA
> apply(m, 1, median, na.rm=T)
[1] 1.5 3.0  NA
> apply(m, 1, mean, na.rm=T)
[1] 1.5 3.0 NaN
> apply(m, 1, min, na.rm=T)
[1]   1   3 Inf
Warning message:
In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with subset() function?

2009-01-20 Thread David Winsemius
Consider an alternative and realize that it is density() that is  
complaining about being passed a dataframe rather than subset  
misbehaving:


density(subset(mydf, ht >= 150.0 & wt <= 150.0)$age)

Call:
density.default(x = subset(mydf, ht >= 150 & wt <= 150)$age)

Data: subset(mydf, ht >= 150 & wt <= 150)$age (29 obs.);	Bandwidth  
'bw' = 5.816


       x                 y          
 Min.   : 4.553   Min.   :3.781e-05
 1st Qu.:22.776   1st Qu.:3.108e-03
 Median :41.000   Median :1.775e-02
 Mean   :41.000   Mean   :1.370e-02
 3rd Qu.:59.224   3rd Qu.:2.128e-02
 Max.   :77.447   Max.   :2.665e-02


--
David Winsemius


On Jan 20, 2009, at 6:02 PM, Steven McKinney wrote:


Hi all,

Can anyone explain why the following use of
the subset() function produces a different
outcome than the use of the "[" extractor?

The subset() function as used in

density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

appears to me from documentation to be equivalent to

density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

(modulo exclusion of NAs) but use of the former yields an
error from density.default() (shown below).


Is this a bug in the subset() machinery?  Or is it
a documentation issue for the subset() function
documentation or density() documentation?

I'm seeing issues such as this with newcomers to R
who initially seem to prefer using subset() instead
of the bracket extractor.  At this point these functions
are clearly not exchangeable.  Should code be patched
so that they are, or documentation amended to show
when use of subset() is not appropriate?


### Bug in subset()?



set.seed(123)
mydf <- data.frame(ht = 150 + 10 * rnorm(100),

+wt = 150 + 10 * rnorm(100),
+age = sample(20:60, size = 100, replace = TRUE)
+)



density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select  
= c(age))) :

 argument 'x' must be numeric



density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])


Call:
density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])

Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.);	 
Bandwidth 'bw' = 5.816


      x                 y          
Min.   : 4.553   Min.   :3.781e-05
1st Qu.:22.776   1st Qu.:3.108e-03
Median :41.000   Median :1.775e-02
Mean   :41.000   Mean   :1.370e-02
3rd Qu.:59.224   3rd Qu.:2.128e-02
Max.   :77.447   Max.   :2.665e-02



sessionInfo()

R version 2.8.0 Patched (2008-11-06 r46845)
powerpc-apple-darwin9.5.0

locale:
C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

loaded via a namespace (and not attached):
[1] Matrix_0.999375-16 grid_2.8.0 lattice_0.17-15 
lme4_0.99875-9

[5] nlme_3.1-89









Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with subset() function?

2009-01-20 Thread Marc Schwartz
on 01/20/2009 05:02 PM Steven McKinney wrote:
> Hi all,
> 
> Can anyone explain why the following use of
> the subset() function produces a different
> outcome than the use of the "[" extractor?
> 
> The subset() function as used in
> 
>  density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

Here you are asking density to be run on a data frame, which is what
subset returns, even when you select a single column. Thus, you get an
error since density() expects a numeric vector.

No bug in either subset() or the documentation.

You could do this:

  density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = age)[[1]])


> appears to me from documentation to be equivalent to
> 
>  density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

Here you are running density on a vector, so it works. This is because
the default behavior for "[.data.frame" has 'drop = TRUE', which means
that the returned result is coerced to the lowest possible dimension.
Thus, rather than a single data frame column, a vector is returned.

The result from subset() would be equivalent to using 'drop = FALSE'.

HTH,

Marc Schwartz


> (modulo exclusion of NAs) but use of the former yields an 
> error from density.default() (shown below).
> 
> 
> Is this a bug in the subset() machinery?  Or is it
> a documentation issue for the subset() function
> documentation or density() documentation?
> 
> I'm seeing issues such as this with newcomers to R
> who initially seem to prefer using subset() instead
> of the bracket extractor.  At this point these functions
> are clearly not exchangeable.  Should code be patched
> so that they are, or documentation amended to show
> when use of subset() is not appropriate?
> 
>> ### Bug in subset()?
> 
>> set.seed(123)
>> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
> +wt = 150 + 10 * rnorm(100),
> +age = sample(20:60, size = 100, replace = TRUE)
> +)
> 
> 
>> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
> Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = 
> c(age))) : 
>   argument 'x' must be numeric
> 
> 
>> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
> 
> Call:
>   density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])
> 
> Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.); Bandwidth 'bw' 
> = 5.816
> 
>        x                 y          
>  Min.   : 4.553   Min.   :3.781e-05  
>  1st Qu.:22.776   1st Qu.:3.108e-03  
>  Median :41.000   Median :1.775e-02  
>  Mean   :41.000   Mean   :1.370e-02  
>  3rd Qu.:59.224   3rd Qu.:2.128e-02  
>  Max.   :77.447   Max.   :2.665e-02  
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with subset() function?

2009-01-20 Thread Andrew Robinson
Steven,

check the class of the objects that you are creating.

Cheers,

Andrew

On Wed, January 21, 2009 10:02 am, Steven McKinney wrote:
> Hi all,
>
> Can anyone explain why the following use of
> the subset() function produces a different
> outcome than the use of the "[" extractor?
>
> The subset() function as used in
>
>  density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
>
> appears to me from documentation to be equivalent to
>
>  density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
>
> (modulo exclusion of NAs) but use of the former yields an
> error from density.default() (shown below).
>
>
> Is this a bug in the subset() machinery?  Or is it
> a documentation issue for the subset() function
> documentation or density() documentation?
>
> I'm seeing issues such as this with newcomers to R
> who initially seem to prefer using subset() instead
> of the bracket extractor.  At this point these functions
> are clearly not exchangeable.  Should code be patched
> so that they are, or documentation amended to show
> when use of subset() is not appropriate?
>
>> ### Bug in subset()?
>
>> set.seed(123)
>> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
> +wt = 150 + 10 * rnorm(100),
> +age = sample(20:60, size = 100, replace = TRUE)
> +)
>
>
>> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
> Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select =
> c(age))) :
>   argument 'x' must be numeric
>
>
>> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
>
> Call:
>   density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])
>
> Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.); Bandwidth
> 'bw' = 5.816
>
>        x                 y          
>  Min.   : 4.553   Min.   :3.781e-05
>  1st Qu.:22.776   1st Qu.:3.108e-03
>  Median :41.000   Median :1.775e-02
>  Mean   :41.000   Mean   :1.370e-02
>  3rd Qu.:59.224   3rd Qu.:2.128e-02
>  Max.   :77.447   Max.   :2.665e-02
>
>
>> sessionInfo()
> R version 2.8.0 Patched (2008-11-06 r46845)
> powerpc-apple-darwin9.5.0
>
> locale:
> C
>
> attached base packages:
> [1] stats graphics  grDevices datasets  utils methods   base
>
> loaded via a namespace (and not attached):
> [1] Matrix_0.999375-16 grid_2.8.0 lattice_0.17-15
> lme4_0.99875-9
> [5] nlme_3.1-89
>>
>
>
>
>
>
>
> Steven McKinney
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney +at+ bccrc +dot+ ca
>
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Andrew Robinson
Senior Lecturer in Statistics               Tel: +61-3-8344-6410
Department of Mathematics and Statistics    Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: a.robin...@ms.unimelb.edu.au    Website: http://www.ms.unimelb.edu.au

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Odd behaviour of subset indexing (x:y).

2009-01-20 Thread Marc Schwartz
on 01/20/2009 05:01 PM John Sorkin wrote:
> R 2.8.1
> windows XP
> 
> I don't understand the output from x[iS+1:iE] produced by the code below:
> 
> x = c(1,2,3,4,5)
> x
>  [1] 1 2 3 4 5
> 
>  iS=2   #  start position
>  iE=4   #  end position
> 
> x[iS:iE]
> [1] 2 3 4
> 
> # I don't understand the results of the command below. I would expect to see 
> 3, 4, not 3, 4, 5, NA
> x[iS+1:iE]
> [1]  3  4  5 NA
> 
> Thanks,
> John

Operator precedence.

Note:

# You are asking for indices 3:6 and of course x[6] does not exist
# hence the NA
# Equivalent to: iS + (1:iE)

> iS + 1:iE
[1] 3 4 5 6


# This is what you want
> (iS + 1):iE
[1] 3 4


HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with subset() function?

2009-01-20 Thread Steven McKinney
Hi all,

Can anyone explain why the following use of
the subset() function produces a different
outcome than the use of the "[" extractor?

The subset() function as used in

 density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

appears to me from documentation to be equivalent to

 density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

(modulo exclusion of NAs) but use of the former yields an 
error from density.default() (shown below).


Is this a bug in the subset() machinery?  Or is it
a documentation issue for the subset() function
documentation or density() documentation?

I'm seeing issues such as this with newcomers to R
who initially seem to prefer using subset() instead
of the bracket extractor.  At this point these functions
are clearly not exchangeable.  Should code be patched
so that they are, or documentation amended to show
when use of subset() is not appropriate?

> ### Bug in subset()?

> set.seed(123)
> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
+wt = 150 + 10 * rnorm(100),
+age = sample(20:60, size = 100, replace = TRUE)
+)


> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = c(age))) 
: 
  argument 'x' must be numeric


> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

Call:
density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])

Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.);   Bandwidth 'bw' 
= 5.816

       x                 y          
 Min.   : 4.553   Min.   :3.781e-05  
 1st Qu.:22.776   1st Qu.:3.108e-03  
 Median :41.000   Median :1.775e-02  
 Mean   :41.000   Mean   :1.370e-02  
 3rd Qu.:59.224   3rd Qu.:2.128e-02  
 Max.   :77.447   Max.   :2.665e-02  


> sessionInfo()
R version 2.8.0 Patched (2008-11-06 r46845) 
powerpc-apple-darwin9.5.0 

locale:
C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base 

loaded via a namespace (and not attached):
[1] Matrix_0.999375-16 grid_2.8.0 lattice_0.17-15lme4_0.99875-9
[5] nlme_3.1-89   
> 






Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Odd behaviour of subset indexing (x:y).

2009-01-20 Thread John Sorkin
R 2.8.1
windows XP

I don't understand the output from x[iS+1:iE] produced by the code below:

x = c(1,2,3,4,5)
x
 [1] 1 2 3 4 5

 iS=2   #  start position
 iE=4   #  end position

x[iS:iE]
[1] 2 3 4

# I don't understand the results of the command below. I would expect to see 3, 
4, not 3, 4, 5, NA
x[iS+1:iE]
[1]  3  4  5 NA

Thanks,
John



John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsor...@grecc.umaryland.edu
Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grep question

2009-01-20 Thread Edna Bell
In grep, you can use the options "n" and "o" to get the line number
and only the matching text.

Is there a way to just get the line number, please?

Thanks,
Edna Bell
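
A hedged aside (not from the original message): R's own grep() already does
this, returning the matching positions (line numbers) by default and the
text itself only when value = TRUE.

lines <- c("alpha", "beta", "alphabet")
grep("alpha", lines)                ## 1 3   (positions only)
grep("alpha", lines, value = TRUE)  ## "alpha" "alphabet"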

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inconsistent lm results with fixed response variable

2009-01-20 Thread tyler
Rolf Turner  writes:

> Oh for Pete's sake!

No, just for me.

> Computers use floating point arithmetic.  Your residual standard error in
> case 2 (i.e. 1.44e-14) *is* 0, but floating point arithmetic can't quite see
> that this is so. 

Yes, and that's fine. When I put together a lattice plot to display
several hundred slope coefficients, I don't need to distinguish between
1.44e-14 and 0. Both are visually 'zero', and accurately reflect the
lack of relationship.

My problem came when viewing a lattice plot of several hundred adj. R sq
values, and viewing a handful of very high values in cases where there
is no actual relationship. In some cases R did what I expected, and gave
me a NaN which didn't plot. In other cases, it gave me a very large
number, which did plot, and was quite confusing in context.

Anyways, it will be easy to add a check as you suggest.

Thanks for your time,

Tyler

> Put in a check for the RSE being 0, and ``over- ride'' the adjusted R
> squared to be NA (or NaN, or whatever floats your boat) in such
> instances. The all.equal() function might be useful to you:
>
>> x <- 1.44e-14
>> all.equal(x,0)
> [1] TRUE
>
> (Caution:  Trap for Young Players:  If x and y are ``really'' different,
> then all.equal(x,y) doesn't return FALSE as you might expect, but rather
> a description of the difference between x and y --- which may be complicated
> if x and y are complicated objects.  The function isTRUE() is useful here.)
>
>   cheers,
>
>   Rolf Turner
>
>
> On 21/01/2009, at 9:21 AM, tyler wrote:
>
>> Hi,
>>
>> I'm analyzing a large number of simulations using lm(), a sample of the
>> resulting data is pasted below. In some simulations, the response
>> variable doesn't vary, ie:
>>
>>> tmp[[2]]$richness
>>  [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 
>> 40
>>
>> When I analyze this using R version 2.8.0 (2008-10-20) on a linux
>> cluster, I get an appropriate result:
>>
>>
>> ## begin R ##
>>
>> summary(lm(richness ~ het, data = tmp[[2]]))
>>
>> Call:
>> lm(formula = richness ~ het, data = tmp[[2]])
>>
>> Residuals:
>>     Min      1Q  Median      3Q     Max
>>  0  0  0  0  0
>>
>> Coefficients:
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept)       40          0     Inf   <2e-16 ***
>> het                0          0      NA       NA
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Residual standard error: 0 on 23 degrees of freedom
>> Multiple R-squared:  ,  Adjusted R-squared:
>> F-statistic:   on 1 and 23 DF,  p-value: NA
>>
>> ## end R ##
>>
>> This is good, as when I extract the Adjusted R-squared and slope I get
>> NaN and 0, which are easily identified in my aggregate analysis, so I
>> can deal with them appropriately.
>>
>> However, this isn't always the case:
>>
>> ## begin R ##
>>
>>  tmp[[1]]$richness
>>  [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 
>> 40
>> [26] 40 40 40 40 40 40 40 40 40 40 40
>>
>>  summary(lm(richness ~ het, data = tmp[[1]]))
>>
>> Call:
>> lm(formula = richness ~ het, data = tmp[[1]])
>>
>>        Min         1Q     Median         3Q        Max
>>Min 1Q Median 3QMax
>> -8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15
>>
>> Coefficients:
>>  Estimate Std. Error   t value Pr(>|t|)
>> (Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
>> het 1.495e-14  4.723e-14 3.160e-01    0.754
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Residual standard error: 1.44e-14 on 34 degrees of freedom
>> Multiple R-squared: 0.5112, Adjusted R-squared: 0.4968
>> F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07
>>
>> ## end R ##
>>
>> This is a problem, as when I plot the adj. R sq as part of an aggregate
>> analysis of a large number of simulations, it appears to be a very
>> strong regression. I wouldn't have caught this except it was
>> exceptionally high for the simulation parameters. It also differs by
>> more than rounding error from the results with R 2.8.1 running on my
>> laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
>> Furthermore, on my laptop, none of the analyses produce a NaN adj. R sq,
>> even for data that do produce that result on the cluster.
>>
>> Both my laptop and the linux cluster have na.action set to na.omit. Is
>> there something else I can do to ensure that lm() returns slope == 0
>> and adj.R.sq == NaN when the response variable is fixed?
>>
>> Thanks for any suggestions,
>>
>> Tyler
>>
>> Data follows:
>>
>> `tmp` <-
>> list(structure(list(richness = c(40, 40,
>> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
>> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
>> 40, 40), range = c(0.655084651733024, 0.579667533660137, 0.433092220907644,
>> 0.62937198839679, 0.787891987978164, 0.623511540624239, 0.542744487102066,
>> 0.905937570175433, 0.806802881350753, 0.680413208666325, 0.873426339019

Re: [R] Creating a Sparse Matrix from a Sparse Vector

2009-01-20 Thread Martin Maechler
> "B" == BDGrantham  
> on Tue, 20 Jan 2009 11:53:55 -0600 writes:

B> Hello,

B> I am working with a sparse matrix that is approx. 13,900 by 14,100.  My
B> goal is to select a row out of the matrix and create a new matrix with 
that
B> row repeated 13,900 times without having to do any looping.  Example:

B> Starting Matrix:

B> exampleMatrix
B> 3 x 4 sparse Matrix of class "dgCMatrix"

B> [1,] 1 . . .
B> [2,] . 1 . 0.5
B> [3,] . . 1 .

B> New Matrix:

B> newExampleMatrix
B> 3 x 4 sparse Matrix of class "dgCMatrix"

B> [1,] . 1 . 0.5
B> [2,] . 1 . 0.5
B> [3,] . 1 . 0.5

B> When I try the following I get a memory allocation error due to the size 
of
B> the array or vector:

{the following is too ugly for me to deparse ..}

If you used variable names of one or two letters, spaces and
proper indentation, that would have been a different business.

B> newExampleMatrix<-Matrix(rep(exampleMatrix[2,],times=nrow
B> (exampleMatrix)),nrow=nrow(exampleMatrix),ncol=ncol
B> (exampleMatrix),byrow=TRUE,sparse=TRUE)
B> newExampleMatrix<-Matrix(exampleMatrix[2,],nrow=nrow
B> (exampleMatrix),ncol=ncol(exampleMatrix),byrow=TRUE,sparse=TRUE)

Matrix() should not be used for large sparse matrices.
Its input, when not a sparseMatrix, must be *dense* in one way
or the other.

The real solution of course is to step back and think a bit:

What will the (i,j,x)  [triplet aka "Tsparse"]
   or (i,p,x)  [column-compressed aka "Csparse"]
structure be?
After a fraction of a second you'll see that the triplet
representation will be trivial; more on that below.


B> When I tried the next set, I got the error "Error in Matrix(as(rep
B> (exampleMatrix[2, ], times = nrow(exampleMatrix)),  :   invalid 
type/length
B> (S4/12) in vector allocation":

B> newExampleMatrix<-Matrix(as(rep(exampleMatrix[2,],times=nrow
B> (exampleMatrix)),"sparseVector"),nrow=nrow(exampleMatrix),ncol=ncol
B> (exampleMatrix),byrow=TRUE,sparse=TRUE)
B> newExampleMatrix<-Matrix(as(exampleMatrix[2,],"sparseVector"),nrow=nrow
B> (exampleMatrix),ncol=ncol(exampleMatrix),byrow=TRUE,sparse=TRUE)

B> And finally, when I tried the next instruction, I got the error "Error in
B> as.vector(x, mode) :  cannot coerce type 'S4' to vector of type 'any' 
Error
Here you seem to have loaded "SparseM" in addition to "Matrix",
which is not useful at all here ..

B> in as.matrix.csc(as.matrix.csr(x)) : error in evaluating the argument 'x'
B> in selecting a method for function 'as.matrix.csc'" :

B> as.matrix.csc(as(rep(currentMatrix[pivitRow,],times=nrow
B> (currentMatrix)),"sparseVector"),nrow=nrow(currentMatrix),ncol=ncol
B> (currentMatrix))

B> Are there any other ways to accomplish this?

yes; many, as always with R.
Since I'm sick in bed and cannot really do rocket science (:-)
I've solved this exercise for you:

require("Matrix")

repRow <- function(m, i, times)
{
## Purpose: return a sparse matrix containing row m[i,] 'times' times
## --
## Arguments: m: sparseMatrix;  i: row index;  times: #{replicates}
## --
## Author: Martin Maechler, Date: 20 Jan 2009, 21:48
stopifnot(is(m, "sparseMatrix"),
  length(i) == 1, length(times <- as.integer(times)) == 1,
  i >= 1, times >= 0, i == as.integer(i))
cl <- class(m)
m <- as(m, "TsparseMatrix")
mi <- m[i,, drop=FALSE]
## result: replace the parts of 'm':
r <- new(class(mi))
r@Dim <- c(times, m@Dim[2])
r@i <- rep.int(seq_len(times) - 1L, length(m@j))
r@j <- rep(m@j, each = times)
r@Dimnames <- list(NULL, m@Dimnames[[2]])
if(!extends(cl, "nMatrix")) ## have 'x' slot
    r@x <- rep(m@x, each = times)
validObject(r)
if(extends(cl, "CsparseMatrix")) as(r, "CsparseMatrix") else r
}


(m <- Matrix(c(0,0,2:0), 3,5))
repRow(m,3, 7)
repRow(m,2, 11)
repRow(m,1, 1)
repRow(m,1, 0) # even that works

## now with a big (very sparse) one:
M <- kronecker(m, diag(1000))
r <- repRow(M,2, 4) # still quite quick

##-

Best regards,
Martin Maechler, ETH Zurich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inconsistent lm results with fixed response variable

2009-01-20 Thread Rolf Turner


Oh for Pete's sake!

Computers use floating point arithmetic.  Your residual standard  
error in
case 2 (i.e. 1.44e-14) *is* 0, but floating point arithmetic can't  
quite see
that this is so. Put in a check for the RSE being 0, and ``over- 
ride'' the
adjusted R squared to be NA (or NaN, or whatever floats your boat) in  
such instances.

The all.equal() function might be useful to you:

> x <- 1.44e-14
> all.equal(x,0)
[1] TRUE

(Caution:  Trap for Young Players:  If x and y are ``really'' different,
then all.equal(x,y) doesn't return FALSE as you might expect, but rather
a description of the difference between x and y --- which may be  
complicated
if x and y are complicated objects.  The function isTRUE() is useful  
here.)
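
A minimal sketch of that check (assuming 'fit' holds the fitted lm object;
not from the original message):

s <- summary(fit)
if (isTRUE(all.equal(s$sigma, 0))) {
    s$adj.r.squared <- NaN  ## or NA, whatever floats your boat
    s$r.squared <- NaN
}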


cheers,

Rolf Turner


On 21/01/2009, at 9:21 AM, tyler wrote:


Hi,

I'm analyzing a large number of simulations using lm(), a sample of  
the

resulting data is pasted below. In some simulations, the response
variable doesn't vary, ie:


tmp[[2]]$richness
 [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  
40 40 40 40


When I analyze this using R version 2.8.0 (2008-10-20) on a linux
cluster, I get an appropriate result:


## begin R ##

summary(lm(richness ~ het, data = tmp[[2]]))

Call:
lm(formula = richness ~ het, data = tmp[[2]])

Residuals:
    Min      1Q  Median      3Q     Max
 0  0  0  0  0

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)       40          0     Inf   <2e-16 ***
het                0          0      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0 on 23 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:   on 1 and 23 DF,  p-value: NA

## end R ##

This is good, as when I extract the Adjusted R-squared and slope I get
NaN and 0, which are easily identified in my aggregate analysis, so I
can deal with them appropriately.

However, this isn't always the case:

## begin R ##

 tmp[[1]]$richness
 [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  
40 40 40 40

[26] 40 40 40 40 40 40 40 40 40 40 40

 summary(lm(richness ~ het, data = tmp[[1]]))

Call:
lm(formula = richness ~ het, data = tmp[[1]])

Residuals:
       Min         1Q     Median         3Q        Max
-8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15

Coefficients:
 Estimate Std. Error   t value Pr(>|t|)
(Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
het 1.495e-14  4.723e-14 3.160e-01    0.754
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.44e-14 on 34 degrees of freedom
Multiple R-squared: 0.5112, Adjusted R-squared: 0.4968
F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07

## end R ##

This is a problem, as when I plot the adj. R sq as part of an  
aggregate

analysis of a large number of simulations, it appears to be a very
strong regression. I wouldn't have caught this except it was
exceptionally high for the simulation parameters. It also differs by
more than rounding error from the results with R 2.8.1 running on my
laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
Furthermore, on my laptop, none of the analyses produce a NaN adj.  
R sq,

even for data that do produce that result on the cluster.

Both my laptop and the linux cluster have na.action set to na.omit. Is
there something else I can do to ensure that lm() returns slope == 0
and adj.R.sq == NaN when the response variable is fixed?

Thanks for any suggestions,

Tyler

Data follows:

`tmp` <-
list(structure(list(richness = c(40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40), range = c(0.655084651733024, 0.579667533660137,  
0.433092220907644,
0.62937198839679, 0.787891987978164, 0.623511540624239,  
0.542744487102066,
0.905937570175433, 0.806802881350753, 0.680413208666325,  
0.873426339019084,
0.699982832956593, 0.697716600618959, 0.952729864926405,  
0.782938474636578,
1.03899695305995, 0.715075858219333, 0.579749205792549,  
1.20648999819246,
0.648677938600964, 0.651883559714785, 0.997318331273967,  
0.926368116052012,
0.91001274146868, 1.20737951037620, 1.12006560586723,  
1.09806272133903,
0.9750792390176, 0.356496202035743, 0.612018080768747,  
0.701905693862144,
0.735857916053381, 0.991787489781244, 1.07247435214078,  
0.60061903319766,

0.699733090379818), het = c(0.154538307084452, 0.143186508136608,
0.0690948358402777, 0.132337152911839, 0.169037344105692,  
0.117783183361602,
0.117524251767612, 0.221161206774407, 0.204574928003633,  
0.170571000779693,
0.204489357007294, 0.131749663515638, 0.154127894997213,  
0.232672587431942,
0.198610891796736, 0.260497696582693, 0.129028191256682,  
0.128717975847452,
0.254300896783617, 0.113546727236817, 0.142220347446853,  
0.24828642688332,
0.194340945175726, 0.190782985783610, 0.214676796387244, 

Re: [R] plotting arrows with different colors and varying head size

2009-01-20 Thread Héctor Villalobos
Thanks to Jim and Greg,

Concerning the colors of the arrows, the color.scale() function from Jim's solution seems more
straightforward to me. I'm trying now to include a proper legend with the
color.legend() function.
color.legend() function.

Héctor

On 20 Jan 2009 at 22:13, Jim Lemon wrote:


> Héctor Villalobos wrote:
> > Dear list,
> >
> > I would like to plot arrows with different colors according to arrow
> > length, and also (if possible) with head size proportional to arrow
> > length. The idea is to make a quiver-like plot of matlab with wind
> > speed data.
> >
> > So far, I've been able to use different colors, but I need to find a
> > more efficient way to recode arrow length intervals into colors. On
> > the contrary, I can't define different head sizes, because the
> > "length" argument in the "arrows()" function seems to control the
> > head size of all the arrows at once.
> >
> Hi Hector,
> The color.scale function in the plotrix package may do what you want,
> also have a look at the vectorField function. I think you will have to
> draw the arrows one at a time or rewrite the arrows function to get
> different head sizes.
>
> Jim
>
>
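
A minimal sketch (not from the original thread) of the one-arrow-at-a-time
approach, with made-up wind components u and v and a base-R colour ramp
standing in for color.scale():

set.seed(1)
x <- runif(20); y <- runif(20)
u <- rnorm(20, sd = 0.1); v <- rnorm(20, sd = 0.1)
len <- sqrt(u^2 + v^2)
pal <- colorRampPalette(c("blue", "red"))(100)
cols <- pal[cut(len, 100, labels = FALSE)]
plot(x, y, type = "n")
for (i in seq_along(x)) {
    arrows(x[i], y[i], x[i] + u[i], y[i] + v[i], col = cols[i],
           length = 0.05 + 0.2 * len[i] / max(len))  ## head size ~ arrow length
}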


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dendrogram with the UPGMA method

2009-01-20 Thread Martin Maechler
> "MK" == Marcin Kozak 
> on Sat, 17 Jan 2009 10:14:10 +0100 writes:

MK> Hi, I am clustering objects using the agnes() function
MK> and the UPGMA clustering method (function =
MK> "average"). Everything works well, but apparently
MK> something is wrong with the dendrogram. For example:

MK> x<-c(102,102.1,112.5,113,100.3,108.2,101.1,104,105.5,106.3)
MK> y<-c(110,111,110.2,112.1,119.5,122.1,102,112,112.5,115)

MK> xy<-cbind(x,y)

MK> library(cluster) 
MK> UPGMA.orig<-agnes(x)

well, you compute agnes() on the one-dimensional data x rather
than the 2D  xy 
...
but we know that this is not your main "problem"

MK> plot(UPGMA.orig,which.plots=2,xlab="",main="",sub="")


MK> Look how the dendrogram has been drawn: all the OTUs
MK> should line up with 0.0 on the "distance" axis, 
"should" .. according to which commandments ?

MK> but it is not the case in the dendrogram obtained here. Is it
MK> possible to obtain a traditional dendrogram?

well, for those in the S / R tradition you already got a
traditional one...

but to finally help you:

use
ag <- agnes(...)
dg <- as.dendrogram(as.hclust(ag))
plot(dg)

and if you read and look at the examples of

help(plot.dendrogram)

you'll see that you have many more options for plotting there
(than if you'd plot the agnes object directly).

Martin Maechler, ETH Zurich
(being a bit disappointed that no other R-helper helped here ..)


MK> (I am using R 2.7.0 with Windows XP)

MK> Thanks and best wishes, Marcin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] inconsistent lm results with fixed response variable

2009-01-20 Thread tyler
Hi,

I'm analyzing a large number of simulations using lm(), a sample of the
resulting data is pasted below. In some simulations, the response
variable doesn't vary, ie:

> tmp[[2]]$richness
 [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40

When I analyze this using R version 2.8.0 (2008-10-20) on a linux
cluster, I get an appropriate result:


## begin R ##

summary(lm(richness ~ het, data = tmp[[2]]))

Call:
lm(formula = richness ~ het, data = tmp[[2]])

Residuals:
    Min      1Q  Median      3Q     Max
 0  0  0  0  0

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)       40          0     Inf   <2e-16 ***
het                0          0      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0 on 23 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:   on 1 and 23 DF,  p-value: NA

## end R ##

This is good, as when I extract the Adjusted R-squared and slope I get
NaN and 0, which are easily identified in my aggregate analysis, so I
can deal with them appropriately. 

However, this isn't always the case:

## begin R ##

 tmp[[1]]$richness
 [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
[26] 40 40 40 40 40 40 40 40 40 40 40

 summary(lm(richness ~ het, data = tmp[[1]]))

Call:
lm(formula = richness ~ het, data = tmp[[1]])

Residuals:
       Min         1Q     Median         3Q        Max
-8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15

Coefficients:
 Estimate Std. Error   t value Pr(>|t|)
(Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
het 1.495e-14  4.723e-14 3.160e-01    0.754
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.44e-14 on 34 degrees of freedom
Multiple R-squared: 0.5112, Adjusted R-squared: 0.4968
F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07

## end R ##

This is a problem, as when I plot the adj. R sq as part of an aggregate
analysis of a large number of simulations, it appears to be a very
strong regression. I wouldn't have caught this except it was
exceptionally high for the simulation parameters. It also differs by
more than rounding error from the results with R 2.8.1 running on my
laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
Furthermore, on my laptop, none of the analyses produce a NaN adj. R sq,
even for data that do produce that result on the cluster.

Both my laptop and the linux cluster have na.action set to na.omit. Is
there something else I can do to ensure that lm() returns slope == 0
and adj.R.sq == NaN when the response variable is fixed? 

Thanks for any suggestions,

Tyler

Data follows:

`tmp` <-
list(structure(list(richness = c(40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40), range = c(0.655084651733024, 0.579667533660137, 0.433092220907644, 
0.62937198839679, 0.787891987978164, 0.623511540624239, 0.542744487102066, 
0.905937570175433, 0.806802881350753, 0.680413208666325, 0.873426339019084, 
0.699982832956593, 0.697716600618959, 0.952729864926405, 0.782938474636578, 
1.03899695305995, 0.715075858219333, 0.579749205792549, 1.20648999819246, 
0.648677938600964, 0.651883559714785, 0.997318331273967, 0.926368116052012, 
0.91001274146868, 1.20737951037620, 1.12006560586723, 1.09806272133903, 
0.9750792390176, 0.356496202035743, 0.612018080768747, 0.701905693862144, 
0.735857916053381, 0.991787489781244, 1.07247435214078, 0.60061903319766, 
0.699733090379818), het = c(0.154538307084452, 0.143186508136608, 
0.0690948358402777, 0.132337152911839, 0.169037344105692, 0.117783183361602, 
0.117524251767612, 0.221161206774407, 0.204574928003633, 0.170571000779693, 
0.204489357007294, 0.131749663515638, 0.154127894997213, 0.232672587431942, 
0.198610891796736, 0.260497696582693, 0.129028191256682, 0.128717975847452, 
0.254300896783617, 0.113546727236817, 0.142220347446853, 0.24828642688332, 
0.194340945175726, 0.190782985783610, 0.214676796387244, 0.252940213066992, 
0.22362832797347, 0.182423482989676, 0.0602332226418674, 0.145400861749859, 
0.141297315445974, 0.139798699247632, 0.222815139716421, 0.211971297234962, 
0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40)), .Names = c("richness", "range", "het", "n.rich")), 
 structure(list(richness = c(40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624, 0.599708526308711, 
0.714477274087683, 0.892359682406808, 0.868440625159371, 0.753239521511417, 
1.20164969658467, 1.20462111558583, 1.13142122690491, 0.95241921975703, 
1.13214481653550, 0.827528954009827, 1.14827745443481, 0.936048043180592, 
0.874649332193952, 1.38

Re: [R] Poisson GLM

2009-01-20 Thread David Winsemius
Since one method of modeling rates is to use  
glm(...  ,family="poisson") with the observed rate (events/ 
person_time) on the LHS of the formula and offset=log(expected_rates)  
on the RHS, I am quite happy that no error is thrown in that situation.


Reading old entries in r-help, it appears there may have been a time  
when an error was thrown, but now you just get a warning (assuming  
that people were not misreporting what they were seeing). Quasi- 
poisson models were recommended as an alternative.


?family

using a distorted version of the family = poisson() example in glm()  
help page:

> d.AD[1,3] <- 17.5

> glm.D93 <- glm(counts ~ outcome + treatment, data=d.AD,  
family=poisson())

Warning message:
In dpois(y, mu, log = TRUE) : non-integer x = 17.50

> glm.qD93 <- glm(counts ~ outcome + treatment, data= d.AD,  
family=quasipoisson())

#no warning

> coef(glm.qD93)
(Intercept)    outcome2    outcome3  treatment2  treatment3
 3.02984283 -0.44628710 -0.28501896  0.01005034  0.01005034
>
> coef(glm.D93)
(Intercept)    outcome2    outcome3  treatment2  treatment3
 3.02984283 -0.44628710 -0.28501896  0.01005034  0.01005034

# although vcov shows a difference
--
David Winsemius





On Jan 20, 2009, at 12:14 PM, bbbnc wrote:



This is a basics beginner question.

I attempted fitting a Poisson GLM to data that is non-integer (I
believe

Poisson is suitable in this case, because it is modelling counts of
infections, but the data collected are all non-negative numbers with 2
decimal places).

My question is, since R doesn't return an error with this glm  
fitting, is it
important that the data is non-integer. How does R handle the data?  
if there

is a problem, how do i circumvent this.. data modification?

many thanks
--
View this message in context: 
http://www.nabble.com/Poisson-GLM-tp21567460p21567460.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Summing Select Columns of a Data Frame?

2009-01-20 Thread Jorge Ivan Velez
Hi again Julia,
Here are two more options:

DF<-read.table(textConnection("
x1 x2 x3
1   2   3
4   5   6
7   8   9"),header=TRUE)
closeAllConnections()

# Option 1  -- sent before
DF$x1+DF$x2

# Option 2
apply(DF,1,function(x) x[1]+x[2])

# Option 3
rowSums(DF[,c(1,2)])


HTH,

Jorge


On Tue, Jan 20, 2009 at 2:49 PM, Jorge Ivan Velez
wrote:

>
> Dear Julia,
> Try this:
>
> DF<-read.table(textConnection("
> x1 x2 x3
> 1   2   3
> 4   5   6
> 7   8   9"),header=TRUE)
> closeAllConnections()
>
> DF$x4<-DF$x1+DF$x2
> DF
>
>
> HTH,
>
> Jorge
>
>
>
> On Tue, Jan 20, 2009 at 2:12 PM, Julia Zhou wrote:
>
>> Hi,
>>
>> I would like to operate on certain columns in a dataframe, but not
>> others. My data looks like this:
>>
>> x1 x2 x3
>> 1   2   3
>> 4   5   6
>> 7   8   9
>>
>> I want to create a new column named x4 that is the sum of x1 and x2,
>> but NOT x3. I looked at colSums and apply, but those functions seem to
>> use all the columns in a dataframe. How do I only use select columns?
>>
>> If it helps, in Stata this would be gen x4 = x1 + x2.
>>
>> Thanks!
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to assign names in a list

2009-01-20 Thread mondher mehdi
Hi
for variable names other than name1, 2 and 3,
for example green, yellow, red, use simply
names(list2)=c("green","yellow","red")
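
A small self-contained illustration of the same idea (list2 here is a
made-up example, not the object from the earlier thread):

list2 <- list(1:3, letters[1:2], TRUE)
names(list2) <- c("green", "yellow", "red")
str(list2)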

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Summing Select Columns of a Data Frame?

2009-01-20 Thread Stephan Kolassa

Hi Julia,

or you can use colSums on an appropriate sub-data.frame:

colSums(dataset[,c(1,2)])

HTH,
Stephan


Jorge Ivan Velez schrieb:

Dear Julia,
Try this:

DF<-read.table(textConnection("
x1 x2 x3
1   2   3
4   5   6
7   8   9"),header=TRUE)
closeAllConnections()

DF$x4<-DF$x1+DF$x2
DF


HTH,

Jorge



On Tue, Jan 20, 2009 at 2:12 PM, Julia Zhou  wrote:


Hi,

I would like to operate on certain columns in a dataframe, but not
others. My data looks like this:

x1 x2 x3
1   2   3
4   5   6
7   8   9

I want to create a new column named x4 that is the sum of x1 and x2,
but NOT x3. I looked at colSums and apply, but those functions seem to
use all the columns in a dataframe. How do I only use select columns?

If it helps, in Stata this would be gen x4 = x1 + x2.

Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can R be installed on a Windows Server for concurrent usage?

2009-01-20 Thread Prof Brian Ripley

Yes.

People run classes with (to my knowledge) 40 sessions on a server.

On Tue, 20 Jan 2009, Gilbert, Gregory E. wrote:


I apologize for this very naïve and stupid question. I have searched the archives and 
"googled" a number of different queries relating to this question and cannot 
find the answer.



My Division is moving toward running an application server for all statistical 
applications because the Department of Veterans' Affairs has their computers 
locked down. They are locked down so much so simple installation becomes 
difficult.



Is it possible to run R on a 64 bit Windows Enterprise 2003/2008 Datacenter app 
server? I would only guess we are talking about 5 to 10 concurrent users.



Please respond to me as well as the list as I am not subscribed.



Thanks in advance.



Greg Gilbert, Statistician

Ralph Johnson VAMC

Charleston, SC USA


[[alternative HTML version deleted]]




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Summing Select Columns of a Data Frame?

2009-01-20 Thread Jorge Ivan Velez
Dear Julia,
Try this:

DF<-read.table(textConnection("
x1 x2 x3
1   2   3
4   5   6
7   8   9"),header=TRUE)
closeAllConnections()

DF$x4<-DF$x1+DF$x2
DF


HTH,

Jorge



On Tue, Jan 20, 2009 at 2:12 PM, Julia Zhou  wrote:

> Hi,
>
> I would like to operate on certain columns in a dataframe, but not
> others. My data looks like this:
>
> x1 x2 x3
> 1   2   3
> 4   5   6
> 7   8   9
>
> I want to create a new column named x4 that is the sum of x1 and x2,
> but NOT x3. I looked at colSums and apply, but those functions seem to
> use all the columns in a dataframe. How do I only use select columns?
>
> If it helps, in Stata this would be gen x4 = x1 + x2.
>
> Thanks!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Summing Select Columns of a Data Frame?

2009-01-20 Thread Julia Zhou
Hi,

I would like to operate on certain columns in a dataframe, but not
others. My data looks like this:

x1 x2 x3
1   2   3
4   5   6
7   8   9

I want to create a new column named x4 that is the sum of x1 and x2,
but NOT x3. I looked at colSums and apply, but those functions seem to
use all the columns in a dataframe. How do I only use select columns?

If it helps, in Stata this would be gen x4 = x1 + x2.

Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Can R be installed on a Windows Server for concurrent usage?

2009-01-20 Thread Gilbert, Gregory E.
I apologize for this very naïve and stupid question. I have searched the 
archives and "googled" a number of different queries relating to this question 
and cannot find the answer. 

 

My Division is moving toward running an application server for all statistical 
applications because the Department of Veterans' Affairs has their computers 
locked down. They are locked down so much so simple installation becomes 
difficult. 

 

Is it possible to run R on a 64 bit Windows Enterprise 2003/2008 Datacenter app 
server? I would only guess we are talking about 5 to 10 concurrent users.

 

Please respond to me as well as the list as I am not subscribed.

 

Thanks in advance.

 

Greg Gilbert, Statistician

Ralph Johnson VAMC

Charleston, SC USA


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Poisson GLM

2009-01-20 Thread bbbnc

This is a basics beginner question.

I attempted fitting a Poisson GLM to data that is non-integer (I believe
Poisson is suitable in this case, because it is modelling counts of
infections, but the data collected are all non-negative numbers with 2
decimal places).

My question is, since R doesn't return an error with this glm fitting, is it
important that the data is non-integer? How does R handle the data? If there
is a problem, how do I circumvent this... data modification?

many thanks
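
A hedged sketch (not from the original question) with made-up non-integer
data: family = poisson() only warns, and family = quasipoisson() fits the
same mean model without the warning.

d <- data.frame(y = c(1.25, 0.50, 2.75, 3.10), x = 1:4)
fit.p  <- glm(y ~ x, data = d, family = poisson())       ## warning: non-integer x
fit.qp <- glm(y ~ x, data = d, family = quasipoisson())  ## no warning
coef(fit.p)
coef(fit.qp)  ## same point estimates; the standard errors differ
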
-- 
View this message in context: 
http://www.nabble.com/Poisson-GLM-tp21567460p21567460.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Gentleman and Ihaka's integrity in question

2009-01-20 Thread Rolf Turner


On 20/01/2009, at 5:56 PM, Berwin A Turlach wrote:


Mark,

don't feed the troll.


Well said!

cheers,

Rolf


Cheers,

Berwin

On Mon, 19 Jan 2009 22:38:24 -0600 (CST)
markle...@verizon.net wrote:

Hi: I think I saw a link where the author clarified the original  
article
and explained more clearly that the design of R had it roots in S/S 
+. I

don't
remember where I saw it but it's somewhere. Also, I think it's  
jumping

the gun to claim that anyone lied to anyone before doing the research
and
knowing that FOR SURE but that's  just my opinion and you are  
entitled

to yours.




On Mon, Jan 19, 2009 at 11:05 PM, Robert Wilkins wrote:


It does look like Gentleman and Ihaka not only lied to the New York
Times, but also to the New Zealand Herald and who knows who else.  
This

is disgusting. The R programming language is the S programming
language, and Gentleman and Ihaka are not the ones who designed it.

http://thenewyorktimesissloppy.blogspot.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting- 
guide.html

and provide commented, minimal, self-contained, reproducible code.



=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting- 
guide.html

and provide commented, minimal, self-contained, reproducible code.



##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating a list of matrices or data frames

2009-01-20 Thread hadley wickham
On Tue, Jan 20, 2009 at 10:34 AM, Simon Pickett  wrote:
> Hi all,
>
> How would you create a list of data.frames within a loop, then bind all the
> elements of the list using rbind?
>
> take this example of matrices with differing numbers of rows
>
> for(i in 1:3){
> assign(paste("s",i, sep=""),matrix(data = NA, nrow = i, ncol = 3, byrow =
> FALSE, dimnames = NULL))
> }
> s1
> s2
> s3
>
> I want to bind all the matrices at the end with do.call(rbind...)  rather
> than listing all the elements manually with rbind(s1,s2,s3...) and so on.

You might also want to have a look at the plyr package,
http://had.co.nz/plyr, which provides general tools for performing
these sorts of operations.

Hadley


-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating a Sparse Matrix from a Sparse Vector

2009-01-20 Thread BDGrantham

Hello,

I am working with a sparse matrix that is approx. 13,900 by 14,100.  My
goal is to select a row out of the matrix and create a new matrix with that
row repeated 13,900 times without having to do any looping.  Example:

Starting Matrix:

exampleMatrix
3 x 4 sparse Matrix of class "dgCMatrix"

[1,] 1 . . .
[2,] . 1 . 0.5
[3,] . . 1 .

New Matrix:

 newExampleMatrix
3 x 4 sparse Matrix of class "dgCMatrix"

[1,] . 1 . 0.5
[2,] . 1 . 0.5
[3,] . 1 . 0.5

When I try the following I get a memory allocation error due to the size of
the array or vector:

 newExampleMatrix<-Matrix(rep(exampleMatrix[2,],times=nrow
(exampleMatrix)),nrow=nrow(exampleMatrix),ncol=ncol
(exampleMatrix),byrow=TRUE,sparse=TRUE)
 newExampleMatrix<-Matrix(exampleMatrix[2,],nrow=nrow
(exampleMatrix),ncol=ncol(exampleMatrix),byrow=TRUE,sparse=TRUE)

When I tried the next set, I got the error "Error in Matrix(as(rep
(exampleMatrix[2, ], times = nrow(exampleMatrix)),  :   invalid type/length
(S4/12) in vector allocation":

newExampleMatrix<-Matrix(as(rep(exampleMatrix[2,],times=nrow
(exampleMatrix)),"sparseVector"),nrow=nrow(exampleMatrix),ncol=ncol
(exampleMatrix),byrow=TRUE,sparse=TRUE)
newExampleMatrix<-Matrix(as(exampleMatrix[2,],"sparseVector"),nrow=nrow
(exampleMatrix),ncol=ncol(exampleMatrix),byrow=TRUE,sparse=TRUE)

And finally, when I tried the next instruction, I got the error "Error in
as.vector(x, mode) :  cannot coerce type 'S4' to vector of type 'any' Error
in as.matrix.csc(as.matrix.csr(x)) : error in evaluating the argument 'x'
in selecting a method for function 'as.matrix.csc'" :

as.matrix.csc(as(rep(currentMatrix[pivitRow,],times=nrow
(currentMatrix)),"sparseVector"),nrow=nrow(currentMatrix),ncol=ncol
(currentMatrix))

Are there any other ways to accomplish this?

Thanks,

Brian




Notice:
This communication is an electronic communication within the meaning
of the Electronic Communications Privacy Act, 18 U.S.C. sec. 2510. Its
disclosure is strictly limited to the recipient(s) intended by the
sender of this message.  This transmission and any attachments may
contain proprietary, confidential, attorney-client privileged
information and/or attorney work product. If you are not the intended
recipient, any disclosure, copying, distribution, reliance on, or use
of any of the information contained herein is STRICTLY PROHIBITED.
Please destroy the original transmission and its attachments without
reading or saving in any matter and confirm by return email.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generalizing expand.table: table -> data.frame

2009-01-20 Thread Marc Schwartz
on 01/20/2009 10:38 AM Michael Friendly wrote:
> In
> http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html
> a method was given for converting a frequency table to an expanded data
> frame representing each
> observation as a set of factors.  A slightly modified version was later
> included in the NCStats package,
> only on http://rforge.net/ (and it has too many dependencies to be useful).
> 
> I've tried to make it more general, allowing an input data frame in
> frequency form, and where
> the frequency variable is not named "Freq".  This is my working version:
> 
> __begin__ expand.table.R
> expand.table <- function (x, var.names = NULL, freq="Freq", ...)
> {
> #  allow: a table object, or a data frame in frequency form
>   if(inherits(x,"table")) {
> x <- as.data.frame.table(x)
>   }
> ##  This fails:
> #   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,freq]), ],
> simplify = FALSE)
> #   df <- subset(do.call("rbind", df), select = -freq)
> 
> #  This works, when the frequency variable is named Freq
>   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,"Freq"]), ],
> simplify = FALSE)
>   df <- subset(do.call("rbind", df), select = -Freq)
> 
>   for (i in 1:ncol(df)) {
>   df[[i]] <- type.convert(as.character(df[[i]]), ...)
>   }
>   rownames(df) <- NULL
>   if (!is.null(var.names)) {
>   if (length(var.names) < dim(df)[2])
>   stop("Too few var.names given.")
>   else if (length(var.names) > dim(df)[2])
>   stop("Too many var.names given.")
>   else names(df) <- var.names
>   }
>   df
> }
> __end__   expand.table.R
> 
> Thus for the following table
> 
> library(vcd)
> art <- xtabs(~Treatment + Improved, data = Arthritis)
> 
> 
>> art
> Improved
> Treatment None Some Marked
>  Placebo   29    7      7
>  Treated   13    7     21
> 
> expand.table (above) gives a data frame of sum(art)=84 observations,
> with factors
> Treatment and Improved.
>> artdf <- expand.table(art)
>> str(artdf)
> 'data.frame':   84 obs. of  2 variables:
> $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1
> ...
> $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>>
> 
> I've generalized this so it works with data frames in frequency form,
> 
>> as.data.frame(art)
>  Treatment Improved Freq
> 1   Placebo None   29
> 2   Treated None   13
> 3   Placebo Some    7
> 4   Treated Some    7
> 5   Placebo   Marked    7
> 6   Treated   Marked   21
> 
>> art.df2 <- expand.table(as.data.frame(art))
>> str(art.df2)
> 'data.frame':   84 obs. of  2 variables:
> $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1
> ...
> $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>>
> 
> But--- here's the rub --- when the Freq variable in a data frame is
> called something other than
> "Freq", as in this example,
> 
>> GSS
> sex party count
> 1 female   dem   279
> 2   male   dem   165
> 3 female indep73
> 4   male indep47
> 5 female   rep   225
> 6   male   rep   191
> 
> all the changes I've tried, using the freq= argument in expand.table()
> fail in various ways.
> 
> Can someone help?

Hi Michael,

I think that the following modifications to my original code, also
incorporating the changes made in the NCstats package should work.


expand.dft <- function(x, var.names = NULL, freq = "Freq", ...)
{
  #  allow: a table object, or a data frame in frequency form
  if(inherits(x, "table"))
x <- as.data.frame.table(x, responseName = freq)

  freq.col <- which(colnames(x) == freq)
  if (length(freq.col) == 0)
  stop(paste(sQuote("freq"), "not found in column names"))

  DF <- sapply(1:nrow(x),
   function(i) x[rep(i, each = x[i, freq.col]), ],
   simplify = FALSE)

  DF <- do.call("rbind", DF)[, -freq.col]

  for (i in 1:ncol(DF))
  {
DF[[i]] <- type.convert(as.character(DF[[i]]), ...)

  }

  rownames(DF) <- NULL

  if (!is.null(var.names))
  {
if (length(var.names) < dim(DF)[2])
{
  stop(paste("Too few", sQuote("var.names"), "given."))
} else if (length(var.names) > dim(DF)[2]) {
  stop(paste("Too many", sQuote("var.names"), "given."))
} else {
  names(DF) <- var.names
}
  }

  DF
}



> art
 Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21


> head(expand.dft(art), 10)
   Treatment Improved
1    Placebo None
2    Placebo None
3    Placebo None
4    Placebo None
5    Placebo None
6    Placebo None
7    Placebo None
8    Placebo None
9    Placebo None
10   Placebo None



art.dft <- as.data.frame.table(art)

> art.dft
  Treatment Improved Freq
1   Placebo None   29
2   Treated None   13
3   Placebo Some    7
4   Treated Some    7
5   Placebo   Marked    7
6   Treated   Marked   21

names(art.dft)[3] <- "count"

> art.dft
  Treatment Improved count
1   Placebo None    29
2   Treated None    13
3
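
For completeness, a hedged usage check (not from the original message) that
the renamed frequency column is picked up via the freq= argument:

str(expand.dft(art.dft, freq = "count"))
## 'data.frame': 84 obs. of 2 variables, as with the table input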

Re: [R] plotting points with two colors

2009-01-20 Thread Greg Snow
I don't know how efficient you would consider this, but here is one solution:

library(TeachingDemos)

ms.2circ <- function(r=1, adj=0, col1='blue', col2='red', npts=180) {
    tmp1 <- seq( 0,   pi, length.out=npts+1) + adj
    tmp2 <- seq(pi, 2*pi, length.out=npts+1) + adj
    polygon(cos(tmp1)*r, sin(tmp1)*r, border=NA, col=col1)
    polygon(cos(tmp2)*r, sin(tmp2)*r, border=NA, col=col2)
    invisible(NULL)
}

x <- runif(10)
y <- rnorm(10)
a <- runif(10, 0, 2*pi)

my.symbols(x,y, ms.2circ, inches=0.15, add=FALSE, symb.plots=TRUE, adj=a)

# or

my.symbols(x,y, ms.2circ, inches=0.15, add=FALSE, symb.plots=TRUE, adj=a,
col1="#00ff0088", col2="#ff00ff88")

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Georg Ehret
> Sent: Tuesday, January 20, 2009 10:23 AM
> To: r-help
> Subject: [R] plotting points with two colors
> 
> Dear Miss R,
> I am trying to plot a scatterplot in which the points (round)
> should
> have two colors: half red and half blue (if you want: two half solid
> circles
> put together. Can you please help me to realize this efficiently?
> 
> Thank you,
> Best regards, Georg.
> ***
> Georg Ehret
> Geneva University Hospital
> Geneva, Switzerland
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] building windows binary of a package

2009-01-20 Thread Duncan Murdoch

On 1/20/2009 12:17 PM, Li, Xiaochun wrote:

Hi,

I'm trying to build a Windows binary from a source package using R-2.8.1. I downloaded Rtools29 and set up the path. 
I got this error when running R CMD INSTALL,


make: sh.exe: Command not found
make: *** [pkg-SurvCov] Error 127
*** Installation of SurvCov failed ***

However, I can run 'sh' at the 'cmd' window. Any tips will be greatly appreciated.  


I haven't seen that particular error before, but it is almost always a 
PATH problem when this sort of thing happens.
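
A hedged way (not from the original reply) to check, from within R, what
the session's PATH looks like and which sh.exe would be found on it:

Sys.which("sh")
Sys.getenv("PATH")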


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting points with two colors

2009-01-20 Thread David Winsemius

?plot.default

plot(x <- rnorm(47), type = "p", main = "plot(x, type = \"p\")", col =  
c("dark red","blue"))


If you plan to have a system for the coloring, you need to get the  
values sequence-aligned with the colors. This just colors every other  
point "blue".

--
David Winsemius

On Jan 20, 2009, at 12:23 PM, Georg Ehret wrote:


Dear Miss R,
   I am trying to plot a scatterplot in which the points (round)  
should
have two colors: half red and half blue (if you want: two half solid  
circles

put together. Can you please help me to realize this efficiently?

Thank you,
Best regards, Georg.
***
Georg Ehret
Geneva University Hospital
Geneva, Switzerland

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating a list of matrices or data frames

2009-01-20 Thread Henrique Dallazuanna
Try this also:

lapply(1:3, matrix, data = NA, ncol = 3, byrow = F, dimnames = NULL)
do.call(rbind, lapply(1:3, matrix, data = NA, ncol = 3, byrow = F, dimnames
= NULL))

On Tue, Jan 20, 2009 at 2:34 PM, Simon Pickett wrote:

> Hi all,
>
> How would you create a list of data.frames within a loop, then bind all the
> elements of the list using rbind?
>
> take this example of matrices with differing numbers of rows
>
> for(i in 1:3){
> assign(paste("s",i, sep=""),matrix(data = NA, nrow = i, ncol = 3, byrow =
> FALSE, dimnames = NULL))
> }
> s1
> s2
> s3
>
> I want to bind all the matrices at the end with do.call(rbind...)  rather
> than listing all the elements manually with rbind(s1,s2,s3...) and so on.
>
> thanks in advance.
>
> Simon.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating a list of matrices or data frames

2009-01-20 Thread Chuck Cleland
On 1/20/2009 11:34 AM, Simon Pickett wrote:
> Hi all,
> 
> How would you create a list of data.frames within a loop, then bind all
> the elements of the list using rbind?
> 
> take this example of matrices with differing numbers of rows
> 
> for(i in 1:3){
> assign(paste("s",i, sep=""),matrix(data = NA, nrow = i, ncol = 3, byrow
> = FALSE, dimnames = NULL))
> }
> s1
> s2
> s3
> 
> I want to bind all the matrices at the end with do.call(rbind...) 
> rather than listing all the elements manually with rbind(s1,s2,s3...)
> and so on.
> 
> thanks in advance.

df.list <- vector("list", 3) # create list

for(i in 1:3){df.list[[i]] <- matrix(data = NA,
 nrow = i,
 ncol = 3,
 byrow = FALSE,
 dimnames = NULL)}

do.call(rbind, df.list) # rbind list elements

> Simon.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code. 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotting points with two colors

2009-01-20 Thread Georg Ehret
Dear Miss R,
I am trying to plot a scatterplot in which the points (round) should
have two colors: half red and half blue (if you want: two half solid circles
put together). Can you please help me to realize this efficiently?

Thank you,
Best regards, Georg.
***
Georg Ehret
Geneva University Hospital
Geneva, Switzerland

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Confidence intervals in ccf()

2009-01-20 Thread Shruthi Jayaram

Hi,

I have been running the ccf() function to find cross-correlations of time
series across various lags. When I give the option of plot=TRUE, I get a
plot that gives me 95% confidence interval cut-offs (based on sample
covariances) for my cross-correlations at each lag. This gives me a sense of
whether my cross-correlations are statistically significant or not. 

However, I am unable to get R to return the value of these critical values
to me in say, an object or vector form. Would anyone be able to help me
extract the critical values (at 95%) from the ccf() function? 
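
A hedged sketch (not from the original question) of recomputing the cutoff
by hand: plot.acf draws its dashed bounds at +/- qnorm((1 + ci)/2) /
sqrt(n.used), and n.used is stored in the object that ccf() returns.

set.seed(1)
x <- rnorm(100); y <- rnorm(100)
cc <- ccf(x, y, plot = FALSE)
crit <- qnorm((1 + 0.95) / 2) / sqrt(cc$n.used)  ## 95% cutoff
crit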

Thanks very much in advance,

Shruthi
-- 
View this message in context: 
http://www.nabble.com/Confidence-intervals-in-ccf%28%29-tp21567646p21567646.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] building windows binary of a package

2009-01-20 Thread Li, Xiaochun
Hi,

I'm trying to build a Windows binary from a source package using R-2.8.1. I 
downloaded Rtools29 and set up the path. 
I got this error when running R CMD INSTALL,

make: sh.exe: Command not found
make: *** [pkg-SurvCov] Error 127
*** Installation of SurvCov failed ***

However, I can run 'sh' at the 'cmd' window. Any tips will be greatly 
appreciated.  

By the way, I can install and run this package on a Linux machine.  It also 
passes Rcheck there with a few warnings.

Xiaochun

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bar Plot ggplot2 Filling bars with cross hatching

2009-01-20 Thread Greg Snow
The best thing to do really depends on your situation and what question you are 
trying to answer with your plot.  

Shades of grey has been mentioned, and in some cases that works, but you then 
run into the problem (same problem with colors and hatching density) of 
figuring out which bars to make light and which dark since a plot can have a 
very different visual impact depending on that choice.

Sometimes reordering and/or grouping/regrouping bars in a barplot can convey 
the information with the need for fewer visual distinguishers.  Dotplots have 
also been mentioned already, they are often an improvement on a barplot.  You 
can use different symbols for the dots (or even letters if overlap is small 
enough and the letters give more information) and labeling and grouping are 
more natural.  Sometimes a line plot is an appropriate alternative to a barplot.

Also take into account how the plot will be used/displayed, others have 
mentioned that what looks good on screen may not print out well (also true of 
cross-hatching).  I have seen some overhead projectors where the slide clearly 
had black and grey sections, but when projected, all the grey was black as 
well.  If this is something that will be photocopied, then 
colors/shades/hatches can change in that process and not be distinguishable.

Often the best strategy is to make multiple variations of a graph, then show 
them to someone else for an outside opinion of which best convey the 
information.

If you tell us a bit more about the specifics of the project, we may have more 
or better suggestions.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: stephen sefick [mailto:ssef...@gmail.com]
> Sent: Monday, January 19, 2009 5:37 PM
> To: Greg Snow
> Cc: hadley wickham; R-help
> Subject: Re: [R] Bar Plot ggplot2 Filling bars with cross hatching
> 
> what is your suggestion for distinguishing between many bars without
> color?  I have grown up in the time of standardized tests - good or bad
> I never felt nauseous.
> 
> Stephen
> 
> On Mon, Jan 19, 2009 at 5:20 PM, Greg Snow  wrote:
> > I think the fact that the grid package does not support cross-
> hatching is a feature not a bug (or deficiency), and I hope that this
> is not "fixed".  Tufte's book (The Visual Display of Quantitative
> Information) has a section on why cross-hatching should be avoided
> (unless of course your goal is to induce nausea in the observer rather
> than convey information).
> >
> > I would edit Hadley's statement below to say "fortunately there's no
> way to do this in ggplot2".
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> >> project.org] On Behalf Of hadley wickham
> >> Sent: Thursday, January 15, 2009 10:55 AM
> >> To: stephen sefick
> >> Cc: R-help
> >> Subject: Re: [R] Bar Plot ggplot2 Filling bars with cross hatching
> >>
> >> Hi Stephen,
> >>
> >> > #I am putting a test together for an introductory biology class
> and I
> >> > would like to put different cross hatching inside of each bar for
> the
> >> > bar plot below
> >>
> >> ggplot2 uses the grid package to do all the drawing, and currently
> >> grid doesn't support cross-hatching, so unfortunately there's no way
> >> to do this in ggplot2.
> >>
> >> Regards,
> >>
> >> Hadley
> >>
> >> --
> >> http://had.co.nz/
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-
> >> guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 
> 
> --
> Stephen Sefick
> 
> Let's not spend our time and resources thinking about things that are
> so little or so large that all they really do for us is puff us up and
> make us feel like gods.  We are mammals, and have not exhausted the
> annoying little problems of being mammals.
> 
>   -K. Mullis
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] generalizing expand.table: table -> data.frame

2009-01-20 Thread Michael Friendly

In
http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html
a method was given for converting a frequency table to an expanded data
frame representing each observation as a set of factors.  A slightly modified
version was later included in the NCStats package, only on http://rforge.net/
(and it has too many dependencies to be useful).

I've tried to make it more general, allowing an input data frame in frequency
form, and where the frequency variable is not named "Freq".  This is my working
version:

__begin__ expand.table.R
expand.table <- function (x, var.names = NULL, freq="Freq", ...)
{
#  allow: a table object, or a data frame in frequency form
  if(inherits(x,"table")) {
x <- as.data.frame.table(x)
  }
##  This fails:
#   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,freq]), ],
#                simplify = FALSE)
#   df <- subset(do.call("rbind", df), select = -freq)

#  This works, when the frequency variable is named Freq
  df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,"Freq"]), ],
               simplify = FALSE)
  df <- subset(do.call("rbind", df), select = -Freq)

  for (i in 1:ncol(df)) {
  df[[i]] <- type.convert(as.character(df[[i]]), ...)
  }
  rownames(df) <- NULL
  if (!is.null(var.names)) {
  if (length(var.names) < dim(df)[2])
  stop("Too few var.names given.")
  else if (length(var.names) > dim(df)[2])
  stop("Too many var.names given.")
  else names(df) <- var.names
  }
  df
}
__end__   expand.table.R

Thus for the following table

library(vcd)
art <- xtabs(~Treatment + Improved, data = Arthritis)


> art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

expand.table (above) gives a data frame of sum(art)=84 observations, 
with factors
Treatment and Improved. 


> artdf <- expand.table(art)
> str(artdf)
'data.frame':   84 obs. of  2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>

I've generalized this so it works with data frames in frequency form,

> as.data.frame(art)
 Treatment Improved Freq
1   Placebo None   29
2   Treated None   13
3   Placebo Some7
4   Treated Some7
5   Placebo   Marked7
6   Treated   Marked   21

> art.df2 <- expand.table(as.data.frame(art))
> str(art.df2)
'data.frame':   84 obs. of  2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>

But --- here's the rub --- when the Freq variable in a data frame is called
something other than "Freq", as in this example,

> GSS
sex party count
1 female   dem   279
2   male   dem   165
3 female indep73
4   male indep47
5 female   rep   225
6   male   rep   191

all the changes I've tried, using the freq= argument in expand.table() 
fail in various ways.


Can someone help?
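
One possible direction, offered only as a hedged sketch (expand.table2 is a
hypothetical name, and the diagnosis is an assumption): the failure likely
comes from subset()'s non-standard evaluation of select = -freq, which looks
for a column literally named "freq"; indexing the frequency column by its
character name avoids that.

expand.table2 <- function(x, var.names = NULL, freq = "Freq", ...) {
  if (inherits(x, "table")) x <- as.data.frame.table(x)
  # replicate each row according to its frequency, then drop the freq column
  df <- x[rep(seq_len(nrow(x)), x[[freq]]), setdiff(names(x), freq), drop = FALSE]
  for (i in seq_along(df)) df[[i]] <- type.convert(as.character(df[[i]]), ...)
  rownames(df) <- NULL
  if (!is.null(var.names)) {
    if (length(var.names) != ncol(df)) stop("Wrong number of var.names given.")
    names(df) <- var.names
  }
  df
}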

-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca 
Professor, Psychology Dept.

York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Streethttp://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read a xls file

2009-01-20 Thread Gabor Grothendieck
On Tue, Jan 20, 2009 at 11:21 AM, Ian Jenkinson
 wrote:
> I saw this about RODBC, but it seemed a complicated way of doing things, and
> in any case it seems that to run RODBC you need Excel 2004 or higher, which
> I would need to buy.
> One beautiful thing about R and OOo is that they are excellent and they cost
> nothing, so you don't have to buy, borrow or steal them.

read.xls in gdata does not require Excel and reads xls files directly
on all R platforms.
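
A hedged usage sketch (the file name is a placeholder, and read.xls needs a
working Perl installation on the system):

library(gdata)
dat <- read.xls("mydata.xls", sheet = 1, header = TRUE)   # placeholder file name
str(dat)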

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] creating a list of matrices or data frames

2009-01-20 Thread Simon Pickett

Hi all,

How would you create a list of data.frames within a loop, then bind all the 
elements of the list using rbind?


take this example of matrices with differing numbers of rows

for(i in 1:3){
  assign(paste("s", i, sep=""),
         matrix(data = NA, nrow = i, ncol = 3, byrow = FALSE, dimnames = NULL))
}
s1
s2
s3

I want to bind all the matrices at the end with do.call(rbind...)  rather 
than listing all the elements manually with rbind(s1,s2,s3...) and so on.
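
A minimal sketch of the list-based version of the loop above, which avoids
assign() so that do.call() can see all the pieces at once:

mats <- vector("list", 3)
for (i in 1:3) {
  mats[[i]] <- matrix(NA, nrow = i, ncol = 3)
}
combined <- do.call(rbind, mats)   # one matrix with 1 + 2 + 3 rows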


thanks in advance.

Simon.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error message from CV.GLM

2009-01-20 Thread Prof Brian Ripley

Who said the variables were all in the data frame?  See this

 All the variables in 'formula', 'subset' and in '...' are looked
 for first in 'data' and then in the environment of 'formula' (see
 the help for 'formula()' for further details) and collected into a
 data frame.

Now ydata$y is not in the data frame ... so try putting it there.
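
A hedged sketch of that suggestion (file and column names are copied from the
original post; ydata is assumed to hold the response):

library(boot)
data   <- read.table("selected_2D.csv", header = TRUE, sep = ",")
data$y <- ydata$y                        # put the response into the model data frame
glm.fitted <- glm(y ~ density + vsurf_ID6 + vsurf_S, data = data)
error      <- cv.glm(data = data, glmfit = glm.fitted, K = 6)
error$delta                              # cross-validated estimate of prediction error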

On Tue, 20 Jan 2009, Markus Mühlbacher wrote:


Dear list members.

I have problems with the usage of cv.glm from the boot package. Here are some 
parts of the script I wanted to use:

data <- read.table("selected_2D.csv", header=TRUE, sep=",")
…
glm.fitted <- glm("ydata$ y  ~ 1 + density + vsurf_ID6 + vsurf_S ", data=data)
error <- cv.glm(data=data, glm.fitted, K=6)

ydata$y is a separate data set, where I take my independent data from. I build 
an equation with some of the columns in data. Then I generate the generalized 
linear model, which works. But when I try to run the last line – the cv.glm 
function, I get the following error message:

Error in model.frame.default(formula = eqfull, data = list(vsurf_ID6 = 
c(2.4599824,  :
 variable lengths differ (found for 'density')

I fear I don't get the meaning of the error message at all. The length of the 
data columns are all equal. Any help would be kindly appreciated!

Best wishes,
Markus




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Statistics books

2009-01-20 Thread glenn
Not strictly R sorry


Looking for some new books on the topics of; Correlation, and Rank-Reduction
in general

If anyone has any suggestions I would appreciate it

Amazon yielded some books from SAGE publishers which looked ok

Regards

Glenn

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read a xls file

2009-01-20 Thread Ian Jenkinson

On 2009-01-20 15:24, Gavin Simpson wrote:

On Tue, 2009-01-20 at 14:45 +0100, Ian Jenkinson wrote:
  

See the R Import/Export manual.  Also
RSiteSearch("import excel")
gives many hits.  It seems as if this question is
asked almost daily.

On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina
 wrote:
  
  

Hello,

i have a xls file. I will read it in r, what library-command i use for
this??

any ideas??
  
I feel concerned because I have just spent a frustrating couple of days 
trying to read an Excel (xls) file, with the aid of the R book (Crawley, 
2007), and R help files. I failed, but finally found a workaround. My 
experience might help others.



You did read ?read.table yes?
  

Unfortunately not. Now I see that it is rich and detailed. Thank you.
There are three arguments there that can help in such situations: 


'colClasses' allows the finest grained control over how R imports your
text files. You specify what each column is, noting that if you have
lots of columns, things like 


c("numeric", rep("character", 12))

will deal with runs of columns of the same type, without having to type
them all by hand.
  

I'm a bit mystified by this.

'as.is' is a vector of logicals (TRUE/FALSE) that controls whether a
column is read in as is or converted.
  

I have just tried your suggestion:

> 
TTT<-read.table("/D/.../090117T_P.txt",header=TRUE,as.is="Log.microplank.biomass")


I CONFIRM IT WORKS! Sure enough elements in my "Log.microplank.biomass" 
column are now "numeric":


> (TTT[5,7])
[1] 1.612784
> class(TTT[5,7])
[1] "numeric"

Indeed, I see that Crawley (2007) gives this example on p.100:
[>] murder<-read.table("c:\\temp\\murders.txt",header=T,as.is="region")
, but until now I hadn't understood what "region" meant, so I didn't see 
how to use "as.is".  Now I realise "region" is a header name in that 
particular data.frame.

Sorry for being such a newby!


'stringsAsFactors' a single logical. Should all character variables be
converted to factors.
  
I hadn't come across this. This would be useful if you wanted variables 
as factors, but my problem was that I got "factors" when I wanted "numeric"

Some of these would have been useful in your case.

I'm not sure what you tried, but I have found that saving an .xls file
as a CSV via OpenOffice.org (on Linux) and subsequently reading it in
with read.csv("foo.csv", ...) to be reasonably fool proof, especially
when one makes use of the arguments above for fine-grained processing.
  
I can appreciate this would probably have worked for me too, had I known 
how to do the "fine-grained processing".

Someone in this thread posted a response that included the use of RODBC,
which I haven't tried, but there are a plethora of ways to read data
from Excel without having to torture yourself and the data formats to do
so.
  
I saw this about RODBC, but it seemed a complicated way of doing things, 
and in any case it seems that to run RODBC you need Excel 2004 or 
higher, which I would need to buy.
One beautiful thing about R and OOo is that they are excellent and they 
cost nothing, so you don't have to buy, borrow or steal them.

HTH

G

  

Ian

My data were in an Excel xls file

I have R (version 2.6.2) installed in Kubuntu Linux
I also have R (version 2.6.2) installed in Windows XP SP_3 running in 
VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel 
97 on this system.


I wasted a lot of time exporting from Excel in various formats (txt, 
csv, dif, tab-delimited, ;-delimited ,-delimited, etc.). (I checked they 
were of correct format by peeking with a text editor.)
Then I would try reading using e.g. read.table("[file 
path]",header=TRUE) or read.csv(...) or read.csv2(...), or 
read.DIF(...), with or without "header=TRUE" or "header =FALSE".
I also copied to the "clipboard" and tried reading using 
read.DIF("clipboard")

In many of these cases I did get a data.frame that looked nice on-screen.

My recurrent problem, however, was that many of the numeric variables in 
the resultant data frame were CLASS "factor". If you do arithmetic or 
plotting on factors, either it fails or gives wrong results.


So I spent hours using (as.numeric(...)) with variants and permutations, 
etc. Most times (as.numeric(...)) seems to work, but actually the data 
either remained unchanged (as a "factor") or gave "numeric" but wrong 
numbers.
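
For the record, this is the usual factor pitfall (R FAQ 7.10): as.numeric() on
a factor returns the internal level codes rather than the printed values, which
is exactly the "wrong numbers" symptom. A tiny illustration:

f <- factor(c("1.5", "2.3", "4.0"))
as.numeric(f)                  # 1 2 3  -- the level codes
as.numeric(as.character(f))    # 1.5 2.3 4.0  -- the values actually wanted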


I read the xls file using gnumeric application and saved as a dif file, 
then used read.DIF("[file path]"). This gave some correct "numeric" 
numbers but jumbled and partly duplicated.


N.B. My problems were essentially the same whether I used R in XP or in 
Linux (kubuntu)


MY SOLUTION (working in Linux):
Read the Excel file (xls) using Open Office.org (version 2.4.1) 
(downloadable for free for Linux or Windows).

Save as dif file.
In R,   TT<-read.DIF("[file path]",header=TRUE)
It worked, and all my numerical data elements were "numeric", correct 
and in the right order. Omit "header=TRUE" if you don't want the first 
elements of the spreadsheet columns declared as headers.

Re: [R] Error message from CV.GLM

2009-01-20 Thread Max Kuhn
> I have problems with the usage of cv.glm from the boot package. Here are some 
> parts of the script I wanted to use:
>
> data <- read.table("selected_2D.csv", header=TRUE, sep=",")
> …
> glm.fitted <- glm("ydata$ y  ~ 1 + density + vsurf_ID6 + vsurf_S ", data=data)
> error <- cv.glm(data=data, glm.fitted, K=6)
>
> ydata$y is a separate data set, where I take my independent data from. I 
> build an equation with some of the columns in data. Then I generate the 
> generalized linear model, which works. But when I try to run the last line – 
> the cv.glm function, I get the following error message:

You are going to have to merge that variable into "data". The formula
interface can't really cope with it otherwise.

As an alternative, the train function in caret can do the same thing
as cv.glm (with a few more options). See

   http://www.jstatsoft.org/v28/i05

for more information.

Max

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] heatmap.2 color issue

2009-01-20 Thread Liu, Hao [CNTUS]
Dear All:

I tried to use heatmap.2 to generate hierarchical clustering using the 
following command:

heatmap.2(datamatrix, scale="row", trace="none", col=greenred(256), 
labRow=genelist[,1], margins=c(10,10), Rowv=TRUE, Colv=TRUE)

datamatrix is a subset of the RMA-normalized data, selected by a genelist.

The problem is that a lot of the time the z-score range in the key runs from, 
say, -5 to 15 or -15 to 5; as a result, the zero of the z distribution sits in 
either the green or the red region of the key, and the resulting heatmap looks 
generally greenish or reddish.

I wonder if there is a way to make the heatmap more balanced between red and 
green, I tried to read the heatmap.2 help but could not get a clear idea.
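
One hedged possibility (a sketch only, assuming gplots is loaded and datamatrix
is the matrix passed above): do the row scaling yourself and give heatmap.2
breaks that are symmetric about zero, so the key is balanced between green and
red.

z    <- t(scale(t(datamatrix)))              # row z-scores, computed outside heatmap.2
lim  <- max(abs(z), na.rm = TRUE)
brks <- seq(-lim, lim, length.out = 257)     # length(col) + 1 breaks
heatmap.2(z, scale = "none", trace = "none", col = greenred(256),
          breaks = brks, labRow = genelist[, 1], margins = c(10, 10))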

Thanks
Hao


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error message from CV.GLM

2009-01-20 Thread Markus Mühlbacher
Dear list members.

I have problems with the usage of cv.glm from the boot package. Here are some 
parts of the script I wanted to use:

data <- read.table("selected_2D.csv", header=TRUE, sep=",")
… 
glm.fitted <- glm("ydata$ y  ~ 1 + density + vsurf_ID6 + vsurf_S ", data=data)
error <- cv.glm(data=data, glm.fitted, K=6)

ydata$y is a separate data set, where I take my independent data from. I build 
an equation with some of the columns in data. Then I generate the generalized 
linear model, which works. But when I try to run the last line – the cv.glm 
function, I get the following error message:

Error in model.frame.default(formula = eqfull, data = list(vsurf_ID6 = 
c(2.4599824,  :
  variable lengths differ (found for 'density')

I fear I don't get the meaning of the error message at all. The length of the 
data columns are all equal. Any help would be kindly appreciated!

Best wishes,
Markus




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two-sample test of multinomial proportion

2009-01-20 Thread Gustaf Rydevik
On Tue, Jan 20, 2009 at 4:08 PM, Gustaf Rydevik
 wrote:
> Hi all,
>
> This is perhaps more a statistics question than an R question, but I
> hope it's OK anyhow.
>
> I have some data (see below) with the number of tests positive to
> subtype H1 of a virus, the number of tests positive to subtype H3, and
> the total number of tests. This is for two different groups, and the
> two subtypes are mutually exclusive.
>
> What is the best way to test if the proportion of H1 tests to all
> positive tests differ between the two groups?
> I could run prop.test() on just the H1 and H3 part of the data,
> ignoring the total number of tests. But this seem to skip some
> information regarding variance of H1/H3 in the two groups, so I don't
> think it is correct.
>
> I've tried using a bootstrap approach on the ratio of the two
> proportions, but there must be a smarter way.
> Any help is much appreciated!
>
> Best regards,
>
> Gustaf Rydevik
>
>
> data and bootstrap attempt ###
> multi.data<-data.frame(
>  group=c("a","b"),
>  H1=c(2,12),
>  H3=c(21,46),
>  tests=c(189,411)
> )
> multi.ind<-data.frame(Type=
> rep(c("H1","H3","Neg"),c(2+12,21+46,189+411-2-12-21-46)),
> group=rep(c("a","b","a","b","a","b"),c(2,12,21,46,189-2-21,411-12-46))
> )
>
> props1<-vector(mode="numeric",length=1000)
> props2<-vector(mode="numeric",length=1000)
> for(i in 1:1000){
> sub.tab<-t(table(Subtyp.orig[sample(1:nrow(Subtyp.orig),nrow(Subtyp.orig),replace=TRUE),]))
> props1[i]<-sub.tab[1,1]/(sub.tab[1,1]+sub.tab[1,2])
> props2[i]<-sub.tab[2,1]/(sub.tab[2,1]+sub.tab[2,2])
> }
> sub.kvot<-props1/props2
> sort(sub.kvot)[50]
> sort(sub.kvot)[950]
>
>
>
> --
> Gustaf Rydevik, M.Sci.
> tel: +46(0)703 051 451
> address:Essingetorget 40,112 66 Stockholm, SE
> skype:gustaf_rydevik
>

ooops - forgot to change a name of the bootstrap code. Below is a
corrected version.

/Gustaf

data and bootstrap attempt ###
multi.data<-data.frame(
 group=c("a","b"),
 H1=c(2,12),
 H3=c(21,46),
 tests=c(189,411)
)
multi.ind<-data.frame(Type=
rep(c("H1","H3","Neg"),c(2+12,21+46,189+411-2-12-21-46)),
group=rep(c("a","b","a","b","a","b"),c(2,12,21,46,189-2-21,411-12-46))
)

props1<-vector(mode="numeric",length=1000)
props2<-vector(mode="numeric",length=1000)
for(i in 1:1000){
sub.tab<-t(table(multi.ind[sample(1:nrow(multi.ind),nrow(multi.ind),replace=TRUE),]))
props1[i]<-sub.tab[1,1]/(sub.tab[1,1]+sub.tab[1,2])
props2[i]<-sub.tab[2,1]/(sub.tab[2,1]+sub.tab[2,2])
}
sub.kvot<-props1/props2
sort(sub.kvot)[50]
sort(sub.kvot)[950]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex -> negate a word

2009-01-20 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:
>
> attached are patches to character.c, names.c, and grep.R; if you tell me
>   

forgot to add:  the patches are against the latest r-devel
(19.01.2009).  compiled and tested on 32b Ubuntu 8.04.


vQ

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bug in R2WinBUGS

2009-01-20 Thread Uwe Ligges



John Smith wrote:

In the newest version of R2WinBUGS, the default directory is changed to
working.directory, but never changed back once the bugs() call has finished.


Now finally fixed, thanks again for the report and sorry for the delay.

Best,
Uwe





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex -> negate a word

2009-01-20 Thread Wacek Kusnierczyk
Prof Brian Ripley wrote:
> On Mon, 19 Jan 2009, Rolf Turner wrote:
>
>>
>> On 19/01/2009, at 10:44 AM, Gabor Grothendieck wrote:
>>
>>> Well, that's why it was only provided when you insisted.  This is
>>> not what regexp's are good at.
>>>
>>> On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland  wrote:
 Thanks! (I have to admit, though, that I expected something simple)
>>
>> It may not be what regexp's are good at, but the grep command in
>> unix/linux
>> does what is required *very* simply via the ``-v'' flag.  I
>> conjecture that
>> it would not be difficult to add an argument with similar impact to the
>> grep() function in R.
>
> Indeed.  I have often wondered why grep() returned indices, when a
> logical vector would seem more natural in R (and !grep(...) would have
> been all that was needed).
>
> Looking at the code I see it does in fact compute a logical vector,
> just not return it.  So adding 'invert' (the long-form of -v is
> --invert) is a job of a very few lines and I have done so for 2.9.0.
>

in fact, it's simpler than that.  instead of redundantly distributing
the fix over four different lines in character.c, it's enough to ^= the
logical vector of matched/unmatched flags in just one place, on-the-fly,
close to the end of the loop over the vector of input strings.  see
attached patch.

for consistency, you might want to
- name the internal invert flag 'invert_opt' instead of 'invert';
- apply the same fix to agrep.

it's also trivial to add another argument to grep, say 'logical', which
will cause grep to return a logical vector of the same length as the
input strings vector.  see the attached patch.  note: i am novice to r
internals, and i get some mystical warnings i haven't decoded yet while
using the extended grep, but otherwise the code compiles well and grep
works as intended; you'd need to fix the cause of the warnings.

if you want the 'logical' argument, you need to decide how it interacts
with 'values'.  in the patch, 'values' set to TRUE resets 'logical' to
FALSE, with a warning.

further suggestions:  the arguments 'values' and 'logical' could be
replaced with one argument, say 'output', which would take a value from
{'indices', 'values', 'logical'}.  it might make further extensions
easier to implement and maintain.

attached are patches to character.c, names.c, and grep.R; if you tell me
which other files need a patch to get rid of the warnings (see below),
i'll make one. 

s = c("abc", "bcd", "cde")

grep("b", s)
# 1 2

grep("b", s, value=TRUE)
# "abc" "bcd"

grep("b", s, logical=TRUE)
# TRUE TRUE FALSE

s[grep("b", s, logical=TRUE)]
# "abc" "bcd"
# Warning: stack imbalance in 'grep', 9 then 10
# Warning: stack imbalance in '.Internal', 8 then 9
# Warning: stack imbalance in '{', 6 then 7

grep("b", s, invert=TRUE)
# 3

grep("b", s, invert=TRUE, value=TRUE)
# "cde"

s[!grep("b", s, logical)]
# "cde"
# Warning: stack imbalance in 'grep', 15 then 16
# Warning: stack imbalance in '.Internal', 14 then 15
# Warning: stack imbalance in '{', 12 then 13
# Warning: stack imbalance in '!', 6 then 7
# Warning: stack imbalance in '[', 2 then 3



vQ
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] two-sample test of multinomial proportion

2009-01-20 Thread Gustaf Rydevik
Hi all,

This is perhaps more a statistics question than an R question, but I
hope it's OK anyhow.

I have some data (see below) with the number of tests positive to
subtype H1 of a virus, the number of tests positive to subtype H3, and
the total number of tests. This is for two different groups, and the
two subtypes are mutually exclusive.

What is the best way to test if the proportion of H1 tests to all
positive tests differ between the two groups?
I could run prop.test() on just the H1 and H3 part of the data,
ignoring the total number of tests. But this seem to skip some
information regarding variance of H1/H3 in the two groups, so I don't
think it is correct.

I've tried using a bootstrap approach on the ratio of the two
proportions, but there must be a smarter way.
Any help is much appreciated!

Best regards,

Gustaf Rydevik


data and bootstrap attempt ###
multi.data<-data.frame(
  group=c("a","b"),
  H1=c(2,12),
  H3=c(21,46),
  tests=c(189,411)
)
multi.ind<-data.frame(Type=
rep(c("H1","H3","Neg"),c(2+12,21+46,189+411-2-12-21-46)),
group=rep(c("a","b","a","b","a","b"),c(2,12,21,46,189-2-21,411-12-46))
)

props1<-vector(mode="numeric",length=1000)
props2<-vector(mode="numeric",length=1000)
for(i in 1:1000){
sub.tab<-t(table(Subtyp.orig[sample(1:nrow(Subtyp.orig),nrow(Subtyp.orig),replace=TRUE),]))
props1[i]<-sub.tab[1,1]/(sub.tab[1,1]+sub.tab[1,2])
props2[i]<-sub.tab[2,1]/(sub.tab[2,1]+sub.tab[2,2])
}
sub.kvot<-props1/props2
sort(sub.kvot)[50]
sort(sub.kvot)[950]



-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] from matrix to data.frame

2009-01-20 Thread Antje

Wow, there are a lot of possibilities... thank you all very much!!!
I guess, I'll go for "as.data.frame.table", because it's one line and does 
exactly what I want :-)


Ciao,
Antje



Antje schrieb:

Hello,

I have a question how to reshape a given matrix to a data frame.

# --
 > a <- matrix(1:25, nrow=5)
 > a
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25

 > colnames(a) <- LETTERS[1:5]
 > rownames(a) <- as.character(1:5)
 > a
  A  B  C  D  E
1 1  6 11 16 21
2 2  7 12 17 22
3 3  8 13 18 23
4 4  9 14 19 24
5 5 10 15 20 25

# ---

This is an example on how my matrix looks like.
Now, I'd like to reshape the data that I get a data frame with three 
columns:


- the row name of the entry (X1)
- the column name of the entry (X2)
- the entry itself (X3)

like:

X1X2X3
1A1
2A2
3A3

1B6
2B7

5E25

How would you solve this problem in an elegant way?

Antje

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] from matrix to data.frame

2009-01-20 Thread jim holtman
the reshape package might also help:

> require(reshape)
> x
  A  B  C  D  E
1 1  6 11 16 21
2 2  7 12 17 22
3 3  8 13 18 23
4 4  9 14 19 24
5 5 10 15 20 25
> melt(t(x))
   X1 X2 value
1   A  1     1
2   B  1     6
3   C  1    11
4   D  1    16
5   E  1    21
6   A  2     2
7   B  2     7
8   C  2    12
9   D  2    17
10  E  2    22
11  A  3     3
12  B  3     8
13  C  3    13
14  D  3    18
15  E  3    23
16  A  4     4
17  B  4     9
18  C  4    14
19  D  4    19
20  E  4    24
21  A  5     5
22  B  5    10
23  C  5    15
24  D  5    20
25  E  5    25


On Tue, Jan 20, 2009 at 9:10 AM, Antje  wrote:
> Hello,
>
> I have a question how to reshape a given matrix to a data frame.
>
> # --
>> a <- matrix(1:25, nrow=5)
>> a
>      [,1] [,2] [,3] [,4] [,5]
> [1,]    1    6   11   16   21
> [2,]    2    7   12   17   22
> [3,]    3    8   13   18   23
> [4,]    4    9   14   19   24
> [5,]    5   10   15   20   25
>
>> colnames(a) <- LETTERS[1:5]
>> rownames(a) <- as.character(1:5)
>> a
>  A  B  C  D  E
> 1 1  6 11 16 21
> 2 2  7 12 17 22
> 3 3  8 13 18 23
> 4 4  9 14 19 24
> 5 5 10 15 20 25
>
> # ---
>
> This is an example on how my matrix looks like.
> Now, I'd like to reshape the data that I get a data frame with three
> columns:
>
> - the row name of the entry (X1)
> - the column name of the entry (X2)
> - the entry itself (X3)
>
> like:
>
> X1  X2  X3
> 1   A   1
> 2   A   2
> 3   A   3
> 
> 1   B   6
> 2   B   7
> 
> 5   E   25
>
> How would you solve this problem in an elegant way?
>
> Antje
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] from matrix to data.frame

2009-01-20 Thread Marc Schwartz
on 01/20/2009 08:10 AM Antje wrote:
> Hello,
> 
> I have a question how to reshape a given matrix to a data frame.
> 
> # --
>> a <- matrix(1:25, nrow=5)
>> a
>      [,1] [,2] [,3] [,4] [,5]
> [1,]    1    6   11   16   21
> [2,]    2    7   12   17   22
> [3,]    3    8   13   18   23
> [4,]    4    9   14   19   24
> [5,]    5   10   15   20   25
> 
>> colnames(a) <- LETTERS[1:5]
>> rownames(a) <- as.character(1:5)
>> a
>   A  B  C  D  E
> 1 1  6 11 16 21
> 2 2  7 12 17 22
> 3 3  8 13 18 23
> 4 4  9 14 19 24
> 5 5 10 15 20 25
> 
> # ---
> 
> This is an example on how my matrix looks like.
> Now, I'd like to reshape the data that I get a data frame with three
> columns:
> 
> - the row name of the entry (X1)
> - the column name of the entry (X2)
> - the entry itself (X3)
> 
> like:
> 
> X1X2X3
> 1A1
> 2A2
> 3A3
> 
> 1B6
> 2B7
> 
> 5E25
> 
> How would you solve this problem in an elegant way?
> 
> Antje


See ?as.data.frame.table

DF.a <- as.data.frame.table(a)

colnames(DF.a) <- paste("X", 1:ncol(DF.a), sep = "")

> DF.a
   X1 X2 X3
1   1  A  1
2   2  A  2
3   3  A  3
4   4  A  4
5   5  A  5
6   1  B  6
7   2  B  7
8   3  B  8
9   4  B  9
10  5  B 10
11  1  C 11
12  2  C 12
13  3  C 13
14  4  C 14
15  5  C 15
16  1  D 16
17  2  D 17
18  3  D 18
19  4  D 19
20  5  D 20
21  1  E 21
22  2  E 22
23  3  E 23
24  4  E 24
25  5  E 25


HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] from matrix to data.frame

2009-01-20 Thread Carlos J. Gil Bellosta
Hello, 

The columns in your output dataframe are the following vectors:

X1: as.vector( row(a) )
X2: colnames(a)[as.vector( col(a) )]
X3: as.vector( a )
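
A minimal sketch wrapping those three vectors into the requested data frame
(using rownames(a) so that X1 carries the row names rather than bare indices):

df <- data.frame(X1 = rownames(a)[as.vector(row(a))],
                 X2 = colnames(a)[as.vector(col(a))],
                 X3 = as.vector(a))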

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2009-01-20 at 15:10 +0100, Antje wrote:
> Hello,
> 
> I have a question how to reshape a given matrix to a data frame.
> 
> # --
>  > a <- matrix(1:25, nrow=5)
>  > a
>      [,1] [,2] [,3] [,4] [,5]
> [1,]    1    6   11   16   21
> [2,]    2    7   12   17   22
> [3,]    3    8   13   18   23
> [4,]    4    9   14   19   24
> [5,]    5   10   15   20   25
> 
>  > colnames(a) <- LETTERS[1:5]
>  > rownames(a) <- as.character(1:5)
>  > a
>A  B  C  D  E
> 1 1  6 11 16 21
> 2 2  7 12 17 22
> 3 3  8 13 18 23
> 4 4  9 14 19 24
> 5 5 10 15 20 25
> 
> # ---
> 
> This is an example on how my matrix looks like.
> Now, I'd like to reshape the data that I get a data frame with three columns:
> 
> - the row name of the entry (X1)
> - the column name of the entry (X2)
> - the entry itself (X3)
> 
> like:
> 
> X1X2  X3
> 1 A   1
> 2 A   2
> 3 A   3
> 
> 1 B   6
> 2 B   7
> 
> 5 E   25
> 
> How would you solve this problem in an elegant way?
> 
> Antje
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] from matrix to data.frame

2009-01-20 Thread jim holtman
?stack

> x <- read.table(textConnection(" A  B  C  D  E
+ 1 1  6 11 16 21
+ 2 2  7 12 17 22
+ 3 3  8 13 18 23
+ 4 4  9 14 19 24
+ 5 5 10 15 20 25"), header=TRUE)
>
> x
  A  B  C  D  E
1 1  6 11 16 21
2 2  7 12 17 22
3 3  8 13 18 23
4 4  9 14 19 24
5 5 10 15 20 25
> stack(x)
   values ind
1   1   A
2   2   A
3   3   A
4   4   A
5   5   A
6   6   B
7   7   B
8   8   B
9   9   B
10 10   B
11 11   C
12 12   C
13 13   C
14 14   C
15 15   C
16 16   D
17 17   D
18 18   D
19 19   D
20 20   D
21 21   E
22 22   E
23 23   E
24 24   E
25 25   E
>


On Tue, Jan 20, 2009 at 9:10 AM, Antje  wrote:
> Hello,
>
> I have a question how to reshape a given matrix to a data frame.
>
> # --
>> a <- matrix(1:25, nrow=5)
>> a
>      [,1] [,2] [,3] [,4] [,5]
> [1,]    1    6   11   16   21
> [2,]    2    7   12   17   22
> [3,]    3    8   13   18   23
> [4,]    4    9   14   19   24
> [5,]    5   10   15   20   25
>
>> colnames(a) <- LETTERS[1:5]
>> rownames(a) <- as.character(1:5)
>> a
>  A  B  C  D  E
> 1 1  6 11 16 21
> 2 2  7 12 17 22
> 3 3  8 13 18 23
> 4 4  9 14 19 24
> 5 5 10 15 20 25
>
> # ---
>
> This is an example on how my matrix looks like.
> Now, I'd like to reshape the data that I get a data frame with three
> columns:
>
> - the row name of the entry (X1)
> - the column name of the entry (X2)
> - the entry itself (X3)
>
> like:
>
> X1  X2  X3
> 1   A   1
> 2   A   2
> 3   A   3
> 
> 1   B   6
> 2   B   7
> 
> 5   E   25
>
> How would you solve this problem in an elegant way?
>
> Antje
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] WinBUGS with R

2009-01-20 Thread Frank E Harrell Jr
We are having good success using JAGS and the rjags package.  We've put 
some information at http://biostat.mc.vanderbilt.edu/JAGSInstallExample 
for linux.  It's nice to have a native linux executable, thanks to 
Martyn Plummer.   For those unfamiliar with JAGS, it uses the BUGS 
language, i.e., the user still creates a .bugs file as with WinBUGS.
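
For anyone new to it, a hedged sketch of the rjags workflow (the model file,
data objects and monitored node below are placeholders, not from this thread):

library(rjags)
m <- jags.model("model.bug", data = my.data, inits = my.inits, n.chains = 2)
update(m, 1000)                                       # burn-in
out <- coda.samples(m, variable.names = c("beta"), n.iter = 5000)
summary(out)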


Frank


Uwe Ligges wrote:



Uwe Ligges wrote:



richard.cot...@hsl.gov.uk wrote:

I am having some problems using R with WinBUGS using the R2WinBUGS
package. Specifically, when I try to run bugs() I get the following
message.

Error in FUN(X[[1L]], ...) :
.C(..): 'type' must be "real" for this format
To give a little more context, my bugs() command (for a multilevel
ordinal logit  similar to Gelman and Hill, Data Analysis Using
Regression and Multilevel/Hierarchical Models p. 383 is:

Wednesbury.data <- list ("n.judge", "n", "n.cut", "y" "judge", "ct",
"ra", "lg")

Wednesbury.inits <- function(){
  list(C=matrix(0,39,2))
   }



Untested, but I think it needs to be

 Wednesbury.inits <- function(){
   matrix(0,39,2)
 }



No, in fact I was wrong...

Uwe

and a function is of interest if some randomness should be in the 
inits...






Wednesbury.parameters <- c("C", "b1", "b2", "b3")


Debugging your BUGS model or dataset via R is a bit of a pain.  I 
find that the best way (or maybe least worst way) to weed out the 
problems when you get an error like this is to find the files 
(model/data/inits) that R2WinBUGS has created and open them in 
WinBUGS itself.  Run the Model Specification tool and you can more 
easily determine which part of the file the problem lies in.


Just looking at your Wednesbury.inits variable, you don't need to 
define it as a function

Wednesbury.inits <- list(list(C=matrix(0,39,2)))
will do.


Yes.



Also, I'm not sure if WinBUGS understands matrix data types (though I 
may be wrong).


It does.


Uwe



Regards,
Richie.

Mathematical Sciences Unit
HSL



ATTENTION:

This message contains privileged and confidential 
inform...{{dropped:20}}


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maptools, sunriset, POSIX timezones

2009-01-20 Thread Roger Bivand
Phil Taylor  gmail.com> writes:

> 
> Hi ...
> 
> I wonder if anyone can provide some insight into why the first three 
> examples using the sunriset function (appended below, with results) give 
> the correct answer, but the fourth generates and error.
> 
> The first two use ISOdatetime with and without a time zone attribute, 
> and the sunriset function returns the correct sunset time.
> 
> The third and fourth adds 10 seconds to the ISOdatetime (with and 
> without the time zone attribute) but the function only works when the 
> time zone is specified (example 3).
> 
> When I look at the objects (print or str) they appear the same, and when 
> I check to see if they are equivalent (e.g.)
> 
>  > time.1 <- ISOdatetime(1970, 1, 1, 10, 0, 0) + 10
>  > time.2 <- ISOdatetime(1970, 1, 1, 10, 0, 0, tz="GMT") + 10
>  > time.1 == time.2
> [1] TRUE
> 
> they appear to be the same.
> 
> I wonder if I am either missing something important, doing something 
> improperly, or if there is a small bug somewhere.

There may be a misunderstanding, but a small bug is also possible. In an
internal call to as.POSIXct(), a NULL value was being passed to the tz= 
argument when 10 seconds had been added and tz= not given in ISOdatetime().

In forthcoming maptools 0.7-18, the value being passed is tested, and the tz=
argument dropped if the value is NULL. With this:

> library(maptools)
Loading required package: foreign
Loading required package: sp
> Sys.setenv(TZ = "GMT")
> location <- matrix(c(-80.1,42.5), nrow=1)
> sunriset(location, ISOdatetime(1970, 1, 1, 10, 0, 0, tz="GMT"), 
direction="sunset", POSIXct.out=TRUE)
        day_frac                time
newlon 0.915226 1970-01-01 21:57:55
> sunriset(location, ISOdatetime(1970, 1, 1, 10, 0, 0), direction="sunset", 
POSIXct.out=TRUE)
        day_frac                time
newlon 0.915226 1970-01-01 21:57:55
> sunriset(location, ISOdatetime(1970, 1, 1, 10, 0, 0, tz="GMT") + 10, 
direction="sunset", POSIXct.out=TRUE)
        day_frac                time
newlon 0.915226 1970-01-01 21:57:55
> sunriset(location, ISOdatetime(1970, 1, 1, 10, 0, 0) + 10, 
direction="sunset", POSIXct.out=TRUE)
        day_frac                time
newlon 0.915226 1970-01-01 21:57:55

Thanks for the report - it could well be that the time classes are not being 
used well here - in addition, the sunset etc. functions of course only look
at yearday, so adding 10 seconds isn't very meaningful, but I guess this is
a simple example of a real problem.

Roger Bivand


> 
> I'm using windows Vista and R 2.8.1
> 
> Thanks,
> 
> Phil Taylor
> ptaylor  resalliance.org
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bayesian Residual Analysis and GOF for Ordinal Regression Models with Latent Variable Formulation

2009-01-20 Thread reezwan you
Hi:
I am wondering if someone could please guide me to some package or 
implementation of Bayesian Residual Analysis and GOF for Ordinal Regression 
Models with Latent Variable Formulation. I've checked the arm package, which uses a 
glm formulation (however, I'm looking for a Latent Variable/Data Augmentation 
formulation), and I couldn't find any functions for residual analysis. MCMCpack 
package method: MCMCoprobit uses data augmentation for ordered probit 
regression, but doesn't give bayesian latent residuals. Any help would be much 
appreciated.
Thanks
Reez

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read a xls file

2009-01-20 Thread Gavin Simpson
On Tue, 2009-01-20 at 14:45 +0100, Ian Jenkinson wrote:
> > See the R Import/Export manual.  Also
> > RSiteSearch("import excel")
> > gives many hits.  It seems as if this question is
> > asked almost daily.
> >
> > On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina
> >  wrote:
> >   
> >> > Hello,
> >> >
> >> > i have a xls file. I will read it in r, what library-command i use for
> >> > this??
> >> >
> >> > any ideas??
> I feel concerned because I have just spent a frustrating couple of days 
> trying to read an Excel (xls) file, with the aid of the R book (Crawley, 
> 2007), and R help files. I failed, but finally found a workaround. My 
> experience might help others.

You did read ?read.table yes?

There are three arguments there that can help in such situations: 

'colClasses' allows the finest grained control over how R imports your
text files. You specify what each column is, noting that if you have
lots of columns, things like 

c("numeric", rep("character", 12))

will deal with runs of columns of the same type, without having to type
them all by hand.
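
For example (a hedged sketch; the file name and column layout are made up):

dat <- read.csv("plankton.csv",
                colClasses = c("character", rep("numeric", 12)))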

'as.is' is a vector of logicals (TRUE/FALSE) that controls whether a
column is read in as is or converted.

'stringsAsFactors' a single logical. Should all character variables be
converted to factors.

Some of these would have been useful in your case.

I'm not sure what you tried, but I have found that saving an .xls file
as a CSV via OpenOffice.org (on Linux) and subsequently reading it in
with read.csv("foo.csv", ...) to be reasonably fool proof, especially
when one makes use of the arguments above for fine-grained processing.

Someone in this thread posted a response that included the use of RODBC,
which I haven't tried, but there are a plethora of ways to read data
from Excel without having to torture yourself and the data formats to do
so.

HTH

G

> 
> My data were in an Excel xls file
> 
> I have R (version 2.6.2) installed in Kubuntu Linux
> I also have R (version 2.6.2) installed in Windows XP SP_3 running in 
> VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel 
> 97 on this system.
> 
> I wasted a lot of time exporting from Excel in various formats (txt, 
> csv, dif, tab-delimited, ;-delimited ,-delimited, etc.). (I checked they 
> were of correct format by peeking with a text editor.)
> Then I would try reading using e.g. read.table("[file 
> path]",header=TRUE) or read.csv(...) or read.csv2(...), or 
> read.DIF(...), with or without "header=TRUE" or "header =FALSE".
> I also copied to the "clipboard" and tried reading using 
> read.DIF("clipboard")
> In many of these cases I did get a data.frame that looked nice on-screen.
> 
> My recurrent problem, however, was that many of the numeric variables in 
> the resultant data frame were CLASS "factor". If you do arithmetic or 
> plotting on factors, either it fails or gives wrong results.
> 
> So I spent hours using (as.numeric(...)) with variants and permutations, 
> etc. Most times (as.numeric(...)) seems to work, but actually the data 
> either remained unchanged (as a "factor") or gave "numeric" but wrong 
> numbers.
> 
> I read the xls file using gnumeric application and saved as a dif file, 
> then used read.DIF("[file path]"). This gave some correct "numeric" 
> numbers but jumbled and partly duplicated.
> 
> N.B. My problems were essentially the same whether I used R in XP or in 
> Linux (kubuntu)
> 
> MY SOLUTION (working in Linux):
> Read the Excel file (xls) using Open Office.org (version 2.4.1) 
> (downloadable for free for Linux or Windows).
> Save as dif file.
> In R,   TT<-read.DIF("[file path]",header=TRUE)
> It worked, and all my numerical data elements were "numeric", correct 
> and in the right order. Omit "header=TRUE" if you don't want the first 
> elements of the spreadsheet columns declared as headers.
> 
> Hope this may help someone.
> 
> Here's a subset of my data in a data.frame (environmental data on 
> plankton):
> 
>  > TT
>    Stn Day Mean.salinity Mean.temperature Secchi.disc. Log.microplank.biomass
> 1    1  12          0            14           0.7             1.954242509
> 2    1  70         13.5          16.55        0.3             3.083860801
> 3    1  93         13.45         16.85        0.6             2.651278014
> 4    1 153          6.78         14.2         0.5             2.075546961
> 5    1 200          0             9.3         0.7             1.612783857
> 6    1 231          0             7.1         0.8             1.491361694
> 7    1 283          0             8.8         0.4             2.123851641
> 8    1 330          4.95          9.45        0.3             2.276461804
> 9    1 370         16.6          12.3         0.4             2.728353782
> 10   3  12         16.25         11.95        0.55            2.025305865
> 11   3  70         22.35         16.1         0.5             2.096910013
> 12   3  93         26.05

Re: [R] Proportional response and boosting

2009-01-20 Thread Imelda.Somodi

Dear Gerard,

Thank you very much for the quick answer. I am a bit uncertain what you meant
by the total frequency of type A. I have data on the extent of type A at a
location (let's call it freqA), which can be taken as a frequency, and I have
information on the total extent of all the natural vegetation types (A to Z);
let's call the latter freqAZ. Both freqA and freqAZ are vectors, with
individual values for each observation point.

So to make clear what I did so far I modelled just as you wrote:

pi=freqA/freqAZ

veg.glm = glm ( pi ~ x, weights = freqAZ, family=binomial),

Did you mean that too? Or what do you propose to use as weights and for
standardisation instead of freqAZ (as you write "total freq of type A")? FreqA
summed over all the observations? That would make the weight a scalar. I'm sure
you meant something else.

Thank you,

Imelda



Gerard M. Keogh wrote:
> 
> Quick response on the binomial:
> 
> If possible I would suggest you should model
> 
> pi = (number/freq of type A) / (total_freq of type A)
> 
> veg.glm = glm ( pi ~ x, weights = total_freq, family=binomial)
> 
> The glm method is supposed to work only on the natural numbers (inc 0!)
> but
> also works for decimal data - it gives a warning in these cases which can
> be ignored.
> 
> Hope this helps!
> Gerard
> 
> 
> 
> 
>
>  From:    "Imelda.Somodi"
>  To:      r-help@r-project.org
>  Sent by: r-help-boun...@r-project.org
>  Date:    20/01/2009 09:11
>  Subject: [R] Proportional response and boosting
>
> 
> 
> 
> 
> 
> Dear experts of boosting!
> 
> I am planning to build vegetation models via boosting with either gbm or
> mboost. My problem is that my response variable is the proportion of a
> vegetation type in natural vegetation at a location.
> 
> ResponseA = (area of vegetation type A/area of all natural vegetation
> types)
> 
> That means that the response has a continuous distribution between 0 and 1
> with many 0s and 1s as well. As I understood from reading these forums, it
> is pretty close to a beta distribution with the exception that the
> marginal
> values (0,1) are also included. Because of the latter feature I cannot
> even
> build a beta regression, not that I could do a boosted variant of that.
> Nevertheless, I can think of my response as a binomial one with values
> between 0 and 1 and take 1 square meter (as if it was a pixel) of natural
> vegetation as an observation. This way I can do binomial glms for my data,
> so that I specify the no. of square meters of natural vegetation as
> weights
> (I round them to get integers to be applicable in glm). I hope I am
> allowed
> to post a side-question here. I always get a warning with these glms
> though.
> I give here a simple one-variable example:
> 
> Call: tmp <- glm(ossz_ujstand2$k2_stand ~ BIO_1 + I((BIO_1)^2),
> family=binomial, na.action=na.omit,weights= ossz_ujstand2$weights),
> 
> Where BIO_1 is a variable describing climate, and weights are the area of
> natural vegetation rounded to integers for each observation (a vector).
> 
> Warning: "non-integer #successes in a binomial glm!"
> 
> I read somewhere on this site that this can be normal, but would be
> reassured if it was stated that it is indeed so in my case as well.
> 
> My problem with boosting is that I don’t know how to handle my response
> variable distribution. I am not quite sure how to treat the loss function
> either. It seems to me that it somehow corresponds to the link function as
> it needs to be defined by family() like link functions in glm. The
> potential
> choices for family also correspond. At the same time some papers about
> boosting imply to me that the loss function takes more the role of the
> curve
> estimation technique and that data with any distribution can be boosted
> with
> any type of loss functions.
> As a start I tried to do the same with boosting as I did with glms. Here
> is
> an example.
> 
> With mboost:
> index<-!is.na(ossz_ujstand2$k2_stand)# I need this to remove
> NAs
> proba.b

[R] from matrix to data.frame

2009-01-20 Thread Antje

Hello,

I have a question how to reshape a given matrix to a data frame.

# --
> a <- matrix(1:25, nrow=5)
> a
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25

> colnames(a) <- LETTERS[1:5]
> rownames(a) <- as.character(1:5)
> a
  A  B  C  D  E
1 1  6 11 16 21
2 2  7 12 17 22
3 3  8 13 18 23
4 4  9 14 19 24
5 5 10 15 20 25

# ---

This is an example on how my matrix looks like.
Now, I'd like to reshape the data that I get a data frame with three columns:

- the row name of the entry (X1)
- the column name of the entry (X2)
- the entry itself (X3)

like:

X1  X2  X3
1   A   1
2   A   2
3   A   3

1   B   6
2   B   7

5   E   25

How would you solve this problem in an elegant way?

Antje

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with writing data to *.xls file

2009-01-20 Thread venkata kirankumar
Hi all,
I read data from an *.xls file and did some calculations on that data. Now I
have to create a new column in the same .xls file and insert the results into
the rows corresponding to the existing data. I tried this with write.xls(),
but it deleted all the columns previously present in that file and wrote only
the new column. Can anyone suggest what to do about this?
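
One hedged workaround (a sketch only; function and file names are placeholders):
most spreadsheet writers replace the whole sheet, so read the existing data back
in, add the new column in R, and then write everything out again.

library(gdata)
old <- read.xls("results.xls")                      # the existing sheet
old$new.column <- my.results                        # my.results: the computed values
write.csv(old, "results_updated.csv", row.names = FALSE)   # or whichever writer you use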

thanks in advance

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] WinBUGS with R

2009-01-20 Thread Uwe Ligges



Uwe Ligges wrote:



richard.cot...@hsl.gov.uk wrote:

I am having some problems using R with WinBUGS using the R2WinBUGS
package. Specifically, when I try to run bugs() I get the following
message.

Error in FUN(X[[1L]], ...) :
.C(..): 'type' must be "real" for this format
To give a little more context, my bugs() command (for a multilevel
ordinal logit  similar to Gelman and Hill, Data Analysis Using
Regression and Multilevel/Hierarchical Models p. 383 is:

Wednesbury.data <- list ("n.judge", "n", "n.cut", "y" "judge", "ct",
"ra", "lg")

Wednesbury.inits <- function(){
  list(C=matrix(0,39,2))
   }



Untested, but I think it needs to be

 Wednesbury.inits <- function(){
   matrix(0,39,2)
 }



No, in fact I was wrong...

Uwe


and a function is of interest if some randomness should be in the inits...





Wednesbury.parameters <- c("C", "b1", "b2", "b3")


Debugging your BUGS model or dataset via R is a bit of a pain.  I find 
that the best way (or maybe least worst way) to weed out the problems 
when you get an error like this is to find the files 
(model/data/inits) that R2WinBUGS has created and open them in WinBUGS 
itself.  Run the Model Specification tool and you can more easily 
determine which part of the file the problem lies in.


Just looking at your Wednesbury.inits variable, you don't need to 
define it as a function

Wednesbury.inits <- list(list(C=matrix(0,39,2)))
will do.


Yes.



Also, I'm not sure if WinBUGS understands matrix data types (though I 
may be wrong).


It does.


Uwe



Regards,
Richie.

Mathematical Sciences Unit
HSL



ATTENTION:

This message contains privileged and confidential inform...{{dropped:20}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging tables

2009-01-20 Thread Carlos J. Gil Bellosta
Hello,

Use merge. 
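
A minimal sketch (the SampleID column names below are placeholders for whatever
the two headers actually are):

table1 <- read.csv("table1.csv")
table2 <- read.csv("table2.csv")

# merge on the common identifier even when the headers differ
merged <- merge(table1, table2, by.x = "SampleID", by.y = "Sample.ID")

# or simply filter table 1 by the IDs present in table 2
subset1 <- table1[table1$SampleID %in% table2$Sample.ID, ]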

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com


On Tue, 2009-01-20 at 13:41 +, Dry, Jonathan R wrote:
> I am relatively new to R and am trying to do some basic data manipulation.  
> Basically I have a table (csv - table 1) of data for a set of samples (rows), 
> and a second table (table 2) of information about a subset of samples of 
> particular interest.  I want to pull out the data from table 1 for the 
> samples in table 2, either by:
> * Merging the two tables based on a common identifier (SampleID - may 
> have a different header in the two tables), and filter for overlapping 
> entries (preferred approach)
> * OR filter table 1 for entries where SampleID matches to one in a list 
> taken from table 2
> 
> Any help would be gratefully received.
> 
> --
> AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging tables

2009-01-20 Thread stephen sefick
?merge
?match

On Tue, Jan 20, 2009 at 8:41 AM, Dry, Jonathan R wrote:
> I am relatively new to R and am trying to do some basic data manipulation.  
> Basically I have a table (csv - table 1) of data for a set of samples (rows), 
> and a second table (table 2) of information about a subset of samples of 
> particular interest.  I want to pull out the data from table 1 for the 
> samples in table 2, either by:
> *   Merging the two tables based on a common identifier (SampleID - may 
> have a different header in the two tables), and filter for overlapping 
> entries (preferred approach)
> *   OR filter table 1 for entries where SampleID matches to one in a list 
> taken from table 2
>
> Any help would be gratefully received.
>
> --
> AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Stephen Sefick

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] User input in batch mode

2009-01-20 Thread Sebastien Bihorel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Merging tables

2009-01-20 Thread Dry, Jonathan R
I am relatively new to R and am trying to do some basic data manipulation.  
Basically I have a table (csv - table 1) of data for a set of samples (rows), 
and a second table (table 2) of information about a subset of samples of 
particular interest.  I want to pull out the data from table 1 for the samples 
in table 2, either by:
*   Merging the two tables based on a common identifier (SampleID - may 
have a different header in the two tables), and filter for overlapping entries 
(preferred approach)
*   OR filter table 1 for entries where SampleID matches to one in a list 
taken from table 2

Any help would be gratefully received.

--
AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] WinBUGS with R

2009-01-20 Thread Uwe Ligges



richard.cot...@hsl.gov.uk wrote:

I am having some problems using R with WinBUGS using the R2WinBUGS
package. Specifically, when I try to run bugs() I get the following
message.

Error in FUN(X[[1L]], ...) :
.C(..): 'type' must be "real" for this format
To give a little more context, my bugs() command (for a multilevel
ordinal logit similar to Gelman and Hill, Data Analysis Using
Regression and Multilevel/Hierarchical Models, p. 383) is:

Wednesbury.data <- list ("n.judge", "n", "n.cut", "y", "judge", "ct",
"ra", "lg")

Wednesbury.inits <- function(){
  list(C=matrix(0,39,2))
   }



Untested, but I think it needs to be

 Wednesbury.inits <- function(){
   matrix(0,39,2)
 }

and a function is of interest if some randomness should be in the inits...





Wednesbury.parameters <- c("C", "b1", "b2", "b3")


Debugging your BUGS model or dataset via R is a bit of a pain.  I find 
that the best way (or maybe least worst way) to weed out the problems when 
you get an error like this is to find the files (model/data/inits) that 
R2WinBUGS has created and open them in WinBUGS itself.  Run the Model 
Specification tool and you can more easily determine which part of the 
file the problem lies in.


Just looking at your Wednesbury.inits variable, you don't need to define 
it as a function

Wednesbury.inits <- list(list(C=matrix(0,39,2)))
will do.


Yes.



Also, I'm not sure if WinBUGS understands matrix data types (though I may 
be wrong).


It does.


Uwe



Regards,
Richie.

Mathematical Sciences Unit
HSL



ATTENTION:

This message contains privileged and confidential inform...{{dropped:20}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read a xls file

2009-01-20 Thread Ian Jenkinson



See the R Import/Export manual.  Also
RSiteSearch("import excel")
gives many hits.  It seems as if this question is
asked almost daily.

On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina wrote:
  

> Hello,
>
> I have an xls file and want to read it into R; what library or command
> should I use for this?
>
> Any ideas?
I feel concerned because I have just spent a frustrating couple of days 
trying to read an Excel (xls) file, with the aid of the R book (Crawley, 
2007), and R help files. I failed, but finally found a workaround. My 
experience might help others.


My data were in an Excel xls file

I have R (version 2.6.2) installed in Kubuntu Linux
I also have R (version 2.6.2) installed in Windows XP SP_3 running in 
VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel 
97 on this system.


I wasted a lot of time exporting from Excel in various formats (txt, 
csv, dif, tab-delimited, ;-delimited, ,-delimited, etc.). (I checked they 
were of correct format by peeking with a text editor.)
Then I would try reading using e.g. read.table("[file 
path]",header=TRUE) or read.csv(...) or read.csv2(...), or 
read.DIF(...), with or without "header=TRUE" or "header=FALSE".
I also copied to the "clipboard" and tried reading using 
read.DIF("clipboard")

In many of these cases I did get a data.frame that looked nice on-screen.

My recurrent problem, however, was that many of the numeric variables in 
the resultant data frame were CLASS "factor". If you do arithmetic or 
plotting on factors, either it fails or gives wrong results.


So I spent hours using (as.numeric(...)) with variants and permutations, 
etc. Most times (as.numeric(...)) seems to work, but actually the data 
either remained unchanged (as a "factor") or gave "numeric" but wrong 
numbers.
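
The "wrong numbers" are the usual factor trap: as.numeric() on a factor
returns the internal level codes, not the printed values. A small
illustration:

f <- factor(c("7", "2.5", "10"))
as.numeric(f)                  # 3 2 1 -- the level codes, not the data
as.numeric(as.character(f))    # 7.0 2.5 10.0 -- the numbers as printed
as.numeric(levels(f))[f]       # same result; the form recommended in ?factor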


I read the xls file using gnumeric application and saved as a dif file, 
then used read.DIF("[file path]"). This gave some correct "numeric" 
numbers but jumbled and partly duplicated.


N.B. My problems were essentially the same whether I used R in XP or in 
Linux (kubuntu)


MY SOLUTION (working in Linux):
Read the Excel file (xls) using Open Office.org (version 2.4.1) 
(downloadable for free for Linux or Windows).

Save as dif file.
In R,   TT<-read.DIF("[file path]",header=TRUE)
It worked, and all my numerical data elements were "numeric", correct 
and in the right order. Omit "header=TRUE" if you don't want the first 
elements of the spreadsheet columns declared as headers.


Hope this may help someone.

Here's a subset of my data in a data.frame (environmental data on 
plankton):


>TT
   Stn Day Mean.salinity Mean.temperature Secchi.disc. Log.microplank.biomass
1    1  12          0.00            14.00         0.70            1.954242509
2    1  70         13.50            16.55         0.30            3.083860801
3    1  93         13.45            16.85         0.60            2.651278014
4    1 153          6.78            14.20         0.50            2.075546961
5    1 200          0.00             9.30         0.70            1.612783857
6    1 231          0.00             7.10         0.80            1.491361694
7    1 283          0.00             8.80         0.40            2.123851641
8    1 330          4.95             9.45         0.30            2.276461804
9    1 370         16.60            12.30         0.40            2.728353782
10   3  12         16.25            11.95         0.55            2.025305865
11   3  70         22.35            16.10         0.50            2.096910013
12   3  93         26.05            17.15         1.50            1.707570176
13   3 153         23.40            14.20         1.00            1.755874856
14   3 200         14.05             8.60         0.40            1.812913357
15   3 231          7.90             6.30         0.30            1.897627091
16   3 283         11.20             7.25         0.70            1.832508913
17   3 330         19.95             8.10         0.50            1.785329835
18   3 370         24.35            11.50         0.40            2.361727836
19   4  12         18.10            12.05         0.60            1.792391689
20   4  70         24.35            15.90         0.70            1.973127854
21   4  93         27.00            17.35         1.30            1.982271233
22   4 153         25.80            14.20         0.80            1.924279286
23   4 200         16.20             9.00         0.40            1.653212514
24   4 231         11.50             6.85         0.40            1.819543936
25   4 283         10.95             8.20         0.25            2.096910013
26   4 330         19.70             8.45         0.40            2.025305865
27   4 370         25.60            11.50         0.50            2.274157849


Ian Jenkinson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Stacked barplot with two stacked bars besides each other

2009-01-20 Thread hadley wickham
On Tue, Jan 20, 2009 at 4:28 AM, Daniel Brewer  wrote:
> Hi,
>
> I have a particular barplot I would like to generate, but I am having
> trouble getting it to work.  What I would like is in effect two barplots
>  with stacked bars merged into one.  For example,  I have two samples
> (yoda1,yoda2) on which I measure whether two variables (var1,var2) are
> present or absent for a number of measurements on that sample.
>
>> var1 <- data.frame(yoda1=c(3,7), yoda2=c(1,9))
>> var2 <- data.frame(yoda1=c(8,2), yoda2=c(5,5))

I'd start by storing your data in a single data frame, with all
information explicit:

var1$row <- 1:2
var1$var <- "one"
var2$row <- 1:2
var2$var <- "two"

vars <- rbind(var1, var2)

library(reshape)
df <- melt(vars, id = c("var", "row"))
names(df)[3] <- "yoda"
df

(In reality you'd give the variables informative names based on your
study design)

Then you're in a position to better describe and control what you
want.  With the data in this form, you could then use the ggplot2
package to display it:

library(ggplot2)
qplot(yoda, value, data = df, fill = factor(row), geom="bar", stat =
"identity", facets = ~ var)

This puts yoda on the x axis, colours the bars by the row and
separates the plot into two panels based on var.  It's trivial to
produce any other arrangement of the three variables.

qplot(var, value, data = df, fill = factor(row), geom="bar", stat =
"identity", facets = ~ yoda)
qplot(row, value, data = df, fill = yoda, geom="bar", stat =
"identity", facets = ~ var)
etc

Hadley

-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] candisc

2009-01-20 Thread Michael Friendly

?candisc would have told you that

 Computational details for the one-way case are described in Cooley & 
Lohnes (1971), and in the SAS/STAT User's Guide, "The CANDISC procedure: 
Computational Details," 
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/candisc_sect12.htm. 
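
A quick numerical check (a sketch: it just applies the raw coefficients to
the grand-mean-centred predictors, and reproduces the candata$scores quoted
below up to the rounding of the printed coefficients):

three <- c(2.95, 2.53, 3.57, 3.16, 2.58, 2.16, 3.27)
five  <- c(6.63, 7.79, 5.65, 5.47, 4.46, 6.22, 3.52)
Can1  <- (three - mean(three)) * -5.185380 +
         (five  - mean(five))  * -2.160237
Can1   # -2.377 -2.705 -3.475 -0.960  4.229  2.605  2.682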



-Michael


Pete Shepard wrote:

Hello,

I have a question regarding the candisc package. My data are:

species  three  five
      1   2.95  6.63
      1   2.53  7.79
      1   3.57  5.65
      1   3.16  5.47
      2   2.58  4.46
      2   2.16  6.22
      2   3.27  3.52

I put these in a table and then fit a linear model
 >newdata <- lm(cbind(three, five) ~ species, data=rawdata)

and then do a candisc on them
 >candata<-candisc(newdata)

Here are my scores;

candata$scores


  species   Can1
1   1 -2.3769280
2   1 -2.7049437
3   1 -3.4748309
4   1 -0.9599825
5   2  4.2293774
6   2  2.6052193
7   2  2.6820884

and here are my coefficients

candata$coeffs.raw

   Can1
three -5.185380
five  -2.160237

candata$coeffs.std

   Can1
three -2.530843
five  -2.586620


My question is, what is the precise equation that gives the candata$scores?

Thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Streethttp://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Do not want to print when using prop.test

2009-01-20 Thread Peter Dalgaard
Jose Iparraguirre D'Elia wrote:
> I am using the function prop.test (base package), which returns a
> list with class "htest". All I want to do is to assign one of its
> values to a variable, but I do not want R to print the results and added 
> warning
message whenever I invoke the function.
> How can I prevent R from printing on using prop.test?
> Regards,
> José Luis


By assigning one of its values...

> pval <- prop.test(heads, 100)$p.value
> pval
[1] 0.4839273



-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bar Plot ggplot2 Filling bars with cross hatching

2009-01-20 Thread hadley wickham
>>I disagree.  Grey levels suck; labels are a kludge.  It is an issue
>>for ``many'' == 2, for which crosshatching works perfectly.
>>
>
> Could you show an example?
>
> There are several BW examples in example(barplot), and the gray ones look
> better on screen than the cross-hatched one.  (Not to say it makes a very
> good choice of cross-hatching, but I suspect the gray examples will look
> better than any 5 cross-hatch patterns.)  I haven't tried printing the
> examples, so I'm not sure the gray would reproduce well on paper; I wouldn't
> try to print those 5 gray levels on a typical printer.

My feeling is that the best cross-hatching is probably going to be
more aesthetically pleasing than the best solid greys (see e.g.
http://www.dannygregory.com/2005/09/cross_hatching.php).  However,
doing cross hatching well is far more difficult than doing grey well,
and for really nice cross-hatching I suspect you also need a high
quality printer.   It is also a challenging problem to come up with an
algorithm for generating perceptually uniform sets of cross-hatchings.
 I suspect there is some work in this area in vis/infovis, but I
haven't found it in a few minutes of casual searching.

Hadley

-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Do not want to print when using prop.test

2009-01-20 Thread Jose Iparraguirre D'Elia
I am using the function prop.test (base package), which returns a list with 
class "htest". All I want to do is to assign one of its values to a variable, 
but I do not want R to print the results and the accompanying warning message 
whenever I invoke the function.
How can I prevent R from printing when using prop.test?
Regards,
José Luis

Mr José Luis Iparraguirre
Senior Research Economist
Economic Research Institute of Northern Ireland
2 -14 East Bridge Street
Belfast BT1 3NQ
Northern Ireland
United Kingdom

Tel: +44 (0)28 9072 7365

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and WinBUGS (via R2WinBUGS) error

2009-01-20 Thread Ben Bolker
  What you have given us helps a little bit.

Lindsay Stirton  manchester.ac.uk> writes:

> I am having some problems using R with WinBUGS using the R2WinBUGS
> package. Specifically, when I try to run bugs() I get the following
> message.

> 
[snip]
> Error in FUN(X[[1L]], ...) :
>.C(..): 'type' must be "real" for this format
> 
> 
> > Wednesbury.data <- list ("n.judge", "n", "n.cut", "y", "judge", "ct",
> + "ra", "lg")

 [snip]

> Error in FUN(X[[1L]], ...) :
>.C(..): 'type' must be "real" for this format
> >
> 
> This problem was discussed before  
> (https://stat.ethz.ch/pipermail/r-help/2008-August/171726.html), but  
> the discussion didn't seem to help me. As suggested on that post,  
> traceback() gives the following:
> 
> > traceback()
> 6: .C("str_signif", x = x, n = n, mode = as.character(mode), width =  
> as.integer(width),
> digits = as.integer(digits), format = as.character(format),
> flag = as.character(flag), result = blank.chars(i.strlen),
> PACKAGE = "base")
> 5: FUN(X[[1L]], ...)
> 4: lapply(data.list, formatC, digits = digits, format = "E")
> 3: write.datafile(lapply(data.list, formatC, digits = digits, format = "E"),
> file.path(dir, data.file))
> 2: bugs.data(data, dir = getwd(), digits)
> 1: bugs(data = "Wednesbury.data", inits = "Wednesbury.inits",  
> parameters = "Wednesbury.parameters",
> model.file = "p:/Wednesbury09/Wednesbury.bug", n.chains = 1,
> n.burnin = 1000, n.sims = 1, bugs.directory = "c:/Program  
> Files/WinBUGS14/",
> program = "WinBUGS", debug = TRUE)

   What this tells us is that the problem occurs while
R2WinBUGS is trying to write the data out to files on disk
whence WinBUGS will pick them up. One of your data items
("n.judge", "n", "n.cut", "y", "judge", "ct", "ra", "lg")
may be wonky.

  What happens if you try out formatC on the data items one
at a time, i.e.

formatC(n.judge,digits=5,format="E")
formatC(n,digits=5,format="E")
etc.?

  This would be easier if you had given us a reproducible
example -- i.e. either your actual data (if it is something
you can share), or preferably a small subset of your data
that demonstrates the problem. One often stumbles across the
answer to a problem in the process of trying
to reduce it to a small subset ...

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave: conflict between setwd and \SweaveOpts{prefix.string=}

2009-01-20 Thread ONKELINX, Thierry
Have you tried specifying an absolute path in prefix.string instead of a
relative one?

HTH,

Thierry 




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
thierry.onkel...@inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On behalf of Matthieu Stigler
Sent: Tuesday, 20 January 2009 12:53
To: r-help@r-project.org
Subject: [R] Sweave: conflict between setwd and
\SweaveOpts{prefix.string=}

Hello

I think there is a conflict between setwd() and
\SweaveOpts{prefix.string=}. Used in the same document, these two
commands get Sweave to confuse the files and directories. See:

Say my .Rnw document is in File1.

If one inserts a setwd() pointing to another folder:
-setwd(File2)

then the command \SweaveOpts{prefix.string=graphics/Rplots} will look
for the "graphics" folder in File2, because of the setwd(File2) call,
and not in File1 where the .Rnw file is, as said in the Sweave Manual A10.

Hence LaTeX gets really confused and no longer works: the
\includegraphics command looks for the "graphics" folder in the usual
File1, but the plots may have been stored in File2.

I tried to add:
\usepackage{graphicx}
\graphicspath{{../File2/graphics/}}
but the result was not convincing.

Is there any way to avoid this? Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message 
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Stacked barplot with two stacked bars besides each other

2009-01-20 Thread Daniel Brewer
Thanks Henrique.  Unfortunately, that gets us a step closer but it fails
to stack the individual bars, so instead of having a total of 10 for
each bar you get 7, 8, 9 & 5 (i.e. the largest for each var sample
pair).  I tried adding the stack=T option, but that stacks all of each
sample and you are left with two bars.  Any more ideas?

Appreciate your help

Dan
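
For the record, one base-graphics way to get that layout (a sketch built on
the toy data from the original post; which row means "present" and which
"absent" is a guess, and the space values just illustrate the idea):

var1 <- data.frame(yoda1 = c(3, 7), yoda2 = c(1, 9))
var2 <- data.frame(yoda1 = c(8, 2), yoda2 = c(5, 5))

## one stacked column per sample/variable pair, pairs grouped by sample:
## the small gap (0.2) separates var1 from var2 within a sample, the
## larger gap (1) separates the samples
m <- cbind(var1$yoda1, var2$yoda1, var1$yoda2, var2$yoda2)
barplot(m, space = c(1, 0.2, 1, 0.2),
        names.arg = c("yoda1:var1", "yoda1:var2", "yoda2:var1", "yoda2:var2"),
        legend.text = c("absent", "present"))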

Henrique Dallazuanna wrote:
> One option is to use lattice package:
> 
> library(lattice)
> newVar <- cbind(stack(rbind(var1, var2)), Var = rep(c("var1", "var2"),
> each = 2))
> barchart(values ~ ind, groups = Var, data = newVar)
> 
> 
> On Tue, Jan 20, 2009 at 9:03 AM, Daniel Brewer wrote:
> 
> Thanks.
> That is definitely in the right direction, but firstly I would like
> yoda1:var1 next to yoda1:var2, not as currently yoda1:var1, yoda2:var1,
> yoda1:var2, yoda2:var2.  Additionally, I would like the gap between
> samples to be greater than the gap between variables.
> 
> Many thanks
> 
> Dan
> 
> Henrique Dallazuanna wrote:
> > Try this:
> >
> > barplot(cbind(as.matrix(var1), as.matrix(var2)), names.arg =
> LETTERS[1:4])
> >
> > On Tue, Jan 20, 2009 at 8:28 AM, Daniel Brewer wrote:
> >
> > Hi,
> >
> > I have a particular barplot I would like to generate, but I am
> having
> > trouble getting it to work.  What I would like is in effect
> two barplots
> >  with stacked bars merged into one.  For example,  I have two
> samples
> > (yoda1,yoda2) on which I measure whether two variables
> (var1,var2) are
> > present or absent for a number of measurements on that sample.
> >
> > > var1 <- data.frame(yoda1=c(3,7), yoda2=c(1,9))
> > > var2 <- data.frame(yoda1=c(8,2), yoda2=c(5,5))
> >
> > For each variable I can plot a barplot
> >
> > > barplot(as.matrix(var1))
> > > barplot(as.matrix(var2))
> >
> > I would like to join these together, so that for each sample
> there are
> > two stacked bars next to each other, one for var1 and the
> other for
> > var2.  I was thinking something like:
> >
> > > barplot(list(as.matrix(var1),as.matrix(var2)))
> >
> > would work, but it didn't.
> >
> > Any suggestions you could make would be great.
> >
> > Dan

-- 
**

Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
MUCRC
15 Cotswold Road
Sutton, Surrey SM2 5NG
United Kingdom

Tel: +44 (0) 20 8722 4109

Email: daniel.bre...@icr.ac.uk

**

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company 
Limited by Guarantee, Registered in England under Company No. 534147 with its 
Registered Office at 123 Old Brompton Road, London SW7 3RP.

This e-mail message is confidential and for use by the a...{{dropped:2}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave: conflict between setwd and \SweaveOpts{prefix.string=}

2009-01-20 Thread Duncan Murdoch

Matthieu Stigler wrote:

Hello

I think there is a conflict between setwd() and
\SweaveOpts{prefix.string=}. Used in the same document, these two
commands get Sweave to confuse the files and directories. See:

Say my .Rnw document is in File1.

If one inserts a setwd() pointing to another folder:
-setwd(File2)

then the command \SweaveOpts{prefix.string=graphics/Rplots} will look
for the "graphics" folder in File2, because of the setwd(File2) call,
and not in File1 where the .Rnw file is, as said in the Sweave Manual A10.

Hence LaTeX gets really confused and no longer works: the
\includegraphics command looks for the "graphics" folder in the usual
File1, but the plots may have been stored in File2.

I tried to add:
\usepackage{graphicx}
\graphicspath{{../File2/graphics/}}
but the result was not convincing.

Is there any way to avoid this? Thanks!
  


You could use a fully qualified prefix, so it doesn't matter what the 
current directory is when you save a plot.


Or you could avoid setwd().

Or you could change back to the original directory before drawing a plot.

It would probably make sense for Sweave to do the last of these 
internally:  it is mixing up characteristics of the session it's running 
in with characteristics of the session it is running.  However, this is 
a pretty strange case, and I'm not sure fixing it will be a high priority.
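
A minimal sketch of the last option (directory names follow the File1/File2
layout from the question, with File2 assumed to sit next to File1):

owd <- getwd()      # File1: where the .Rnw file and its graphics/ folder live
setwd("../File2")   # do whatever work needs File2
## ... computations ...
setwd(owd)          # back to File1 before any figure chunk runs, so that
                    # prefix.string=graphics/Rplots resolves where LaTeX expects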


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Perl-R bridge

2009-01-20 Thread Neil Shephard



ANJAN PURKAYASTHA wrote:
> 
> Hi,
> I'm planning to access R from my perl scripts.
> The only noteworthy bridge seems to be
> Statistics-R-0.03.
> Would anyone like to share their experience with this Perl-R bridge?

Irrespective of whether you are working on a Bioinformatics problem I'd
imagine you can find some useful information in the recently published book
"Building Bioinformatics Solutions with Perl, R and MySQL".  See
http://www.oup.com/uk/catalogue/?ci=9780199230235


ANJAN PURKAYASTHA wrote:
> 
> I'd like to install it in a Mac OS X.
> Suggestions on alternate solutions will be appreciated.
> 

In theory any solution involving R and Perl should be platform independent.
-- 
View this message in context: 
http://www.nabble.com/Perl-R-bridge-tp21535807p21561731.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sweave: conflict between setwd and \SweaveOpts{prefix.string=}

2009-01-20 Thread Matthieu Stigler

Hello

I think there is a conflict between setwd() and
\SweaveOpts{prefix.string=}. Used in the same document, these two
commands get Sweave to confuse the files and directories. See:

Say my .Rnw document is in File1.

If one inserts a setwd() pointing to another folder:
-setwd(File2)

then the command \SweaveOpts{prefix.string=graphics/Rplots} will look
for the "graphics" folder in File2, because of the setwd(File2) call,
and not in File1 where the .Rnw file is, as said in the Sweave Manual A10.

Hence LaTeX gets really confused and no longer works: the
\includegraphics command looks for the "graphics" folder in the usual
File1, but the plots may have been stored in File2.

I tried to add:
\usepackage{graphicx}
\graphicspath{{../File2/graphics/}}
but the result was not convincing.

Is there any way to avoid this? Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

