Re: [R] Doing pairwise comparisons using either Duncan, Tukey or LSD

2009-02-19 Thread Kingsford Jones
try:

example(TukeyHSD)


hth,
Kingsford Jones
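
For anyone arriving at this thread later, here is a minimal sketch using a built-in dataset (PlantGrowth is just a stand-in for the poster's own data):

```r
# One-way ANOVA followed by pairwise comparisons of the group means
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey's honest significant differences
TukeyHSD(fit)

# Unadjusted pairwise t-tests (equivalent to Fisher's LSD when
# the overall F test is significant)
with(PlantGrowth, pairwise.t.test(weight, group, p.adjust.method = "none"))
```

Duncan's multiple range test is not in base R; it is available in contributed packages (e.g. the agricolae package, if installed).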

On Thu, Feb 19, 2009 at 9:51 AM, Saeed Ahmadi ahmadi_sa...@yahoo.com wrote:

 Hi,

 I have a basic and simple question: how do I code pairwise (multiple) mean
 comparisons between levels of a factor using one of the Duncan, Tukey or LSD
 methods?

 Thanks in advance,
 Saeed
 --
 View this message in context: 
 http://www.nabble.com/Doing-pairwise-comparisons-using-either-Duncan%2C-Tukey-or-LSD-tp22104786p22104786.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Ideal (possible) configuration for an exalted R system

2009-02-16 Thread Kingsford Jones
Hi Harsh,

The useR! 2008 site has useful information.  E.g. talks by

Graham Williams:

http://www.statistik.uni-dortmund.de/useR-2008/slides/Williams.pdf

Dirk Eddelbuettel

http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf

and others

http://www.statistik.uni-dortmund.de/useR-2008/abstracts/AbstractsByTopic.html#High%20Performance%20Computing



A few days ago I was googling to see what types of workstations are
available these days.  Here are some with up to 64GB RAM:

http://www.colfax-intl.com/jlrid/SpotLight.asp?IT=0&RID=80

Perhaps it won't be long before we see such memory in laptops:

http://www.ubergizmo.com/15/archives/2009/01/samsung_opens_door_to_32gb_ram_stick.html

Like you, I'd also be interested in hearing about configurations folks
have used to work w/ large datasets.


hth,

Kingsford Jones







On Mon, Feb 16, 2009 at 5:10 AM, Harsh singhal...@gmail.com wrote:
 Hi All,
 I am trying to assemble a system that will allow me to work with large
 datasets (45-50 million rows, 300-400 columns) possibly amounting to
 10GB + in size.

 I am aware that R 64 bit implementations on Linux boxes are suitable
 for such an exercise but I am looking for configurations that R users
 out there may have used in creating a high-end R system.
 Due to a lot of apprehensions that SAS users have about R's data
 limitations, I want to demonstrate R's usability even with very large
 datasets as mentioned above.
 I would be glad to hear from users (sharing configurations and
 system-specific information) who have desktops/servers on which they use R
 to crunch massive datasets.


 Any suggestions in expanding R's functionality in the face of gigabyte
 class datasets would be appreciated.

 Thanks
 Harsh Singhal
 Decision Systems,
 Mu Sigma Inc.
 Chicago, IL





Re: [R] Problems in Recommending R

2009-02-12 Thread Kingsford Jones
On Thu, Feb 12, 2009 at 3:12 PM, Johannes Huesing johan...@huesing.name wrote:

 Last time I tried, rseek.org yielded no results when searching for inferno.

...although, if you hit the 'Support Lists' tab it finds the thread in
which Patrick announced it.




 --
 Johannes Hüsing   There is something fascinating about science.
  One gets such wholesale returns of conjecture
 mailto:johan...@huesing.name  from such a trifling investment of fact.
 http://derwisch.wikidot.com (Mark Twain, Life on the Mississippi)





Re: [R] Linear model

2009-02-11 Thread Kingsford Jones
On Wed, Feb 11, 2009 at 1:36 PM, kayj kjaj...@yahoo.com wrote:

 I want to know how accurate the p-values are when you do linear regression in
 R.

 I was looking at the variable x3: t = 10.843 and the corresponding
 p-value is < 2e-16, which is the same p-value as for the intercept, where
 the t-value is 48.402.

 I tried to calculate the p-value in R and I got 0

 x <- 2*(1-pt(10.843, 2838))
 x
 [1] 0




Some comments:

i) the printout says that the value is less than 2e-16

ii) It seems strange to interpret a p-value at that level of precision

iii) you're confusing what is printed vs what is stored

iv) see 
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

v) rather than subtracting from 1, use the 'lower.tail' argument to pt:

 2*pt(10.843, 2838, lower.tail = FALSE)
[1] 7.185635e-27



hth,

Kingsford Jones


 G <- lm(y ~ x1 + x2 + x3 + x4 + x5)
 summary(G)

 Call:
 lm(formula = y ~ x1 + x2 + x3 + x4 + x5)

 Residuals:
 Min   1Q   Median   3Q  Max
 -14.3172  -3.2197  -0.2913   2.6938  23.3602

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
  (Intercept)  22.9461 0.4741  48.402  < 2e-16 ***
  x1   -0.1139 0.3734  -0.305  0.76031
  x2   -0.0405 0.1936  -0.209  0.83437
  x3    2.0165 0.1860  10.843  < 2e-16 ***
  x4    0.5313 0.1782   2.982  0.00289 **
  x5    0.5879 0.1779   3.305  0.00096 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 4.724 on 2838 degrees of freedom
  (138 observations deleted due to missingness)
 Multiple R-squared: 0.05279, Adjusted R-squared: 0.05112
 F-statistic: 31.63 on 5 and 2838 DF,  p-value: < 2.2e-16


  Thanks for the help

 --
 View this message in context: 
 http://www.nabble.com/Linear--model-tp21963370p21963370.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] Help regarding White's Heteroscedasticity Correction

2009-02-10 Thread Kingsford Jones
or simultaneously estimate the coefficients and variance structure via
nlme::gls and its 'weights' argument...
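
A minimal sketch of that approach, with hypothetical names (response y, predictor x, data frame dat):

```r
library(nlme)

# Fit by generalized least squares, letting the residual variance
# increase as a power of the fitted values
fit <- gls(y ~ x, data = dat,
           weights = varPower(form = ~ fitted(.)))
summary(fit)  # reports the estimated variance-function parameter
```

Other variance functions (varIdent, varExp, ...) can be swapped in through the same 'weights' argument.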

On Tue, Feb 10, 2009 at 7:57 AM, John Fox j...@mcmaster.ca wrote:
 Dear Kishore,

 Yes, White's heteroscedasticity-consistent standard errors are just that --
 standard errors for the OLS coefficients that are consistent in the presence
 of heteroscedasticity. The coefficients themselves don't change. There is an
 issue here: although the standard errors and OLS coefficients are
 consistent, the OLS estimates lose efficiency. If you know that pattern of
 heteroscedasticity, then you might get more efficient estimates by taking it
 into account, e.g., in weighted-least-squares regression, or, if the
 residual spread increases with the fitted values, by transforming the
 response.

 I hope this helps,
  John

 --
 John Fox, Professor
 Department of Sociology
 McMaster University
 Hamilton, Ontario, Canada
 web: socserv.mcmaster.ca/jfox


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Kishore
 Sent: February-10-09 9:19 AM
 To: r-help@r-project.org; r-h...@stat.math.ethz.ch
 Subject: [R] Help regarding White's Heteroscedasticity Correction

 Hi
 I am actually running the White test for correcting Heteroscedasticity.  I
 used sandwich() & car(); however, the output shows the updated t-test of
 coefficients, with revised Standard Errors, however the estimates remained
 same.  My problem is that the residuals formed a pattern in the original
 regression equation.  After running the White's test, I got some new
 standard errors - but from here I didn't understand how to plot the
 residuals (or) how to correct the estimates??  Can some one direct me in
 this regard..

 Best,

 Kishore/..
 http://kaykayatisb.blogspot.com

   [[alternative HTML version deleted]]






Re: [R] What is the R equivalent of STATA's 'drop' command?

2009-02-09 Thread Kingsford Jones
See ?[ and its examples

Also, section 2.7 of An Introduction to R is a good place to start:

http://cran.r-project.org/doc/manuals/R-intro.html#Index-vectors

hth,
Kingsford Jones
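
A rough equivalent of the Stata lines above, using placeholder names (mydata for the data frame, myvar for the variable):

```r
# Keep rows where myvar is neither 3 nor 4 (logical indexing)
mydata <- mydata[!(mydata$myvar %in% c(3, 4)), ]

# or the same thing with subset()
mydata <- subset(mydata, !(myvar %in% c(3, 4)))
```

(Unlike `myvar != 3`, the `%in%` form never returns NA, so rows with missing values are retained rather than silently mangled.)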


On Mon, Feb 9, 2009 at 5:27 PM, jjh21 jjhar...@gmail.com wrote:

 Hello,

 I am trying to do some data cleaning in R. I need to drop observations that
 take on certain values of a variable. In STATA I might type something like:

 drop if variable name == 3
 drop if variable name == 4

 Is there an R equivalent of this? I have tried playing around with the
 subset command, but it seems a bit clunky. What would an advanced R user's
 approach be for something like this?

 Thank you!


 --
 View this message in context: 
 http://www.nabble.com/What-is-the-R-equivalent-of-STATA%27s-%27drop%27-command--tp21925249p21925249.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] Introduction to maps (of Denmark)?

2009-02-06 Thread Kingsford Jones
Also, see this page for more ideas on mapping an auxiliary variable by county:

http://r-spatial.sourceforge.net/gallery/

(e.g. plots 9, 14, 15 and 21)


Kingsford Jones

On Fri, Feb 6, 2009 at 3:22 PM, Greg Snow greg.s...@imail.org wrote:
 Googling for "Denmark Shapefile" leads to this website: http://www.vdstech.com/map_data.htm
 http://www.vdstech.com/map_data.htm
 Where a few clicks lead to downloading a set of shapefiles (different levels 
 of subdividing the country, hopefully you know more about what they 
 correspond to than I do, about all I know about Denmark is from Hamlet, the 
 fact that I have some ancestors from there (but I don't remember which 
 branch), and how to make aebleskivers).

 These can be read into R using the maptools package as:

 library(maptools)
 dmk1 <- readShapePoly('g:/downloads/DNK_adm/DNK0')
 dmk2 <- readShapePoly('g:/downloads/DNK_adm/DNK1')
 dmk3 <- readShapePoly('g:/downloads/DNK_adm/DNK2')

 simple plots can be done like:

 plot(dmk1)
 plot(dmk2)
 plot(dmk3)

 a little more complex with:

 plot(dmk3, col=topo.colors(248))

 though of course replacing topo.colors(248) with something that is 
 meaningful.  If you need more control, then see the sp package.

 Hope this helps,

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Peter Jepsen
 Sent: Friday, February 06, 2009 1:18 PM
 To: r-help@r-project.org
 Subject: [R] Introduction to maps (of Denmark)?

 Dear R-listers,

 I am using R for Windows Vista 64-bit. I have no experience working
 with
 maps, but now I have to plot a map of Denmark in which each Danish
 county is color-coded according to its incidence rate of a particular
 disease. My problem is that I don't know where to start. I have read
 the
 help files to packages like 'mapplot', and their examples indicate that
 I am on the right track, but where do I find a suitably detailed map of
 Denmark, and which format should I prefer? The terminology in the help
 files etc. is overwhelming.

 Thank you in advance for any help.

 Best regards,
 Peter.

 ***
 R version 2.8.1 (2008-12-22)
 i386-pc-mingw32

 locale:
 LC_COLLATE=Danish_Denmark.1252;LC_CTYPE=Danish_Denmark.1252;LC_MONETARY
 =
 Danish_Denmark.1252;LC_NUMERIC=C;LC_TIME=Danish_Denmark.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base






Re: [R] Matlab inv() and R solve() differences

2009-01-29 Thread Kingsford Jones
I suppose the solution is unstable because x is ill-conditioned:

> x
       [,1]   [,2]   [,3]  [,4]
[1,]  0.133  0.254 -0.214 0.116
[2,]  0.254  0.623 -0.674 0.139
[3,] -0.214 -0.674  0.910 0.011
[4,]  0.116  0.139  0.011 0.180
> cor(x)
           [,1]       [,2]       [,3]       [,4]
[1,]  1.0000000  0.9963557 -0.9883690  0.8548065
[2,]  0.9963557  1.0000000 -0.9976663  0.8084090
[3,] -0.9883690 -0.9976663  1.0000000 -0.7663847
[4,]  0.8548065  0.8084090 -0.7663847  1.0000000

> kappa(x)
[1] 2813.326

hth,

Kingsford Jones
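
To see how that condition number bites here, one can perturb x in the fourth decimal (below the precision of the posted entries) and compare inverses; this is a sketch re-entering the matrix from the post:

```r
# The symmetric matrix from the original post
x <- matrix(c( 0.133,  0.254, -0.214, 0.116,
               0.254,  0.623, -0.674, 0.139,
              -0.214, -0.674,  0.910, 0.011,
               0.116,  0.139,  0.011, 0.180), 4, 4)

set.seed(1)
xp <- x + matrix(runif(16, -5e-4, 5e-4), 4, 4)  # tiny perturbation

# With kappa(x) ~ 2800, small input changes are amplified in the inverse
range(solve(x) - solve(xp))
```

So neither MATLAB's nor R's answer deserves trust to many digits when the inputs are reported to only three decimals.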

On Thu, Jan 29, 2009 at 7:00 PM, Joseph P Gray jpg...@uwm.edu wrote:
 I submit the following matrix to both MATLAB and R

 x=  0.133 0.254 -0.214 0.116
0.254 0.623 -0.674 0.139
   -0.214 -0.674 0.910 0.011
0.116 0.139  0.011 0.180

 MATLAB's inv(x) provides the following
  137.21 -50.68 -4.70 -46.42
 -120.71  27.28 -8.94 62.19
 -58.15   6.93  -7.89  36.94
  8.35   11.17 10.42 -14.82

 R's solve(x) provides:
 261.94 116.22 150.92 -267.78
 116.22 344.30 286.68 -358.30
 150.92 286.68 252.96 -334.09
 -267.78 -358.30 -334.09 475.22

 inv(x)*x = I(4)
 and solve(x)%*%x = I(4)

 Is there a way to obtain the MATLAB result in R?

 Thanks for any help.


 Pat Gray





Re: [R] Correlation matrix one side with significance

2009-01-25 Thread Kingsford Jones
On the topic of visualizing correlation, see also

Murdoch, D.J. and Chow, E.D. (1996). A graphical display of large
correlation matrices.
The American Statistician 50, 178-180.

with examples here:

# install.packages('ellipse')
example(plotcorr, package='ellipse')



On Sat, Mar 8, 2008 at 3:01 AM, Liviu Andronic landronim...@gmail.com wrote:
 On 3/5/08, Martin Kaffanke tech...@roomandspace.com wrote:
  Now I'd like to have it one-sided, meaning only the lower-left triangle is
  printed (the others are the same), and I'd like to have * where the
  p-value is lower than 0.05 and ** where it is lower than 0.01.

 Look here [1], at Visualizing Correlations. You might find
 interesting the example of a plotted correlation matrix.

 Liviu

 [1] http://www.statmethods.net/stats/correlations.html





Re: [R] Omitting a desired line from a table [Beginner Question]

2009-01-25 Thread Kingsford Jones
see

?subset

Or use indexing, which is covered in section 2.7 of an introduction to
R (but note that a data frame has 2 dimensions)

hth,

Kingsford Jones
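
A sketch of both routes, using the column names from the post:

```r
# Logical indexing: keep only rows where GeoArea is 1 and Month is 10
fish10 <- fish[fish$GeoArea == 1 & fish$Month == 10, ]

# Equivalent with subset()
fish10 <- subset(fish, GeoArea == 1 & Month == 10)
```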


On Sun, Jan 25, 2009 at 3:06 PM, pfc_ivan pfc_i...@hotmail.com wrote:

 I am a beginner using this R software and have a quick question.

 I added a file into the R called fish.txt using this line.

 fish <- read.table("fish.txt", header = TRUE, fill = TRUE)

 The .txt file looks like this. Since it contains about 30 lines of data, I
 will copy/paste the first 5 lines.

 Year GeoArea SmpNo Month
 1970       1    13     7
 1971       1    13    10
 1972       1    13     8
 1973       2    13    10
 1974       1    13    11

 Now what I want to do is omit all the lines that aren't in GeoArea 1 or that
 aren't in Month 10. So basically the only lines I want to keep are the lines
 that have GeoArea = 1 and Month = 10 at the same time. So if GeoArea = 2 and
 Month = 10, I don't need it; I just need the lines that have both of those
 values correct. How do I delete the rest of the lines that I don't need?

 Thank you everyone.


 --
 View this message in context: 
 http://www.nabble.com/Omitting-a-desired-line-from-a-table--Beginner-Question--tp21657416p21657416.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] comparing the previous and next entry of a vector to a criterium

2009-01-25 Thread Kingsford Jones
How about:

a <- c(1,2,3,3,2,1,6,3,2)
b <- c(NA, a[-length(a)])  # each element's predecessor
c <- c(a[-1], NA)          # each element's successor
a[b == 1 & c == 3]
[1] 2 6


hth,

Kingsford Jones


On Sun, Jan 25, 2009 at 3:02 PM, Jörg Groß jo...@licht-malerei.de wrote:
 Hi,

 I have a quite abstract problem; I hope someone can help me here.

 I have a vector like this:


  x <- c(1,2,3,4,5,2,6)
  x

 [1] 1 2 3 4 5 2 6

 now I want to get the number whose previous number is 1 and whose next
 number is 3
 (that is, the 2 in the second place)

 I tried something with tail(x, -1) ...
 with that, I can check the next number, but how can I check the previous
 number?





Re: [R] predict function problem for glmmPQL

2009-01-22 Thread Kingsford Jones
On Thu, Jan 22, 2009 at 10:48 PM, Chun Zhang usame...@yahoo.com wrote:
 Hi all,
 I am using cross-validation to validate a generalized linear mixed effects
 model fitted using glmmPQL. I found that the predict function has a problem
 and I wonder if anyone has encountered the same problem?

 glmm1 <- glmmPQL(y ~ aX + b, random = ~1|sample, data = traindata)
 predict(glmm1, newdata = testdata, level = 1, type = "response")

 gives me all NAs. It works for level = 0 (the fixed effects), but not for
 level = 1. When I use newdata = traindata, the predict function works
 perfectly.

 I wonder if this is a problem with the predict function or a bug in my
 code?


...perhaps the levels of 'sample' differ between 'traindata' and 'testdata'?


hth,
Kingsford Jones
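
A quick way to check that guess (names taken from the post):

```r
# Grouping levels present in the test set but absent from the
# training set will yield NA predictions at level = 1
setdiff(unique(testdata$sample), unique(traindata$sample))
```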


 Thanks much for your help!

 Chun

 Chun Zhang
 Statistician at Roche





Re: [R] Joint significance of more regressors in summary

2009-01-21 Thread Kingsford Jones
try

install.packages('car')
?car::linear.hypothesis

hth,

Kingsford Jones
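
A sketch with made-up regressor names (x1 and x2 form the set of interest; z stands for everything else):

```r
library(car)

fit <- lm(y ~ x1 + x2 + z, data = dat)

# Joint (Wald) test that the coefficients on x1 and x2 are both zero
linear.hypothesis(fit, c("x1 = 0", "x2 = 0"))

# The nested-model F test gives the same answer for lm fits
anova(update(fit, . ~ . - x1 - x2), fit)
```

For a glm, the analogous comparison is anova(reduced, full, test = "Chisq"). (In later versions of car the function is spelled linearHypothesis.)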

On Wed, Jan 21, 2009 at 4:20 PM, Roberto Patuelli
roberto.patue...@lu.unisi.ch wrote:
 Dear All,

 I was wondering if it is possible to generate a regression summary (it does
 not matter at this stage if from an lm or for example a glm estimate) in
 which to obtain the joint significance of a set of regressors?
 Examples could be looking at the joint significance level of a polynomial,
 or of a set of exogenous variables of which is of interest the linear
 combination suggested by the regression parameters.
 With regard to the latter, it would also be cool to visualize directly the
 linear combination of such group of variables, which will obviously have a
 regression coefficient of 1. The standard error and significance level,
 though, are less obvious.

 I would expect - please correct me if I'm wrong - that a simple ANOVA
 comparison between two models with and without this set of variables would
 give the significance level. But what if there are two sets of variables
 included in the model for which to find joint significance (that is, set by
 set)?

 I hope someone can help. As an example, please see the regression output
 below, from a quasipoisson estimation.
 I have two large set of eigenvector decomposition variables, one marked by
 _o and one by _d. For these two sets of variables, I would like to have,
 in the regression summary, only two lines, with Estimate, Std. Error,
  t-value and Pr(>|t|).
 Obviously I can do this by hand, constructing the linear combinations,
 rerunning the model, and therefore obtaining a standard error and a p-value
 for each set. But the degrees of freedom of the model would in reality be
 different...

 Thanks in advance for any help!

 Cheers
 Roberto Patuelli
 Post-doc researcher
 Institute for Economic Research (IRE)
 University of Lugano
 Email: roberto.patue...@lu.unisi.ch
 Homepage: http://www.people.lu.unisi.ch/patuellr

 *

 dep.qglm - glm(dep ~ lndist + com_lang + contig + history + fta +
 lnarea_i + lngdppc_i + lngdp_i + island_i + landl_i + lnarea_e + lngdp_e +
 lngdppc_e + island_e + landl_e

 + + e1_o + e3_o + e4_o + e5_o + e7_o + e8_o + e9_o + e10_o + e11_o + e12_o +
 e13_o + e14_o + e15_o + e17_o + e18_o + e19_o + e20_o + e21_o + e22_o +
 e23_o + e24_o
 + + e1_d + e2_d + e4_d + e5_d + e7_d + e8_d + e9_d + e10_d + e12_d + e13_d +
 e14_d + e16_d + e17_d + e18_d + e19_d + e20_d + e22_d + e23_d + e24_d +
 e25_d + e26_d + e27_d + e28_d + e29_d + e30_d, family = quasipoisson (link =
 log))

 summary(dep.qglm)

 Call:
 glm(formula = dep ~ lndist + com_lang + contig + history + fta +
   lnarea_i + lngdppc_i + lngdp_i + island_i + landl_i + lnarea_e +
   lngdp_e + lngdppc_e + island_e + landl_e + e1_o + e3_o +
   e4_o + e5_o + e7_o + e8_o + e9_o + e10_o + e11_o + e12_o +
   e13_o + e14_o + e15_o + e17_o + e18_o + e19_o + e20_o + e21_o +
   e22_o + e23_o + e24_o + e1_d + e2_d + e4_d + e5_d + e7_d +
   e8_d + e9_d + e10_d + e12_d + e13_d + e14_d + e16_d + e17_d +
   e18_d + e19_d + e20_d + e22_d + e23_d + e24_d + e25_d + e26_d +
   e27_d + e28_d + e29_d + e30_d, family = quasipoisson(link = log))

 Deviance Residuals:
        Min        1Q    Median        3Q       Max
  -137.3970   -4.3775   -1.8095   -0.6143  195.3221

 Coefficients:
              Estimate Std. Error  t value Pr(>|t|)
  (Intercept) -29.311658   0.243063 -120.593  < 2e-16 ***
  lndist       -0.608668   0.009603  -63.386  < 2e-16 ***
  com_lang      0.162357   0.021064    7.708 1.34e-14 ***
  contig        0.578563   0.023609   24.506  < 2e-16 ***
  history       0.176760   0.023113    7.647 2.15e-14 ***
  fta           0.411314   0.018823   21.851  < 2e-16 ***
  lnarea_i     -0.137816   0.008402  -16.404  < 2e-16 ***
  lngdppc_i     0.003957   0.018315    0.216 0.828937
  lngdp_i       0.816396   0.010770   75.801  < 2e-16 ***
  island_i      0.118761   0.030618    3.879 0.000105 ***
  landl_i      -0.337145   0.040638   -8.296  < 2e-16 ***
  lnarea_e     -0.054909   0.006349   -8.649  < 2e-16 ***
  lngdp_e       0.808997   0.009182   88.111  < 2e-16 ***
  lngdppc_e     0.012582   0.012363    1.018 0.308837
  island_e     -0.202474   0.029096   -6.959 3.55e-12 ***
  landl_e      -0.226312   0.041144   -5.501 3.84e-08 ***
  e1_o          0.685095   0.130636    5.244 1.59e-07 ***
  e3_o         -1.204244   0.140884   -8.548  < 2e-16 ***
  e4_o         -1.311745   0.433108   -3.029 0.002460 **
  e5_o         -1.539045   0.278576   -5.525 3.34e-08 ***
  e7_o          1.722945   0.145778   11.819  < 2e-16 ***
  e8_o          1.286667   0.124809   10.309  < 2e-16 ***
  e9_o          0.359851   0.111494    3.228 0.001251 **
  e10_o         3.783921   0.288042   13.137  < 2e-16 ***
  e11_o         0.429692   0.138996    3.091 0.001995 **
  e12_o        -0.707160   0.087880   -8.047 9.00e-16 ***
  e13_o        -2.231826   0.225201   -9.910  < 2e-16 ***
  e14_o        -0.256754   0.108398   -2.369 0.017865 *
  e15_o        -0.408286   0.158939

Re: [R] I'm looking for a book about spatial point patterns (Diggle, 2003)

2009-01-11 Thread Kingsford Jones
Unangu,

If you haven't seen the 200-page workshop notes that Adrian Baddeley has
made available from his spatstat webpage, I highly recommend them:

http://www.csiro.au/files/files/pn0y.pdf


hth,
Kingsford Jones

On Sat, Jan 10, 2009 at 9:04 PM, Unangu una...@gmail.com wrote:

 To understand some of the functions for spatial point patterns in spatstat,
 I need some background, and the best way to get it is to read a monograph;
 Statistical Analysis of Spatial Point Patterns (2nd ed.) seems a good
 choice. But I cannot find it anywhere. Who can help me?
 Thank you!

 -
 una...@gmail.com
 --
 View this message in context: 
 http://www.nabble.com/I%27m-looking-for-a-book-about-spatial-point-patterns-%28Diggle%2C2003%29-tp21395908p21395908.html
 Sent from the R help mailing list archive at Nabble.com.





[R] R, clinical trials and the FDA

2009-01-11 Thread Kingsford Jones
I hope that Marc doesn't mind, but I felt that part of his recent post
was important enough to deserve its own subject line rather than
being lost in a 60-msg-long thread...



On Sun, Jan 11, 2009 at 10:08 AM, Marc Schwartz
marc_schwa...@comcast.net wrote:

...

I strongly believe that the comments regarding R and the FDA are overly
negative and pessimistic.

The hurdles to the use of R for clinical trials are shrinking. There has
been substantive activity over the past several years, both internally
at the FDA and within the R community to increase R's acceptance in this
domain.

At the Joint Statistical Meetings in 2006, Sue Bell from the FDA spoke
during a session with a presentation entitled Times 'R' A Changing: FDA
Perspectives on Use of Open Source. A copy of this presentation is
available here:

 http://www.fda.gov/cder/Offices/Biostatistics/Bell.pdf

In 2007, during an FDA committee meeting reviewing the safety profile of
Avandia (Rosiglitazone), the internal FDA meta-analysis performed by Joy
Mele, the FDA statistician, was done using R. A copy of this
presentation is available here:
 http://www.fda.gov/ohrms/dockets/ac/07/slides/2007-4308s1-05-fda-mele.ppt

Given the high profile nature of drug safety issues today, that R was
used for this analysis by the FDA itself speaks volumes.

Also in 2007, at the annual R user meeting at Iowa State University, I
had the pleasure and privilege of Chairing a session on the use of R for
clinical trials. The speakers included Frank Harrell (well known to R
users here), Tony Rossini and David James (Novartis Pharmaceuticals) and
Mat Soukup (FDA statistician). Copies of our presentations are available
here, a little more than half way down the page:

 http://user2007.org/program/

At that meeting, we also introduced a document that has been updated
since then and approved formally by the R Foundation for Statistical
Computing. The document provides guidance for the use of R in the
regulated clinical trials domain, addresses R's compliance with the
relevant regulations (eg. 21 CFR 11) as well as describing the
development, testing and quality processes in place for R, also known as
the Software Development Life Cycle.

That document is available here:

 http://www.r-project.org/doc/R-FDA.pdf

I have heard directly from colleagues in industry that this document has
provided significant value in their internal discussions regarding
implementing the use of R within their respective environments and
assuaging many fears regarding R's use.

Additionally, presentations regarding the use of open source software
and R specifically for clinical trials have been made at DIA and other
industry meetings. This fall, there is a session on the use of R
scheduled for the FDA's Industry Statistics Workshop in Washington, D.C.

For those unfamiliar, I would also point out the membership and
financial donors to the R Foundation for Statistical Computing and take
note of the plethora of large pharma companies and clinical research
institutions:

 http://www.r-project.org/foundation/memberlist.html

The use of R within this domain is increasing and will only continue to
progress as R's value becomes increasingly clear to even risk averse
industry decision makers.


Regards,

Marc Schwartz



Re: [R] R in the NY Times

2009-01-10 Thread Kingsford Jones
The reactions to the NYT article have certainly made for some
interesting reading.

Here are some of the links:

http://overdetermined.net/site/content/new-york-times-article-r

http://jackman.stanford.edu/blog/?p=1053

http://ggorjan.blogspot.com/2009/01/new-york-times-on-r.html

several posts on Andrew Gelman's blog:
http://www.stat.columbia.edu/~gelman/blog/

http://www.reddit.com/r/programming/comments/7nwgq/the_new_york_times_notices_the_r_programming/

comments here: http://bits.blogs.nytimes.com/2009/01/08/r-you-ready-for-r/


It's too bad that SAS has reacted to the negative reactions to their
NYT quote with more FUD.  The quote that Tony posted is just a
thinly-veiled jab at R (veiled by a disingenuous "we value open
source" veneer).  Perhaps SAS is shooting themselves in the foot with
their reactions; aren't they making it harder if they should ever
decide the best thing to do is to embrace R and the philosophies
behind it?  Four years ago, Marc Schwartz posted interesting comments
related to this:

http://tolstoy.newcastle.edu.au/R/help/04/12/9497.html


On another note, I wonder why in the various conversations there seems
to be pervasive views that a) the FDA won't accept work done in R, and
b) SAS is the only way to effectively handle data?


best,

Kingsford Jones







 On 7 Jan, 14:50, Marc Schwartz marc_schwa...@comcast.net wrote:
 on 01/07/2009 08:44 AM Kevin E. Thorpe wrote:



  Zaslavsky, Alan M. wrote:
  This article is accompanied by nice pictures of Robert and Ross.

  Data Analysts Captivated by Power of R
 http://www.nytimes.com/2009/01/07/technology/business-computing/07pro...

  January 7, 2009 Data Analysts Captivated by R's Power By ASHLEE VANCE

  SAS says it has noticed R's rising popularity at universities,
  despite educational discounts on its own software, but it dismisses
  the technology as being of interest to a limited set of people
  working on very hard tasks.

  "I think it addresses a niche market for high-end data analysts that
  want free, readily available code," said Anne H. Milley, director of
  technology product marketing at SAS. She adds, "We have customers who
  build engines for aircraft. I am happy they are not using freeware
  when I get on a jet."

  Thanks for posting.  Does anyone else find the statement by SAS to be
  humorous yet arrogant and short-sighted?

  Kevin

 It is an ignorant comment by a marketing person who has been spoon fed
 her lines...it is also a comment being made from a very defensive and
 insecure posture.

 Congrats to R Core and the R Community. This is yet another sign of R's
 growth and maturity.

 Regards,

 Marc Schwartz






Re: [R] install.views()

2009-01-10 Thread Kingsford Jones
did you install and load the 'ctv' package?

install.packages('ctv')
library('ctv')

then try your code...




Kingsford Jones

On Sat, Jan 10, 2009 at 1:00 PM, oscar linares wins...@gmail.com wrote:
 Dear Rxperts,

 Using R 2.8.1 and trying

 install.views(Cluster)

 getting error

 Error: could not find function install.views


 Please help:-(


 --
 Oscar
 Oscar A. Linares
 Molecular Medicine Unit
 Bolles Harbor
 Monroe, Michigan

[[alternative HTML version deleted]]





Re: [R] Matrix: Problem with the code

2009-01-09 Thread Kingsford Jones
On Fri, Jan 9, 2009 at 6:36 PM,  markle...@verizon.net wrote:
 Charlotte: I ran your code because I wasn't clear on it and your way would
 cause more matrices than the person requested.

Bhargab gave us

 x <- c(23,67,2,87,9,63,8,2,35,6,91,41,22,3)

and said: I want to have a matrix with p columns such that each
column will have the elements of  x^(column#).

so, I think Charlotte's code was spot-on:

p <- 3
outer(x, 1:p, '^')
      [,1] [,2]   [,3]
 [1,]   23  529  12167
 [2,]   67 4489 300763
 [3,]    2    4      8
 [4,]   87 7569 658503
 [5,]    9   81    729
 [6,]   63 3969 250047
 [7,]    8   64    512
 [8,]    2    4      8
 [9,]   35 1225  42875
[10,]    6   36    216
[11,]   91 8281 753571
[12,]   41 1681  68921
[13,]   22  484  10648
[14,]    3    9     27


Here's another way -- a bit less elegant, but a gentle
introduction to thinking in vectors rather than elements:

 mat <- matrix(0, nrow=length(x), ncol=p)

 for(i in 1:p) mat[,i] <- x^i
 mat
      [,1] [,2]   [,3]
 [1,]   23  529  12167
 [2,]   67 4489 300763
 [3,]    2    4      8
 [4,]   87 7569 658503
 [5,]    9   81    729
 [6,]   63 3969 250047
 [7,]    8   64    512
 [8,]    2    4      8
 [9,]   35 1225  42875
[10,]    6   36    216
[11,]   91 8281 753571
[12,]   41 1681  68921
[13,]   22  484  10648
[14,]    3    9     27


best,

Kingsford Jones






 So I think the code below, although not too short, does what the
 person asked. Thanks, though, because I understand outer better now.

 temp <- matrix(c(1,2,3,4,5,6), ncol=2)
 print(temp)

 # One of those more elegant ways:
 outer(temp, 1:p, '^')


 # THIS WAY I THINK GIVES WHAT THEY WANT

 sapply(1:ncol(temp), function(.col) {
  temp[,.col]^.col
 })



 On Fri, Jan 9, 2009 at  7:40 PM, Charlotte Wickham wrote:

 One of those more elegant ways:
  outer(x, 1:p, "^")

 Charlotte

 On Fri, Jan 9, 2009 at 4:24 PM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 Well, mat doesn't have any dimensions / isn't a matrix, and we don't
 know what p is supposed to be. But leaving aside those little details,
 do you perhaps want something like this:

  x <- c(23,67,2,87,9,63,8,2,35,6,91,41,22,3)
  p <- 5
  mat <- matrix(0, nrow=p, ncol=length(x))
  for(j in 1:length(x))
  {
  for(i in 1:p)
  mat[i,j] <- x[j]^i
 }

 Two notes: I didn't try it out, and if that's what you want rather
 than a toy example
 of a larger problem, there are more elegant ways to do it in R.

 Sarah

 On Fri, Jan 9, 2009 at 6:42 PM, Bhargab Chattopadhyay
 bharga...@yahoo.com wrote:

 Hi,


 Can any one please explain why the following code doesn't work? Or can
 anyone suggest an alternative.
 Suppose
   x <- c(23,67,2,87,9,63,8,2,35,6,91,41,22,3)
    mat <- 0;
    for(j in 1:length(x))
    {
   for(i in 1:p)
    mat[i,j] <- x[j]^i;
   }
   Actually I want to have a matrix with p columns such that each column
 will have the elements of  x^(column#).

 Thanks in advance.

 Bhargab






 --
 Sarah Goslee
 http://www.functionaldiversity.org








Re: [R] New York Times Article: Data Analysts Captivated by R's Power

2009-01-07 Thread Kingsford Jones
On Wed, Jan 7, 2009 at 5:15 PM, Simon Blomberg s.blombe...@uq.edu.au wrote:
 "Some people familiar with R describe it as a supercharged version of
  Microsoft's Excel spreadsheet software..."

 Maybe it's my dry Australian humour, but I think this should go into the
 fortunes package.


The irony of that quote is great, but I'd guess it's also an effective
bit of advertising. Akin to Gary Larson's famous "what dogs hear"
cartoon (link below), I can envision many managers reading the article
and seeing something like:

blah blah blah blah supercharged version of Microsoft's Excel
spreadsheet software that can help illuminate data trends more clearly
blah blah blah blah 

;-)

Kingsford Jones

http://www.flickr.com/photos/sluggerotoole/153603564/




 Simon.
 --
 Simon Blomberg, BSc (Hons), PhD, MAppStat.
 Lecturer and Consultant Statistician
 Faculty of Biological and Chemical Sciences
 The University of Queensland
 St. Lucia Queensland 4072
 Australia
 Room 320 Goddard Building (8)
 T: +61 7 3365 2506
 http://www.uq.edu.au/~uqsblomb
 email: S.Blomberg1_at_uq.edu.au

 Policies:
 1.  I will NOT analyse your data for you.
 2.  Your deadline is your problem.

 The combination of some data and an aching desire for
 an answer does not ensure that a reasonable answer can
 be extracted from a given body of data. - John Tukey.





Re: [R] the first and last observation for each subject

2009-01-05 Thread Kingsford Jones
Here are some more timings of Bill's function.  Although in this
example sapply has a clear performance advantage for smaller numbers
of groups (k), gm is substantially faster for k > 1000:

 gm <- function(x, group){ # medians by group:
   o <- order(group, x)
   group <- group[o]
   x <- x[o]
   changes <- group[-1] != group[-length(group)]
   first <- which(c(TRUE, changes))
   last <- which(c(changes, TRUE))
   lowerMedian <- x[floor((first+last)/2)]
   upperMedian <- x[ceiling((first+last)/2)]
   median <- (lowerMedian+upperMedian)/2
   names(median) <- group[first]
   median
 }

 ##

 for(k in 10^(1:6)){
  group <- sample(1:k, size=1e5, replace=TRUE)
  x <- rnorm(length(group))*10 + group
  cat('number of groups:', k, '\n')
  cat('sapply\n')
  print(s <- unix.time(sapply(split(x,group), median)))
  cat('gm\n')
  print(g <- unix.time(gm(x, group)))
  cat(' sapply/gm user ratio:', s[1]/g[1], '\n\n\n')
 }

number of groups: 10
sapply
   user  system elapsed
   0.01    0.00    0.01
gm
   user  system elapsed
   0.14    0.00    0.16
 sapply/gm user ratio: 0.07142857


number of groups: 100
sapply
   user  system elapsed
   0.02    0.00    0.02
gm
   user  system elapsed
   0.11    0.00    0.09
 sapply/gm user ratio: 0.1818182


number of groups: 1000
sapply
   user  system elapsed
   0.11    0.00    0.11
gm
   user  system elapsed
   0.11    0.00    0.11
 sapply/gm user ratio: 1


number of groups: 10000
sapply
   user  system elapsed
   1.00    0.00    1.04
gm
   user  system elapsed
   0.10    0.00    0.09
 sapply/gm user ratio: 10


number of groups: 1e+05
sapply
   user  system elapsed
   6.24    0.01    6.31
gm
   user  system elapsed
   0.16    0.00    0.17
 sapply/gm user ratio: 39


number of groups: 1e+06
sapply
   user  system elapsed
   9.00    0.03    8.92
gm
   user  system elapsed
   0.15    0.00    0.20
 sapply/gm user ratio: 60


 R.Version()
$platform
[1] i386-pc-mingw32

$arch
[1] i386

$os
[1] mingw32

$system
[1] i386, mingw32

$status
[1] 

$major
[1] 2

$minor
[1] 8.0

$year
[1] 2008

$month
[1] 10

$day
[1] 20

$`svn rev`
[1] 46754

$language
[1] R

$version.string
[1] R version 2.8.0 (2008-10-20)


Kingsford Jones



On Mon, Jan 5, 2009 at 10:18 AM, William Dunlap wdun...@tibco.com wrote:
 Arg, the 'sapply(...)' in the function was in the initial
 comment,
    gm <- function(x, group){ # medians by group: sapply(split(x,group),median)
  but someone's mailer put a newline before the sapply
    gm <- function(x, group){ # medians by group:
    sapply(split(x,group),median)
 so it got executed, wasting all that time.  (Of course the
 same mailer will mess up this message in the same way -
 just delete the sapply() call from gm().)

 Bill Dunlap
 TIBCO Software Inc - Spotfire Division
 wdunlap tibco.com

 -Original Message-
 From: hadley wickham [mailto:h.wick...@gmail.com]
 Sent: Monday, January 05, 2009 9:10 AM
 To: William Dunlap
 Cc: gallon...@gmail.com; R help
 Subject: Re: [R] the first and last observation for each subject

  Another application of that technique can be used to quickly compute
  medians by groups:
 
  gm - function(x, group){ # medians by group:
  sapply(split(x,group),median)
 o <- order(group, x)
 group <- group[o]
 x <- x[o]
 changes <- group[-1] != group[-length(group)]
 first <- which(c(TRUE, changes))
 last <- which(c(changes, TRUE))
 lowerMedian <- x[floor((first+last)/2)]
 upperMedian <- x[ceiling((first+last)/2)]
 median <- (lowerMedian+upperMedian)/2
 names(median) <- group[first]
 median
  }
 
   For a 10^5-long x and somewhat fewer than 3*10^4 distinct groups
   (in random order) the times are:
 
   group <- sample(1:3e4, size=1e5, replace=TRUE)
   x <- rnorm(length(group))*10 + group
   unix.time(z0 <- sapply(split(x,group), median))
user  system elapsed
    2.72    0.00    3.20
  unix.time(z1 <- gm(x,group))
 user  system elapsed
    0.12    0.00    0.16
  identical(z1,z0)
  [1] TRUE

 I get:

  unix.time(z0-sapply(split(x,group), median))
user  system elapsed
   2.733   0.017   2.766
  unix.time(z1-gm(x,group))
user  system elapsed
   2.897   0.032   2.946


 Hadley


 --
 http://had.co.nz/






Re: [R] the first and last observation for each subject

2009-01-05 Thread Kingsford Jones
whoops -- I left the group size unchanged so k became greater than
the length of the group vector.  When I increase the size to 1e7,
sapply is faster until it gets to k = 1e6.

warning:  this takes a while (particularly on my machine, which seems
to be using just 1 of its 2 cpus)

   for(k in 10^(1:6)){
+   group <- sample(1:k, size=1e7, replace=TRUE)
+   x <- rnorm(length(group))*10 + group
+   cat('number of groups:', k, '\n')
+   cat('sapply\n')
+   print(s <- unix.time(sapply(split(x,group), median)))
+   cat('gm\n')
+   print(g <- unix.time(gm(x, group)))
+   cat(' sapply/gm user ratio:', s[1]/g[1], '\n\n\n')
+  }
number of groups: 10
sapply
   user  system elapsed
   1.18    0.38    1.56
gm
   user  system elapsed
  53.43    0.70   55.05
 sapply/gm user ratio: 0.02208497


number of groups: 100
sapply
   user  system elapsed
   1.17    0.23    1.42
gm
   user  system elapsed
  49.89    0.61   51.60
 sapply/gm user ratio: 0.02345159


number of groups: 1000
sapply
   user  system elapsed
   1.29    0.25    1.72
gm
   user  system elapsed
  45.23    0.34   46.55
 sapply/gm user ratio: 0.02852089


number of groups: 10000
sapply
   user  system elapsed
   2.80    0.09    2.87
gm
   user  system elapsed
  41.04    0.55   42.85
 sapply/gm user ratio: 0.06822612


number of groups: 1e+05
sapply
   user  system elapsed
  14.65    0.16   15.18
gm
   user  system elapsed
  38.28    0.65   39.55
 sapply/gm user ratio: 0.3827064


number of groups: 1e+06
sapply
   user  system elapsed
 126.63    0.42  129.21
gm
   user  system elapsed
  37.56    0.84   38.78
 sapply/gm user ratio: 3.371406

On Mon, Jan 5, 2009 at 2:37 PM, Kingsford Jones
kingsfordjo...@gmail.com wrote:
 Here are some more timings of Bill's function.  Although in this
 example sapply has a clear performance advantage for smaller numbers
 of groups (k), gm is substantially faster for k > 1000:

  gm <- function(x, group){ # medians by group:
    o <- order(group, x)
    group <- group[o]
    x <- x[o]
    changes <- group[-1] != group[-length(group)]
    first <- which(c(TRUE, changes))
    last <- which(c(changes, TRUE))
    lowerMedian <- x[floor((first+last)/2)]
    upperMedian <- x[ceiling((first+last)/2)]
    median <- (lowerMedian+upperMedian)/2
    names(median) <- group[first]
    median
  }

  ##

  for(k in 10^(1:6)){
   group <- sample(1:k, size=1e5, replace=TRUE)
   x <- rnorm(length(group))*10 + group
   cat('number of groups:', k, '\n')
   cat('sapply\n')
   print(s <- unix.time(sapply(split(x,group), median)))
   cat('gm\n')
   print(g <- unix.time(gm(x, group)))
   cat(' sapply/gm user ratio:', s[1]/g[1], '\n\n\n')
   }

 number of groups: 10
 sapply
    user  system elapsed
    0.01    0.00    0.01
 gm
    user  system elapsed
    0.14    0.00    0.16
  sapply/gm user ratio: 0.07142857


 number of groups: 100
 sapply
    user  system elapsed
    0.02    0.00    0.02
 gm
    user  system elapsed
    0.11    0.00    0.09
  sapply/gm user ratio: 0.1818182


 number of groups: 1000
 sapply
    user  system elapsed
    0.11    0.00    0.11
 gm
    user  system elapsed
    0.11    0.00    0.11
  sapply/gm user ratio: 1


 number of groups: 10000
 sapply
    user  system elapsed
    1.00    0.00    1.04
 gm
    user  system elapsed
    0.10    0.00    0.09
  sapply/gm user ratio: 10


 number of groups: 1e+05
 sapply
    user  system elapsed
    6.24    0.01    6.31
 gm
    user  system elapsed
    0.16    0.00    0.17
  sapply/gm user ratio: 39


 number of groups: 1e+06
 sapply
    user  system elapsed
    9.00    0.03    8.92
 gm
    user  system elapsed
    0.15    0.00    0.20
  sapply/gm user ratio: 60


 R.Version()
 $platform
 [1] i386-pc-mingw32

 $arch
 [1] i386

 $os
 [1] mingw32

 $system
 [1] i386, mingw32

 $status
 [1] 

 $major
 [1] 2

 $minor
 [1] 8.0

 $year
 [1] 2008

 $month
 [1] 10

 $day
 [1] 20

 $`svn rev`
 [1] 46754

 $language
 [1] R

 $version.string
 [1] R version 2.8.0 (2008-10-20)


 Kingsford Jones



 On Mon, Jan 5, 2009 at 10:18 AM, William Dunlap wdun...@tibco.com wrote:
 Arg, the 'sapply(...)' in the function was in the initial
 comment,
    gm <- function(x, group){ # medians by group: sapply(split(x,group),median)
  but someone's mailer put a newline before the sapply
    gm <- function(x, group){ # medians by group:
    sapply(split(x,group),median)
 so it got executed, wasting all that time.  (Of course the
 same mailer will mess up this message in the same way -
 just delete the sapply() call from gm().)

 Bill Dunlap
 TIBCO Software Inc - Spotfire Division
 wdunlap tibco.com

 -Original Message-
 From: hadley wickham [mailto:h.wick...@gmail.com]
 Sent: Monday, January 05, 2009 9:10 AM
 To: William Dunlap
 Cc: gallon...@gmail.com; R help
 Subject: Re: [R] the first and last observation for each subject

  Another application of that technique can be used to quickly compute
  medians by groups:
 
   gm <- function(x, group){ # medians by group:
   sapply(split(x,group),median)
 o <- order(group, x)
 group <- group[o]
 x <- x[o]
 changes <- group[-1] != group

Re: [R] Basic Question about use of R

2009-01-02 Thread Kingsford Jones
Hi David,

A general answer to your question is: yes, R would be useful for such
analyses - particularly when interfaced with a GIS.  For an
introduction to the types of spatial tools available in R see the Task
View for spatial data:
http://cran.r-project.org/web/views/Spatial.html

Below are a few more specific comments:

On Fri, Jan 2, 2009 at 12:12 PM, David Greene greene...@ntelos.net wrote:
 Dear Sirs:
 I am not yet a user of R.  My background includes the use of (Turbo) Pascal
 for scientific analysis of underwater acoustics problems (e.g. sound ray
 tracing and array gain in directional noise fields.)
 I have become interested in the following type of problem:
 (1) select, say, 1000 random locations within the continental United
 States;

This could be as simple as using the runif function, but more likely
you'll want to look at sp::spsample, or for more advanced tools see
the spsurvey and spatstat packages.
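To make the runif route concrete, here is a minimal base-R sketch; the
bounding-box limits below are rough assumptions for the conterminous US,
and a real analysis would sample within the actual national polygon
(e.g. with sp::spsample) rather than a rectangle:

```r
# Uniform random points in an approximate lon/lat bounding box
# (box limits are illustrative assumptions, not exact US borders)
set.seed(42)
n <- 1000
pts <- data.frame(lon = runif(n, -124.7, -66.9),
                  lat = runif(n,   24.5,  49.4))
head(pts)
```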

 (2) characterize (statistically) the probabilities of:
   (a) distance to the nearest paved road;
   (b) elevation above sea level;
   (c) (?) ownership (public or private); etc.

R is outstanding for the types of 'statistical characterization' I
guess you are interested in.  It also has excellent capabilities for
importing and manipulating spatial data (e.g. see the "Reading and
writing spatial data" section of the Task View).  However for doing
things like calculating geographic distances using objects of varying
types (points, lines, polygons, grids) it's generally easiest to use a
GIS (such as GRASS, SAGA, ArcInfo, ...).  You can then use the
available tools for importing the GIS results into R for statistical
analysis, and if you wish, exporting back to the GIS.  However if you
do not want to put the effort into learning a GIS, it is usually
possible to work out a solution using only R.  As you run into
specific problems the R-sig-geo list is a good place to get helpful
answers to well formulated questions.

 Would R be useful, perhaps in combination with Google Earth, to carry out

As far as I know Google Earth is designed for visualization rather
than analysis.  R in combination with a GIS is really the way to go.

Here is a current book that covers many of the spatial tools available in R

http://www.springer.com/public+health/epidemiology/book/978-0-387-78170-9

hope that helps,

Kingsford Jones


 this kind of study?
 Thank you for your consideration.
 David C. Greene
 greene...@ntelos.net





Re: [R] Spatial Data Analysis in R [was: Basic Question about use of R]

2009-01-02 Thread Kingsford Jones
resending to provide a more informative subject line

On Fri, Jan 2, 2009 at 3:21 PM, Kingsford Jones
kingsfordjo...@gmail.com wrote:
 Hi David,

 A general answer to your question is: yes, R would be useful for such
 analyses - particularly when interfaced with a GIS.  For an
 introduction to the types of spatial tools available in R see the Task
 View for spatial data:
 http://cran.r-project.org/web/views/Spatial.html

 Below are a few more specific comments:

 On Fri, Jan 2, 2009 at 12:12 PM, David Greene greene...@ntelos.net wrote:
 Dear Sirs:
 I am not yet a user of R.  My background includes the use of (Turbo) Pascal
 for scientific analysis of underwater acoustics problems (e.g. sound ray
 tracing and array gain in directional noise fields.)
 I have become interested in the following type of problem:
  (1) select, say, 1000 random locations within the continental United
 States;

 This could be as simple as using the runif function, but more likely
 you'll want to look at sp::spsample, or for more advanced tools see
 the spsurvey and spatstat packages.

 (2) characterize (statistically) the probabilities of:
   (a) distance to the nearest paved road;
   (b) elevation above sea level;
   (c) (?) ownership (public or private); etc.

 R is outstanding for the types of 'statistical characterization' I
 guess you are interested in.  It also has excellent capabilities for
 importing and manipulating spatial data (e.g. see the "Reading and
 writing spatial data" section of the Task View).  However for doing
 things like calculating geographic distances using objects of varying
 types (points, lines, polygons, grids) it's generally easiest to use a
 GIS (such as GRASS, SAGA, ArcInfo, ...).  You can then use the
 available tools for importing the GIS results into R for statistical
 analysis, and if you wish, exporting back to the GIS.  However if you
 do not want to put the effort into learning a GIS, it is usually
 possible to work out a solution using only R.  As you run into
 specific problems the R-sig-geo list is a good place to get helpful
 answers to well formulated questions.

  Would R be useful, perhaps in combination with Google Earth, to carry out

 As far as I know Google Earth is designed for visualization rather
 than analysis.  R in combination with a GIS is really the way to go.

 Here is a current book that covers many of the spatial tools available in R

 http://www.springer.com/public+health/epidemiology/book/978-0-387-78170-9

 hope that helps,

 Kingsford Jones


 this kind of study?
 Thank you for your consideration.
 David C. Greene
 greene...@ntelos.net






Re: [R] interpretation of conf.type in predict.Design {Design}

2009-01-01 Thread Kingsford Jones
Hi Dylan,

I haven't looked at the code for predict.Design or predict.lm, but I
think it's safe to assume that "mean" and "confidence" refer to the
same concept, as do "individual" and "prediction".  Here's my logic:

In general, confidence intervals refer to parameter estimates and
prediction intervals to predicted point values.  For the linear model,
the fitted line represents the estimated conditional mean of y given
the x-values and we form a confidence interval around it.  In this
case, our best prediction of any individual y-value given the
x-values is also the fitted line.  However, upon repeated sampling the
fitted line will vary less than the observed y values at any given set
of x-values, and this is reflected in the fact that the confidence
interval is narrower than the prediction interval.
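A quick way to see this difference with predict.lm on simulated data
(the variables below are invented purely for illustration):

```r
# Confidence vs prediction intervals from predict.lm on toy data:
# the prediction interval is necessarily the wider of the two
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50)
fit <- lm(y ~ x)
new <- data.frame(x = 0.5)
ci <- predict(fit, new, interval = "confidence")  # for the conditional mean
pr <- predict(fit, new, interval = "prediction")  # for a new observation
c(ci.width = unname(ci[, "upr"] - ci[, "lwr"]),
  pr.width = unname(pr[, "upr"] - pr[, "lwr"]))
```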

hth,

Kingsford Jones

On Wed, Dec 31, 2008 at 2:15 PM, Dylan Beaudette
dylan.beaude...@gmail.com wrote:
 Hi,

 I am not quite sure how to interpret the differences in output when
 changing conf.type from the default "mean" to "individual". Are these
 analogous to the differences between confidence and prediction
 intervals, as defined in predict.lm {stats} ?

 Thanks in advance.

 Dylan





Re: [R] How to display y-axis labels in Multcomp plot

2008-12-08 Thread Kingsford Jones
See ?par and note the 'mar' parameter

Here's an example:


library(multcomp)
labs - c('short', 'medium', 'long')
treatment - gl(3, 10, labels = labs)
response - rnorm(30, mean=as.numeric(treatment))
mult <- glht(lm(response ~ treatment),
             linfct=mcp(treatment='Means'))
par(mar=c(4,8,4,2))
plot(confint(mult))

hth,

Kingsford Jones



On Mon, Dec 8, 2008 at 5:06 PM, Metconnection [EMAIL PROTECTED] wrote:

 Dear R-users,
 I'm currently using the multcomp package to produce plots of means with 95%
 confidence intervals
 i.e.

 mult <- glht(lm(response~treatment, data=statdata),
 linfct=mcp(treatment="Means"))
 plot(confint(mult,calpha = sig))

 Unfortunately the y-axis on the plot appears to be fixed and hence if the
 labels on the y-axis (treatment levels) are too long, then they are not
 displayed in full on the plot. Of course I could always make the labels
 shorter but I was wondering if there was a way to make the position of the
 y-axis on the plot more flexible, such as in the scatterplot produced
 using the xyplot function, that would allow me to view the labels in full.

 Thanks in advance for any advice!
 Simon

 --
 View this message in context: 
 http://www.nabble.com/How-to-display-y-axis-labels-in-Multcomp-plot-tp20904977p20904977.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] Create unique sets of 3 from a vector of IDs?

2008-12-02 Thread Kingsford Jones
However, I believe Brandon was trying to get the permutations of size
3, rather than combinations.  Dylan provided a solution including
repeats.  Here's one without:

 library(gtools)
 permutations(5, 3, LETTERS[1:5])
  [,1] [,2] [,3]
 [1,] A  B  C
 [2,] A  B  D
 [3,] A  B  E
 [4,] A  C  B
 [5,] A  C  D
 [6,] A  C  E
 [7,] A  D  B
 [8,] A  D  C
 [9,] A  D  E
[10,] A  E  B
[11,] A  E  C
[12,] A  E  D
[13,] B  A  C
[14,] B  A  D
[15,] B  A  E
[16,] B  C  A
[17,] B  C  D
[18,] B  C  E
[19,] B  D  A
[20,] B  D  C
[21,] B  D  E
[22,] B  E  A
[23,] B  E  C
[24,] B  E  D
[25,] C  A  B
[26,] C  A  D
[27,] C  A  E
[28,] C  B  A
[29,] C  B  D
[30,] C  B  E
[31,] C  D  A
[32,] C  D  B
[33,] C  D  E
[34,] C  E  A
[35,] C  E  B
[36,] C  E  D
[37,] D  A  B
[38,] D  A  C
[39,] D  A  E
[40,] D  B  A
[41,] D  B  C
[42,] D  B  E
[43,] D  C  A
[44,] D  C  B
[45,] D  C  E
[46,] D  E  A
[47,] D  E  B
[48,] D  E  C
[49,] E  A  B
[50,] E  A  C
[51,] E  A  D
[52,] E  B  A
[53,] E  B  C
[54,] E  B  D
[55,] E  C  A
[56,] E  C  B
[57,] E  C  D
[58,] E  D  A
[59,] E  D  B
[60,] E  D  C
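For comparison, the unordered counterpart (combinations) needs no extra
package -- choose(5, 3) = 10 sets versus the 60 ordered triples above:

```r
# All unordered 3-element subsets of 5 letters via base R's combn;
# t() turns the 3 x 10 result into one set per row
m <- t(combn(LETTERS[1:5], 3))
m
```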


Kingsford Jones


On Tue, Dec 2, 2008 at 9:41 PM, G. Jay Kerns [EMAIL PROTECTED] wrote:
 Dear Brandon,

 On Tue, Dec 2, 2008 at 10:46 PM, Dylan Beaudette
 [EMAIL PROTECTED] wrote:
 On Tue, Dec 2, 2008 at 7:42 PM, philozine [EMAIL PROTECTED] wrote:
 Dear all:

  This is one of those "should be easy" problems that I'm having great 
 difficulty solving. I have a vector containing ID codes, and I need to 
 generate a 3-column matrix that contains all possible combinations of three.

 For example, my ID vector looks like this:
 A
 B
 C
 D
 E

 I need to generate a matrix that looks like this:
 A B C
 A B D
 A B E
 A C B
 A C D
 A C E
 A D B
 A D C
 A D E

 Hi,

 Does this do what you want?

 expand.grid(letters[1:5], letters[1:5], letters[1:5])


 D



 Have a look at urnsamples() in the prob package.

  ID <- LETTERS[1:5]
 urnsamples(ID, size = 3, replace = FALSE, ordered = FALSE)

 Best,
 Jay




 ***
 G. Jay Kerns, Ph.D.
 Associate Professor
 Department of Mathematics  Statistics
 Youngstown State University
 Youngstown, OH 44555-0002 USA
 Office: 1035 Cushwa Hall
 Phone: (330) 941-3310 Office (voice mail)
 -3302 Department
 -3170 FAX
 E-mail: [EMAIL PROTECTED]
 http://www.cc.ysu.edu/~gjkerns/





Re: [R] function for simultaneous confidence interval of regression coefficients

2008-11-30 Thread Kingsford Jones
see
?coef  # extract the estimates
?vcov  #  extract their covariance matrix
?qf  # get the F quantile of interest

Also, you may be interested in

?car::ellipse
?ellipse::ellipse.lm
?gmodels::glht.test
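Putting those pieces together, here is a sketch of Scheffe-style
simultaneous intervals built from coef, vcov, and qf; the data and model
below are invented purely for illustration:

```r
# Scheffe-type simultaneous CIs: beta_hat +/- sqrt(p * F_{p, n-p, .95}) * se
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(20)
fit <- lm(y ~ x1 + x2, data = d)
b    <- coef(fit)                     # coefficient estimates
se   <- sqrt(diag(vcov(fit)))         # their standard errors
p    <- length(b)                     # number of coefficients
n    <- nrow(d)
crit <- sqrt(p * qf(0.95, p, n - p))  # Scheffe critical value
simci <- cbind(lower = b - crit * se, upper = b + crit * se)
simci
```

Note these are wider than the one-at-a-time intervals from confint, as
they must be to hold simultaneously.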

hth,

Kingsford Jones


On Sat, Nov 29, 2008 at 4:30 PM, Kyle Matoba [EMAIL PROTECTED] wrote:
 List,

 Would someone be so kind as to point me to a function that will calculate
 simultaneous confidence intervals of regression coefficients based upon
 their distribution as (under the assumption of normal errors, with
 \mathbf{X} as the design matrix):

 $\hat{\mathbf{\beta}} \sim N(\mathbf{\beta},
 \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$.


 'confint' calculates individual coefficients so far as I can tell, but I
 need simultaneous CIs based on the confidence ellipse/ F distribution.
 Inverting the ellipse gives this equation:

 \mathbf{\hat{\beta}} \pm
 \sqrt{\mathrm{diag}(s^2(\mathbf{X}^T\mathbf{X})^{-1}) \times p \times F_{p,
 n-p, .95}}

 Thanks, and sorry for such a dumb question.  Either I am not searching for
 the right thing or this hasn't already been addressed in the lists (perhaps
 because it is so easy).

 Kyle

[[alternative HTML version deleted]]





Re: [R] how to select cases based on value of one or more variables

2008-11-30 Thread Kingsford Jones
It's generally easier to work with data frames, so read your data with

students <- read.spss("yourFile", to.data.frame = TRUE)


Then subset will work as expected:

subset(students, Sex == 1)


If you would rather keep the data as a list you could do something like

lapply(students, function(x) x[students$Sex == 1])
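
A self-contained toy version of both approaches (the list structure below just mimics what read.spss returns):

```r
students <- list(Auno = c(6, 1, 2, 2, 1),
                 Sex  = c(2, 1, 2, 1, 2))

# Data-frame route: subset() keeps matching rows in every variable at once
df <- as.data.frame(students)
males <- subset(df, Sex == 1)

# List route: apply the same logical index to each element
males2 <- lapply(students, function(x) x[students$Sex == 1])

males$Auno   # 1 2
males2$Auno  # 1 2
```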

hth,

Kingsford Jones



On Sun, Nov 30, 2008 at 2:15 PM, Simone Gabbriellini
[EMAIL PROTECTED] wrote:
 sorry for my bad presentation...

 read.spss gives me this:

 students
 $Auno
  [1]  6  1  2  2  1  3  4  2  4  2  4  4  1  1 NA  1  4  2  1  1  1  5  4
  [24]  2  1  2  1  2  1  4  4  1  1  1  2  1  6  1  1  1  1  1  2  1  2  1
  [47]  2  2  1  4  2  4  3  1  1  1  1  3  2  1  4  4  4  4  2  4  1  2  4
  [70]  1  3  4  5  2  4  3  5  5  4  2  1  1  1  1  4  5  2  4  4  1  4  2
  [93]  1  2  3  3  2  1  2  2  2  1  1  1  3  5  5  5  2 NA  2  1 NA  5  2
 [116]  1  4  2 NA  1  4  5  2  3  1  1  1  1  4  2  1  1  3  2  4  2  4  2
 [139]  1  4  1  2  4  1  2  3  2  1  1  2  4  4  3  4  1  1  3  2  1  1  2
 [162]  1  2  5  5  5  1  4  3  2  3  3  2  1  1  5  1  2  1  1  2  1  2  1
 [185]  1  2  1  1  1  1  3  4  2  1  4  2  4  1  4  2  1  1  1  2  1  4  1
 [208]  5  1  1  4  4  2  1  1  5  4  1  1  5  5  4  1  4

 $Sex
  [1] 2 1 2 1 2 2 2 2 1 2 1 1 2 1 0 2 2 2 2 2 2 2 2 1 2 2 1 2 2 1 2 2 2 1
  [35] 2 2 2 1 1 2 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 1 1 2 2 2 2 2 1 2
  [69] 2 1 2 1 2 1 2 2 2 2 2 2 1 2 2 2 1 2 2 2 1 1 1 2 1 2 1 2 2 2 2 2 2 2
 [103] 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 0 2 2 2 1 2 2 1 2 1 2 2 1 1 2 1 2 1
 [137] 2 1 2 1 1 1 1 1 1 2 1 1 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2
 [171] 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 1 2 2 1 1 1 1 1 2 0 2 2 1 2 1 2 2
 [205] 1 2 2 2 2 2 2 2 2 2 1 2 2 1 1 2 2 2 2 2

 

 I would like to filter - or subset - the dataset for $Sex = 1 (in this case
 means male...),  for example...

 thanks anyway,
 Simone



 On 30 Nov 2008, at 21:49, Don MacQueen wrote:

 It is.

 For example, if you have a variable stored as a vector named x, and
 another variable stored as aa vector named y, you can select cases of y
 where x is greater than 3 by using

  y[x > 3]

 However, you're going to have to provide more information in order to get
 a better answer than that (see the posting guide, link included with every
 post to r-help). In particular, I'm guessing that the answer you really want
 looks somewhat different than my example -- but this depends on the exact
 structure of what read.spss() produces.

 I'd also suggest reading some of the documentation available from the R
 website (CRAN), notably, An Introduction to R.

 -Don

 At 9:36 PM +0100 11/30/08, Simone Gabbriellini wrote:

 dear list,

 I have read a spss file with read.spss()

 now I have a list with all my variable stored as vectors.

 is it possible to select cases based on the value of one or more
 variables?

 thank you,
 Simone



 --
 -
 Don MacQueen
 Lawrence Livermore National Laboratory
 Livermore, CA, USA
 925-423-1062
 [EMAIL PROTECTED]
 -





Re: [R] AIC function and Step function

2008-11-30 Thread Kingsford Jones
On Sun, Nov 30, 2008 at 5:05 PM, Dana77 [EMAIL PROTECTED] wrote:

  Thanks for kind help from Steven and Christos last time.  Now I got new
 problem regarding the codes for calculating the weights (w) in AIC ()
 function.
 The original code is as below:
   getAnywhere(logLik.lm)
 function (object, REML = FALSE, ...)
  {
    res <- object$residuals
    p <- object$rank
    N <- length(res)
    if (is.null(w <- object$weights)) {
        w <- rep.int(1, N)
    } else {
        excl <- w == 0
        if (any(excl)) {
            res <- res[!excl]
            N <- length(res)
            w <- w[!excl]
        }
    }

  Now my question is, if I use the lm() function to fit a multiple linear
 regression model, such as mod.fit <- lm(formula = Y ~ X1 + X2 + X3, data =
 set1), what code could I use to extract the weights (w), or how can I
 calculate the weights (w) shown in the above code?


mod.fit won't have weights because you didn't specify any through the
weights argument to lm.  If you had, you could extract them using the
same technique used in the above code:  w <- mod.fit$weights
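
For instance, with hypothetical data and weights supplied through the weights argument:

```r
set.seed(42)
set1 <- data.frame(Y = rnorm(20), X1 = rnorm(20), X2 = rnorm(20),
                   X3 = rnorm(20), wts = runif(20, 1, 3))

mod.fit <- lm(Y ~ X1 + X2 + X3, data = set1, weights = wts)
w <- mod.fit$weights           # the weights used in the fit
all.equal(unname(w), set1$wts)

# Without a weights argument the component is NULL, and logLik.lm
# then falls back to rep.int(1, N):
is.null(lm(Y ~ X1 + X2 + X3, data = set1)$weights)  # TRUE
```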

hth,

Kingsford Jones






 Thanks for your time and kind
 help!

 Dana



 Steven McKinney wrote:

 Hi Dana,

 Many thanks to Christos Hatzis who sent
 me an offline response, pointing out the
 new functions that make this much
 easier than my last suggestions:
 methods() and getAnywhere()

 methods(extractAIC)
 [1] extractAIC.aov* extractAIC.coxph*   extractAIC.glm*
 extractAIC.lm*  extractAIC.negbin*
 [6] extractAIC.survreg*

Non-visible functions are asterisked
 getAnywhere(extractAIC.coxph)
 A single object matching 'extractAIC.coxph' was found
 It was found in the following places
   registered S3 method for extractAIC from namespace stats
   namespace:stats
 with value

 function (fit, scale, k = 2, ...)
 {
  edf <- length(fit$coef)
  loglik <- fit$loglik[length(fit$loglik)]
  c(edf, -2 * loglik + k * edf)
 }
  <environment: namespace:stats>


 Thank you Christos.


 That said, one of the advantages of getting
 the source code is that it has comments that
 are stripped out when the code is sourced into R

 e.g. from the command line

 getAnywhere(AIC.default)
 A single object matching 'AIC.default' was found
 It was found in the following places
   registered S3 method for AIC from namespace stats
   namespace:stats
 with value

  function (object, ..., k = 2)
  {
      ll <- if ("stats4" %in% loadedNamespaces())
          stats4:::logLik
      else logLik
      if (length(list(...))) {
          object <- list(object, ...)
          val <- lapply(object, ll)
          val <- as.data.frame(t(sapply(val, function(el) c(attr(el,
              "df"), AIC(el, k = k)))))
          names(val) <- c("df", "AIC")
          Call <- match.call()
          Call$k <- NULL
          row.names(val) <- as.character(Call[-1])
          val
      }
      else AIC(ll(object), k = k)
  }
  <environment: namespace:stats>

From the source file


 #  File src/library/stats/R/AIC.R
 #  Part of the R package, http://www.R-project.org
 #
 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
 #  the Free Software Foundation; either version 2 of the License, or
 #  (at your option) any later version.
 #
 #  This program is distributed in the hope that it will be useful,
 #  but WITHOUT ANY WARRANTY; without even the implied warranty of
 #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 #  GNU General Public License for more details.
 #
 #  A copy of the GNU General Public License is available at
 #  http://www.r-project.org/Licenses/

  Return the object's value of the Akaike Information Criterion
  (or An Inf.. Crit..)

  AIC <- function(object, ..., k = 2) UseMethod("AIC")

  ## AIC for logLik objects
  AIC.logLik <- function(object, ..., k = 2)
      -2 * c(object) + k * attr(object, "df")

  AIC.default <- function(object, ..., k = 2)
  {
      ## AIC for various fitted objects --- any for which there's a logLik() method:
      ll <- if ("stats4" %in% loadedNamespaces()) stats4:::logLik else logLik
      if (length(list(...))) {  # several objects: produce data.frame
          object <- list(object, ...)
          val <- lapply(object, ll)
          val <- as.data.frame(t(sapply(val,
                                        function(el)
                                        c(attr(el, "df"), AIC(el, k = k)))))
          names(val) <- c("df", "AIC")
          Call <- match.call()
          Call$k <- NULL
          row.names(val) <- as.character(Call[-1])
          val
      } else AIC(ll(object), k = k)
  }
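
Both printed versions reduce to -2*logLik + k*edf; a quick check on a built-in dataset:

```r
fit <- lm(dist ~ speed, data = cars)
ll <- logLik(fit)
manual <- -2 * c(ll) + 2 * attr(ll, "df")  # k = 2 gives the usual AIC
all.equal(manual, AIC(fit))
```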



 Steven McKinney

 Statistician
 Molecular Oncology and Breast Cancer Program
 British Columbia Cancer Research Centre

 email: smckinney +at+ bccrc +dot+ ca

 tel: 604-675-8000 x7561

 BCCRC
 Molecular Oncology
 675 West 10th Ave, Floor 4
 Vancouver B.C.
 V5Z 1L3
 Canada


Re: [R] aov help

2008-11-13 Thread Kingsford Jones
Rather than estimating the variance components via the
method-of-moments estimators, have a look at the 'nlme' and 'lme4'
packages, which provide likelihood-based tools for estimating random
and mixed models (and in the case of nlme, gls models too).
Advantages of the likelihood-based approaches include working with
unbalanced data, not producing negative variance estimates when the
MS_{error} is larger than the MS_{between groups}, and providing a
great deal of flexibility in structuring both the random effects and
error covariance matrices.
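
For example, for the coop data discussed in the question quoted below, a REML fit with the recommended nlme package would look something like this (a sketch; VarCorr prints the Lab, Bat-within-Lab, and residual variance estimates):

```r
library(MASS)   # coop data from Venables & Ripley
library(nlme)   # recommended package, ships with R

fit <- lme(Conc ~ 1, random = ~ 1 | Lab/Bat,
           data = coop, subset = Spc == "S1")
VarCorr(fit)    # REML estimates of the variance components
```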

hth,

Kingsford Jones

On Thu, Nov 13, 2008 at 7:14 PM,  [EMAIL PROTECTED] wrote:
 Please pardon an extremely naive question. I see related earlier
 posts, but no responses which answer my particular question. In
 general, I'm very confused about how to do variance decomposition with
 random and mixed effects. Pointers to good tutorials or texts would
 be greatly appreciated.

 To give a specific example, page 193 of VR, 3d Edition, illustrates
 using raov assuming pure random effects on a subset of coop:

  raov(Conc ~ Lab / Bat, data = coop, subset = Spc == "S1")

 I realize ('understand' would be a bit too strong) that the same analysis,
 resulting in identical sums of squares, degrees of freedom, and residuals
 can be generated in R by doing

  op <- options(contrasts = c("contr.helmert", "contr.poly"))
  aov(Conc ~ Lab + Error(Lab / Bat), data = coop, subset = Spc == "S1")

 However, as shown in VR, raov also equated the expected and observed
 mean squares, to solve for and display the variance components associated
 with the random factors, \sigma_\epsilon^2, \sigma_B^2, and \sigma_L^2 in
 a column labeled "Est. Var.". Given the analytical forms of the expected
 mean squares for each stratum, I can obviously do this manually. But is
 there way to get R to do it automatically, a la raov? This would be
 particularly useful for mixed cases in which the analytical formulations
 of the expected mean squares may not be immediately obvious to a novice.

 Thanks in advance!

 Regards,
 -jh-





Re: [R] VEC Operator in R

2008-10-23 Thread Kingsford Jones
RSiteSearch('vec vech', restrict = 'fun') turns up several packages
with vec and vech functions.

E.g., matrixcalc, fUtilities, ks, ...
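
For reference, minimal user-defined versions are one-liners (a sketch; vec stacks the columns, vech takes the lower triangle including the diagonal):

```r
vec  <- function(m) as.vector(m)                   # column-major stacking
vech <- function(m) m[lower.tri(m, diag = TRUE)]   # for symmetric m

A <- matrix(1:4, 2, 2)
vec(A)   # 1 2 3 4
vech(A)  # 1 2 4
```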


hth,

Kingsford Jones

On Thu, Oct 23, 2008 at 11:47 AM, megh [EMAIL PROTECTED] wrote:

 Can anyone please tell me whether there is an R function that acts as the VEC
 or VECH operator on a matrix? Yes, of course, I can write a
 user-defined function for that, or else I can set dim(mat) <- NULL. However,
 I am looking for an existing R function.

 Your help will be highly appreciated.

 Regards,
 --
 View this message in context: 
 http://www.nabble.com/VEC-Operator-in-R-tp20136201p20136201.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] Calculating confidence limits for the difference between means

2008-10-23 Thread Kingsford Jones
On Wed, Oct 22, 2008 at 11:09 AM, Dennis Fisher [EMAIL PROTECTED] wrote:
 Colleagues,

 I am working on a problem at the edge of my knowledge of statistics.

 Basically, I have values for two groups - I am trying to calculate the
 90% confidence limits for the difference between means.  I can
 implement this without difficulty based on standard equations that are
 available in stat textbooks: (difference between means) / (pooled
 standard error)

 My problem arises when I try to correct for the value of a covariate.
 I can do the ANOVA with the covariate as a factor.  But, I don't know
 how to use the output to obtain the confidence limits of the difference.


Generally 'factor' refers to qualitative variables and 'covariate' to
quantitative, but if you are looking to estimate the difference in two
means using an ANCOVA perhaps something like...

fit <- lm(y ~ trt + covar)  # trt is a 2-level factor and covar is quantitative
confint(fit, level = .90)
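
A runnable toy version (simulated data; the "trtB" row is the covariate-adjusted difference between the two group means):

```r
set.seed(7)
trt   <- factor(rep(c("A", "B"), each = 25))
covar <- rnorm(50)
y     <- 3 + 1.5 * (trt == "B") + 0.8 * covar + rnorm(50)

fit <- lm(y ~ trt + covar)
confint(fit, "trtB", level = 0.90)  # 90% CI for the adjusted difference
```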

hth,

Kingsford Jones



 I suspect that several participants in this board have implemented
 code to so this.  I hope that someone is willing to share the code.

 Thanks in advance.

 Dennis


 Dennis Fisher MD
 P  (The P Less Than Company)
 Phone: 1-866-PLessThan (1-866-753-7784)
 Fax: 1-415-564-2220
 www.PLessThan.com


[[alternative HTML version deleted]]







Re: [R] Ecological Niche Modelling on R

2008-10-08 Thread Kingsford Jones
On Wed, Oct 8, 2008 at 6:17 AM, milton ruser [EMAIL PROTECTED] wrote:
[snip]

 Finally, in fact GRASP do what I am looking for, and I am starting to
  compare the results of these packages with other very well-known software
  (DesktopGarp, Maxent, openModeller). If any of you have suggestions of
  other R solutions for Ecological Niche Models, please let me know.


If you're interested in prediction methods based on presence-only
data, you might find Gillian Ward's dissertation interesting (
http://www-stat.stanford.edu/~hastie/students.htm ).  It includes
reference to an accompanying R package (although it does not appear to
be on CRAN).

hth,

Kingsford Jones




 Regards a lot.
 Savava.


 Miltinho Astronauta
 Brazil


 On Wed, Oct 8, 2008 at 6:02 AM, Clément Calenge 
 [EMAIL PROTECTED] wrote:


    It's very kind of Stephen to plug my book, but it's not what you're
  looking for.  You need to read more about this general topic, and about the
  particular packages: try

  http://www.unine.ch/CSCF/grasp/grasp-r/index.html and
  http://www.unine.ch/CSCF/grasp/

  Based on downloading grasp, it doesn't look as though it will handle
  presence-only data, though -- you may need to look further.

  It doesn't look like adehabitat is what you want. From Calenge, Clement.
  2006. The package adehabitat for the R software: A tool for the analysis of
  space and habitat use by animals. Ecological Modelling 197, no. 3-4 (August
  25): 516-519. doi:10.1016/j.ecolmodel.2006.03.017:
  '... the adehabitat package for the R software, which offers basic
  GIS (Geographic Information System) functions, methods to analyze
  radio-tracking data and habitat selection by wildlife, and interfaces with
  other R packages.'

  General advice about "I want to do X in R" (expanding on Stephen's
  advice above):
  1. Read about X in general (perhaps you have already done this);
  2. Search for R packages and functions that do what you want (you've
     already done this, although you misidentified adehabitat);
  3. Install those packages and see what they do.  Look at the documentation
     included with the packages, including any citations referenced.  Try
     the examples.
  4. If you don't know enough R to understand the examples or how to get
     your data into R, back up and read the introductory R documentation.



  Actually, the confusion could be explained by the fact that many analysis
 methods (and especially factor analyses) originally developed in community
 ecology and biogeography to study the niche are also used in habitat
 selection studies (e.g., OMI analysis, ENFA, etc.). As the statistical
 issues (predict the species/animal presence on an area, given the value of
 environmental variables) and type of data (presence-only data to be compared
 with a sample/census of available units, etc.) involved in studies of the
 niche and habitat selection are often similar, the methods used are often
 similar too... However, most of the functions in adehabitat implement
 /exploratory/ methods of the ecological niche, and methods suitable for
 prediction are rare in the package (except one or two functions which have
 already been used for that, such as mahasuhab or domain, but they are
 probably not the best choice given your aim)... The package grasp may indeed
 be a better choice if your aim is prediction...

 But I concur with Ben and Stephen on the fact that you should first read
 the (large) literature on niche modelling before choosing the method that
 seems appropriate to your data/issue, and then search R archives/package for
 a solution. a good start:

 @ARTICLE{Elith2006,
  author = {Elith, J. and Graham, C.H. and Anderson, R.P. and Dudik, M. and
 Ferrier,
   S. and Guisan, A. and Hijmans, R.J. and Huettmann, F. and Leathwick,
   J.R. and Lehmann, A. and Li, J. and Lohmann, L.G. and Loiselle, B.A.
   and Manion, G. and Moritz, C. and Nakamura, M. and Nakazawa, Y. and
   McC. Overton, J. and Peterson, A.T. and Phillips, S.J. and Richardson,
   K. and Scachetti-Pereira, R. and Schapire, R.E. and Soberon, J. and
   Williams, S. and Wisz, M.S. and Zimmermann, N.E.},
  title = {Novel methods improve prediction of species distributions from
 occurrence data},
  journal = {Ecography},
  year = {2006},
  volume = {29},
  pages = {129-151}
 }

 and references therein.
 Cheers,



 Clément Calenge.

 --
 Clément CALENGE
 Office national de la chasse et de la faune sauvage
 Saint Benoist - 78610 Auffargis
 tel. (33) 01.30.46.54.14



[[alternative HTML version deleted]]






Re: [R] lme and lmer df's and F-statistics again

2008-10-07 Thread Kingsford Jones
You may find this site useful:

http://lme4.r-forge.r-project.org/bib/lme4bib.html


On Tue, Oct 7, 2008 at 9:08 AM, Dieter Menne
[EMAIL PROTECTED] wrote:
 Julia S. julia.schroeder at gmail.com writes:

 Now, I did that in my article and I got a response from a reviewer that I
 additionally should give the degrees of freedom, and the F-statistics. From
 what I read here, that would be incorrect to do, and I sort of intuitively
 also understand why (at least I think I do).
 ...
 Well, writing on my rebuttal, I find myself being unable to explain in a
 few, easy to understand (and, at the same time, correct) sentences stating
 that it is not a good idea to report (most likely wrong) dfs and F
 statistics. Can somebody here help me out with a correct explanation for a
 layman?

 Feeling with you, and hoping some day this will be resolved. I am sure you
 have read Douglas Bates'

 http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76742.html

 but I thought this was temporary. The only workaround I have is not to use
 lmer for gaussian models.

 Dieter



Re: [R] ANOVA between within variance

2008-09-27 Thread Kingsford Jones
So, rather than the estimated between and within group variances from
a standard fixed-effects ANOVA (i.e. the Mean Sqs in the anova table),
you are looking for the estimated variance components from a model
with crossed random effects?  If so, you may find the lmer function
found in package lme4 to be useful (formulas for crossed random
effects are much more straightforward than in nlme).  If not, this
page provides some helpful tips on eliciting useful responses from
this list: http://www.r-project.org/posting-guide.html
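
A minimal sketch of crossed random intercepts with lmer (simulated data; the
variable names are hypothetical, not the poster's):

```r
library(lme4)
## simulated long-format data: every combination of two crossed
## grouping factors, rater and item
set.seed(1)
d <- expand.grid(rater = factor(1:10), item = factor(1:20))
d$y <- rnorm(nrow(d))

## one random intercept per crossed grouping factor
fit <- lmer(y ~ 1 + (1 | rater) + (1 | item), data = d)
VarCorr(fit)  # between-rater, between-item and residual variance components
```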

HTH,

Kingsford Jones

On Sat, Sep 27, 2008 at 2:21 AM, Gregor Rolshausen
[EMAIL PROTECTED] wrote:
 dear Dr. Kubovy,

 I am sorry, but the Variance table is not exactly what I want. I want
 the partitioned VARIANCE for between and within the groups. the anova
 ()-table just gives me the SumSq and the mean Sq... I know how to run
 t.test and ANOVA!
 in the nlme-package there is the VarCorr function, which extracts the
 between and within variances, but only for nested ANOVAs. so my
 question was, if there is a function like that for not-nested ANOVAS ?

 sorry. maybe I should reformulate the question.

 cheers ,
 gregor






 Am Sep 27, 2008 um 7:19 AM schrieb Michael Kubovy:

 Then all you need is to run a t-test, no?  More generally (from ?lm):

 ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
 trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
 group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
 weight <- c(ctl, trt)
 anova(lm.D9 <- lm(weight ~ group))
 This gives you what you need:
 Analysis of Variance Table

 Response: weight
             Df Sum Sq Mean Sq F value Pr(>F)
 group        1   0.69    0.69    1.42   0.25
 Residuals   18   8.73    0.48
 I am concerned that you have not spent enough time either studying
 stats or reading up on R. There are many good introductions to
 stats using R.
 _
 Professor Michael Kubovy
 University of Virginia
 Department of Psychology
 USPS: P.O. Box 400400, Charlottesville, VA 22904-4400
 Parcels: Room 102, Gilmer Hall, McCormick Road, Charlottesville, VA 22903
 Office: B011, +1-434-982-4729
 Lab: B019, +1-434-982-4751
 Fax: +1-434-982-4766
 WWW: http://www.people.virginia.edu/~mk9y/


 Gregor Rolshausen
 PhD Student
 Department of Evolutionary Ecology
 University of Freiburg im Breisgau; Hauptstrasse 1, 79108 Freiburg
 phone - +49.761.2559
 email - [EMAIL PROTECTED]






Re: [R] Bug in is ?

2008-09-25 Thread Kingsford Jones
On Thu, Sep 25, 2008 at 3:23 PM, Wacek Kusnierczyk
[EMAIL PROTECTED] wrote:
 Rolf Turner wrote:

[snip]

 Now what on earth does ``integer type'' mean?  The concept ``type'' is
 not defined
 anywhere, and there is no help on ``type''.  There is no type()
 function.  One
 has to intuit, from the discussion of integer vectors existing so that
 they
 can be properly passed to .C() or .Fortran(), that type has something
 to do
 with storage mode.

 indeed.  one more example that R man pages are often rather
 uninformative, despite verbosity

Try

?type

which correctly guesses the user is looking for the 'typeof' page.

Or even

example(type)

Also, after a brief introduction, the R Language Definition document
begins with a discussion of types.


Kingsford Jones





 It would have been better to have called the function now known as
 ``is.integer()''
 something like ``is.storedAsInteger()'' and have a function
 is.integer() which
 does what people expect.  E.g.

 is.integer(c(5.1, 5.4, 4.8, 5.0))

 would return

 [1] FALSE FALSE FALSE TRUE


 Despite what fortune(85) says, it is not unreasonable to expect
 functions to
 do what one would intuitively think that they should do.  E.g. sin(x)
 should not return
 1/(1+x^2) even if the help page for sin() says clearly and explicitly
 that this
 is what it does.  (Aside:  help pages rarely if ever say *anything*
 clearly and
 explicitly, at least from the point of view of the person who does not
 already
 understand everything about the concepts being explained.)

 indeed.  one more opinion that R man pages are often rather
 uninformative, despite verbosity.



 Be that as it may, this all academic maundering.  The is.integer()
 function
 does what it does and THAT IS NOT GOING TO CHANGE.  We'll just have to
 deal
 with it.  Once one is *aware* that the results of is.integer are
 counter-intuitive,
 one can adjust one's expectations, and it's no big deal.

 I do think, however, that there ought to a WARNING section in the help on
 is.integer() saying something like:

 NOTE: is.integer() DOES NOT DO what you expect it to do.

 hehe.  this should be printed on the first page in every R tutorial:
 NOTE: R DOES NOT DO what you expect it to do (seems i'm in a bad mood,
 sorry, R is just fine).


 In large friendly letters.

 cheers,

 Rolf Turner

 P. S.  Those who want a function that does what one would naturally
 expect
 is.integer() to do could define, e.g.

 is.whole.number <- function(x) {
 abs(x - round(x)) < sqrt(.Machine$double.eps)
 }

 Then

 is.whole.number(c(5.1, 5.4, 4.8, 5.0))

 returns

 [1] FALSE FALSE FALSE TRUE

 just as one would want, hope, and expect.

 if *this* is what one would want, hope, and expect from is.integer, why
 can't we want, hope, and expect that it eventually happens?

 vQ



Re: [R] How to do Clustered Standard Errors for Regression in R?

2008-09-17 Thread Kingsford Jones
Bo,

Try using RSiteSearch with the strings 'huber-white', 'sandwich' or
even 'clustered standard errors'.

You may also want to consider a mixed models approach -- see:

http://www.stat.columbia.edu/~cook/movabletype/archives/2007/11/clustered_stand.html
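
As a pointer, one common route goes through the sandwich and lmtest packages
(a sketch with simulated data and a hypothetical grouping factor 'firm';
vcovCL and its formula interface are available in recent versions of sandwich):

```r
library(sandwich)
library(lmtest)
## simulated data clustered by a hypothetical 'firm' factor
set.seed(1)
d <- data.frame(firm = factor(rep(1:30, each = 10)), x = rnorm(300))
d$y <- 1 + 2 * d$x + rnorm(300)

fit <- lm(y ~ x, data = d)
## coefficient tests with cluster-robust (sandwich) standard errors
coeftest(fit, vcov = vcovCL(fit, cluster = ~ firm))
```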

HTH,

Kingsford Jones



On Tue, Sep 16, 2008 at 12:01 PM, Bo Cowgill [EMAIL PROTECTED] wrote:
 I can't seem to find the right set of commands to enable me to perform a
 regression with cluster-adjusted standard-errors. There seems to be nothing
 in the archives about this -- so this thread could help generate some useful
 content.

 I've searched everywhere. Can anyone point me to the right set of commands?
 An example would be most helpful as well.

 Bo



Re: [R] creating rainbow gradients

2008-09-17 Thread Kingsford Jones
On Wed, Sep 17, 2008 at 3:11 PM, Gillian Silver [EMAIL PROTECTED] wrote:
 What would I do if I have something like:

 x <- rnorm(1:1000)
 y <- rnorm(1:1000)
 z <- x + y

 and I want the rainbow to increase with z? (i.e., red for lowest z...all the
 way up to the last color in the rainbow for the highest z)

Do you mean something like the following?

z <- rnorm(1000, sd = sqrt(2))
plot(z, col = rainbow(length(z), end = 5/6)[rank(z)], pch = 19)




 On Wed, Sep 17, 2008 at 2:05 PM, stephen sefick [EMAIL PROTECTED] wrote:

 plot(1:20, col=rainbow(20))

 On Wed, Sep 17, 2008 at 4:58 PM, Gillian Silver [EMAIL PROTECTED]
 wrote:
  Hi, how can I create a rainbow gradient in R? For example, let's say I
 have
  a plot of y = x...and I want the plot to go from red -> orange -> yellow
  -> green -> blue -> etc.
  Right now, I know how to do something like go from red to blue, using the
  plotrix library:
 
  library(plotrix)
  redToBlue <- color.scale(x, redrange = c(0,1), greenrange = c(0,1),
                           bluerange = c(0,1), extremes = c("red", "blue"))
  plot(x, y, col=redToBlue)
 
  But I can't figure out how to make the colors a rainbow. (I don't
 understand
  how the redrange, greenrange, and bluerange parameters in color.scale
 work.)
 
  Could someone please help?
 
  Thanks!
 
 



 --
 Stephen Sefick
 Research Scientist
 Southeastern Natural Sciences Academy

 Let's not spend our time and resources thinking about things that are
 so little or so large that all they really do for us is puff us up and
 make us feel like gods. We are mammals, and have not exhausted the
 annoying little problems of being mammals.

-K. Mullis




Re: [R] creating rainbow gradients

2008-09-17 Thread Kingsford Jones
On second thought, this is more likely to be what you're looking for...

library(rgl)
x <- rnorm(1000)
y <- rnorm(1000)
z <- x + y
plot3d(x, y, z, col = rainbow(1000, end = 5/6)[rank(z)], size = 3)


On Wed, Sep 17, 2008 at 4:06 PM, Kingsford Jones
[EMAIL PROTECTED] wrote:
 On Wed, Sep 17, 2008 at 3:11 PM, Gillian Silver [EMAIL PROTECTED] wrote:
 What would I do if I have something like:

 x <- rnorm(1:1000)
 y <- rnorm(1:1000)
 z <- x + y

 and I want the rainbow to increase with z? (i.e., red for lowest z...all the
 way up to the last color in the rainbow for the highest z)

 Do you mean something like the following?

 z <- rnorm(1000, sd = sqrt(2))
 plot(z, col = rainbow(length(z), end = 5/6)[rank(z)], pch = 19)




 On Wed, Sep 17, 2008 at 2:05 PM, stephen sefick [EMAIL PROTECTED] wrote:

 plot(1:20, col=rainbow(20))

 On Wed, Sep 17, 2008 at 4:58 PM, Gillian Silver [EMAIL PROTECTED]
 wrote:
  Hi, how can I create a rainbow gradient in R? For example, let's say I
 have
  a plot of y = x...and I want the plot to go from red -> orange -> yellow
  -> green -> blue -> etc.
  Right now, I know how to do something like go from red to blue, using the
  plotrix library:
 
  library(plotrix)
  redToBlue <- color.scale(x, redrange = c(0,1), greenrange = c(0,1),
                           bluerange = c(0,1), extremes = c("red", "blue"))
  plot(x, y, col=redToBlue)
 
  But I can't figure out how to make the colors a rainbow. (I don't
 understand
  how the redrange, greenrange, and bluerange parameters in color.scale
 work.)
 
  Could someone please help?
 
  Thanks!
 
 



 --
 Stephen Sefick
 Research Scientist
 Southeastern Natural Sciences Academy

 Let's not spend our time and resources thinking about things that are
 so little or so large that all they really do for us is puff us up and
 make us feel like gods. We are mammals, and have not exhausted the
 annoying little problems of being mammals.

-K. Mullis




Re: [R] autocorrelation in gams

2008-08-15 Thread Kingsford Jones
Keeping Gavin's advice in mind, you may also want to look at ?acf (and
see section 14.1 of MASS) and help(ACF, package=nlme) (see section 5.3
of MEMSS).  These are useful functions for exploring the 1d empirical
autocorrelation structure of model residuals.
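
For example (simulated series, so none of these names are the poster's):

```r
## residual autocorrelation diagnostics for a fitted model
set.seed(1)
d <- data.frame(x = 1:100)
d$y <- 0.1 * d$x + as.numeric(arima.sim(list(ar = 0.6), n = 100))
fit <- lm(y ~ x, data = d)
acf(resid(fit))    # sample autocorrelation function of the residuals
pacf(resid(fit))   # partial autocorrelations help suggest an AR order
```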

hth,
Kingsford Jones

On Fri, Aug 15, 2008 at 1:18 AM, Gavin Simpson [EMAIL PROTECTED] wrote:
 On Thu, 2008-08-14 at 16:12 +0100, Abigail McQuatters-Gollop wrote:
 Hi,

 I am looking at the effects of two explanatory variables on chlorophyll.
 The data are an annual time-series (so are autocorrelated) and the
 relationships are non-linear. I want to account for autocorrelation in
 my model.



 The model I am trying to use is this:



 Library(mgcv)



  gam1 <- gam(Chl ~ s(wintersecchi) + s(SST), family = gaussian,
 na.action=na.omit, correlation=corAR1(form =~ Year))


 There is no correlation argument in mgcv::gam you need gamm(). gam() has
 a ... argument which I suspect is silently mopping up your correlation
 argument so that no error/warning is raised.

 Note that gamm() uses lme from the nlme package (in the Gaussian case)
 and works that function very hard (see Wood 2006 GAM book). In my
 experience with this function separating trend from the correlation is
 quite difficult when also estimating the degree of smoothness and one
 has to work hard with the options.
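
[A runnable sketch of the gamm() suggestion above, using simulated stand-ins
for the poster's variables:]

```r
library(mgcv)
library(nlme)
## simulated annual series with AR(1) errors standing in for the real data
set.seed(1)
d <- data.frame(Year = 1:40, wintersecchi = rnorm(40), SST = rnorm(40))
d$Chl <- sin(d$wintersecchi) + d$SST +
  as.numeric(arima.sim(list(ar = 0.5), n = 40))

## gamm(), not gam(), accepts the correlation argument (passed on to lme)
m <- gamm(Chl ~ s(wintersecchi) + s(SST),
          correlation = corAR1(form = ~ Year), data = d)
summary(m$lme)   # the lme summary includes the estimated AR(1) parameter Phi
```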

 As an alternative you might also take a look at the paper by Ferguson et
 al (2007):

 http://www3.interscience.wiley.com/journal/119392062/abstract?CRETRY=1&SRETRY=0

 Which has R code using the sm package to do a very similar thing.

 HTH

 G


 the result I get is this:



 Family: gaussian

 Link function: identity



 Formula:

 CPRChl ~ s(wintersecchi) + s(SST)



 Parametric coefficients:

              Estimate Std. Error t value Pr(>|t|)

  (Intercept)  3.57000    0.05061   70.54   <2e-16 ***

 ---

 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



 Approximate significance of smooth terms:

   edf Est.rank F p-value

 s(wintersecchi) 2.4455 4.672 0.00887 **

 s(SST)  2.4085 4.301 0.01237 *

 ---

 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



 R-sq.(adj) =  0.676   Deviance explained = 75.4%

 GCV score = 0.074563   Scale est. = 0.053781  n = 21



 The result look good - significant, with a lot of deviance explained,
 but I am not convinced the model is actually accounting for the
 autocorrelation (based on the formula in the results). How can I tell?



 Many thanks,







 Dr Abigail McQuatters-Gollop

 Sir Alister Hardy Foundation for Ocean Science (SAHFOS)

 The Laboratory

 Citadel Hill

 Plymouth UK PL1 2PB

 tel: +44 1752 633233






Re: [R] Inconsistent linear model calculations

2008-05-15 Thread Kingsford Jones
Also, it's worth pointing out the reason for the numerical instability
of the parameter estimates: the predictors are nearly collinear.

> (dubious <- read.table('clipboard'))
  V1  V2    V3    V4    V5       V6 V7
1  1 300 39.87 39.85 39.90 39.87333  9
2  2 400 45.16 45.23 45.17 45.18667 16
3  3 500 50.72 51.03 50.90 50.88333 25
4  4 600 56.85 56.80 57.02 56.89000 36
5  5 700 63.01 63.09 63.14 63.08000 49
6  6 800 69.52 59.68 69.63 66.27667 64
> round(cor(dubious), 3)
      V1    V2    V3    V4    V5    V6    V7
V1 1.000 1.000 0.999 0.951 0.999 0.997 0.991
V2 1.000 1.000 0.999 0.951 0.999 0.997 0.991
V3 0.999 0.999 1.000 0.942 1.000 0.995 0.996
V4 0.951 0.951 0.942 1.000 0.943 0.970 0.912
V5 0.999 0.999 1.000 0.943 1.000 0.995 0.995
V6 0.997 0.997 0.995 0.970 0.995 1.000 0.983
V7 0.991 0.991 0.996 0.912 0.995 0.983 1.000


Note that the correlation for V2 and V7 is about .991
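
Near-collinearity like this can be quantified with the condition number of the
model matrix (a generic sketch, not the poster's data):

```r
## two nearly collinear predictors
set.seed(1)
x1 <- 1:20
x2 <- x1 + rnorm(20, sd = 0.01)   # almost an exact copy of x1
X <- model.matrix(~ x1 + x2)
kappa(X, exact = TRUE)   # a very large condition number flags unstable estimates
```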


Kingsford Jones


On Thu, May 15, 2008 at 3:01 PM, e-letter [EMAIL PROTECTED] wrote:
 Thank you to all you eagle eyes; amendment made accordingly and solved. Not
 sure how the difference occurred...



Re: [R] [R-sig-ME] lme nesting/interaction advice

2008-05-13 Thread Kingsford Jones
Hi Rolf,

On Tue, May 13, 2008 at 1:59 PM, Rolf Turner [EMAIL PROTECTED] wrote:

 in response to Doug Bates' useful tutorial...

  Thanks very much for your long, detailed, patient, and lucid
  response to my cri de coeur.  That helps a *great* deal.


Hear Hear!

snip
  One point that I'd like to spell out very explicitly, to make sure
  that I'm starting from the right square:

  The model that you start with in the Machines/Workers example is



 
    fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker), Machines)
  
 


  My understanding is that this is the ``only'' model that could be fitted
  by the Package-Which-Must-Not-Be-Named.  I.e. this package *could not* fit
  the second, more complex model:



 
    fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines)
  
 

  (At least not directly.)  Can you (or someone) confirm that I've got that
  right?

Compare:

## R
> m4
Linear mixed model fit by REML
Formula: score ~ Machine + (0 + Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
  228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups   Name Variance Std.Dev. Corr
 Worker   MachineA 16.64098 4.07934
  MachineB 74.39558 8.62529  0.803
  MachineC 19.26646 4.38936  0.623 0.771
 Residual   0.92463 0.96158
Number of obs: 54, groups: Worker, 6

## The-Package

proc mixed data = machines;
class worker machine;
model score = machine  / solution;
random machine / subject = worker type = un;
run;

 Covariance Parameter Estimates

Cov Parm     Subject     Estimate

UN(1,1)  Worker  16.6405
UN(2,1)  Worker  28.2447
UN(2,2)  Worker  74.3956
UN(3,1)  Worker  11.1465
UN(3,2)  Worker  29.1841
UN(3,3)  Worker  19.2675
Residual  0.9246


The two outputs report essentially the same thing.
Note that, e.g., UN(2,1) = 28.2447 ≈ 0.803 * 4.07934 * 8.62529
(And, as usual, the fixed effects estimates match too once the
contrasts and 'types' of SS for an ANOVA table are set up)

UN is short for 'unstructured' - a term Doug has pointed out is not
particularly fitting because the covariance matrix is symmetric
positive definite.


  It seems to me to be the key to why I've had such trouble in the past
  in grappling with mixed models in R.  I.e. I've been thinking like
  the Package-Which-Must-Not-Be-Named --- that the simple,
 everything-independent
  model was the only possible model.


Although this may well not apply to you, another area of confusion
arises not so much from differences between stats packages but by
differences between methods. I'm not an expert in the estimation
methods but, as I understand it, classic texts describe fitting mixed
models in terms of ANOVA in the OLS framework, calculating method of
moments estimators for the variances of the random effects by equating
observed and expected mean squares (I believe using aov and lm with an
'Error' term would fall into this category, and proc anova and proc
glm would also).  Starting in the 90's these methods started falling
out of fashion in favor of ML/REML/GLS methods (likelihood based),
which offer more flexibility in structuring both the error and random
effects covariance matrices, will not produce negative variance
estimates, and have other nice properties that someone more 'mathy'
than me could explain.  Tools like lme, lmer, proc mixed and proc
glimmix fall into this category.

hoping this helps,

Kingsford Jones







  Thanks again.

 cheers,

 Rolf



  ##
  Attention:\ This e-mail message is privileged and confid...{{dropped:9}}



Re: [R] lme nesting/interaction advice

2008-05-10 Thread Kingsford Jones
On Fri, May 9, 2008 at 4:04 AM, Federico Calboli
[EMAIL PROTECTED] wrote:

 Note that random can be a list:

 a one-sided formula of the form ~x1+...+xn, or a pdMat object with a
 formula
 (i.e. a non-NULL value for formula(object)), or a list of such formulas or
 pdMat
 objects. 

 If you can translate that into *informative* English I'd be grateful. I have
 the Pinheiro and Bates book under my nose now, and I suspect it's rather
 more extensive than the helpfile, but it's still nowhere close to providing
 a straightforward answer to my question.


Federico,

I think you'll be more likely to receive the type of response you're
looking for if you formulate your question more clearly.  The
inclusion of commented, minimal, self-contained, reproducible code
(as is requested at the bottom of every email sent by r-help) is an
effective way to clarify the issues.  Also, when asking a question
about fitting a model it's helpful to describe the specific research
questions you want the model to answer.

I'll offer my interpretation of your study design so you can see where
questions might arise.
It sounds like you have protein measures (on how many units?) at
various levels of 'male' (which at first you described as
presence/absence, but later as continuous - also you described 'male'
as fixed and continuous but then entered it in the formula as though
it were a random grouping factor), within the second level of
'selection' (e.g. large1), within the first level of selection (e.g.
large), within a random block (what are the blocks?)  within a random
month.  Is this right -- multiple observations within 4 levels of
nesting - some of which are random and some fixed?

Finally, I'll point out that there's an R list dedicated to mixed
models, with a particular focus on the lmer function, which might be
the right tool for your analyses (
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ).


Kingsford Jones



 Cheers,

 Federico


 --
 Federico C. F. Calboli
 Department of Epidemiology and Public Health
 Imperial College, St Mary's Campus
 Norfolk Place, London W2 1PG

 Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

 f.calboli [.a.t] imperial.ac.uk
 f.calboli [.a.t] gmail.com



Re: [R] Which gls models to use?

2008-05-09 Thread Kingsford Jones
Tom,

I've never used lm.gls, but a quick look at the help page suggests
that, unlike gls, it doesn't have built-in functionality to estimate
parameters to structure the response/error covariance matrix.  I
believe it assumes the covariance matrix is known (hopefully someone
will correct me if I'm wrong about that).

So, building off the answer I sent to you on April 29 (pasted below
rather than hyper-linked because parts of the conversation appear to
be missing from the archive), if y ~ N(XB, Sigma) where Sigma =
sigma^2(Lambda) and Lambda is decomposed into VCV, where V is diagonal
and describes a variance structure and C has 1's on the diagonal and
the off-diagonals describe the correlation of the errors, then the
'weights' argument to gls will allow you to estimate a variance
structure for V (see ?varClasses) and the 'correlation' argument
allows you to estimate the correlation structure  for C (see
?corClasses).  You'll notice the first corClass listed on the help
page is corAR1 (there's also corCAR1 if you don't have discrete lags).
 See example(gls) for an example of how it's used, and Pinheiro and
Bates (2000) for detailed examples.
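
Putting the two arguments together (simulated data; the variance and
correlation structures here are illustrative choices, not a recommendation
for any particular dataset):

```r
library(nlme)
## simulated series with AR(1) errors whose spread grows with the mean
set.seed(1)
n <- 100
d <- data.frame(time = 1:n, x = runif(n, 1, 10))
d$y <- 2 + 3 * d$x + as.numeric(arima.sim(list(ar = 0.7), n = n)) * sqrt(d$x)

fit <- gls(y ~ x, data = d,
           correlation = corAR1(form = ~ time),  # the C matrix: AR(1) errors
           weights = varPower())                 # the V matrix: power-of-mean variance
summary(fit)   # reports the estimated Phi and power parameters
```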

Frank Harrell has commented on this list that gls is underused (
http://tolstoy.newcastle.edu.au/R/help/06/08/32487.html ).  Given the
fact that it can drastically reduce the constraints of the constant
variance and independence assumptions of the ubiquitous linear model,
I agree.

good luck,

Kingsford Jones



Discussion on weights in gls from 2008-Apr-29

Thanks Kingsford! I will cc r-help.

varPower() works for me. I want to use lm because predict.gls does not
give lwr and upr values, and also the bptest and ncv.test only work
with lm models...


- Hide quoted text -
On 4/29/08, Kingsford Jones [EMAIL PROTECTED] wrote:

hi Tom,

Basically, a simple way to model non-constant variance when the
variance increases (the common case) or decreases with the conditional
mean of your response is to use weights = varPower(), and gls will
estimate the parameter \alpha that defines the power relationship
between the regression line and the error variances.  If for some
reason you wanted to use lm rather then gls for your final model, then
after estimating the gls model you could supply a weights argument to
lm of the form:

weights = fitted(your.gls.model)^(-2*the.alpha.estimate.from.the.gls.model).

and you should get the same (or very close) estimates as from the gls
model but have a model of class 'lm'.

There are many different ways to model non-constant variance and
you'll need to choose one that is appropriate for your data.  If the
model I described isn't appropriate then you should look at Ch 5 of
PB to learn about the other varFunc classes.

good luck,

Kingsford

ps - would you mind forwarding to r-help in case this others have the
same question.


On Tue, Apr 29, 2008 at 3:24 PM, tom soyer [EMAIL PROTECTED] wrote:
 Thanks Kingsford! What are the varClasses? I don't know how to
use them
 If I choose varPower(), then does it mean the function would generate the
 weights, w, so that w gives the most likely explanation of the
relationship
 between the variance and the independent variable?




 On 4/29/08, Kingsford Jones [EMAIL PROTECTED] wrote:
  On Tue, Apr 29, 2008 at 6:20 AM, tom soyer [EMAIL PROTECTED] wrote:
   Hi,
  
I would like to use a weighted lm model to reduce
 heteroscedasticity.
 I am
wondering if the only way to generate the weights in R is through the
laborious process of trial and error by hand. Does anyone
know if R has
 a
function that would automatically generate the weights needed for lm?
 
  Hi Tom,
 
  The 'weights' argument to the 'gls' function in the nlme package
  provides a great deal of flexibility in estimate weighting parameters
  and model coefficients.  For example, if you want to model monotonic
  heteroscedasticity by estimating the weights $E(Y)^{-2\alpha}$,
  you can use the varPower variance function class.  E.g., something like
 
f1 <- gls(y ~ x1 + x2, data = your.data, weights = varPower())
 
  will estimate the regression coefficients and alpha parameter together
  via maximum likelihood.  (note that the usual specification for varPower
 is
  varPower(form = ~ your.formula), but by default the mean is used.  See
  Ch 5 of the Pinheiro and Bates Mixed-effects Models book for details)
 
  Kingsford Jones
 



 --
 Tom




On Fri, May 9, 2008 at 8:05 AM, tom soyer [EMAIL PROTECTED] wrote:
 Hi,

 I need to correct for ar(1) behavior of my residuals of my model. I noticed
 that there are multiple gls models in R. I am wondering if anyone
 has experience in choosing between gls models. For example, how
 should one decide whether to use lm.gls in MASS, or gls in nlme

Re: [R] predict lmer

2008-05-07 Thread Kingsford Jones
One question that arises is: at what level is the prediction desired?
Within a given ID:TRKPT2 level?  Within a given ID level?  At the
marginal level (which Bert's code appears to produce).  Also, there is
the question: how confident can you be in your predictions?  This
thread discusses possible ways to get prediction intervals:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q2/thread.html#841

Finally, why assume a Poisson error distribution for a binary response?

Kingsford Jones

On Wed, May 7, 2008 at 10:13 AM, Bert Gunter [EMAIL PROTECTED] wrote:
 Sorry, my reply below may be too terse. You'll need to also construct the
 appropriate design matrix to which to apply the fixef() results.

  If newDat is a data.frame containing **exactly the same named regressor and
  response columns** as your original vdata dataframe, and if me.fit.of is
  your fitted lmer object as you have defined it below, then

   model.matrix(terms(me.fit.of),newDat) %*% fixef(me.fit.of)

  gives your predictions. Note that while the response column in newDat is
  obviously unnecessary for prediction (you can fill it with 0's, say), it is
  nevertheless needed for model.matrix to work. This seems clumsy to me, so
  there may well be better ways to do this, and **I would appreciate
  suggestions for improvement.***
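
A toy sketch of the recipe Bert gives above. The data frame d, the fit f, and
the column names are all made up for illustration; only the
model.matrix()/fixef() construction comes from the thread:

```r
library(lme4)

## hypothetical grouped data and fit
set.seed(42)
d <- data.frame(y = rnorm(40), x1 = runif(40), x2 = runif(40),
                id = rep(1:8, each = 5))
f <- lmer(y ~ x1 + x2 + (1 | id), data = d)

## new data: same named columns; the response is a dummy (0) because
## model.matrix() needs the column to exist, not its values
newDat <- data.frame(y = 0, x1 = c(0.2, 0.8), x2 = c(0.5, 0.1))

## marginal (fixed-effects-only) predictions
X <- model.matrix(terms(f), newDat)
X %*% fixef(f)
```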


  Cheers,
  Bert




  -Original Message-
  From: bgunter
  Sent: Wednesday, May 07, 2008 9:53 AM
  To: May, Roel; r-help@r-project.org
  Subject: RE: [R] predict lmer

  ?fixef

  gets you the coefficient vector, from which you can make your predictions.

  -- Bert Gunter
  Genentech

  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
  Behalf Of May, Roel
  Sent: Wednesday, May 07, 2008 7:23 AM
  To: r-help@r-project.org
  Subject: [R] predict lmer



 Hi,

  I am using lmer to analyze habitat selection in wolverines using the
  following model:

  (me.fit.of -
  lmer(USED~1+STEP+ALT+ALT2+relM+relM:ALT+(1|ID)+(1|ID:TRKPT2),data=vdata,
  control=list(usePQL=TRUE),family=poisson,method=Laplace))

  Here, the habitat selection is calculated using a so-called discrete
  choice model where each used location has a certain number of
  alternatives which the animal could have chosen. These sets of locations
  are captured using the TRKPT2 random grouping. However, these sets are
  also clustered over the different individuals (ID). USED is my binary
  dependent variable which is 1 for used locations and zero for unused
  locations. The other are my predictors.

  I would like to predict the model fit at different values of the
  predictors, but does anyone know whether it is possible to do this? I
  have looked around at the R-sites and in help but it seems that there
  doesn't exist a predict function for lmer???

  I hope someone can help me with this; point me to the right functions or
  tell me to just forget it

  Thanks in advance!

  Cheers Roel

  Roel May
  Norwegian Institute for Nature Research
  Tungasletta 2, NO-7089 Trondheim, Norway


 [[alternative HTML version deleted]]

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pwr package

2008-05-05 Thread Kingsford Jones
see ?library

You need to load the package before R can access its functions.

Kingsford Jones
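
A minimal sketch of the install-once, load-per-session sequence; the
pwr.t.test() call below is just one example function from the package
(it solves for the per-group sample size of a two-sample t-test):

```r
## install.packages("pwr")   # one-time install
library(pwr)                 # load the package into this session

## sample size per group to detect a medium effect (d = 0.5)
## with 80% power at the 5% significance level
pwr.t.test(d = 0.5, power = 0.8, sig.level = 0.05)
```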

On Mon, May 5, 2008 at 12:18 PM,  [EMAIL PROTECTED] wrote:
 Hello,

  I'm a new user of the R environment. I need to do some power analysis. For
 this purpose, I installed the pwr package from the R window, but
 unfortunately something went wrong. The installation of the package was
 successful (I got a message saying so in R) but when I enter a function of
 that package it says that the function does not exist!

  I appreciate your help,

  Sincerely,
  Samar Mouchawrab

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data transformation

2008-05-02 Thread Kingsford Jones
Hi Christian,

Here's a way using the reshape package:

> dfr
   site lat lon spec1 spec2 spec3 spec4
1 site1  10  11     1     0     1     0
2 site2  20  21     1     1     1     0
3 site3  30  31     0     1     1     1
> library(reshape)
> dfr <- melt(dfr[, -1], id = 1:2, variable_name = "species")
> dfr <- dfr[dfr$value > 0, ]
> dfr
   lat lon species value
1   10  11   spec1     1
2   20  21   spec1     1
5   20  21   spec2     1
6   30  31   spec2     1
7   10  11   spec3     1
8   20  21   spec3     1
9   30  31   spec3     1
12  30  31   spec4     1


The 'value' variable is not interesting here, but if you had counts
rather than presence/absence it could be.
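
For comparison, here is a base-R version of the same reshaping (no extra
packages), written against the example data above:

```r
dfr <- data.frame(site = c("site1", "site2", "site3"),
                  lat = c(10, 20, 30), lon = c(11, 21, 31),
                  spec1 = c(1, 1, 0), spec2 = c(0, 1, 1),
                  spec3 = c(1, 1, 1), spec4 = c(0, 0, 1))

## row/column indices of the 1 entries in the species columns
## (comparing a data frame to a scalar yields a logical matrix)
idx <- which(dfr[, 4:7] == 1, arr.ind = TRUE)

## build the long-format presence-only table
out <- data.frame(species = names(dfr)[4:7][idx[, "col"]],
                  lat = dfr$lat[idx[, "row"]],
                  lon = dfr$lon[idx[, "row"]])
out <- out[order(out$species), ]
out
```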

best,

Kingsford Jones

On Fri, May 2, 2008 at 2:27 PM, Christian Hof [EMAIL PROTECTED] wrote:
 Dear all,
  how can I, with R, transform a presence-absence (0/1) matrix of species
 occurrences into a presence-only table (3 columns) with the names of the
 species (1st column), the lat information of the sites (2nd column) and the
 lon information of the sites (3rd column), as given in the below example?
  Thanks a lot for your help!
  Christian


  my dataframe:

  site    lat lon spec1   spec2   spec3   spec4
  site1   10  11  1   0   1   0
  site2   20  21  1   1   1   0
  site3   30  31  0   1   1   1


  my desired new dataframe:

  species lat lon
  spec1   10  11
  spec1   20  21
  spec2   20  21
  spec2   30  31
  spec3   10  11
  spec3   20  21
  spec3   30  31
  spec4   30  31



  --
  Christian Hof, PhD student

  Center for Macroecology  Evolution
  University of Copenhagen
  www.macroecology.ku.dk
  
  Biodiversity  Global Change Lab
  Museo Nacional de Ciencias Naturales, Madrid
  www.biochange-lab.eu

  mobile ES .. +34 697 508 519
  mobile DE .. +49 176 205 189 27
  mail .. [EMAIL PROTECTED]
 mail2 .. [EMAIL PROTECTED]
  blog .. www.vogelwart.de

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Making a map in R?

2008-05-01 Thread Kingsford Jones
There are many ways to do this, depending on what type of spatial data
you are working with and what you want your maps to look like.  The sp
package provides S4 classes and methods for working with GIS data. Two
likely candidates for importing data are the rgdal and maptools
packages.  See the Spatial Task View:
http://cran.r-project.org/web/packages/rgdal/index.html

Also, there is a r-sig for spatial data:
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Kingsford Jones
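
For a quick base-graphics map without importing GIS data, the maps package is
one low-effort option; the coordinates below are invented stand-ins for the
poster's sampling locations:

```r
library(maps)

## hypothetical sampling locations (lon, lat) near the Savannah River
pts <- data.frame(lon = c(-81.7, -81.6, -81.5),
                  lat = c(33.2, 33.3, 33.1))

map("state", regions = c("georgia", "south carolina"))  # base map
points(pts$lon, pts$lat, pch = 19, col = "red")         # add the sites
```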

On Thu, May 1, 2008 at 6:56 AM, stephen sefick [EMAIL PROTECTED] wrote:
 Does anyone know of a package to make a map from GIS data, and/or would it
  be easier in one of the free GIS programs.  I would like to make a map of
  the savannah river area with our sampling locations.
  thanks

  stephen

  --
  Let's not spend our time and resources thinking about things that are so
  little or so large that all they really do for us is puff us up and make us
  feel like gods. We are mammals, and have not exhausted the annoying little
  problems of being mammals.

  -K. Mullis

 [[alternative HTML version deleted]]

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Making a map in R?

2008-05-01 Thread Kingsford Jones
I sent the wrong link for the spatial Task View.  Here it is

http://cran.r-project.org/web/views/Spatial.html

On Thu, May 1, 2008 at 8:21 AM, Kingsford Jones
[EMAIL PROTECTED] wrote:
 There are many ways to do this, depending on what type of spatial data
  you are working with and what you want your maps to look like.  The sp
  package provides S4 classes and methods for working with GIS data. Two
  likely candidates for importing data are the rgdal and maptools
  packages.  See the Spatial Task View:
  http://cran.r-project.org/web/packages/rgdal/index.html

  Also, there is a r-sig for spatial data:
  https://stat.ethz.ch/mailman/listinfo/r-sig-geo

  Kingsford Jones



  On Thu, May 1, 2008 at 6:56 AM, stephen sefick [EMAIL PROTECTED] wrote:
   Does anyone know of a package to make a map from GIS data, and/or would it
be easier in one of the free GIS programs.  I would like to make a map of
the savannah river area with our sampling locations.
thanks
  
stephen
  
--
Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods. We are mammals, and have not exhausted the annoying little
problems of being mammals.
  
-K. Mullis
  
   [[alternative HTML version deleted]]
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
  


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficiency profiling?

2008-04-30 Thread Kingsford Jones
On Wed, Apr 30, 2008 at 4:33 PM, Duncan Murdoch [EMAIL PROTECTED] wrote:
 On 30/04/2008 6:59 PM, esmail bonakdarian wrote:
snip
  Is there a good collection of hints/suggestions for R language idioms
  in terms of efficiency?
snip

  See ?Rprof for the tool.  For the tips, I think you just need to hang
 around here a while.  I don't know of a nice collection (but I'm sure there
 are several.)

  Duncan Murdoch


;-) here's one:  https://stat.ethz.ch/pipermail/r-help/2005-October/080991.html


Kingsford Jones
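
The basic Rprof() workflow referred to above looks something like this (the
output file name and the profiled computation are arbitrary):

```r
Rprof("profile.out")          # start the sampling profiler

## code to be profiled -- an arbitrary stand-in computation
x <- replicate(100, sum(sort(runif(1e5))))

Rprof(NULL)                   # stop profiling
summaryRprof("profile.out")   # time spent, broken down by function
```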

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function to generate weights for lm?

2008-04-29 Thread Kingsford Jones
On Tue, Apr 29, 2008 at 6:20 AM, tom soyer [EMAIL PROTECTED] wrote:
 Hi,

  I would like to use a weighted lm model to reduce heteroscedasticity. I am
  wondering if the only way to generate the weights in R is through the
  laborious process of trial and error by hand. Does anyone know if R has a
  function that would automatically generate the weights need for lm?

Hi Tom,

The 'weights' argument to the 'gls' function in the nlme package
provides a great deal of flexibility in estimating weighting parameters
and model coefficients.  For example, if you want to model monotonic
heteroscedasticity by estimating the weights $E(Y)^{-2\alpha}$,
 you can use the varPower variance function class.  E.g., something like

f1 <- gls(y ~ x1 + x2, data = your.data, weights = varPower())

will estimate the regression coefficients and alpha parameter together
via maximum likelihood.  (note that the usual specification for varPower is
varPower(form = ~ your.formula), but by default the mean is used.  See
Ch 5 of the Pinheiro and Bates Mixed-effects Models book for details)
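
A self-contained sketch of the varPower approach (simulated data, so the
variable names and numbers are illustrative only):

```r
library(nlme)

## simulate heteroscedastic data: the sd grows with the mean
set.seed(1)
x1 <- runif(100); x2 <- runif(100)
mu <- 1 + 2 * x1 + 3 * x2
y  <- mu + rnorm(100, sd = 0.2 * mu)
dat <- data.frame(y, x1, x2)

## varPower() with no form uses the fitted values by default
f1 <- gls(y ~ x1 + x2, data = dat, weights = varPower())
summary(f1)   # reports the estimated power parameter alongside the betas
```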

Kingsford Jones

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested anova and multiple comparisons

2008-04-26 Thread Kingsford Jones
Hi Stephen,

On Sat, Apr 26, 2008 at 10:29 AM, Stephen Cole [EMAIL PROTECTED] wrote:
...snip...

  I have managed to accomplish my first two goals by analyzing the data
  as a 3 level nested Anova with the following code

  mod1 - aov(Count~Region/Location/Site, data=data)

  This allows me to get the MS for an ANOVA table.  However, R does not
  compute the correct F-statistics (because it uses the residual MS for
  the denominator in all calculations, which it shouldn't), so I manually
  computed the F-stats and variance components by hand.


R's just doing what it was asked to do.  The fact that it used the
residual MS isn't a surprise given your code.

  From reading the help guide I learned about and tried using the
  Error(Region/Location/Site) command, but then I can only get MS and no
  F-statistics and still had to compute the rest by hand.

Yes, if you want to use Method-of-Moments estimation to solve for
expected mean squares for the variance components and calculate
F-stats w/ the 'correct' MS, then specifying error strata is
reasonable for balanced data (assuming you've met model assumptions).
However, I believe most statisticians these days are more inclined to
use (Restricted) Maximum Likelihood estimation (e.g. using lme or
lmer) because of the added flexibility it provides (also it won't
produce negative variance estimates when the mean squared error is
larger than the mean square between the groups of interest).
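
A hedged sketch of the lme alternative for the three-level design described in
the thread (dat is a hypothetical data frame with the poster's variables):

```r
library(nlme)

## Region as the fixed effect; Location and Site as nested random effects
## (assumes Location labels are unique across Regions; otherwise use
##  random = ~ 1 | Region/Location/Site)
fit <- lme(Count ~ Region, random = ~ 1 | Location/Site, data = dat)

summary(fit)   # REML estimates of fixed effects and variance components
VarCorr(fit)   # variance components at each grouping level
```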

As for why you are only getting MS and no F-stats, it's hard to say
without a reproducible example (see the posting guide).  In my
experience this will occur when there are insufficient degrees of
freedom for calculating an F-stat at a given level.


  My problem now is that I would like to use TukeyHSD for multiple
  comparisons.  However, since R is unable to compute the correct F
  statistics in this nested design I also think it is using the wrong MS
  and df in calculating Tukey's q. For example, when I use


Again, it's not that R is unable to, it's that you asked R not to.

  TukeyHSD(mod1, "Region") I will get values; however, I do not think
  they have been calculated correctly.

  Furthermore, when I use the Error(Region/Location/Site) term I can then
  no longer use TukeyHSD, as I get the error message that there is no
  applicable use of tukey on this object.

Looking at the methods available for TukeyHSD shows that it will work
for an object of class aov

> methods(TukeyHSD)
[1] TukeyHSD.aov

But ?aov states:

Value:

 An object of class 'c(aov, lm)' or for multiple responses of
 class 'c(maov, aov, mlm, lm)' or for multiple error strata
 of class 'aovlist'.  There are 'print' and 'summary' methods
 available for these.

So your model with error strata is of class aovlist not aov.


  I am just wondering if there is any way to use multiple comparisons
  with R in a nested design like mine.


I'm not sure but the package 'multcomp' might be of help.
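
If the fixed-effects structure supports it, multcomp's glht() gives
Tukey-style all-pairwise contrasts for aov, lm, and lme/lmer fits alike; a
hypothetical sketch against the mod1 object from the thread:

```r
library(multcomp)

## Tukey all-pairwise comparisons among the levels of Region
cmp <- glht(mod1, linfct = mcp(Region = "Tukey"))
summary(cmp)    # adjusted p-values for each pairwise contrast
confint(cmp)    # simultaneous confidence intervals
```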

  I have thought about using lme or lmer, but if I understand them right,
  with a balanced design I should be able to get the same result using
  aov


Even with balanced data, you don't get the exact same thing with aov.
As mentioned above lme and lmer use different estimation methods.  One
of the advantages of using  (RE)ML is a lot more flexibility in how
you specify the structure of random effects covariance matrices, and
(at least with lme) you have a lot of flexibility in how you structure
the residual covariance matrix as well.  This is particularly useful
when you are not meeting assumptions of constant variance and/or
independence of observations (which seems to be the rule rather than
the exception with ecological data with a spatial component, such as
you appear to have).

The theory behind the analyses you want to do is quite complex, with many
potential pitfalls.  If at all possible, I highly recommend finding
the help of someone who is an expert in these types of models (known
as mixed, hierarchical, multilevel, random effects, variance
components, nested ANOVA, etc).

best,

Kingsford Jones


  Thanks

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Documentation General Comments

2008-04-24 Thread Kingsford Jones
I just read through this thread and I didn't see the R Language
Definition mentioned.  As with An Introduction to R it can be accessed
-- at least in my Windows GUI -- via the menu bar: Help -> Manuals (in
PDF).  If An Introduction to R is too basic, then the Language
Definition should be a good place to look for more details on R
objects (Ch 2).  However An Introduction to R does include
authoritative introductions to the data types mentioned by the
original poster: factors (Ch 4), arrays and matrices (Ch 5), and lists
and data frames (Ch 6).

That said, I agree that learning efficiency could be improved by
augmenting the manuals with tables similar to the table 2.1 that was
referenced earlier in the thread (aside: are functions, or even lists,
really Data Objects?). Of course, as pointed out by Duncan, we are
collaborators not consumers, so if I think there should be more tables
in the documents then the onus is on me to try to get my ideas
incorporated (see
http://wiki.r-project.org/rwiki/doku.php?id=misc:rpatch ).

Kingsford Jones

On Thu, Apr 24, 2008 at 9:37 AM, Duncan Murdoch [EMAIL PROTECTED] wrote:
 On 4/24/2008 12:08 PM, Martin Maechler wrote:
   Hmm,
  
   KeBe == Beck, Kenneth (STP) [EMAIL PROTECTED]
   on Thu, 24 Apr 2008 10:12:19 -0500 writes:
  
   KeBe OK I've spent a lot of time with the core
   KeBe documentation, and I never found anything as simple as
   KeBe their table 2.1, which elucidated the difference
   KeBe between a vector, matrix and array first, then the
   KeBe higher level structures, frame and list.  Maybe I'm
   KeBe not a good searcher, but believe me for every initial
   KeBe posting I submit to this group, I have spent hours
   KeBe trying to find the answer elsewhere. And, as you
   KeBe state, maybe I am now deluded by that presentation,
   KeBe maybe it is not this simple!
  
   Well, I get the impression that you've never read the manual
 Introduction to R
 (or some good book such as Peter Dalgaard's)
   but have directly jumped into reading  help() pages  ???

  That's not correct.  Kenneth started the thread (on Monday) saying:


  The basic tutorial Introduction to R is so basic, it
  hardly helps at all, then digging through documentation is really an
  exercise in frustration.

  Duncan Murdoch



  
   Maybe a good idea would be to improve the Introduction to R
   rather than thinking of misusing the help() collection
   {which is the reference manual, not the user manual !!}
   by making it easy to understand (and consequently less precise) ??
  
   Patches (well reflected ..) to the Introduction are quite
   welcome, indeed.
   The (development) source is always available
   at https://svn.r-project.org/R/trunk/doc/manual/R-intro.texi
  
   (and yes, the source does look a bit less user-friendly,
than its PDF output, e.g.
http://cran.r-project.org/doc/manuals/R-intro.pdf
or its  daily updated  HTML output at
http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-intro.html
   )
  
   Regards,
   Martin
  
   KeBe Look at the help for data.frame. VERY terse
   KeBe explanation, with not a good comparison to the other
   KeBe data types. Then, look at the titles list. Where is a
   KeBe topic for data types Every other programming
   KeBe language I have used (C++, Pascal, SAS, Java) has a
   KeBe basic chapter in the documentation that goes over data
   KeBe types, what arrays are, higher level structures, etc.
   KeBe When I typed help.search("data type") I get the
   KeBe following:
  
   KeBe Help files with alias or concept or title matching
   KeBe 'data type' using fuzzy matching:
   KeBe character-class(methods) Classes Corresponding to
   KeBe Basic Data Types sqlTypeInfo(RODBC) Request
   KeBe Information about DataTypes in an ODBC Database
  
   KeBe Looking for the term character-class(methods) yields
   KeBe nothing. I don't think that is what I want!
  
   KeBe Given all this complaining, I actually have completed
   KeBe several nice project using R, it is an impressive
   KeBe package. Somehow, though, we need to make the
   KeBe documentation better.
  
   KeBe -Original Message- From: Duncan Murdoch
   KeBe [mailto:[EMAIL PROTECTED] Sent: Thursday, April
   KeBe 24, 2008 9:51 AM To: Beck, Kenneth (STP) Cc: Bert
   KeBe Gunter; r-help@r-project.org Subject: Re: [R]
   KeBe Documentation General Comments
  
   KeBe On 4/24/2008 10:22 AM, Beck, Kenneth (STP) wrote:
Agree that terseness is good, but I also agree with other
posters that
  
better cross referencing or maybe an index of synonyms
would be good.
   
So far, the best suggestion is the pdf at this link
   
(http://www.medepi.net/epir/epir_chap02.pdf).
   
Is there a way to pop at least part of this into the
R-base help page
