[R] USING VSS

2008-02-04 Thread TEBBI FATIMA
hi,

  I have started working with the PCA method and found the VSS package; it is
very helpful: it shows me the residual matrix along with the number of
components to extract. Now I want to verify the normality of the residual
matrix and then to simulate. Can anyone help me, please?

   

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with loops and how R stores data

2008-02-04 Thread Ted Harding
On 02-Feb-08 21:16:41, R-novice wrote:
 
 I am trying to make an array c(3,8) that contains the averages of what
 is in another array c(9,8).  I want to average rows 1:3, 4:6, 7:9 and
 have a loop replace the generic 1:24 values in my array with the
 average of the three rows.  
 
 The problem I am having is that R only replaces the last value from the
 loops, and I am not experienced enough with R to know how it stores
 data when looping and how to overcome this problem. By having it print
 the avg, I was expecting to see 24 numbers, but it only gives me one. 
 
 Here is my code.
 
 cts.sds <- array(1:24, c(3,8))
 for(i in 3){
   for(j in 8){
     avg <- sum(exprdata[3*i-2:3*i, j]/3)
     cts.sds[i,j] <- avg
     print(avg)
   }
 }
 print(cts.sds)
 
 Any help with this pesky matter will be greatly appreciated.

I think you have made two types of mistake here.

1. You wrote for(i in 3) and for(j in 8). This means that
'i' will take only one value, namely 3, and 'j' will take only
one value, namely 8. If you wanted to loop over i = 1,2,3 and
j = 1,2,...,8 then you should write

  for(i in (1:3)){ for(j in (1:8)){ ... }}

2. This is more subtle, and you're far from the only person
to have tripped over this one.

Because of the order of precedence of the operators, in the
expression

  3*i-2:3*i

for each value of 'i' it will first evaluate 2:3, namely {2,3},
and then evaluate the remainder. So, for say i=2, you will get
two numbers 3*i-2*i = 2 and 3*i-3*i = 0, whereas what you
presumably intended was (3*i-2):(3*i) = 4:6 = {4,5,6}.

The way to avoid this trap is to use parentheses to force the
order of evaluation you want:

  avg <- sum(exprdata[(3*i-2):(3*i), j]/3)

Example:
  i <- 2
  3*i-2:3*i
# [1] 2 0
  (3*i-2):(3*i)
# [1] 4 5 6

The reason is that the : operator takes precedence over the
binary operators *, + and -.

Enter

  ?Syntax

to get a listing of the various operators in order of
precedence.
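For what it's worth, the per-group row averages can also be computed without an explicit double loop. A hedged sketch in base R (the matrix `exprdata` is simulated here, since the original data were not posted):

```r
# Simulated stand-in for the poster's 9 x 8 matrix (not the real data)
set.seed(1)
exprdata <- matrix(rnorm(72), nrow = 9, ncol = 8)

# Average rows 1:3, 4:6, 7:9 of every column in one step
grp <- rep(1:3, each = 3)            # group label for each row
cts.sds <- rowsum(exprdata, grp) / 3 # 3 x 8 matrix of group means
```

rowsum() sums the rows sharing a group label, so dividing by the group size gives the means, already in the desired 3 x 8 layout.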

Hoping this helps,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 04-Feb-08   Time: 08:18:05
-- XFMail --



Re: [R] New version 2.6.1- the menu is written in german

2008-02-04 Thread Prof Brian Ripley

What OS is this?

If Windows, see rw-FAQ Q3.2

Otherwise, see 'R Installation and Administration' Chapter 7.


On Sat, 2 Feb 2008, cris wrote:


Dear all,
I have recently updated R on my computer. To my surprise, the menu bar
is written in German. This is the first time this has happened to
me; usually English is the language used.
I downloaded the latest version from several mirrors in different
English- and Spanish-speaking countries, but the result is the same: when
opening R, the menu appears in German.
Can someone tell me how I should change the language to English?
Thanks for your help,
Cristin García




--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595


Re: [R] need help

2008-02-04 Thread Prof Brian Ripley
Use a working directory where you have write permission.

On Sat, 2 Feb 2008, [EMAIL PROTECTED] wrote:

 Good afternoon,

 I've installed R on SUSE 10.3, compiling the source files. I open the
 program in a shell and I have a problem saving the workspace and *.R or
 *.Rdata files. I've read the manual and typed the command as it says, but
 the output is:

 save.image()
 Error in gzfile(file, "wb") : cannot open the connection
 In addition: Warning message:
 In gzfile(file, "wb") : cannot open compressed file '.RDataTmp'


 So I've tried on Windows, but I've had the same problem.

 How could I save the data and the files in R?
 I apologize for my English, but it's not my first language.

 Thanks a lot to everybody

 Danilo



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] counts of each column that are not NA, and/or greater than column means

2008-02-04 Thread Gabor Csardi
On Mon, Feb 04, 2008 at 03:21:10PM +0800, Ng Stanley wrote:
 Hi,
 
  Given a test matrix, test <- matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3)
 
 A) How to compute the counts of each column (excluding the NA) i.e., 3, 2, 1

apply(test, 2, function(x) sum(!is.na(x)))

 B) How to compute the counts of each column (excluding the NA) that are
 greater than the column means ? i.e., 1, 1, 0

apply(test, 2, function(x) sum(x > mean(x, na.rm=TRUE), na.rm=TRUE))

In general, you need ?apply to calculate something for each row/column
of a matrix.
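As a sanity check, both one-liners applied to the matrix from the question reproduce the counts asked for:

```r
test <- matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3)

# A) non-NA count per column
apply(test, 2, function(x) sum(!is.na(x)))
# [1] 3 2 1

# B) per column, count of values above that column's (NA-stripped) mean
apply(test, 2, function(x) sum(x > mean(x, na.rm = TRUE), na.rm = TRUE))
# [1] 1 1 0
```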

Gabor

 I could write a for loop, but hope to use a better alternative.
 
 

-- 
Csardi Gabor [EMAIL PROTECTED]      UNIL DGM



Re: [R] ARCH LM test for univariant time series

2008-02-04 Thread Pfaff, Bernhard Dr.
Dear All,


one can visually inspect for ARCH effects by plotting the acf/pacf of the
squared residuals from an OLS estimation. This can be as simple as a
demeaned series. Further, one can run an auxiliary regression of the
squared series on q of its lagged squared values and a constant. The
test statistic (N-q)*R^2 is distributed as chi-squared with q degrees
of freedom.

Something along the lines:

archlmtest <- function (x, lags, demean = FALSE)
{
  x <- as.vector(x)
  if (demean) x <- scale(x, center = TRUE, scale = FALSE)
  lags <- lags + 1
  mat <- embed(x^2, lags)
  arch.lm <- summary(lm(mat[, 1] ~ mat[, -1]))
  STATISTIC <- arch.lm$r.squared * length(resid(arch.lm))
  names(STATISTIC) <- "Chi-squared"
  PARAMETER <- lags - 1
  names(PARAMETER) <- "df"
  PVAL <- 1 - pchisq(STATISTIC, df = PARAMETER)
  METHOD <- "ARCH LM-test"
  result <- list(statistic = STATISTIC, parameter = PARAMETER,
                 p.value = PVAL, method = METHOD,
                 data.name = deparse(substitute(x)))
  class(result) <- "htest"
  return(result)
}

should work and yield results equal to those mentioned earlier in this thread.

Best,
Bernhard
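The auxiliary-regression recipe can be checked on simulated data; a self-contained sketch (the Gaussian white-noise series and the lag count q = 12 are arbitrary choices, used only to show the mechanics; for iid noise the p-value should typically be large):

```r
set.seed(42)
x <- rnorm(500)                    # white noise: no ARCH effects present
q <- 12
mat <- embed(x^2, q + 1)           # column 1: x_t^2; columns 2..q+1: its q lags
aux <- summary(lm(mat[, 1] ~ mat[, -1]))
stat <- aux$r.squared * nrow(mat)  # (N - q) * R^2
pval <- 1 - pchisq(stat, df = q)
```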



Spencer,

The warning message is sent from VAR; it basically lets you know that the
data it used had no column names and it had to supply them using y1, y2,
y3, etc. It can be suppressed by including options(warn=-1) in the
function.

Anyway, it seems that the p value from my function does not match
FinMetrics'. I guess the function doesn't work... hmm...


On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote:

 Dear Tom:

  Your revised function eliminates the discrepancy in the degrees of
 freedom but is still very different from the numbers reported in Tsay,
 p. 102:

 archTest(log(1+as.numeric(m.intc7303)), lag=12)

ARCH test (univariate)

 data:  Residual of y1 equation
 Chi-squared = 13.1483, df = 12, p-value = 0.3584

 Warning message:
 In VAR(s, p = 1, type = "const") :
 No column names supplied in y, using: y1, y2, y3, y4, y5, y6, y7, y8,
 y9, y10, y11, y12 , instead.


  TOM:  What can you tell me about the warning message?

  Thanks for your help with this.
  Spencer Graves

 tom soyer wrote:
  Spencer,
 
  Sorry, I forgot that the default lag in arch is 16. Here is the fix. Can
  you try it again and see if it gives the correct (or at least similar
  compared to a true LM test) result?
 
  archTest=function(x, lags=12){
   #x is a vector
   require(vars)
   s=embed(x,lags)
   y=VAR(s,p=1,type="const")
   result=arch(y,lags.single=lags,multi=F)$arch.uni[[1]]
   return(result)
  }
 
  Thanks and sorry about the bug.
 
 
  On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote:
 
  Dear Tom, Bernhard, Ruey:
 
   I can't get that to match Tsay's example, but I have other
  questions about that.
 
   1.  I got the following using Tom's 'archTest' function (below):
 
 
  archTest(log(1+as.numeric(m.intc7303)), lags=12)
 
 ARCH test (univariate)
 
  data:  Residual of y1 equation
  Chi-squared = 10.8562, df = 16, p-value = 0.8183
 
  Warning message:
  In VAR(s, p = 1, type = "const") :
  No column names supplied in y, using: y1, y2, y3, y4, y5, y6, y7, y8,
  y9, y10, y11, y12 , instead.
 
 
** First note that the answer has df = 16, even though I
  supplied lags = 12.
 
   2.  For (apparently) this example, S-Plus FinMetrics 'archTest'
  function returned Test for ARCH Effects: LM Test.  Null Hypothesis:
  no ARCH effects.  Test Stat 43.5041, p.value 0..  Dist. under Null:
  chi-square with 12 degrees of freedom.
 
   3.  Starting on p. 101, Ruey mentioned the Lagrange multiplier
  test of Engle (1982), saying "This test is equivalent to the usual F
  test for no regression", but refers it to a chi-square, not an F
  distribution.  Clearly, there is a gap here, because the expected value
  of the F distribution is close to 1 [d2/(d2-2), where d2 = denominator
  degrees of freedom;  http://en.wikipedia.org/wiki/F-distribution],
  while the expected value for a chi-square is the number of degrees of
  freedom
 
   Unfortunately, I don't feel I can afford the time to dig into this
  further right now.
 
   Thanks for your help.
   Spencer Graves
 
  tom soyer wrote:
 
  Spencer, how about something like this:
 
  archTest=function (x, lags= 16){
   #x is a vector
   require(vars)
   s=embed(x,lags)
   y=VAR(s,p=1,type="const")
   result=arch(y,multi=F)$arch.uni[[1]]
   return(result)
  }
 
  can you, or maybe Bernhard, check and see whether this function gives
  the correct result?
 
  thanks,
 
  On 2/1/08, *Spencer Graves* [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Hi, Tom:
 
   The 'arch' function in the 'vars' package is supposed to be able
   to do that.  Unfortunately, I was unable to make it work for a
   univariate series.  Bernhard Pfaff, the author of 'vars', said that if I
   read the code for 'arch', I could easily retrieve the necessary lines
   and put them 

[R] Association Measures

2008-02-04 Thread dt Excellent
  Does anybody know if there is an implementation of the Goodman-Kruskal
lambda measures of association in R?
  Also, how can we analyze ordered contingency tables and compute the
relative measures of association in R?
  Thanks in advance
   






Re: [R] counts of each column that are not NA, and/or greater than column means

2008-02-04 Thread Dimitris Rizopoulos


check the following options:

test <- matrix(c(1,2,3,NA,2,3,NA,NA,2), 3, 3)

# A
colSums(!is.na(test))

# B
mat <- test > rep(colMeans(test, na.rm = TRUE), each = nrow(test))
colSums(!is.na(mat) & mat)

apply(test, 2, function(x) {
    mus <- mean(x, na.rm = TRUE)
    sum(x > mus, na.rm = TRUE)
})


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


Quoting Ng Stanley [EMAIL PROTECTED]:


Hi,

Given a test matrix, test <- matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3)

A) How to compute the counts of each column (excluding the NA) i.e., 3, 2, 1

B) How to compute the counts of each column (excluding the NA) that are
greater than the column means ? i.e., 1, 1, 0

I could write a for loop, but hope to use a better alternative.








Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



[R] setting up directorie and packages at the startup

2008-02-04 Thread Lassana TOURE
Hi!

Can someone help me set up a working directory and load additional packages
at startup, so that I do not have to do it each time I start an R session?

Thanks.





Re: [R] setting up directorie and packages at the startup

2008-02-04 Thread Dieter Menne
Lassana TOURE lassana.toure at ier.ml writes:

 Can someone help me set up a working directory and load additional packages
 at startup, so that I do not have to do it each time I start an R session?

It depends on your system and the type of installation, but searching for
.Rprofile in the manuals and CRAN might give you a starter.
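For illustration only (the path and package name below are placeholders, not part of the answer above), a minimal .Rprofile might contain:

```r
# ~/.Rprofile -- sourced automatically at the start of each session
.First <- function() {
  setwd("~/projects/analysis")  # placeholder working directory
  library(stats)                # replace with the packages you need
}
```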

Dieter



Re: [R] (no subject)

2008-02-04 Thread Petr PIKAL
[EMAIL PROTECTED] wrote on 02.02.2008 07:10:37:

 bgchen wrote:
  r-help
  
  c(2.43, 3.22, 6.9, 3.03, 5.36, 6.9, 2.29, 6.13, 6.11, 4.25, 3.85, 
  5.09, 7.44, 2.86, 2.82, 3.64, 3.22, 7, 2.65, 4.5, 3.73, 5.09, 
  5.8, 7.87, 2.87, 2.9, 6.63, 6.8, 2.45, 7.68, 2.56, 2.54, 7.35, 
  4.61, 2.58, 3.27, 5.8, 3.13, 3.29, 2.56, 5.79, 2.67, 7.55, 7.13, 
  8.4, 6.16, 3.34, 6.49, 1.85, 7.4, 7.89, 3.3, 6.28, 6.52, 6.07, 
  4.26, 4.4, 3.1, 5.97, 6.69, 3.69, 4.62, 5.79, 2.54, 8.91, 3.48, 
  2.35, 7.28, 6.75, 6.97, 7.29, 2.28, 3.36, 4.79, 2.41, 7.93, 6.17, 
  2.19, 4.27, 7.17, 2.12, 3.05, 4.4, 5.35, 2.96, 3.54, 3.31, 7.28, 
  2.47, 2.27, 7.43, 3.69, 3.32, 2.61, 2.19, 5.67, 5.03, 7.58, 5.27, 
  2.25, 2.59, 4.16, 1.71, 3.64, 2.93, 7.2, 5.99, 7.68, 4.14, 5.83, 
  3.87, 7.29, 5.95, 3.4, 2.45, 2.85, 6.77, 3.58, 4.31, 5.83, 6.19, 
  2.56, 6.69, 7.2, 5.04, 4.34, 2.66, 7.68, 6.8, 2.5, 1.95, 7.12, 
  3.02, 2.58, 3.39, 3.78, 3.53, 4.42, 6.64, 2.19, 4.79, 5.75, 5.55, 
  5.57, 3.94, 7.27, 3.39, 6.64, 6.13, 4.55, 6.75, 3.79, 1.82, 5.95, 
  5.88, 2.64, 5.43, 7.9, 3.14, 5.38, 7.97, 5.83, 3.18, 2.47, 3.33, 
  6.98, 6.35, 6.1, 7.11, 7.34, 5.08, 2, 3.13, 2.96, 2.43, 2.12, 
  3.24, 3.13, 2.6, 5.15, 3.34, 7.07, 5.42, 3.1, 2.66, 2.42, 7.3, 
  3.28, 4.68, 5.9, 6.64, 4.79, 1.84, 6.78, 6.1, 3.29, 2.35, 2.16, 
  6.36, 6.41, 5.39, 2.48, 7.35, 5.87, 6.77, 5.66, 2.81, 5.27, 2.95, 
  7.01, 2.55, 3.4, 2.84, 2.35, 6.36, 4.31, 7.44, 3.35, 6.42, 5.86, 
  3.14, 6.95, 5.27, 2.05, 4.85, 3.37, 2.28, 7.59, 6.99, 5.68, 4.5, 
  2.81, 6.07, 5.27, 6.32, 5.25, 3.69, 3.89, 3.23, 3.5, 6.2, 6.78, 
  2.51, 5.87, 7, 6.99, 7.27, 7.62, 6.87, 2.38, 6.78, 3.11, 4.28, 
  7.57, 5.65, 3.32, 2.53, 3.09, 4.99, 2.19, 6.37, 2.68, 7.22, 2.47, 
  3.05, 4.92, 2.82, 7.72, 3.67, 6.76, 6.38, 3.18, 3.02, 6.2, 3.47, 
  3.02, 7.22, 2.21, 7.33, 7.58, 6.58, 2.98, 6.7, 2.64, 6, 2.72, 
  2, 6.35, 5.36, 6.74, 5.29, 6.22, 3.14, 7.15, 4.45, 2.34, 2.63, 
  4.62, 7.51, 7.4, 3.16, 5.99, 7.07, 4.85, 3.11, 4.6, 6.55, 2.63, 
  3.15, 5.16, 5.81, 3.99, 6.12, 4.08, 3.69, 3.32, 3.13, 2.41, 2.95, 
  2.88, 5.79, 5.83, 2.49, 3.8, 4.62, 3.71, 3.18, 7.41, 7.3, 6.51, 
  2.27, 7.15, 6.04, 8.11, 7.74, 7.09, 7.21, 2.54, 6.74, 3.48, 7.8, 
  6.41, 5.64, 5.11, 4.91, 2.25, 3.31, 3.95, 6.58, 5.38, 5.45, 3.46, 
  2.4, 5.79, 3.34, 3.32, 1.97, 2.49, 3.29, 5.69, 1.75, 4.57, 7.04, 
  5.29, 7.79, 7.84, 3.02, 4.36, 6.87, 2.14, 5.43, 4.11, 7.51, 2.13, 
  5.33, 2.91, 2.87, 5.95, 2.29, 2.08, 7.78, 3.96, 4.55, 2.82, 2.63, 
  3.13, 3.43, 5.96, 2.39, 4.98, 3.51, 6.26, 4.03, 2.41, 6.8, 7.64, 
  3.3, 2.66, 6.51, 3.33, 5.64, 2.5, 6.09, 2.63, 2.55, 2.52, 4.15, 
  5.3, 6.11, 5.13, 3.42, 7.42, 1.87, 6.64, 4.63, 3.25, 1.97, 5.44, 
  7.96, 6.67, 7.14, 5.12, 5.27, 7.48, 7.25, 2.51, 4.5, 2.87, 6.7, 
  2.09, 6.85, 2.16, 6.37, 3.01, 3.55, 4.5, 2.69, 3.52, 5.7, 3.02, 
  5.94, 7.38, 6.45, 5.33, 2.8, 4.83, 3.01, 6.34, 2.99, 1.98, 3.02, 
  3.11, 8.65, 6.38, 7.08, 7.74, 6.19, 7.97, 2.7, 5.05, 6.61, 3.75, 
  5.24, 6.59, 5.91, 2.02, 6.95, 3.05, 2.89, 5.09, 4.9, 6.78, 5.79, 
  5.42, 3.13, 3.97, 2.24, 5.91, 6.28, 3.44, 6.14, 2.41, 8.03, 3.39, 
  3.33, 3.21, 2.04, 4.49, 4.94, 3, 7.59, 6.84, 2.11, 6.36, 2.63, 
  2.01, 2.73, 4.94, 2.7, 2.65) 
  
 42
 
 Jim

Nice, have you seen Vogons nearby?

Petr

 



Re: [R] use classificators learned in R in real-life, e.g. C

2008-02-04 Thread Dieter Menne
 sich at gmx.de writes:

 I am interested in using R for machine learning (supervised classification).
 Currently, I have been investigating especially the rpart, tree, and
 randomForest packages, and have achieved first results.
 
 Is there any experience with how the learned classifiers could
 be used in, e.g., C? In other words, I want to transfer the learned
 predictor from R to C code.

You could use dput to write the representation to a file, and read it from 
C. Parsing could be nasty, though, so I would prefer to extract the 
relevant information (e.g. fit$cptable and fit$splits in the rpart example), 
and write it to a database in numeric form with additional information if 
required. Another option could be XML (package XML), and using one of the 
many XML libraries in C(++).
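To illustrate the dput route on a toy object (a made-up stand-in for something like fit$cptable, since no rpart fit is at hand here), the representation survives a write/read round trip:

```r
# Toy numeric matrix standing in for an rpart component such as fit$cptable
tab <- matrix(c(0.5, 1, 0.1, 0.25, 2, 0.08), nrow = 2,
              dimnames = list(NULL, c("CP", "nsplit", "xerror")))

f <- tempfile()
dput(tab, file = f)   # writes an R-parseable text representation
tab2 <- dget(f)       # reads it back
identical(tab, tab2)
# [1] TRUE
```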

Dieter



[R] precision in seq

2008-02-04 Thread Eric Elguero
Hi everybody,

this is a warning more than a question.

I noticed that seq produces approximate results:

 > seq(0,1,0.05)[19]==0.9
 [1] TRUE
 > seq(0,1,0.05)[20]==0.95
 [1] FALSE
 > seq(0,1,0.05)[21]==1
 [1] TRUE

 > seq(0,1,0.05)[20]-0.95
 [1] 1.110223024625157e-16

I do not understand why 0.9 and 1 are correct (within some
tolerance or strictly exact?)  and 0.95 is not.

this one works:

 > ((0:20)/20)[20]==0.95
[1] TRUE

Eric Elguero



Re: [R] how to get points from SpatialPolygonsDataFrame

2008-02-04 Thread Roger Bivand
Takatsugu Kobayashi tkobayas at indiana.edu writes:

 
 try
 
 tmp <- slot(ex_1.7.selected, 'polygons')
 sub.tmp <- slot(tmp[[1]], 'Polygons')
 [EMAIL PROTECTED]
 
 will get you there.
 
 taka
 
 Jarek Jasiewicz wrote:
  Milton Cezar Ribeiro wrote:

  Dear all,
 

 
  grd <- GridTopology(c(1,1), c(1,1), c(10,10))
  polys <- as.SpatialPolygons.GridTopology(grd)
  centroids <- coordinates(polys)
  x <- centroids[,1]
  y <- centroids[,2]
  z <- 1.4 + 0.1*x + 0.2*y + 0.002*x*x
  ex_1.7 <- SpatialPolygonsDataFrame(polys, data=data.frame(x=x, y=y, z=z,
 row.names=sapply(slot(polys, "polygons"), function(i) slot(i, "ID"))))
  ex_1.7.selected <- ex_1.7[1,]
  slot(ex_1.7.selected, "coords")

A recent thread on the (more relevant for your purposes) R-sig-geo list provides
code for doing at least part of what you say you want:

https://stat.ethz.ch/pipermail/r-sig-geo/2008-January/003075

Here, it would be:

library(sp)
grd <- GridTopology(c(1,1), c(1,1), c(10,10))
polys <- as.SpatialPolygons.GridTopology(grd)
centroids <- coordinates(polys)
x <- centroids[,1]
y <- centroids[,2]
z <- 1.4 + 0.1*x + 0.2*y + 0.002*x*x
ex_1.7 <- SpatialPolygonsDataFrame(polys, data=data.frame(x=x, y=y,
  z=z, row.names=sapply(slot(polys, "polygons"),
  function(i) slot(i, "ID"))))
pls <- slot(ex_1.7, "polygons")
# from construction, we know that each Polygons object in pls has five 2D
# coordinates and only one part, so we won't check:
res <- t(sapply(pls, function(x) c(slot(slot(x, "Polygons")[[1]], "coords"))))
# steps along the list of Polygons objects, pulling out the coords slot
# of the first and only Polygon object in its Polygons slot, finally
# flattening them from a matrix to a vector as you request, and transposing
str(res)
res[1,]
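The extract-and-flatten step is plain base R once a list of coordinate matrices is in hand; the same pattern on a toy list (hypothetical 5 x 2 ring-coordinate matrices standing in for the coords slots, so no sp objects are needed):

```r
# Three 5 x 2 matrices of ring coordinates, each closing on its start point
coords_list <- lapply(1:3, function(i)
  cbind(x = c(i, i + 1, i + 1, i, i), y = c(0, 0, 1, 1, 0)))

# Flatten each matrix (column-major) to a vector; stack one row per ring
res <- t(sapply(coords_list, function(m) c(m)))
dim(res)
# [1]  3 10
```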

Hope this helps,

Roger



Re: [R] Effect size of comparison of two levels of a factor in multiple linear regression

2008-02-04 Thread Chuck Cleland
On 2/3/2008 10:09 AM, Christoph Mathys wrote:
 Dear R users,
 
 I have a linear model of the kind
 
 outcome ~ treatment + covariate
 
 where 'treatment' is a factor with three levels (0, 1, and 2),
 and the covariate is continuous. Treatments 1 and 2 both have
 regression coefficients significantly different from 0 when using
 treatment contrasts with treatment 0 as the baseline. I would now like
 to determine effect sizes (akin to Cohen's d in a two-sample comparison)
 for the comparison to baseline of treatments 1 and 2. I have
 illustrated a way to do this in the reproducible example below and am
 grateful for any comments on the soundness of what I'm doing. I have not
 yet found a way to determine confidence intervals for the effect sizes
 derived below and would appreciate tips on that.
 
 set.seed(123456) # Make session reproducible
 
 # Set up the treatment factor with three levels and 100 observations
 # each
 treatment <- factor(c(rep(0, 100), rep(1, 100), rep(2, 100)))
 
 # Simulate outcomes
 outcome <- rep(NA, 300)
 outcome[treatment==0] <- rnorm(100, 10, 5) # baseline: mean=10, sd=5
 outcome[treatment==1] <- rnorm(100, 30, 5) # effect size 4
 outcome[treatment==2] <- rnorm(100, 40, 5) # effect size 6
 
 # Check effect sizes (Cohen's d)
 cohens.d <- function (x, y) { (mean(x)-mean(y))/sqrt((var(x)+var(y))/2) }
 cohens.d(outcome[treatment==1], outcome[treatment==0])
 [1] 3.984774
 cohens.d(outcome[treatment==2], outcome[treatment==0])
 [1] 6.167798
 
 # Sometimes standardized regression coefficients are recommended
 # for determining effect size but that clearly doesn't work here:
 coef(lm(scale(outcome) ~ treatment))
 (Intercept)  treatment1  treatment2
   -1.233366    1.453152    2.246946
 # The reason it doesn't work is that the difference of outcome
 # means is divided by the sd of *all* outcomes:
 (mean(outcome[treatment==1])-mean(outcome[treatment==0]))/sd(outcome)
 [1] 1.453152
 (mean(outcome[treatment==2])-mean(outcome[treatment==0]))/sd(outcome)
 [1] 2.246946

   How about scaling the outcome by the residual standard error from the 
unstandardized model?  For example:

library(MASS)

cmat <- diag(3)
diag(cmat) <- c(25,25,25)

df <- data.frame(Y = c(mvrnorm(100, mu=c(10,30,40), Sigma=cmat,
empirical=TRUE)), TX = factor(rep(c(0,1,2), each=100)))

fm1 <- lm(Y ~ TX, data = df)
fm2 <- lm(scale(Y, scale=summary(fm1)$sigma) ~ TX, data = df)

summary(fm1)

Call:
lm(formula = Y ~ TX, data = df)

Residuals:
  Min   1Q   Median   3Q  Max
-12.7260  -3.5215  -0.1982   3.4517  12.1690

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.0000     0.5000   20.00   <2e-16 ***
TX1          20.0000     0.7071   28.28   <2e-16 ***
TX2          30.0000     0.7071   42.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5 on 297 degrees of freedom
Multiple R-squared: 0.8627, Adjusted R-squared: 0.8618
F-statistic: 933.3 on 2 and 297 DF,  p-value: < 2.2e-16

summary(fm2)

Call:
lm(formula = scale(Y, scale = summary(fm1)$sigma) ~ TX, data = df)

Residuals:
  Min   1Q   Median   3Q  Max
-2.54521 -0.70431 -0.03965  0.69034  2.43380

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.3333     0.1000  -33.33   <2e-16 ***
TX1           4.0000     0.1414   28.28   <2e-16 ***
TX2           6.0000     0.1414   42.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1 on 297 degrees of freedom
Multiple R-squared: 0.8627, Adjusted R-squared: 0.8618
F-statistic: 933.3 on 2 and 297 DF,  p-value: < 2.2e-16

confint(fm2)
                 2.5 %    97.5 %
(Intercept) -3.530132 -3.136535
TX1  3.721685  4.278315
TX2  5.721685  6.278315

   I've never seen this approach before, and I'm not sure how it would
work when the variances within groups are heterogeneous or one or more
covariates are added to the model.
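The equivalence can also be seen without refitting: dividing the unstandardized coefficients by the residual standard error reproduces the standardized ones. A hedged sketch on simulated data (group means and sd mirror the example earlier in the thread; the estimates are only approximately 4 and 6 because the data are random):

```r
set.seed(123)
tx <- factor(rep(0:2, each = 100))
y  <- c(rnorm(100, 10, 5), rnorm(100, 30, 5), rnorm(100, 40, 5))

fm <- lm(y ~ tx)
d  <- coef(fm)[-1] / summary(fm)$sigma  # treatment contrasts / residual sd
d                                       # roughly 4 and 6
```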

hope this helps,

Chuck

 # Now, create a situation where Cohen's d is impossible to
 # calculate directly by introducing a continuous covariate:
 covariate <- 1:300
 outcome <- outcome + rnorm(300, covariate, 2)
 model <- lm(scale(outcome) ~ treatment + scale(covariate))
 coef(model)
      (Intercept)       treatment1       treatment2 scale(covariate)
       -0.1720456        0.1994251        0.3167116        0.8753761
 
 # Proposed way to determine effect size: simulate outcomes for each
 # treatment level assuming the covariate to have a fixed value (here
 # its mean value after standardization: zero)
 library(MCMCpack)
 no.of.sims <- 1
 sims.model <- MCMCregress(model, mcmc=no.of.sims)
 sims.model[1:2,]
      (Intercept) treatment1 treatment2 scale(covariate)      sigma2
 [1,]  -0.1780735  0.2024111  0.3395233        0.8682119 0.002617449
 [2,]  -0.1521623  0.1773623  0.2956053        0.8764573 0.003529013
 sims.treat0 <- rnorm(no.of.sims, sims.model[,"(Intercept)"],
 sqrt(sims.model[,"sigma2"]))
 sims.treat1 <- 

[R] How to search for packages

2008-02-04 Thread Monica Pisica

Hi everybody,


I know this might be very off topic and it took me quite a while to work up my
courage to post this…. But I remember a thread some time ago about how we can
find the packages we need to do specific tasks in R if we don't know beforehand
which ones actually do it. Now all the packages are listed alphabetically
on the web site. Since I am not very advanced in writing my own functions I
rely heavily on work already done, and only when I have no other choice do I
modify existing functions. Usually my modifications are only cosmetic.


But sometimes I spend a lot of time just reading the descriptions of packages
until I decide that maybe one will do close to what I want. I wonder if there
is any way to improve how these packages are displayed on the site to help
with this decision. I wonder if the community as a whole can come up with some
broader categories such as Bayesian, spatial statistics, bootstrap, vegetation
analysis, circular statistics, robust statistics, etc., and the authors of the
package can choose 1 or 2 or however many categories they think their package
fits the most. On the web page we can have a list of those very broad
categories, and within each category we can have in alpha order the packages
themselves with their description and such, as it is now. So if I am interested
in vegetation analysis or environmental analysis but I never did it before, I
go to that category and see which packages are more geared towards that
particular subject. For example, it was by chance alone and some Google
searching that I discovered that the package labdsv has anything to do with
vegetation analysis, since at first, of course, I looked at any package which
might have “veg” or “env” in the title.


I also realize that this might mean a lot of work, but R develops so rapidly
that soon I think it will be unmanageable to just peruse the list of packages
and read descriptions in order to choose which package to install when you are
not familiar with all of them. I hope I didn't offend the community with this;
I would be very sorry, since I actually get lots of help here and I have learnt
a lot from you. I will remain forever grateful.



Thanks,

Monica







Re: [R] Association Measures

2008-02-04 Thread David Winsemius
dt Excellent [EMAIL PROTECTED] wrote in
news:[EMAIL PROTECTED]: 

   Does anybody know if there is an implementation of Goodman-Kruskal
   lambda measures of association in R? Also, how we can analyze
   ordered contingency tables and compute the relative measures of
   associations in R? Thank in advance

Laura Thompson's R (and S-PLUS) Manual to Accompany Agresti’s 
Categorical Data Analysis (2002): 2nd edition has two versions.

A search on Manual to Accompany Agresti produces multiple places where 
it can be found. 
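If a quick-and-dirty version suffices in the meantime, the asymmetric lambda (predicting the column variable from the rows) is short enough to write directly. A hedged sketch; the function name `gk.lambda` and the 2 x 2 example table are made up for illustration:

```r
# Goodman-Kruskal lambda for predicting the column variable from the row
# variable of a frequency table (one of the asymmetric variants)
gk.lambda <- function(tab) {
  n <- sum(tab)
  (sum(apply(tab, 1, max)) - max(colSums(tab))) / (n - max(colSums(tab)))
}

tab <- matrix(c(30, 10, 5, 40), 2, 2)
gk.lambda(tab)
# [1] 0.625
```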

-- 
David Winsemius



Re: [R] use classificators learned in R in real-life, e.g. C

2008-02-04 Thread Max Kuhn
On Feb 3, 2008 2:16 PM,  [EMAIL PROTECTED] wrote:
 Hi there,
 I am interested in using R for machine learning (supervised classification).
 Currently, I have been investigating especially the rpart, tree, and 
 randomForest package, and have achieved first results.

 Is there any experience with how the learned classifiers could
 be used in, e.g., C? In other words, I want to transfer the learned
 predictor from R to C code.


I don't know of any automated way to do this for those models, but the
MARS model in the earth package can export a C-friendly version of the
prediction function. See ?format.earth for details. The bagged MARS
model in the caret package does the same (see ?format.bagEarth).

Max


-- 

Max



Re: [R] make dataframe from table

2008-02-04 Thread Henrique Dallazuanna
Perhaps:

affect <- as.data.frame(do.call('rbind', tapply(data$V2, data$V1, table)))
merge(age, affect, by.x=1, by.y=0)
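
With the poster's example data the idea can be checked end to end. A sketch
(the key point is `as.data.frame.matrix()`, which keeps the wide shape of a
table, whereas plain `as.data.frame()` on a table gives the long count format):

```r
data <- data.frame(V1 = c("p1","p1","p3","p3","p5"),
                   V2 = c(1, 3, 3, 5, 1))
age  <- data.frame(ID  = c("p1","p2","p3","p4","p5"),
                   age = c(23, 24, 23, 11, 45))

## wide count table, one row per ID, one column per disease code
affect <- as.data.frame.matrix(table(data$V1, data$V2))
affect$ID <- rownames(affect)

merged <- merge(age, affect, by = "ID")
stopifnot(nrow(merged) == 3)  # only p1, p3, p5 occur in 'data'
```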


On 04/02/2008, Boks, M.P.M. [EMAIL PROTECTED] wrote:
 Dear R-experts,


 I have got a dataframe:
 data
 ID disease
   V1 V2
 1 p1  1
 2 p1  3
 3 p3  3
 4 p3  5
 5 p5  1

 From which I extract a useful table: affect

 affect
 1 3 5
   p1 1 1 0
   p3 0 1 1
   p5 1 0 0

 I want to merge this with anotherdataframe:
 age
 p1  23
 p2  24
 p3  23
 p4  11
 p5  45

 I have tried as.data.frame(affect) and other solutions to get the
 following command going:
 merge(age, affect, by.x=0, by.y=1)

 QUESTION: how can I get the merging process going with the outcome of the
 table command?

 Any help would be great.

 R for windows v2.6.1

 code:

 data <- as.data.frame(matrix(c("p1","p1","p3","p3","p5",1,3,3,5,1),5,2))
 age <- as.data.frame(matrix(c("p1","p2","p3","p4","p5",23,24,23,11,45),5,2
 ))
 affect <- table(data[,1], data[,2])
 merge(age, affect, by.x=0, by.y=1)


 BW

 Marco





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] rpart error when constructing a classification tree

2008-02-04 Thread Randy Griffiths
Sorry, I wasn't very helpful. Let me try this again. I have attached a
subsample of the data which still gives me the same error as when I use the
full data file. I am trying to make a decision tree using rpart. This is my
code and output.
 data <- read.table("/Users/randygriffiths/Desktop/data", header=T)
 attach(data)

 library(rpart)
 bookings.cart <- rpart(totalRev~., data=data, method="class")
 bookings.cart
n= 50

node), split, n, loss, yval, (yprob)
  * denotes terminal node

1) root 50 15 0 (0.7 0.04 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
0.02 0.02 0.02) *
 summary(bookings.cart)
Call:
rpart(formula = totalRev ~ ., data = data, method = "class")
  n= 50

  CP nsplit rel error
1  0  0 1
Error in yval[, 1] : incorrect number of dimensions


I have used rpart on past projects without any problems. I can do a
regression tree without any error as well (by leaving off the method="class"
argument). I have used this same data using 'randomForest' with good results
and also using 'tree' without this error. I have seen that at least a few
people have reported getting this same error with rpart, but I could not
find an answer that helped the problem. Can anyone help me?

Randy


On Jan 30, 2008 2:33 AM, Uwe Ligges [EMAIL PROTECTED] wrote:



 Randy Griffiths wrote:
  I am trying to make a decision tree using rpart. The function runs very
  quickly considering the size of the data (1742, 163). When I call the
  summary command I get this:
 
  summary(bookings.cart)
  Call:
  rpart(formula = totalRev ~ ., data = bookings, method = "class")
n=1741 (1 observation deleted due to missingness)
 
CP nsplit rel error
  1  0  0 1
  Error in yval[, 1] : incorrect number of dimensions
 


 And we get:

 R summary(bookings.cart)
 Error in summary(bookings.cart) : object bookings.cart not found

 hence we cannot reproduce and inspect your problem. Please read the
 posting guide.

 Uwe Ligges


  note:
  dim(bookings)
  [1] 1742  163
 
 
  I have run rpart on past projects without any problems. I have used a
  catagorical version of the variable totalRev that was partitioned into
 four
  levels (and coded as a factor).
  I tried making a tree with the tree command (in the 'tree' package) and
 was
  able to construct a tree without any errors. However, I would much
 rather
  use rpart.
 
  Does anyone have any ideas that might help?
 



Re: [R] How to make reference to R in the method section in a scientific article?

2008-02-04 Thread Anne York
On Mon, 4 Feb 2008, Falco tinnunculus wrote:

Ft  Dear all,
Ft  
Ft  How do I make reference to R in the method section in a scientific article?
Ft  Should I state the web aderess?
Ft  
Ft  And, is this the proper way to report the lme test?
Ft  
Ft  
Ft  The relationships were assumed to be linear. The response, handling time
Ft  and the explanatory variable, prey mass, were log transformed to obtain
Ft  normal error distributions. The lme function of the nlme package for R was
Ft  used to fit linear mixed-effects (LME) models, using restricted maximum
Ft  likelihood (REML). Model selection were computed by using the Akaike
Ft  Information Criteria (AIC). The overall test of the model was conducted 
with
Ft  ANOVA.
Ft  
Ft  
Ft  
Ft  Regards Kes
Ft  

Check out the citation() function. This can be used to find 
the proper citation for R and for packages that have 
included a citation. For example:

citation(package="MASS")

Citing the url is usually a matter of journal style. Some 
journals now  want the date that the url was visited as part 
of the citation.



[R] free float of swiss shares (streubesitzanteil)

2008-02-04 Thread draht zieher
dear list,
i'd like to do some calculation for which i need the free float
(german: streubesitzanteil) of a number of swiss firms' shares. but at
swx.com i found only very little information. does anyone know if and
where that information is available? best would be a website where
it's extractable via regexp.
thanks in advance!

josuah



Re: [R] precision in seq

2008-02-04 Thread jim holtman
FAQ 7.31
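
The FAQ entry boils down to this: 0.05 has no exact binary representation, so
accumulated rounding can leave elements of the sequence off by roughly 1e-16
from the decimal you typed. Compare with a tolerance (e.g. all.equal) rather
than with ==. A minimal illustration:

```r
x <- seq(0, 1, 0.05)

## exact comparison is fragile for binary floating point;
## per the original post, this is FALSE
x[20] == 0.95

## tolerance-based comparisons behave as intended
stopifnot(abs(x[20] - 0.95) < 1e-8)
stopifnot(isTRUE(all.equal(x[20], 0.95)))
```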

On 2/4/08, Eric Elguero [EMAIL PROTECTED] wrote:
 Hi everybody,

 this is a warning more than a question.

 I noticed that seq produces approximate results:

  seq(0,1,0.05)[19]==0.9
 [1] TRUE
  seq(0,1,0.05)[20]==0.95
 [1] FALSE
  seq(0,1,0.05)[21]==1
 [1] TRUE

  seq(0,1,0.05)[20]-0.95
 [1] 1.110223024625157e-16

 I do not understand why 0.9 and 1 are correct (within some
 tolerance or strictly exact?)  and 0.95 is not.

 this one works:

  ((0:20)/20)[20]==0.95
 [1] TRUE

 Eric Elguero




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Processing dates and generating sequences of dates

2008-02-04 Thread Gavin Simpson

On Mon, 2008-02-04 at 10:48 -0500, Gabor Grothendieck wrote:
 Using zoo's yearmon class:
 
 library(zoo)
 my.dates[!duplicated(as.yearmon(my.dates))]
 
 or, although you seem to disallow this in your question,
 this would be an option:
 
  my.dates[!duplicated(format(my.dates, "%Y-%m"))]

Ah, actually, I spoke too soon. Your solutions return the following

 my.dates[!duplicated(format(my.dates, "%Y-%m"))]
 [1] 2005-05-01 2005-06-01 2005-07-01 2005-08-01 2005-09-01
 [6] 2005-10-01 2005-11-01 2005-12-01 2006-01-01 2006-02-01
[11] 2006-03-01 2006-04-01 2006-05-01 2006-06-01 2006-07-01
[16] 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01
[21] 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01
[26] 2007-06-01 2007-07-01

which gives only the months sampled. What I need is a vector of length
36 covering 1st Jan 2005 through 31st Dec 2007 as in (using the seq()
call in my original email):

 new.dates
 [1] 2005-01-01 2005-02-01 2005-03-01 2005-04-01 2005-05-01
 [6] 2005-06-01 2005-07-01 2005-08-01 2005-09-01 2005-10-01
[11] 2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01
[16] 2006-04-01 2006-05-01 2006-06-01 2006-07-01 2006-08-01
[21] 2006-09-01 2006-10-01 2006-11-01 2006-12-01 2007-01-01
[26] 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01
[31] 2007-07-01 2007-08-01 2007-09-01 2007-10-01 2007-11-01
[36] 2007-12-01

This just seems a bit kludgy:

new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"), 
   "1", "1", sep = "/")),
 as.Date(paste(format(max(my.dates), format = "%Y"), 
   "12", "31", sep = "/")),
 by = "months")

but perhaps there isn't a better way?
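
One small simplification, assuming the same character round-trip is acceptable:
format() can emit the first/last day of the year directly (literal text in the
format string is passed through), so the paste() calls disappear. A sketch:

```r
## stand-in for the sampled dates from the original post
my.dates <- seq(as.Date("2005-05-01"), as.Date("2007-07-31"), by = "days")

new.dates <- seq(as.Date(format(min(my.dates), "%Y-01-01")),
                 as.Date(format(max(my.dates), "%Y-12-01")),
                 by = "months")
stopifnot(length(new.dates) == 36)  # Jan 2005 .. Dec 2007
```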

Cheers,

G

 
 
 On Feb 4, 2008 10:39 AM, Gavin Simpson [EMAIL PROTECTED] wrote:
 
  Dear List,
 
  Say I have the following sequence of dates [*]:
 
  start <- as.Date("2005-01-05", format = "%Y-%d-%m")
  end <- as.Date("2007-10-07", format = "%Y-%d-%m")
  my.dates <- seq(start, end, by = "days")
 
  What I would like to generate is a sequence of dates, by month that goes
  from the first day of 2005 (the year of the start date) to the last day
  of 2007 (the year of the end date), so that the output is a vector of 36
  dates containing all months of the three calendar years that the
  sampling spanned.
 
  I could do it via manipulation of dates as so:
 
  new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"),
"1", "1", sep = "/")),
  as.Date(paste(format(max(my.dates), format = "%Y"),
"12", "31", sep = "/")),
  by = "months")
 
  And then manipulate that to get only the month and year parts of the 36
  generated dates.
 
  This doesn't seem very elegant to me. Is there a better/easier way than
  converting back and forth between characters and objects of class
  Date?
 
  Many thanks,
 
  G
 
  [*] FWIW, my actual application is similar to my.dates, but not sampled
  every day - indeed there are months where there are no samples - and I
  am trying to do a levelplot of the numbers of observations per month,
  per year. Given the following data
 
  dat <- data.frame(temp = rnorm(length(my.dates)),
   my.dates = my.dates)
  dat$year <- as.numeric(format(dat$my.dates, format = "%Y"))
  dat$month <- format(dat$my.dates, format = "%b")
  dat$month <- factor(dat$month, levels = c("Jan","Feb","Mar","Apr",
   "May","Jun","Jul","Aug",
   "Sep","Oct","Nov","Dec"))
 
  I can get a table of the number of observations per month per year via
 
  (obs.yearmon - with(dat, table(year, month)))
 
  Which when converted to a vector provides what I need for levelplot()'s
  formula method. Now I just need to generate the sequence of 36 months
  (Jan, Feb etc) and years to go with it. The matrix method of
  levelplot works, but I am having to hard code a lot of details that I
  believe the formula method will handle automagically for me.
  --
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
   Dr. Gavin Simpson [t] +44 (0)20 7679 0522
   ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
   Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
   Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
   UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 
 

Re: [R] make dataframe from table

2008-02-04 Thread John Kane
You just need to create the data.frame in a different
way.

Try this:  

df1 <-
data.frame(rownames(affect), matrix(affect, nrow=3)) 
merge(age, df1, by.x=1, by.y=1) 


--- Boks, M.P.M. [EMAIL PROTECTED] wrote:

 Dear R-experts,
  
  
 I have got a dataframe: 
 data
 ID disease
   V1 V2
 1 p1  1
 2 p1  3
 3 p3  3
 4 p3  5
 5 p5  1
  
 From which I extract a useful table: affect
  
 affect
 1 3 5
   p1 1 1 0
   p3 0 1 1
   p5 1 0 0
 
 I want to merge this with anotherdataframe:
 age
 p1  23
 p2  24
 p3  23
 p4  11
 p5  45
  
 I have tried as.data.frame(affect) and other
 solutions to get the
 following command going:
 merge(age,affect, by.x=0, by.y=1)
  
 QUESTION: how can I get the merging process going with
 the outcome of the
 table command? 
  
 Any help would be great.
  
 R for windows v2.6.1
  
 code:
  

data <- as.data.frame(matrix(c("p1","p1","p3","p3","p5",1,3,3,5,1),5,2))

age <- as.data.frame(matrix(c("p1","p2","p3","p4","p5",23,24,23,11,45),5,2
 ))
 affect <- table(data[,1], data[,2])
 merge(age, affect, by.x=0, by.y=1)
  
  
 BW
  
 Marco
  
 
 






Re: [R] How to search for packages

2008-02-04 Thread hadley wickham
On Feb 4, 2008 8:34 AM, Monica Pisica [EMAIL PROTECTED] wrote:

 Hi everybody,


 I know this might be very off topic and it took me quite a while to up my 
 courage to post this…. But I remember a thread some time ago about how we can 
 find the packages we need to do specific tasks in R if we don't know 
 beforehand which ones actually do it. Now all the packages are listed 
 alphabetically on the web site. Since I am not very advanced in writing my 
 own functions I rely heavily on work already done and only when I have no 
 other choice I modify existing functions. Usually my modifications are only 
 cosmetic.


 But sometimes I use lots of time to just read the descriptions of packages 
 until I decide that maybe one will do close to what I want. I wonder if there 
 is any way to improve how these packages are displayed on the site and help 
 with this decision. I wonder if the community as a whole can come up with 
 some broader categories such as Bayesian, spatial statistics, bootstrap, 
 vegetation analysis, circular statistics, robust statistics, etc., and the 
 authors of the package can choose 1 or 2 or how many categories they think 
 their package fits the most. On the web page we can have a list of those very 
 broad categories and within each category we can have in alpha order the 
 packages themselves with their description and such as it is now. So if I am 
 interested in vegetation analysis or environmental analysis but I never did 
 it before I go to that category and see which packages are more geared 
 towards that particular subject. For example it was by chance alone and some 
 GOOGLE search that I discovered that the package labdsv has anything to do 
 with vegetation analysis since first of course I looked at any package which 
 might have veg or env in the title.

Before Christmas I started working on a solution for this -
http://crantastic.org - a site for searching, reviewing and tagging R
packages.  Unfortunately I've run out of steam lately (and the lack of
a 64-bit ubuntu package for R means it's a bit out of date), but the
basic ideas are there.  If you like how the site is looking so far
please let me know, as it will be motivation for me to get the site
finished.

Hadley

-- 
http://had.co.nz/



[R] adding the mean and standard deviation to boxplots

2008-02-04 Thread Tom Cohen
Dear list, 
   
  How can I add the mean and standard deviation to each of the boxplots using 
the example provided  in the boxplot function?
  
boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg",
        ylab = "tooth length", ylim = c(0, 35), yaxs = "i")
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))

Thanks for any help,
  Tom

   



[R] Choosing hardware

2008-02-04 Thread Todd Remund

I’m ordering a new computer to increase my ability to handle large data sets.  
I’ve tried the dual core type and also the dual processor with dual cores each, 
and have not been satisfied.  This seems to agree with all the other postings 
on the help list.  I don’t want to do any simulations, I just want to deal with 
a large data set and do things such as resampling time series data from 50,000 
samples per second to 2,000 samples per second.  And compute other things that 
are not easily done on my PC.  What type of hardware do I want?  I am leaning 
toward a single processor, since I have had a slow down with the dual core 
processors.  It seems to me that the threading and clustering will not really 
help my situation.  Does anyone have any suggestions?
 
Thanks
 
Todd Remund
[[alternative HTML version deleted]]



Re: [R] Choosing hardware

2008-02-04 Thread Patrick Connolly
On Mon, 04-Feb-2008 at 10:42AM -0700, Todd Remund wrote:

| 

| I'm ordering a new computer to increase my ability to handle large
| data sets.  I've tried the dual core type and also the dual
| processor with dual cores each, and have not been satisfied.  This
| seems to agree with all the other postings on the help list.  I
| don't want to do any simulations, I just want to deal with a large
| data set and do things such as resampling time series data from
| 50,000 samples per second to 2,000 samples per second.  And compute
| other things that are not easily done on my PC.  What type of
| hardware do I want?  I am leaning toward a single processor, since
| I have had a slow down with the dual core processors.  It seems to
| me that the threading and clustering will not really help my
| situation.  Does anyone have any suggestions?

Sounds as though it's a memory limitation.  If that's the case, you
might do best with a 64bit machine and a decent amount of memory.  If
you're using Linux, it's easy since 64 bit has been working for years.
If you have to use Windows (you didn't say), you might have to wait a
bit longer.

best


-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~} Great minds discuss ideas
 _( Y )_Middle minds discuss events 
(:_~*~_:)Small minds discuss people  
 (_)-(_)   . Anon
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.



Re: [R] How to search for packages

2008-02-04 Thread Roger Bivand
hadley wickham h.wickham at gmail.com writes:


  Also it is staggering that there are over 1200 packages for R  i was
suspecting close to 1000  .
 
 And the site is a couple of weeks out of date, so there are probably even 
 more.
 

The real answer was Task Views on CRAN (most of the OQ's topics *are* already
Task Views), so crantastic is very partial. If you have a little time and want
to really draw in the masses, try doing clickable image maps from the
Bioconductor pkgDepTools: 

http://bioconductor.org/packages/2.1/bioc/html/pkgDepTools.html

because some of the unobtrusive short-name packages are key nodes in package
dependency graphs. The dependency trees are very illuminating. Automating the
updates would be positive. If you could also run against the Task View listings,
which are readily parsable, that would be very helpful.

Roger

 The intent isn't to provide large in-depth reviews, but to make it
 easy to add comments like yours above.  I agree it would be also nice
 to have some central place that lists all articles which cite a given
 R package, but that would be a lot of work.
 
 Hadley




Re: [R] How to search for packages

2008-02-04 Thread hadley wickham
 Your site is interesting  the little example with the graphics is 
 actually what i was thinking about. Now I've tagged 2 packages as robust 
 analysis but if i press the Tags button i see only my tags - i don't see the 
 graphics tags anymore. I would be extremely happy if you get enough 
 motivation to continue this effort.

Oops, that was a dumb bug in my code, now fixed.

 Also it is staggering that there are over 1200 packages for R  i was 
 suspecting close to 1000  .

And the site is a couple of weeks out of date, so there are probably even more.

 About reviewing packages - i am not sure i am versed enough for that but at 
 least it might be of interest that some packages do not like to be loaded 
 together for some of their functions to work properly. The example i know of 
 is robust and mvoutlier. I like both packages and i use them quite often 
  but now i know when i can have both loaded together and when i have to 
 detach one or another. On the other hand i've published some scientific 
 articles in which i've used R and different packages. It might be of interest 
  to have a place to cite these articles, even if their main focus is not R 
 itself, but R was used to perform the analysis.


The intent isn't to provide large in-depth reviews, but to make it
easy to add comments like yours above.  I agree it would be also nice
to have some central place that lists all articles which cite a given
R package, but that would be a lot of work.

Hadley


-- 
http://had.co.nz/



Re: [R] adding the mean and standard deviation to boxplots

2008-02-04 Thread Gabor Grothendieck
Not precisely what you asked for but see the notch= argument to boxplot
for a graphical measure of variability.  If you simply wish to print certain
statistics below the numbers already on the X axis then see:

https://stat.ethz.ch/pipermail/r-help/2008-January/152994.html
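
If the goal is literally to draw the mean and standard deviation on each box,
one way (a sketch, not the only one) is to compute them per group with tapply()
and overlay them with points() and arrows():

```r
vc <- subset(ToothGrowth, supp == "VC")
m  <- with(vc, tapply(len, dose, mean))  # group means
s  <- with(vc, tapply(len, dose, sd))    # group standard deviations

boxplot(len ~ dose, data = vc, boxwex = 0.25, at = 1:3 - 0.2,
        col = "yellow", ylim = c(0, 35))
points(1:3 - 0.2, m, pch = 18)                      # mark the means
arrows(1:3 - 0.2, m - s, 1:3 - 0.2, m + s,          # +/- 1 sd whiskers
       angle = 90, code = 3, length = 0.05)
stopifnot(length(m) == 3, length(s) == 3)
```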

On Feb 4, 2008 10:41 AM, Tom Cohen [EMAIL PROTECTED] wrote:
 Dear list,

  How can I add the mean and standard deviation to each of the boxplots using 
 the example provided  in the boxplot function?

 boxplot(len ~ dose, data = ToothGrowth,
         boxwex = 0.25, at = 1:3 - 0.2,
         subset = supp == "VC", col = "yellow",
         main = "Guinea Pigs' Tooth Growth",
         xlab = "Vitamin C dose mg",
         ylab = "tooth length", ylim = c(0, 35), yaxs = "i")
 boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
         boxwex = 0.25, at = 1:3 + 0.2,
         subset = supp == "OJ", col = "orange")
 legend(2, 9, c("Ascorbic acid", "Orange juice"),
        fill = c("yellow", "orange"))

 Thanks for any help,
  Tom









[R] counting identical data in a column

2008-02-04 Thread joseph
Hi Peter
I have the following data frame with chromosome name, start and end positions:
   chrN start end
1 chr1  11122333  11122633
2 chr1  11122333  11122633
3 chr3  11122333  11122633
8 chr3 111273334 111273634
7 chr2  12122334  12122634
4 chr1  21122377  21122677
5 chr2  33122355  33122655
6 chr2  33122355  33122655
I would like to count the positions that have the same start and add a new 
column with the count number;
the new data frame should look like this:
  chrN     start       end count
1 chr1  11122333  11122633     3
2 chr1  11122333  11122633     3
3 chr3  11122333  11122633     3
8 chr3 111273334 111273634     1
7 chr2  12122334  12122634     1
4 chr1  21122377  21122677     1
5 chr2  33122355  33122655     2
6 chr2  33122355  33122655     2
Can you please show me how to achieve this?
Thanks
Joseph


  




Re: [R] adding the mean and standard deviation to boxplots

2008-02-04 Thread Richard . Cotton
 As for the standard deviation, are you sure you want this?  Standard 
 deviation only makes sense if the data are normally distributed...
 --
 
 -- This is false, of course. What you probably meant to say is something
 like:
 
 The sample standard deviation may not tell you what you think it tells 
you
 if the underlying data are not distributed something like a normal
 distribution.

Yes.  How very sloppy of me.  Consider my wrist slapped.

Regards,
Richie.

Mathematical Sciences Unit
HSL






Re: [R] ARCH LM test for univariant time series

2008-02-04 Thread tom soyer
Thanks Bernhard for the beautiful code!!

On 2/4/08, Pfaff, Bernhard Dr. [EMAIL PROTECTED] wrote:

 Dear All,


 one can visually inspect ARCH-effects by plotting acf/pacf of the
 squared residuals from an OLS-estimation. This can be as simple as a
 demeaned series. Further, one can run an auxiliary regression of the
 squared series on q of its own lagged squared values and a constant.
 This test statistic (N-q)*R^2 is distributed as chisq with q
 degrees of freedom.

 Something along the lines:

 archlmtest <- function (x, lags, demean = FALSE)
 {
     x <- as.vector(x)
     if(demean) x <- scale(x, center = TRUE, scale = FALSE)
     lags <- lags + 1
     mat <- embed(x^2, lags)
     arch.lm <- summary(lm(mat[, 1] ~ mat[, -1]))
     STATISTIC <- arch.lm$r.squared * length(resid(arch.lm))
     names(STATISTIC) <- "Chi-squared"
     PARAMETER <- lags - 1
     names(PARAMETER) <- "df"
     PVAL <- 1 - pchisq(STATISTIC, df = PARAMETER)
     METHOD <- "ARCH LM-test"
     result <- list(statistic = STATISTIC, parameter = PARAMETER,
                    p.value = PVAL, method = METHOD,
                    data.name = deparse(substitute(x)))
     class(result) <- "htest"
     return(result)
 }

 should work and yield equal results as mentioned earlier in this thread.

 Best,
 Bernhard


 
 Spencer,
 
 The warning message is sent from VAR, it basically lets you
 know that the
 data it used had no column names and it had to supply them
 using y1, y2, y3,
 etc. It can be suppressed by including options(warn=-1) in the
 function.
 
 Anyway, it seems that the p value from my function does not match
 FinMetrics'. I guess the function doesn't work... hmm...
 
 
 On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote:
 
  Dear Tom:
 
   Your revised function eliminates the discrepancy in the
 degrees of
  freedom but is still very different from the numbers reports
 on Tsay, p.
  102:
 
  archTest(log(1+as.numeric(m.intc7303)), lag=12)
 
 ARCH test (univariate)
 
  data:  Residual of y1 equation
  Chi-squared = 13.1483, df = 12, p-value = 0.3584
 
  Warning message:
  In VAR(s, p = 1, type = const) :
  No column names supplied in y, using: y1, y2, y3, y4, y5, y6, y7, y8,
  y9, y10, y11, y12 , instead.
 
 
   TOM:  What can you tell me about the warning message?
 
   Thanks for your help with this.
   Spencer Graves
 
  tom soyer wrote:
   Spencer,
  
   Sorry, I forgot that the default lag in arch is 16. Here
 is the fix. Can
  you
   try it again and see if it gives the correct (or at least similar
  compared
   to a true LM test) result?
  
    archTest=function(x, lags=12){
        # x is a vector
        require(vars)
        s=embed(x, lags)
        y=VAR(s, p=1, type="const")
        result=arch(y, lags.single=lags, multi=FALSE)$arch.uni[[1]]
        return(result)
    }
  
   Thanks and sorry about the bug.
  
  
   On 2/2/08, Spencer Graves [EMAIL PROTECTED] wrote:
  
   Dear Tom, Bernhard, Ruey:
  
I can't get that to match Tsay's example, but I have other
   questions about that.
  
1.  I got the following using Tom's 'archTest'
 function (below):
  
  
   archTest(log(1+as.numeric(m.intc7303)), lags=12)
  
  ARCH test (univariate)
  
   data:  Residual of y1 equation
   Chi-squared = 10.8562, df = 16, p-value = 0.8183
  
   Warning message:
   In VAR(s, p = 1, type = const) :
   No column names supplied in y, using: y1, y2, y3, y4, y5,
 y6, y7, y8,
   y9, y10, y11, y12 , instead.
  
  
 ** First note that the answer has df = 16, even though I
   supplied lags = 12.
  
2.  For (apparently) this example, S-Plus FinMetrics
 'archTest'
   function returned Test for ARCH Effects:  LM Test.  Null
 Hypothesis:
   no ARCH effects.  Test Stat 43.5041, p.value 0..
 Dist. under Null:
   chi-square with 12 degrees of freedom.
  
3.  Starting on p. 101, Ruey mentioned the Lagrange
 multiplier
   test of Engle (1982), saying This test is equivalent to
 the usual F
   test for no regression, but refers it to a chi-square, not an F
   distribution.  Clearly, there is a gap here, because the
 expected value
   of the F distribution is close to 1 [d2/(d2-2), where d2
 = denominator
   degrees of freedom;  http://en.wikipedia.org/wiki/F-distribution],
  while
   the expected value for a chi-square is the number of
 degrees of freedom
  
Unfortunately, I don't feel I can afford the time to
 dig into this
   further right now.
  
Thanks for your help.
Spencer Graves
  
   tom soyer wrote:
  
   Spencer, how about something like this:
  
   archTest <- function(x, lags = 16){
# x is a vector
require(vars)
s <- embed(x, lags)
y <- VAR(s, p = 1, type = "const")
result <- arch(y, multi = FALSE)$arch.uni[[1]]
return(result)
   }
  
   can you, or maybe Bernhard, check and see whether this
 function gives
   the correct result?
  
   thanks,
  
    On 2/1/08, Spencer Graves [EMAIL PROTECTED] wrote:
  
   Hi, Tom:
  
The 'arch' function in the 'vars' package is
 supposed to be
  
   able
  
   to do that.  

Re: [R] counting identical data in a column

2008-02-04 Thread jim holtman
Is this what you want?

 x <- read.table(textConnection("  chrN start end
+ 1 chr1  11122333  11122633
+ 2 chr1  11122333  11122633
+ 3 chr3  11122333  11122633
+ 8 chr3 111273334 111273634
+ 7 chr2  12122334  12122634
+ 4 chr1  21122377  21122677
+ 5 chr2  33122355  33122655
+ 6 chr2  33122355  33122655"), header=TRUE)
 x$count <- ave(x$start, x$start, FUN=length)
 x
  chrN start   end count
1 chr1  11122333  11122633 3
2 chr1  11122333  11122633 3
3 chr3  11122333  11122633 3
8 chr3 111273334 111273634 1
7 chr2  12122334  12122634 1
4 chr1  21122377  21122677 1
5 chr2  33122355  33122655 2
6 chr2  33122355  33122655 2
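A base-R variant of the same idea, for readers who prefer table(): tabulate the start positions once, then index the table by each row's start. This is a sketch assuming the same data frame as above, rebuilt here so the snippet is self-contained.

```r
# Count duplicated start positions with table() instead of ave()
x <- data.frame(chrN  = c("chr1","chr1","chr3","chr3","chr2","chr1","chr2","chr2"),
                start = c(11122333, 11122333, 11122333, 111273334,
                          12122334, 21122377, 33122355, 33122355),
                end   = c(11122633, 11122633, 11122633, 111273634,
                          12122634, 21122677, 33122655, 33122655))
tab <- table(x$start)                             # counts per unique start
x$count <- as.integer(tab[as.character(x$start)]) # look up each row's count
x$count                                           # 3 3 3 1 1 1 2 2
```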



On 2/4/08, joseph [EMAIL PROTECTED] wrote:
 Hi Peter
 I have the following data frame with chromosome name, start and end positions:
   chrN start end
 1 chr1  11122333  11122633
 2 chr1  11122333  11122633
 3 chr3  11122333  11122633
 8 chr3 111273334 111273634
 7 chr2  12122334  12122634
 4 chr1  21122377  21122677
 5 chr2  33122355  33122655
 6 chr2  33122355  33122655
 I would like to count the positions that have the same start and add a new 
 column with the count number;
 the new data frame should look like this:
  chrN
  start end  count
 1 chr1  11122333  11122633   3
 2 chr1  11122333  11122633   3
 3 chr3  11122333  11122633   3
 8 chr3 111273334 111273634 1
 7 chr2  12122334  12122634   1
 4 chr1  21122377  21122677   1
 5 chr2  33122355  33122655   2
 6 chr2  33122355  33122655   2
 Can you please show me how to achieve this?
 Thanks
 Joseph


  
 



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] How to search for packages

2008-02-04 Thread john seers (IFR)
 

Hi

I think this is a good suggestion. 

I would like to add the associated problem of deciding, between
packages that perform the same function, which one is better. Similarly,
packages are often superseded. I find I have to spend a lot of time
learning how to use packages just to decide which one is most suitable; or I
spend time with a package only to find out eventually that it was superseded some
time ago or has been incorporated into another package.

It is also difficult to know how packages fit together (or not).
Functionality often overlaps or is duplicated. I have had some
difficulty with microarray packages - not being sure if they complement
each other or are exclusive to each other. (I know that is more
Bioconductor but the same principle applies in pure R packages, though
perhaps not as pronounced a problem).

Regards

JS

 
---

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Monica Pisica
Sent: 04 February 2008 14:34
To: r-help@r-project.org
Subject: [R] How to search for packages


Hi everybody,


I know this might be very off topic, and it took me quite a while to work up
my courage to post this. But I remember a thread some time ago about
how we can find the packages we need to do specific tasks in R if we
don't know beforehand which ones actually do it. Now all the packages
are listed alphabetically on the web site. Since I am not very advanced
in writing my own functions I rely heavily on work already done, and
only when I have no other choice do I modify existing functions. Usually my
modifications are only cosmetic.


But sometimes I use lots of time to just read the descriptions of
packages until I decide that maybe one will do close to what I want. I
wonder if there is any way to improve how these packages are displayed
on the site and help with this decision. I wonder if the community as a
whole can come up with some broader categories such as Bayesian, spatial
statistics, bootstrap, vegetation analysis, circular statistics, robust
statistics, etc., and the authors of the package can choose 1 or 2 or
how many categories they think their package fits the most. On the web
page we can have a list of those very broad categories and within each
category we can have in alpha order the packages themselves with their
description and such as it is now. So if I am interested in vegetation
analysis or environmental analysis but I never did it before I go to
that category and see which packages are more geared towards that
particular subject. For example it was by chance alone and some GOOGLE
search that I discovered that the package labdsv has anything to do with
vegetation analysis since first of course I looked at any package which
might have veg or env in the title.


I also realize that this might mean a lot of work, but R develops so
rapidly that soon it will be unmanageable to just peruse the
list of packages and read descriptions in order to choose which package
to install when you are not familiar with all of them. I hope I didn't
offend the community with this; I would be very sorry if I did, since I
get lots of help here and have learnt a lot from you. I will remain forever
grateful.



Thanks,

Monica







Re: [R] Processing dates and generating sequences of dates

2008-02-04 Thread Gavin Simpson

On Mon, 2008-02-04 at 10:48 -0500, Gabor Grothendieck wrote:
 Using zoo's yearmon class:
 
 library(zoo)
 my.dates[!duplicated(as.yearmon(my.dates))]
 
 or, although you seem to disallow this in your question,
 this would be an option:
 
 my.dates[!duplicated(format(my.dates, "%Y-%m"))]

Gabor,

Many thanks. Both are exactly what I was looking for. My head was stuck
very firmly in a rut of using tools that work with Date objects and
didn't think about using duplicated!

The latter is especially fine because:
 1. I didn't mean to imply that I didn't want to convert to/from
character - just that my doing so repeatedly in the example
didn't seem very elegant.
 2. This whole thing may end up being wrapped into a function for
use by people who have restricted computing environments, so a
pure R solution is advantageous.

All the best,

G

 
 
 On Feb 4, 2008 10:39 AM, Gavin Simpson [EMAIL PROTECTED] wrote:
 
  Dear List,
 
  Say I have the following sequence of dates [*]:
 
  start <- as.Date("2005-01-05", format = "%Y-%d-%m")
  end <- as.Date("2007-10-07", format = "%Y-%d-%m")
  my.dates <- seq(start, end, by = "days")
 
  What I would like to generate is a sequence of dates, by month that goes
  from the first day of 2005 (the year of the start date) to the last day
  of 2007 (the year of the end date), so that the output is a vector of 36
  dates containing all months of the three calendar years that the
  sampling spanned.
 
  I could do it via manipulation of dates as so:
 
  new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"),
"1", "1", sep = "/")),
  as.Date(paste(format(max(my.dates), format = "%Y"),
"12", "31", sep = "/")),
  by = "months")
 
  And then manipulate that to get only the month and year parts of the 36
  generated dates.
 
  This doesn't seem very elegant to me. Is there a better/easier way than
  converting back and forth between characters and objects of class
  Date?
 
  Many thanks,
 
  G
 
  [*] FWIW, my actual application is similar to my.dates, but not sampled
  every day - indeed there are months where there are no samples - and I
  am trying to do a levelplot of the numbers of observations per month,
  per year. Given the following data
 
  dat <- data.frame(temp = rnorm(length(my.dates)),
   my.dates = my.dates)
  dat$year <- as.numeric(format(dat$my.dates, format = "%Y"))
  dat$month <- format(dat$my.dates, format = "%b")
  dat$month <- factor(dat$month, levels = c("Jan","Feb","Mar","Apr",
   "May","Jun","Jul","Aug",
   "Sep","Oct","Nov","Dec"))
 
  I can get a table of the number of observations per month per year via
 
  (obs.yearmon <- with(dat, table(year, month)))
 
  Which when converted to a vector provides what I need for levelplot()'s
  formula method. Now I just need to generate the sequence of 36 months
  (Jan, Feb etc) and years to go with it. The matrix method of
  levelplot works, but I am having to hard code a lot of details that I
  believe the formula method will handle automagically for me.
  --
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
   Dr. Gavin Simpson [t] +44 (0)20 7679 0522
   ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
   Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
   Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
   UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] How to make reference to R in the method section in a scientific article? (fwd)

2008-02-04 Thread Anne York
On Mon, 4 Feb 2008, Falco tinnunculus wrote:

Ft  Dear all,
Ft 
Ft  How do I make reference to R in the method section in a scientific article?
Ft  Should I state the web aderess?
Ft 
Ft  And, is this the proper way to report the lme test?
Ft 
Ft 
Ft  The relationships were assumed to be linear. The response, handling time
Ft  and the explanatory variable, prey mass, were log transformed to obtain
Ft  normal error distributions. The lme function of the nlme package for R was
Ft  used to fit linear mixed-effects (LME) models, using restricted maximum
Ft  likelihood (REML). Model selection were computed by using the Akaike
Ft  Information Criteria (AIC). The overall test of the model was conducted 
with
Ft  ANOVA.
Ft 
Ft 
Ft 
Ft  Regards Kes
Ft 

Check out citation function. This can be used to find
the proper citation for R and for packages that have
included a citation. For example:

citation(package = "MASS")

Citing the url is usually a matter of journal style. Some
journals now  want the date that the url was visited as part
of the citation.
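If the target journal wants BibTeX, the citation object can be converted directly; a small sketch:

```r
# citation() with no arguments returns the reference for R itself;
# toBibtex() renders it as a BibTeX entry ready to paste into a .bib file
cit <- citation()
toBibtex(cit)
```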



Re: [R] How to search for packages

2008-02-04 Thread hadley wickham
On Feb 4, 2008 12:17 PM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote:

 On 4 February 2008 at 10:03, hadley wickham wrote:
 | Before Christmas I started working on a solution for this -
 | http://crantastic.org - a site for searching, reviewing and tagging R
 | packages.  Unfortunately I've run out of steam lately (and the lack of
 | a 64-bit ubuntu package for R means it's a bit out of date), but the

 Err, that got fixed early (or mid-) January. See
http://cran.r-project.org/bin/linux/ubuntu/
 and/or the README therein:

UBUNTU PACKAGES FOR R

R packages for Ubuntu on i386 and amd64 are available. The plans are to
support at least the latest two Ubuntu releases and the latest LTS
release. As of January 2008, these are Gutsy Gibbon (7.10), Feisty Fawn
(7.04) and Dapper Drake (6.06), respectively.

 Kudos goes to Vincent and Michael for their work on Ubuntu packages based on
 our Debian packages.

 Maybe that can help crantastic get back on track. If on the other hand the
 dreadful winter sapped your energy, well then I'm out of options for help. :)

Ooops, I must have missed that.  I'll try again later today.

Hadley


-- 
http://had.co.nz/



Re: [R] How to search for packages

2008-02-04 Thread hadley wickham
 The real answer was Task Views on CRAN (most of the OP's topics *are* already
 Task Views), so crantastic is very partial. If you have a little time and want

I think crantastic and task views solve somewhat different problems
(although I agree that crantastic should mirror the task views too).
Task views take considerable effort to set up and are often written by
one person.  Tags on crantastic are dead simple to set up and can be
contributed to by multiple people (although I'm not yet sure what to
do about potential conflicts).

 to really draw in the masses, try doing clickable image maps from the
 Bioconductor pkgDepTools:

 http://bioconductor.org/packages/2.1/bioc/html/pkgDepTools.html

 because some of the unobtrusive short-name packages are key nodes in package
 dependency graphs. The dependency trees are very illuminating. Automating the
 updates would be positive. If you could also run against the Task View 
 listings,

That's a good idea.  I do mean to add dependency links in both
direction, and to automate the updates I just need to get a cron job
set up.

Hadley




 Roger

  The intent isn't to provide large in-depth reviews, but to make it
  easy to add comments like yours above.  I agree it would be also nice
  to have some central place that lists all articles which cite a given
  R package, but that would be a lot of work.
 
  Hadley
 






-- 
http://had.co.nz/



Re: [R] adding the mean and standard deviation to boxplots

2008-02-04 Thread HBaize

There are many ways to do it. The following will place a blue point on the
boxplot at the mean, then print the mean at the bottom of the plot. In some
plots I've gone too far and included median points and values as well. You
could also put 95% CIs on the same plot, but it would perhaps get too busy.


# Plot boxplot of vitamin C subset
bx <- boxplot(len ~ dose, data = ToothGrowth,
boxwex = 0.25, at = 1:3 - 0.2,
subset = supp == "VC", col = "yellow",
main = "Guinea Pigs' Tooth Growth",
xlab = "Vitamin C dose mg",
ylab = "tooth length", ylim = c(0, 35), yaxs = "i")

# keep location
at <- c(1:length(bx$names))

# find means, plot as points
SubTc <- subset(ToothGrowth, supp == "VC")
means <- by(SubTc$len, SubTc$dose, mean, na.rm = TRUE)
points(at - 0.2, means, pch = 19, col = "blue")

# write mean values
text(at - 0.1, 1, labels = formatC(means, format = "f", digits = 1),
pos = 2, cex = 1, col = "red")

# Orange juice subset
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
boxwex = 0.25, at = 1:3 + 0.2,
subset = supp == "OJ", col = "orange")

# find means, plot as points
SubTo <- subset(ToothGrowth, supp == "OJ")
meano <- by(SubTo$len, SubTo$dose, mean, na.rm = TRUE)
points(at + 0.2, meano, pch = 19, col = "blue")

# write mean values
text(at + 0.3, 1, labels = formatC(meano, format = "f", digits = 1),
pos = 2, cex = 1, col = "orange")

legend(2, 9, c("Ascorbic acid", "Orange juice"),
   fill = c("yellow", "orange"))
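Since the question also asked for standard deviations, the same pattern extends to sd(). A self-contained sketch for the vitamin-C subset; the text positions and colours are assumptions you may want to tweak:

```r
# Boxplot of the vitamin C subset with group SDs printed inside the plot
SubTc <- subset(ToothGrowth, supp == "VC")
bx <- boxplot(len ~ dose, data = SubTc, boxwex = 0.25,
              col = "yellow", ylim = c(0, 35))
at <- seq_along(bx$names)
# one standard deviation per dose group, via the same by() idiom as the means
sds <- by(SubTc$len, SubTc$dose, sd, na.rm = TRUE)
text(at, 3, labels = formatC(sds, format = "f", digits = 1),
     cex = 1, col = "red")
```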
--   
HTH. I'm sure my R code is not as efficient as it could be.

Harold Baize
Butte County Department of Behavioral Health




Tom Cohen-2 wrote:
 
 Dear list, 

   How can I add the mean and standard deviation to each of the boxplots
 using the example provided  in the boxplot function?
   
 boxplot(len ~ dose, data = ToothGrowth,
 boxwex = 0.25, at = 1:3 - 0.2,
 subset = supp == "VC", col = "yellow",
 main = "Guinea Pigs' Tooth Growth",
 xlab = "Vitamin C dose mg",
 ylab = "tooth length", ylim = c(0, 35), yaxs = "i")
 boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
 boxwex = 0.25, at = 1:3 + 0.2,
 subset = supp == "OJ", col = "orange")
 legend(2, 9, c("Ascorbic acid", "Orange juice"),
fill = c("yellow", "orange"))
 
 Thanks for any help,
   Tom
 

 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/adding-the-mean-and-standard-deviation-to-boxplots-tp15271398p15278974.html
Sent from the R help mailing list archive at Nabble.com.



[R] extracting AIC scores from lmer and other objects

2008-02-04 Thread Jarrett Byrnes
I have a slight conundrum.  I'm attempting to write a script that will
take a number of objects (lm, glm, and lmer) and return AIC scores  
and weights.  I've run into 3 problems, and was wondering if anyone  
had any pointers.

1) is there any convenient way to extract the name of the objects?   
Simply, if I have a vector of objects c(my.lm, my.lmer) and I want to
get a character vector c("my.lm", "my.lmer"), how would one do this?
I see this as far simpler than jumping into the ugly details of each  
object type and extracting coefficient lists and such - not to  
mention tidier.

2) I'm repeatedly getting the error

Error in UseMethod("logLik") : no applicable method for "logLik"

in a variety of different contexts.  The first is if I have to get  
an AIC for an lmer object.  AIC(my.lmer) give me the error above.   
However, I can circumvent this with a very silly solution -

myAIC <- function(object) { a <- logLik(object)
return(-2*a[1] + 2*attr(a, 'df')) }

I use this, and I do not get an error.

3) I do, however, get the above error if I have a vector of model  
objects.  So, again, if I have something like model.list <- c(my.lm,
my.lmer) or even just c(my.lm, my.lm2) and then call the following on
the vector of models

aiclist <- vector()
for(index in 1:length(model.list)){
aiclist <- c(aiclist, myAIC(model.list[index]))
}

it again yields the Error in UseMethod("logLik").  Given that this is
true either for lm, glm, or lmer objects, I'm guessing there's a more  
general issue here that I'm missing.  Any pointers?
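One likely culprit, noted here for reference: c() does not keep model objects intact, it unlists them into their components, so logLik() no longer sees a fitted model. Storing the fits in a list() and indexing with [[ ]] avoids both that and the model.list[index] pitfall (single brackets return a one-element list, not the model). A small sketch with two lm fits:

```r
# Keep fitted models in a list(); c() would flatten them into pieces
set.seed(1)
d <- data.frame(x = 1:20, y = rnorm(20))
model.list <- list(null  = lm(y ~ 1, data = d),
                   slope = lm(y ~ x, data = d))
# sapply() applies AIC to each fit; [[ ]] would extract a single model
aics <- sapply(model.list, AIC)
# names(model.list) also answers question 1 for list-stored models
names(aics)
```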

Thanks!

-Jarrett





Jarrett Byrnes
Population Biology Graduate Group, UC Davis
Bodega Marine Lab
707-875-1969
http://www-eve.ucdavis.edu/stachowicz/byrnes.shtml





Re: [R] How to search for packages

2008-02-04 Thread Tobias Sing
Hadley,

On Feb 4, 2008 5:03 PM, hadley wickham [EMAIL PROTECTED] wrote:
 [...]
 Before Christmas I started working on a solution for this -
 http://crantastic.org - a site for searching, reviewing and tagging R
 packages.  Unfortunately I've run out of steam lately (and the lack of
 a 64-bit ubuntu package for R means it's a bit out of date), but the
 basic ideas are there.  If you like how the site is looking so far
 please let me know, as it will be motivation for me to get the site
 finished.

it's just amazing how you still find some time for things besides
ggplot2 et al... Appreciating all your work so far, it'd be great to
keep you motivated for crantastic as well.

I think crantastic would be a good complement to task views:
- more likely to be up-to-date when used by many people
- reflecting the opinion/experiences of various users rather than that
of a single task view maintainer

Some additional benefit could be gained by adding a rating system and
by allowing packages to be sorted by number of comments a package has
received, average rating, etc. Such sorting options could even be
useful within a view for certain selected tags, because I guess some
tags - e.g. 'graphics' - might ultimately be given to a large number
of packages).

Cheers,
  Tobias



[R] C-index

2008-02-04 Thread bereket weldeslassie
Hi

I am using Cox regression to identify at risk groups. How can I get the
C-index in R?

Thanks,

Bereket





[R] using image to show RGB image data ?

2008-02-04 Thread B. Bogart
Hello all,

I'm now using image() to show image data (in my case dumps of SOM
weights) but would like to show RGB colour data, not just single z
colour values.

I've currently been using seq() to skip 4 values, so I can show the R, G
or B channels separately as z. But is there a way I can show all three
channels simultaneously as a proper colour image?

Thanks,
B. Bogart



[R] gc() and memory efficiency

2008-02-04 Thread Doran, Harold
I have a program which reads in a very large data set, performs some analyses, 
and then repeats this process with another data set. As soon as the first set 
of analyses are complete, I remove the very large object and clean up to try 
and make memory available in order to run the second set of analyses. The 
process looks something like this:

1) read in data set 1 and perform analyses
rm(list=ls())
gc()
2) read in data set 2 and perform analyses
rm(list=ls())
gc()
...

But, it appears that I am not making the memory that was consumed in step 1 
available back to the OS as R complains that it cannot allocate a vector of 
size X as the process tries to repeat in step 2. 

So, I close and reopen R and then drop in the code to run the second analysis. 
When this is done, I close and reopen R and run the third analysis. 

This is terribly inefficient. Instead I would rather just source in the R code 
and let the analyses run over night.

Is there a way that I can use gc() or some other function more efficiently 
rather than having to close and reopen R at each iteration?

I'm using Windows XP and r 2.6.1

Harold




Re: [R] gc() and memory efficiency

2008-02-04 Thread Dirk Eddelbuettel

On 4 February 2008 at 20:45, Doran, Harold wrote:
| I have a program which reads in a very large data set, performs some 
analyses, and then repeats this process with another data set. As soon as the 
first set of analyses are complete, I remove the very large object and clean up 
to try and make memory available in order to run the second set of analyses. 
The process looks something like this:
| 
| 1) read in data set 1 and perform analyses
| rm(list=ls())
| gc()
| 2) read in data set 2 and perform analyses
| rm(list=ls())
| gc()
| ...
| 
| But, it appears that I am not making the memory that was consumed in step 1 
available back to the OS as R complains that it cannot allocate a vector of 
size X as the process tries to repeat in step 2. 
| 
| So, I close and reopen R and then drop in the code to run the second 
analysis. When this is done, I close and reopen R and run the third analysis. 
| 
| This is terribly inefficient. Instead I would rather just source in the R 
code and let the analyses run over night.
| 
| Is there a way that I can use gc() or some other function more efficiently 
rather than having to close and reopen R at each iteration?

I haven't found one. 

Every (trading) day I process batches of data with R, and the only reliable way I
have found is to use fresh R sessions.  Otherwise, the fragmented memory will
eventually result in the all-too-familiar 'cannot allocate X mb' for rather
small values of X relative to my total ram. C'est la vie.

As gc() seems to help somewhat yet not 'sufficiently', fresh starts are an
alternative help, and Rscript starts faster than the main R. Now, I happen to
be partial to littler [1], which starts even faster, so I use that (on Linux;
I am not sure if it can be built on Windows, as we embed R directly and
hence start faster than Rscript).  Either one should help you with some
batch files -- giving you a way to run overnight.  And once you start batching
things, it is only a small step to regain efficiency by parallel execution
using something like MPI or NWS.

Hth, Dirk

[1] littler is the predecessor to Rscript by Jeff and myself. See either 
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/LittleR
or 
http://dirk.eddelbuettel.com/code/littler.html
for more on littler and feel free to email us.
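For the overnight runs the original poster wants, the per-session approach can be driven from a short script. A hedged sketch: the file names and analysis.R are hypothetical, and the actual system() calls are commented out so the sketch just builds the command list.

```r
# driver.R -- launch one fresh R process per data set so each analysis
# starts with clean, unfragmented memory (file names are hypothetical)
datasets <- c("set1.csv", "set2.csv", "set3.csv")
cmds <- paste("Rscript analysis.R", datasets)
# in real use: lapply(cmds, system)   # run each command in its own process
print(cmds)
```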

-- 
Three out of two people have difficulties with fractions.



Re: [R] using image to show RGB image data ?

2008-02-04 Thread hadley wickham
 I'm now using image() to show image data (in my case dumps of SOM
 weights) but would like to show RGB colour data, not just single z
 colour values.

You can do this fairly readily with ggplot2:

install.packages("ggplot2")
library(ggplot2)
qplot(x, y, data = mydata, fill = rgb, geom = "tile") + scale_fill_identity()

(assuming that your variable containing the rgb colour is called rgb)

If your data is originally in the matrix form used by image, see the
examples on http://had.co.nz/ggplot2/geom_tile.html on how to change
to the data.frame form used by ggplot.
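For completeness, the same effect is possible in base R by giving image() one colour per cell: make the z matrix a grid of unique indices and pass the per-cell colours as col. A sketch, with random data standing in for the SOM weights:

```r
# Show an RGB image with base image(): one unique z value per cell,
# and a colour vector indexed in the same column-major order
set.seed(42)
r <- matrix(runif(12), 3, 4)   # red channel, values in [0, 1]
g <- matrix(runif(12), 3, 4)   # green channel
b <- matrix(runif(12), 3, 4)   # blue channel
cols <- rgb(r, g, b)                      # one hex colour per cell
z <- matrix(seq_along(cols), nrow(r))     # unique index per cell
# transpose/reverse so the matrix displays in reading order
image(t(z[nrow(z):1, ]), col = cols, axes = FALSE)
```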

Hadley

-- 
http://had.co.nz/



Re: [R] Processing dates and generating sequences of dates

2008-02-04 Thread Gabor Grothendieck
Here is another solution.  It uses only R core
functions.  Note that as.Date(cut(x, "years"))
gives Jan 1 of x's year where x is of class Date:

y <- as.Date(cut(range(my.dates), "years")) + c(0, 364)
seq(y[1], y[2], "months")
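A quick end-to-end check of this core-R approach, rebuilding my.dates from Gavin's example (the %Y-%d-%m format string is taken from his post):

```r
# Rebuild my.dates, then expand to first-of-month dates covering the years
start <- as.Date("2005-01-05", format = "%Y-%d-%m")  # i.e. 2005-05-01
end <- as.Date("2007-10-07", format = "%Y-%d-%m")    # i.e. 2007-07-10
my.dates <- seq(start, end, by = "days")
# Jan 1 of the first year through Dec 31 of the last year, then by month
y <- as.Date(cut(range(my.dates), "years")) + c(0, 364)
months36 <- seq(y[1], y[2], "months")
length(months36)   # 36: Jan 2005 through Dec 2007
```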

On Mon, Feb 4, 2008 at 1:00 PM, Gabor Grothendieck
[EMAIL PROTECTED] wrote:
 OK. Here are two zoo solutions and one using only core
 functions.

 In the first dd is the first of the month of Jan
 and month of Dec of the first and last year respectively
 so we apply seq.Dates to that.

 In the second yr is Jan of the first and Jan of the
 last year as yearmon class and we adjust, apply seq and
 convert that to Date.

 In the last we convert to POSIXlt and update the internals
 of that object finally converting back to Date and using seq.

 library(zoo)
 dd <- as.Date(floor(as.yearmon(range(my.dates))) + c(0, 11/12))
 seq(dd[1], dd[2], by = "month")

 library(zoo)
 yr <- floor(as.yearmon(range(my.dates)))
 as.Date(seq(yr[1], yr[2] + 11/12, by = 1/12))

 dd <- as.POSIXlt(range(my.dates))
 dd$mon <- c(0, 11)
 dd$mday <- c(1, 1)
 dd <- as.Date(dd)
 seq(dd[1], dd[2], by = "month")


 On Feb 4, 2008 12:09 PM, Gavin Simpson [EMAIL PROTECTED] wrote:
  On Mon, 2008-02-04 at 10:48 -0500, Gabor Grothendieck wrote:
   Using zoo's yearmon class:
  
   library(zoo)
   my.dates[!duplicated(as.yearmon(my.dates))]
  
   or, although you seem to disallow this in your question,
   this would be an option:
  
    my.dates[!duplicated(format(my.dates, "%Y-%m"))]
 
  Ah, actually, I spoke too soon. Your solutions return the following
 
  my.dates[!duplicated(format(my.dates, "%Y-%m"))]
   [1] 2005-05-01 2005-06-01 2005-07-01 2005-08-01 2005-09-01
   [6] 2005-10-01 2005-11-01 2005-12-01 2006-01-01 2006-02-01
  [11] 2006-03-01 2006-04-01 2006-05-01 2006-06-01 2006-07-01
  [16] 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01
  [21] 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01
  [26] 2007-06-01 2007-07-01
 
  which gives only the months sampled. What I need is a vector of length
  36 covering 1st Jan 2005 through 31st Dec 2007 as in (using the seq()
  call in my original email):
 
   new.dates
   [1] 2005-01-01 2005-02-01 2005-03-01 2005-04-01 2005-05-01
   [6] 2005-06-01 2005-07-01 2005-08-01 2005-09-01 2005-10-01
  [11] 2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01
  [16] 2006-04-01 2006-05-01 2006-06-01 2006-07-01 2006-08-01
  [21] 2006-09-01 2006-10-01 2006-11-01 2006-12-01 2007-01-01
  [26] 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01
  [31] 2007-07-01 2007-08-01 2007-09-01 2007-10-01 2007-11-01
  [36] 2007-12-01
 
  This just seems a bit kludgy:
 
  new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"),
"1", "1", sep = "/")),
  as.Date(paste(format(max(my.dates), format = "%Y"),
"12", "31", sep = "/")),
  by = "months")
 
  but perhaps there isn't a better way?
 
  Cheers,
 
 
  G
 
  
  
   On Feb 4, 2008 10:39 AM, Gavin Simpson [EMAIL PROTECTED] wrote:
   
Dear List,
   
Say I have the following sequence of dates [*]:
   
start <- as.Date("2005-01-05", format = "%Y-%d-%m")
end <- as.Date("2007-10-07", format = "%Y-%d-%m")
my.dates <- seq(start, end, by = "days")
   
What I would like to generate is a sequence of dates, by month that goes
from the first day of 2005 (the year of the start date) to the last day
of 2007 (the year of the end date), so that the output is a vector of 36
dates containing all months of the three calendar years that the
sampling spanned.
   
I could do it via manipulation of dates as so:
   
new.dates <- seq(as.Date(paste(format(min(my.dates), format = "%Y"),
  "1", "1", sep = "/")),
as.Date(paste(format(max(my.dates), format = "%Y"),
  "12", "31", sep = "/")),
by = "months")
   
And then manipulate that to get only the month and year parts of the 36
generated dates.
   
This doesn't seem very elegant to me. Is there a better/easier way than
converting back and forth between characters and objects of class
Date?
   
Many thanks,
   
G
   
[*] FWIW, my actual application is similar to my.dates, but not sampled
every day - indeed there are months where there are no samples - and I
am trying to do a levelplot of the numbers of observations per month,
per year. Given the following data
   
dat <- data.frame(temp = rnorm(length(my.dates)),
 my.dates = my.dates)
dat$year <- as.numeric(format(dat$my.dates, format = "%Y"))
dat$month <- format(dat$my.dates, format = "%b")
dat$month <- factor(dat$month, levels = c("Jan","Feb","Mar","Apr",
 "May","Jun","Jul","Aug",
 "Sep","Oct","Nov","Dec"))
   
I can get a table of the number of observations per month per year via
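The message is truncated here; the tabulation referred to is presumably
along these lines (a sketch using the dat object defined above, not the
poster's actual code):

```r
## cross-tabulate the number of observations per month, per year
with(dat, table(month, year))
```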
   

Re: [R] gc() and memory efficiency

2008-02-04 Thread Prof Brian Ripley
1) See ?Memory-limits: it is almost certainly memory fragmentation.
You don't need to give the memory back to the OS (and few OSes actually do 
so).

2) I've never seen this running a 64-bit version of R.

3) You can easily write a script to do this.  Indeed, you could write an R 
script to run multiple R scripts in separate processes in turn (via
system("Rscript fileN.R")).  For example, Uwe Ligges uses R to script 
building and testing of packages on Windows.
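A minimal sketch of such a driver script (the file names are hypothetical;
each Rscript call runs in a fresh process, so its memory is returned to the
OS when it exits):

```r
scripts <- c("analysis1.R", "analysis2.R", "analysis3.R")  # hypothetical files
for (f in scripts) {
  status <- system(paste("Rscript", shQuote(f)))  # run in a separate process
  if (status != 0) warning("script failed: ", f)
}
```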

On Mon, 4 Feb 2008, Doran, Harold wrote:

 I have a program which reads in a very large data set, performs some 
 analyses, and then repeats this process with another data set. As soon 
 as the first set of analyses are complete, I remove the very large 
 object and clean up to try and make memory available in order to run the 
 second set of analyses. The process looks something like this:

 1) read in data set 1 and perform analyses
 rm(list=ls())
 gc()
 2) read in data set 2 and perform analyses
 rm(list=ls())
 gc()
 ...

 But, it appears that I am not making the memory that was consumed in 
 step 1 available back to the OS as R complains that it cannot allocate a 
 vector of size X as the process tries to repeat in step 2.

 So, I close and reopen R and then drop in the code to run the second 
 analysis. When this is done, I close and reopen R and run the third 
 analysis.

 This is terribly inefficient. Instead I would rather just source in the 
 R code and let the analyses run over night.

 Is there a way that I can use gc() or some other function more 
 efficiently rather than having to close and reopen R at each iteration?

 I'm using Windows XP and R 2.6.1

 Harold


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Warnings from apply: longer object length is not a multiple of shorter object length

2008-02-04 Thread Ng Stanley
Hi,

What am I doing wrong, such that B) gives warning messages? I want the
computation of thr to be outside the apply function.

A) Uses a simple matrix of 9 elements. No warning messages.

data2_1 <- matrix(c(1, 2, 3, NA, 4, 5, NA, NA, 6), 3, 3)
mean <- colMeans(data2_1, na.rm = TRUE)
sd <- sd(data2_1, na.rm = TRUE)

for (i in 1:2) {
  thr <- mean + i*sd

  num <- apply(data2_1, 2, function(x) {
    temp <- thr
    sum(x > (temp), na.rm = TRUE)})
}

B) Replacing data2_1 with a larger matrix gives 20 warning messages, i.e., In
x > (temp) : longer object length is not a multiple of shorter object
length
data2_1 <- matrix(c(0.9584190, 0.2710325, -0.9495618, -0.1301772, -0.7539687,
0.5344464, -0.8205933, 0.1581723, -0.5351588, 0.04448065, 0.9936430,
0.2278786, -0.8160700, -0.3314779, -0.4047975, 0.1168152, -0.7458182,
-0.2231588, -0.5051651, -0.74871174, 0.9450363, 0.4797723, -0.9033313,
-0.5825065, 0.8523742, 0.7402795, -0.7134312, -0.8162558, 0.6345438,
-0.05704138), 3, 10)
mean <- colMeans(data2_1, na.rm = TRUE)
sd <- sd(data2_1, na.rm = TRUE)

for (i in 1:2) {
  thr <- mean + i*sd

  num <- apply(data2_1, 2, function(x) {
    temp <- thr
    sum(x > (temp), na.rm = TRUE)})
}

C) To remove the 20 warning messages, I have to replace "temp <- thr"
with "temp <- mean + i*sd":
data2_1 <- matrix(c(0.9584190, 0.2710325, -0.9495618, -0.1301772, -0.7539687,
0.5344464, -0.8205933, 0.1581723, -0.5351588, 0.04448065, 0.9936430,
0.2278786, -0.8160700, -0.3314779, -0.4047975, 0.1168152, -0.7458182,
-0.2231588, -0.5051651, -0.74871174, 0.9450363, 0.4797723, -0.9033313,
-0.5825065, 0.8523742, 0.7402795, -0.7134312, -0.8162558, 0.6345438,
-0.05704138), 3, 10)
mean <- colMeans(data2_1, na.rm = TRUE)
sd <- sd(data2_1, na.rm = TRUE)

for (i in 1:2) {
  thr <- mean + i*sd

  num <- apply(data2_1, 2, function(x) {
    temp <- mean + i*sd
    sum(x > (temp), na.rm = TRUE)})
}
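The warning in B) comes from recycling: each column x handed to the
anonymous function has 3 elements, while thr has one entry per column
(length 10), and 10 is not a multiple of 3. If the intent is to compare
each column against its own threshold (an assumption about the intent,
not part of the original post), one sketch is:

```r
## count, per column, how many values exceed that column's threshold
num <- sapply(seq_len(ncol(data2_1)),
              function(j) sum(data2_1[, j] > thr[j], na.rm = TRUE))
```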

