Re: [R] aparchFit()$fitted.value

2004-09-23 Thread Uwe Ligges
Lisa wrote:
Dear R people,
I'm not able to have the component residuals, fitted.value from an
aparchFit() estimation as explain in the Value of aparchFit Help, package
fSeries.
OK, let's try together:
# starting with the second example on the cited help page:
 temp - aparchFit(ts,
   order = list(alpha.lags = 1, beta.lags = c(1, 5), delta = 1),
   opt = list(gamma=FALSE, delta=FALSE, disparm=FALSE),
   doprint=FALSE)
 temp$fitted.values # Hmmm ..
NULL
 str(temp) # Ahh!
List of 13
 $ par: num [1:4] 12.807 -1.833 -1.174 -0.687
 $ value  : num -51190
 $ counts : Named int [1:2] 167 NA
  ..- attr(*, names)= chr [1:2] function gradient
 $ convergence: int 0
[SNIP]
 - attr(*, class)= chr fAPARCH
Obviously, this is a documentation bug. Please send bug reports re. 
contributed packages to the package maintainer!

I'm CCing to the maintainer, Diethelm Wuertz.
Uwe Ligges

Could someone help me?
Thanks in advance.
Lisa
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Issue with predict() for glm models

2004-09-23 Thread Uwe Ligges
[EMAIL PROTECTED] wrote:
Hello everyone, 

I am having a problem using the predict (or the predict.glm) function in R.
Basically, I run the glm model on a training data set and try to obtain
predictions for a set of new predictors from a test data set (i.e., not the
predictors that were utilized to obtain the glm parameter estimates).
Unfortunately, every time that I attempt this, I obtain the predictions for the
predictors that were used to fit the glm model. I have looked at the R mailing
list archives and don't believe I am making the same mistakes that have been
made in the past and also have tried to closely follow the predict.glm example
in the help file. Here is an example of what I am trying to do: 


set.seed(545345)

# Necessary Variables # 


p - 2
train.n - 20
test.n - 25 
mean.vec.1 - c(1,1)
mean.vec.2 - c(0,0)

Sigma.1 - matrix(c(1,.5,.5,1),p,p)
Sigma.2 - matrix(c(1,.5,.5,1),p,p)
###
# Load MASS Library #
###
library(MASS)
###
# Data to Parameters for Logistic Regression Model #
###
train.data.1 - mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1)
train.data.2 - mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2)
train.class.var - as.factor(c(rep(1,train.n),rep(2,train.n)))
predictors.train - rbind(train.data.1,train.data.2)
##
# Test Data Where Predictions for Probabilities Using Logistic Reg.  #
# From Training Data are of Interest  #
## 

test.data.1 - mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1)
test.data.2 - mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2)
predictors.test - rbind(test.data.1,test.data.2)
##
# Run Logistic Regression on Training Data #
##
log.reg - glm(train.class.var~predictors.train,
family=binomial(link=logit))
Well, you haven't specified the data argument, but given the two 
variables directly. Exactly those variables will be used in the 
predict() step below! If you want the predict() step to work, use 
something like:

  train - data.frame(class = train.class.var,
  predictors = predictors.train)
  log.reg - glm(class ~ ., data = train,
 family=binomial(link=logit))

log.reg
# log.reg
#Call:  glm(formula = train.class.var ~ predictors.train, family =
#binomial(link = logit)) 
#
#Coefficients:
#  (Intercept)  predictors.train1  predictors.train2  
#   0.5105-0.2945-1.0811  
#
#Degrees of Freedom: 39 Total (i.e. Null);  37 Residual
#Null Deviance:  55.45 
#Residual Deviance: 41.67AIC: 47.67 

###
# Predicted Probabilities for Test Data #
###
New.Data - data.frame(predictors.train1=predictors.test[,1],
predictors.train2=predictors.test[,2])
logreg.pred.prob.test - predict.glm(log.reg,New.Data,type=response)
logreg.pred.prob.test
Instead, use:
  test - data.frame(predictors = predictors.test)
  predict(log.reg, newdata = test, type=response)
note also: please call the generic predict() rather than its glm method.
Uwe Ligges

#logreg.pred.prob.test
# [1] 0.51106406 0.15597423 0.04948404 0.03863875 0.35587589 0.71331091
# [7] 0.17320087 0.14176632 0.30966718 0.61878952 0.12525988 0.21271139
#[13] 0.70068113 0.18340723 0.10295501 0.44591568 0.72285161 0.31499339
#[19] 0.65789420 0.42750139 0.14435889 0.93008117 0.70798465 0.80109005
#[25] 0.89161472 0.47480625 0.56520952 0.63981834 0.57595189 0.60075882
#[31] 0.96493393 0.77015507 0.87643986 0.62973986 0.63043351 0.45398955
#[37] 0.80855782 0.90835588 0.54809117 0.11568637

Of course, notice that the vector for the predicted probabilities has only 40
elements, while the New.Data has 50 elements (since n.test has 25 per group
for 2 groups) and thus should have 50 predicted probabilities. As it turns out,
the output is for the training data predictors and not for the New.Data as I
would like it to be. I should also note that I have made sure that the names
for the predictors in the New.Data are the same as the names for the
predictors within the glm object (i.e., within log.reg) as this is what is
done in the example for predict.glm() within the help files. 

Could some one help me understand either what I am doing incorrectly or what
problems there might be within the predict() function? I should mention that I
tried the same program using predict.glm() and obtained the same problematic
results. 

Thanks and take care, 

Joe 

Joe Rausch, M.A. 
Psychology Liaison 
Lab for Social Research 
917 Flanner Hall 
University of Notre Dame 
Notre Dame, IN 46556
(574) 631-3910

If we knew what it was we were doing, it would not be called research, would
it?
- Albert Einstein

Re: [R] Facing problems with C code compilation - Please help.

2004-09-23 Thread Peter Dalgaard
Liaw, Andy [EMAIL PROTECTED] writes:

 Read and follow the instructions in c:\Program
 Files\R\rw1091\README.packages _very_, _very_ carefully.  Stray from it even
 a bit and you get what you deserve.

Well, not what you want, anyway... (In this case, the first problem
seems to be that you need make.exe from the toolkit *and* to have it
in your path.)
 
 Andy
 
  From: neha chaudhry
  
  Hello,
  
  I started using R a month ago - so I am a novice in this 
  area. I am stuck with a problem and need some help urgently.
  I am using windows version of R 1.9.1. I am trying to compile 
  C code in it. I have my C code - hello.c is lying in 
  C:\Program Files\R\rw1091
  
  This code is - 
  
  #include R.h
  void hello(int *n)
  {
int i;
for(i=0;i *n; i++)
{
   Rprintf(Hello World ! \n);
}
  }
  
  ===
  Code hello1.R is also lying in the same directory.
  This code is -
  hello2 - function(n)
   {
 .C(hello, as.integer))

Make that as.integer(n) or things will surely come crashing down...

   }

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] t test problem?

2004-09-23 Thread Ramon Diaz-Uriarte
On Wednesday 22 September 2004 13:07, Ted Harding wrote:
 On 22-Sep-04 kan Liu wrote:
  Hi, Many thanks for your helpful comments and suggestions. The attached
  are the data in both log10 scale and original scale. It would be very
  grateful if you could suggest which version of test should be used.
 
  By the way, how to check whether the variation is additive (natural
  scale) or multiplicative (log scale) in R? How to check whether the
  distribution of the data is normal?

 As for additive vs multiplicative, this can only be judged in terms
 of the process by which the values are created in the real world.


Just my 2 cents: I often find it helpful to ask myself (or the client) 
whether, if there was a difference (something) between the two samples, 
I/she/he thinks the appropriate model is (please, read the = as approx. 
equal)

sample.1 = sample.2 + something [1]

OR

sample.1 = sample.2 * something [2]

(i.e., the ratio of means is a constant: sample.1/sample.2 = something)

which, by log transforming becomes

log(sample.1) = log(sample.2) + log(something)

I am not including here the issue of error distribution, but often times when 
the model for the means is like [2] the error terms are multiplicative (i.e., 
additive in the log scale). At least in many biological and engineering 
problems it is often evident whether [1] or [2] should be appropriate for the 
data, given what we know about the subject.

Best,

R.

 As for normality vs non-normality, an appraisal can often be made
 simply by looking at a histogram of the data.

 In your case, the commands
   hist(x,breaks=1*(0:100))
   hist(y,breaks=1*(0:100))
 indicate that the distributions of x and y do not look at all
 normal, since they both have considerable positive skewness
 (i.e. long upper tails relative to the main mass of the distribution).

 This does strongly suggest that a logarithmic transformation would
 give data which are more nearly normally distributed, as indeed
 is confirmed by the commands
   hist(log(x))
   hist(log(y))
 though in both cases the histograms show some irregularity compared
 with what you would expect from a sample from a normal distribution:
 the commands
   hist(log(x),breaks=0.2*(40:80))
   hist(log(y),breaks=0.2*(40:80))
 show that log(x) has an excessive peak at around 11.7,
 while log(y) has holes at around 11.1 and 12.1.

 Nevertheless, this inspection of the data shows that the use of
 log(x) and log(y) will come much closer to fulfilling the conditions
 of validity of the t test than using the raw data x and y.

 However, it is not merely the *normality* of each which is needed:
 the conditions for the usual t test also require that the two
 populations sampled for log(x) and log(y) should have the same
 standard deviations. In your case, this also turns out to be

 nearly enough true:
sd(log(x))

   [1] 0.902579

sd(log(y))

   [1] 0.9314807

  PS, Can I confirm that do your suggestions mean that in order to check
  whether there is a difference between x and y in terms of mean I need
  check the distribution of x and that of y in both natual and log scales
  and to see which present normal distribution?

 See above for an approach to this: the answer to your question is,
 in effect, yes. It could of course have happened that neither the
 raw nor the log scale would be satisfactory, in which case you would
 need to consider other possibilities. And, if the SDs had turned out
 to be very different, you should not use the standard t test but
 a variant which is adpated to the situation (e.g. the Welch test).

 You can, of course, also perform formal tests for skewness, for
 normality, and for equality of variances.

 Best wishes,
 Ted.


 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861   [NB: New number!]
 Date: 22-Sep-04   Time: 12:07:07
 -- XFMail --

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Best device for printing quality

2004-09-23 Thread javier garcia - CEBAS
Hi all;
Just to ask you for your advise about what is the best way to get the best 
quality for graphics to be incorporated into a printed article

(I'm mainly a Linux useR, but also use the windows R version)

Thanks and best regards,

Javier Garcia

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Best device for printing quality

2004-09-23 Thread Wolski
Hi!

I would first check what kind of graphic format the article publisher of the article 
accepts.

If ps/pdf is ok I would use ?postscript. Otherwise I would change the journal.

/E

*** REPLY SEPARATOR  ***

On 9/24/2004 at 11:12 AM javier garcia - CEBAS wrote:

Hi all;
Just to ask you for your advise about what is the best way to get the
best 
quality for graphics to be incorporated into a printed article

(I'm mainly a Linux useR, but also use the windows R version)

Thanks and best regards,

Javier Garcia

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



Dipl. bio-chem. Witold Eryk Wolski @ MPI-Moleculare Genetic   
Ihnestrasse 63-73 14195 Berlin'v'
tel: 0049-30-83875219/   \   
mail: [EMAIL PROTECTED]---W-Whttp://www.molgen.mpg.de/~wolski 
  [EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] aparchFit()$fitted.value

2004-09-23 Thread Lisa
...unfortunately I was suspicious of this.
Could You give me some insight to obtain that values?

Thanks a lot!
Lisa

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] block statistics with POSIX classes

2004-09-23 Thread Gabor Grothendieck
Kahra Hannu kahra at mpsgr.it writes:

: 
: I have a monthly price index series x, the related return series y = diff(log
(x)) and a POSIXlt date-time
: variable dp. I would like to apply annual blocks to compute for example 
annual block maxima and mean of y.
: 
: When studying the POSIX classes, in the first stage of the learning curve, I 
computed the maximum drawdown
: of x:
:  mdd - maxdrawdown(x)
:  max.dd - mdd$maxdrawdown
:  from - as.character(dp[mdd$from]) 
:  to - as.character(dp[mdd$to])   
:  from; to
: [1] 2000-08-31
: [1] 2003-03-31
: that gives me the POSIX dates of the start and end of the period and 
suggests that I have done something correctly.
: 
: Two questions:
: (1) how to implement annual blocks and compute e.g. annual max, min and mean 
of y (each year's max, min, mean)?
: (2) how to apply POSIX variables with the 'block' argument in gev in the 
evir package?
: 
: The S+FinMetrics function aggregateSeries does the job in that module; but I 
do not know, how handle it in R.
: My guess is that (1) is done by using the function aggregate, but how to 
define the 'by' argument with POSIX variables?


1. To create a ts monthly time series you specify the first month
and a frequency of 12 like this.  

z.m - ts(rep(1:6,4), start = c(2000,1), freq = 12)
z.m

# Annual aggregate is done using aggregate.ts with nfreq = 1
z.y - aggregate(z.m, nfreq = 1, max)
z.y

# To create a POSIXct series of times using seq
# (This will use GMT.  Use tz= arg to ISOdate if you want current tz.)
z.y.times - seq(ISOdate(2000,1,1), length = length(z.y), by = year)
z.y.times

2. Have not used evir but looking at ?gev it seems you can
use block = 12 if you have monthly data and want the blocks to be 
successive 12 month periods or you can add a POSIXct times attribute to 
your data as below (also see comment re tz above) and then use 
block = year in your gev call.

attr(z.m, times) - seq(ISOdate(2000,1,1), length=length(z.m), by=month)
str(z.m)  # display z.m along with attribute info

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] multinomial logistic regression

2004-09-23 Thread David Firth
On 23 Sep 2004, at 01:36, array chip wrote:
Hi, how can I do multinomial logistic regression in R?
I think glm() can only handle binary response
variable, and polr() can only handle ordinal response
variable. how to do logistic regression with
multinomial response variable?
Perhaps multinom() in package nnet (in bundle VR) will do what you want?
Regards,
David
Thanks

__
Y! Messenger - Communicate in real time. Download now.
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] nnet with weights parameter: odd error

2004-09-23 Thread Christoph Lehmann
Dear R-users
I use nnet for a classification (2 classes) problem. I use the code 
CVnn1, CVnn2 as described in  VR.

The thing I changed to the code is: I define the (class) weight for each 
observation in each cv 'bag' and give the vector of weights as parameter 
of nnet(..weights = weight.vector...)

Unfortunately I get an error during some (but not all!) inner-fold cv runs:
Error in model.frame(formula, rownames, variables, varnames,
extras, extranames,  :
variable lengths differ
If you just remove the weights parameter in nnet() it runs fine!!
I debugged the code but could not resolve the problem- it is really very 
strange and I need your help! I tried it very simple in defining the 
weights as = 1 for each obs (as it is by default)!:

CVnn2 - function(formula, data,
  size = c(0,4,4,10,10), lambda = c(0, rep(c(0.001, 
0.01),2)),
  nreps = 1, nifold = 5, verbose = 99, ...)
{
resmatrix - function(predict.matrix, learn, data, ri, i)
{
   rae.matrix -   predict.matrix
   rae.matrix[,] - 0
   rae.vector - as.numeric(as.factor((predict(learn, data[ri == i,],
   type = class
   for (k in 1:dim(rae.matrix)[1]) {
 if (rae.vector[k] == 1)
 rae.matrix[k,1] - rae.matrix[k,1] + 1
 else
 rae.matrix[k,2] - rae.matrix[k,2] + 1
   }
   rae.matrix
}

CVnn1 - function(formula, data, nreps=1, ri, verbose,  ...)
{
totalerror - 0
truth - data[,deparse(formula[[2]])]
res -  matrix(0, nrow(data), length(levels(truth)))
if(verbose  20) cat(  inner fold)
for (i in sort(unique(ri))) {
if(verbose  20) cat( , i,  sep=)
data.training - data[ri != i,]$GROUP
weight.vector - rep(1, dim(data[ri !=i,])[1])
for(rep in 1:nreps) {
learn - nnet(formula, data[ri !=i,],
  weights = weight.vector,
  trace = F, ...)
#res[ri == i,] - res[ri == i,] + predict(learn, 
data[ri == i,])
res[ri == i,] - res[ri == i,] + resmatrix(res[ri == i,],
   learn, data, 
ri, i)
}
}
if(verbose  20) cat(\n)
sum(as.numeric(truth) != max.col(res/nreps))
}
truth - data[,deparse(formula[[2]])]
res -  matrix(0, nrow(data), length(levels(truth)))
choice - numeric(length(lambda))
for (i in sort(unique(rand))) {
if(verbose  0) cat(fold , i,\n, sep=)
set.seed(i*i)
ri - sample(nifold, sum(rand!=i), replace=T)
for(j in seq(along=lambda)) {
if(verbose  10)
cat(  size =, size[j], decay =, lambda[j], \n)
choice[j] - CVnn1(formula, data[rand != i,], nreps=nreps,
   ri=ri, size=size[j], decay=lambda[j],
   verbose=verbose, ...)
}
decay - lambda[which.is.max(-choice)]
csize - size[which.is.max(-choice)]
if(verbose  5) cat(  #errors:, choice,   ) #
if(verbose  1) cat(chosen size = , csize,
 decay = , decay, \n, sep=)
for(rep in 1:nreps) {
data.training - data[rand != i,]$GROUP
weight.vector - rep(1, dim(data[rand !=i,])[1])
learn - nnet(formula, data[rand != i,],
  weights = weight.vector,
  trace=F,
  size=csize, decay=decay, ...)
#res[rand == i,] - res[rand == i,] + predict(learn, 
data[rand == i,])
res[rand == i,] - res[rand == i,] + resmatrix(res[rand == 
i,],learn,data, rand, i)
}
}
factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}


res.nn2 - CVnn2(GROUP ~ ., rae.data.subsetted1, skip = T, maxit = 500,
 nreps = cv.repeat)
con(true = rae.data.subsetted$GROUP, predicted = res.nn2)

###
Coordinates:
platform i686-pc-linux-gnu
arch i686
os   linux-gnu
system   i686, linux-gnu
status
major1
minor9.1
year 2004
month06
day  21
language R

Thanks a lot
Best regards
Christoph
--
Christoph LehmannPhone:  ++41 31 930 93 83
Department of Psychiatric NeurophysiologyMobile: ++41 76 570 28 00
University Hospital of Clinical Psychiatry   Fax:++41 31 930 99 61
Waldau[EMAIL PROTECTED]
CH-3000 Bern 60 http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] multinomial logistic regression

2004-09-23 Thread Pierre BADY
hi all,

you could have look at the library(VGAM).

http://www.stat.auckland.ac.nz/~yee/VGAM/


hope this helps


P.BADY


At 17:36 22/09/2004 -0700, array chip wrote:
Hi, how can I do multinomial logistic regression in R?
I think glm() can only handle binary response
variable, and polr() can only handle ordinal response
variable. how to do logistic regression with
multinomial response variable?

Thanks



__

Y! Messenger - Communicate in real time. Download now.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Pierre BADY °)
Université Claude Bernard Lyon 1
UMR CNRS 5023, LEHF
bat Alphonse Forel
43 boulevard du 11 novembre 1918
F-69622 VILLEURBANNE CEDEX
FRANCE
TEL : +33 (0)4 72 44 62 34
FAX : +33 (0)4 72 43 28 92
MEL : [EMAIL PROTECTED]
http://limnologie.univ-lyon1.fr
[[alternative HTML version deleted]]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] ordered probit and cauchit

2004-09-23 Thread Kjetil Brinchmann Halvorsen
roger koenker wrote:
What is the current state of the R-art for ordered probit models, and 
more
esoterically is there any available R strategy for ordered cauchit 
models,
i.e. ordered multinomial alternatives with a cauchy link function.  MCMC
is an option, obviously, but for a univariate latent variable model 
this seems
to be overkill... standard mle methods should be preferable.  (??)

Googling reveals that spss provides such functions... just to wave a red
flag.
I find
polr(MASS) Ordered Logistic or Probit Regression
MCMCoprobit(MCMCpack)  Markov chain Monte Carlo for Ordered Probit
  Regression
and in Jim Lindsey's gnlm  there is
nordr   Nonlinear Ordinal Regression
ordglm  Generalized Linear Ordinal Regression
Kjetil
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] FYI: DeveloperWorks News article on R

2004-09-23 Thread Pikounis, Bill

First of a 3 part series, apparently:

http://www-106.ibm.com/developerworks/linux/library/l-r1/?ca=dgr-lnxw02aAre
http://www-106.ibm.com/developerworks/linux/library/l-r1/?ca=dgr-lnxw02aAre
 

(Apologies if someone has already posted this link to R-help)

Best Regards,
Bill


Bill Pikounis, Ph.D.

Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY33-300  
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900
USA

Phone: 732 594 3913
Fax: 732 594 1565

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [Job Ad] Position at Merck Research Laboratories, NJ USA

2004-09-23 Thread Pikounis, Bill
Please direct *all* inquiries to Vladimir Svetnik, the hiring manager. His
contact information is below.

(Also, please accept my apologies, for cross-posting to those subscribed to
both R-help and S-news, and for any offense since this is a repost I made of
the same position month ago. That short time spacing will not be repeated.)

Thanks,
Bill



Job description:  Computational statistician/biometrician   

The Biometrics Research Department at Merck Research Laboratories, Merck 
Co., Inc. in Rahway, NJ, is seeking a highly motivated statistician/data
analyst to work in its basic research and drug discovery area.  The
applicant should have broad expertise in statistical computing.   Experience
and/or education relevant to signal processing, image processing, pattern
recognition and machine learning are preferred.  The position will involve
providing statistical, mathematical, and software development support for
one or more of following areas: medical imaging, biological signal analysis,
and computational chemistry.   We are looking for a Ph.D. with a background
and/or post-doctoral experience in at least one of the following fields:
Statistics, Electrical/Computer or Biomedical Engineering, Computer Science,
Applied Mathematics, or Physics.   Advanced computer programming skills
(including, but not limited to R, S-PLUS, Matlab, C/C++) and excellent
communication skills are essential. An ability to lead statistical analysis
efforts within a multidisciplinary team is required.   The position may also
involve general statistical consulting and training.

Our dedication to delivering quality medicines in innovative ways and our
commitment to bringing out the best in our people are just some of the
reasons why we're ranked among Fortune magazine's 100 Best Companies to
Work for in America.  We offer a competitive salary, an outstanding
benefits package, and a professional work environment with a company known
for scientific excellence.  To apply, please forward your CV or resume and
cover letter to

ATTENTION: Open Position  
Vladimir Svetnik, Ph.D.  
Biometrics Research Dept.  
Merck Research Laboratories
RY33-300
126 E. Lincoln Avenue
Rahway, NJ 07065-0900 
[EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Issue with predict() for glm models

2004-09-23 Thread John Fox
Dear Uwe,

Unless I've somehow messed this up, as I mentioned yesterday, what you
suggest doesn't seem to work when the predictor is a matrix. Here's a
simplified example:

 X - matrix(rnorm(200), 100, 2)
 y - (X %*% c(1,2) + rnorm(100))  0
 dat - data.frame(y=y, X=X)
 mod - glm(y ~ X, family=binomial, data=dat)
 new - data.frame(X = matrix(rnorm(20),2))
 predict(mod, new)
   12345
6 
  1.81224443  -5.92955128   1.98718051 -10.05331521   2.6506
-2.50635812 
   789   10   11
12 
  5.63728698  -0.94845276  -3.61657377  -1.63141320   5.03417372
1.80400271 
  13   14   15   16   17
18 
  9.32876273  -5.32723406   5.29373023  -3.90822713 -10.95065186
4.90038016 

 . . .

   97   98   99  100 
 -6.92509812   0.59357486  -1.17205723   0.04209578 


Note that there are 100 rather than 10 predicted values.

But with individuals predictors (rather than a matrix),

 x1 - X[,1]
 x2 - X[,2]
 dat.2 - data.frame(y=y, x1=x1, x2=x2)
 mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2)
 new.2 - data.frame(x1=rnorm(10), x2=rnorm(10))
 predict(mod.2, new.2)
 1  2  3  4  5  6  7

 6.5723823  0.6356392  4.0291018 -4.7914650  2.1435485 -3.1738096 -2.8261585

 8  9 10 
-1.5255329 -4.7087592  4.0619290 

works as expected (?).

Regards,
 John
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges
 Sent: Thursday, September 23, 2004 1:33 AM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] Issue with predict() for glm models
 
 [EMAIL PROTECTED] wrote:
 
  Hello everyone,
  
  I am having a problem using the predict (or the 
 predict.glm) function in R.
  Basically, I run the glm model on a training data set and try to 
  obtain predictions for a set of new predictors from a 
 test data set 
  (i.e., not the predictors that were utilized to obtain the 
 glm parameter estimates).
  Unfortunately, every time that I attempt this, I obtain the 
  predictions for the predictors that were used to fit the 
 glm model. I 
  have looked at the R mailing list archives and don't believe I am 
  making the same mistakes that have been made in the past 
 and also have 
  tried to closely follow the predict.glm example in the help 
 file. Here is an example of what I am trying to do:
  
  
  set.seed(545345)
  
  
  # Necessary Variables #
  
  
  p - 2
  train.n - 20
  test.n - 25
  mean.vec.1 - c(1,1)
  mean.vec.2 - c(0,0)
  
  Sigma.1 - matrix(c(1,.5,.5,1),p,p)
  Sigma.2 - matrix(c(1,.5,.5,1),p,p)
  
  ###
  # Load MASS Library #
  ###
  
  library(MASS)
  
  ###
  # Data to Parameters for Logistic Regression Model # 
  ###
  
  train.data.1 - mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1)
  train.data.2 - mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2)
  train.class.var - as.factor(c(rep(1,train.n),rep(2,train.n)))
  predictors.train - rbind(train.data.1,train.data.2)
  
  ##
  # Test Data Where Predictions for Probabilities Using 
 Logistic Reg.  #
  # From Training Data are of Interest
   #
  ##
  
  test.data.1 - mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1)
  test.data.2 - mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2)
  predictors.test - rbind(test.data.1,test.data.2)
  
  ##
  # Run Logistic Regression on Training Data # 
  ##
  
  log.reg - glm(train.class.var~predictors.train,
  family=binomial(link=logit))
 
 Well, you haven't specified the data argument, but given 
 the two variables directly. Exactly those variables will be 
 used in the
 predict() step below! If you want the predict() step to work, 
 use something like:
 
train - data.frame(class = train.class.var,
predictors = predictors.train)
log.reg - glm(class ~ ., data = train,
   family=binomial(link=logit))
 
 
 
  log.reg
  
  # log.reg
  
  #Call:  glm(formula = train.class.var ~ predictors.train, family = 
  #binomial(link = logit)) #
  #Coefficients:
  #  (Intercept)  predictors.train1  predictors.train2  
  #   0.5105-0.2945-1.0811  
  #
  #Degrees of Freedom: 39 Total (i.e. Null);  37 Residual
  #Null Deviance:  55.45 
  #Residual Deviance: 41.67AIC: 47.67 
  
  ###
  # Predicted Probabilities for Test Data # 
 ###
  
  New.Data - data.frame(predictors.train1=predictors.test[,1],
  predictors.train2=predictors.test[,2])
  
  logreg.pred.prob.test - 
 

Re: [R] Issue with predict() for glm models

2004-09-23 Thread Uwe Ligges
John Fox wrote:
Dear Uwe,
Unless I've somehow messed this up, as I mentioned yesterday, what you
suggest doesn't seem to work when the predictor is a matrix. Here's a
simplified example:

X - matrix(rnorm(200), 100, 2)
y - (X %*% c(1,2) + rnorm(100))  0
dat - data.frame(y=y, X=X)
mod - glm(y ~ X, family=binomial, data=dat)
new - data.frame(X = matrix(rnorm(20),2))
predict(mod, new)
Dear John,
the questioner had a 2 column matrix with 40 and one with 50 
observations (not a 100 column matrix with 2 observation) and for those 
matrices it works ...

Best,
Uwe



   12345
6 
  1.81224443  -5.92955128   1.98718051 -10.05331521   2.6506
-2.50635812 
   789   10   11
12 
  5.63728698  -0.94845276  -3.61657377  -1.63141320   5.03417372
1.80400271 
  13   14   15   16   17
18 
  9.32876273  -5.32723406   5.29373023  -3.90822713 -10.95065186
4.90038016 

 . . .
   97   98   99  100 
 -6.92509812   0.59357486  -1.17205723   0.04209578 

Note that there are 100 rather than 10 predicted values.
But with individuals predictors (rather than a matrix),

x1 - X[,1]
x2 - X[,2]
dat.2 - data.frame(y=y, x1=x1, x2=x2)
mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2)
new.2 - data.frame(x1=rnorm(10), x2=rnorm(10))
predict(mod.2, new.2)
 1  2  3  4  5  6  7
 6.5723823  0.6356392  4.0291018 -4.7914650  2.1435485 -3.1738096 -2.8261585
 8  9 10 
-1.5255329 -4.7087592  4.0619290 

works as expected (?).
Regards,
 John
 


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges
Sent: Thursday, September 23, 2004 1:33 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

[EMAIL PROTECTED] wrote:

Hello everyone,
I am having a problem using the predict (or the 
predict.glm) function in R.
Basically, I run the glm model on a training data set and try to 
obtain predictions for a set of new predictors from a 
test data set 

(i.e., not the predictors that were utilized to obtain the 
glm parameter estimates).
Unfortunately, every time that I attempt this, I obtain the 
predictions for the predictors that were used to fit the 
glm model. I 

have looked at the R mailing list archives and don't believe I am 
making the same mistakes that have been made in the past 
and also have 

tried to closely follow the predict.glm example in the help 
file. Here is an example of what I am trying to do:

set.seed(545345)

# Necessary Variables #

p - 2
train.n - 20
test.n - 25
mean.vec.1 - c(1,1)
mean.vec.2 - c(0,0)
Sigma.1 - matrix(c(1,.5,.5,1),p,p)
Sigma.2 - matrix(c(1,.5,.5,1),p,p)
###
# Load MASS Library #
###
library(MASS)
###
# Data to Parameters for Logistic Regression Model # 
###

train.data.1 - mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1)
train.data.2 - mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2)
train.class.var - as.factor(c(rep(1,train.n),rep(2,train.n)))
predictors.train - rbind(train.data.1,train.data.2)
##
# Test Data Where Predictions for Probabilities Using 
Logistic Reg.  #
# From Training Data are of Interest
 #
##
test.data.1 - mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1)
test.data.2 - mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2)
predictors.test - rbind(test.data.1,test.data.2)
##
# Run Logistic Regression on Training Data # 
##

log.reg - glm(train.class.var~predictors.train,
family=binomial(link=logit))
Well, you haven't specified the data argument, but given 
the two variables directly. Exactly those variables will be 
used in the
predict() step below! If you want the predict() step to work, 
use something like:

  train - data.frame(class = train.class.var,
  predictors = predictors.train)
  log.reg - glm(class ~ ., data = train,
 family=binomial(link=logit))


log.reg
# log.reg
#Call:  glm(formula = train.class.var ~ predictors.train, family = 
#binomial(link = logit)) #
#Coefficients:
#  (Intercept)  predictors.train1  predictors.train2  
#   0.5105-0.2945-1.0811  
#
#Degrees of Freedom: 39 Total (i.e. Null);  37 Residual
#Null Deviance:  55.45 
#Residual Deviance: 41.67AIC: 47.67 

###
# Predicted Probabilities for Test Data # 
###
New.Data - data.frame(predictors.train1=predictors.test[,1],
predictors.train2=predictors.test[,2])
logreg.pred.prob.test - 

Re: [R] block statistics with POSIX classes

2004-09-23 Thread Gabor Grothendieck
I am not sure that I understand what you are looking
for since you did not provide any sample data with
corresponding output to clarify your question.

Here is another guess.

If y is just a numeric vector with monthly data 
and dp is a POSIXlt vector of the same length then:

   aggregate(list(y=y), list(dp$year), mean)$y

will perform aggregation, as will

  aggregate(ts(y, start=c(d$year[1],d$mon[1]+1), freq = 12), nfreq=1, mean)

which converts y to ts and then performs the aggregation.  The first
one will work even if y is irregular while the second one assumes that
y must be monthly.  The second one returns a ts object.

By the way, I had a look at gev's source and it seems that despite the
documentation it does not use POSIXct anywhere internally.  If the
block is year or other character value then it simply assumes that
whatever datetime class is used has an as.POSIXlt method.  If your dates
were POSIXct rather than POSIXlt then it would be important to ensure
that whatever timezone is assumed (which I did not check) in the conversion 
is the one you are using.  You could use character dates or Date class to 
avoid this problem.  Since you appear to be using POSIXlt datetimes from
the beginning I think you should be ok.


Kahra Hannu kahra at mpsgr.it writes:

: 
: Thank you Petr and Gabor for the answers.
: 
: They did not, however, solve my original problem. When I have a monthly time 
series y with a POSIX date
: variable dp, the most obvious way to compute e.g. the annual means is to use 
aggregate(y, 1, mean) that
: works with time series. However, I got stuck with the idea of using the 'by' 
argument as by = dp$year. But in
: that case y has to be a data.frame. The easiest way must be the best way.
: 
: Regards,
: Hannu 
: 
: -Original Message-
: From: r-help-bounces at stat.math.ethz.ch
: [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Gabor Grothendieck
: Sent: Thursday, September 23, 2004 12:56 PM
: To: r-help at stat.math.ethz.ch
: Subject: Re: [R] block statistics with POSIX classes
: 
: 
: Kahra Hannu kahra at mpsgr.it writes:
: 
: : 
: : I have a monthly price index series x, the related return series y = diff
(log
: (x)) and a POSIXlt date-time
: : variable dp. I would like to apply annual blocks to compute for example 
: annual block maxima and mean of y.
: : 
: : When studying the POSIX classes, in the first stage of the learning curve, 
I 
: computed the maximum drawdown
: : of x:
: :  mdd - maxdrawdown(x)
: :  max.dd - mdd$maxdrawdown
: :  from - as.character(dp[mdd$from]) 
: :  to - as.character(dp[mdd$to])   
: :  from; to
: : [1] 2000-08-31
: : [1] 2003-03-31
: : that gives me the POSIX dates of the start and end of the period and 
: suggests that I have done something correctly.
: : 
: : Two questions:
: : (1) how to implement annual blocks and compute e.g. annual max, min and 
mean 
: of y (each year's max, min, mean)?
: : (2) how to apply POSIX variables with the 'block' argument in gev in the 
: evir package?
: : 
: : The S+FinMetrics function aggregateSeries does the job in that module; but 
I 
: do not know, how handle it in R.
: : My guess is that (1) is done by using the function aggregate, but how to 
: define the 'by' argument with POSIX variables?
: 
: 1. To create a ts monthly time series you specify the first month
: and a frequency of 12 like this.  
: 
: z.m - ts(rep(1:6,4), start = c(2000,1), freq = 12)
: z.m
: 
: # Annual aggregate is done using aggregate.ts with nfreq = 1
: z.y - aggregate(z.m, nfreq = 1, max)
: z.y
: 
: # To create a POSIXct series of times using seq
: # (This will use GMT.  Use tz= arg to ISOdate if you want current tz.)
: z.y.times - seq(ISOdate(2000,1,1), length = length(z.y), by = year)
: z.y.times
: 
: 2. Have not used evir but looking at ?gev it seems you can
: use block = 12 if you have monthly data and want the blocks to be 
: successive 12 month periods or you can add a POSIXct times attribute to 
: your data as below (also see comment re tz above) and then use 
: block = year in your gev call.
: 
: attr(z.m, times) - seq(ISOdate(2000,1,1), length=length(z.m), by=month)
: str(z.m)  # display z.m along with attribute info
: 
: __
: R-help at stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
: 
: __
: R-help at stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
: 
:

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] gsub

2004-09-23 Thread Jean Eid
Hi

A while back I used gsub to do the following

temp-000US00231
gsub(something here, , temp)
00231

I think it involved the `meta characters' somehow.

I do not know how to do this anymore. I know strsplit will also work but I
remember gsub was much faster.  In essence the question is how to delete
all characters before a particular pattern.

If anyone has some help file for this, it will be greatly appreciated.


Jean Eid

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] gsub

2004-09-23 Thread Sean Davis
You might want to look at ?regex.
Sean
On Sep 23, 2004, at 10:03 AM, Jean Eid wrote:
Hi
A while back I used gsub to do the following
temp-000US00231
gsub(something here, , temp)
00231
I think it involved the `meta characters' somehow.
I do not know how to do this anymore. I know strsplit will also work 
but I
remember gsub was much faster.  In essence the question is how to 
delete
all characters before a particular pattern.

If anyone has some help file for this, it will be greatly appreciated.
Jean Eid
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] nnet and weights: error analysis using VR example

2004-09-23 Thread Christoph Lehmann
Dear R-users, dear Prof. Ripley as package maintainer
I tried to investigate the odd error, when I call nnet together with a 
'weights' parameter, using the 'fgl' example in VR p 348

The error I get is:
Error in eval(expr, envir, enclos) : Object w not found
I think it is a kind of scoping problem, but I really cannot see, what 
the problem exactly is.

and here is my code: the only thing which changed is the definition of a 
weight-parameter ('w') which is given to the nnet-call. Of course the 
weight vector with '1's makes no sense here, but it will be defined 
according to the class sizes later.

###
library(MASS)
data(flg)
con - function(...)
{
print(tab - table(...))
diag(tab) - 0
cat(error rate = ,
round(100*sum(tab)/length(list(...)[[1]]), 2), %\n)
invisible()
}
set.seed(123)
rand - sample(10, dim(fgl)[1], replace = T)
fgl1 - fgl
fgl1[1:9] - lapply(fgl[, 1:9], function(x) {r - range(x); (x - 
r[1])/diff(r)})

CVnn2 - function(formula, data,
  size = c(0,4,4,10,10), lambda = c(0, rep(c(0.001, 
0.01),2)),
  nreps = 1, nifold = 5, verbose = 99, ...)
{

CVnn1 - function(formula, data, nreps=1, ri, verbose,  ...)
{
totalerror - 0
truth - data[,deparse(formula[[2]])]
res -  matrix(0, nrow(data), length(levels(truth)))
if(verbose  20) cat(  inner fold)
for (i in sort(unique(ri))) {
if(verbose  20) cat( , i,  sep=)
data.training - data[ri != i,]$GROUP
w - rep(1, dim(data[ri !=i,])[1])
for(rep in 1:nreps) {
learn - nnet(formula, data[ri !=i,],
  weights = w,
  trace = F, ...)
res[ri == i,] - res[ri == i,] + predict(learn, data[ri 
== i,])

}
}
if(verbose  20) cat(\n)
sum(as.numeric(truth) != max.col(res/nreps))
}
truth - data[,deparse(formula[[2]])]
res -  matrix(0, nrow(data), length(levels(truth)))
choice - numeric(length(lambda))
for (i in sort(unique(rand))) {
if(verbose  0) cat(fold , i,\n, sep=)
set.seed(i*i)
ri - sample(nifold, sum(rand!=i), replace=T)
for(j in seq(along=lambda)) {
if(verbose  10)
cat(  size =, size[j], decay =, lambda[j], \n)
choice[j] - CVnn1(formula, data[rand != i,], nreps=nreps,
   ri=ri, size=size[j], decay=lambda[j],
   verbose=verbose, ...)
}
decay - lambda[which.is.max(-choice)]
csize - size[which.is.max(-choice)]
if(verbose  5) cat(  #errors:, choice,   ) #
if(verbose  1) cat(chosen size = , csize,
 decay = , decay, \n, sep=)
for(rep in 1:nreps) {
data.training - data[rand != i,]$GROUP
w - rep(1, dim(data[rand !=i,])[1])
learn - nnet(formula, data[rand != i,],
  weights = w,
  trace=F,
  size=csize, decay=decay, ...)
res[rand == i,] - res[rand == i,] + predict(learn, 
data[rand == i,])
}
}
factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

res.nn2 - CVnn2(type ~ ., fgl1, skip = T, maxit = 500, nreps = 10)
con(true = fgl$type, predicted = res.nn2)
##
many thanks for your help
Christoph
###
Coordinates:
platform i686-pc-linux-gnu
arch i686
os   linux-gnu
system   i686, linux-gnu
status
major1
minor9.1
year 2004
month06
day  21
language R
--
Christoph LehmannPhone:  ++41 31 930 93 83
Department of Psychiatric NeurophysiologyMobile: ++41 76 570 28 00
University Hospital of Clinical Psychiatry   Fax:++41 31 930 99 61
Waldau[EMAIL PROTECTED]
CH-3000 Bern 60 http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] detection of outliers

2004-09-23 Thread Phguardiol
Hi,
this is both a statistical and a R question...
what would the best way / test to detect an outlier value among a series of 10 to 30 
values ? for instance if we have the following dataset: 10,11,12,15,20,22,25,30,500 I 
d like to have a way to identify the last data as an outlier (only one direction). One 
way would be to calculate abs(mean - median) and if elevated (to what extent ?) delete 
the extreme data then redo.. but is it valid to do so with so few data ? is the 
(trimmed mean - mean) more efficient ? if so, what would be the maximal tolerable 
value to use as a threshold ? (I guess it will be experiment dependent...) tests for 
skweness will probably required a larger dataset ? 
any suggestions are very welcome !
thanks for your help
Philippe Guardiola, MD

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Issue with predict() for glm models

2004-09-23 Thread John Fox
Dear Uwe, 

 -Original Message-
 From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 23, 2004 8:06 AM
 To: John Fox
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [R] Issue with predict() for glm models
 
 John Fox wrote:
 
  Dear Uwe,
  
  Unless I've somehow messed this up, as I mentioned 
 yesterday, what you 
  suggest doesn't seem to work when the predictor is a 
 matrix. Here's a 
  simplified example:
  
  
 X - matrix(rnorm(200), 100, 2)
 y - (X %*% c(1,2) + rnorm(100))  0
 dat - data.frame(y=y, X=X)
 mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = 
 matrix(rnorm(20),2)) predict(mod, new)
 
 Dear John,
 
 the questioner had a 2 column matrix with 40 and one with 50 
 observations (not a 100 column matrix with 2 observation) and 
 for those matrices it works ...
 

Indeed, and in my example the matrix predictor X has 2 columns and 100 rows;
I did screw up the matrix for the new data to be used for predictions (in
the example I sent today but not yesterday), but even when this is done
right -- where the new data has 10 rows and 2 columns -- there are 100 (not
10) predicted values:

 X - matrix(rnorm(200), 100, 2)  # original predictor matrix with 100 rows
 y - (X %*% c(1,2) + rnorm(100))  0
 dat - data.frame(y=y, X=X)
 mod - glm(y ~ X, family=binomial, data=dat)
 new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows
 predict(mod, new) # note 100 predicted values
   12345
6 
  5.75238091   0.31874587  -3.00515893  -3.77282121  -1.97511221
0.54712914 
   789   10   11
12 
  1.85091226   4.38465524  -0.41028694  -1.53942869   0.57613555
-1.82761518 

 . . .

  91   92   93   94   95
96 
  0.36210780   1.71358713  -9.63612775  -4.54257576  -5.29740468
2.64363405 
  97   98   99  100 
 -4.45478627  -2.44973209   2.51587537  -4.09584837 

Actually, I now see the source of the problem:

The data frames dat and new don't contain a matrix named X; rather the
matrix is split columnwise:

 names(dat)
[1] y   X.1 X.2
 names(new)
[1] X.1 X.2

Consequently, both glm and predict pick up the X in the global environment
(since there is none in dat or new), which accounts for why there are 100
predicted values.

Using list() rather than data.frame() produces the originally expected
behaviour:

 new - list(X = matrix(rnorm(20),10, 2))
 predict(mod, new)
 1  2  3  4  5  6  7

 5.9373064  0.3687360 -8.3793045  0.7645584 -2.6773842  2.4130547  0.7387318

 8  9 10 
-0.4347916  8.4678728 -0.8976054 

Regards,
 John

 Best,
 Uwe
 
 
 
 
 
 
 
 12345
  6 
1.81224443  -5.92955128   1.98718051 -10.05331521   2.6506
  -2.50635812 
 789   10   11
  12 
5.63728698  -0.94845276  -3.61657377  -1.63141320   5.03417372
  1.80400271 
13   14   15   16   17
  18 
9.32876273  -5.32723406   5.29373023  -3.90822713 -10.95065186
  4.90038016
  
   . . .
  
 97   98   99  100 
   -6.92509812   0.59357486  -1.17205723   0.04209578 
  
  
  Note that there are 100 rather than 10 predicted values.
  
  But with individuals predictors (rather than a matrix),
  
  
 x1 - X[,1]
 x2 - X[,2]
 dat.2 - data.frame(y=y, x1=x1, x2=x2)
 mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2)
 new.2 - data.frame(x1=rnorm(10), x2=rnorm(10)) 
 predict(mod.2, new.2)
  
   1  2  3  4  5  
 6  7
  
   6.5723823  0.6356392  4.0291018 -4.7914650  2.1435485 -3.1738096 
  -2.8261585
  
   8  9 10 
  -1.5255329 -4.7087592  4.0619290
  
  works as expected (?).
  
  Regards,
   John
   
  
  
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges
 Sent: Thursday, September 23, 2004 1:33 AM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] Issue with predict() for glm models
 
 [EMAIL PROTECTED] wrote:
 
 
 Hello everyone,
 
 I am having a problem using the predict (or the
 
 predict.glm) function in R.
 
 Basically, I run the glm model on a training data set and try to 
 obtain predictions for a set of new predictors from a
 
 test data set
 
 (i.e., not the predictors that were utilized to obtain the
 
 glm parameter estimates).
 
 Unfortunately, every time that I attempt this, I obtain the 
 predictions for the predictors that were used to fit the
 
 glm model. I
 
 have looked at the R mailing list archives and don't believe I am 
 making the same mistakes that have been made in the past
 
 and also have
 
 tried to closely follow the predict.glm example in the help
 
 file. Here is an example of 

Re: [R] gsub

2004-09-23 Thread Gabor Grothendieck
Jean Eid jeaneid at chass.utoronto.ca writes:

: 
: Hi
: 
: A while back I used gsub to do the following
: 
: temp-000US00231
: gsub(something here, , temp)
: 00231
: 
: I think it involved the `meta characters' somehow.
: 
: I do not know how to do this anymore. I know strsplit will also work but I
: remember gsub was much faster.  In essence the question is how to delete
: all characters before a particular pattern.
: 
: If anyone has some help file for this, it will be greatly appreciated.
: 

I think you want sub in this case, not gsub.

There are many possibilities here depending on what the
general case is.  The following all give the desired
result for the example but their general cases differ.
These are just some of the numerous variations possible.

temp-000US00231
sub(.*US, , temp)
sub(.*S, , temp)
sub([[:digit:]]*[[:alpha:]]*, , temp)
sub(.*[[:alpha:]], , temp)
sub(.*[[:alpha:]][[:alpha:]], , temp)
sub(.*[[:upper:]], , temp)
sub(.*[[:upper:]][[:upper:]], , temp)
sub(., , temp)
substring(temp, 6)

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Issue with predict() for glm models

2004-09-23 Thread jrausch

Thanks to John Fox, Andy Liaw, and Uwe Ligges for their help with my problem
regarding the use of the predict()  function to obtain predictions for a new
set of predictor values. It appears that the bottom line (at least for my
purposes) is that the names and the setup for the data of the predictors in the
glm and the new data need to be consistent. The safest way that I know to do
this from reading  John, Andy, and Uwe's responses is to label each predictor
separately and place them into the glm model separately. Then, when creating a
new data frame to utilize in the predict() function, ensure to consistently
name the predictors. For illustrative examples, see the reply emails of John,
Andy, and Uwe.  

Thanks again, 

Joe  




Joe Rausch, M.A. 
Psychology Liaison 
Lab for Social Research 
917 Flanner Hall 
University of Notre Dame 
Notre Dame, IN 46556
(574) 631-3910
www.nd.edu/~jrausch

If we knew what it was we were doing, it would not be called research, would
it?
- Albert Einstein

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] detection of outliers

2004-09-23 Thread Gabor Grothendieck
 Phguardiol at aol.com writes:

: 
: Hi,
: this is both a statistical and a R question...
: what would the best way / test to detect an outlier value among a series of 
10 to 30 values ? for instance if we
: have the following dataset: 10,11,12,15,20,22,25,30,500 I d like to have a 
way to identify the last data
: as an outlier (only one direction). One way would be to calculate abs(mean - 
median) and if elevated (to
: what extent ?) delete the extreme data then redo.. but is it valid to do so 
with so few data ? is the (trimmed
: mean - mean) more efficient ? if so, what would be the maximal tolerable 
value to use as a threshold ? (I guess
: it will be experiment dependent...) tests for skweness will probably 
required a larger dataset ? 
: any suggestions are very welcome !
: thanks for your help
: Philippe Guardiola, MD


If z is your vector the following all detect outliers:

boxplot(z)  # will show the outlier

plot(lm(z ~ 1))  # the various plots show this as well

require(car)
outlier.test(lm(z ~ 1)) # tests most extreme value

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Problem with R 1.9.1

2004-09-23 Thread Jacques Mendes Meyohas
Hi,

Could someone help me ?
When I´m running the following command in the new R 1.9.1 version the error message 
appear. It didn´t happen on R 1.8.0

 glm(indenizacao/sinistros ~ sexo + plano + faixa, family=Gamma(link=log), 
 weights=sinistros)
Error in x[good, ] * w : non-conformable arrays

[[alternative HTML version deleted]]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Gridbase basic question

2004-09-23 Thread Sean Davis
All,
I have a simple plot(x,y) and I would like to then insert rectangles of  
some length (in native coordinates) and height fixed to 0.5 in native  
coordinates.  I can't quite get the code right to do this.  Can anyone  
give me a quick example of how to do this?  I looked the gridBase index  
and the tutorial (from R-news?) but just haven't gotten it down yet.

 plot(1:10,1:10)
 par(new=T);vps - baseViewports()
 pushViewport(vps$inner,vps$figure,vps$plot)
viewport[GRID.VP.28]
 pushViewport(viewport(x=unit(1,native),y=unit(2,native)))
viewport[GRID.VP.29]
  
grid.rect(height=unit(0.5,native),width=unit(1.5,native),just='botto 
m')

This draws a very large rectangle going from 2 to 7 (y) and to 8 (x).
Thanks,
Sean
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] R vs EViews - serial correlation

2004-09-23 Thread Remigijus Lapinskas

Dear all,

I met with some problems when dealing with a time series with serial correlation.

FIRST, I generate a series with correlated errors

set.seed(1)
x=1:50
y=x+arima.sim(n = 50, list(ar = c(0.47)))

SECOND, I estimate three constants (a, b and rho) in the model Y=a+b*X+u, where 
u=rho*u(-1)+eps
 
library(nlme)
gls(y~x,correlation = corAR1(0.5)) # Is it the right procedure?

Coefficients:
(Intercept)   x 
  0.1410465   1.0023341 

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
 Phi 
0.440594 
Degrees of freedom: 50 total; 48 residual
Residual standard error: 0.9835158

THIRD, I do the same procedure with EViews as LS Y C X AR(1) and get
Y = 0.1375 + 1.0024*X + [AR(1)=0.3915]

My problem is actually connected with the fitting procedure. As far as I understand 

gls(y~x,correlation = corAR1(0.5))$fit

is obtained through the linear equation 0.1410+1.0023*X while in EViews through the 
nonlinear equation 

Y=rho*Y(-1) + (1-rho)*a+(X-rho*X(-1))*b

where either dynamic or static fitting procedures are applied. 

X   YYF_DYF_S gls.fit
1   1  1.1592  NA  NA  1.1434
2   2  3.5866  2.1499  2.1499  2.1457
3   3  4.1355  3.1478  3.7103  3.1480
4   4  3.9125  4.1484  4.5352  4.1504
5   5  2.7442  5.1502  5.0578  5.1527
6   6  6.0647  6.1523  5.2103  6.1551
7   7  6.9855  7.1547  7.1203  7.1574
.
47 47 49.4299 47.2521 47.5288 47.2507
48 48 48.7748 48.2545 49.1072 48.2531
49 49 48.3200 49.2570 49.4607 49.2554
50 50 50.2501 50.2594 49.8926 50.2578

All gls.fit values are on a line, YF_D (D for dynamic) soon begin
to follow a line and YF_S (S for static) try to mimic Y.

My question is: do R and EViews estimate the same model? If yes, why
the estimates are different and which of the two (three?) procedures
is correct?

Thanking you in advance,
Rem

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Problem with R 1.9.1

2004-09-23 Thread Duncan Murdoch
On Thu, 23 Sep 2004 11:56:37 -0300, Jacques Mendes Meyohas
[EMAIL PROTECTED] wrote :

Hi,

Could someone help me ?
When I´m running the following command in the new R 1.9.1 version the error message 
appear. It didn´t happen on R 1.8.0

 glm(indenizacao/sinistros ~ sexo + plano + faixa, family=Gamma(link=log), 
 weights=sinistros)
Error in x[good, ] * w : non-conformable arrays

Someone might recognize this problem, but you're likely to get a
better response (or figure out the problem yourself!) if you try to
simplify things to something completely self-contained.  

I'm guessing there's something wrong with your weight vector
sinistros, but I can't check to be sure.

Duncan Murdoch

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] How to improve the quality of curve/line plots?

2004-09-23 Thread Wolf, Michael
Dear list,

I'm using the windows version of R. When plotting a curve or a line for time series 
with annual data , e. g. GDP growth 1991-2003, the line seems to exist of a lot of 
smaller lines. Printing the results the curves and lines seems to be unclean 
(because of using small resolution bitmaps?). Comparing the result of R with the same 
results of Excel the lines in excel seems to havve a higher qualitiy. In Excel you 
also can produce curves instead of lines.

Are there any possibilities how to improve the quality of the plots in R? How can R be 
influenced to plot clean lines with a higher resolution on the screen (I think it's 
not a question of the pdf- or png command.). Perhaps, it's a problem of the graphical 
possibilites of R because the most line plots which can be seen on the web have these 
problems.

Thanks,

Dr. Michael Wolf
Bezirksregierung Münster
Dezernat 61
Domplatz 1-348161 Münster
Tel.:   ++ 49 (02 51) / 4 11 - 17 95
Fax.:   ++ 49 (02 51) / 4 11 - 8 17 95
E-Mail: [EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] detection of outliers

2004-09-23 Thread Christian Hennig
On Thu, 23 Sep 2004 [EMAIL PROTECTED] wrote:

 Hi,
 this is both a statistical and a R question...
 what would the best way / test to detect an outlier value among a series of 10 to 30 
 values ? for instance if we have the following dataset: 10,11,12,15,20,22,25,30,500 
 I d like to have a way to identify the last data as an outlier (only one direction). 
 One way would be to calculate abs(mean - median) and if elevated (to what extent ?) 
 delete the extreme data then redo.. but is it valid to do so with so few data ? is 
 the (trimmed mean - mean) more efficient ? if so, what would be the maximal 
 tolerable value to use as a threshold ? (I guess it will be experiment dependent...) 
 tests for skweness will probably required a larger dataset ? 
 any suggestions are very welcome !
 thanks for your help
 Philippe Guardiola, MD
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

You may want to read 
Davies and Gather, The identification of multiple outliers, JASA 88 (1993),
782-801.

The simplest recommendation is to nominate all points with distance larger
than c*mad(data) from the median as outliers. Choices of c depending on n
are given in the above paper.

This is somewhat better founded theoretically than the boxplot method
recommended by Gabor G., but it is based on the assumption that the
distribution on the non-outliers is close to the normal and especially not
strongly skewed (the boxplot method
seems to be a bit more robust against skewness).

Christian
 
***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
ich empfehle www.boag-online.de

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] detection of outliers

2004-09-23 Thread Vito Ricci
Hi,
give a look to:

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm

it's the Grubbs' Test for Outliers. It is based on the
assumption of normality of data.

Other methods of outliers' could:

Run-Sequence Plot
Histogram
Normal Probability Plot
Box-plot

Best
Vito



you wrote:

Hi,
this is both a statistical and a R question...
what would the best way / test to detect an outlier
value among a series of 10 to 30 values ? for instance
if we have the following dataset:
10,11,12,15,20,22,25,30,500 I d like to have a way to
identify the last data as an outlier (only one
direction). One way would be to calculate abs(mean -
median) and if elevated (to what extent ?) delete the
extreme data then redo.. but is it valid to do so with
so few data ? is the (trimmed mean - mean) more
efficient ? if so, what would be the maximal tolerable
value to use as a threshold ? (I guess it will be
experiment dependent...) tests for skweness will
probably required a larger dataset ? 
any suggestions are very welcome !
thanks for your help
Philippe Guardiola, MD

=
Diventare costruttori di soluzioni

Visitate il portale http://www.modugno.it/
e in particolare la sezione su Palese http://www.modugno.it/archivio/cat_palese.shtml



___

http://it.seriea.fantasysports.yahoo.com/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] detection of outliers

2004-09-23 Thread Berton Gunter
Not to oversimplify ...

1. (At least) dozens of books and thousands of papers have been written on
this...

2. Most important question is: What is an outlier? (Many smart folks says
that the concept is illogical/flawed -- there is no mystical boundary that
one crosses to become a statistical pariah; many other smart folks
disagree).

3. Equivalently: What is the model with respect to which values are
outlying? (with apologies to Winston Churchill's: That is an indignity up
with which I will not put.)

So good advice here is: Beware of good advice about this. (Of course, I may
just be an outlier ...)

;)

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
The business of the statistician is to catalyze the scientific learning
process.  - George E. P. Box
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 [EMAIL PROTECTED]
 Sent: Thursday, September 23, 2004 7:22 AM
 To: [EMAIL PROTECTED]
 Subject: [R] detection of outliers
 
 Hi,
 this is both a statistical and a R question...
 what would the best way / test to detect an outlier value 
 among a series of 10 to 30 values ? for instance if we have 
 the following dataset: 10,11,12,15,20,22,25,30,500 I d like 
 to have a way to identify the last data as an outlier (only 
 one direction). One way would be to calculate abs(mean - 
 median) and if elevated (to what extent ?) delete the extreme 
 data then redo.. but is it valid to do so with so few data ? 
 is the (trimmed mean - mean) more efficient ? if so, what 
 would be the maximal tolerable value to use as a threshold ? 
 (I guess it will be experiment dependent...) tests for 
 skweness will probably required a larger dataset ? 
 any suggestions are very welcome !
 thanks for your help
 Philippe Guardiola, MD
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

RE: [R] block statistics with POSIX classes

2004-09-23 Thread Kahra Hannu
I have followed Gabor's instructions:

 aggregate(list(y=y), list(dp$year), mean)$y   # returns NULL since y 
 is a time series
NULL
 
 aggregate(list(y=as.vector(y)), list(dp$year), mean)$y# returns annual means
[1]  0.0077656696  0.0224050294  0.001898  0.0240550925 -0.0084085867
[6] -0.0170950194 -0.0355641251  0.0065873997  0.0008253111

 aggregate(list(y=y), list(dp$year), mean) # returns the same as 
 the previous one
  Group.1  Series.1
1  96  0.0077656696
2  97  0.0224050294
3  98  0.001898
4  99  0.0240550925
5 100 -0.0084085867
6 101 -0.0170950194
7 102 -0.0355641251
8 103  0.0065873997
9 104  0.0008253111

Gabor's second suggestion returns different results:

 aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, mean)
Time Series:
Start = 96.3 
End = 103. 
Frequency = 1 
 Series 1
[1,]  0.016120895
[2,]  0.024257131
[3,]  0.007526997
[4,]  0.017466118
[5,] -0.016024846
[6,] -0.017145159
[7,] -0.036047765
[8,]  0.014198501

 aggregate(y, 1, mean) # verifies the result above
Time Series:
Start = 1996.333 
End = 2003.333 
Frequency = 1 
 Series 1
[1,]  0.016120895
[2,]  0.024257131
[3,]  0.007526997
[4,]  0.017466118
[5,] -0.016024846
[6,] -0.017145159
[7,] -0.036047765
[8,]  0.014198501

The data is from 1996:5 to 2004:8. The difference of the results must depend on the 
fact that the beginning of the data is not January and the end is not December? The 
first two solutions give nine annual means while the last two give only eight means. 
The block size in the last two must be 12 months, as is said in ?aggregate, instead of 
a calender year that I am looking for. Gabor's first suggestion solved my problem.

Thank you,
Hannu
  


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Gabor Grothendieck
Sent: Thursday, September 23, 2004 3:52 PM
To: [EMAIL PROTECTED]
Subject: Re: [R] block statistics with POSIX classes


I am not sure that I understand what you are looking
for since you did not provide any sample data with
corresponding output to clarify your question.

Here is another guess.

If y is just a numeric vector with monthly data 
and dp is a POSIXlt vector of the same length then:

   aggregate(list(y=y), list(dp$year), mean)$y

will perform aggregation, as will

  aggregate(ts(y, start=c(d$year[1],d$mon[1]+1), freq = 12), nfreq=1, mean)

which converts y to ts and then performs the aggregation.  The first
one will work even if y is irregular while the second one assumes that
y must be monthly.  The second one returns a ts object.

By the way, I had a look at gev's source and it seems that despite the
documentation it does not use POSIXct anywhere internally.  If the
block is year or other character value then it simply assumes that
whatever datetime class is used has an as.POSIXlt method.  If your dates
were POSIXct rather than POSIXlt then it would be important to ensure
that whatever timezone is assumed (which I did not check) in the conversion 
is the one you are using.  You could use character dates or Date class to 
avoid this problem.  Since you appear to be using POSIXlt datetimes from
the beginning I think you should be ok.


Kahra Hannu kahra at mpsgr.it writes:

: 
: Thank you Petr and Gabor for the answers.
: 
: They did not, however, solve my original problem. When I have a monthly time 
series y with a POSIX date
: variable dp, the most obvious way to compute e.g. the annual means is to use 
aggregate(y, 1, mean) that
: works with time series. However, I got stuck with the idea of using the 'by' 
argument as by = dp$year. But in
: that case y has to be a data.frame. The easiest way must be the best way.
: 
: Regards,
: Hannu 
: 
: -Original Message-
: From: r-help-bounces at stat.math.ethz.ch
: [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Gabor Grothendieck
: Sent: Thursday, September 23, 2004 12:56 PM
: To: r-help at stat.math.ethz.ch
: Subject: Re: [R] block statistics with POSIX classes
: 
: 
: Kahra Hannu kahra at mpsgr.it writes:
: 
: : 
: : I have a monthly price index series x, the related return series y = diff
(log
: (x)) and a POSIXlt date-time
: : variable dp. I would like to apply annual blocks to compute for example 
: annual block maxima and mean of y.
: : 
: : When studying the POSIX classes, in the first stage of the learning curve, 
I 
: computed the maximum drawdown
: : of x:
: :  mdd - maxdrawdown(x)
: :  max.dd - mdd$maxdrawdown
: :  from - as.character(dp[mdd$from]) 
: :  to - as.character(dp[mdd$to])   
: :  from; to
: : [1] 2000-08-31
: : [1] 2003-03-31
: : that gives me the POSIX dates of the start and end of the period and 
: suggests that I have done something correctly.
: : 
: : Two questions:
: : (1) how to implement annual blocks and compute e.g. annual max, min and 
mean 
: of y (each year's max, min, mean)?
: : (2) how to 

[R] R glm

2004-09-23 Thread Shuangge Ma
Hello:
would you please help me with the following glm question?

for the R function glm, what I understand is: once you specify the
family, then the link function is fixed.

My question is: is it possible I use, for example, log link function,
but the estimation approach for the guassian family?

Thanks,

Shuangge Ma, Ph.D.

* CHSCC, Department of Biostatistics   *
* University of Washington *
* Building 29, Suite 310, 6200 NE 74th ST. *
* Seattle, WA 98115*
* Tel: 206-685-7123 Fax: 206-616-4075  *

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Problem with R 1.9.1

2004-09-23 Thread Uwe Ligges
Jacques Mendes Meyohas wrote:
Hi,
Could someone help me ?
When I´m running the following command in the new R 1.9.1 version the error message 
appear. It didn´t happen on R 1.8.0

glm(indenizacao/sinistros ~ sexo + plano + faixa, family=Gamma(link=log), weights=sinistros)
Error in x[good, ] * w : non-conformable arrays
We cannot help if we don't know anything about the data you are using.
You also might want to try out R-2.0.0 beta (but I guess it is user error).
Uwe Ligges

[[alternative HTML version deleted]]
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] fitting weibull distribution

2004-09-23 Thread Ulrich Leopold
Dear all,

I get the following error message. And I cannot quite work out what is
wrong. I think the optim gets infinite values. Certainly my data do not
have any infinite values. How can I solve this?

fitdistr(A1, weibull)
Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) :
non-finite value supplied by optim


I am using R version 1.9.1 on RedHat Linux, Kernel 2.6.8.

Ulrich

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Issue with predict() for glm models

2004-09-23 Thread Uwe Ligges
John Fox wrote:
Dear Uwe, 


-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 23, 2004 8:06 AM
To: John Fox
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

John Fox wrote:

Dear Uwe,
Unless I've somehow messed this up, as I mentioned 
yesterday, what you 

suggest doesn't seem to work when the predictor is a 
matrix. Here's a 

simplified example:

X - matrix(rnorm(200), 100, 2)
y - (X %*% c(1,2) + rnorm(100))  0
dat - data.frame(y=y, X=X)
mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = 
matrix(rnorm(20),2)) predict(mod, new)
Dear John,
the questioner had a 2 column matrix with 40 and one with 50 
observations (not a 100 column matrix with 2 observation) and 
for those matrices it works ...


Indeed, and in my example the matrix predictor X has 2 columns and 100 rows;
I did screw up the matrix for the new data to be used for predictions (in
the example I sent today but not yesterday), but even when this is done
right -- where the new data has 10 rows and 2 columns -- there are 100 (not
10) predicted values:

X - matrix(rnorm(200), 100, 2)  # original predictor matrix with 100 rows
y - (X %*% c(1,2) + rnorm(100))  0
dat - data.frame(y=y, X=X)
mod - glm(y ~ X, family=binomial, data=dat)
John,
note that I used glm(y ~ .) (the dot!),
because the names are automatically chosen to be X.1 and X.2, hence you 
cannot use X in the formula in this case ...

Best,
Uwe

new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows
predict(mod, new) # note 100 predicted values
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] R glm

2004-09-23 Thread Kahra Hannu
In Venables  Ripley: Modern Applied Statistics with S (MASS), (4th edition), on page 
184 there is a table Families and link functions that gives you the available links 
with different families. The default and the only link with the gaussian family is 
identity.

ciao,
Hannu Kahra

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Shuangge Ma
Sent: Thursday, September 23, 2004 6:26 PM
To: [EMAIL PROTECTED]
Subject: [R] R glm


Hello:
would you please help me with the following glm question?

for the R function glm, what I understand is: once you specify the
family, then the link function is fixed.

My question is: is it possible I use, for example, log link function,
but the estimation approach for the guassian family?

Thanks,

Shuangge Ma, Ph.D.

* CHSCC, Department of Biostatistics   *
* University of Washington *
* Building 29, Suite 310, 6200 NE 74th ST. *
* Seattle, WA 98115*
* Tel: 206-685-7123 Fax: 206-616-4075  *

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R glm

2004-09-23 Thread Martin Henry H. Stevens
Dear Shuangge - There exists a default link function for each family, 
but you can specify the link, as well.
see ?family (e.g., in html help click on the family link from the glm 
help page).
Hank
On Sep 23, 2004, at 12:25 PM, Shuangge Ma wrote:

Hello:
would you please help me with the following glm question?
for the R function glm, what I understand is: once you specify the
family, then the link function is fixed.
My question is: is it possible I use, for example, log link function,
but the estimation approach for the guassian family?
Thanks,
Shuangge Ma, Ph.D.

* CHSCC, Department of Biostatistics   *
* University of Washington *
* Building 29, Suite 310, 6200 NE 74th ST. *
* Seattle, WA 98115*
* Tel: 206-685-7123 Fax: 206-616-4075  *
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


Dr. Martin Henry H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056
Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/botany/bot/henry.html
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
E Pluribus Unum
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] block statistics with POSIX classes

2004-09-23 Thread Gabor Grothendieck

Kahra Hannu kahra at mpsgr.it writes:

: 
: I have followed Gabor's instructions:
: 
:  aggregate(list(y=y), list(dp$year), mean)$y # 
returns NULL since y is a time series
: NULL
:  
:  aggregate(list(y=as.vector(y)), list(dp$year), mean)$y  # returns 
annual means
: [1]  0.0077656696  0.0224050294  0.001898  0.0240550925 -0.0084085867
: [6] -0.0170950194 -0.0355641251  0.0065873997  0.0008253111
: 
:  aggregate(list(y=y), list(dp$year), mean)   # returns the 
same as the previous one
:   Group.1  Series.1
: 1  96  0.0077656696
: 2  97  0.0224050294
: 3  98  0.001898
: 4  99  0.0240550925
: 5 100 -0.0084085867
: 6 101 -0.0170950194
: 7 102 -0.0355641251
: 8 103  0.0065873997
: 9 104  0.0008253111
: 
: Gabor's second suggestion returns different results:
: 
:  aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, mean)
: Time Series:
: Start = 96.3 
: End = 103. 
: Frequency = 1 
:  Series 1
: [1,]  0.016120895
: [2,]  0.024257131
: [3,]  0.007526997
: [4,]  0.017466118
: [5,] -0.016024846
: [6,] -0.017145159
: [7,] -0.036047765
: [8,]  0.014198501
: 
:  aggregate(y, 1, mean)   # verifies the result above
: Time Series:
: Start = 1996.333 
: End = 2003.333 
: Frequency = 1 
:  Series 1
: [1,]  0.016120895
: [2,]  0.024257131
: [3,]  0.007526997
: [4,]  0.017466118
: [5,] -0.016024846
: [6,] -0.017145159
: [7,] -0.036047765
: [8,]  0.014198501
: 
: The data is from 1996:5 to 2004:8. The difference of the results must depend 
on the fact that the beginning of
: the data is not January and the end is not December? The first two solutions 
give nine annual means while the
: last two give only eight means. The block size in the last two must be 12 
months, as is said in ?aggregate,
: instead of a calender year that I am looking for. Gabor's first suggestion 
solved my problem.

Yes, that seems to be the case.  Using length instead of 
mean we find that the aggregate.data.frame example used calendar 
years as the basis of aggregation whereas the aggregate.ts example
used successive 12 month periods starting from the first month discarding
the 4 points at the end which do not fill out a full year.

R set.seed(1)
R dp - as.POSIXlt(seq(from=as.Date(1996-5-1), to=as.Date(2004-8-1), 
+  by=month))
R y - rnorm(length(dp$year))

R aggregate(list(y=y), list(dp$year), length)$y
[1]  8 12 12 12 12 12 12 12  8

R aggregate(ts(y, start=c(dp$year[1],dp$mon[1]+1), freq = 12), nfreq=1, 
length)
Time Series:
Start = 96.3 
End = 103. 
Frequency = 1 
[1] 12 12 12 12 12 12 12 12

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


followup: Re: [R] Issue with predict() for glm models

2004-09-23 Thread Paul Johnson
I have a follow up question that fits with this thread.
Can you force an overlaid plot showing predicted values to follow the 
scaling of the axes of the plot over which it is laid?

Here is an example based on linear regression, just for clarity.  I have 
followed the procedure described below to create predictions and now 
want to plot the predicted values on top of a small section of the x-y 
scatterplot.

x - rnorm(100, 10, 10)
e - rnorm(100, 0, 5)
y - 5 + 10 *x + e
myReg1 - lm (y~x)
plot(x,y)
newX - seq(1,10,1)
myPred - predict(myReg1,data.frame(x=newX))
Now, if I do this, I get 2 graphs overlaid but their axes do not line 
up.

par(new=T)
plot(newX,myPred$fit)
The problem is that the second one uses the whole width of the graph 
space, when I'd rather just have it go from the small subset where its x 
is defined, from 1 to 10.  Its stretching the range (1,10) for newX to 
use the same scale that goes from (-15, 35) where it plots x

I know abline() can do this for lm, but for some other kinds of models, 
no  lines() method is provided, and so I am doing this the old fashioned 
way.

pj
John Fox wrote:
Dear Uwe, 


-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 23, 2004 8:06 AM
To: John Fox
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

John Fox wrote:

Dear Uwe,
Unless I've somehow messed this up, as I mentioned 
yesterday, what you 

suggest doesn't seem to work when the predictor is a 
matrix. Here's a 

simplified example:

X - matrix(rnorm(200), 100, 2)
y - (X %*% c(1,2) + rnorm(100))  0
dat - data.frame(y=y, X=X)
mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = 
matrix(rnorm(20),2)) predict(mod, new)
Dear John,
the questioner had a 2 column matrix with 40 and one with 50 
observations (not a 100 column matrix with 2 observation) and 
for those matrices it works ...


Indeed, and in my example the matrix predictor X has 2 columns and 100 rows;
I did screw up the matrix for the new data to be used for predictions (in
the example I sent today but not yesterday), but even when this is done
right -- where the new data has 10 rows and 2 columns -- there are 100 (not
10) predicted values:

X - matrix(rnorm(200), 100, 2)  # original predictor matrix with 100 rows
y - (X %*% c(1,2) + rnorm(100))  0
dat - data.frame(y=y, X=X)
mod - glm(y ~ X, family=binomial, data=dat)
new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows
predict(mod, new) # note 100 predicted values
   12345
6 
  5.75238091   0.31874587  -3.00515893  -3.77282121  -1.97511221
0.54712914 
   789   10   11
12 
  1.85091226   4.38465524  -0.41028694  -1.53942869   0.57613555
-1.82761518 

 . . .
  91   92   93   94   95
96 
  0.36210780   1.71358713  -9.63612775  -4.54257576  -5.29740468
2.64363405 
  97   98   99  100 
 -4.45478627  -2.44973209   2.51587537  -4.09584837 

Actually, I now see the source of the problem:
The data frames dat and new don't contain a matrix named X; rather the
matrix is split columnwise:

names(dat)
[1] y   X.1 X.2
names(new)
[1] X.1 X.2
Consequently, both glm and predict pick up the X in the global environment
(since there is none in dat or new), which accounts for why there are 100
predicted values.
Using list() rather than data.frame() produces the originally expected
behaviour:

new - list(X = matrix(rnorm(20),10, 2))
predict(mod, new)
 1  2  3  4  5  6  7
 5.9373064  0.3687360 -8.3793045  0.7645584 -2.6773842  2.4130547  0.7387318
 8  9 10 
-0.4347916  8.4678728 -0.8976054 

Regards,
 John

Best,
Uwe




  12345
6 
 1.81224443  -5.92955128   1.98718051 -10.05331521   2.6506
-2.50635812 
  789   10   11
12 
 5.63728698  -0.94845276  -3.61657377  -1.63141320   5.03417372
1.80400271 
 13   14   15   16   17
18 
 9.32876273  -5.32723406   5.29373023  -3.90822713 -10.95065186
4.90038016

. . .
  97   98   99  100 
-6.92509812   0.59357486  -1.17205723   0.04209578 

Note that there are 100 rather than 10 predicted values.
But with individuals predictors (rather than a matrix),

x1 - X[,1]
x2 - X[,2]
dat.2 - data.frame(y=y, x1=x1, x2=x2)
mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2)
new.2 - data.frame(x1=rnorm(10), x2=rnorm(10)) 
predict(mod.2, new.2)
1  2  3  4  5  
   6  7
6.5723823  0.6356392  4.0291018 -4.7914650  2.1435485 -3.1738096 
-2.8261585

8  9 10 
-1.5255329 -4.7087592  4.0619290

works as expected (?).
Regards,
John


-Original Message-
From: [EMAIL 

Re: [R] R glm

2004-09-23 Thread Paul Johnson
No!
 ?family
 The 'gaussian' family accepts the links 'identity', 'log' and
  'inverse';
Kahra Hannu wrote:
In Venables  Ripley: Modern Applied Statistics with S (MASS), (4th edition), on page 184 there 
is a table Families and link functions that gives you the available links with different 
families. The default and the only link with the gaussian family is identity.
ciao,
Hannu Kahra
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Shuangge Ma
Sent: Thursday, September 23, 2004 6:26 PM
To: [EMAIL PROTECTED]
Subject: [R] R glm
Hello:
would you please help me with the following glm question?
for the R function glm, what I understand is: once you specify the
family, then the link function is fixed.
My question is: is it possible I use, for example, log link function,
but the estimation approach for the guassian family?
Thanks,
Shuangge Ma, Ph.D.

* CHSCC, Department of Biostatistics   *
* University of Washington *
* Building 29, Suite 310, 6200 NE 74th ST. *
* Seattle, WA 98115*
* Tel: 206-685-7123 Fax: 206-616-4075  *
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: followup: Re: [R] Issue with predict() for glm models

2004-09-23 Thread Austin, Matt
Could you just use

lines(newX, myPred, col=2)

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Paul Johnson
Sent: Thursday, September 23, 2004 10:3 AM
To: r help
Subject: followup: Re: [R] Issue with predict() for glm models


I have a follow up question that fits with this thread.

Can you force an overlaid plot showing predicted values to follow the 
scaling of the axes of the plot over which it is laid?

Here is an example based on linear regression, just for clarity.  I have 
followed the procedure described below to create predictions and now 
want to plot the predicted values on top of a small section of the x-y 
scatterplot.

x - rnorm(100, 10, 10)
e - rnorm(100, 0, 5)
y - 5 + 10 *x + e

myReg1 - lm (y~x)
plot(x,y)
newX - seq(1,10,1)
myPred - predict(myReg1,data.frame(x=newX))

Now, if I do this, I get 2 graphs overlaid but their axes do not line 
up.

par(new=T)
plot(newX,myPred$fit)

The problem is that the second one uses the whole width of the graph 
space, when I'd rather just have it go from the small subset where its x 
is defined, from 1 to 10.  Its stretching the range (1,10) for newX to 
use the same scale that goes from (-15, 35) where it plots x

I know abline() can do this for lm, but for some other kinds of models, 
no  lines() method is provided, and so I am doing this the old fashioned 
way.

pj

John Fox wrote:
 Dear Uwe, 
 
 
-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 23, 2004 8:06 AM
To: John Fox
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

John Fox wrote:


Dear Uwe,

Unless I've somehow messed this up, as I mentioned 

yesterday, what you 

suggest doesn't seem to work when the predictor is a 

matrix. Here's a 

simplified example:



X - matrix(rnorm(200), 100, 2)
y - (X %*% c(1,2) + rnorm(100))  0
dat - data.frame(y=y, X=X)
mod - glm(y ~ X, family=binomial, data=dat) new - data.frame(X = 
matrix(rnorm(20),2)) predict(mod, new)

Dear John,

the questioner had a 2 column matrix with 40 and one with 50 
observations (not a 100 column matrix with 2 observation) and 
for those matrices it works ...

 
 
 Indeed, and in my example the matrix predictor X has 2 columns and 100
rows;
 I did screw up the matrix for the new data to be used for predictions
(in
 the example I sent today but not yesterday), but even when this is done
 right -- where the new data has 10 rows and 2 columns -- there are 100
(not
 10) predicted values:
 
 
X - matrix(rnorm(200), 100, 2)  # original predictor matrix with 100 rows
y - (X %*% c(1,2) + rnorm(100))  0
dat - data.frame(y=y, X=X)
mod - glm(y ~ X, family=binomial, data=dat)
new - data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows
predict(mod, new) # note 100 predicted values
 
12345
 6 
   5.75238091   0.31874587  -3.00515893  -3.77282121  -1.97511221
 0.54712914 
789   10   11
 12 
   1.85091226   4.38465524  -0.41028694  -1.53942869   0.57613555
 -1.82761518 
 
  . . .
 
   91   92   93   94   95
 96 
   0.36210780   1.71358713  -9.63612775  -4.54257576  -5.29740468
 2.64363405 
   97   98   99  100 
  -4.45478627  -2.44973209   2.51587537  -4.09584837 
 
 Actually, I now see the source of the problem:
 
 The data frames dat and new don't contain a matrix named X; rather the
 matrix is split columnwise:
 
 
names(dat)
 
 [1] y   X.1 X.2
 
names(new)
 
 [1] X.1 X.2
 
 Consequently, both glm and predict pick up the X in the global environment
 (since there is none in dat or new), which accounts for why there are 100
 predicted values.
 
 Using list() rather than data.frame() produces the originally expected
 behaviour:
 
 
new - list(X = matrix(rnorm(20),10, 2))
predict(mod, new)
 
  1  2  3  4  5  6
7
 
  5.9373064  0.3687360 -8.3793045  0.7645584 -2.6773842  2.4130547
0.7387318
 
  8  9 10 
 -0.4347916  8.4678728 -0.8976054 
 
 Regards,
  John
 
 
Best,
Uwe








   12345
6 
  1.81224443  -5.92955128   1.98718051 -10.05331521   2.6506
-2.50635812 
   789   10   11
12 
  5.63728698  -0.94845276  -3.61657377  -1.63141320   5.03417372
1.80400271 
  13   14   15   16   17
18 
  9.32876273  -5.32723406   5.29373023  -3.90822713 -10.95065186
4.90038016

 . . .

   97   98   99  100 
 -6.92509812   0.59357486  -1.17205723   0.04209578 


Note that there are 100 rather than 10 predicted values.

But with individuals predictors (rather than a matrix),



x1 - X[,1]
x2 - X[,2]
dat.2 - data.frame(y=y, x1=x1, x2=x2)
mod.2 - glm(y ~ x1 + x2, family=binomial, data=dat.2)

Re: [R] gsub

2004-09-23 Thread Jean Eid
Thank you all for the help, specially Gabor that is exactly what I needed.
A few examples that do the same thing is very helpful in understanding the
structure of the call.


Thank you again,


Jean

 On Thu, 23 Sep 2004, Gabor Grothendieck wrote:

 Jean Eid jeaneid at chass.utoronto.ca writes:

 :
 : Hi
 :
 : A while back I used gsub to do the following
 :
 : temp-000US00231
 : gsub(something here, , temp)
 : 00231
 :
 : I think it involved the `meta characters' somehow.
 :
 : I do not know how to do this anymore. I know strsplit will also work but I
 : remember gsub was much faster.  In essence the question is how to delete
 : all characters before a particular pattern.
 :
 : If anyone has some help file for this, it will be greatly appreciated.
 :

 I think you want sub in this case, not gsub.

 There are many possibilities here depending on what the
 general case is.  The following all give the desired
 result for the example but their general cases differ.
 These are just some of the numerous variations possible.

 temp-000US00231
 sub(.*US, , temp)
 sub(.*S, , temp)
 sub([[:digit:]]*[[:alpha:]]*, , temp)
 sub(.*[[:alpha:]], , temp)
 sub(.*[[:alpha:]][[:alpha:]], , temp)
 sub(.*[[:upper:]], , temp)
 sub(.*[[:upper:]][[:upper:]], , temp)
 sub(., , temp)
 substring(temp, 6)

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R glm

2004-09-23 Thread Martin Henry H. Stevens
Upon examining Table 7.1 on page 184 of VR 4th edition, I can see 
where the ambiguity would arise, however. Always best to check the 
help, I guess.
Hank
On Sep 23, 2004, at 1:04 PM, Paul Johnson wrote:

No!
 ?family
 The 'gaussian' family accepts the links 'identity', 'log' and
  'inverse';
Kahra Hannu wrote:
In Venables  Ripley: Modern Applied Statistics with S (MASS), (4th 
edition), on page 184 there is a table Families and link functions 
that gives you the available links with different families. The 
default and the only link with the gaussian family is identity.
ciao,
Hannu Kahra
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Shuangge Ma
Sent: Thursday, September 23, 2004 6:26 PM
To: [EMAIL PROTECTED]
Subject: [R] R glm
Hello:
would you please help me with the following glm question?
for the R function glm, what I understand is: once you specify the
family, then the link function is fixed.
My question is: is it possible I use, for example, log link 
function,
but the estimation approach for the guassian family?
Thanks,
Shuangge Ma, Ph.D.

* CHSCC, Department of Biostatistics   *
* University of Washington *
* Building 29, Suite 310, 6200 NE 74th ST. *
* Seattle, WA 98115*
* Tel: 206-685-7123 Fax: 206-616-4075  *
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


Dr. Martin Henry H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056
Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/botany/bot/henry.html
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
E Pluribus Unum
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: followup: Re: [R] Issue with predict() for glm models

2004-09-23 Thread Marc Schwartz
On Thu, 2004-09-23 at 12:02, Paul Johnson wrote:
 I have a follow up question that fits with this thread.
 
 Can you force an overlaid plot showing predicted values to follow the 
 scaling of the axes of the plot over which it is laid?
 
 Here is an example based on linear regression, just for clarity.  I have 
 followed the procedure described below to create predictions and now 
 want to plot the predicted values on top of a small section of the x-y 
 scatterplot.
 
 x - rnorm(100, 10, 10)
 e - rnorm(100, 0, 5)
 y - 5 + 10 *x + e
 
 myReg1 - lm (y~x)
 plot(x,y)
 newX - seq(1,10,1)
 myPred - predict(myReg1,data.frame(x=newX))
 
 Now, if I do this, I get 2 graphs overlaid but their axes do not line 
 up.
 
 par(new=T)
 plot(newX,myPred$fit)
 
 The problem is that the second one uses the whole width of the graph 
 space, when I'd rather just have it go from the small subset where its x 
 is defined, from 1 to 10.  Its stretching the range (1,10) for newX to 
 use the same scale that goes from (-15, 35) where it plots x
 
 I know abline() can do this for lm, but for some other kinds of models, 
 no  lines() method is provided, and so I am doing this the old fashioned 
 way.

Paul,

Instead of using plot() for the second set of points, use points():

x - rnorm(100, 10, 10)
e - rnorm(100, 0, 5)
y - 5 + 10 * x + e

myReg1 - lm (y ~ x)
plot(x, y)

newX - seq(1, 10, 1)
myPred - predict(myReg1, data.frame(x = newX))

points(newX, myPred$fit, pch = 19)


This will preserve the axis scaling. If you use plot() without
explicitly indicating xlim and ylim, it will automatically scale the
axes based upon your new data, even if you indicated that the underlying
plot should not be cleared.

Alternatively, you could also use the lines() function, which will draw
point to point lines:

lines(newX, myPred$fit, col = red)

If you want fitted lines and prediction/confidence intervals, you could
use a function like matlines(), presuming that a predict method exists
for the model type you want to use.

There is an example of using this in R Help Desk in R News Vol 3
Number 2 (October 2003), in the first example, with a standard linear
regression model.

HTH,

Marc Schwartz

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] fitting weibull distribution

2004-09-23 Thread Christian Ritz
Hi.

I think there may be one or more zeros in your data set, causing the
problem:

x - rgamma(100)
fitdistr(x, weibull)
fitdistr(c(x,0), weibull)

Maybe you should omit the zeros.

Christian

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Issue with predict() for glm models

2004-09-23 Thread John Fox
Dear Uwe, 

 -Original Message-
 From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 23, 2004 11:37 AM
 To: John Fox
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] Issue with predict() for glm models
 
. . .

 
 John,
 
 note that I used glm(y ~ .) (the dot!),
 because the names are automatically chosen to be X.1 and X.2, 
 hence you cannot use X in the formula in this case ...
 
 Best,
 Uwe

OK -- I see. I did notice that you used . in the formula but didn't make the
proper connection!

Thanks,
 John

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] decompose a correlation matrix

2004-09-23 Thread Mark Strivens
Is there a simple way to decompose the upper triangle
of a correlation matrix to a linear list;

For example:

  X Y Z
X 1 2 3
Y 2 1 4
Z 3 4 1

so you get a list like:

xy 2
XZ 3
YZ 4

I suspect you can do it with a matrix transformation, but
that beyond me at present.

Many thanks

Mark
_
Department of Molecular and Human Genetics,
Baylor College of Medicine,

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R vs EViews - serial correlation

2004-09-23 Thread Patrick Brandt
The issue here is that you have confused an AR(1) process for the 
variable of interest with an AR(1) process for its residuals.  The 
former is a true AR(1) process, while the latter is really an MA(1) 
process, in terms of Box-Jenkins style ARIMA models.

Interpreting your original post:
I met with some problems when dealing with a time series with serial correlation.
FIRST, I generate a series with correlated errors
set.seed(1)
x=1:50
y=x+arima.sim(n = 50, list(ar = c(0.47)))

This generates an ARIMA(1,0,0) dataset of 50 observations.  So the d.v. 
has an AR(1) process.

SECOND, I estimate three constants (a, b and rho) in the model Y=a+b*X+u, where u=rho*u(-1)+eps
 
library(nlme)
gls(y~x,correlation = corAR1(0.5)) # Is it the right procedure?

Coefficients:
(Intercept)   x 
  0.1410465   1.0023341 

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
 Phi 
0.440594 
Degrees of freedom: 50 total; 48 residual
Residual standard error: 0.9835158


This estimates an AR(1) error process model -- or an ARIMA(0,0,1).  By 
the Wold decomposition, the AR(1) dgp we should expect the MA(q) process 
to have a longer order than 1.

THIRD, I do the same procedure with EViews as LS Y C X AR(1) and get
Y = 0.1375 + 1.0024*X + [AR(1)=0.3915]
This fits an AR(1) error process (ARIMA(0,0,1)) if I recall my Eviews 
syntax.

My problem is actually connected with the fitting procedure. As far as I understand 

gls(y~x,correlation = corAR1(0.5))$fit
is obtained through the linear equation 0.1410+1.0023*X while in EViews through the nonlinear equation 

Y=rho*Y(-1) + (1-rho)*a+(X-rho*X(-1))*b
where either dynamic or static fitting procedures are applied. 

X   YYF_DYF_S gls.fit
1   1  1.1592  NA  NA  1.1434
2   2  3.5866  2.1499  2.1499  2.1457
3   3  4.1355  3.1478  3.7103  3.1480
4   4  3.9125  4.1484  4.5352  4.1504
5   5  2.7442  5.1502  5.0578  5.1527
6   6  6.0647  6.1523  5.2103  6.1551
7   7  6.9855  7.1547  7.1203  7.1574
.
47 47 49.4299 47.2521 47.5288 47.2507
48 48 48.7748 48.2545 49.1072 48.2531
49 49 48.3200 49.2570 49.4607 49.2554
50 50 50.2501 50.2594 49.8926 50.2578
Again, both of the models you fit are NOT the DGP you simulated.
All gls.fit values are on a line, YF_D (D for dynamic) soon begin
to follow a line and YF_S try to mimic Y.
Correct since you are fitting a model of correlated INNOVATIONS, rather 
than a model of correlated Y's

My question is: do R and EViews estimate the same model? If yes, why
the estimates are different and which of the two (three?) procedures
is correct?
If you want to fit the DGP you proposed above, you should use
arima(y, order=c(1,0,0), xreg=x)
Hope that helps.
--
Patrick T. Brandt
Assistant Professor
Department of Political Science
University of North Texas
[EMAIL PROTECTED]
http://www.psci.unt.edu/~brandt
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] decompose a correlation matrix

2004-09-23 Thread Duncan Murdoch
On Thu, 23 Sep 2004 13:08:53 -0500, Mark Strivens
[EMAIL PROTECTED] wrote :

Is there a simple way to decompose the upper triangle
of a correlation matrix to a linear list;

For example:

  X Y Z
X 1 2 3
Y 2 1 4
Z 3 4 1

so you get a list like:

xy 2
XZ 3
YZ 4

I suspect you can do it with a matrix transformation, but
that beyond me at present.

x[ row(x)  col(x) ]

will give the entries you want.

Duncan Murdoch

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] R glm

2004-09-23 Thread Liaw, Andy
As ?glm says, see ?family.

Andy

 From: Kahra Hannu
 
 In Venables  Ripley: Modern Applied Statistics with S 
 (MASS), (4th edition), on page 184 there is a table Families 
 and link functions that gives you the available links with 
 different families. The default and the only link with the 
 gaussian family is identity.
 
 ciao,
 Hannu Kahra
 
 From: Shuangge Ma
 
 Hello:
 would you please help me with the following glm question?
 
 for the R function glm, what I understand is: once you specify the
 family, then the link function is fixed.
 
 My question is: is it possible I use, for example, log link 
 function,
 but the estimation approach for the guassian family?
 
 Thanks,
 
 Shuangge Ma, Ph.D.
 
 * CHSCC, Department of Biostatistics   *
 * University of Washington *
 * Building 29, Suite 310, 6200 NE 74th ST. *
 * Seattle, WA 98115*
 * Tel: 206-685-7123 Fax: 206-616-4075  *
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] decompose a correlation matrix

2004-09-23 Thread Baskin, Robert
This may not have things in the order you want but you can see if it gets
close to what you want:

 x - matrix(1:16,ncol=4)
 x
 [,1] [,2] [,3] [,4]
[1,]159   13
[2,]26   10   14
[3,]37   11   15
[4,]48   12   16
 y - x[row(x)  col(x)]
 y
[1]  5  9 10 13 14 15
 

bob



-Original Message-
From: Mark Strivens [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 23, 2004 2:09 PM
To: [EMAIL PROTECTED]
Subject: [R] decompose a correlation matrix


Is there a simple way to decompose the upper triangle
of a correlation matrix to a linear list;

For example:

  X Y Z
X 1 2 3
Y 2 1 4
Z 3 4 1

so you get a list like:

xy 2
XZ 3
YZ 4

I suspect you can do it with a matrix transformation, but
that beyond me at present.

Many thanks

Mark
_
Department of Molecular and Human Genetics,
Baylor College of Medicine,

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] decompose a correlation matrix

2004-09-23 Thread Mark Strivens
Thanks for the answers the appear to be just
right. 

i.e. 

corrmat[upper.tri(corrmat)]

or

x[col(x)row(x)] 

The slight problem is I lose all the column 
row labels from the correlation matrix, so I 
am not sure of the order of the values (i.e.
which combination of markers the value 
represents).

Is there anyway to drag the labels along with
the value extraction?
_
Department of Molecular and Human Genetics,
Baylor College of Medicine,

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] decompose a correlation matrix

2004-09-23 Thread Liaw, Andy
 From: Duncan Murdoch
 
 On Thu, 23 Sep 2004 13:08:53 -0500, Mark Strivens
 [EMAIL PROTECTED] wrote :
 
 Is there a simple way to decompose the upper triangle
 of a correlation matrix to a linear list;
 
 For example:
 
   X Y Z
 X 1 2 3
 Y 2 1 4
 Z 3 4 1
 
 so you get a list like:
 
 xy 2
 XZ 3
 YZ 4
 
 I suspect you can do it with a matrix transformation, but
 that beyond me at present.
 
 x[ row(x)  col(x) ]
 
 will give the entries you want.
 
 Duncan Murdoch

Perhaps slightly more transparent:

x[upper.tri(x)]

or

x[lower.tri(x)]

Best,
Andy

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] decompose a correlation matrix

2004-09-23 Thread Liaw, Andy
 From: Mark Strivens
 
 Thanks for the answers the appear to be just
 right. 
 
 i.e. 
 
 corrmat[upper.tri(corrmat)]
 
 or
 
 x[col(x)row(x)] 
 
 The slight problem is I lose all the column 
 row labels from the correlation matrix, so I 
 am not sure of the order of the values (i.e.
 which combination of markers the value 
 represents).
 
 Is there anyway to drag the labels along with
 the value extraction?
 _
 Department of Molecular and Human Genetics,
 Baylor College of Medicine,

idx - upper.tri(x)
cbind(row=row(x)[idx], col=col(x)[idx], value=x[idx])

Best,
Andy

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] decompose a correlation matrix

2004-09-23 Thread Ted Harding
On 23-Sep-04 Mark Strivens wrote:
 Is there a simple way to decompose the upper triangle
 of a correlation matrix to a linear list;
 
 For example:
 
   X Y Z
 X 1 2 3
 Y 2 1 4
 Z 3 4 1
 
 so you get a list like:
 
 xy 2
 XZ 3
 YZ 4
 
 I suspect you can do it with a matrix transformation, but
 that beyond me at present.

Let C be your correlation matrix. Then:

  C[lower.tri(C)]

or equivalently

  C[upper.tri(C)]

E.g. (in your notation):

   C
   [,1] [,2] [,3] [,4]
  [1,]1234
  [2,]2156
  [3,]3517
  [4,]4671

   (C[lower.tri(C)])
  [1] 2 3 4 5 6 7

See ?lower.tri

Best wishes,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861   [NB: New number!]
Date: 23-Sep-04   Time: 20:37:36
-- XFMail --

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Issue with predict() for glm models

2004-09-23 Thread Uwe Ligges
John Fox wrote:
Dear Uwe, 


-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 23, 2004 11:37 AM
To: John Fox
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

. . .

John,
note that I used glm(y ~ .) (the dot!),
because the names are automatically chosen to be X.1 and X.2, 
hence you cannot use X in the formula in this case ...

Best,
Uwe

OK -- I see. I did notice that you used . in the formula but didn't make the
proper connection!
Sorry, my first reply was too short and imprecisely.
Thank you to help clarifying things.
Uwe

Thanks,
 John
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] folding table into a matrix

2004-09-23 Thread Gene Cutler
I'm just getting started with R, so feel free to point me to the 
appropriate documentation if this is already answered somewhere (though 
I've been unable to find it myself).  This does seem like a rather 
basic question.
I want to fold a table into a matrix.  The table is formatted like so:

Column_Index  Value
1 486
2 688
3 447
4 555
5 639
1 950
2 881
3 1785
4 1216
1 612
2 790
3 542
4 1310
5 976
And I want to end up with something like this:
  [,1]  [,2]  [,3]  [,4]  [,5]
[1,]   486   688   447   555   639
[2,]   950   881  1785  1216NA
[3,]   612   790   512  1310   976
Since not all the rows are complete, I can't just reformat using 
matrix(), I need to go by the index information in the Column_Index 
column.  This seems like something simple to do, but I'm stumped.

Thanks.
-- Gene
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] folding table into a matrix

2004-09-23 Thread Spencer Graves
 Have you looked at index arrays in An Introduction to R, 
available as the first menu option after help.start() in R?  The 
version I got with R 1.9.1 for Windows includes the following: 

 x - array(1:20,dim=c(4,5))   # Generate a 4 by 5 array.
 x
 [,1] [,2] [,3] [,4] [,5]
[1,]159   13   17
[2,]26   10   14   18
[3,]37   11   15   19
[4,]48   12   16   20
 i - array(c(1:3,3:1),dim=c(3,2))
 i # i is a 3 by 2 index array.
 [,1] [,2]
[1,]13
[2,]22
[3,]31
 x[i]  # Extract those elements
[1] 9 6 3
 x[i] - 0 # Replace those elements by zeros.
 x
 [,1] [,2] [,3] [,4] [,5]
[1,]150   13   17
[2,]20   10   14   18
[3,]07   11   15   19
[4,]48   12   16   20

 The key point here is that i is a 3x2 array, and x[i] references 
the 3 elements of x that have the row and column indices in x.  Does 
this provide the information you need? 

 hope this helps.  spencer graves
Gene Cutler wrote:
I'm just getting started with R, so feel free to point me to the 
appropriate documentation if this is already answered somewhere 
(though I've been unable to find it myself).  This does seem like a 
rather basic question.
I want to fold a table into a matrix.  The table is formatted like so:

Column_Index  Value
1 486
2 688
3 447
4 555
5 639
1 950
2 881
3 1785
4 1216
1 612
2 790
3 542
4 1310
5 976
And I want to end up with something like this:
  [,1]  [,2]  [,3]  [,4]  [,5]
[1,]   486   688   447   555   639
[2,]   950   881  1785  1216NA
[3,]   612   790   512  1310   976
Since not all the rows are complete, I can't just reformat using 
matrix(), I need to go by the index information in the Column_Index 
column.  This seems like something simple to do, but I'm stumped.

Thanks.
-- Gene
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Problems with boot and optim

2004-09-23 Thread Angelo Canty
Hi Luke,

I think your problem is in the function lik.hetprobit and not 
remembering that R is case sensitive so X and x are not the same.

The parameters passed in are called X, Y and Z which change for each
bootstrap dataset.  Within the function, however, your first three 
lines are
Y - as.matrix(y)
X - as.matrix(x)
Z - as.matrix(z)

since x, y, z (lowercase) do not exist in the function, they are being
sought in the global workspace which remains the same for each bootstrap
dataset so that after these three lines your X, Y and Z (uppercase) take
on these values no matter what was input to the function.

Replace x, y and z in the function by X, Y and Z and it should work.

HTH, Angelo
  
On Tue, 21 Sep 2004, Luke Keele wrote:

  
 I am trying to bootstrap the parameters for a model that is estimated
 through the optim() function and find that when I make the call to boot,
 it runs but returns the exact same estimate for all of the bootstrap
 estimates.  I managed to replicate the same problem using a glm() model
 but was able to fix it when I made a call to the variables as data frame
 by their exact names.  But no matter how I refer to the variables in the
 het.fit function (see below) I get the same result.  I could bootstrap
 it with the sample command and a loop, but then the analysis in the next
 step isn't as nice
  
 The code for the likelihood and the call to boot is below.  I have tried
 numerous other permutations as well.
  
 I am using R 1.9.1 on Windows XP pro.
  
 Thanks
  
 Luke Keele
  
  
 #Define Likelihood
 lik.hetprobit -function(par, X, Y, Z){
  
 #Pull Out Parameters
 Y - as.matrix(y)
 X - as.matrix(x)
 Z - as.matrix(z)
 K -ncol(X)
 M -ncol(Z)
 b - as.matrix(par[1:K])
 gamma - as.matrix(par[K+1:M])
  
 mu - (X%*%b)
 sd - exp(Z%*%gamma)
  
 mu.sd -(mu/sd)
  
 #Form Likelihood
  
log.phi - pnorm(ifelse(Y == 0, -1, 1) * mu.sd, log.p = TRUE)
2 * sum(log.phi)
 }
  
 y - as.matrix(abhlth)
 x - as.matrix(reliten) 
 ones = rep(1, nrow(x)) 
 x = cbind(ones,x) 
 z = as.matrix(abinfo)
  
 data.het - as.matrix(cbind(y,x,z))
  
 het.fit - function(data){
 mod - optim(c(1,0,0), lik.hetprobit, Y=data$y, X=data$x, Z=data$z,
 method=BFGS, 
control=list(fnscale=-1), hessian=T)
 c(mod$par)
 }
 case.fun - function(d,i)
 het.fit(d[i,])
  
 het.case - boot(data.het, case.fun, R=50)
 Luke Keele
 Post-Doctoral Fellow in Quantitative Methods
 Nuffield College, Oxford University
 Oxford, UK
  
 
   [[alternative HTML version deleted]]
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

-- 
--
|   Angelo J. CantyEmail: [EMAIL PROTECTED] |
|   Mathematics and Statistics Phone: (905) 525-9140 x 27079 |
|   McMaster UniversityFax  : (905) 522-0935 |
|   1280 Main St. W. |
|   Hamilton ON L8S 4K1  |

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] folding table into a matrix

2004-09-23 Thread Peter Dalgaard
Gene Cutler [EMAIL PROTECTED] writes:

 I'm just getting started with R, so feel free to point me to the
 appropriate documentation if this is already answered somewhere
 (though I've been unable to find it myself).  This does seem like a
 rather basic question.
 I want to fold a table into a matrix.  The table is formatted like so:
 
 Column_Index  Value
 1 486
 2 688
 3 447
 4 555
 5 639
 1 950
 2 881
 3 1785
 4 1216
 1 612
 2 790
 3 542
 4 1310
 5 976
 
 And I want to end up with something like this:
 
[,1]  [,2]  [,3]  [,4]  [,5]
 [1,]   486   688   447   555   639
 [2,]   950   881  1785  1216NA
 [3,]   612   790   512  1310   976
 
 Since not all the rows are complete, I can't just reformat using
 matrix(), I need to go by the index information in the Column_Index
 column.  This seems like something simple to do, but I'm stumped.

It's not completely trivial, since you're relying on ordering
information: the missing col.5 value goes in the 2nd row, but you only
know that because values are ordered in row blocks.

If you supply the rows that things belong in, the task does becomes
simple:

Row_Index - rep(1:3,c(5,4,5))
M - matrix(NA, 3, 5)
M[cbind(Row_Index,Column_Index)] - Value

Now how to compute Row_Index from Column_Index? If you know that each
group starts with a 1, you might use (rename Column_Index as cc for
brevity) 

 cumsum(cc==1)
 [1] 1 1 1 1 1 2 2 2 2 3 3 3 3 3

If you can't make that assumption, you might consider something like

 cumsum(c(1,diff(cc)0))
 [1] 1 1 1 1 1 2 2 2 2 3 3 3 3 3


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] decompose a correlation matrix

2004-09-23 Thread Mark Strivens
Thanks guys for the help, here's the final (grizzly?)
solution:

#generate a correlation matrix
cm-cor(someDataFrame, y = NULL ...)

# get the list of labels (included in the dataframe)
labels-labels(cm)[[1]]

# retrieve the uppper portion of the correlation matrix (as logical values)
idx - upper.tri(cm)

# combine the logical values with the marker list
mcm-cbind(marker=paste((markers[col(t(cm))[idx]],'',markers[row(t(cm))[idx
]]),
row=row(t(cm))[idx], col=col(t(cm))[idx], value=cm[idx])

gives the format:

label2  label1 1   2   0.97712
label3  label1 1   3   0.84587
label4  label1 1   4   0.92184
label5  label1 1   5   0.54698


There seemed to be some issues with the row()and col()
commands not thinking the output of cor() was a matrix
so I used t() to force it into a table...

_
Department of Molecular and Human Genetics,
Baylor College of Medicine,
1 Baylor Plaza,
Houston,
TX 77030.

713-798-1947 (work)
832-876-7956 (cell)

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Cox proportional hazards model

2004-09-23 Thread Frank E Harrell Jr
Min-Han Tan wrote:
Good afternoon,
I am currently trying to do some work on survival analysis.
- I hope to seek your advice re: 2 questions (1 general and 1 specific)
(1) I'm trying to do a stratified Cox analysis and subsequently
plot(survfit(object)). It seems to work for some strata, but not for
others.
I have tumor grade, which is a range of 1 - 4. 

When I divide this range of 1:4 into 2 groups, it works fine for
strata(grade2) and strata(grade  3). However, if I do a
strata(grade1), there is an error when I do a survfit( coxph object
)
Call: survfit.coxph(object = s)
Error in print.survfit(structure(list(n = as.integer(46), time = c(22,  : 
length of dimnames [1] not equal to array extent

(2) As a general question, is it possible to distinguish between
confounding and interaction in the Cox proportional hazards model?
Thanks!
Min-Han
The Design package provides another way to use the survival package.
library(Design)  # also issues library(Hmisc) so need to install both
d - datadist(mydataframe); options(datadist='d')
f - cph(Surv(...) ~ ... + strat(z), surv=TRUE)   # note strat instead 
of strata
survplot(f, z=vector of strata values to plot survival curves for)

survplot has many options
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] moving average method for time series objects

2004-09-23 Thread Schwarz,Paul
Dear R-Help readers,

I suspect that this question must be a FAQ, but my investigation of the archives has 
not been very revealing.  Is there an R function for calculating moving averages of 
time series objects?

Thank you for your time and patience.

-Paul

Paul Schwarz, Ph.D.
Associate Director of Methodology
Gartner
Vendor Marketing Solutions, Custom Research
+1 503 241 8036 x186

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] RMySQL and Blob

2004-09-23 Thread jonathan_li
Hi,

I tried your suggestion to blindly import the blob into R, doing the following:

 con - dbConnect(MySQL, host=host, user=user, dbname=db)
 rs - dbGetQuery(con, statement=paste(select picture from db where id=1)

It didn't crash R. But rs is not usable. It seems that it has been converted to a 
character object.

In addition, we need to pass an image format to R, like png, bmp or something. It's 
unclear to me how to achieve this even if we can read rs as a binary object as you 
suggested.

Any ideas?

Thanks for suggestions!
Jonathan





-Original Message-
From: David James [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 22, 2004 10:06 AM
To: LI,JONATHAN (A-Labs,ex1)
Cc: David James
Subject: Re: [R] RMySQL and Blob


[EMAIL PROTECTED] wrote:
 Hi David,
 
 The application I have in mind is for images. In my case, size of images is known 
 and they are not big. As an example, a 64*32 image will have 2048 pixels. If they 
 are 8-bit grey-level pixels, the image occupies 2KB memory. 
 
 I may venture to guess that the unknown size and type of a blob object in MySQL 
 prevent it from being very usable in R since R doesn't have a datatype for a binary 
 blob?

You could just blindly try to import it into R (but do it on a clean
workspace, since it may crash R and you could loose your data!).
The underlying C code clearly identifies FIELD_TYPE_BLOB and goes
ahead and puts it in an R character vector (with comments clearly
stating that it is a hack).  Once it moves the data from the MySQL
result set buffer to the R vector, it computes the length in both
places and prints a warning if they differ.

Or you could try to hack something.  For instance, what happens if
instead of bringing the blob you import, say, as a string?
con - dbConnect(MySQL, )
rs - dbSendQuery(con, select SUBSTRING(blob, 0) from table)
dd - fetch(rs)

One possible general solution would be to define a new class
binaryConnection simmilar to textConnection, so that you
can readBin() and writeBin() from it.  In this way, blobs could
return a binary buffer (just a pointer to a block of C memory) 
that could be given to binaryConnection:

   data - fetch(rs)
   for(i in seq(nrow(data)){
  ## extract blobs from each row and create a binary connection
  bcon = binaryConnection(blobs$image[1])
  img = readBin(bcon, integer, n = 2048)
  
  ## work with the image
   }

let me know what happens if you try to naively import a blob...

--
David

 
 Thanks!
 Jonathan
 
 
 -Original Message-
 From: David James [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, September 22, 2004 7:05 AM
 To: LI,JONATHAN (A-Labs,ex1)
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] RMySQL and Blob
 
 
 Hi Jonathan,
 
 Currently RMySQL doesn't handle blob objects.  The mechanics of
 inserting and extracting blob objects by itself is not too hard,
 but issues such as how should blobs be made available to R, how to
 prevent buffers overflows, how to prevent huge blobs from exhausting
 the available memory, should R callback functions be invoked
 as chunks of the blob are brought in, etc., need more consideration.
 And these issues are not R/MySQL specific, but also relevant to
 other databases and other non-dbms interfaces.
 
 BTW there are R facilities (e.g., external pointers, finalizers) that 
 seems quite important for this type of implementation.  
 
 What type and how big are the blobs that want to import?
 
 --
 David
 
 [EMAIL PROTECTED] wrote:
  Dear R experts,
  
  Does RMySQL package handle Blob datatype in a MySQL database? Blob can represent 
  an image, a sound or some other 
  large and complex binary objects. In an article published by R-database special 
  interest group, named A common database interface (DBI) (updated June 2003),  
  it's mentioned in open issues and limitations that We need to carefully plan 
  how to deal with binary objects. 
  
  Before I invest time to try, I would appreciate any experts' opinions.
  
  Thanks,
  Jonathan
  
  __
  [EMAIL PROTECTED] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

-- 
David A. James
Statistics Research, Room 2C-276Phone:  (908) 582-3082   
Bell Labs, Lucent Technologies  Fax:(908) 582-3340
Murray Hill, NJ 09794-0636

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Intrduction of function

2004-09-23 Thread
Hi all: I've written a function and saved the worksapce.When the workspace of my 
function is re-open,I want display some introduction or step by step guidance of the 
function for the users.How can I do it? Thanks a lot! 

My best regards!

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] folding table into a matrix

2004-09-23 Thread Gabor Grothendieck
Gene Cutler gcutler at amgen.com writes:

: 
: I'm just getting started with R, so feel free to point me to the 
: appropriate documentation if this is already answered somewhere (though 
: I've been unable to find it myself).  This does seem like a rather 
: basic question.
: I want to fold a table into a matrix.  The table is formatted like so:
: 
: Column_Index  Value
: 1 486
: 2 688
: 3 447
: 4 555
: 5 639
: 1 950
: 2 881
: 3 1785
: 4 1216
: 1 612
: 2 790
: 3 542
: 4 1310
: 5 976
: 
: And I want to end up with something like this:
: 
:[,1]  [,2]  [,3]  [,4]  [,5]
: [1,]   486   688   447   555   639
: [2,]   950   881  1785  1216NA
: [3,]   612   790   512  1310   976
: 
: Since not all the rows are complete, I can't just reformat using 
: matrix(), I need to go by the index information in the Column_Index 
: column.  This seems like something simple to do, but I'm stumped.

Suppose z is the two column matrix which we wish
to reshape. Using Peter's cumsum expression to distinguish
the groups, tapply ts to each group turning each into
a ts object.  cbind these ts objects together (which 
has the effect of adding the NAs at the end automatically)
and finally transpose the result (which also turns it
back into a matrix).  The following one liner solves
the example problem:

t(do.call(cbind,tapply(z[,2], cumsum(z[,1]==1), ts)))


This approach also extends to the case where the indices in
column 1 have gaps.  In this case we need an irregular time
series package.  Using zoo we carry out the analogous
operations using the indices in column 1 for a group as the
times and the values in column 2 as the time series values.
We make use of zoo's multiway merge and Peter's second, more
general, cumsum expression to define the groups:

require(zoo)
f - function(i) zoo(z[i,2], z[i,1])
g - cumsum(c(1,diff(z[,1])0))
t(as.matrix(do.call(merge, tapply(1:nrow(z), g, f

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] moving average method for time series objects

2004-09-23 Thread Gabor Grothendieck
Schwarz,Paul Paul.Schwarz at gartner.com writes:

: 
: Dear R-Help readers,
: 
: I suspect that this question must be a FAQ, but my investigation of the 
archives has not been very revealing. 
: Is there an R function for calculating moving averages of time series 
objects?
: 
: Thank you for your time and patience.
: 
: -Paul

Check out:

https://stat.ethz.ch/pipermail/r-sig-finance/2004q3/000104.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] can I get these result with R?

2004-09-23 Thread rongguiwong
i want to check if the previously stock of social capital  has impact on the 
present economoc performace.so i get the following variables:economic 
performace ,stock of socail capital ;each of them from year 1996 to 2002,from 
30 different areas.
the data as following:
performance   stockyearereas
12,321996 1
11 12 1996 2
123  34   1996 3
¡­¡­¡­¡­¡­¡­
12,3321997 1
11 212 1997 2
123  32   1997 3
¡­¡­¡­¡­
13,2322002 1
12 212 2002 2
1233  2002 3

so i want to see if the 
1/stock[1996],stock[1997],stock[1998],stock[1999],sotck[2000],stock[2001],stock[2002] 
has impact on performance[2002]?
2/stock[1996],stock[1997],stock[1998],stock[1999],sotck[2000],stock[2001] has 
impact on performance[2001],
3/stock[1996],stock[1997],stock[1998],stock[1999],sotck[2000], has impact on 
performance[2000],
¡­¡­
and to see if the 
1/performace of year 1996,1997,1998,1999,2000,2001 has impact on performace of 
year 2002,
2/performace of year 1996,1997,1998,1999,2000 has impact on performace of year 
2001,
1/performace of year 1996,1997,1998,1999, has impact on performace of year 
2000,
¡­¡­

arima model seems not work,as i have data from 31 eareas,and i want to see if 
the differeces in economic performace of differenct areas has relation to the 
differences in the stock of social capital in each areas.
is it impossible mission?
any help will be appreciated.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] SJava Issue

2004-09-23 Thread Anar Khan

Hi,

I have downloaded and installed SJava-0.68-0 on my linux box.  When I try
to initialise the JVM from within R this is what I get:

 .JavaInit()
(1) error initializing manager class can't find class java/lang/Hashtable
Error in .JavaInit() : Couldn't start Java Virtual Machine: can't find
class java/lang/Hashtable

My JAVA_HOME variable is set in my .tcshrc script, so it shouldn't
really have any problem finding java.  (And its strange that its looking
for Hashtable in java.lang!)

This problem was posted on the mailing list by someone else also using
the j2sdk1.4.2_01 JDK from Sun last year
(http://www.mail-archive.com/[EMAIL PROTECTED]/msg09080.html) and I
was wondering if anyone had figured out how to resolve it?


Cheers,

Anar

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html