[R] Difficult survival model

2007-08-16 Thread Daniel Malter
Hi all,

I would appreciate your advice how to model the survival model for the
following data, especially if it can be modeled in one model or if I should
(have to) break it up (i.e. assume that some events are independent of each
other, etc.).  Data is on an experimental stock market simulation. Dependent
variable is the hazard of transition from either holding the stock to
closing the position (-1) or from not holding the stock to opening a
position in that stock (1) in the variable CHNG. Main variables of interest
are the regressors IV1 and IV2 (which are covariates, not factors). BUY and
SELL indicate the number of stocks of type STOCK that were purchased/sold in
ROUND. Accordingly, NBT and NET record the number of STOCK held at the
beginning and the end of period ROUND, respectively. HP is a 0/1 dummy
indicating whether or not shares of STOCK were held by an investor in any
given period.

And these are the issues I see in the data:

1. Over time, the stock may be held several times. This suggests a
conditional hazard. TIMESHELD measures the number of times a stock has been
held (meaning that the stock position must have been closed inbetween).
Alternatively, it could measure the number of prior transactions in the
stock (all kinds). This I intend to model with strata.

2. I also have repeated measures for individuals. As some investor may be
more of a trader than others, I want to model this with a random effect
(frailty).

3. Different stocks may have a different risk of being traded. This I would
model with another frailty term (or would I rather model it differently?).

4. In case that an investor holds a stock, there are competing risks between
increasing or decreasing the position.

5. There are also competing risks, so to speak, between the different
stocks. That is, when an investor makes a buy or sell decision, s/he decides
between all available stocks. One stocks competes with all other stocks to
be purchased. When selling a stock though, one stock only competes with the
other different stocks in the portfolio to be sold, of course.

I have no idea how to model 4. and 5. though and if this is possible. I
would also appreciate your feedback on the suggested way of modeling 1., 2.,
and 3.

Apologies for the lengthy description and thanks much for your support,
Daniel


ID  ROUND   STOCK   BUY SELLNBT NET IV1 IV2 HP
CHNGTIMESHELD
1   1   A   0   0   0   0   3   4   0
0   0
1   2   A   10  0   0   10  3   5   1
1   1
1   3   A   0   0   10  10  5   4   1
0   1
1   4   A   0   10  10  0   2   1   1
-1  1
1   5   A   0   0   0   0   3   4   0
0   1
1   1   A   20  0   0   20  2   5   1
1   2
1   2   A   0   0   20  20  4   5   1
0   2
1   3   A   0   0   20  20  2   3   1
0   2
1   4   A   0   0   20  20  5   3   1
0   2
1   5   A   0   20  20  0   4   5   1
-1  2
2   1   B   15  0   0   15  1   2   1
1   1
2   2   B   0   0   15  15  1   2   1
0   1
2   3   B   0   0   15  15  2   2   1
0   1
2   4   B   0   15  15  0   5   1   1
-1  1
2   5   B   0   0   0   0   4   4   0
0   1
2   1   B   5   0   0   5   3   4   1
1   2
2   2   B   0   0   5   5   5   2   1
0   2
2   3   B   0   5   5   0   4   2   1
-1  2
2   4   B   0   0   0   0   4   1   0
0   2
2   5   B   0   0   0   0   4   1   0
0   2


PhD Program Strategy
Dept. of Management and Organization
Robert H. Smith School of Business   
University of Maryland
Van Munching Hall   
College Park, MD  20742
www.rhsmith.umd.edu
www.umd.edu

mailto:[EMAIL PROTECTED]
mailto:[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] residual plots for lmer in lme4 package

2007-08-16 Thread Margaret Gardiner-Garden
Hi,  

 

I was wondering if I might be able to ask some advice about doing residual
plots for the lmer function in the lme4 package. 

 

Our group's aim is to find if the expression staining of a particular gene
in a sample (or core)  is related to the pathology of the core.

To do this, we used the lmer function to perform a logistic mixed model
below.  I apologise in advance for the lack of subscripts.

 

 logit P(yij=1) = â0 + Ui + â1Patholij where Ui~N(0, óu2),

i indexes patient, j indexes measurement, Pathol is an indicator variable
(0,1) for benign

epithelium versus cancer and yij is the staining indicator (0,1) for each
core where yij equals 1 if the core stains positive and 0 otherwise. 

 

(I have inserted some example R code at the end of this message)

 

I was wondering if you knew how I could test that the errors Ui are normally
distributed in my fit.  I am not familiar with how to do residual plots for
a mixed logistic regression (or even for any logistic regression!). 

 

Any advice would be greatly appreciated!

 

Thanks and Regards

Marg

 

Example code:

 

lmer(Intensity.over2.hyp.canc~Pathology + (1|Patient.ID), data=
HSD17beta4.hyp.canc, family=binomial, na.action=na.omit)

 

 

   

#Family: binomial(logit link)

 #AIC  BIClogLik deviance

   # 414.1101 431.4147 -203.0550 406.1101

   #Random effects:

   # GroupsNameVarianceStd.Dev. 

   # Patient.ID (Intercept)  4.9558  2.2262 

   # of obs: 559, groups: Patient.ID, 177

 

   #Estimated scale (compare to 1)  0.6782544 

 

   #Fixed effects:

#Estimate Std. Error z value  Pr(|z|)

   #(Intercept)  -2.057340.24881 -8.2686  2.2e-16 ***

   #PathologyHyperplasia -1.766270.44909 -3.9330 8.389e-05 ***

 

NB. Intensity.over2.hyp.canc is the staining of the core (ie 0 or 1)

Pathology is Hyperplasia or Cancer

 

 

Dr Margaret Gardiner-Garden

Garvan Institute of Medical Research

384 Victoria Street

Darlinghurst Sydney

NSW 2010 Australia

 

Phone: 61 2 9295 8348

Fax: 61 2 9295 8321

 

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] binomial simulation

2007-08-16 Thread Prof Brian Ripley
On Wed, 15 Aug 2007, Moshe Olshansky wrote:

 Thank you - I wasn't aware of this function.
 One can even use lchoose which allows really huge
 arguments (more than 2^1000)!

Using dbinom() for binomial probabilities would be even better, 
and that has a log=TRUE argument to return results on natural log scale.

 dbinom(k,N,p,log=TRUE) + dbinom(m,k,q,log=TRUE)
[1] -92.52584
 log(choose(N,k)*p^k*(1-p)^(N-k)*choose(k,m)*q^m*(1-q)^(k-m))
[1] -92.52584


 --- Lucke, Joseph F [EMAIL PROTECTED]
 wrote:

 C is an R function for setting contrasts in a
 factor.  Hence the funky
 error message.
 ?C

 Use choose() for your C(N,k)
 ?choose

 choose(200,2)
 19900

 choose(200,100)
  9.054851e+58

 N=200; k=100; m=50; p=.6; q=.95

 choose(N,k)*p^k*(1-p)^(N-k)*choose(k,m)*q^m*(1-q)^(k-m)
 6.554505e-41

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf
 Of Moshe Olshansky
 Sent: Wednesday, August 15, 2007 2:06 AM
 To: sigalit mangut-leiba; r-help
 Subject: Re: [R] binomial simulation

 No wonder that you are getting overflow, since
 gamma(N+1) = n! and 200!  (200/e)^200  10^370.
 There exists another way to compute C(N,k). Let me
 know if you need this
 and I will explain to you how this can be done.
 But do you really need to compute the individual
 probabilities? May be
 you need something else and there is no need to
 compute the individual
 probabilities?

 Regards,

 Moshe.

 --- sigalit mangut-leiba [EMAIL PROTECTED] wrote:

 Thank you,
 I'm trying to run the joint probabilty:

 C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m)

 and get the error: Error in C(N, k) : object not
 interpretable as a
 factor

 so I tried the long way:

 gamma(N+1)/(gamma(k+1)*(gamma(N-k)))

 and the same with k, and got the error:

 1: value out of range in 'gammafn' in: gamma(N +
 1)
 2: value out of range in 'gammafn' in: gamma(N -
 k) 

 Do you know why it's not working?

 Thanks again,

 Sigalit.



 On 8/14/07, Moshe Olshansky
 [EMAIL PROTECTED]
 wrote:

 As I understand this,
 P(T+ | D-)=1-P(T+ | D+)=0.05
 is the probability not to detect desease for a
 person
 at ICU who has the desease. Correct?

 What I asked was whether it is possible to
 mistakenly
 detect the desease for a person who does not
 have
 it?

 Assuming that this is impossible the formula is
 below:

 If there are N patients, each has a probability
 p
 to
 have the desease (p=0.6 in your case) and q is
 the probability to
 detect the desease for a person who
 has
 it (q = 0.95 for ICU and q = 0.8 for a regular
 unit),
 then

 P(k have the desease AND m are detected) = P(k
 have the desease)*P(m

 are detected / k have
 the
 desease) =
 C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m)
 where C(a,b) is the Binomial coefficient a
 above
 b -
 the number of ways to choose b items out of a
 (when
 the order does not matter). You of course must
 assume
 that N = k = m = 0 (otherwise the probability
 is
 0).

 To generate such pairs (k infected and m
 detected)
 you
 can do the following:

 k - rbinom(N,1,p)
 m - rbinom(k,1,q)

 Regards,

 Moshe.

 --- sigalit mangut-leiba [EMAIL PROTECTED]
 wrote:

 Hi,
 The probability of false detection is: P(T+ |
 D-)=1-P(T+ |
 D+)=0.05.
 and I want to find the joint probability
 P(T+,D+)=P(T+|D+)*P(D+)
 Thank you for your reply,
 Sigalit.


 On 8/13/07, Moshe Olshansky
 [EMAIL PROTECTED]
 wrote:

 Hi Sigalit,

 Do you want to find the probability P(T+ = t
 AND
 D+ =
 d) for all the combinations of t and d (for
 ICU
 and
 Reg.)?
 Is the probability of false detection (when
 there
 is
 no disease) always 0?

 Regards,

 Moshe.

 --- sigalit mangut-leiba [EMAIL PROTECTED]
 wrote:

 hello,
 I asked about this simulation a few days
 ago,
 but
 still i can't get what i
 need.
 I have 2 units: icu and regular. from icu
 I
 want
 to
 take 200 observations
 from binomial distribution, when
 probability
 for
 disease is: p=0.6.
 from regular I want to take 300
 observation
 with
 the
 same probability: p=0.6
 .
 the distribution to detect disease when
 disease
 occurred- *for someone from
 icu* - is: p(T+ | D+)=0.95.
 the distribution to detect disease when
 disease
 occurred- *for someone from
 reg.unit* - is: p(T+ | D+)=0.8.
 I want to compute the joint distribution
 for
 each

 === message truncated ===

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list

Re: [R] Possible to import histograms in R?

2007-08-16 Thread Nick Chorley
On 15/08/07, Ted Harding [EMAIL PROTECTED] wrote:

 So you if you want the density plot, you would need to calculate
 this for yourself. E.g.

 H1$density - counts/sum(counts)
 plot.histogram(H1,freq=FALSE)


Oddly, plot.histogram doesn't work in my version of R (which is 2.5.1), even
though R can give me the help page for it. plot() works fine though.

Thanks again!

NC

And so on ... there are many relevant details in the help pages
 ?hist and ?plot.histogram

 Best wishes,
 Ted.

 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 15-Aug-07   Time: 11:53:14
 -- XFMail --


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] function to find coodinates in an array

2007-08-16 Thread Ana Conesa
Dear list,

I am looking for a function/way to get the array coordinates of given
elements in an array. What I mean is the following:
- Let X be a 3D array
- I find the ordering of the elements of X by ord - order(X) (this
returns me a vector)
- I now want to find the x,y,z coordinates of each element of ord

Can anyone help me?

Thanks!

Ana

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] summarising systemfit with saveMemory

2007-08-16 Thread Tim Calkins
Hi all -

I'm on R 2.5.1 for XP.

in the systemfit package, the summary is set to print the McElroy's
measure of fit unless it's NULL.  When the option saveMemory = TRUE,
the McElroy isn't included, instead it defaults to NA.  Thus I am
unable to use summary.systemfit.

 library(systemfit)
 example(systemfit)
 surfit2 - systemfit(SUR,system,data=Kmenta,saveMemory=T)
 summary(surfit2)

As far as I can tell, this is a result of the following code.

 print.summary.systemfit

--SNIP--

if (!is.null(x$mcelr2)) {
cat(McElroy's R-squared value for the system: )
cat(formatC(x$mcelr2, digits = digits, width = -1))
cat(\n)
}

--SNIP--

combined with

 systemfit

--SNIP--
if (!saveMemory) {
   --SNIP--
mcelr2 - 1 - (rtOmega %*% resids)/denominator
}
else {
mcelr2 - NA
}

--SNIP--

Or am I missing something here.  Thanks in advance.

-- 
Tim Calkins
0406 753 997

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] an easy way to construct this special matirx

2007-08-16 Thread shaowenhua
Hi,
Sorry if this is a repost. I searched but found no results.
I am wondering if it is an easy way to construct the following matrix:

r  1 0 00
r^2   r 1 00
r^3   r^2  r 10
r^4   r^3  r^2  r1

where r could be any number. Thanks.
Wen
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] time series with quality codes

2007-08-16 Thread Felix Andrews
list(...),

I am working with environmental time series (eg rainfall, stream flow)
that have attached quality codes for each data point. The quality
codes have just a few factor levels, like good, suspect, poor,
imputed. I use the quality codes in plots and summaries. They are
carried through when a time series is aggregated to a longer
time-step, according to rules like worst, median or mode.

I need to support time steps of anything from hours to years. I can
assume the data are regular time series -- they might be irregular
initially but could be 'regularized'. But I would want to plot
irregular time series along with regular ones.

So far I have been using a data frame with a POSIXct column, a numeric
column and a factor column. However I would like to use zoo instead,
because of its many utility functions and easy conversion to ts. Is
there any prospect of zoo handling such numeric + factor data? Other
suggestions on elegant ways to do it are also welcome.

Felix

-- 
Felix Andrews / 安福立
PhD candidate
Integrated Catchment Assessment and Management Centre
The Fenner School of Environment and Society
The Australian National University (Building 48A), ACT 0200
Beijing Bag, Locked Bag 40, Kingston ACT 2604
http://www.neurofractal.org/felix/
xmpp:[EMAIL PROTECTED]
3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] an easy way to construct this special matirx

2007-08-16 Thread Ted Harding
On 16-Aug-07 03:10:17, [EMAIL PROTECTED] wrote:
 Hi,
 Sorry if this is a repost. I searched but found no results.
 I am wondering if it is an easy way to construct the following matrix:
 
 r 1000
 r^2   r100
 r^3   r^2  r10
 r^4   r^3  r^2  r1
 
 where r could be any number. Thanks.
 Wen

I dare say there's an even simpler way (and I feel certain
someone will post one ... ); but (example):

r-0.1
r1-r^((-3):0); r2-r^(3:0)
R-r1%*%t(r2)
R
##  [,1]  [,2]  [,3] [,4]
##[1,] 1.000 10.00 100.0 1000
##[2,] 0.100  1.00  10.0  100
##[3,] 0.010  0.10   1.0   10
##[4,] 0.001  0.01   0.11

R[upper.tri(R)]-0
R
##  [,1] [,2] [,3] [,4]
##[1,] 1.000 0.00  0.00
##[2,] 0.100 1.00  0.00
##[3,] 0.010 0.10  1.00
##[4,] 0.001 0.01  0.11

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-07   Time: 09:37:28
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] an easy way to construct this special matirx

2007-08-16 Thread Prof Brian Ripley
?toeplitz
?lower.tri

since it is the lower triangle of a Toeplitz matrix (or drop the top row)

r - 0.95
R - toeplitz(r^(0:4))
R[upper.tri(R)] - 0
R[-1,]


On Thu, 16 Aug 2007, [EMAIL PROTECTED] wrote:

 Hi,
 Sorry if this is a repost. I searched but found no results.
 I am wondering if it is an easy way to construct the following matrix:

 r  1 0 00
 r^2   r 1 00
 r^3   r^2  r 10
 r^4   r^3  r^2  r1

 where r could be any number. Thanks.
 Wen

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] time series with quality codes

2007-08-16 Thread Achim Zeileis

On Thu, 16 Aug 2007, Felix Andrews wrote:


list(...),

I am working with environmental time series (eg rainfall, stream flow)
that have attached quality codes for each data point. The quality
codes have just a few factor levels, like good, suspect, poor,
imputed. I use the quality codes in plots and summaries. They are
carried through when a time series is aggregated to a longer
time-step, according to rules like worst, median or mode.

I need to support time steps of anything from hours to years. I can
assume the data are regular time series -- they might be irregular
initially but could be 'regularized'. But I would want to plot
irregular time series along with regular ones.

So far I have been using a data frame with a POSIXct column, a numeric
column and a factor column. However I would like to use zoo instead,
because of its many utility functions and easy conversion to ts. Is
there any prospect of zoo handling such numeric + factor data? Other
suggestions on elegant ways to do it are also welcome.


There is some limited support for this in zoo. You can do
  z - zoo(myfactor, myindex)
and work with it like a zoo series and then
  coredata(z)
will recover a factor. However, you cannot bind this to other series 
without losing the factor structure. At least not in a plain zoo series. 
But you can do

  df - merge(z, Z, retclass = data.frame)
where every column of the resulting data.frame is a univariate zoo series.

The final option would be to just have a data.frame as usual and put your 
data/index into one column. But then it's more difficult to leverage zoo's 
functionality.


I would like to have more support for things like this, but currently this 
is what we have.


Best,
Z


Felix

--
Felix Andrews / 安福立
PhD candidate
Integrated Catchment Assessment and Management Centre
The Fenner School of Environment and Society
The Australian National University (Building 48A), ACT 0200
Beijing Bag, Locked Bag 40, Kingston ACT 2604
http://www.neurofractal.org/felix/
xmpp:[EMAIL PROTECTED]
3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (coxph, se) Obtaining standard errors of coefficients from coxph to store

2007-08-16 Thread David Lloyd
Hi all,

I'm wanting to be able to find and store the z-score of coxph below: -

modz=coxph(Surv(TSURV,STATUS)~RAGE+DAGE+REG_WTIME_M+CLD_ISCH+POLY_VS,
data=kidneyT,method=breslow)


I know summary(modz) will give me this, but how do i extract the
standard error or z-score values in a similar way to obtaining the
coefficients by coef(modz) ? I think it must be something to do with
modz$var but I'm having a complete mental blank.

I need this info so I can write a function to use within a bootstrap so
I can record the number of times (proportion) each variable in the Cox
PH model is actually significant over all the bootstrap resamples.

Any assistance is greatly appreciated

DL 


Click to find local singles for dating, romance and fun.
http://tagline.bidsystem.com/fc/Ioyw36XJJVs581mfqGSywy0Z69Mq8VM03oVytPu
8otqP84CBZmNX2G/ 



span id=m2wTlpfont face=Arial, Helvetica, sans-serif size=2 
style=font-size:13.5px___BRGet
 the Free email that has everyone talking at a href=http://www.mail2world.com 
target=newhttp://www.mail2world.com/abr  font color=#99Unlimited 
Email Storage #150; POP3 #150; Calendar #150; SMS #150; Translator #150; 
Much More!/font/font/span
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] to combine bwplot + srt option?

2007-08-16 Thread KOITA Lassana - STAC/ACE
Hi R-users,
Could someone help me to combine bwplot and srt option (exemple srt = 45 
degree or srt 90 degree)? My graphic contains 146 boxplots, I would like 
to label all of them. As you know, labels  are not readable.

Thank you for your help in advance!


Lassana KOITA 
Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique  / 
Project Engineer Airport Safety Studies  Statistical analysis
Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical 
Department 
Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation 
Headquarters
Tel: 01 49 56 80 60
Fax: 01 49 56 82 14
E-mail: [EMAIL PROTECTED]
http://www.stac.aviation-civile.gouv.fr/
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] time series periodic data

2007-08-16 Thread Petr PIKAL
Thank you.

I will try to get the book, althoug I am not sure if I with my tiny 
knowledge of mathematics will be able to digest it. 

Meanwhile I tried to make 7 min average and then to reanalyze by spectrum, 
but the output was not very convincing.

Regards

Petr Pikal
[EMAIL PROTECTED]

Rolf Turner [EMAIL PROTECTED] napsal dne 15.08.2007 22:12:15:

 
 On 16/08/2007, at 12:26 AM, Petr PIKAL wrote:
 
  Dear all
 
  Please help me with analysis of some periodic data.
 
  I have an output from measurement each minute and this output is 
  modulated
  by rotation of the equipment (approx 6.5 min/revolution). I can easily
  spot this frequency from
 
  spectrum(mydata, some suitable span)
 
  However from other analysis I suspect there is a longer term 
  oscilation
  (about 70-80 min) I am not able to find it from mentioned data.
 
  Plese give me some hint how I could prove that such long term 
  modulation
  of my data exist in presence of quite strong modulation by this 
  short term
  oscilations.
 
You may be able to get some mileage out of complex demodulation
(at, say, frequencies of 1/70, 1/71, ..., 1/80 cycles per minute).
 
See Peter Bloomfield's book (``Fourier Analysis of Time Series:  An 
 Introduction'',
2nd ed., Wiley, 2000) for a good introduction to complex 
demodulation.
 
  cheers,
 
 Rolf Turner
 
 ##
 Attention: 
 This e-mail message is privileged and confidential. If you are not the 
 intended recipient please delete the message and notify the sender. 
 Any views or opinions presented are solely those of the author.
 
 This e-mail has been scanned and cleared by MailMarshal 
 www.marshalsoftware.com
 ##

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to combine bwplot + srt option?

2007-08-16 Thread Dieter Menne
KOITA Lassana - STAC/ACE lassana.koita at aviation-civile.gouv.fr writes:

 Could someone help me to combine bwplot and srt option (exemple srt = 45 
 degree or srt 90 degree)? My graphic contains 146 boxplots, I would like 
 to label all of them. As you know, labels  are not readable.

You cannot, since bwplot is part of lattice and srt is standard R-graphics. I
agree, this can be very confusing, lattice was added later to R and I am happy
we have it now (thanks, Deepayan).

So when you see that the function you are using (bwplot) is documented under
lattice, always use the other options available in this package. These are
mostly documented under xyplot, which is worth a fixed link on the desktop.

Dieter

library(lattice)
bwplot(voice.part ~ height, data = singer,
   horizontal=FALSE,
   scales = list(x = list(rot = 45))) ## see xyplot

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer coefficient distributions and p values

2007-08-16 Thread Dieter Menne
Daniel Lakeland dlakelan at street-artists.org writes:

 
 We have used the lmer package to fit various models for the various
 experiments that she has done (random effects from multiple
 measurements for each animal or each trial, and fixed effects from
 developmental stage, and genotype etc). The results are fairly clear
 cut to me, and I suggested that she publish the results as coefficient
 estimates for the relevant contrasts, and their standard error
 estimates. However, she has read the statistical guidelines for the
 journal and they insist on p values.
 
 I personally think that p values, and sharp-null hypothesis tests are
 misguided and should be banned from publications, but it doesn't much
 matter what I think compared to what the editors want.
 
 Based on searching the archives, and finding this message:
 
 https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html

.

From what you describe, using the stable function lme in nlme by the same 
author
Douglas Bates would do the job better for you. Remember lmer is under
development, which does not mean it's bad, but some nice things like weight are
still missing.
For lme, you have excellent documentation in the book by Pinheiro/Bates.

Dieter

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Polynomial fitting

2007-08-16 Thread Jon Minton
Remember that polynomials of the form 

y = b1*x + b2*x^2 + ... + bm*x^m

fit the linear regression equation form 

Y = beta_1*x_1 + beta_2*x_2 + ... + beta_m*x_m 

If one sets (from the 1st to the 2nd equation)
x - x_1
x^2 - x_2
x^3 - x_3
etc.

In R this is easy, just use the identity operator I() when specifying the
equation.
e.g. for a 3rd order polynomial:

model - lm(Y ~ x + I(x^2) + I(x^3) + I(x^4))

hth, Jon

***

I'm looking some way to do in R a polynomial fit, say like polyfit
function of Octave/MATLAB.

For who don't know, c = polyfit(x,y,m) finds the coefficients of a
polynomial p(x) of degree m that fits the data, p(x[i]) to y[i], in a
least squares sense. The result c is a vector of length m+1 containing
the polynomial coefficients in descending powers:
p(x) = c[1]*x^n + c[2]*x^(n-1) + ... + c[n]*x + c[n+1]

For prediction, one can then use function polyval like the following:

y0 = polyval( polyfit( x, y, degree ), x0 )

y0 are the prediction values at points x0 using the given polynomial.

In R, we know there is lm for 1-degree polynomial:
lm( y ~ x ) == polyfit( x, y, 1 )

and for prediction I can just create a function like:
lsqfit - function( model, xx ) return( xx * coefficients(model)[2] +
coefficients(model)[1] );
and then: y0 - lsqfit(x0)
(I've tried with predict.lm( model, newdata=x0 ) but obtain a bad result)

For a degree greater than 1, say m,  what can I use.??
I've tried with
   lm( y ~ poly(x, degree=m) )
I've also looked at glm, nlm, approx, ... but with these I can't
specify the polynomial degree.

Thank you so much!

Sincerely,

-- Marco




Checked by AVG Free Edition. 

16:55

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to combine bwplot + srt option?

2007-08-16 Thread KOITA Lassana - STAC/ACE
Thank you for you quite and useful explanation. And do know how to sort 
them by median? 
best regargs

Lassana KOITA 
Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique  / 
Project Engineer Airport Safety Studies  Statistical analysis
Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical 
Department 
Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation 
Headquarters
Tel: 01 49 56 80 60
Fax: 01 49 56 82 14
E-mail: [EMAIL PROTECTED]
http://www.stac.aviation-civile.gouv.fr/



Dieter Menne [EMAIL PROTECTED] 
Envoyé par : [EMAIL PROTECTED]
16/08/2007 12:16

A
r-help@stat.math.ethz.ch
cc

Objet
Re: [R] to combine bwplot + srt option?






KOITA Lassana - STAC/ACE lassana.koita at aviation-civile.gouv.fr 
writes:

 Could someone help me to combine bwplot and srt option (exemple srt = 45 

 degree or srt 90 degree)? My graphic contains 146 boxplots, I would like 

 to label all of them. As you know, labels  are not readable.

You cannot, since bwplot is part of lattice and srt is standard 
R-graphics. I
agree, this can be very confusing, lattice was added later to R and I am 
happy
we have it now (thanks, Deepayan).

So when you see that the function you are using (bwplot) is documented 
under
lattice, always use the other options available in this package. These are
mostly documented under xyplot, which is worth a fixed link on the 
desktop.

Dieter

library(lattice)
bwplot(voice.part ~ height, data = singer,
   horizontal=FALSE,
   scales = list(x = list(rot = 45))) ## see xyplot

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ADF test

2007-08-16 Thread Megh Dal
Hi all,
   
  Hope you people do not feel irritated for repeatedly sending mail on Time 
series.
   
  Here I got another problem on the same, and hope I would get some answer from 
you.
   
  I have following dataset:
   
  data[,1]
  [1] 4.96 4.95 4.96 4.96 4.97 4.97 4.97 4.97 4.97 4.98 4.98 4.98 4.98 4.98 
4.99 4.99 5.00 5.01
 [19] 5.01 5.00 5.01 5.01 5.01 5.01 5.02 5.01 5.02 5.02 5.03 5.03 5.03 5.03 
5.03 5.04 5.04 5.04
 [37] 5.04 5.04 5.04 5.05 5.05 5.06 5.06 5.06 5.07 5.07 5.07 5.07 5.08 5.07 
5.08 5.08 5.09 5.10
 [55] 5.10 5.09 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.11 5.11 5.11 5.11 
5.11 5.11 5.11 5.12
 [73] 5.12 5.12 5.12 5.13 5.14 5.14 5.14 5.14 5.14 5.15 5.15 5.15 5.15 5.14 
5.15 5.15 5.15 5.16
 [91] 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.17 5.17 5.17 5.17 
5.17 5.18 5.18 5.18
[109] 5.18 5.18 5.19 5.19 5.20 5.20 5.20 5.20 5.20 5.21 5.21 5.21 5.21 5.21 
5.21 5.22 5.22 5.23
[127] 5.23 5.23 5.23 5.24 5.24 5.24 5.25 5.24 5.24 5.25 5.26 5.26 5.26 5.26 
5.26 5.26 5.26 5.27
[145] 5.27 5.26 5.27 5.27 5.28 5.29 5.29 5.29 5.29 5.30 5.30 5.30 5.31 5.31 
5.31 5.32 5.32 5.33
[163] 5.33

   
  Now I want to conduct a test for stationarity using ADF test :
   
   adf.test((data[,1]), stationary,  0)
  Augmented Dickey-Fuller Test
  data:  (data[, 1]) 
Dickey-Fuller = -3.7351, Lag order = 0, p-value = 0.02394
alternative hypothesis: stationary 

  But surprisingly it leads towards rejestion of NULL [p-value is less than 
0.05], i.e. indicates a possible stationary series. However ploting a graph of 
actual data set it doesn't seem so.
   
  Am I making any mistakes ? Can anyone give me any suggestion?
   
  Regards,
  Megh

   
-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] combine 2 data frames with missing values

2007-08-16 Thread nalluri pratap
Hi All,
   
  I have 2 data frames as follows:
   
  abc
  1   NA  1
  2   NA  2
  NA  3   3
   
  So a, b are the input values and c is the output which I am interested 
in.
  NA - Missing values. I used rbind, but its not working.
   
  Let me know if anyone can help me
   
  Thanks,
  Pratap

   
-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Combine matrix

2007-08-16 Thread Gianni Burgin
Hi R user,

I am new to R, and I have a very simple question for you. I have two matrix
A and B, with internally redundant rownames (but variables are different).
Some, but not all the rownames are shared among the two matrix. I want to
create a greater matrix that combines the previuos two, and has all the
possible combinations of matching rownames lines among matrix A and B.

looking for the solution I bumped in merge but actually works on data.frame,
and in dataframe there could be no redundancy in names.


can you help me??

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ungültige Versionsspezifikation

2007-08-16 Thread Mag. Ferri Leberl
Thank you. Hereby I send you the return to sessionInfo(). I have
meanwhile updated to 2.5.1.

R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
Tippen Sie 'contributors()' für mehr Information und 'citation()',
um zu erfahren, wie R oder R packages in Publikationen zitiert werden
können.

Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
Tippen Sie 'q()', um R zu verlassen.

 library(cairo)
Fehler in package_version(vers) : ungültige Versionsspezifikation
 sessionInfo()
R version 2.5.1 (2007-06-27)
i486-pc-linux-gnu

locale:
LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets
methods
[7] base





Am Mittwoch, den 15.08.2007, 15:26 +0200 schrieb Martin Maechler:
  JK == John Kane [EMAIL PROTECTED]
  on Wed, 15 Aug 2007 08:55:59 -0400 (EDT) writes:
 
 JK I think we need more information about your system.
 JK Please run
 JK sessionInfo()
 JK and include the information in another posting.
 
 Yes, indeed.
 However R version 2.3.1 seems a bit too old to fit to a current
 version of cairo (rather 'cairoDevice' ??).
 
 And BTW: It is a *package*, not a library!!!
 
 Martin Maechler, ETH Zurich
 
 JK --- Mag. Ferri Leberl [EMAIL PROTECTED] wrote:
 
  Dear everybody,
  excuse me if this question ist trivial, however, I
  have now looked for
  an answer for quite a while and therefore dare
  placing it here.
  I want to export .svg-files and got here the advice
  to employ the
  cairo-library.
  I downloaded the *current*-version here and expanded
  it
  to /usr/local/lib/R/site-library.
  library(cairo) returned ungültige
  Versionsspezifikation (INVALID VERSION
  SPECIFICATION).
  I got some answer here (EPM39), but, stupid enough,
  I could not employ
  it as I don't know where to look for the variable
  named there.
  The R-version I employ is 2.3.1.
  Can anybody solve the (probably simple) problem?
  Thank you in advance.
  Yours, Ferri
 
 
 
 JK 
 
 JK __
 JK R-help@stat.math.ethz.ch mailing list
 JK https://stat.ethz.ch/mailman/listinfo/r-help
 JK PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 JK and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Odp: combine 2 data frames with missing values

2007-08-16 Thread Petr PIKAL
Hi

for this particular task

rowSums(cbind(a,b), na.rm=T)

gives you c column

Petr
[EMAIL PROTECTED]

[EMAIL PROTECTED] napsal dne 16.08.2007 13:03:51:

 Hi All,
 
   I have 2 data frames as follows:
 
   abc
   1   NA  1
   2   NA  2
   NA  3   3
 
   So a, b are the input values and c is the output which I am 
interested in.
   NA - Missing values. I used rbind, but its not working.
 
   Let me know if anyone can help me
 
   Thanks,
   Pratap
 
 
 -
 
[[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] combine 2 data frames with missing values

2007-08-16 Thread Stefan Grosse

   I have 2 data frames as follows:

   abc
   1   NA  1
   2   NA  2
   NA  3   3

   So a, b are the input values and c is the output which I am 
 interested in.
   NA - Missing values. I used rbind, but its not working.

   Let me know if anyone can help me

   

What exactly is your problem? To create such a data frame or what? What
did you do with rbind? Please be more explicit if you ask questions,
sometimes its hard to guess what the people want...

Stefan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regression tree: labels in the terminal nodes

2007-08-16 Thread Juergen Kuehn
Dear everybody,


I'm a new user of R 2.4.1 and I'm searching for information on improving 
the output of regression tree graphs.

In the terminal nodes I am up to now able to indicate the number of 
values (n) and the mean of all values in this terminal node by the command

  text(tree, use.n=T, xpd=T)

Yet I would like to indicate automatically in the output graph of the 
tree some quality measure, e.g. impurity (or standard deviation .) 
and a character to identify which cases are in one terminal node, e.g. 
given a ID number or name.

Until now I did not discover in my R help scripts or in the R programm 
help how to do it. So I calculate impurity by hand, and I'm identifying 
the cases in each node by hand, and I am adding these values with a 
graphical software to the graph (as e.g. given *.jpg file). Therefore 
anybody can see that I added these values by hand. I'm quite sure that 
there is a more easy and faster way which looks (and is!) more professional.

Could anybody help me? That would be great!


Thank you very much for your support!


Dr. Jürgen Kühn
Leibniz-Centre for Agricultural Landscape Research (ZALF)
Institute of Soil Landscape Research_ 
http://www.zalf.de/home_zalf/institute/blf/blf_e/index.html_
Eberswalder Str. 84
D-15374 Müncheberg
Tel.: ++49/(0)33432/82-123
Fax: ++49/(0)33432/82-280
E-mail: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
Internet: http://www.zalf.de/home_zalf/institute/blf/blf/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] use AnnBuilderSourceUrls with local path insted of ftp adress

2007-08-16 Thread Daniela Albrecht
Hallo everybody.

I want to build my own GO package using the function GOPkgBuilder of 
AnnBuilder. It uses AnnBuilderSourceUrls to collect data from different ftp 
sites. These data are not complete for my organism, so I would like to change 
the ftp adresses to a local one. The changing itself is working but when I run 
the script I get the following Error:

Error in loadFromUrl(file.path(sourceURLs[[EG]], gene2go.gz)) : 
 URL /path/to/file/gene2go.gz is incorrect or the target site is not responding!

The full code of the script is this:

library(AnnBuilder)
list-options(AnnBuilderSourceUrls) 
list$AnnBuilderSourceUrls$EG-/path/to/file/
options(AnnBuilderSourceUrls=list$AnnBuilderSourceUrls)

GOPkgBuilder(pkgName = GO, pkgPath = ., filename = 
go_200706-termdb.obo-xml.gz, version = v.1.0, author = list(author = 
author, maintainer = maintainer [EMAIL PROTECTED]) )

Thanks in advance for any help.

Daniela Albrecht


-- 
Pt! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ADF test

2007-08-16 Thread gyadav

Hi Megh

i hope you have confused with 'what is my NULL hypothesis' ?
i suggest you to take any ideal dataset about which you know that whether 
it is stationary or not ? apply the test to know what is the NULL 
hypothesis 
used in any software :)
usually in many softwares the NULL hypothesis is in negative sense. Please 
everybody comment on this :)

hoping that you series is return series and not price series :). Thus 
applying adf test for your series :)
my test show that your series is not stationary at all as my correlalogram 
comes as follows. 
1
0.998283718
0.997582959
0.99703921
0.99665648
0.996548006
0.99647617
0.995925698
0.995317271
0.994746317
0.994727781
0.99508777
0.99501576
0.99437404
0.993338292
0.992684933
0.992310313

@@@ m
Although if i assume that your series is a price series and defining 
return = 100*ln(pt/pt-1). Returns become as follows
0
-0.201816416
0.201816416
0
0.201409937
0
0
0
0
0.201005093
0
0
0
0
0.200601873
0
0.200200267
0.199800266
0
-0.199800266
0.199800266
0
0
0
0.199401861
-0.199401861
0.199401861
0
0.199005041
0
0
0
0
0.198609797
0
0
0
0
0
0.19821612
0
0.197824001
0
0
0.19743343
0
0
0
0.197044399
-0.197044399
0.197044399
0
0.196656897
0.196270917
0
-0.196270917
0.196270917
0
0
0
0
0
0
0
0.195886449
0
0
0
0
0
0
0.195503484
0
0
0
0.195122013
0.194742028
0
0
0
0
0.194363521
0
0
0
-0.194363521
0.194363521
0
0
0.193986482
0
0
0
0
0
0
0
0
0
0
0.193610903
0
0
0
0
0.193236775
0
0
0
0
0.192864091
0
0.192492841
0
0
0
0
0.192123018
0
0
0
0
0
0.191754613
0
0.191387618
0
0
0
0.191022026
0
0
0.190657827
-0.190657827
0
0.190657827
0.190295015
0
0
0
0
0
0
0.18993358
0
-0.18993358
0.18993358
0
0.189573516
0.189214815
0
0
0
0.188857469
0
0
0.18850147
0
0
0.18814681
0
0.187793482
0
then the value of autocorrelations i.e. correlalogram comes as approx 
1
0.089252308
0.058227292
0.017934984
0.025264591
-0.014925678
-0.004668544
0.014890995
0.001625333
0.010669589
-0.010587179
-0.03000206
-0.011863654
0.00772247
0.024272208
-0.019521244
-0.035998575
-0.061608877
-0.048401231
-0.008594859

which show that the values are quite likely to make series stationary :)

 data[1:10,]
 V1 V2
1  4.96  0.000
2  4.95 -0.2018164
3  4.96  0.2018164
4  4.96  0.000
5  4.97  0.2014099
6  4.97  0.000
7  4.97  0.000
8  4.97  0.000
9  4.97  0.000
10 4.98  0.2010051
 adf.test(data[,1])

Augmented Dickey-Fuller Test

data:  data[, 1] 
Dickey-Fuller = -1.1052, Lag order = 5, p-value = 0.9188
alternative hypothesis: stationary 

 adf.test(data[,2])

Augmented Dickey-Fuller Test

data:  data[, 2] 
Dickey-Fuller = -6.2265, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary 

Warning message:
p-value smaller than printed p-value in: adf.test(data[, 2]) 
 

this explains everything clearly :)
your NULL hypothesis is Series is not stationary - hence hypothesis in 
negative sense

prooved by taking ideal data

 data1-rnorm(1) #normal data
 adf.test(data1)

Augmented Dickey-Fuller Test

data:  data1 
Dickey-Fuller = -21.2118, Lag order = 21, p-value = 0.01
alternative hypothesis: stationary 

Warning message:
p-value smaller than printed p-value in: adf.test(data1) 
 

HTH




Megh Dal [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
08/16/2007 04:27 PM

To
r-help@stat.math.ethz.ch
cc

Subject
[R] ADF test






Hi all,
 
  Hope you people do not feel irritated for repeatedly sending mail on 
Time series.
 
  Here I got another problem on the same, and hope I would get some answer 
from you.
 
  I have following dataset:
 
  data[,1]
  [1] 4.96 4.95 4.96 4.96 4.97 4.97 4.97 4.97 4.97 4.98 4.98 4.98 4.98 
4.98 4.99 4.99 5.00 5.01
 [19] 5.01 5.00 5.01 5.01 5.01 5.01 5.02 5.01 5.02 5.02 5.03 5.03 5.03 
5.03 5.03 5.04 5.04 5.04
 [37] 5.04 5.04 5.04 5.05 5.05 5.06 5.06 5.06 5.07 5.07 5.07 5.07 5.08 
5.07 5.08 5.08 5.09 5.10
 [55] 5.10 5.09 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.10 5.11 5.11 5.11 
5.11 5.11 5.11 5.11 5.12
 [73] 5.12 5.12 5.12 5.13 5.14 5.14 5.14 5.14 5.14 5.15 5.15 5.15 5.15 
5.14 5.15 5.15 5.15 5.16
 [91] 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.16 5.17 5.17 5.17 
5.17 5.17 5.18 5.18 5.18
[109] 5.18 5.18 5.19 5.19 5.20 5.20 5.20 5.20 5.20 5.21 5.21 5.21 5.21 
5.21 5.21 5.22 5.22 5.23
[127] 5.23 5.23 5.23 5.24 5.24 5.24 5.25 5.24 5.24 5.25 5.26 5.26 5.26 
5.26 5.26 5.26 5.26 5.27
[145] 5.27 5.26 5.27 5.27 5.28 5.29 5.29 5.29 5.29 5.30 5.30 5.30 5.31 
5.31 5.31 5.32 5.32 5.33
[163] 5.33

 
  Now I want to conduct a test for stationarity using ADF test :
 
   adf.test((data[,1]), stationary,  0)
  Augmented Dickey-Fuller Test
  data:  (data[, 1]) 
Dickey-Fuller = -3.7351, Lag order = 0, p-value = 0.02394
alternative hypothesis: stationary 

  But surprisingly it leads towards rejestion of NULL [p-value is less 
than 0.05], i.e. indicates a possible stationary series. However ploting a 
graph of actual data set it doesn't seem so.
 
  Am I making any mistakes ? Can anyone give me any suggestion?
 
  Regards,
  Megh

Re: [R] an easy way to construct this special matirx

2007-08-16 Thread Gabor Grothendieck
Here are two solutions.  In the first lo has TRUE on the lower diagonal
and diagonal. Then we compute the exponents, multiplying by lo to zero
out the upper triangle.  In the second rn is a matrix of row numbers
and rn = t(rn) is the same as lo in the first solution.

r - 2; n - 5 # test data

lo - lower.tri(diag(n), diag = TRUE)
lo * r ^ (row(lo) - col(lo) + 1)

Here is another one:

rn - row(diag(n))
(rn = t(rn)) * r ^ (rn - t(rn) + 1)

On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi,
 Sorry if this is a repost. I searched but found no results.
 I am wondering if it is an easy way to construct the following matrix:

 r  1 0 00
 r^2   r 1 00
 r^3   r^2  r 10
 r^4   r^3  r^2  r1

 where r could be any number. Thanks.
 Wen

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] an easy way to construct this special matirx

2007-08-16 Thread De-Jian,ZHAO
Hi wen,

I don't think it is easy to construct this matrix in a simple way. I
tried and found a way to do it. Try the following codes:

i-1:4
j-5
aa-matrix(0,4,5)
for (j in 1:5){aa[i,j]-(i+1-j)}
r-4 #r could be any number
bb-r^aa
bb[aa0]=0
bb

The matrix bb is what you want. Furthermore,I packaged this process
into a function called mtrx as below:

mtrx-function(row,clm,r){
i-1:row
j-clm
aa-matrix(row*clm,row,clm)
for (j in 1:clm){aa[i,j]-(i+1-j)}
#r could be any number
bb-r^aa
bb[aa0]=0
bb
}

Now you can use the function to produce the matrix.The
above-mentioned matrix is mtrx(4,5,4)

Dejian Zhao


On Thu, Aug 16, 2007 11:10, [EMAIL PROTECTED] wrote:
 Hi,
 Sorry if this is a repost. I searched but found no results.
 I am wondering if it is an easy way to construct the following
 matrix:

 r  1 0 00
 r^2   r 1 00
 r^3   r^2  r 10
 r^4   r^3  r^2  r1

 where r could be any number. Thanks.
 Wen
   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
De-Jian Zhao
Institute of Zoology,Chinese Academy of Sciences
+86-10-64807217
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Polynomial fitting

2007-08-16 Thread Prof Brian Ripley
It is easier to use poly(raw=TRUE), and better to use poly() with 
orthogonal polynomials.

The original poster shows signs of having read neither the help for 
predict.lm nor the posting guide, and so almost certainly misused the 
predict method.


On Thu, 16 Aug 2007, Jon Minton wrote:

 Remember that polynomials of the form

 y = b1*x + b2*x^2 + ... + bm*x^m

 fit the linear regression equation form

 Y = beta_1*x_1 + beta_2*x_2 + ... + beta_m*x_m

 If one sets (from the 1st to the 2nd equation)
 x - x_1
 x^2 - x_2
 x^3 - x_3
 etc.

 In R this is easy, just use the identity operator I() when specifying the
 equation.
 e.g. for a 3rd order polynomial:

 model - lm(Y ~ x + I(x^2) + I(x^3) + I(x^4))

 hth, Jon

 ***

 I'm looking some way to do in R a polynomial fit, say like polyfit
 function of Octave/MATLAB.

 For who don't know, c = polyfit(x,y,m) finds the coefficients of a
 polynomial p(x) of degree m that fits the data, p(x[i]) to y[i], in a
 least squares sense. The result c is a vector of length m+1 containing
 the polynomial coefficients in descending powers:
 p(x) = c[1]*x^n + c[2]*x^(n-1) + ... + c[n]*x + c[n+1]

 For prediction, one can then use function polyval like the following:

 y0 = polyval( polyfit( x, y, degree ), x0 )

 y0 are the prediction values at points x0 using the given polynomial.

 In R, we know there is lm for 1-degree polynomial:
 lm( y ~ x ) == polyfit( x, y, 1 )

 and for prediction I can just create a function like:
 lsqfit - function( model, xx ) return( xx * coefficients(model)[2] +
 coefficients(model)[1] );
 and then: y0 - lsqfit(x0)
 (I've tried with predict.lm( model, newdata=x0 ) but obtain a bad result)

 For a degree greater than 1, say m,  what can I use.??
 I've tried with
   lm( y ~ poly(x, degree=m) )
 I've also looked at glm, nlm, approx, ... but with these I can't
 specify the polynomial degree.

 Thank you so much!

 Sincerely,

 -- Marco

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] time series with quality codes

2007-08-16 Thread Gabor Grothendieck
In addition, we could create a function to.df which converts a zoo
object to a data frame assuming that any column that only contains
1:nlevels is a factor with the indicated level names.  Use to.df just
before plotting:

library(zoo)
set.seed(1)
f - zoo(factor(sample(3, 10, replace = TRUE)))
x - zoo(rnorm(10))
y - zoo(rnorm(10))
z - merge(x, y, f)

to.df - function(z, levels = letters[1:3], time = FALSE) {
zz - as.data.frame(z)
for(i in ncol(zz))
if (all(zz[,i] %in% seq_along(levels)))
z[,i] - factor(levels[z[,i]])
if (time) cbind(index = index(z), zz) else zz
}

library(lattice)
xyplot(y ~ x | f, data = to.df(z))




On 8/16/07, Achim Zeileis [EMAIL PROTECTED] wrote:
 On Thu, 16 Aug 2007, Felix Andrews wrote:

  list(...),
 
  I am working with environmental time series (eg rainfall, stream flow)
  that have attached quality codes for each data point. The quality
  codes have just a few factor levels, like good, suspect, poor,
  imputed. I use the quality codes in plots and summaries. They are
  carried through when a time series is aggregated to a longer
  time-step, according to rules like worst, median or mode.
 
  I need to support time steps of anything from hours to years. I can
  assume the data are regular time series -- they might be irregular
  initially but could be 'regularized'. But I would want to plot
  irregular time series along with regular ones.
 
  So far I have been using a data frame with a POSIXct column, a numeric
  column and a factor column. However I would like to use zoo instead,
  because of its many utility functions and easy conversion to ts. Is
  there any prospect of zoo handling such numeric + factor data? Other
  suggestions on elegant ways to do it are also welcome.

 There is some limited support for this in zoo. You can do
   z - zoo(myfactor, myindex)
 and work with it like a zoo series and then
   coredata(z)
 will recover a factor. However, you cannot bind this to other series
 without losing the factor structure. At least not in a plain zoo series.
 But you can do
   df - merge(z, Z, retclass = data.frame)
 where every column of the resulting data.frame is a univariate zoo series.

 The final option would be to just have a data.frame as usual and put your
 data/index into one column. But then it's more difficult to leverage zoo's
 functionality.

 I would like to have more support for things like this, but currently this
 is what we have.

 Best,
 Z

  Felix
 
  --
  Felix Andrews / ��
  PhD candidate
  Integrated Catchment Assessment and Management Centre
  The Fenner School of Environment and Society
  The Australian National University (Building 48A), ACT 0200
  Beijing Bag, Locked Bag 40, Kingston ACT 2604
  http://www.neurofractal.org/felix/
  xmpp:[EMAIL PROTECTED]
  3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] CCC statistic in cluster analysis

2007-08-16 Thread silvia
Has anyone implemented CCC statistic (Sarle, 1983) in R? 

If so I would greatly appreciate the relevant script file. 

Any help will be much appreciated 

Best regards, Silvia

Mg. Silvia Graciela Valdano
Departamento de Ciencias Naturales
Facultad de Cs. Exactas, Físico-Químicas y Naturales
Universidad Nacional de Río Cuarto
[EMAIL PROTECTED]

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] several plots on several pages

2007-08-16 Thread Rainer M. Krug
Hi

  version
_
platform   i686-pc-linux-gnu
arch   i686
os linux-gnu
system i686, linux-gnu
status
major  2
minor  5.1
year   2007
month  06
day27
svn rev42083
language   R
version.string R version 2.5.1 (2007-06-27)



I want to create a pdf withe three graphs on a page and with two pages:

-
| 1 |
-
| 2 |
-
| 3 |
-

NEW PAGE

-
| 4 |
-
| 5 |
-
| 6 |
-

Graph 1 should ALWAYS be at that spot, graph two also, even if graph one 
produces an error when plotting (the area can be empty, but doesn't have 
to.)

I produced the foolowing code below, but I have a few problems:

1) how can I create a new page in the pdf?

2) how can I make sure that the second graph is in position 2 when graph 
one produces an error when plotting I(as in the example)? Everything 
works OK (for the firsat page) when graph one is plotted.

I have the feeling, that I am thinking to complicated.

Any help welcome,

Rainer


pdf(test.pdf)
try(
 {
 ## Set layout to three rows and only oine column
 par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )

 ## First row
 par(mfg=c(1,1))
 try( plot(runif(ff)) )

 ## Second row
 par(mfg=c(2,1))
 try( plot(runif(100)) )

 ## Third row
 par(mfg=c(3,1))
 plot(runif(1000))


 ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF

 ## Set layout to three rows and only oine column
 par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )

 ## First row
 par(mfg=c(1,1))
 try( plot(runif(ff)) )

 ## Second row
 par(mfg=c(2,1))
 try( plot(runif(100)) )

 ## Third row
 par(mfg=c(3,1))
 plot(runif(1000))

 }
 )
dev.off()


-- 
NEW EMAIL ADDRESS AND ADDRESS:

[EMAIL PROTECTED]

[EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH

Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation
Biology (UCT)

Plant Conservation Unit
Department of Botany
University of Cape Town
Rondebosch 7701
South Africa

Tel:+27 - (0)21 650 5776 (w)
Fax:+27 - (0)86 516 2782
Fax:+27 - (0)21 650 2440 (w)
Cell:   +27 - (0)83 9479 042

Skype:  RMkrug

email:  [EMAIL PROTECTED]
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] several plots on several pages

2007-08-16 Thread ONKELINX, Thierry
Dear Rainer,

Have you considered using Sweave?

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be 

Do not put your faith in what statistics say until you have carefully
considered what they do not say.  ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney

 

 -Oorspronkelijk bericht-
 Van: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] Namens Rainer M. Krug
 Verzonden: donderdag 16 augustus 2007 14:58
 Aan: r-help
 Onderwerp: [R] several plots on several pages
 
 Hi
 
   version
 _
 platform   i686-pc-linux-gnu
 arch   i686
 os linux-gnu
 system i686, linux-gnu
 status
 major  2
 minor  5.1
 year   2007
 month  06
 day27
 svn rev42083
 language   R
 version.string R version 2.5.1 (2007-06-27)
 
 
 
 I want to create a pdf withe three graphs on a page and with 
 two pages:
 
 -
 | 1 |
 -
 | 2 |
 -
 | 3 |
 -
 
 NEW PAGE
 
 -
 | 4 |
 -
 | 5 |
 -
 | 6 |
 -
 
 Graph 1 should ALWAYS be at that spot, graph two also, even 
 if graph one produces an error when plotting (the area can be 
 empty, but doesn't have
 to.)
 
 I produced the foolowing code below, but I have a few problems:
 
 1) how can I create a new page in the pdf?
 
 2) how can I make sure that the second graph is in position 2 
 when graph one produces an error when plotting I(as in the 
 example)? Everything works OK (for the firsat page) when 
 graph one is plotted.
 
 I have the feeling, that I am thinking to complicated.
 
 Any help welcome,
 
 Rainer
 
 
 pdf(test.pdf)
 try(
  {
  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )
 
  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )
 
  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )
 
  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))
 
 
  ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF
 
  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )
 
  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )
 
  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )
 
  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))
 
  }
  )
 dev.off()
 
 
 --
 NEW EMAIL ADDRESS AND ADDRESS:
 
 [EMAIL PROTECTED]
 
 [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH
 
 Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)
 
 Plant Conservation Unit
 Department of Botany
 University of Cape Town
 Rondebosch 7701
 South Africa
 
 Tel:  +27 - (0)21 650 5776 (w)
 Fax:  +27 - (0)86 516 2782
 Fax:  +27 - (0)21 650 2440 (w)
 Cell: +27 - (0)83 9479 042
 
 Skype:RMkrug
 
 email:[EMAIL PROTECTED]
   [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] an easy way to construct this special matirx

2007-08-16 Thread De-Jian,ZHAO
Hi Gabor,

I am glad to see your answer,which gives a hope to resove this
question in an easy way. I replied to this question in a more
complex way before seeing your answer.

However,I think your code needs some revision, because the original
matrix is not a diagonal matrix. It has 4 rows and 5 columns.Looking
forward to your revised codes.

Best regards,

On Thu, Aug 16, 2007 20:22, Gabor Grothendieck wrote:
 Here are two solutions.  In the first lo has TRUE on the lower
 diagonal
 and diagonal. Then we compute the exponents, multiplying by lo to
 zero
 out the upper triangle.  In the second rn is a matrix of row numbers
 and rn = t(rn) is the same as lo in the first solution.

 r - 2; n - 5 # test data

 lo - lower.tri(diag(n), diag = TRUE)
 lo * r ^ (row(lo) - col(lo) + 1)

 Here is another one:

 rn - row(diag(n))
 (rn = t(rn)) * r ^ (rn - t(rn) + 1)

 On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi,
 Sorry if this is a repost. I searched but found no results.
 I am wondering if it is an easy way to construct the following
 matrix:

 r  1 0 00
 r^2   r 1 00
 r^3   r^2  r 10
 r^4   r^3  r^2  r1

 where r could be any number. Thanks.
 Wen

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
De-Jian Zhao
Institute of Zoology,Chinese Academy of Sciences
+86-10-64807217
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combine matrix

2007-08-16 Thread Gianni Burgin
let say something like this


a=matrix(1:25, nrow=5)

rownames(a)=letters[1:5]
 colnames(a)=rep(A, 5)

 a
  A  A  A  A  A
a 1  6 11 16 21
b 2  7 12 17 22
c 3  8 13 18 23
d 4  9 14 19 24
e 5 10 15 20 25

 b=matrix(1:40, nrow=8)
 rownames(b)=c(rep(a,4),rep(b,4))
 colnames(b)=rep(B, 5)

 b
  B  B  B  B  B
a 1  9 17 25 33
a 2 10 18 26 34
a 3 11 19 27 35
a 4 12 20 28 36
b 5 13 21 29 37
b 6 14 22 30 38
b 7 15 23 31 39
b 8 16 24 32 40

as a results I wold like something like

  A  A  A  A  A  B  B  B  B  B
a 1  6 11 16 21  1  9 17 25 33
a 1  6 11 16 21  2 10 18 26 34
a 1  6 11 16 21  3 11 19 27 35
a 1  6 11 16 21  4 12 20 28 36
b 2  7 12 17 22  5 13 21 29 37
b 2  7 12 17 22  6 14 22 30 38
b 2  7 12 17 22  7 15 23 31 39
b 2  7 12 17 22  8 16 24 32 40


does it is clear? is there a function that automate this operation?


thank you very much!




On 8/16/07, jim holtman [EMAIL PROTECTED] wrote:

 Can you provide an example of what you mean; e.g., the two input
 matrices and the desired output.

 On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote:
  Hi R user,
 
  I am new to R, and I have a very simple question for you. I have two
 matrix
  A and B, with internally redundant rownames (but variables are
 different).
  Some, but not all the rownames are shared among the two matrix. I want
 to
  create a greater matrix that combines the previuos two, and has all the
  possible combinations of matching rownames lines among matrix A and B.
 
  looking for the solution I bumped in merge but actually works on
 data.frame,
  and in dataframe there could be no redundancy in names.
 
 
  can you help me??
 
 [[alternative HTML version deleted]]
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question about sm.options sm.survival

2007-08-16 Thread Rachel Jia

Hi, there:

It's my first time to post question in this forum, so thanks for your
tolerance if my question is too naive. I am using a nonparametric smoothing
procedure in sm package to generate smoothed survival curves for continuous
covariate. I want to truncate the suvival curve and only display the part
with covariate value between 0 and 7. The following is the code I wrote:

sm.options(list(xlab=log_BSI_min3_to_base, xlim=c(0,7), ylab=Median
Progression Prob))
sm.survival(min3.base.prog.cen[,3],min3.base.prog.cen[,2],min3.base.prog.cen[,1],h=sd(min3.base.prog.cen[,3]),status.code=1
)

But the xlim option does not work. Can anyone help me with this problem?
Thanks a lot.

Rachel
-- 
View this message in context: 
http://www.nabble.com/Question-about-sm.options---sm.survival-tf4279605.html#a12181263
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] several plots on several pages

2007-08-16 Thread Rainer M. Krug
Hi Thierry

ONKELINX, Thierry wrote:
 Dear Rainer,
 
 Have you considered using Sweave?

No - and I am sure it will do what I want, but I guess it might be an 
overkill. These arew just draft outputs for myself for different 
datasets which should be easy to compare. SO I guess that Sweave might 
be an overkill (especially as I found plot.new() which jumpd to a new page).

Thanks and I will keep it in mind for the future,

Rainer


 
 HTH,
 
 Thierry
 
 
 
 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek / Research Institute for Nature
 and Forest
 Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
 methodology and quality assurance
 Gaverstraat 4
 9500 Geraardsbergen
 Belgium
 tel. + 32 54/436 185
 [EMAIL PROTECTED]
 www.inbo.be 
 
 Do not put your faith in what statistics say until you have carefully
 considered what they do not say.  ~William W. Watt
 A statistical analysis, properly conducted, is a delicate dissection of
 uncertainties, a surgery of suppositions. ~M.J.Moroney
 
  
 
 -Oorspronkelijk bericht-
 Van: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] Namens Rainer M. Krug
 Verzonden: donderdag 16 augustus 2007 14:58
 Aan: r-help
 Onderwerp: [R] several plots on several pages

 Hi

   version
 _
 platform   i686-pc-linux-gnu
 arch   i686
 os linux-gnu
 system i686, linux-gnu
 status
 major  2
 minor  5.1
 year   2007
 month  06
 day27
 svn rev42083
 language   R
 version.string R version 2.5.1 (2007-06-27)



 I want to create a pdf withe three graphs on a page and with 
 two pages:

 -
 | 1 |
 -
 | 2 |
 -
 | 3 |
 -

 NEW PAGE

 -
 | 4 |
 -
 | 5 |
 -
 | 6 |
 -

 Graph 1 should ALWAYS be at that spot, graph two also, even 
 if graph one produces an error when plotting (the area can be 
 empty, but doesn't have
 to.)

 I produced the foolowing code below, but I have a few problems:

 1) how can I create a new page in the pdf?

 2) how can I make sure that the second graph is in position 2 
 when graph one produces an error when plotting I(as in the 
 example)? Everything works OK (for the firsat page) when 
 graph one is plotted.

 I have the feeling, that I am thinking to complicated.

 Any help welcome,

 Rainer


 pdf(test.pdf)
 try(
  {
  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )

  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )

  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )

  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))


  ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF

  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )

  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )

  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )

  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))

  }
  )
 dev.off()


 --
 NEW EMAIL ADDRESS AND ADDRESS:

 [EMAIL PROTECTED]

 [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH

 Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)

 Plant Conservation Unit
 Department of Botany
 University of Cape Town
 Rondebosch 7701
 South Africa

 Tel: +27 - (0)21 650 5776 (w)
 Fax: +27 - (0)86 516 2782
 Fax: +27 - (0)21 650 2440 (w)
 Cell:+27 - (0)83 9479 042

 Skype:   RMkrug

 email:   [EMAIL PROTECTED]
  [EMAIL PROTECTED]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
NEW EMAIL ADDRESS AND ADDRESS:

[EMAIL PROTECTED]

[EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH

Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation
Biology (UCT)

Plant Conservation Unit
Department of Botany
University of Cape Town
Rondebosch 7701
South Africa

Tel:+27 - (0)21 650 5776 (w)
Fax:+27 - (0)86 516 2782
Fax:+27 - (0)21 650 2440 (w)
Cell:   +27 - (0)83 9479 042

Skype:  RMkrug

email:  [EMAIL PROTECTED]
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression tree: labels in the terminal nodes

2007-08-16 Thread Achim Zeileis

On Thu, 16 Aug 2007, Juergen Kuehn wrote:


Dear everybody,


I'm a new user of R 2.4.1 and I'm searching for information on improving
the output of regression tree graphs.

In the terminal nodes I am up to now able to indicate the number of
values (n) and the mean of all values in this terminal node by the command

 text(tree, use.n=T, xpd=T)


Note that this is specific to the rpart implementation. There are also 
other recursive partitioning algorithms available, see e.g., the 
MachineLearning task view at

  http://CRAN.R-project.org/src/contrib/Views/MachineLearning.html

The ctree() algorithm in package party provides more flexibility 
concerning visualization. The plotting methods are all extensible although 
this is not extensively documented.


Maybe you also want to look at Simon Urbanek's interactive KLIMT software 
for tree visualization.


hth,
Z


Yet I would like to indicate automatically in the output graph of the
tree some quality measure, e.g. impurity (or standard deviation .)
and a character to identify which cases are in one terminal node, e.g.
given a ID number or name.

Until now I did not discover in my R help scripts or in the R programm
help how to do it. So I calculate impurity by hand, and I'm identifying
the cases in each node by hand, and I am adding these values with a
graphical software to the graph (as e.g. given *.jpg file). Therefore
anybody can see that I added these values by hand. I'm quite sure that
there is a more easy and faster way which looks (and is!) more professional.

Could anybody help me? That would be great!


Thank you very much for your support!


Dr. Jürgen Kühn
Leibniz-Centre for Agricultural Landscape Research (ZALF)
Institute of Soil Landscape Research_
http://www.zalf.de/home_zalf/institute/blf/blf_e/index.html_
Eberswalder Str. 84
D-15374 Müncheberg
Tel.: ++49/(0)33432/82-123
Fax: ++49/(0)33432/82-280
E-mail: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
Internet: http://www.zalf.de/home_zalf/institute/blf/blf/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple colors within same line of text

2007-08-16 Thread Andrew Yee
Hi, I'm interested in using mtext(), but with the option of having multiple
colors in the same line of text.

For example, creating a line of text where:

Red is red and blue is blue

How do you create a text argument that lets you do this within mtext()?

Thanks,
Andrew
MGH Cancer Center

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] an easy way to construct this special matirx

2007-08-16 Thread Gabor Grothendieck
It was pointed out that the required matrix may not be square and
the superdiagonal was missing in my prior post.  Here is a revision:

r - 2; nr - 4; nc - 5 # test data

x - matrix(nr = nr, nc = nc)
x - row(x) - col(x) + 1
(x = 0) * r ^ x

On 8/16/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Here are two solutions.  In the first lo has TRUE on the lower diagonal
 and diagonal. Then we compute the exponents, multiplying by lo to zero
 out the upper triangle.  In the second rn is a matrix of row numbers
 and rn = t(rn) is the same as lo in the first solution.

 r - 2; n - 5 # test data

 lo - lower.tri(diag(n), diag = TRUE)
 lo * r ^ (row(lo) - col(lo) + 1)

 Here is another one:

 rn - row(diag(n))
 (rn = t(rn)) * r ^ (rn - t(rn) + 1)

 On 8/15/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
  Hi,
  Sorry if this is a repost. I searched but found no results.
  I am wondering if it is an easy way to construct the following matrix:
 
  r  1 0 00
  r^2   r 1 00
  r^3   r^2  r 10
  r^4   r^3  r^2  r1
 
  where r could be any number. Thanks.
  Wen


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combine matrix

2007-08-16 Thread François Pinard
[Gianni Burgin]
let say something like this

a=matrix(1:25, nrow=5)

rownames(a)=letters[1:5]
 colnames(a)=rep(A, 5)

 a
  A  A  A  A  A
a 1  6 11 16 21
b 2  7 12 17 22
c 3  8 13 18 23
d 4  9 14 19 24
e 5 10 15 20 25

 b=matrix(1:40, nrow=8)
 rownames(b)=c(rep(a,4),rep(b,4))
 colnames(b)=rep(B, 5)

 b
  B  B  B  B  B
a 1  9 17 25 33
a 2 10 18 26 34
a 3 11 19 27 35
a 4 12 20 28 36
b 5 13 21 29 37
b 6 14 22 30 38
b 7 15 23 31 39
b 8 16 24 32 40

as a results I wold like something like

  A  A  A  A  A  B  B  B  B  B
a 1  6 11 16 21  1  9 17 25 33
a 1  6 11 16 21  2 10 18 26 34
a 1  6 11 16 21  3 11 19 27 35
a 1  6 11 16 21  4 12 20 28 36
b 2  7 12 17 22  5 13 21 29 37
b 2  7 12 17 22  6 14 22 30 38
b 2  7 12 17 22  7 15 23 31 39
b 2  7 12 17 22  8 16 24 32 40

does it is clear? is there a function that automate this operation?

Like, maybe:

   cbind(a[rownames(b),], b)



-- 
François Pinard   http://pinard.progiciels-bpi.ca

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] residual plots for lmer in lme4 package

2007-08-16 Thread Martin Henry H. Stevens
Hi Margaret,
Have a look at qqmath in the lattice package.
?qqmath
Hank
On Aug 16, 2007, at 2:45 AM, Margaret Gardiner-Garden wrote:

 Hi,



 I was wondering if I might be able to ask some advice about doing  
 residual
 plots for the lmer function in the lme4 package.



 Our group's aim is to find if the expression staining of a  
 particular gene
 in a sample (or core)  is related to the pathology of the core.

 To do this, we used the lmer function to perform a logistic mixed  
 model
 below.  I apologise in advance for the lack of subscripts.



  logit P(yij=1) = â0 + Ui + â1Patholij where Ui~N(0, óu2),

 i indexes patient, j indexes measurement, Pathol is an indicator  
 variable
 (0,1) for benign

 epithelium versus cancer and yij is the staining indicator (0,1)  
 for each
 core where yij equals 1 if the core stains positive and 0 otherwise.



 (I have inserted some example R code at the end of this message)



 I was wondering if you knew how I could test that the errors Ui are  
 normally
 distributed in my fit.  I am not familiar with how to do residual  
 plots for
 a mixed logistic regression (or even for any logistic regression!).



 Any advice would be greatly appreciated!



 Thanks and Regards

 Marg



 Example code:



 lmer(Intensity.over2.hyp.canc~Pathology + (1|Patient.ID), data=
 HSD17beta4.hyp.canc, family=binomial, na.action=na.omit)







 #Family: binomial(logit link)

  #AIC  BIClogLik deviance

# 414.1101 431.4147 -203.0550 406.1101

#Random effects:

# GroupsNameVarianceStd.Dev.

# Patient.ID (Intercept)  4.9558  2.2262

# of obs: 559, groups: Patient.ID, 177



#Estimated scale (compare to 1)  0.6782544



#Fixed effects:

 #Estimate Std. Error z value  Pr(|z|)

#(Intercept)  -2.057340.24881 -8.2686  2.2e-16 ***

#PathologyHyperplasia -1.766270.44909 -3.9330 8.389e-05 ***



 NB. Intensity.over2.hyp.canc is the staining of the core (ie 0 or 1)

 Pathology is Hyperplasia or Cancer





 Dr Margaret Gardiner-Garden

 Garvan Institute of Medical Research

 384 Victoria Street

 Darlinghurst Sydney

 NSW 2010 Australia



 Phone: 61 2 9295 8348

 Fax: 61 2 9295 8321






 [[alternative HTML version deleted]]

 ATT1

Dr. Hank Stevens, Associate Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
E Pluribus Unum

If you send an attachment, please try to send it in a format anyone  
can read, such as PDF, text, Open Document Format, HTML, or RTF.  
Please try not to send me MS Word or PowerPoint attachments-
Why? See:  http://www.gnu.org/philosophy/no-word-attachments.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to combine bwplot + srt option?

2007-08-16 Thread Deepayan Sarkar
On 8/16/07, KOITA Lassana - STAC/ACE
[EMAIL PROTECTED] wrote:
 Thank you for you quite and useful explanation. And do know how to sort
 them by median?

See ?reorder.factor

Note that traditional practice with bwplot() is to have the
categorical variable on the y-axis, in which case the default string
rotation is optimal.

-Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error message when using zero-inflated count regression model in package zicounts

2007-08-16 Thread James R. Milks
Dr. Stevens,

I've double-checked my variable lengths.  All of my variables  
(Total.vines, Site, Species, and DBH) came in at 549.  I did correct  
one problem in the data entry that had escaped my previous notice:  
somehow the undergrad who entered all the data managed to make the  
Acer negundo data split into two separate categories while still  
appearing to use the same ACNE abbreviation.  When I made that  
correction and re-ran zicounts, R gave me the following error messages:

  vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site 
+Species+DBH,distname=ZIP,data=sycamores.1)

Error in ifelse(y == 0, 1, y/mu) : dim-: dims [product 12] do not  
match the length of object [549]
In addition: Warning messages:
1: longer object length is not a multiple of shorter object length  
in: eta + offset
2: longer object length is not a multiple of shorter object length  
in: y/mu

In addition, zicounts would not run a normal poisson regression on  
the data, giving me the same error messages as the ZIP regression.   
Doing a poisson regression with glm did not show any error messages.   
However, the glm model with full interactions was still over-dispersed.

Could the zicounts problem be that the individual sites and species  
had different population sizes?  For instance, Site A had 149 trees,  
site B had 55 trees, site C had 270 trees, and site D had 75 trees.   
The species had similar discrepancies in population sizes, with  
Platanus occidentalis and Acer negundo forming the majority of the  
trees.

Thanks for your help.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435

On Aug 15, 2007, at 5:58 AM, Martin Henry H. Stevens wrote:

 Hi Jim,
 With regard to same number, I simply wanted to make sure that each  
 variable was the same length. The error message you show is what  
 you would get if, for instance, you misspelled one of the variables  
 and it doesn't exist, in which case it would be NULL, while your  
 other variables would each be 550 elements in length.
 Hank
 On Aug 14, 2007, at 4:47 PM, James Milks wrote:

 Dr. Stevens,

 Unfortunately, Poisson gives me an over-dispersed model with only  
 3 out of 14 variables/interactions significant.  Doing a step-wise  
 poisson regression still ended up with the same over-dispersed  
 model.  Given the high number of zeros in the response variable,  
 Dr. Thad Tarpey (one of our statisticians on campus) suggested  
 zero-inflated poisson regression as a possible solution to the  
 over-dispersion problem.

 As for variables of the same length, there are different numbers  
 of trees for each species and site since we ran different numbers  
 of transects at each site (some sites were larger than others) and  
 there were different numbers of species and trees within each  
 transect.  Acer negundo made up ~33% of all the trees we measured;  
 Platanus occidentalis had 25%; Fraxinus americana was another 12%  
 and ~11% was Ulmus americana.  The remaining 25% was divided among  
 16 other species, all of which were excluded from the analysis due  
 to singularities in the model when they were included (something  
 about glm not liking singularities in the model).  So if the  
 zicounts requires that each species and site have the same length,  
 then I will not be able to use it unless I can get R to randomly  
 select x trees from each species and site combination.

 Thanks for your input.

 Jim Milks

 Graduate Student
 Environmental Sciences Ph.D. Program
 136 Biological Sciences
 Wright State University
 3640 Colonel Glenn Hwy
 Dayton, OH 45435


 On Aug 14, 2007, at 9:37 AM, Hank Stevens wrote:

 Hi Jim,
 Two thoughts come to me, unencumbered by the thought process or  
 knowledge of zicounts:
 1. Is Poisson really NOT appropriate? (do you have to use zicounts?)
 2. Are you 110% certain that all variables are the same length?  
 Would NA's interfere?
 Cheers,
 Hank
 On Aug 13, 2007, at 5:10 PM, James Milks wrote:

 I have data on number of vines per tree for ~550 trees.  Over  
 half of
 the trees did not have any vines and the data is fairly skewed
 (median = 0, mean = 1.158, 3rd qu. = 1.000).  I am attempting to
 investigate whether plot location (four sites), species (I'm using
 only the four most common species), or tree dbh has a significant
 influence on the number of vines per tree.  When I attempted to use
 the zicounts function, R gave me the following error message:

 vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site
 +Species+DBH,distrname=ZIP,data=sycamores.1)
 Error in ifelse(y == 0, 1, y/mu) : dim- : dims [product 12] do not
 match the length of object [549]
 In addition: Warning messages:
 1: longer object length
 is not a multiple of shorter object length in: x[good, ]  
 * w
 2: longer object length
 is not a multiple of shorter object length in: eta + offset
 3: longer object 

[R] Newbie

2007-08-16 Thread Alan Harrison
Hello,

I'm a bit new to the world of R so forgive my ignorance.  I'm trying to do a 
zero-inflated negative binomial regression and have received an error message 
and i'm not sure what it means.  I'm running R 2.5.1 on XP.  I have just tried 
a really simple version of the model to see if it would run before I put all 
the variables in.  I have attached all the variables to the object alan.

Here is the table:

Date Location Deer Code   Sex Age Pel Per Lac Scars Emb Lar Nym Bee Mi  Mass
1   12/06/2007 Ballysallagh0   B1 FALSE   1   A   1   1 4   0   0   0   
0  0 21.59
2   15/06/2007 Ballysallagh0  B10 FALSE   1   A   1   1 5   0   0   0   
0  2 26.19
3   15/06/2007 Ballysallagh0  B12 FALSE   1   A   1   1 0   5   0   0   
0  1 28.55
4   15/06/2007 Ballysallagh0  B13 FALSE   1   A   0   1 5   0   0   0   
4 45 23.93
5   15/06/2007 Ballysallagh0  B16 FALSE   1   A   1   1 0   4   0   0   
0  1 34.19
6   15/06/2007 Ballysallagh0  B17 FALSE   1   A   1   1 5   0   0   0   
0  0 25.02
7   15/06/2007 Ballysallagh0  B18 FALSE   1   A   1   1 5   0   0   0   
0  7 33.06
8   12/06/2007 Ballysallagh0   B5 FALSE   1   A   1   1 4   0   0   0   
0  2 23.55
9   12/06/2007 Ballysallagh0   B6 FALSE   1   A   1   1 4   0   0   0   
0  2 22.67
10  12/06/2007 Ballysallagh0   B7 FALSE   1   A   1   1 5   0   0   0   
0  2 22.57
11  12/06/2007 Ballysallagh0   B8 FALSE   1   A   1   1 5   0   0   0   
0 10 24.01
12  24/06/2007  Caledon1  C12 FALSE   1   A   1   0 0   0   1   0   
0  3 21.68
13  25/06/2007  Caledon1  C17 FALSE   1   A   1   0 4   0   1   0   
0 23 20.56
14  25/06/2007  Caledon1  C18 FALSE   1   A   1   0 0   0   0   0   
2  2 17.47
15  25/06/2007  Caledon1  C20 FALSE   1   A   1   0 8   0   0   0   
0  3 19.97
16  25/06/2007  Caledon1  C21 FALSE   1   A   1   0 3   0   2   0   
0 20 20.58
17  25/06/2007  Caledon1  C24 FALSE   1   A   1   0 0   4   2   0   
0  7 17.37
18  25/06/2007  Caledon1  C27 FALSE   1   A   1   0 0   0   3   0   
0  6 24.14
19  25/06/2007  Caledon1  C28 FALSE   1   A   1   0 0   5   1   0   
0  6 21.58
20  25/06/2007  Caledon1  C33 FALSE   1   A   0   0 0   0   1   0   
0  0 19.06
21  25/06/2007  Caledon1  C35 FALSE   1   A   1   0 3   0   2   0   
0 20 24.55
22  24/06/2007  Caledon1   C4 FALSE   1   A   1   1 4   3   0   0   
0  5 22.65
23  24/06/2007  Caledon1   C8 FALSE   1   A   0   0 0   0   3   0   
0  3 23.08
24  01/06/2007 Hillsborough0  H11 FALSE   1   A   1   1 4   0   0   0   
0  0 23.01
25  01/06/2007 Hillsborough0  H16 FALSE   1   A   1   1 5   0   0   0   
0  0 18.18
26  01/06/2007 Hillsborough0  H17 FALSE   1   A   1   1 4   0   0   0   
0  0 24.45
27  01/06/2007 Hillsborough0  H19 FALSE   1   A   0   1 0   0   0   0   
0  3 21.01
28  01/06/2007 Hillsborough0   H2 FALSE   1   A   1   1 5   0   0   0   
0  1 24.86
29  01/06/2007 Hillsborough0  H21 FALSE   1   A   1   1 5   0   0   0   
0  0 20.48
30  01/06/2007 Hillsborough0  H25 FALSE   1   A   1   1 5   0   0   0   
0  0 22.09
31  01/06/2007 Hillsborough0  H26 FALSE   1   A   1   1 5   0   0   0   
0  0 20.18
32  01/06/2007 Hillsborough0   H3 FALSE   1   A   0   1 5   0   0   0   
0  0 25.82
33  01/06/2007 Hillsborough0  H34 FALSE   1   A   1   0 0   4   0   0   
0  1 18.78
34  01/06/2007 Hillsborough0  H35 FALSE   1   A   0   1 4   0   0   0   
0  0 20.02
35  01/06/2007 Hillsborough0   H5 FALSE   1   A   0   1 5   0   0   0   
0  0 19.58
36  01/06/2007 Hillsborough0   H6 FALSE   1   A   1   1 5   0   0   0   
0  6 28.84
37  16/06/2007   LoughNavar1  LN1 FALSE   1   A   1   1 5   0   3   0   
0  1 22.71
38  16/06/2007   LoughNavar1 LN10 FALSE   1   A   1   1 5   0   0   0   
0  0 23.53
39  16/06/2007   LoughNavar1 LN11 FALSE   1   A   1   1 4   0   0   0   
0  2 33.24
40  16/06/2007   LoughNavar1 LN13 FALSE   1   A   1   1 5   0   0   0   
0  0 20.20
41  16/06/2007   LoughNavar1 LN14 FALSE   1   A   1   1 0   7   0   0   
1  1 31.35
42  16/06/2007   LoughNavar1 LN16 FALSE   1   A   1   1 0   5   0   0   
0  2 19.43
43  16/06/2007   LoughNavar1 LN18 FALSE   1   A   1   0 6   0   0   0   
0 12 24.30
44  16/06/2007   LoughNavar1 LN23 FALSE   1   A   1   1 5   0   0   0   
0  1 24.34
45  16/06/2007   LoughNavar1  LN7 FALSE   1   A   1   1 5   0   1   0   
0  0 28.01
46  16/06/2007   LoughNavar1  LN9 FALSE   1   A   1   1 4   0   8   0   
0  5 25.43
47  11/07/2007   Mt.Stewart0   M1 FALSE   1   A   1   0 5   0   0   0   
0  1 25.31
48  11/07/2007   Mt.Stewart0  M18 FALSE   1   A   1   1 0   0   0   0   
0  2 26.66
49  11/07/2007   Mt.Stewart0  M21 FALSE   1   A   0   0 3   0   0   0   
0  

[R] Trim trailng space from data.frame factor variables

2007-08-16 Thread Lauri Nikkinen
Hi folks,

I would like to trim the trailing spaces in my factor variables using lapply
(described in this post by Marc Schwartz:
http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is
not functioning (in this example there is only one factor with trailing
spaces):

y1 - rnorm(20) + 6.8
y2 - rnorm(20) + (1:20*1.7 + 1)
y3 - rnorm(20) + (1:20*6.7 + 3.7)
y - c(y1,y2,y3)
x - gl(5,12)
f - gl(3,20, labels=paste(lev, 1:3,, sep=))
d - data.frame(x=x,y=y, f=f)
str(d)

d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x))
str(d)

How should I modify this?

-Lauri

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Linear models and a simple time series

2007-08-16 Thread [EMAIL PROTECTED]
Working on modeling a wild animal population.  Two data vectors:  the herd
count from year to year (estimated by a 
sampling procedure), and the number of animals killed by hunters.  Task is
to find the natural growth rate of the herd 
(A simplification, but preserves the essentials.)

My question is whether the R procedure lm() is an appropriate tool to
estimate the growth rate.

year   -1991:2007
killed
-c(7008,6663,8545,7868,9286,9365,10443,6389,6004,8631,13277,12029,10081,989
9,11023,9926,7000)
herdsize  
-c(50697,54804,46462,42410,43593,42138,43037,44495,45968,47376,45469,38815,
37186,37135,31760,31206,28563)
year.0 -which(year==1991)
year.1 -year.0+1
year.ult   -length(year)
year.penult-length(year)-1


y-heardsize[year.1:year.ult]
x-herdsize[year.0:year.penult]-killed[year.0:year.penult]
LM-lm(y~bb-1)

summary(LM)

#Call:
#lm(formula = y ~ x - 1)
#
#Residuals:
#   Min 1Q Median 3QMax 
#-11893  -1114   1137   3553   6069 
#
#Coefficients:
#   Estimate Std. Error t value Pr(|t|)
#bb  1.212170.03185   38.05 2.45e-16 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#
#Residual standard error: 4372 on 15 degrees of freedom
#Multiple R-Squared: 0.9897, Adjusted R-squared: 0.9891 
#F-statistic:  1448 on 1 and 15 DF,  p-value: 2.453e-16 

The model seems to fit the data very well, and I am willing to believe that
a growth rate of 1.21217 gives the best fit
in a least-squares sense.  However, because the dependent and independent
variables are highly correlated, I question whether
the variance calculations are accurate in this case.  Is lm() really the
appropriate tool to be using here?

Any insights would be welcome.




mail2web.com - Microsoft® Exchange solutions from a leading provider -
http://link.mail2web.com/Business/Exchange

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Newbie

2007-08-16 Thread Stefan Grosse

 I'm a bit new to the world of R so forgive my ignorance.  

That has nothing to do with the knowledge of R but with the model. Age
has only 2 different values: 0 and 1 and if it is 0 there is no scars,
so what exactly have you expected from the model?

I would say if you just want to prove that older deer have more scars
try the Mann Whitney non parametric test...

Stefan

 I'm trying to do a zero-inflated negative binomial regression and have 
 received an error message and i'm not sure what it means.  I'm running R 
 2.5.1 on XP.  I have just tried a really simple version of the model to see 
 if it would run before I put all the variables in.  I have attached all the 
 variables to the object alan.

 Here is the table:

 Date Location Deer Code   Sex Age Pel Per Lac Scars Emb Lar Nym Bee Mi  
 Mass
 1   12/06/2007 Ballysallagh0   B1 FALSE   1   A   1   1 4   0   0   0 
   0  0 21.59
 2   15/06/2007 Ballysallagh0  B10 FALSE   1   A   1   1 5   0   0   0 
   0  2 26.19
 3   15/06/2007 Ballysallagh0  B12 FALSE   1   A   1   1 0   5   0   0 
   0  1 28.55
 4   15/06/2007 Ballysallagh0  B13 FALSE   1   A   0   1 5   0   0   0 
   4 45 23.93
 5   15/06/2007 Ballysallagh0  B16 FALSE   1   A   1   1 0   4   0   0 
   0  1 34.19
 6   15/06/2007 Ballysallagh0  B17 FALSE   1   A   1   1 5   0   0   0 
   0  0 25.02
 7   15/06/2007 Ballysallagh0  B18 FALSE   1   A   1   1 5   0   0   0 
   0  7 33.06
 8   12/06/2007 Ballysallagh0   B5 FALSE   1   A   1   1 4   0   0   0 
   0  2 23.55
 9   12/06/2007 Ballysallagh0   B6 FALSE   1   A   1   1 4   0   0   0 
   0  2 22.67
 10  12/06/2007 Ballysallagh0   B7 FALSE   1   A   1   1 5   0   0   0 
   0  2 22.57
 11  12/06/2007 Ballysallagh0   B8 FALSE   1   A   1   1 5   0   0   0 
   0 10 24.01
 12  24/06/2007  Caledon1  C12 FALSE   1   A   1   0 0   0   1   0 
   0  3 21.68
 13  25/06/2007  Caledon1  C17 FALSE   1   A   1   0 4   0   1   0 
   0 23 20.56
 14  25/06/2007  Caledon1  C18 FALSE   1   A   1   0 0   0   0   0 
   2  2 17.47
 15  25/06/2007  Caledon1  C20 FALSE   1   A   1   0 8   0   0   0 
   0  3 19.97
 16  25/06/2007  Caledon1  C21 FALSE   1   A   1   0 3   0   2   0 
   0 20 20.58
 17  25/06/2007  Caledon1  C24 FALSE   1   A   1   0 0   4   2   0 
   0  7 17.37
 18  25/06/2007  Caledon1  C27 FALSE   1   A   1   0 0   0   3   0 
   0  6 24.14
 19  25/06/2007  Caledon1  C28 FALSE   1   A   1   0 0   5   1   0 
   0  6 21.58
 20  25/06/2007  Caledon1  C33 FALSE   1   A   0   0 0   0   1   0 
   0  0 19.06
 21  25/06/2007  Caledon1  C35 FALSE   1   A   1   0 3   0   2   0 
   0 20 24.55
 22  24/06/2007  Caledon1   C4 FALSE   1   A   1   1 4   3   0   0 
   0  5 22.65
 23  24/06/2007  Caledon1   C8 FALSE   1   A   0   0 0   0   3   0 
   0  3 23.08
 24  01/06/2007 Hillsborough0  H11 FALSE   1   A   1   1 4   0   0   0 
   0  0 23.01
 25  01/06/2007 Hillsborough0  H16 FALSE   1   A   1   1 5   0   0   0 
   0  0 18.18
 26  01/06/2007 Hillsborough0  H17 FALSE   1   A   1   1 4   0   0   0 
   0  0 24.45
 27  01/06/2007 Hillsborough0  H19 FALSE   1   A   0   1 0   0   0   0 
   0  3 21.01
 28  01/06/2007 Hillsborough0   H2 FALSE   1   A   1   1 5   0   0   0 
   0  1 24.86
 29  01/06/2007 Hillsborough0  H21 FALSE   1   A   1   1 5   0   0   0 
   0  0 20.48
 30  01/06/2007 Hillsborough0  H25 FALSE   1   A   1   1 5   0   0   0 
   0  0 22.09
 31  01/06/2007 Hillsborough0  H26 FALSE   1   A   1   1 5   0   0   0 
   0  0 20.18
 32  01/06/2007 Hillsborough0   H3 FALSE   1   A   0   1 5   0   0   0 
   0  0 25.82
 33  01/06/2007 Hillsborough0  H34 FALSE   1   A   1   0 0   4   0   0 
   0  1 18.78
 34  01/06/2007 Hillsborough0  H35 FALSE   1   A   0   1 4   0   0   0 
   0  0 20.02
 35  01/06/2007 Hillsborough0   H5 FALSE   1   A   0   1 5   0   0   0 
   0  0 19.58
 36  01/06/2007 Hillsborough0   H6 FALSE   1   A   1   1 5   0   0   0 
   0  6 28.84
 37  16/06/2007   LoughNavar1  LN1 FALSE   1   A   1   1 5   0   3   0 
   0  1 22.71
 38  16/06/2007   LoughNavar1 LN10 FALSE   1   A   1   1 5   0   0   0 
   0  0 23.53
 39  16/06/2007   LoughNavar1 LN11 FALSE   1   A   1   1 4   0   0   0 
   0  2 33.24
 40  16/06/2007   LoughNavar1 LN13 FALSE   1   A   1   1 5   0   0   0 
   0  0 20.20
 41  16/06/2007   LoughNavar1 LN14 FALSE   1   A   1   1 0   7   0   0 
   1  1 31.35
 42  16/06/2007   LoughNavar1 LN16 FALSE   1   A   1   1 0   5   0   0 
   0  2 19.43
 43  16/06/2007   LoughNavar1 LN18 FALSE   1   A   1   0 6   0   0   0 
   0 12 24.30
 44  16/06/2007   LoughNavar1 LN23 FALSE   1   A   1   1 5   0   0   0 
   0  1 24.34
 45  16/06/2007   LoughNavar1  LN7 

Re: [R] Newbie

2007-08-16 Thread Stefan Grosse

 I would say if you just want to prove that older deer have more scars
 try the Mann Whitney non parametric test...
   

Forgive me but even that does not really make sense since the values are
all 0 so it is to obvious...

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible memory leak with R v.2.5.0

2007-08-16 Thread Martin Morgan
Hi Peter --

Here's my guess.

Ironically, adding things to broken code reduces the signal to noise
ratio. I ended up with

get.vars.for.cluster = function(
  cluster,
  genes=get.global(gene.ids ),
  ratios=get.global(ratios))
{
cluster - cluster
rows - cluster$rows
cols - cluster$cols

r - ratios[ rows, cols ]
avg.rows - apply( r, 2, mean, na.rm=TRUE )
r.all - ratios[ genes, cols ]
devs - apply( r.all, 1, -, avg.rows )

apply( devs, 2, var, na.rm=TRUE )
}

at what might reproduce your problem (though can't be sure!). The
unusual bit is

cluster - cluster

At first I thought this would be a no-op (assigning cluster to
itself), but apparently at this point in the code cluster does not
exist in the environment of the function (just in the call) so cluster
gets assigned outside the function.

So then my guess is that get.vars.for.cluster is part of a package,
and the package has a variable called cluster. get.vars.for.cluster
then assigns its first argument to the package variable cluster (which
is the first variable called cluster that -
encounters). rm(list=ls(all=TRUE)) removes everything from the global
environment, but (fortunately!) not from the package environment.

You might end up storing more than 'just' cluster, depending on what
it is.

So I think the solution is to rethink the use of - (and also the
get.global(), which are either for convenience (in which case it would
probably be better to specify a default for the function argument) or
out of a sense that copying is bad (but this is probably mistaken,
since R's semantics are 'copy on change', so passing a 'big' object
into a function does not usually trigger a copy)).

You could also try 'detach'ing the package that get.vars.for.cluster
is defined in.

Hope that points in the right direction,

Martin

Peter Waltman [EMAIL PROTECTED] writes:

I'm  working  with  a  very  large matrix ( 22k rows x 2k cols) of RNA
expression  data with R v.2.5.0 on a RedHat Enterprise machine, x86_64
architecture.
The relevant code is below, but I call a function that takes a cluster
of  this data ( a list structure that contains a $rows elt which lists
the rows (genes ) in the cluster by ID, but not the actual data itself
).
The function creates two copies of the matrix, one containing the rows
in  the  cluster,  and  one  with  the rest of the rows in the matrix.
After  doing  some  statistical  massaging,  the  function  returns  a
statistical  score  for  each  rows/genes  in  the matrix, producing a
vector of 22k elt's.
When  I  run 'top', I see that the memory stamp of R after loading the
matrix is ~750M.  However, after calling this function on 10 clusters,
this  jumps  to3.7 gig (at least by 'top's measurement), and this
will not be reduced by any subsequent calls to gc().
Output from gc() is:

   gc()   used  (Mb) gc trigger   (Mb) max used  (Mb)
  Ncells   377925  20.26819934  364.3   604878  32.4
  Vcells 88857341 678.0  240204174 1832.7 90689707 692.0
  

output from top is:

 PID  USER   PR   NI   VIRT   RES   SHR  S %CPU %MEMTIME+
  COMMAND
   1199 waltman   17   0 3844m 3.7g 3216 S  0.0 23.6  29:58.74 R

Note, the relevant call that invoked my function is:

  test   -   sapply(   c(1:10),   function(x)  get.vars.for.cluster(
  clusterStack[[x]], opt=rows ) )

Finally,  for  fun,  I  rm()'d  all variables with the rm( list=ls() )
command, and then called gc().  The memory of this empty instance of
R is still 3.4 gig, i.e.
R.console:

   rm( list=ls() )
   ls()
  character(0)
   gc()
 used  (Mb) gc trigger   (Mb) max used  (Mb)
  Ncells   363023  19.45455947  291.4   604878  32.4
  Vcells 44434871 339.1  192163339 1466.1 90689707 692.0
  

Subsequent top  output:
output from top is:

 PID  USER   PR   NI   VIRT   RES   SHR  S %CPU %MEMTIME+
  COMMAND
   1199 waltman   16   0 3507m 3.4g 3216 S  0.0 21.5  29:58.92 R

Thanks for any help or suggestions,
Peter Waltman
p.s.  code snippet follows.  Note, that I've added extra rm() and gc()
calls w/in the function to try to reduce the memory stamp to no avail.

  get.vars.for.cluster   =   function(   cluster,   genes=get.global(
  gene.ids ), opt=c(rows,cols),
ratios=get.global(ratios),  var.norm=T,
  r.sig=get.global( r.sig ),
  allow.anticor=get.global( allow.anticor )
  ) {
cat( phw dbg msg\n)
cluster - cluster
opt - match.arg( opt )
rows - cluster$rows
cols - cluster$cols
if ( opt == rows ) {
  cat( phw dbg msg: if opt == rows\n )
  r - ratios[ rows, cols ]
  r.all - ratios[ genes, cols ]
  avg.rows - apply( r, 2, mean, na.rm=T ) ##median )
  rm( r )  # phw 

Re: [R] Error message when using zero-inflated count regression model in package zicounts

2007-08-16 Thread Achim Zeileis
On Thu, 16 Aug 2007, James R. Milks wrote:

 Dr. Stevens,

 I've double-checked my variable lengths.  All of my variables
 (Total.vines, Site, Species, and DBH) came in at 549.  I did correct
 one problem in the data entry that had escaped my previous notice:
 somehow the undergrad who entered all the data managed to make the
 Acer negundo data split into two separate categories while still
 appearing to use the same ACNE abbreviation.  When I made that
 correction and re-ran zicounts, R gave me the following error messages:

Hmm, I don't know about the error messages in zicounts, but you could try 
to use the zeroinfl() implementation in package pscl:
   vines.zip - zeroinfl(Total.vines ~ Site + Species + DBH | Site +
 Species + DBH, data = sycamores.1)
and see whether this produces a similar error.
Z

  vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site
 +Species+DBH,distname=ZIP,data=sycamores.1)

 Error in ifelse(y == 0, 1, y/mu) : dim-: dims [product 12] do not
 match the length of object [549]
 In addition: Warning messages:
 1: longer object length is not a multiple of shorter object length
 in: eta + offset
 2: longer object length is not a multiple of shorter object length
 in: y/mu

 In addition, zicounts would not run a normal poisson regression on
 the data, giving me the same error messages as the ZIP regression.
 Doing a poisson regression with glm did not show any error messages.
 However, the glm model with full interactions was still over-dispersed.

 Could the zicounts problem be that the individual sites and species
 had different population sizes?  For instance, Site A had 149 trees,
 site B had 55 trees, site C had 270 trees, and site D had 75 trees.
 The species had similar discrepancies in population sizes, with
 Platanus occidentalis and Acer negundo forming the majority of the
 trees.

 Thanks for your help.

 Jim Milks

 Graduate Student
 Environmental Sciences Ph.D. Program
 136 Biological Sciences
 Wright State University
 3640 Colonel Glenn Hwy
 Dayton, OH 45435

 On Aug 15, 2007, at 5:58 AM, Martin Henry H. Stevens wrote:

 Hi Jim,
 With regard to same number, I simply wanted to make sure that each
 variable was the same length. The error message you show is what
 you would get if, for instance, you misspelled one of the variables
 and it doesn't exist, in which case it would be NULL, while your
 other variables would each be 550 elements in length.
 Hank
 On Aug 14, 2007, at 4:47 PM, James Milks wrote:

 Dr. Stevens,

 Unfortunately, Poisson gives me an over-dispersed model with only
 3 out of 14 variables/interactions significant.  Doing a step-wise
 poisson regression still ended up with the same over-dispersed
 model.  Given the high number of zeros in the response variable,
 Dr. Thad Tarpey (one of our statisticians on campus) suggested
 zero-inflated poisson regression as a possible solution to the
 over-dispersion problem.

 As for variables of the same length, there are different numbers
 of trees for each species and site since we ran different numbers
 of transects at each site (some sites were larger than others) and
 there were different numbers of species and trees within each
 transect.  Acer negundo made up ~33% of all the trees we measured;
 Platanus occidentalis had 25%; Fraxinus americana was another 12%
 and ~11% was Ulmus americana.  The remaining 25% was divided among
 16 other species, all of which were excluded from the analysis due
 to singularities in the model when they were included (something
 about glm not liking singularities in the model).  So if the
 zicounts requires that each species and site have the same length,
 then I will not be able to use it unless I can get R to randomly
 select x trees from each species and site combination.

 Thanks for your input.

 Jim Milks

 Graduate Student
 Environmental Sciences Ph.D. Program
 136 Biological Sciences
 Wright State University
 3640 Colonel Glenn Hwy
 Dayton, OH 45435


 On Aug 14, 2007, at 9:37 AM, Hank Stevens wrote:

 Hi Jim,
 Two thoughts come to me, unencumbered by the thought process or
 knowledge of zicounts:
 1. Is Poisson really NOT appropriate? (do you have to use zicounts?)
 2. Are you 110% certain that all variables are the same length?
 Would NA's interfere?
 Cheers,
 Hank
 On Aug 13, 2007, at 5:10 PM, James Milks wrote:

 I have data on number of vines per tree for ~550 trees.  Over
 half of
 the trees did not have any vines and the data is fairly skewed
 (median = 0, mean = 1.158, 3rd qu. = 1.000).  I am attempting to
 investigate whether plot location (four sites), species (I'm using
 only the four most common species), or tree dbh has a significant
 influence on the number of vines per tree.  When I attempted to use
 the zicounts function, R gave me the following error message:

 vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site
 +Species+DBH,distrname=ZIP,data=sycamores.1)
 Error in ifelse(y == 0, 1, y/mu) : dim- : dims [product 12] do not
 

Re: [R] Trim trailng space from data.frame factor variables

2007-08-16 Thread Lauri Nikkinen
Thanks Marc! What would be the easiest way to coerce char-variables back to
factor-variables? Is there a way to prevent the coercion in d[] - lapply(d,
function(x) if (is.factor(x)) sub( +$, , x) else x) ?



-Lauri


2007/8/16, Marc Schwartz [EMAIL PROTECTED]:

 On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
  Hi folks,
 
  I would like to trim the trailing spaces in my factor variables using
 lapply
  (described in this post by Marc Schwartz:
  http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code
 is
  not functioning (in this example there is only one factor with trailing
  spaces):

 Ayepas I noted in that post, it was untestedmy error.

 The problem is that by using ifelse() as I did, the test for the column
 being a factor returns a single result, not one result per element.
 Hence, the appropriate conditional code is only performed on the first
 element in each column, rather than being vectorized on the entire
 column.

  y1 - rnorm(20) + 6.8
  y2 - rnorm(20) + (1:20*1.7 + 1)
  y3 - rnorm(20) + (1:20*6.7 + 3.7)
  y - c(y1,y2,y3)
  x - gl(5,12)
  f - gl(3,20, labels=paste(lev, 1:3,, sep=))
  d - data.frame(x=x,y=y, f=f)
  str(d)
 
  d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x))
  str(d)
 
  How should I modify this?

 Try this instead:

 d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x)

  str(d)
 'data.frame':   60 obs. of  3 variables:
 $ x: chr  1 1 1 1 ...
 $ y: num  6.70 4.42 8.03 4.90 6.98 ...
 $ f: chr  lev1 lev1 lev1 lev1 ...

 Note that by using grep(), the factors are coerced to character vectors
 as expected. You would need to coerce back to factors if you need them
 as such.

 HTH,

 Marc Schwartz




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Possible memory leak with large matrices in R v.2.5.0

2007-08-16 Thread Peter Waltman
  I'm working with a very large matrix ( 22k rows x 2k cols) of RNA 
expression data with R v.2.5.0 on a RedHat Enterprise machine, x86_64 
architecture.

The relevant code is below, but I call a function that takes a cluster 
of this data ( a list structure that contains a $rows elt which lists 
the rows (genes ) in the cluster by ID, but not the actual data itself ).

The function creates two copies of the matrix, one containing the rows 
in the cluster, and one with the rest of the rows in the matrix.  After 
doing some statistical massaging, the function returns a statistical 
score for each rows/genes in the matrix, producing a vector of 22k elt's.

When I run 'top', I see that the memory stamp of R after loading the 
matrix is ~750M.  However, after calling this function on 10 clusters, 
this jumps to  3.7 gig (at least by 'top's measurement), and this will 
not be reduced by any subsequent calls to gc().

Output from gc() is:

  gc()   used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells   377925  20.26819934  364.3   604878  32.4
Vcells 88857341 678.0  240204174 1832.7 90689707 692.0
 

output from top is:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 1199 waltman   17   0 3844m 3.7g 3216 S  0.0 23.6  29:58.74 R

Note, the relevant call that invoked my function is:

test - sapply( c(1:10), function(x) get.vars.for.cluster(
clusterStack[[x]], opt=rows ) )

Finally, for fun, I rm()'d all variables with the rm( list=ls() ) 
command, and then called gc().  The memory of this empty instance of R 
is still 3.4 gig, i.e.

R.console:

  rm( list=ls() )
  ls()
character(0)
  gc()
   used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells   363023  19.45455947  291.4   604878  32.4
Vcells 44434871 339.1  192163339 1466.1 90689707 692.0
 

Subsequent top  output:
output from top is:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 1199 waltman   16   0 3507m 3.4g 3216 S  0.0 21.5  29:58.92 R  


Thanks for any help or suggestions,

Peter Waltman

p.s.  code snippet follows.  Note, that I've added extra rm() and gc() 
calls w/in the function to try to reduce the memory stamp to no avail.

get.vars.for.cluster = function( cluster,
 genes=get.global( gene.ids ),
 opt=c(rows,cols),
 ratios=get.global(ratios),
 var.norm=T,
 r.sig=get.global( r.sig ),
 allow.anticor=get.global(
allow.anticor ) ) {
  cluster - cluster

  opt - match.arg( opt )
  rows - cluster$rows
  cols - cluster$cols

  if ( opt == rows ) {
r - ratios[ rows, cols ]
r.all - ratios[ genes, cols ]

avg.rows - apply( r, 2, mean, na.rm=T ) ##median )

rm( r ) 
gc()

devs - apply( r.all, 1, -, avg.rows )

if ( !allow.anticor ) {
  rm( r.all, avg.rows ) 
  gc()
}

vars - apply( devs, 2, var, na.rm=T )

rm( devs )
gc()

vars - log10( vars )
gc()

if ( allow.anticor ) {
  ## Get variance against the inverse of the mean profile
  devs.2 - apply( r.all, 1, -, -avg.rows )
  gc()

  vars.2 - apply( devs.2, 2, var, na.rm=T )
  gc()

  rm( devs.2 )
  gc()

  vars.2 - log10( vars.2 )
  gc()

   ## For each gene take the min of variance or anti-cor
variance
  vars - cbind( vars, vars.2 )

  rm( vars.2 )
  gc()

  vars - apply( vars, 1, min )
  gc()

}

## Normalize the values by the variance over the rows in
the cluster
if ( var.norm ) {

  vars - vars - mean( vars[ rows ], na.rm=T )
  tmp.sd - sd( vars[ rows ], na.rm=T )
  if ( ! is.na( tmp.sd )  tmp.sd != 0 ) vars - vars / (
tmp.sd + r.sig )
}
gc()
return( vars )

  } else {

r.all - ratios[ rows, ]
## Mean-normalized variance
vars - log10( apply( r.all, 2, var, na.rm=T ) / abs( apply(
r.all, 2, mean, na.rm=T ) ) )
names( vars ) - colnames( ratios )

## Normalize the values by the variance over the rows in the
cluster
if ( var.norm ) {
  vars - vars - mean( vars[ cluster$cols ], na.rm=T )
  tmp.sd - sd( vars[ cluster$cols ], na.rm=T )
  if ( ! is.na( tmp.sd )  tmp.sd != 0 ) vars - vars / (
tmp.sd + r.sig )
}

return( vars )
  }
}


Re: [R] Trim trailng space from data.frame factor variables

2007-08-16 Thread Marc Schwartz
On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
 Hi folks,
 
 I would like to trim the trailing spaces in my factor variables using lapply
 (described in this post by Marc Schwartz:
 http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is
 not functioning (in this example there is only one factor with trailing
 spaces):

Ayepas I noted in that post, it was untestedmy error.

The problem is that by using ifelse() as I did, the test for the column
being a factor returns a single result, not one result per element.
Hence, the appropriate conditional code is only performed on the first
element in each column, rather than being vectorized on the entire
column.

 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20*1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 x - gl(5,12)
 f - gl(3,20, labels=paste(lev, 1:3,, sep=))
 d - data.frame(x=x,y=y, f=f)
 str(d)
 
 d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$, , x), x))
 str(d)
 
 How should I modify this?

Try this instead:

d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, , x) else x)

 str(d)
'data.frame':   60 obs. of  3 variables:
 $ x: chr  1 1 1 1 ...
 $ y: num  6.70 4.42 8.03 4.90 6.98 ...
 $ f: chr  lev1 lev1 lev1 lev1 ...

Note that by using grep(), the factors are coerced to character vectors
as expected. You would need to coerce back to factors if you need them
as such.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trim trailng space from data.frame factor variables

2007-08-16 Thread Marc Schwartz
The easiest way might be to modify the lapply() call as follows:

d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) else x)

 str(d)
'data.frame':   60 obs. of  3 variables:
 $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ...
 $ y: num  7.01 8.33 5.48 6.51 5.61 ...
 $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ...


This way the coercion back to a factor takes place within the loop as
needed.

Note that I also meant to type sub() and not grep() below. The default
behavior for both is to return a character vector (if 'value = TRUE' in
grep()). There is not an argument to override that behavior.

HTH,

Marc


On Thu, 2007-08-16 at 19:19 +0300, Lauri Nikkinen wrote:
 Thanks Marc! What would be the easiest way to coerce char-variables
 back to factor-variables? Is there a way to prevent the coercion in
 d[] - lapply(d, function(x) if ( is.factor(x)) sub( +$, , x) else
 x) ?
 
  
 
 -Lauri
 
 
 
 2007/8/16, Marc Schwartz [EMAIL PROTECTED]: 
 On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
  Hi folks,
 
  I would like to trim the trailing spaces in my factor
 variables using lapply 
  (described in this post by Marc Schwartz:
  http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html)
 but the code is
  not functioning (in this example there is only one factor
 with trailing 
  spaces):
 
 Ayepas I noted in that post, it was untestedmy error.
 
 The problem is that by using ifelse() as I did, the test for
 the column
 being a factor returns a single result, not one result per
 element. 
 Hence, the appropriate conditional code is only performed on
 the first
 element in each column, rather than being vectorized on the
 entire
 column.
 
  y1 - rnorm(20) + 6.8
  y2 - rnorm(20) + (1:20* 1.7 + 1)
  y3 - rnorm(20) + (1:20*6.7 + 3.7)
  y - c(y1,y2,y3)
  x - gl(5,12)
  f - gl(3,20, labels=paste(lev, 1:3,, sep=))
  d - data.frame (x=x,y=y, f=f)
  str(d)
 
  d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$,
 , x), x))
  str(d)
 
  How should I modify this?
 
 Try this instead: 
 
 d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, ,
 x) else x)
 
  str(d)
 'data.frame':   60 obs. of  3 variables:
 $ x: chr  1 1 1 1 ... 
 $ y: num  6.70 4.42 8.03 4.90 6.98 ...
 $ f: chr  lev1 lev1 lev1 lev1 ...
 
 Note that by using grep(), the factors are coerced to
 character vectors
 as expected. You would need to coerce back to factors if you
 need them 
 as such.
 
 HTH,
 
 Marc Schwartz


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Urgent Help needed

2007-08-16 Thread AbouEl-Makarim Aboueissa
Dear All:

Urgent help is needed.


I have a data set in matrix format  of three columns: X, Y and index of four 
groups (1,2,3,4). What I need to do is the following;

1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the 
 corresponding data values of this group and create new columns say 
X-sample mean 
  and Y-sample mean? I tried to use the tapply but I have some 
difficulties to restore the new data


2- How I can use the “tapply” if possible or any other R-function to find the 
correlation 
 coefficient between the X and Y columns for each group indexed 1,2,3,4.? 
Could not use the tapply.


I attached part of the data as txt file.


Thank you so much for your attention to this matter, and I look forward to hear 
from you soon.

Regards,

Abou


Data:

x   y   index
15807.2412.54
15752.5133.54
12893.7601.53
8426.88 22.23
5706.24 333 3
3982.08 560 2
3642.62 670 2
295.68  124 1
215.40  104 1
195.40  204 1
4240.21 22.42
1222.72 45.92
1142.26 23.62
63.00   90.11
1216.00 82.42
2769.60 111 2
1790.46 34.72
26.10   26.10   1
19676.830.994
10920.60203 3
6144.00 46  3
4534.48 4534.48 3
4.0065  4
29500.0056  4
17100.0077  4
9000.00 435 3
6300.00 84  3
3962.88 334 2
5690.00 653 3
3736.00 233 2
2750.00 22  2
1316.00 345 2
4595.00 4595.00 3
5928.00 45  3
2645.70 0.002
2580.24 454 2
6547.34 6547.34 3
1615.68 5   2
194.06  55  1
184.80  6   1
82.94   44  1
16649.0056  4
4500.00 74  3
1600.00 744 2

=



==
AbouEl-Makarim Aboueissa, Ph.D.
Assistant Professor of Statistics
Department of Mathematics  Statistics
University of Southern Maine
96 Falmouth Street
P.O. Box 9300
Portland, ME 04104-9300

Tel: (207) 228-8389
Email: [EMAIL PROTECTED]
  [EMAIL PROTECTED]
Office: 301C Payson Smith

x   y   index
15807.2412.54
15752.5133.54
12893.7601.53
8426.88 22.23
5706.24 333 3
3982.08 560 2
3642.62 670 2
295.68  124 1
215.40  104 1
195.40  204 1
4240.21 22.42
1222.72 45.92
1142.26 23.62
63.00   90.11
1216.00 82.42
2769.60 111 2
1790.46 34.72
26.10   26.10   1
19676.830.994
10920.60203 3
6144.00 46  3
4534.48 4534.48 3
4.0065  4
29500.0056  4
17100.0077  4
9000.00 435 3
6300.00 84  3
3962.88 334 2
5690.00 653 3
3736.00 233 2
2750.00 22  2
1316.00 345 2
4595.00 4595.00 3
5928.00 45  3
2645.70 0.002
2580.24 454 2
6547.34 6547.34 3
1615.68 5   2
194.06  55  1
184.80  6   1
82.94   44  1
16649.0056  4
4500.00 74  3
1600.00 744 2
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trim trailng space from data.frame factor variables

2007-08-16 Thread Prof Brian Ripley
On Thu, 16 Aug 2007, Marc Schwartz wrote:

 The easiest way might be to modify the lapply() call as follows:

 d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) else 
 x)

 str(d)
 'data.frame':   60 obs. of  3 variables:
 $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ...
 $ y: num  7.01 8.33 5.48 6.51 5.61 ...
 $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ...


 This way the coercion back to a factor takes place within the loop as
 needed.

 Note that I also meant to type sub() and not grep() below. The default
 behavior for both is to return a character vector (if 'value = TRUE' in
 grep()). There is not an argument to override that behavior.

I would have thought the thing to do was to apply sub() to the levels:

chfactor - function(x) { levels(x) - sub( +$, , levels(x)); x }

d[] - lapply(d, function(x) if (is.factor(x)) chfactor(x) else x)

This has the advantage of not losing the order of the levels.  It will 
merge levels if they only differ in the number of trailing spaces, which 
is probably what you want.


 HTH,

 Marc


 On Thu, 2007-08-16 at 19:19 +0300, Lauri Nikkinen wrote:
 Thanks Marc! What would be the easiest way to coerce char-variables
 back to factor-variables? Is there a way to prevent the coercion in
 d[] - lapply(d, function(x) if ( is.factor(x)) sub( +$, , x) else
 x) ?



 -Lauri



 2007/8/16, Marc Schwartz [EMAIL PROTECTED]:
 On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
 Hi folks,

 I would like to trim the trailing spaces in my factor
 variables using lapply
 (described in this post by Marc Schwartz:
 http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html)
 but the code is
 not functioning (in this example there is only one factor
 with trailing
 spaces):

 Ayepas I noted in that post, it was untestedmy error.

 The problem is that by using ifelse() as I did, the test for
 the column
 being a factor returns a single result, not one result per
 element.
 Hence, the appropriate conditional code is only performed on
 the first
 element in each column, rather than being vectorized on the
 entire
 column.

 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20* 1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 x - gl(5,12)
 f - gl(3,20, labels=paste(lev, 1:3,, sep=))
 d - data.frame (x=x,y=y, f=f)
 str(d)

 d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$,
 , x), x))
 str(d)

 How should I modify this?

 Try this instead:

 d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, ,
 x) else x)

 str(d)
 'data.frame':   60 obs. of  3 variables:
 $ x: chr  1 1 1 1 ...
 $ y: num  6.70 4.42 8.03 4.90 6.98 ...
 $ f: chr  lev1 lev1 lev1 lev1 ...

 Note that by using grep(), the factors are coerced to
 character vectors
 as expected. You would need to coerce back to factors if you
 need them
 as such.

 HTH,

 Marc Schwartz

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trim trailng space from data.frame factor variables

2007-08-16 Thread Marc Schwartz
On Thu, 2007-08-16 at 17:52 +0100, Prof Brian Ripley wrote:
 On Thu, 16 Aug 2007, Marc Schwartz wrote:
 
  The easiest way might be to modify the lapply() call as follows:
 
  d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) 
  else x)
 
  str(d)
  'data.frame':   60 obs. of  3 variables:
  $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ...
  $ y: num  7.01 8.33 5.48 6.51 5.61 ...
  $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ...
 
 
  This way the coercion back to a factor takes place within the loop as
  needed.
 
  Note that I also meant to type sub() and not grep() below. The default
  behavior for both is to return a character vector (if 'value = TRUE' in
  grep()). There is not an argument to override that behavior.
 
 I would have thought the thing to do was to apply sub() to the levels:
 
 chfactor - function(x) { levels(x) - sub( +$, , levels(x)); x }
 
 d[] - lapply(d, function(x) if (is.factor(x)) chfactor(x) else x)
 
 This has the advantage of not losing the order of the levels.  It will 
 merge levels if they only differ in the number of trailing spaces, which 
 is probably what you want.

Quite true. 

As also noted in that prior thread as I recall, is the 'strip.white'
option in read.table() et al which would obviate the need for
post-import trimming if it makes sense in this application for Lauri.

Thanks,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] several plots on several pages

2007-08-16 Thread Greg Snow
You can set up the 3 plots per page by using:

 par(mfrow=c(3,1))

Then there are a couple of options for skipping the top graphics
position if the graph fails.  If you know that the graph failed then you
can just use plot.new() (or frame()) to skip the top plot and plot the
next one in the 2nd position.

Another option is you can call par(mfg=c(2,1)) explicitly before
plotting the 2nd plot to force it to plot in the 2,1 position.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Rainer M. Krug
 Sent: Thursday, August 16, 2007 6:58 AM
 To: r-help
 Subject: [R] several plots on several pages
 
 Hi
 
   version
 _
 platform   i686-pc-linux-gnu
 arch   i686
 os linux-gnu
 system i686, linux-gnu
 status
 major  2
 minor  5.1
 year   2007
 month  06
 day27
 svn rev42083
 language   R
 version.string R version 2.5.1 (2007-06-27)
 
 
 
 I want to create a pdf withe three graphs on a page and with 
 two pages:
 
 -
 | 1 |
 -
 | 2 |
 -
 | 3 |
 -
 
 NEW PAGE
 
 -
 | 4 |
 -
 | 5 |
 -
 | 6 |
 -
 
 Graph 1 should ALWAYS be at that spot, graph two also, even 
 if graph one produces an error when plotting (the area can be 
 empty, but doesn't have
 to.)
 
 I produced the foolowing code below, but I have a few problems:
 
 1) how can I create a new page in the pdf?
 
 2) how can I make sure that the second graph is in position 2 
 when graph one produces an error when plotting I(as in the 
 example)? Everything works OK (for the firsat page) when 
 graph one is plotted.
 
 I have the feeling, that I am thinking to complicated.
 
 Any help welcome,
 
 Rainer
 
 
 pdf(test.pdf)
 try(
  {
  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )
 
  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )
 
  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )
 
  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))
 
 
  ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF
 
  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )
 
  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )
 
  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )
 
  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))
 
  }
  )
 dev.off()
 
 
 --
 NEW EMAIL ADDRESS AND ADDRESS:
 
 [EMAIL PROTECTED]
 
 [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH
 
 Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)
 
 Plant Conservation Unit
 Department of Botany
 University of Cape Town
 Rondebosch 7701
 South Africa
 
 Tel:  +27 - (0)21 650 5776 (w)
 Fax:  +27 - (0)86 516 2782
 Fax:  +27 - (0)21 650 2440 (w)
 Cell: +27 - (0)83 9479 042
 
 Skype:RMkrug
 
 email:[EMAIL PROTECTED]
   [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] several plots on several pages

2007-08-16 Thread Greg Snow
Oops, I read further down in your original post and see that you already
knew about par(mfg=c(2,1)).  To get it to advance to page 2 for the 4th
plot try calling plot.new() which should move you to the next page, then
doing par(mfg=c(1,1)) should cause the next graph to be at the top.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Rainer M. Krug
 Sent: Thursday, August 16, 2007 6:58 AM
 To: r-help
 Subject: [R] several plots on several pages
 
 Hi
 
   version
 _
 platform   i686-pc-linux-gnu
 arch   i686
 os linux-gnu
 system i686, linux-gnu
 status
 major  2
 minor  5.1
 year   2007
 month  06
 day27
 svn rev42083
 language   R
 version.string R version 2.5.1 (2007-06-27)
 
 
 
 I want to create a pdf withe three graphs on a page and with 
 two pages:
 
 -
 | 1 |
 -
 | 2 |
 -
 | 3 |
 -
 
 NEW PAGE
 
 -
 | 4 |
 -
 | 5 |
 -
 | 6 |
 -
 
 Graph 1 should ALWAYS be at that spot, graph two also, even 
 if graph one produces an error when plotting (the area can be 
 empty, but doesn't have
 to.)
 
 I produced the foolowing code below, but I have a few problems:
 
 1) how can I create a new page in the pdf?
 
 2) how can I make sure that the second graph is in position 2 
 when graph one produces an error when plotting I(as in the 
 example)? Everything works OK (for the firsat page) when 
 graph one is plotted.
 
 I have the feeling, that I am thinking to complicated.
 
 Any help welcome,
 
 Rainer
 
 
 pdf(test.pdf)
 try(
  {
  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )
 
  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )
 
  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )
 
  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))
 
 
  ## THE NEXT THREE SHOULD BE ON A NEW PAGE IN THE PDF
 
  ## Set layout to three rows and only oine column
  par( mfcol=c(3,1), oma=c(0,0,0,0), mar=c(4, 4, 2, 2) )
 
  ## First row
  par(mfg=c(1,1))
  try( plot(runif(ff)) )
 
  ## Second row
  par(mfg=c(2,1))
  try( plot(runif(100)) )
 
  ## Third row
  par(mfg=c(3,1))
  plot(runif(1000))
 
  }
  )
 dev.off()
 
 
 --
 NEW EMAIL ADDRESS AND ADDRESS:
 
 [EMAIL PROTECTED]
 
 [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH
 
 Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)
 
 Plant Conservation Unit
 Department of Botany
 University of Cape Town
 Rondebosch 7701
 South Africa
 
 Tel:  +27 - (0)21 650 5776 (w)
 Fax:  +27 - (0)86 516 2782
 Fax:  +27 - (0)21 650 2440 (w)
 Cell: +27 - (0)83 9479 042
 
 Skype:RMkrug
 
 email:[EMAIL PROTECTED]
   [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trim trailng space from data.frame factor variables

2007-08-16 Thread Phil Spector
If the problem is with the levels of the factor, why not change
them directly?

 d = data.frame(a=1:5,
+ b=c('one ','two','three ','three ','two'))
 d$b
[1] onetwothree  three  two
Levels: one  three  two
 levels(d$b) = sub(' +$','',levels(d$b))
 d$b
[1] one   two   three three two
Levels: one three two

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 [EMAIL PROTECTED]




On Thu, 16 Aug 2007, Marc Schwartz wrote:

 The easiest way might be to modify the lapply() call as follows:

 d[] - lapply(d, function(x) if (is.factor(x)) factor(sub( +$, , x)) else 
 x)

 str(d)
 'data.frame':   60 obs. of  3 variables:
 $ x: Factor w/ 5 levels 1,2,3,4,..: 1 1 1 1 1 1 1 1 1 1 ...
 $ y: num  7.01 8.33 5.48 6.51 5.61 ...
 $ f: Factor w/ 3 levels lev1,lev2,..: 1 1 1 1 1 1 1 1 1 1 ...


 This way the coercion back to a factor takes place within the loop as
 needed.

 Note that I also meant to type sub() and not grep() below. The default
 behavior for both is to return a character vector (if 'value = TRUE' in
 grep()). There is not an argument to override that behavior.

 HTH,

 Marc


 On Thu, 2007-08-16 at 19:19 +0300, Lauri Nikkinen wrote:
 Thanks Marc! What would be the easiest way to coerce char-variables
 back to factor-variables? Is there a way to prevent the coercion in
 d[] - lapply(d, function(x) if ( is.factor(x)) sub( +$, , x) else
 x) ?



 -Lauri



 2007/8/16, Marc Schwartz [EMAIL PROTECTED]:
 On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
 Hi folks,

 I would like to trim the trailing spaces in my factor
 variables using lapply
 (described in this post by Marc Schwartz:
 http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html)
 but the code is
 not functioning (in this example there is only one factor
 with trailing
 spaces):

 Ayepas I noted in that post, it was untestedmy error.

 The problem is that by using ifelse() as I did, the test for
 the column
 being a factor returns a single result, not one result per
 element.
 Hence, the appropriate conditional code is only performed on
 the first
 element in each column, rather than being vectorized on the
 entire
 column.

 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20* 1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 x - gl(5,12)
 f - gl(3,20, labels=paste(lev, 1:3,, sep=))
 d - data.frame (x=x,y=y, f=f)
 str(d)

 d[] - lapply(d, function(x) ifelse(is.factor(x), sub( +$,
 , x), x))
 str(d)

 How should I modify this?

 Try this instead:

 d[] - lapply(d, function(x) if (is.factor(x)) sub( +$, ,
 x) else x)

 str(d)
 'data.frame':   60 obs. of  3 variables:
 $ x: chr  1 1 1 1 ...
 $ y: num  6.70 4.42 8.03 4.90 6.98 ...
 $ f: chr  lev1 lev1 lev1 lev1 ...

 Note that by using grep(), the factors are coerced to
 character vectors
 as expected. You would need to coerce back to factors if you
 need them
 as such.

 HTH,

 Marc Schwartz


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AIC and logLik for logistic regression in R and S-PLUS

2007-08-16 Thread Ben Bolker
Leandra Desousa sousa at ims.uaf.edu writes:

 I am using 'R' version 2.2.1 and 'S-PLUS' version 6.0; and I loaded the 
 MASS library in 'S-PLUS'.
 
 I am running a logistic regression using glm:
 
  summary(mydata.glm)
 Call:
 glm(formula = COMU ~ MeanPycUpT + MeanPycUpS, family = binomial,data = 
 mydata)

  [snip]

 Null deviance: 30.316  on 21  degrees of freedom
 Residual deviance: 23.900  on 19  degrees of freedom
 AIC: 29.9
 
 'R'
 -
   AIC(mydata.glm)
 [1] 29.89986
 
   logLik(mydata.glm)
 'log Lik.' -11.94993 (df=3)
 -
 
 'S-PLUS'
 -
   AIC(mydata.glm)
 [1] 71.03222
 
   logLik(mydata.glm)
 [1] -31.51611
 -
 

 1) Which AIC value is the correct one?
 2) Which log-likelihood value is the correct one?

   AIC and log-likelihood are often defined differently
in software packages -- specifically, additive constants
can be included or excluded as long as they are done consistently,
without affecting inferences from the model.  The absolute
values of AIC and logLik aren't that important; the only thing
that really matters are differences among models.  Have you
tried comparing models within R and within S-PLUS to establish
whether they give the same inferences (I would guess they do)?

 3) If  'extractAIC' in 'S-PLUS' and all values in 'R' are the correct 
 ones, and the 'AIC' and 'logLik' in 'S-PLUS' values are wrong then:
Why 'S-PLUS' cannot retrieve a log-likelihood value from my glm 
 object('mydata.glm'), even though it is using log-likelihood to 
 calculate its residual deviance?

   It's very hard for us to debug S-PLUS!  Some (most?) of
us on the list don't even have S-PLUS installed any more ...
Perhaps you should ask this question on an S-PLUS mailing list,
or of the (paid) S-PLUS technical support team ...
 
   And the obligatory R-list nags:

 * it would help if you upgraded your R version -- R 2.3.0
came out in April 2006, and we're now up to 2.5.1
  * it's the MASS package, not the MASS library

  cheers
Ben Bolker

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Urgent Help needed

2007-08-16 Thread Weiwei Shi
try this:

t0 = read.table(datatest.txt, header=T)
X.mean = ave(t0[,1], as.factor(t0[,3]))

you do the rest of Y.mean and make them into a data.fame or whatever.

HTH,

Weiwei

On 8/16/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote:
 Dear All:

 Urgent help is needed.


 I have a data set in matrix format  of three columns: X, Y and index of four 
 groups (1,2,3,4). What I need to do is the following;

 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from the
  corresponding data values of this group and create new columns say 
 X-sample mean
   and Y-sample mean? I tried to use the tapply but I have some 
 difficulties to restore the new data


 2- How I can use the tapply if possible or any other R-function to find the 
 correlation
  coefficient between the X and Y columns for each group indexed 1,2,3,4.? 
 Could not use the tapply.


 I attached part of the data as txt file.


 Thank you so much for your attention to this matter, and I look forward to 
 hear from you soon.

 Regards,

 Abou


 Data:
 
 x   y   index
 15807.2412.54
 15752.5133.54
 12893.7601.53
 8426.88 22.23
 5706.24 333 3
 3982.08 560 2
 3642.62 670 2
 295.68  124 1
 215.40  104 1
 195.40  204 1
 4240.21 22.42
 1222.72 45.92
 1142.26 23.62
 63.00   90.11
 1216.00 82.42
 2769.60 111 2
 1790.46 34.72
 26.10   26.10   1
 19676.830.994
 10920.60203 3
 6144.00 46  3
 4534.48 4534.48 3
 4.0065  4
 29500.0056  4
 17100.0077  4
 9000.00 435 3
 6300.00 84  3
 3962.88 334 2
 5690.00 653 3
 3736.00 233 2
 2750.00 22  2
 1316.00 345 2
 4595.00 4595.00 3
 5928.00 45  3
 2645.70 0.002
 2580.24 454 2
 6547.34 6547.34 3
 1615.68 5   2
 194.06  55  1
 184.80  6   1
 82.94   44  1
 16649.0056  4
 4500.00 74  3
 1600.00 744 2

 =



 ==
 AbouEl-Makarim Aboueissa, Ph.D.
 Assistant Professor of Statistics
 Department of Mathematics  Statistics
 University of Southern Maine
 96 Falmouth Street
 P.O. Box 9300
 Portland, ME 04104-9300

 Tel: (207) 228-8389
 Email: [EMAIL PROTECTED]
   [EMAIL PROTECTED]
 Office: 301C Payson Smith


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use AnnBuilderSourceUrls with local path insted of ftp adress

2007-08-16 Thread John Zhang


I want to build my own GO package using the function GOPkgBuilder of 
AnnBuilder. It uses AnnBuilderSourceUrls to collect data from different ftp 
sites. These data are not complete for my organism, so I would like to change 
the ftp adresses to a local one. The changing itself is working but when I run 
the script I get the following Error:

Error in loadFromUrl(file.path(sourceURLs[[EG]], gene2go.gz)) : 
 URL /path/to/file/gene2go.gz is incorrect or the target site is not 
responding!


First of all, please post questions about BioConductor packages in BioConductor 
mailing list instead of R.

Is /path/to/file a real local path? If so, you may also need to modify 
GOPkgBuilder a little. Try to change the line

eg = EG(srcUrl = sourceURLs[[EG]]) to
eg = EG(srcUrl = sourceURLs[[EG]], fromWeb = FALSE)



The full code of the script is this:

library(AnnBuilder)
list-options(AnnBuilderSourceUrls) 
list$AnnBuilderSourceUrls$EG-/path/to/file/
options(AnnBuilderSourceUrls=list$AnnBuilderSourceUrls)

GOPkgBuilder(pkgName = GO, pkgPath = ., filename = 
go_200706-termdb.obo-xml.gz, version = v.1.0, author = list(author = 
author, maintainer = maintainer [EMAIL PROTECTED]) )

Thanks in advance for any help.

Daniela Albrecht


-- 
Pt! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Urgent Help needed

2007-08-16 Thread Henrique Dallazuanna
For the 2nd item, perhaps:

by(df[,1:2], df$index, FUN=cor)

where df is your data.frame.

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote:

 Dear All:

 Urgent help is needed.


 I have a data set in matrix format  of three columns: X, Y and index of
 four groups (1,2,3,4). What I need to do is the following;

 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from
 the
  corresponding data values of this group and create new columns say
 X-sample mean
   and Y-sample mean? I tried to use the tapply but I have some
 difficulties to restore the new data


 2- How I can use the tapply if possible or any other R-function to find
 the correlation
  coefficient between the X and Y columns for each group indexed
 1,2,3,4.? Could not use the tapply.


 I attached part of the data as txt file.


 Thank you so much for your attention to this matter, and I look forward to
 hear from you soon.

 Regards,

 Abou


 Data:
 
 x   y   index
 15807.2412.54
 15752.5133.54
 12893.7601.53
 8426.88 22.23
 5706.24 333 3
 3982.08 560 2
 3642.62 670 2
 295.68  124 1
 215.40  104 1
 195.40  204 1
 4240.21 22.42
 1222.72 45.92
 1142.26 23.62
 63.00   90.11
 1216.00 82.42
 2769.60 111 2
 1790.46 34.72
 26.10   26.10   1
 19676.830.994
 10920.60203 3
 6144.00 46  3
 4534.48 4534.48 3
 4.0065  4
 29500.0056  4
 17100.0077  4
 9000.00 435 3
 6300.00 84  3
 3962.88 334 2
 5690.00 653 3
 3736.00 233 2
 2750.00 22  2
 1316.00 345 2
 4595.00 4595.00 3
 5928.00 45  3
 2645.70 0.002
 2580.24 454 2
 6547.34 6547.34 3
 1615.68 5   2
 194.06  55  1
 184.80  6   1
 82.94   44  1
 16649.0056  4
 4500.00 74  3
 1600.00 744 2

 =



 ==
 AbouEl-Makarim Aboueissa, Ph.D.
 Assistant Professor of Statistics
 Department of Mathematics  Statistics
 University of Southern Maine
 96 Falmouth Street
 P.O. Box 9300
 Portland, ME 04104-9300

 Tel: (207) 228-8389
 Email: [EMAIL PROTECTED]
   [EMAIL PROTECTED]
 Office: 301C Payson Smith


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Urgent Help needed

2007-08-16 Thread Marc Schwartz
On Thu, 2007-08-16 at 12:33 -0400, AbouEl-Makarim Aboueissa wrote:
 Dear All:
 
 Urgent help is needed.
 
 
 I have a data set in matrix format  of three columns: X, Y and index
 of four groups (1,2,3,4). What I need to do is the following;
 
 1- How I can subtract the sample mean of each group indexed 1,2,3,4
 from the 
  corresponding data values of this group and create new columns
 say X-sample mean 
   and Y-sample mean? I tried to use the tapply but I have some
 difficulties to restore the new data
 
 
 2- How I can use the “tapply” if possible or any other R-function to
 find the correlation 
  coefficient between the X and Y columns for each group indexed
 1,2,3,4.? Could not use the tapply.
 
 
 I attached part of the data as txt file.
 
 
 Thank you so much for your attention to this matter, and I look
 forward to hear from you soon.
 
 Regards,
 
 Abou
 
 
 Data:
 
 x y   index
 15807.24  12.54
 15752.51  33.54
 12893.76  01.53
 8426.88   22.23
 5706.24   333 3
 3982.08   560 2
 3642.62   670 2
 295.68124 1
 215.40104 1
 195.40204 1
 4240.21   22.42
 1222.72   45.92
 1142.26   23.62
 63.00 90.11
 1216.00   82.42
 2769.60   111 2
 1790.46   34.72
 26.10 26.10   1
 19676.83  0.994
 10920.60  203 3
 6144.00   46  3
 4534.48   4534.48 3
 4.00  65  4
 29500.00  56  4
 17100.00  77  4
 9000.00   435 3
 6300.00   84  3
 3962.88   334 2
 5690.00   653 3
 3736.00   233 2
 2750.00   22  2
 1316.00   345 2
 4595.00   4595.00 3
 5928.00   45  3
 2645.70   0.002
 2580.24   454 2
 6547.34   6547.34 3
 1615.68   5   2
 194.0655  1
 184.806   1
 82.94 44  1
 16649.00  56  4
 4500.00   74  3
 1600.00   744 2
 
 =


I might be tempted to take the following approach:

If your data is a matrix, coerce it to a data frame first. Let's call
that 'DF'.

 str(DF)
'data.frame':   44 obs. of  3 variables:
 $ x: num  15807 15753 12894  8427  5706 ...
 $ y: num  12.5 33.5 1.5 22.2 333 560 670 124 104 204 ...
 $ index: int  4 4 3 3 3 2 2 1 1 1 ...


Now use split() to break up the data frame into a list of 4
sub-dataframes, based upon the index value.  We can use scale() within a
lapply() loop to center the 'x' and 'y' columns for each sub-dataframe:


DF.ctr - lapply(split(DF[, -3], DF$index), scale, scale = FALSE)


 str(DF.ctr)
List of 4
 $ 1: num [1:8, 1:2]  138.5   58.2   38.2  -94.2 -131.1 ...
  ..- attr(*, dimnames)=List of 2
  .. ..$ : chr [1:8] 8 9 10 14 ...
  .. ..$ : chr [1:2] x y
  ..- attr(*, scaled:center)= Named num [1:2] 157.2  81.7
  .. ..- attr(*, names)= chr [1:2] x y
 $ 2: num [1:16, 1:2]  1469  1129  1727 -1291 -1371 ...
  ..- attr(*, dimnames)=List of 2
  .. ..$ : chr [1:16] 6 7 11 12 ...
  .. ..$ : chr [1:2] x y
  ..- attr(*, scaled:center)= Named num [1:2] 2513  230
  .. ..- attr(*, names)= chr [1:2] x y
 $ 3: num [1:13, 1:2]  5879  1413 -1308  3906  -870 ...
  ..- attr(*, dimnames)=List of 2
  .. ..$ : chr [1:13] 3 4 5 20 ...
  .. ..$ : chr [1:2] x y
  ..- attr(*, scaled:center)= Named num [1:2] 7014 1352
  .. ..- attr(*, names)= chr [1:2] x y
 $ 4: num [1:7, 1:2] -6262 -6317 -2393 17931  7431 ...
  ..- attr(*, dimnames)=List of 2
  .. ..$ : chr [1:7] 1 2 19 23 ...
  .. ..$ : chr [1:2] x y
  ..- attr(*, scaled:center)= Named num [1:2] 2206943
  .. ..- attr(*, names)= chr [1:2] x y


Now, create a new single DF comprised of the sub-dataframes from DF.ctr:

DF.new - do.call(rbind, DF.ctr)


Define colnames:

colnames(DF.new) - c(x-mean, y-mean)


 str(DF.new)
 num [1:44, 1:2]  138.5   58.2   38.2  -94.2 -131.1 ...
 - attr(*, dimnames)=List of 2
  ..$ : chr [1:44] 8 9 10 14 ...
  ..$ : chr [1:2] x-mean y-mean


Now, use merge() to join DF and DF.new by the rownames:

DF.final - merge(DF, DF.new, by = row.names)

 DF.final
   Row.namesx   y index  x-mean   y-mean
1  1 15807.24   12.50 4 -6262.12857   -30.498571
2 10   195.40  204.00 138.22750   122.35
3 11  4240.21   22.40 2  1726.93188  -208.037500
4 12  1222.72   45.90 2 -1290.55812  -184.537500
5 13  1142.26   23.60 2 -1371.01812  -206.837500
6 1463.00   90.10 1   -94.17250 8.45
7 15  1216.00   82.40 2 -1297.27812  -148.037500
8 16  2769.60  111.00 2   256.32188  -119.437500
9 17  1790.46   34.70 2  -722.81812  -195.737500
101826.10   26.10 1  -131.07250   -55.55
1119 19676.830.99 4 -2392.53857   -42.008571
12 2 15752.51   33.50 4 -6316.85857-9.498571
1320 10920.60  

Re: [R] Possible memory leak with R v.2.5.0

2007-08-16 Thread Peter Waltman
Hi Martin  -

Thanks for the feedback.  Right after I sent the email to the list last 
night, I realized that I'd forgotten to clear the all the vars we 
attach() to the environment before rm()'ing everything.  Whoops.

However, I found that while doing this *does* reduce the memory stamp by 
about a gig (down to 2.6 from 3.4), subsequent calls to gc() still can't 
reduce the memory stamp any further.

Also, I probably should have added some add'l explanation of our code.  
I'm working on is legacy code that doesn't implement packages per se (or 
per the definition that R uses, at least).  Instead, they are lists() of 
functions, i.e.

try ( detach( gain-funcs ), silent=T )
gain-funcs = list(
func1 = function() {
...
},
func2 = function(){
...
   }
)
attach( gain-funcs )

In this framework, we can source a given .R file and have access to 
these functions without cluttering up the global namespace.

Additionally, the 'ratios' var contains the expression matrix, and is 
actually an elt of another variable called 'global.data' that is also 
attach()'d.  Similar to what you suggested, I pulled out the get.global( 
'ratios' ) parameter from the function (since it's already available 
globally), and found that that had no effect on reducing the memory 
stamp.  Frustrating.

Likewise, after checking, I discovered that the

cluster - cluster

command was just for debugging purposes, so I've since commented it out 
in both your version, as well as mine.

Additionally, there is no need to pass the cluster into the argument 
since we keep it in a stack (implemented as a list) that's available in 
the global namespace, so I re-wrote your version (and mine) to take the 
index of the cluster, rather than pass the cluster itself.

Running your stripped down version of the get.vars.for.cluster() fn on 5 
clusters caused R's memory stamp to jump up to as high as 5.7 gig, 
ending with a final stamp of 4.9 gig.  Detach()'ing all the vars we add 
and the gc()'ing allowed it to drop to 3.4 gig.  rm()'ing the var that 
stored the results of these 5 get.vars.for.cluster() calls and then 
gc()'ing did not further reduce the memory stamp.  rm()'ing everything 
from the global namespace and gc()'in dropped this further to 3.1 gig, 
but no further.

Adding lines to remove the r and r.all vars and gc() w/in the 
get.vars.for.cluster() function reduced the running memory footprint to 
a range of 3.5-4.4 gig, and detach()'ing, rm()'ing and gc()'ing dropped 
it down to 2.8.

The odd thing is that calling gc() at this point reports that R is using 
far less memory than 'top' reports, for example, from one of my tests:

  gc()
   used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells   380537  20.46366476  340.1   380701  20.4
Vcells 88715615 676.9  277437132 2116.7 88715745 676.9

versus top:

PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
1199 waltman   17   0 3844m 3.7g 3216 S  0.0 23.6  29:58.74 R

Given this behavior, I can only assume that there's a memory leak in R, 
at least for v.2.5.0.  I'll try to get a version of v.2.4.x installed 
somewhere to see if I see a similar behavior with it.

Peter

Martin Morgan wrote:
 Hi Peter --

 Here's my guess.

 Ironically, adding things to broken code reduces the signal to noise
 ratio. I ended up with

 get.vars.for.cluster = function(
   cluster,
   genes=get.global(gene.ids ),
   ratios=get.global(ratios))
 {
 cluster - cluster
 rows - cluster$rows
 cols - cluster$cols

 r - ratios[ rows, cols ]
 avg.rows - apply( r, 2, mean, na.rm=TRUE )
 r.all - ratios[ genes, cols ]
 devs - apply( r.all, 1, -, avg.rows )

 apply( devs, 2, var, na.rm=TRUE )
 }

 at what might reproduce your problem (though can't be sure!). The
 unusual bit is

 cluster - cluster

 At first I thought this would be a no-op (assigning cluster to
 itself), but apparently at this point in the code cluster does not
 exist in the environment of the function (just in the call) so cluster
 gets assigned outside the function.

 So then my guess is that get.vars.for.cluster is part of a package,
 and the package has a variable called cluster. get.vars.for.cluster
 then assigns its first argument to the package variable cluster (which
 is the first variable called cluster that -
 encounters). rm(list=ls(all=TRUE)) removes everything from the global
 environment, but (fortunately!) not from the package environment.

 You might end up storing more than 'just' cluster, depending on what
 it is.

 So I think the solution is to rethink the use of - (and also the
 get.global(), which are either for convenience (in which case it would
 probably be better to specify a default for the function argument) or
 out of a sense that copying is bad (but this is probably mistaken,
 since R's semantics are 'copy on change', so passing a 'big' object
 into a function does not usually trigger 

Re: [R] Urgent Help needed

2007-08-16 Thread AbouEl-Makarim Aboueissa
Thanks all.

it works.

Just one more thing: if you look to this out put,

 by(data1[,2:3], data1[,4], cor)[1]
$`1`
XY
X   1.0000.4400451
Y   0.4400451  1.000

Q. How I can just pick the value of the correlation 0.4400451 from this output 
and call it sat corxy.


Once again thank you all so much for your helps.


Abou


==
AbouEl-Makarim Aboueissa, Ph.D.
Assistant Professor of Statistics
Department of Mathematics  Statistics
University of Southern Maine
96 Falmouth Street
P.O. Box 9300
Portland, ME 04104-9300

Tel: (207) 228-8389
Email: [EMAIL PROTECTED]
  [EMAIL PROTECTED]
Office: 301C Payson Smith

 Henrique Dallazuanna [EMAIL PROTECTED] 8/16/2007 2:05 PM 
For the 2nd item, perhaps:

by(df[,1:2], df$index, FUN=cor)

where df is your data.frame.

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote:

 Dear All:

 Urgent help is needed.


 I have a data set in matrix format  of three columns: X, Y and index of
 four groups (1,2,3,4). What I need to do is the following;

 1- How I can subtract the sample mean of each group indexed 1,2,3,4 from
 the
  corresponding data values of this group and create new columns say
 X-sample mean
   and Y-sample mean? I tried to use the tapply but I have some
 difficulties to restore the new data


 2- How I can use the tapply if possible or any other R-function to find
 the correlation
  coefficient between the X and Y columns for each group indexed
 1,2,3,4.? Could not use the tapply.


 I attached part of the data as txt file.


 Thank you so much for your attention to this matter, and I look forward to
 hear from you soon.

 Regards,

 Abou


 Data:
 
 x   y   index
 15807.2412.54
 15752.5133.54
 12893.7601.53
 8426.88 22.23
 5706.24 333 3
 3982.08 560 2
 3642.62 670 2
 295.68  124 1
 215.40  104 1
 195.40  204 1
 4240.21 22.42
 1222.72 45.92
 1142.26 23.62
 63.00   90.11
 1216.00 82.42
 2769.60 111 2
 1790.46 34.72
 26.10   26.10   1
 19676.830.994
 10920.60203 3
 6144.00 46  3
 4534.48 4534.48 3
 4.0065  4
 29500.0056  4
 17100.0077  4
 9000.00 435 3
 6300.00 84  3
 3962.88 334 2
 5690.00 653 3
 3736.00 233 2
 2750.00 22  2
 1316.00 345 2
 4595.00 4595.00 3
 5928.00 45  3
 2645.70 0.002
 2580.24 454 2
 6547.34 6547.34 3
 1615.68 5   2
 194.06  55  1
 184.80  6   1
 82.94   44  1
 16649.0056  4
 4500.00 74  3
 1600.00 744 2

 =



 ==
 AbouEl-Makarim Aboueissa, Ph.D.
 Assistant Professor of Statistics
 Department of Mathematics  Statistics
 University of Southern Maine
 96 Falmouth Street
 P.O. Box 9300
 Portland, ME 04104-9300

 Tel: (207) 228-8389
 Email: [EMAIL PROTECTED] 
   [EMAIL PROTECTED] 
 Office: 301C Payson Smith


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html 
 and provide commented, minimal, self-contained, reproducible code.




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with optimization using GENOUD

2007-08-16 Thread Anup Nandialath
Dear Friends,

I have been trying to learn how to use the derivative free optimization 
algorithms implemented in the package RGENOUD by Mebane and Sekhon. However, it 
does not seem to work for reasons best described as my total ignorance. If 
anybody has experience using this package, it would be really helpful if you 
can point out where I'm making a mistake. 

Thanks in advance

Anup

Sample code attached

library(rgenoud)

nobs - 5000
t.beta - c(0,1,-1)
X - as.matrix(cbind(rep(1, nobs), runif(nobs), runif(nobs))) # Creating the 
design matrix
prodterm - (X%*%t.beta)+rnorm(nrow(X))
Y - as.matrix(ifelse(prodterm0, 0, 1))

# Defining the likelihood function

log.like - function(beta, Y, X)
{
term1 - pnorm(X%*%beta)
term2 - 1-term1
loglik - (sum(Y*log(term1))+sum((1-Y)*log(term2))) # Likelihood function to be 
maximized
}

stval - c(0,0,0)
opt.output - optim(stval,log.like,Y=Y[,1], X=X[,1:3],
hessian=T, method=BFGS, control=c(fnscale=-1,trace=1))
opt.output

### Now using GENOUD gives me errors

genoud.output - genoud(log.like,beta=stval,X=X[,1:3], Y=Y[,1], nvars=3, 
pop.size=3000, max=TRUE)


   
-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Urgent Help needed

2007-08-16 Thread Henrique Dallazuanna
Hi, try this:

by(df[,1:2], df$index, FUN=function(x)cor(x[1],x[2]))

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote:

 Thanks all.

 it works.

 Just one more thing: if you look to this out put,

  by(data1[,2:3], data1[,4], cor)[1]
 $`1`
 XY
 X   1.0000.4400451
 Y   0.4400451  1.000

 Q. How I can just pick the value of the correlation 0.4400451 from this
 output and call it sat corxy.


 Once again thank you all so much for your helps.


 Abou


 ==
 AbouEl-Makarim Aboueissa, Ph.D.
 Assistant Professor of Statistics
 Department of Mathematics  Statistics
 University of Southern Maine
 96 Falmouth Street
 P.O. Box 9300
 Portland, ME 04104-9300

 Tel: (207) 228-8389
 Email: [EMAIL PROTECTED]
   [EMAIL PROTECTED]
 Office: 301C Payson Smith

  Henrique Dallazuanna [EMAIL PROTECTED] 8/16/2007 2:05 PM 
 For the 2nd item, perhaps:

 by(df[,1:2], df$index, FUN=cor)

 where df is your data.frame.

 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O

 On 16/08/07, AbouEl-Makarim Aboueissa [EMAIL PROTECTED] wrote:
 
  Dear All:
 
  Urgent help is needed.
 
 
  I have a data set in matrix format  of three columns: X, Y and index of
  four groups (1,2,3,4). What I need to do is the following;
 
  1- How I can subtract the sample mean of each group indexed 1,2,3,4 from
  the
   corresponding data values of this group and create new columns say
  X-sample mean
and Y-sample mean? I tried to use the tapply but I have some
  difficulties to restore the new data
 
 
  2- How I can use the tapply if possible or any other R-function to
 find
  the correlation
   coefficient between the X and Y columns for each group indexed
  1,2,3,4.? Could not use the tapply.
 
 
  I attached part of the data as txt file.
 
 
  Thank you so much for your attention to this matter, and I look forward
 to
  hear from you soon.
 
  Regards,
 
  Abou
 
 
  Data:
  
  x   y   index
  15807.2412.54
  15752.5133.54
  12893.7601.53
  8426.88 22.23
  5706.24 333 3
  3982.08 560 2
  3642.62 670 2
  295.68  124 1
  215.40  104 1
  195.40  204 1
  4240.21 22.42
  1222.72 45.92
  1142.26 23.62
  63.00   90.11
  1216.00 82.42
  2769.60 111 2
  1790.46 34.72
  26.10   26.10   1
  19676.830.994
  10920.60203 3
  6144.00 46  3
  4534.48 4534.48 3
  4.0065  4
  29500.0056  4
  17100.0077  4
  9000.00 435 3
  6300.00 84  3
  3962.88 334 2
  5690.00 653 3
  3736.00 233 2
  2750.00 22  2
  1316.00 345 2
  4595.00 4595.00 3
  5928.00 45  3
  2645.70 0.002
  2580.24 454 2
  6547.34 6547.34 3
  1615.68 5   2
  194.06  55  1
  184.80  6   1
  82.94   44  1
  16649.0056  4
  4500.00 74  3
  1600.00 744 2
 
  =
 
 
 
  ==
  AbouEl-Makarim Aboueissa, Ph.D.
  Assistant Professor of Statistics
  Department of Mathematics  Statistics
  University of Southern Maine
  96 Falmouth Street
  P.O. Box 9300
  Portland, ME 04104-9300
 
  Tel: (207) 228-8389
  Email: [EMAIL PROTECTED]
[EMAIL PROTECTED]
  Office: 301C Payson Smith
 
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Linear models over large datasets

2007-08-16 Thread Alp ATICI
I'd like to fit linear models on very large datasets. My data frames
are about 200 rows x 200 columns of doubles and I am using an 64
bit build of R. I've googled about this extensively and went over the
R Data Import/Export guide. My primary issue is although my data
represented in ascii form is 4Gb in size (therefore much smaller
considered in binary), R consumes about 12Gb of virtual memory.

What exactly are my options to improve this? I looked into the biglm
package but the problem with it is it uses update() function and is
therefore not transparent (I am using a sophisticated script which is
hard to modify). I really liked the concept behind the  LM package
here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html
But it is no longer available. How could one fit linear models to very
large datasets without loading the entire set into memory but from a
file/database (possibly through a connection) using a relatively
simple modification of standard lm()? Alternatively how could one
improve the memory usage of R given a large dataset (by changing some
default parameters of R or even using on-the-fly compression)? I don't
mind much higher levels of CPU time required.

Thank you in advance for your help.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] combining P values using Fisher's method

2007-08-16 Thread Jiong Zhang, PhD
Hi All,

Can somebody tell me how to use R to combine p values using Fisher's method? 
thanks.

Jiong


 The email message (and any attachments) is for the sole use of the intended 
recipient(s) and may contain confidential information.  Any unauthorized 
review, use, disclosure or distribution is prohibited.  If you are not the 
intended recipient, please contact the sender by reply email and destroy all 
copies of the original message (and any attachments).  Thank You.



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2007-08-16 Thread mara serrano
hi,
i'm new to R and i'm trying to port a quattro pro spreadsheet into R. 
spreadsheets have optional lower and upper limit parameters on the beta 
distribution function. i would like to know how to incorporate this with R's 
pbeta function.

thanks in advance,
mara.




  

Park yourself in front of a world of choices in alternative vehicles. Visit

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error message when using zero-inflated count regression model in package zicounts

2007-08-16 Thread James Milks
No, zeroinfl() in pscl did not give me any errors when I ran the  
model.  So there may be a bug in zicounts.  Only problem now is how  
to interpret the zeroinfl model.  Am I correct in my understanding  
that zeroinfl runs both the poisson and binomial models without  
interactions?  I'm thinking along those lines since no interaction  
terms appear in either model.  If so, how do I check for any  
interactions?

Thanks.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435


On Aug 16, 2007, at 11:19 AM, Achim Zeileis wrote:

 On Thu, 16 Aug 2007, James R. Milks wrote:

 Dr. Stevens,

 I've double-checked my variable lengths.  All of my variables
 (Total.vines, Site, Species, and DBH) came in at 549.  I did correct
 one problem in the data entry that had escaped my previous notice:
 somehow the undergrad who entered all the data managed to make the
 Acer negundo data split into two separate categories while still
 appearing to use the same ACNE abbreviation.  When I made that
 correction and re-ran zicounts, R gave me the following error  
 messages:

 Hmm, I don't know about the error messages in zicounts, but you  
 could try to use the zeroinfl() implementation in package pscl:
   vines.zip - zeroinfl(Total.vines ~ Site + Species + DBH | Site +
 Species + DBH, data = sycamores.1)
 and see whether this produces a similar error.
 Z

  vines.zip-zicounts(resp=Total.vines~.,x=~Site+Species+DBH,z=~Site
 +Species+DBH,distname=ZIP,data=sycamores.1)

 Error in ifelse(y == 0, 1, y/mu) : dim-: dims [product 12] do not
 match the length of object [549]
 In addition: Warning messages:
 1: longer object length is not a multiple of shorter object length
 in: eta + offset
 2: longer object length is not a multiple of shorter object length
 in: y/mu

 In addition, zicounts would not run a normal poisson regression on
 the data, giving me the same error messages as the ZIP regression.
 Doing a poisson regression with glm did not show any error messages.
 However, the glm model with full interactions was still over- 
 dispersed.

 Could the zicounts problem be that the individual sites and species
 had different population sizes?  For instance, Site A had 149 trees,
 site B had 55 trees, site C had 270 trees, and site D had 75 trees.
 The species had similar discrepancies in population sizes, with
 Platanus occidentalis and Acer negundo forming the majority of the
 trees.

 Thanks for your help.

 Jim Milks

 Graduate Student
 Environmental Sciences Ph.D. Program
 136 Biological Sciences
 Wright State University
 3640 Colonel Glenn Hwy
 Dayton, OH 45435



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models over large datasets

2007-08-16 Thread Greg Snow
Here are a couple of options that you could look at:

The biglm package also has the bigglm function which you only call once
(no update), you just need to give it a function that reads the data in
chunks for you.  Using bigglm with a gaussian family is equivalent to
lm.

You could also write your own function that calls biglm and the
necessary updates on it, then just call your function.

The SQLiteDF package has an sdflm function that uses the same internal
code as biglm, but based on having the data stored in an sqlite
database.  You don't need to call update with this function.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Alp ATICI
 Sent: Thursday, August 16, 2007 2:24 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Linear models over large datasets
 
 I'd like to fit linear models on very large datasets. My data 
 frames are about 200 rows x 200 columns of doubles and I 
 am using an 64 bit build of R. I've googled about this 
 extensively and went over the R Data Import/Export guide. 
 My primary issue is although my data represented in ascii 
 form is 4Gb in size (therefore much smaller considered in 
 binary), R consumes about 12Gb of virtual memory.
 
 What exactly are my options to improve this? I looked into 
 the biglm package but the problem with it is it uses update() 
 function and is therefore not transparent (I am using a 
 sophisticated script which is hard to modify). I really liked 
 the concept behind the  LM package
 here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html
 But it is no longer available. How could one fit linear 
 models to very large datasets without loading the entire set 
 into memory but from a file/database (possibly through a 
 connection) using a relatively simple modification of 
 standard lm()? Alternatively how could one improve the memory 
 usage of R given a large dataset (by changing some default 
 parameters of R or even using on-the-fly compression)? I 
 don't mind much higher levels of CPU time required.
 
 Thank you in advance for your help.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models over large datasets

2007-08-16 Thread Daniel Lakeland
On Thu, Aug 16, 2007 at 03:24:08PM -0500, Alp ATICI wrote:
 I'd like to fit linear models on very large datasets. My data frames
 are about 200 rows x 200 columns of doubles and I am using an 64
 bit build of R. I've googled about this extensively and went over the
 R Data Import/Export guide. My primary issue is although my data
 represented in ascii form is 4Gb in size (therefore much smaller
 considered in binary), R consumes about 12Gb of virtual memory.

One option is to simply buy more memory, which might work for you in
this case, but in larger cases, is not scalable.

I'm not sure how to make R happier with handling large datasets, but
you may be able to use the power of random sampling to help you?

Read the data from mysql, selecting a random 10% subset. This should
use 1.2 Gb or so. You then fit the model to this subset. Repeat the
procedure 100 times using independent samples. Now you have
bootstrapped the coefficients of your model. Use the average value and
standard deviation of the coefficients as your coefficient estimates
and standard errors??

Since swapping is typically 1000 times slower or more than disk
access, this process might take 1/10 of the time or less compared to
letting the R process thrash its disk.

It's a thought, not sure how well it works.

-- 
Daniel Lakeland
[EMAIL PROTECTED]
http://www.street-artists.org/~dlakelan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] combining P values using Fisher's method

2007-08-16 Thread Jonathan Baron
On 08/16/07 12:50, Jiong Zhang, PhD wrote:
 Hi All,
 
 Can somebody tell me how to use R to combine p values using Fisher's method? 
 thanks.

This might get you started.  It is for the sole purpose of combining
two two-tailed p-values for effects in the same direction.  Thus, it
is not very general.  I think it gives a one-tailed result.  Better
check it.

combine - function(x,y) return(pchisq(-2*log(x/2)-2*log(y/2),4,low=F))

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ADF test

2007-08-16 Thread John C Frain
The option alternative in adf.test() takes the value 'stationary' or
'explosive'.  The value 'explosive' is used to test if the series is
stationary about a linear time trend.  This means that a constant and
trend are to be included in the DF or ADF test regression.  In the
case here the series is trended.  The question is if the trend is
stochastic or deterministic.  Have a look at the following analysis

t=1:length(x)
plot(t,x)
trend = lm(x~t)
abline(lm(x~t))
summary(trend)
library(urca)
x = ts(x, start=1, end = length(x), frequency=1)
x.ct = ur.df(x,lags=0,type='trend')
plot(x.ct)
library(tseries)
adf.test(x,alternative = explosive , k=0)
summary(ur.df(x,lags=0,type='trend')

which analyses your data using two different libraries. (x is your
data and both procs. produce the same DF test).  I should remark that
your data are rounded and this possibly acts against a full analysis.
Some knowledge of the data generating process might suggest a more
appropriate way of testing for stationarity.

On 16/08/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 Hi Megh

 i hope you have confused with 'what is my NULL hypothesis' ?
 i suggest you to take any ideal dataset about which you know that whether
 it is stationary or not ? apply the test to know what is the NULL
 hypothesis
 used in any software :)
 usually in many softwares the NULL hypothesis is in negative sense. Please
 everybody comment on this :)

 hoping that you series is return series and not price series :). Thus
 applying adf test for your series :)
 my test show that your series is not stationary at all as my correlalogram
 comes as follows.
 1
 0.998283718
 0.997582959
 0.99703921
 0.99665648
 0.996548006
 0.99647617
 0.995925698
 0.995317271
 0.994746317
 0.994727781
 0.99508777
 0.99501576
 0.99437404
 0.993338292
 0.992684933
 0.992310313

 @@@ m
 Although if i assume that your series is a price series and defining
 return = 100*ln(pt/pt-1). Returns become as follows
 0
 -0.201816416
 0.201816416
 0
 0.201409937
 0
 0
 0
 0
 0.201005093
 0
 0
 0
 0
 0.200601873
 0
 0.200200267
 0.199800266
 0
 -0.199800266
 0.199800266
 0
 0
 0
 0.199401861
 -0.199401861
 0.199401861
 0
 0.199005041
 0
 0
 0
 0
 0.198609797
 0
 0
 0
 0
 0
 0.19821612
 0
 0.197824001
 0
 0
 0.19743343
 0
 0
 0
 0.197044399
 -0.197044399
 0.197044399
 0
 0.196656897
 0.196270917
 0
 -0.196270917
 0.196270917
 0
 0
 0
 0
 0
 0
 0
 0.195886449
 0
 0
 0
 0
 0
 0
 0.195503484
 0
 0
 0
 0.195122013
 0.194742028
 0
 0
 0
 0
 0.194363521
 0
 0
 0
 -0.194363521
 0.194363521
 0
 0
 0.193986482
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0.193610903
 0
 0
 0
 0
 0.193236775
 0
 0
 0
 0
 0.192864091
 0
 0.192492841
 0
 0
 0
 0
 0.192123018
 0
 0
 0
 0
 0
 0.191754613
 0
 0.191387618
 0
 0
 0
 0.191022026
 0
 0
 0.190657827
 -0.190657827
 0
 0.190657827
 0.190295015
 0
 0
 0
 0
 0
 0
 0.18993358
 0
 -0.18993358
 0.18993358
 0
 0.189573516
 0.189214815
 0
 0
 0
 0.188857469
 0
 0
 0.18850147
 0
 0
 0.18814681
 0
 0.187793482
 0
 then the value of autocorrelations i.e. correlalogram comes as approx
 1
 0.089252308
 0.058227292
 0.017934984
 0.025264591
 -0.014925678
 -0.004668544
 0.014890995
 0.001625333
 0.010669589
 -0.010587179
 -0.03000206
 -0.011863654
 0.00772247
 0.024272208
 -0.019521244
 -0.035998575
 -0.061608877
 -0.048401231
 -0.008594859

 which show that the values are quite likely to make series stationary :)

  data[1:10,]
 V1 V2
 1  4.96  0.000
 2  4.95 -0.2018164
 3  4.96  0.2018164
 4  4.96  0.000
 5  4.97  0.2014099
 6  4.97  0.000
 7  4.97  0.000
 8  4.97  0.000
 9  4.97  0.000
 10 4.98  0.2010051
  adf.test(data[,1])

Augmented Dickey-Fuller Test

 data:  data[, 1]
 Dickey-Fuller = -1.1052, Lag order = 5, p-value = 0.9188
 alternative hypothesis: stationary

  adf.test(data[,2])

Augmented Dickey-Fuller Test

 data:  data[, 2]
 Dickey-Fuller = -6.2265, Lag order = 5, p-value = 0.01
 alternative hypothesis: stationary

 Warning message:
 p-value smaller than printed p-value in: adf.test(data[, 2])
 

 this explains everything clearly :)
 your NULL hypothesis is Series is not stationary - hence hypothesis in
 negative sense

 prooved by taking ideal data

  data1-rnorm(1) #normal data
  adf.test(data1)

Augmented Dickey-Fuller Test

 data:  data1
 Dickey-Fuller = -21.2118, Lag order = 21, p-value = 0.01
 alternative hypothesis: stationary

 Warning message:
 p-value smaller than printed p-value in: adf.test(data1)
 

 HTH




 Megh Dal [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 08/16/2007 04:27 PM

 To
 r-help@stat.math.ethz.ch
 cc

 Subject
 [R] ADF test






 Hi all,

  Hope you people do not feel irritated for repeatedly sending mail on
 Time series.

  Here I got another problem on the same, and hope I would get some answer
 from you.

  I have following dataset:

  data[,1]
  [1] 4.96 4.95 4.96 4.96 4.97 4.97 4.97 4.97 4.97 4.98 4.98 4.98 4.98
 4.98 4.99 4.99 5.00 5.01
  [19] 5.01 5.00 5.01 5.01 5.01 

Re: [R] Linear models over large datasets

2007-08-16 Thread Gabor Grothendieck
Its actually only a few lines of code to do this from first principles.
The coefficients depend only on the cross products X'X and X'y and you
can build them up easily by extending this example to read files or
a database holding x and y instead of getting them from the args.
Here we process incr rows of builtin matrix state.x77 at a time
building up the two cross productxts, xtx and xty, regressing
Income (variable 2) on the other variables:

mylm - function(x, y, incr = 25) {
start - xtx - xty - 0
while(start  nrow(x)) {
idx - seq(start + 1, min(start + incr, nrow(x)))
x1 - cbind(1, x[idx,])
xtx - xtx + crossprod(x1)
xty - xty + crossprod(x1, y[idx])
start - start + incr
}
solve(xtx, xty)
}

mylm(state.x77[,-2], state.x77[,2])


On 8/16/07, Alp ATICI [EMAIL PROTECTED] wrote:
 I'd like to fit linear models on very large datasets. My data frames
 are about 200 rows x 200 columns of doubles and I am using an 64
 bit build of R. I've googled about this extensively and went over the
 R Data Import/Export guide. My primary issue is although my data
 represented in ascii form is 4Gb in size (therefore much smaller
 considered in binary), R consumes about 12Gb of virtual memory.

 What exactly are my options to improve this? I looked into the biglm
 package but the problem with it is it uses update() function and is
 therefore not transparent (I am using a sophisticated script which is
 hard to modify). I really liked the concept behind the  LM package
 here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html
 But it is no longer available. How could one fit linear models to very
 large datasets without loading the entire set into memory but from a
 file/database (possibly through a connection) using a relatively
 simple modification of standard lm()? Alternatively how could one
 improve the memory usage of R given a large dataset (by changing some
 default parameters of R or even using on-the-fly compression)? I don't
 mind much higher levels of CPU time required.

 Thank you in advance for your help.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question on output of vectors from a for loop into a matrix

2007-08-16 Thread Ryan Briscoe Runquist

Hello R-help,

I am a recent convert to R from SAS and I am having some difficulty with
output of a for loop into a matrix.  I have searched the help manuals and
the archives, but I can't get my code to work.  It is probably a syntax
error that I am not spotting.  I am trying to make a distance matrix by
extracting the distances from each point to all others in a vector (the for
loop).  I can get them all to print and can save each individually to a
file, but I cannot get them to be bound into a matrix.

Here is the code that I have tried:

m-length(Day.1.flower.coords$x) #31 grid points

output.matix-matrix(0.0,m,m)
for(i in 1:m){
dist-spDistsN1(Day.1.coords.matrix, Day.1.coords.matrix[i,])
dist-as.vector(dist)
output.matrix[i,]-dist
print(dist)} 

The error message that I get is:

Error in output.matrix[i,] - dist : incorrect number of subscripts on matrix

Thanks for your help.

Ryan

~~
Ryan D. Briscoe Runquist
Population Biology Graduate Group
University of California, Davis
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sample size computation for survival data

2007-08-16 Thread Stacey Burrows
Does a package exist to help compute sample size for censored data? I am 
planning a study involving interval censored data.

I know that there exists functions in stata for this, so I was wondering if R 
has similar facilities. So far I have not been able to find anything for R.

Thanks for any help.

Stacey

   
-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on output of vectors from a for loop into a matrix

2007-08-16 Thread Dylan Beaudette
On Thursday 16 August 2007 15:35, Ryan Briscoe Runquist wrote:
 Hello R-help,

 I am a recent convert to R from SAS and I am having some difficulty with
 output of a for loop into a matrix.  I have searched the help manuals and
 the archives, but I can't get my code to work.  It is probably a syntax
 error that I am not spotting.  I am trying to make a distance matrix by
 extracting the distances from each point to all others in a vector (the for
 loop).  I can get them all to print and can save each individually to a
 file, but I cannot get them to be bound into a matrix.

 Here is the code that I have tried:

 m-length(Day.1.flower.coords$x) #31 grid points

 output.matix-matrix(0.0,m,m)
 for(i in 1:m){
   dist-spDistsN1(Day.1.coords.matrix, Day.1.coords.matrix[i,])
   dist-as.vector(dist)
   output.matrix[i,]-dist
   print(dist)}

 The error message that I get is:

 Error in output.matrix[i,] - dist : incorrect number of subscripts on
 matrix

 Thanks for your help.

 Ryan

 ~~
 Ryan D. Briscoe Runquist
 Population Biology Graduate Group
 University of California, Davis
 [EMAIL PROTECTED]



Hi Bryan, for all UCD students you have the luxury of not using a loop! :) 

Would something like this work - for the creation of a 'distance matrix' :

# example dataset: 2 x 2 grid
d - expand.grid(x=1:2, y=1:2)

# an instructive plot
plot(d, type='n')
text(d, rownames(d))

# create distance object and convert to matrix:
m - as.matrix(dist(d))
m

 1234
1 0.00 1.00 1.00 1.414214
2 1.00 0.00 1.414214 1.00
3 1.00 1.414214 0.00 1.00
4 1.414214 1.00 1.00 0.00

# is that the kind of distance matrix you are looking for : 

?dist

Distance Matrix Computation

Description:

 This function computes and returns the distance matrix computed by
 using the specified distance measure to compute the distances
 between the rows of a data matrix.
[...]

cheers,

Dylan

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented, minimal,
 self-contained, reproducible code.

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combine matrix

2007-08-16 Thread Moshe Olshansky
This does not work in the general case. To see this,
try:
rownames(a}[5] - a
and see what happens then.

--- François Pinard [EMAIL PROTECTED] wrote:

 [Gianni Burgin]
 let say something like this
 
 a=matrix(1:25, nrow=5)
 
 rownames(a)=letters[1:5]
  colnames(a)=rep(A, 5)
 
  a
   A  A  A  A  A
 a 1  6 11 16 21
 b 2  7 12 17 22
 c 3  8 13 18 23
 d 4  9 14 19 24
 e 5 10 15 20 25
 
  b=matrix(1:40, nrow=8)
  rownames(b)=c(rep(a,4),rep(b,4))
  colnames(b)=rep(B, 5)
 
  b
   B  B  B  B  B
 a 1  9 17 25 33
 a 2 10 18 26 34
 a 3 11 19 27 35
 a 4 12 20 28 36
 b 5 13 21 29 37
 b 6 14 22 30 38
 b 7 15 23 31 39
 b 8 16 24 32 40
 
 as a results I wold like something like
 
   A  A  A  A  A  B  B  B  B  B
 a 1  6 11 16 21  1  9 17 25 33
 a 1  6 11 16 21  2 10 18 26 34
 a 1  6 11 16 21  3 11 19 27 35
 a 1  6 11 16 21  4 12 20 28 36
 b 2  7 12 17 22  5 13 21 29 37
 b 2  7 12 17 22  6 14 22 30 38
 b 2  7 12 17 22  7 15 23 31 39
 b 2  7 12 17 22  8 16 24 32 40
 
 does it is clear? is there a function that automate
 this operation?
 
 Like, maybe:
 
cbind(a[rownames(b),], b)
 
 
 
 -- 
 François Pinard   http://pinard.progiciels-bpi.ca
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function to find coodinates in an array

2007-08-16 Thread Moshe Olshansky
A not very good solution is as below:

If your array's dimensions were KxMxN and the linear
index is i then
n - ceiling(i/(K*M))
i1 - i - (n-1)*(K*M)
m - ceiling(i1/K)
k - i1 - (m-1)*K

and your index is (k,m,n)

I am almost sure that there is a function in R which
does this (it exists in Matlab).

Regards,

Moshe.

--- Ana Conesa [EMAIL PROTECTED] wrote:

 Dear list,
 
 I am looking for a function/way to get the array
 coordinates of given
 elements in an array. What I mean is the following:
 - Let X be a 3D array
 - I find the ordering of the elements of X by ord -
 order(X) (this
 returns me a vector)
 - I now want to find the x,y,z coordinates of each
 element of ord
 
 Can anyone help me?
 
 Thanks!
 
 Ana
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function to find coodinates in an array

2007-08-16 Thread Henrik Bengtsson
See arrayIndex() in the R.utils package, e.g.

X - array((2*3*4):1, dim=c(2,3,4))
idx - 1:length(X)
ijk - arrayIndex(idx, dim=dim(X))
print(ijk)

 [,1] [,2] [,3]
[1,]111
[2,]211
[3,]121
[4,]221
[5,]131
[6,]231
[7,]112
[8,]212
[9,]122
10,]222
11,]132
12,]232
13,]113
14,]213
15,]123
16,]223
17,]133
18,]233
19,]114
20,]214
21,]124
22,]224
23,]134
24,]234

/Henrik

On 8/16/07, Moshe Olshansky [EMAIL PROTECTED] wrote:
 A not very good solution is as below:

 If your array's dimensions were KxMxN and the linear
 index is i then
 n - ceiling(i/(K*M))
 i1 - i - (n-1)*(K*M)
 m - ceiling(i1/K)
 k - i1 - (m-1)*K

 and your index is (k,m,n)

 I am almost sure that there is a function in R which
 does this (it exists in Matlab).

 Regards,

 Moshe.

 --- Ana Conesa [EMAIL PROTECTED] wrote:

  Dear list,
 
  I am looking for a function/way to get the array
  coordinates of given
  elements in an array. What I mean is the following:
  - Let X be a 3D array
  - I find the ordering of the elements of X by ord -
  order(X) (this
  returns me a vector)
  - I now want to find the x,y,z coordinates of each
  element of ord
 
  Can anyone help me?
 
  Thanks!
 
  Ana
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
  reproducible code.
 

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function to find coodinates in an array

2007-08-16 Thread Marc Schwartz
If I am correctly understanding the problem, I think that this is what
you want:

set.seed(1)

# Create a 3x3x3 array
ARR - array(sample(100, 27), c(3, 3, 3))

 ARR
, , 1

 [,1] [,2] [,3]
[1,]   27   89   97
[2,]   37   20   62
[3,]   57   86   58

, , 2

 [,1] [,2] [,3]
[1,]6   61   43
[2,]   19   34   88
[3,]   16   67   83

, , 3

 [,1] [,2] [,3]
[1,]   32   17   21
[2,]   63   51   29
[3,]   75   101


# Get the ordered indices of the elements in the array
 order(ARR)
 [1] 27 10 24 12 22 11  5 25  1 26 19 14  2 16 23  3  9 13  8 20 15 21
[23] 18  6 17  4  7


# Get the actual array elements in order
 ARR[order(ARR)]
 [1]  1  6 10 16 17 19 20 21 27 29 32 34 37 43 51 57 58 61 62 63 67 75
[23] 83 86 88 89 97


# Now loop over the above and using which(), get the 3D indices
 t(sapply(ARR[order(ARR)], function(x) which(ARR == x, arr.ind = TRUE)))
  [,1] [,2] [,3]
 [1,]333
 [2,]112
 [3,]323
 [4,]312
 [5,]123
 [6,]212
 [7,]221
 [8,]133
 [9,]111
[10,]233
[11,]113
[12,]222
[13,]211
[14,]132
[15,]223
[16,]311
[17,]331
[18,]122
[19,]231
[20,]213
[21,]322
[22,]313
[23,]332
[24,]321
[25,]232
[26,]121
[27,]131


See ?which and take note of the arr.ind argument.

HTH,

Marc Schwartz


On Thu, 2007-08-16 at 19:21 -0700, Moshe Olshansky wrote:
 A not very good solution is as below:
 
 If your array's dimensions were KxMxN and the linear
 index is i then
 n - ceiling(i/(K*M))
 i1 - i - (n-1)*(K*M)
 m - ceiling(i1/K)
 k - i1 - (m-1)*K
 
 and your index is (k,m,n)
 
 I am almost sure that there is a function in R which
 does this (it exists in Matlab).
 
 Regards,
 
 Moshe.
 
 --- Ana Conesa [EMAIL PROTECTED] wrote:
 
  Dear list,
  
  I am looking for a function/way to get the array
  coordinates of given
  elements in an array. What I mean is the following:
  - Let X be a 3D array
  - I find the ordering of the elements of X by ord -
  order(X) (this
  returns me a vector)
  - I now want to find the x,y,z coordinates of each
  element of ord
  
  Can anyone help me?
  
  Thanks!
  
  Ana
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] N.cohen.kappa function for polytomous outcomes

2007-08-16 Thread Stacey Burrows
Is there a function analogous to N.cohen.kappa (concord package) which computes 
sample size needed for a Kappa statistic with polytomous outcomes?

I have just started using R and so far I am finding it is missing alot of 
things that Stata has.



   
-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combine matrix

2007-08-16 Thread Gabor Grothendieck
Try this.  We convert to data frame placing the row names in column 1, do
the merge, remove column 1 and convert back to matrix:

# test input
a - matrix(1:25, nrow = 5,
  dimnames = list(letters[1:5], rep(A, 5)))
b - matrix(1:40, nrow = 8,
  dimnames = list(rep(letters[1:2], each = 4), rep(B, 5)))

# 1. process
to.DF - function(x) data.frame(rn = row.names(x), x, row.names = 1:nrow(x))
out - as.matrix(merge(to.DF(a), to.DF(b), by = 1)[,-1])
colnames(out) - c(colnames(a), colnames(b))
out

# 2. same but merge is done using sqldf
# assume same a, b and to.DF as before

library(sqldf)
DFa - to.DF(a)
DFb - to.DF(b)
out - as.matrix(sqldf(select * from DFa join DFb using(rn))[-1])
colnames(out) - c(colnames(a), colnames(b))
out


# 3. same but uses sqldf and proto (which sqldf automatically loads)
# assume same a, b and to.DF as before

library(sqldf)
out - as.matrix(sqldf(select * from a join b using(rn),
  envir = proto(a = to.DF(a), b = to.DF(b)))[-1])
colnames(out) - c(colnames(a), colnames(b))
out




On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote:
 let say something like this


 a=matrix(1:25, nrow=5)

 rownames(a)=letters[1:5]
  colnames(a)=rep(A, 5)

  a
  A  A  A  A  A
 a 1  6 11 16 21
 b 2  7 12 17 22
 c 3  8 13 18 23
 d 4  9 14 19 24
 e 5 10 15 20 25

  b=matrix(1:40, nrow=8)
  rownames(b)=c(rep(a,4),rep(b,4))
  colnames(b)=rep(B, 5)

  b
  B  B  B  B  B
 a 1  9 17 25 33
 a 2 10 18 26 34
 a 3 11 19 27 35
 a 4 12 20 28 36
 b 5 13 21 29 37
 b 6 14 22 30 38
 b 7 15 23 31 39
 b 8 16 24 32 40

 as a results I wold like something like

  A  A  A  A  A  B  B  B  B  B
 a 1  6 11 16 21  1  9 17 25 33
 a 1  6 11 16 21  2 10 18 26 34
 a 1  6 11 16 21  3 11 19 27 35
 a 1  6 11 16 21  4 12 20 28 36
 b 2  7 12 17 22  5 13 21 29 37
 b 2  7 12 17 22  6 14 22 30 38
 b 2  7 12 17 22  7 15 23 31 39
 b 2  7 12 17 22  8 16 24 32 40


 does it is clear? is there a function that automate this operation?


 thank you very much!




 On 8/16/07, jim holtman [EMAIL PROTECTED] wrote:
 
  Can you provide an example of what you mean; e.g., the two input
  matrices and the desired output.
 
  On 8/16/07, Gianni Burgin [EMAIL PROTECTED] wrote:
   Hi R user,
  
   I am new to R, and I have a very simple question for you. I have two
  matrix
   A and B, with internally redundant rownames (but variables are
  different).
   Some, but not all the rownames are shared among the two matrix. I want
  to
   create a greater matrix that combines the previuos two, and has all the
   possible combinations of matching rownames lines among matrix A and B.
  
   looking for the solution I bumped in merge but actually works on
  data.frame,
   and in dataframe there could be no redundancy in names.
  
  
   can you help me??
  
  [[alternative HTML version deleted]]
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
 
  --
  Jim Holtman
  Cincinnati, OH
  +1 513 646 9390
 
  What is the problem you are trying to solve?
 

[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function to find coodinates in an array

2007-08-16 Thread François Pinard
[Ana Conesa]

 I am looking for a function/way to get the array coordinates of given
 elements in an array. What I mean is the following:
 - Let X be a 3D array
 - I find the ordering of the elements of X by ord - order(X)
   (this returns me a vector)
 - I now want to find the x,y,z coordinates of each element of ord

[Moshe Olshansky]

If your array's dimensions were KxMxN and the linear
index is i then
n - ceiling(i/(K*M))
i1 - i - (n-1)*(K*M)
m - ceiling(i1/K)
k - i1 - (m-1)*K

and your index is (k,m,n)

The reshape package might be helpful, here.  If I understand the problem 
correctly, given this artificial example:

   X - sample(1:24)
   dim(X) - c(2, 3, 4)

you would want:

   library(reshape)
   melt(X)[order(X), -4]

so getting the indices in a three columns data frame.

-- 
François Pinard   http://pinard.progiciels-bpi.ca

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function to find coodinates in an array

2007-08-16 Thread Gabor Grothendieck
Get the indices using expand.grid and then reorder them:

set.seed(1); X - array(rnorm(24), 2:4) # input
X # look at X

do.call(expand.grid, sapply(dim(X), seq))[order(X),]


On 8/16/07, Ana Conesa [EMAIL PROTECTED] wrote:
 Dear list,

 I am looking for a function/way to get the array coordinates of given
 elements in an array. What I mean is the following:
 - Let X be a 3D array
 - I find the ordering of the elements of X by ord - order(X) (this
 returns me a vector)
 - I now want to find the x,y,z coordinates of each element of ord

 Can anyone help me?

 Thanks!

 Ana

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] comparison of arrays of strings

2007-08-16 Thread ramakanth reddy
Hi

i have two arrays of genes names,one with18 gene names and the other with 
24000 gene names,I have to compare both of them for finding common names.

I have both the arrays in .csv format.i loaded the files and tried to compare 
them using for and if loops

but I got the error Error in Ops.factor(cgh[i, 1], cgh[j, 2]) : 
level sets of factors are different

Please suggest me how to solve this problem or any other alternative procedure

Thanks
ramakanth




  Get the freedom to save as many mails as you wish. To know how, go to 
http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] for plots

2007-08-16 Thread Brad Zhang
Hi, All,

I am a beginner for R. Now I have installed R 2.5.1 in Window
environment. After I run a program such as gam I would like to display
a plot for the object. The following is an example. When I did this,
only the last plot was presented on my screen. How can I get a plot
before the last plot? I mean if the object has several plots how can I
get those?



gam.object - gam(y ~ s(x,6) + z,data=gam.data)

plot(gam.object,se=TRUE)





Thank you.



Brad.


Dr. Guicheng (Brad) Zhang
Senior Research Officer

School of Paediatrics and Child Health
Telethon Institute for Child Health Research
100 Roberts Road, Subiaco
Western Australia, 6008 AUSTRALIA

Email: [EMAIL PROTECTED]
Phone: 93407896
Fax: 93882097



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] comparison of arrays of strings

2007-08-16 Thread jim holtman
Read them into 2 different vectors and then use 'intersect'.

On 8/17/07, ramakanth reddy [EMAIL PROTECTED] wrote:
 Hi

 i have two arrays of genes names,one with18 gene names and the other with 
 24000 gene names,I have to compare both of them for finding common names.

 I have both the arrays in .csv format.i loaded the files and tried to compare 
 them using for and if loops

 but I got the error Error in Ops.factor(cgh[i, 1], cgh[j, 2]) :
level sets of factors are different

 Please suggest me how to solve this problem or any other alternative procedure

 Thanks
 ramakanth




  Get the freedom to save as many mails as you wish. To know how, go to 
 http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html
[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] for plots

2007-08-16 Thread jim holtman
Turn 'Recording on for the plots.

windows(record=TRUE)

or select from the GUI.

On 8/17/07, Brad Zhang [EMAIL PROTECTED] wrote:
 Hi, All,

 I am a beginner for R. Now I have installed R 2.5.1 in Window
 environment. After I run a program such as gam I would like to display
 a plot for the object. The following is an example. When I did this,
 only the last plot was presented on my screen. How can I get a plot
 before the last plot? I mean if the object has several plots how can I
 get those?



 gam.object - gam(y ~ s(x,6) + z,data=gam.data)

 plot(gam.object,se=TRUE)





 Thank you.



 Brad.


 Dr. Guicheng (Brad) Zhang
 Senior Research Officer

 School of Paediatrics and Child Health
 Telethon Institute for Child Health Research
 100 Roberts Road, Subiaco
 Western Australia, 6008 AUSTRALIA

 Email: [EMAIL PROTECTED]
 Phone: 93407896
 Fax: 93882097



[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.