Re: [R] plot in package psych with function error.bars.by

2014-06-16 Thread Tham Tran
Hi William,

I've just installed your latest package, psych_1.4.6.11.zip, from the server
personality-project/r/src/contrib/. Once the update finished, I tried to run
the following sample code:

require(psych)
keys.list = list(Agree=c(-1,2:5), Conscientious=c(6:8,-9,-10),
                 Extraversion=c(-11,-12,13:15), Neuroticism=c(16:20),
                 Openness=c(21,-22,23,24,-25))
keys = make.keys(28, keys.list, item.labels=colnames(bfi))
scores = scoreItems(keys, bfi, min=1, max=6)
error.bars.by(scores$scores, round(bfi$age/10)*10, by.var=TRUE,
              main="BFI age trends", legend=3,
              labels=colnames(scores$scores), xlab="Age",
              ylab="Mean item score")

Then I got the following error:

Error in if (del == 0 && to == 0) return(to) : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In qt(1 - alpha/2, group.stats[[g]]$n - 1) : NaNs produced
2: In dt(ln, n - 1) : NaNs produced
3: In qt(alpha/2, n - 1) : NaNs produced
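For context, those NaN warnings are what qt() and dt() return whenever the degrees of freedom are not positive, e.g. when one of the grouping cells contains a single observation so that n - 1 = 0 (a minimal base-R sketch, not the psych internals):

```r
# qt()/dt() require positive degrees of freedom; a group with a single
# observation gives df = n - 1 = 0 and hence NaN
n <- 1                       # hypothetical group size
qt(1 - 0.05/2, df = n - 1)   # NaN, with a "NaNs produced" warning
qt(1 - 0.05/2, df = 2 - 1)   # finite once the group has at least 2 points
```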

Could you tell me how to fix these issues? Perhaps I made a mistake when
updating to your latest package?

Sincerely,
Tham






--
View this message in context: 
http://r.789695.n4.nabble.com/plot-in-package-psych-with-function-error-bars-by-tp4691632p4692177.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Aggregating 15 minute xts sequence to hourly

2014-06-16 Thread Costas Vorlow
Dear all,

Why does aggregation of 15-minute xts data happen at the 45th minute (3rd
quarter) and not at the exact hour close (i.e., :00)?

For example, temp below is an xts sequence with 15-minute frequency:

> quarters <- ISOdatetime(2012,05,02,9,0,0) + seq(0:39)*15*60;
> set.seed(42);
> observation <- xts(1:40, order.by=as.POSIXct(quarters));
> head(observation);
                    [,1]
2012-05-02 09:15:00    1
2012-05-02 09:30:00    2
2012-05-02 09:45:00    3
2012-05-02 10:00:00    4
2012-05-02 10:15:00    5
2012-05-02 10:30:00    6

> ends <- endpoints(observation,'hours');
> temp <- period.apply(observation, ends, sum);
> temp
                    [,1]
2012-05-02 09:45:00    6
2012-05-02 10:45:00   22
2012-05-02 11:45:00   38
2012-05-02 12:45:00   54
2012-05-02 13:45:00   70
2012-05-02 14:45:00   86
2012-05-02 15:45:00  102
2012-05-02 16:45:00  118
2012-05-02 17:45:00  134
2012-05-02 18:45:00  150
2012-05-02 19:00:00   40


I get the sum of the quarters within each hour, stamped at the third quarter.
How can I instead calculate the sum of the quarterly data at the hour's
close (10:00, 11:00, 12:00 and so on)?

Many thanks in advance,
Costas

__

Costas Vorlow
http://www.linkedin.com/in/costasvorlow
http://www.vorlow.com




Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread peter dalgaard

On 16 Jun 2014, at 05:22 , array chip arrayprof...@yahoo.com wrote:

 Hi, I am using clogit() from survival package to do conditional logistic 
 regression. I also need to make prediction on an independent dataset to 
 calculate predicted probability. Here is an example:
 
 
 dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
 x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
 dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
 x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
 fit <- clogit(status~x1+x2+strata(set),dat)
 predict(fit,newdata=dat.test,type='expected')
 Error in Surv(rep(1, 150L), status) : 
   Time and status are different lengths
 
 Can anyone suggest what's wrong here?
 


The direct cause is that clogit() works by using the fact that the likelihood 
is equivalent to a coxph() likelihood with stratification and all observation 
lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), 
status) and that goes belly-up if you apply the same formula to a data set of 
different length. 

However, notice that there is no such thing as predict.clogit(), so you are 
attempting predict.coxph() on a purely formal Cox model. It is unclear to what 
extent predicted values, in the sense of coxph() are compatible with 
predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN=sum)
pp/pps

i.e. the conditional probability that an observation is a case, given the covariates 
and that there is one case in each set (in the data given, you have sets of 
three with one being a case, so all predicted probabilities are close to 0.33). 
For more general matched sets, I'm not really sure what one wants. Real experts 
are welcome to chime in.
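The within-set normalization above can be sanity-checked on toy numbers (a hypothetical design matrix and coefficient vector, not coef(fit) from the poster's data): the conditional probabilities sum to 1 within each set by construction.

```r
# Toy check: within-set conditional probabilities sum to 1
set.seed(1)
m  <- cbind(x1 = rnorm(9), x2 = rnorm(9))  # 3 sets of 3 observations
b  <- c(0.5, -0.3)                         # stand-in for coef(fit)
pp <- as.vector(exp(m %*% b))
set.id <- rep(1:3, each = 3)
pps <- ave(pp, set.id, FUN = sum)
tapply(pp / pps, set.id, sum)              # 1 1 1
```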

-pd

 Thanks!
 
 John

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



[R] correlation given p value and sample size

2014-06-16 Thread Witold E Wolski
Hi,


I am looking for a function that produces the minimum r (Pearson
correlation) at which H0 (r = 0) can be rejected, given the sample size and
the p-value.


Witold

-- 
Witold Eryk Wolski



Re: [R] Defining default method for S3, S4 and R5 classes

2014-06-16 Thread Luca Cerone
Thanks, Suzen

2014-06-15 2:34 GMT+02:00 Suzen, Mehmet msu...@gmail.com:
 There is a nice tutorial on this:
 http://adv-r.had.co.nz/OO-essentials.html

 For an in-depth guide, have a look at the book by John Chambers,
 Software for Data Analysis: Programming with R.

 On 13 June 2014 12:20, Luca Cerone luca.cer...@gmail.com wrote:
 Dear all,

 I am writing a script implementing a pipeline to analyze some of the
 data we receive.

 One of the steps in this pipeline involves clustering the data, and I
 am interested
 in studying the effects of different clustering algorithms on the final 
 results.

 I am having issues making my code general enough because the
 clustering algorithms we are interested all return different types of
 objects (S3, S4 and R5 classes, as well as simple named lists).

 From the output of these algorithms I need to extract a list with as many
 elements as the number of clusters, such that each element contains the ids
 of the elements in each cluster.

 I have easily done this for each of the clustering algorithms;
 the problem is: how can I make it so that, rather than having to check for
 classes and
 types, this is done automatically?

 For example, for the algorithms that return S3 classes I have defined
 a method get_cluster_list.default and then created the methods for
 the individual classes, which is used in the main body of the
 pipeline.

 I have no idea how I can do this for S4 and R5 classes and,  more
 importantly, I would
 like an approach that works when using all S3, S4 and R5 classes.

 Do you know how I could do this?

 Thanks for the help,
 Luca




-- 
Luca Cerone

Tel: +34 692 06 71 28
Skype: luca.cerone



Re: [R] Aggregating 15 minute xts sequence to hourly

2014-06-16 Thread Joshua Ulrich
On Mon, Jun 16, 2014 at 3:41 AM, Costas Vorlow costas.vor...@gmail.com wrote:
 Dear all,

 Why aggregation of 15 minute xts data happens on the 45th (3rd quarter) and
 not the exact hour close (i.e., 00) time?

The "00" time is the beginning of the hour, not the end.  E.g.,
10:00:00 is the beginning of the 10 o'clock hour.

 For example, temp below is an xts sequence with 15-minute frequency:

  quarters <- ISOdatetime(2012,05,02,9,0,0) + seq(0:39)*15*60;
  set.seed(42);
  observation <- xts(1:40, order.by=as.POSIXct(quarters));
  head(observation);
                      [,1]
  2012-05-02 09:15:00    1
  2012-05-02 09:30:00    2
  2012-05-02 09:45:00    3
  2012-05-02 10:00:00    4
  2012-05-02 10:15:00    5
  2012-05-02 10:30:00    6

  ends <- endpoints(observation,'hours');
  temp <- period.apply(observation, ends, sum);
  temp
                      [,1]
  2012-05-02 09:45:00    6
  2012-05-02 10:45:00   22
  2012-05-02 11:45:00   38
  2012-05-02 12:45:00   54
  2012-05-02 13:45:00   70
  2012-05-02 14:45:00   86
  2012-05-02 15:45:00  102
  2012-05-02 16:45:00  118
  2012-05-02 17:45:00  134
  2012-05-02 18:45:00  150
  2012-05-02 19:00:00   40


 I get the sum of every quarter within the hour on the third quarter. How
 can I implicitly calculate the sum of the quarterly data on the hour's
 close (10:00, 11:00, 12:00 and so on) ?

Again, those are the beginnings of the hours.  endpoints() and
period.apply() only use the timestamps in your data.  If you want to
round up to the beginning of the next hour, use align.time().

> align.time(temp, 3600)
                    [,1]
2012-05-02 10:00:00    6
2012-05-02 11:00:00   22
2012-05-02 12:00:00   38
2012-05-02 13:00:00   54
2012-05-02 14:00:00   70
2012-05-02 15:00:00   86
2012-05-02 16:00:00  102
2012-05-02 17:00:00  118
2012-05-02 18:00:00  134
2012-05-02 19:00:00  150
2012-05-02 20:00:00   40
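The rounding-up that align.time() performs can also be sketched in base R for a single timestamp (a hypothetical value, just to show the arithmetic):

```r
# Round a POSIXct timestamp up to the next full hour (3600-second grid)
t0 <- as.POSIXct("2012-05-02 09:45:00", tz = "UTC")
up <- as.POSIXct(ceiling(as.numeric(t0) / 3600) * 3600,
                 origin = "1970-01-01", tz = "UTC")
format(up)   # "2012-05-02 10:00:00"
```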

 Many thanks in advance,
 Costas



Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com



Re: [R] correlation given p value and sample size

2014-06-16 Thread peter dalgaard
There's a simple relation 

t = r / sqrt(1 - r^2) * sqrt(n - 2)
r = t / sqrt(n - 2 + t^2)

where t has a t distribution on n-2 df. Insert t = +-qt(p/2, n-2).
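Wrapped as a function (r.crit is a made-up name, just a sketch of this relation):

```r
# Smallest |r| that rejects H0: rho = 0 at two-sided level p, given n
r.crit <- function(n, p = 0.05) {
  t <- qt(1 - p/2, df = n - 2)
  t / sqrt(n - 2 + t^2)
}
r.crit(30)   # ~0.361: |r| above this is significant at the 5% level
```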

-pd


On 16 Jun 2014, at 11:23 , Witold E Wolski wewol...@gmail.com wrote:

 Hi,
 
 
 Looking for and function which produces the minimum r (pearson
 correlation) so that H0 (r=0) can be rejected, given sample size and
 p-value?
 
 
 Witold
 
 -- 
 Witold Eryk Wolski
 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Help with SEM package - model significance

2014-06-16 Thread John Fox
Dear Bernardo,

The df for the LR chisquare over-identification test come not from the number 
of observations, but from the difference between the number of observable 
variances and covariances, on the one hand, and free parameters to estimate, on 
the other. In your case, these numbers are equal, and so df = 0. The LR 
chisquare for a just-identified model is also necessarily 0: the model 
perfectly reproduces the covariational structure of the observed variables. 

R (and most statistical software) by default writes very small and very large 
numbers in scientific format. In your case, -2.873188e-13 = -2.87*10^-13, that 
is, 0 within rounding error. You can change the way numbers are printed with 
the R scipen option.
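A quick sketch of the scipen effect:

```r
# Default: small magnitudes print in scientific notation
format(-2.873188e-13)      # "-2.873188e-13"
# A large scipen penalizes scientific notation, forcing fixed notation
old <- options(scipen = 20)
format(-2.873188e-13)      # a fixed-notation string of zeros, i.e. ~0
options(old)               # restore the previous setting
```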

Some other observations:

(1) Your model is recursive and has no latent variables; you would get the same 
estimates from OLS regression using lm().

(2) For quite some time now, the sem package has included specifyEquations() as 
a more convenient way of specifying a model, in preference to specifyModel(). 
See ?specifyEquations.

(3) You don't have to specify the error variances directly; specifyEquations(), 
or specifyModel(), will supply them.

I hope this helps,
 John


John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/


On Sun, 15 Jun 2014 20:15:31 -0700 (PDT)
 Bernardo Santos bernardo_brand...@yahoo.com.br wrote:
 Dear all, 
 
 I used sem function from the package SEM to fit a model. However, I cannot 
 say if the model is correspondent to the data or not (chisquare test).
 I used the commands:
 
 model1 <- specifyModel()
 estadio -> compflora, a1, NA
 estadio -> compfauna, a2, NA
 estadio -> interacoesobs, a3, NA
 compflora -> compfauna, b1, NA
 compflora -> interacoesobs, b2, NA
 compfauna -> interacoesobs, c1, NA
 estadio <-> estadio, e1, NA
 compflora <-> compflora, e2, NA
 compfauna <-> compfauna, e3, NA
 interacoesobs <-> interacoesobs, e4, NA
 
 sem1 <- sem(model1, cov.matrix, length(samples))
 summary(sem1)
 
 and I got the result:
 
 Model Chisquare =  -2.873188e-13   Df =  0   Pr(>Chisq) = NA
 AIC =  20   BIC =  -2.873188e-13

 Normalized Residuals
      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 0.000e+00 0.000e+00 2.957e-16 3.193e-16 5.044e-16 8.141e-16

 R-square for Endogenous Variables
     compflora     compfauna interacoesobs
        0.0657        0.1056        0.2319

 Parameter Estimates
    Estimate     Std Error    z value    Pr(>|z|)
 a1 3.027344e-01 1.665395e-01 1.81779316 6.909575e-02 compflora <--- estadio
 a2 2.189427e-01 1.767404e-01 1.23878105 2.154266e-01 compfauna <--- estadio
 a3 7.314192e-03 1.063613e-01 0.06876742 9.451748e-01 interacoesobs <--- estadio
 b1 2.422906e-01 1.496290e-01 1.61927587 1.053879e-01 compfauna <--- compflora
 b2 3.029933e-01 9.104901e-02 3.32780446 8.753328e-04 interacoesobs <--- compflora
 c1 4.863368e-02 8.638177e-02 0.56300857 5.734290e-01 interacoesobs <--- compfauna
 e1 6.918133e+04 1.427102e+04 4.84767986 1.249138e-06 estadio <--> estadio
 e2 9.018230e+04 1.860319e+04 4.84767986 1.249138e-06 compflora <--> compflora
 e3 9.489661e+04 1.957568e+04 4.84767986 1.249138e-06 compfauna <--> compfauna
 e4 3.328072e+04 6.865289e+03 4.84767986 1.249138e-06 interacoesobs <--> interacoesobs

 Iterations =  0
 
 I understand the results, but I do not know how to interpret the first line 
 that tells me about the model:
 Model Chisquare =  -2.873188e-13   Df =  0   Pr(>Chisq) = NA
 
 How can df be zero, if the number of observations I used in the sem function
 was 48 and I have only 4 variables? What is the p value?
 
 Thanks in advance.
 Bernardo Niebuhr


Re: [R] Aggregating 15 minute xts sequence to hourly

2014-06-16 Thread Costas Vorlow
Dear Joshua,

Thanks for your reply. As I see it, the solution you suggest aligns the time
stamps as required but leaves the aggregation results as they are.

Hence, the last quarter of every hour is still not aggregated...

Am I right, or am I misunderstanding something?

I tried to move ends ahead by one (ends <- ends + 1), but this does not work
either. It seems that if you change the endpoints, aggregation still
happens every 45 minutes, as you pointed out, although the ends variable
points to the round-hour time stamp...





[R] Determination lag order - problem with daily data and AR / ARIMA

2014-06-16 Thread serena1234
Hello,

I am trying to determine a lag order for my data with the help of AIC and/or
BIC, in order to conduct further tests. The data are prices measured at a
daily frequency (weekends and holidays excluded).

My first approach was to approximate the process with an AR model using the
function ar(x, ...) and a loop over several lags, then compute the AIC and
BIC values for each lag and pick the lowest one.
However, when I try to use the BIC function, or AIC with k =
log(length(time series)), it does not work. The error says that the model is
of class "ar", which AIC cannot work with.
[This is not the loop, but just the general problem when passing an ar
model to AIC]
> model <- ar(price, aic = FALSE, method = "ols")
> AIC(model, k = 2)
Error in UseMethod("logLik") : 
  no applicable method for 'logLik' applied to an object of class "ar"
> AIC(model, k = log(length(price_G)))
Error in UseMethod("logLik") : 
  no applicable method for 'logLik' applied to an object of class "ar"

Alternatively, I know that ar selects the lag order by default via the AIC
criterion, but it suggests 40 lags, which appears quite high to me.
Therefore, I wanted to check this result for robustness by applying BIC, but
that doesn't work due to the problem explained above.

Another option was to use an ARIMA model with order = c(lags, 0, 0) and then
determine the AIC and BIC values. That generally works, but it produces
AIC and BIC values of zero for every lag, which doesn't make
sense to me.

So that is why I think I may have a problem in classifying my daily data: I
just passed the numeric vector to the model-fitting functions. But how can I
classify the daily data as a time series correctly, given that weekends and
holidays are missing? I've tried the zoo package, but then I can't use the
arima function any more.

Can anyone offer any help to make one of the three approaches work?

Thanks in advance!
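One workaround sketch (with a simulated stand-in series, since the poster's price data are not available): arima() returns an object with a logLik method, so both AIC() and BIC() work on it, unlike with ar().

```r
# Fit AR(p) via arima(), whose fits support AIC()/BIC()
set.seed(123)
price <- as.numeric(arima.sim(list(ar = 0.5), n = 200))  # stand-in series
fits  <- lapply(1:5, function(p) arima(price, order = c(p, 0, 0)))
sapply(fits, AIC)   # compare lag orders by AIC
sapply(fits, BIC)   # ... and by BIC
```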



--
View this message in context: 
http://r.789695.n4.nabble.com/Determination-lag-order-problem-with-daily-data-and-AR-ARIMA-tp4692194.html
Sent from the R help mailing list archive at Nabble.com.



[R] xml package - free document / memory leak

2014-06-16 Thread Loos, Martin
Dear R helpers

I have a problem releasing memory after reading .xml files with the XML
package (version 3.98-1.1, R 3.0.2, Windows 7, 64-bit). The problem has appeared 
previously and several solutions/bug fixes have been proposed. I went through 
many of these and have also read (and understood?) Duncan Lang's Memory 
Management page, outlining the counter-based memory release for nodes and 
documents. However, the problem persists, i.e.,

filed <- ...  # some PubChem .xml file path
doc <- xmlTreeParse(file = filed, useInternalNodes = TRUE)
get_data <- getNodeSet(doc, path = "//r:PC-InfoData",
    c(r = "http://www.ncbi.nlm.nih.gov")
)

will not allow me to release doc from memory using any combination/order of 
rm(), free(), gc() on doc and get_data. I ended up using 
.Call("RS_XML_forceFreeDoc", doc) and monitoring the counter settings with 
.Call("R_getXMLRefCount", ...) - but that cannot really be the solution, can it?

What am I doing wrong? Thank you very much for your help, Martin



[R] Power graph for two Proportion

2014-06-16 Thread Suganthie Jeyaganth
Dear R mailing listers,
I am trying to find power calculations for

p1 = 0.2 and p2 = 0.4, with significance level = 0.05 (one-sided test).

I would like a graph with power on the y-axis and sample size on the x-axis.

I have been running this command for different values of power, collecting
n and power to draw the graph:

pwr.2p.test(h = ES.h(0.4, 0.2), power = 0.87, sig.level = 0.05, alternative = "greater")


Is there an easier way to do this in R?
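A base-R sketch of the requested graph, using power.prop.test() from stats as an alternative to pwr.2p.test() (the 0.87 line is just an example level):

```r
# Power vs. per-group sample size for p1 = 0.2, p2 = 0.4, one-sided alpha = 0.05
n   <- seq(10, 120, by = 2)
pow <- sapply(n, function(k)
  power.prop.test(n = k, p1 = 0.2, p2 = 0.4, sig.level = 0.05,
                  alternative = "one.sided")$power)
plot(n, pow, type = "l", xlab = "sample size per group", ylab = "power")
abline(h = 0.87, lty = 2)   # e.g. mark the 87% power level
```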
Thanks 


Suga


[R] Determination lag order - problem with daily data and AR / ARIMA

2014-06-16 Thread serena1234
Hello,
I am trying to determine a lag order for my data with the help of AIC and/
or BIC in order to conduct further tests. It is about prices measured at a
daily frequency (weekends and holidays excluded).  

1) My first approach was to approximate the process with an AR model using
the function ar(x, ...) and a loop to try several lags and then determine
the AIC and BIC values for each lag to determine the lowest one. However,
when I try to use the BIC function, or AIC with k = log(length(time
series)), it does not work. The error says that the model is of class "ar",
which AIC cannot work with.
[This is not the loop, but just the general problem when inserting an ar
model into AIC]
> model <- ar(price, aic = FALSE, method = "ols")
> AIC(model, k = 2)
Error in UseMethod("logLik") :
  no applicable method for 'logLik' applied to an object of class "ar"
> AIC(model, k = log(length(price_G)))
Error in UseMethod("logLik") :
  no applicable method for 'logLik' applied to an object of class "ar"

2) Alternatively, I know that ar selects the lag order by default via the AIC
criterion, but it suggests 40 lags, which appears quite high to me.
Therefore, I wanted to check this result for robustness by applying BIC. But
that doesn't work due to the problem explained above. 

3) Another option was to use an ARIMA model with order = c(lags, 0, 0) and
then determine the AIC and BIC values. That does generally work, but it
calculates AICs and BICs of zero for every kind of lag. That doesn't make
sense to me.  

So that is why I think I may have a problem in classifying my daily data. I
just inserted the numeric vector for calculating the models. But how can I
classify the daily data as a time series correctly? (Because weekends and
holidays are missing.) I've tried the zoo package, but then I can't use the
ARIMA function any more.  

Can anyone offer any help to make one of the three approaches work?  
Thanks in advance! 



--
View this message in context: 
http://r.789695.n4.nabble.com/Determination-lag-order-problem-with-daily-data-and-AR-ARIMA-tp4692195.html
Sent from the R help mailing list archive at Nabble.com.



[R] Question on JAVA and R

2014-06-16 Thread Luis Eduardo Castillo Méndez
As I mentioned today, my colleague Steven Peñaloza and I are trying to build
a small program for supervised classification of images using non-conventional
methods in R, connecting R libraries and our own R code to a Java environment.
We are using NetBeans to build the interface. The connection itself was a
success: it runs the functions and the classification, but only in a purely
procedural way. The problem occurs when we try to add a small graphical
interface, so that variables created by calls into R are stored on the Java
side. Double values, for example, can be moved between the two languages, but
we do not know how to do this with a .TIFF image, which is our case. The lines
of code below open a file-chooser window; a .TIFF image is opened and assigned
to the variable img, but it remains inside R. At that point we cannot, for
example, attach it to a button so that the variable can be used in a Java
class or event.



code.addRCode("fileName <- tclvalue(tkgetOpenFile())");

code.addRCode("img <- brick(fileName)");




Thank you very much



-- 
Luis Eduardo Castillo Méndez
Mathematician - M.Sc. in Statistics
Full-time Associate Professor
Cadastral Engineering and Geodesy





[R] Any refit function available for 'car' package?

2014-06-16 Thread Gang Chen
Suppose that I need to run a multivariate linear model

Y = X B + E

many times, with the same model matrix X but a different response matrix Y
each time. Is there a function in the 'car' package similar to refit() in
the lme4 package, so that the model matrix X is not reassembled each time?
Runtime could also be saved by not repeatedly performing the same matrix
computations, such as
(X'X)^(-1)X'.
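Outside of 'car', the underlying idea can be sketched in base R: factor X once with a QR decomposition and reuse it for every new response (hypothetical data).

```r
# Factor X once, then solve for many responses without refitting
set.seed(42)
X   <- cbind(1, matrix(rnorm(100 * 3), 100, 3))
qrX <- qr(X)                       # one-time decomposition
Y1  <- rnorm(100); Y2 <- rnorm(100)
b1  <- qr.coef(qrX, Y1)            # reuse qrX for each Y
b2  <- qr.coef(qrX, Y2)
# cross-check against a full fit
max(abs(b1 - lm.fit(X, Y1)$coefficients))   # ~0
```

lm() also accepts a matrix response (lm(Y ~ X)), fitting all columns of Y in one pass over the same X.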

Thanks,
Gang



[R] R, Rserve logging

2014-06-16 Thread Hema Seshadri
I am running R in daemon mode via Rserve (to connect to Java). It spits out a
constant flow of output. Is there a way to turn it off, or to set it to
rotate, etc.? In other words, I want to find out how R/Rserve handles logging.



[R] glm.fit: fitted probabilities numerically 0 or 1 occurred for a continuous variable?

2014-06-16 Thread Nwinters
I have gotten this error before: "glm.fit: fitted probabilities
numerically 0 or 1 occurred"

and the problem was usually solved by combining one or more categories where
there were no observations.

I am now having this error show up for a variable that is continuous (not
categorical).

What could be the cause of this for a continuous variable?

Thanks, 
Nick



--
View this message in context: 
http://r.789695.n4.nabble.com/glm-fit-fitted-probabilities-numerically-0-or-1-occurred-for-a-continuous-variable-tp4692211.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred for a continuous variable?

2014-06-16 Thread Marc Schwartz

On Jun 16, 2014, at 2:34 PM, Nwinters nicholas.wint...@mail.mcgill.ca wrote:

> I have gotten this error before: "glm.fit: fitted probabilities
> numerically 0 or 1 occurred"
>
> and the problem was usually solved by combining one or more categories where
> there were no observations.
>
> I am now having this error show up for a variable that is continuous (not
> categorical).
>
> What could be the cause of this for a continuous variable?
>
> Thanks,
> Nick


Presuming that this is logistic regression (family = binomial), the error is 
suggestive of complete or near complete separation in the association between 
your continuous IV and your binary response. This can occur if there is a 
breakpoint within the range of your IV where the dichotomous event is present 
on one side of the break and is absent on the other side of the break.
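A constructed illustration (not the poster's data) of how a single continuous predictor can produce this warning:

```r
# A continuous predictor that perfectly separates a binary outcome:
# every x above the breakpoint is a 1, every x below it is a 0.
x <- 1:10
y <- as.numeric(x > 5)
fit <- glm(y ~ x, family = binomial)
# glm() warns "fitted probabilities numerically 0 or 1 occurred" and
# the slope estimate is driven toward infinity:
coef(fit)[["x"]]  # very large in absolute value
```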

The resolution for the problem will depend upon first confirming the etiology 
of it and then, within the context of subject matter expertise, making some 
decisions on how to proceed. 

If you Google "logistic regression separation", you will get some resources 
that can be helpful.

Regards,

Marc Schwartz



Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread array chip
Thank you Peter. Any other suggestions are absolutely welcome!!

John




 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, June 16, 2014 2:22 AM
Subject: Re: [R] prediction based on conditional logistic regression clogit





> Hi, I am using clogit() from survival package to do conditional logistic
> regression. I also need to make prediction on an independent dataset to
> calculate predicted probability. Here is an example:
>
> dat <- data.frame(set=rep(1:50, each=3), status=rep(c(1,0,0), 50),
>                   x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
> dat.test <- data.frame(set=rep(1:30, each=3), status=rep(c(1,0,0), 30),
>                        x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
> fit <- clogit(status ~ x1 + x2 + strata(set), dat)
> predict(fit, newdata=dat.test, type='expected')
> Error in Surv(rep(1, 150L), status) :
>   Time and status are different lengths
>
> Can anyone suggest what's wrong here?



The direct cause is that clogit() works by using the fact that the likelihood 
is equivalent to a coxph() likelihood with stratification and all observation 
lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), 
status) and that goes belly-up if you apply the same formula to a data set of 
different length. 

However, notice that there is no such thing as predict.clogit(), so you are 
attempting predict.coxph() on a purely formal Cox model. It is unclear to what 
extent predicted values, in the sense of coxph() are compatible with 
predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN=sum)
pp/pps

i.e. the conditional probability that an observation is a case given covariates 
and that there is one case in each set (in the data given, you have sets of 
three with one being a case, so all predicted probabilities are close to 0.33). 
For more general matched sets, I'm not really sure what one wants. Real experts 
are welcome to chime in.

-pd




 Thanks!
 
 John
 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



[R] Error in merge [negative length vectors are not allowed]

2014-06-16 Thread Kate Ignatius
Hi All,

I'm trying to merge two files together using:

combinedfiles <- merge(comb1, comb2, by=c("Place","Stall","Menu"))

comb1 is about 2 million + rows (158MB) and comb2 is about 600K+ rows (52MB).

When I try to merge using the above syntax I get the error:

Error in merge.data.frame(comb1, comb2, by = c("Place","Stall","Menu")) :
  negative length vectors are not allowed

Is there something that I'm doing wrong?  I've merged larger files
together in the past without a problem so am curious what might be the
problem here...

Thanks in advance!

~K



Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread Charles Berry
peter dalgaard pdalgd at gmail.com writes:

 
 
> On 16 Jun 2014, at 05:22 , array chip arrayprofile at yahoo.com wrote:
>
> > Hi, I am using clogit() from survival package to do conditional
> > logistic regression. I also need to make prediction on an
> > independent dataset to calculate predicted probability. Here is an
> > example:

[snip]

> > Can anyone suggest what's wrong here?
>
> The direct cause is that clogit() works by using the fact that the
> likelihood is equivalent to a coxph() likelihood with stratification
> and all observation lengths set to 1. Therefore the analysis is
> formally on Surv(rep(1, 150L), status) and that goes belly-up if you
> apply the same formula to a data set of different length.
>
> However, notice that there is no such thing as predict.clogit(), so
> you are attempting predict.coxph() on a purely formal Cox model. It
> is unclear to what extent predicted values, in the sense of coxph(),
> are compatible with predictions in conditional logit models.
>
> I'm rusty on this, but I think what you want is something like
>
> m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
> pp <- exp(m %*% coef(fit))
> pps <- ave(pp, dat.test$set, FUN=sum)
> pp/pps
>
> i.e. the conditional probability that an observation is a case given
> covariates and that there is one case in each set (in the data given,
> you have sets of three with one being a case, so all predicted
> probabilities are close to 0.33). For more general matched sets, I'm
> not really sure what one wants. Real experts are welcome to chime
> in.

For the general situation of n cases in a stratum of size N, you want the
probability that the unit in question is one of n units drawn from a
stratum of size N without replacement with unequal probabilities of
selection over the units.

I am *not* an expert on that, but there is plenty written on it.

 Horvitz, Daniel G., and Donovan J. Thompson. A generalization of
 sampling without replacement from a finite universe. Journal of the
 American Statistical Association 47.260 (1952): 663-685.

is a place to start.

The probability in question is a sum over the

factorial(n) * choose(N-1, n-1)

elements corresponding to the number of samples (and orders) that
include a chosen element.

Of course, for n=1 there is just the one element, pp/pps.
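A quick arithmetic check of that count (illustrative numbers; `count_samples` is a made-up helper, not from the thread):

```r
# Number of (ordered) samples of size n from a stratum of size N that
# contain a chosen element: factorial(n) * choose(N - 1, n - 1).
count_samples <- function(n, N) factorial(n) * choose(N - 1, n - 1)

count_samples(1, 5)  # n = 1: only the element itself, so 1
count_samples(2, 4)  # 2! * choose(3, 1) = 6
```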

HTH,

Chuck



Re: [R] Error in merge [negative length vectors are not allowed]

2014-06-16 Thread Prof Brian Ripley

On 16/06/2014 23:21, Kate Ignatius wrote:

> Hi All,
>
> I'm trying to merge two files together using:
>
> combinedfiles <- merge(comb1, comb2, by=c("Place","Stall","Menu"))
>
> comb1 is about 2 million + rows (158MB) and comb2 is about 600K+ rows (52MB).
>
> When I try to merge using the above syntax I get the error:
>
> Error in merge.data.frame(comb1, comb2, by = c("Place","Stall","Menu")) :
>   negative length vectors are not allowed
>
> Is there something that I'm doing wrong?  I've merged larger files


Not telling us the 'at a minimum' information asked for in the 
posting guide.



> together in the past without a problem so am curious what might be the
> problem here...
>
> Thanks in advance!
>
> ~K


This is usually an indication that you are trying to create more than 
2^31 rows in the result, which looks plausible given your data set 
sizes. AFAICS merge does not allow long vectors on a 64-bit system in 
released versions of R.
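One way to check whether the result would blow past that limit before calling merge() (a base-R sketch; `expected_merge_rows` is a made-up helper, and the tiny `comb1`/`comb2` below only demonstrate it):

```r
# Expected row count of an inner merge on a composite key: for each key
# shared by both tables, multiply the per-table counts, then sum.
expected_merge_rows <- function(d1, d2, keys) {
  k1 <- do.call(paste, c(d1[keys], sep = "\r"))
  k2 <- do.call(paste, c(d2[keys], sep = "\r"))
  t1 <- table(k1); t2 <- table(k2)
  shared <- intersect(names(t1), names(t2))
  sum(as.numeric(t1[shared]) * as.numeric(t2[shared]))
}

comb1 <- data.frame(Place = c("A","A","B"), Stall = 1, Menu = "x")
comb2 <- data.frame(Place = c("A","B","B"), Stall = 1, Menu = "x")
expected_merge_rows(comb1, comb2, c("Place","Stall","Menu"))  # 4
nrow(merge(comb1, comb2, by = c("Place","Stall","Menu")))     # also 4
```

Duplicated keys multiply: with 2 million rows on one side and 600K on the other, even modest key duplication can push the product past 2^31.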



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



[R] Error: C stack usage

2014-06-16 Thread Mohan Radhakrishnan
Hi,
   I am not relating this to shiny, but this error was thrown by my
shiny server code. Does it have anything to do with low memory settings?
I am spawning a JVM using rJava; that JVM is using the attach API to
connect to another JVM.

Thanks,
Mohan

Error: C stack usage  140730070087404 is too close to the limit
Error: C stack usage  140730070156700 is too close to the limit
 Warning: stack imbalance in '.Call', 59 then -1
Warning: stack imbalance in '{', 56 then -4

 *** caught bus error ***
address 0x100583fd8, cause 'non-existent physical address'


 *** caught bus error ***
address 0x100583fd8, cause 'non-existent physical address'

Traceback:
 1: run(timeoutMs)
 2: service(timeout)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: run(10)


 *** caught segfault ***
address 0x20057ea40, cause 'memory not mapped'

Traceback:
 1: run(timeoutMs)
 2: service(timeout)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

