Re: [R] plot in package psych with function error.bars.by
Hi William,

I've just updated to your latest package, psych_1.4.6.11.zip, from the server personality-project/r/src/contrib/. Once the update finished, I tried to run the sample code:

require(psych)
keys.list <- list(Agree = c(-1, 2:5), Conscientious = c(6:8, -9, -10),
                  Extraversion = c(-11, -12, 13:15), Neuroticism = c(16:20),
                  Openness = c(21, -22, 23, 24, -25))
keys <- make.keys(28, keys.list, item.labels = colnames(bfi))
scores <- scoreItems(keys, bfi, min = 1, max = 6)
error.bars.by(scores$scores, round(bfi$age/10)*10, by.var = TRUE,
              main = "BFI age trends", legend = 3,
              labels = colnames(scores$scores), xlab = "Age",
              ylab = "Mean item score")

I then got the following error:

Error in if (del == 0 && to == 0) return(to) :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In qt(1 - alpha/2, group.stats[[g]]$n - 1) : NaNs produced
2: In dt(ln, n - 1) : NaNs produced
3: In qt(alpha/2, n - 1) : NaNs produced

Could you tell me how to fix this? Did I perhaps make a mistake when updating to your latest package?

Sincerely,
Tham
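One plausible source of those NaN warnings (a guess, not verified against the psych internals): if a grouping level produced by round(bfi$age/10)*10 contains only a single observation, the confidence limits are computed with n - 1 = 0 degrees of freedom, and the t quantile is then NaN; the NaN limit presumably reaches seq() inside the plotting code, giving the "missing value where TRUE/FALSE needed" error. A minimal illustration:

n <- 1                        # a group with a single observation
alpha <- 0.05
qt(1 - alpha/2, n - 1)        # NaN, with a "NaNs produced" warning
table(round(bfi$age/10)*10)   # shows how many observations each age group has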
[R] Aggregating 15 minute xts sequence to hourly
Dear all,

Why does the aggregation of 15-minute xts data happen on the 45th minute (the 3rd quarter) and not at the exact hour close (i.e., :00)? For example, observation below is an xts sequence with 15-minute frequency:

quarters <- ISOdatetime(2012, 05, 02, 9, 0, 0) + seq(0:39)*15*60
set.seed(42)
observation <- xts(1:40, order.by = as.POSIXct(quarters))
head(observation)
                    [,1]
2012-05-02 09:15:00    1
2012-05-02 09:30:00    2
2012-05-02 09:45:00    3
2012-05-02 10:00:00    4
2012-05-02 10:15:00    5
2012-05-02 10:30:00    6

ends <- endpoints(observation, 'hours')
temp <- period.apply(observation, ends, sum)
temp
                    [,1]
2012-05-02 09:45:00    6
2012-05-02 10:45:00   22
2012-05-02 11:45:00   38
2012-05-02 12:45:00   54
2012-05-02 13:45:00   70
2012-05-02 14:45:00   86
2012-05-02 15:45:00  102
2012-05-02 16:45:00  118
2012-05-02 17:45:00  134
2012-05-02 18:45:00  150
2012-05-02 19:00:00   40

I get the sum of the quarters within each hour on the third quarter. How can I instead calculate the sum of the quarterly data at the hour's close (10:00, 11:00, 12:00 and so on)?

Many thanks in advance,
Costas

--
Costas Vorlow
http://www.linkedin.com/in/costasvorlow
http://www.vorlow.com
Re: [R] prediction based on conditional logistic regression clogit
On 16 Jun 2014, at 05:22, array chip arrayprof...@yahoo.com wrote:

> Hi, I am using clogit() from the survival package to do conditional logistic
> regression. I also need to make predictions on an independent dataset to
> calculate predicted probabilities. Here is an example:
>
> dat <- data.frame(set = rep(1:50, each = 3), status = rep(c(1,0,0), 50),
>                   x1 = rnorm(150, 5, 1), x2 = rnorm(150, 7, 1.5))
> dat.test <- data.frame(set = rep(1:30, each = 3), status = rep(c(1,0,0), 30),
>                        x1 = rnorm(90, 5, 1), x2 = rnorm(90, 7, 1.5))
> fit <- clogit(status ~ x1 + x2 + strata(set), dat)
> predict(fit, newdata = dat.test, type = 'expected')
>
> Error in Surv(rep(1, 150L), status) :
>   Time and status are different lengths
>
> Can anyone suggest what's wrong here?

The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. The analysis is therefore formally on Surv(rep(1, 150L), status), and that goes belly-up if you apply the same formula to a data set of a different length.

However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph(), are compatible with predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data = dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN = sum)
pp/pps

i.e. the conditional probability that an observation is a case given its covariates and that there is one case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in.

-pd

> Thanks!
> John

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
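Putting the snippet above together into a runnable check (this only restates the suggested computation; the single addition is verifying that the predicted probabilities sum to 1 within each matched set):

m    <- model.matrix(~ x1 + x2 - 1, data = dat.test)
pp   <- exp(m %*% coef(fit))                # exp(linear predictor); no baseline needed
pps  <- ave(pp, dat.test$set, FUN = sum)    # within-set totals
pred <- pp/pps                              # P(case | covariates, one case per set)
head(pred)
range(tapply(pred, dat.test$set, sum))      # each set's probabilities sum to 1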
[R] correlation given p value and sample size
Hi,

I am looking for a function which produces the minimum r (Pearson correlation) such that H0 (r = 0) can be rejected, given the sample size and the p-value.

Witold

--
Witold Eryk Wolski
Re: [R] Defining default method for S3, S4 and R5 classes
thanks Suzen

2014-06-15 2:34 GMT+02:00 Suzen, Mehmet msu...@gmail.com:

> There is a nice tutorial on this:
> http://adv-r.had.co.nz/OO-essentials.html
> For an in-depth guide, have a look at the book by John Chambers,
> Software for Data Analysis: Programming with R.
>
> On 13 June 2014 12:20, Luca Cerone luca.cer...@gmail.com wrote:
>> Dear all,
>> I am writing a script implementing a pipeline to analyze some of the data
>> we receive. One of the steps in this pipeline involves clustering the data,
>> and I am interested in studying the effects of different clustering
>> algorithms on the final results.
>>
>> I am having trouble making my code general enough, because the clustering
>> algorithms we are interested in all return different types of objects (S3,
>> S4 and R5 classes, as well as simple named lists). From the output of these
>> algorithms I need to extract a list with as many elements as the number of
>> clusters, such that each element contains the ids of the elements in that
>> cluster.
>>
>> I have easily done this for each of the clustering algorithms; the problem
>> is: how can I make it so that, rather than having to check for classes and
>> types, this is done automatically? For example, for the algorithms that
>> return S3 classes I have defined a method get_cluster_list.default and then
>> created the methods for the individual classes, which is used in the main
>> body of the pipeline. I have no idea how I can do this for S4 and R5
>> classes and, more importantly, I would like an approach that works across
>> S3, S4 and R5 classes.
>>
>> Do you know how I could do this?
>> Thanks for the help,
>> Luca

--
Luca Cerone
Tel: +34 692 06 71 28
Skype: luca.cerone
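One possible pattern (a sketch only; "FuzzyClust" and its labels slot are made-up stand-ins for whatever S4 class a given clustering package returns): keep the S3 generic, promote it to an S4 generic so setMethod() can be used for S4 results, and rely on the fact that reference-class (R5) objects also carry their class name in class(x), so S3 methods named after the reference class are dispatched as well.

get_cluster_list <- function(x, ...) UseMethod("get_cluster_list")

# default: anything list-like carrying a vector of cluster assignments
get_cluster_list.default <- function(x, ...) {
  cl <- if (!is.null(x$cluster)) x$cluster else x$membership
  split(seq_along(cl), cl)
}

# plain S3 method, e.g. for stats::kmeans results
get_cluster_list.kmeans <- function(x, ...) split(seq_along(x$cluster), x$cluster)

# promote the same name to an S4 generic (the existing function becomes the
# default), then register methods for S4 classes
setGeneric("get_cluster_list")
setClass("FuzzyClust", slots = c(labels = "integer"))   # toy S4 class
setMethod("get_cluster_list", "FuzzyClust",
          function(x, ...) split(seq_along(x@labels), x@labels))

get_cluster_list(kmeans(iris[, 1:4], 3))                    # S3 dispatch
get_cluster_list(new("FuzzyClust", labels = rep(1:2, 5L)))  # S4 dispatch
# An R5 object of class "MyClust" would be picked up by an S3 method
# named get_cluster_list.MyClust.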
Re: [R] Aggregating 15 minute xts sequence to hourly
On Mon, Jun 16, 2014 at 3:41 AM, Costas Vorlow costas.vor...@gmail.com wrote:
> Dear all,
>
> Why does the aggregation of 15-minute xts data happen on the 45th minute
> (the 3rd quarter) and not at the exact hour close (i.e., :00)?

The :00 time is the beginning of the hour, not the end. E.g., 10:00:00 is the beginning of the 10-o'clock hour.

> For example, observation below is an xts sequence with 15-minute frequency:
>
> quarters <- ISOdatetime(2012, 05, 02, 9, 0, 0) + seq(0:39)*15*60
> set.seed(42)
> observation <- xts(1:40, order.by = as.POSIXct(quarters))
> head(observation)
>                     [,1]
> 2012-05-02 09:15:00    1
> 2012-05-02 09:30:00    2
> 2012-05-02 09:45:00    3
> 2012-05-02 10:00:00    4
> 2012-05-02 10:15:00    5
> 2012-05-02 10:30:00    6
>
> ends <- endpoints(observation, 'hours')
> temp <- period.apply(observation, ends, sum)
> temp
>                     [,1]
> 2012-05-02 09:45:00    6
> 2012-05-02 10:45:00   22
> 2012-05-02 11:45:00   38
> 2012-05-02 12:45:00   54
> 2012-05-02 13:45:00   70
> 2012-05-02 14:45:00   86
> 2012-05-02 15:45:00  102
> 2012-05-02 16:45:00  118
> 2012-05-02 17:45:00  134
> 2012-05-02 18:45:00  150
> 2012-05-02 19:00:00   40
>
> I get the sum of the quarters within each hour on the third quarter. How can
> I instead calculate the sum of the quarterly data at the hour's close (10:00,
> 11:00, 12:00 and so on)?

Again, those are the beginnings of the hours. endpoints() and period.apply() only use the timestamps in your data. If you want to round up to the beginning of the next hour, use align.time().

align.time(temp, 3600)
                    [,1]
2012-05-02 10:00:00    6
2012-05-02 11:00:00   22
2012-05-02 12:00:00   38
2012-05-02 13:00:00   54
2012-05-02 14:00:00   70
2012-05-02 15:00:00   86
2012-05-02 16:00:00  102
2012-05-02 17:00:00  118
2012-05-02 18:00:00  134
2012-05-02 19:00:00  150
2012-05-02 20:00:00   40

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading   |  www.fosstrading.com
Re: [R] correlation given p value and sample size
There's a simple relation

t = r / sqrt(1 - r^2) * sqrt(n - 2)
r = t / sqrt(n - 2 + t^2)

where t has a t distribution on n - 2 df. Insert t = +-qt(p/2, n-2).

-pd

On 16 Jun 2014, at 11:23, Witold E Wolski wewol...@gmail.com wrote:

> Hi,
> I am looking for a function which produces the minimum r (Pearson
> correlation) such that H0 (r = 0) can be rejected, given the sample size
> and the p-value.
> Witold
> --
> Witold Eryk Wolski

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
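A small sketch implementing the relation above (the function name is made up): the smallest |r| that reaches significance at level alpha in a two-sided test with sample size n.

r_crit <- function(n, alpha = 0.05) {
  t <- qt(1 - alpha/2, df = n - 2)   # critical t on n - 2 df
  t / sqrt(n - 2 + t^2)
}
r_crit(20)   # about 0.444: any |r| >= 0.444 rejects H0 at the 5% level for n = 20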
Re: [R] Help with SEM package - model significance
Dear Bernardo,

The df for the LR chi-square over-identification test come not from the number of observations, but from the difference between the number of observed variances and covariances, on the one hand, and the number of free parameters to estimate, on the other. In your case, these numbers are equal, and so df = 0. The LR chi-square for a just-identified model is also necessarily 0: the model perfectly reproduces the covariance structure of the observed variables.

R (and most statistical software) by default writes very small and very large numbers in scientific format. In your case, -2.873188e-13 = -2.87*10^-13, that is, 0 within rounding error. You can change the way numbers are printed with the R scipen option.

Some other observations: (1) Your model is recursive and has no latent variables; you would get the same estimates from OLS regression using lm(). (2) For quite some time now, the sem package has included specifyEquations() as a more convenient way of specifying a model, in preference to specifyModel(). See ?specifyEquations. (3) You don't have to specify the error variances directly; specifyEquations(), or specifyModel(), will supply them.

I hope this helps,
John

John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Sun, 15 Jun 2014 20:15:31 -0700 (PDT) Bernardo Santos bernardo_brand...@yahoo.com.br wrote:

> Dear all,
>
> I used the sem function from the package sem to fit a model. However, I
> cannot say whether the model corresponds to the data or not (chi-square
> test). I used the commands:
>
> model1 <- specifyModel()
> estadio -> compflora, a1, NA
> estadio -> compfauna, a2, NA
> estadio -> interacoesobs, a3, NA
> compflora -> compfauna, b1, NA
> compflora -> interacoesobs, b2, NA
> compfauna -> interacoesobs, c1, NA
> estadio <-> estadio, e1, NA
> compflora <-> compflora, e2, NA
> compfauna <-> compfauna, e3, NA
> interacoesobs <-> interacoesobs, e4, NA
>
> sem1 <- sem(model1, cov.matrix, length(samples))
> summary(sem1)
>
> and I got the result:
>
> Model Chisquare = -2.873188e-13   Df = 0   Pr(>Chisq) = NA
> AIC = 20
> BIC = -2.873188e-13
>
> Normalized Residuals
>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
> 0.000e+00 0.000e+00 2.957e-16 3.193e-16 5.044e-16 8.141e-16
>
> R-square for Endogenous Variables
>     compflora     compfauna interacoesobs
>        0.0657        0.1056        0.2319
>
> Parameter Estimates
>    Estimate     Std Error    z value    Pr(>|z|)
> a1 3.027344e-01 1.665395e-01 1.81779316 6.909575e-02 compflora <--- estadio
> a2 2.189427e-01 1.767404e-01 1.23878105 2.154266e-01 compfauna <--- estadio
> a3 7.314192e-03 1.063613e-01 0.06876742 9.451748e-01 interacoesobs <--- estadio
> b1 2.422906e-01 1.496290e-01 1.61927587 1.053879e-01 compfauna <--- compflora
> b2 3.029933e-01 9.104901e-02 3.32780446 8.753328e-04 interacoesobs <--- compflora
> c1 4.863368e-02 8.638177e-02 0.56300857 5.734290e-01 interacoesobs <--- compfauna
> e1 6.918133e+04 1.427102e+04 4.84767986 1.249138e-06 estadio <--> estadio
> e2 9.018230e+04 1.860319e+04 4.84767986 1.249138e-06 compflora <--> compflora
> e3 9.489661e+04 1.957568e+04 4.84767986 1.249138e-06 compfauna <--> compfauna
> e4 3.328072e+04 6.865289e+03 4.84767986 1.249138e-06 interacoesobs <--> interacoesobs
>
> Iterations = 0
>
> I understand the results, but I do not know how to interpret the first line
> that tells me about the model:
> Model Chisquare = -2.873188e-13   Df = 0   Pr(>Chisq) = NA
> How can df be zero, if the number of observations I used in the sem function
> was 48 and I have only 4 variables? What is the p-value?
>
> Thanks in advance.
> Bernardo Niebuhr
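The df arithmetic behind that reply, written out for this particular model (simple counting, nothing package-specific): with 4 observed variables there are 4*5/2 = 10 distinct variances and covariances to fit, and the model estimates 6 path coefficients (a1-a3, b1, b2, c1) plus 4 variances (e1-e4), i.e. 10 free parameters, so df = 10 - 10 = 0 and the model is just-identified.

p <- 4                       # observed variables
moments <- p * (p + 1) / 2   # 10 distinct variances/covariances
free    <- 6 + 4             # 10 free parameters (paths + variances)
moments - free               # 0 degrees of freedom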
Re: [R] Aggregating 15 minute xts sequence to hourly
Dear Joshua,

Thanks for your reply. As I see it, the solution you suggest aligns the time stamps as required but leaves the aggregation results as they are. Hence the last quarter of each hour is still not aggregated... Am I right, or am I misunderstanding something?

I tried to move ends ahead by one (ends <- ends + 1), but that does not work either. It seems that if you change the endpoints, aggregation still happens every 45 minutes, as you pointed out, even though the ends variable points to the round-hour time stamp...

--
Costas Vorlow
http://www.linkedin.com/in/costasvorlow
http://www.vorlow.com

On 16 June 2014 13:31, Joshua Ulrich josh.m.ulr...@gmail.com wrote:
> Again, those are the beginnings of the hours. endpoints() and period.apply()
> only use the timestamps in your data. If you want to round up to the
> beginning of the next hour, use align.time().
[snip]
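If the intention is that an observation stamped exactly on the hour (e.g. 10:00) closes that hour's bucket, one way is to compute the endpoints on an index shifted back by one second and then apply the sum to the original series (a sketch; it assumes the observation object and the endpoints()/period.apply() setup from earlier in the thread):

library(xts)
# shift the index back 1 second so 10:00:00 falls inside the 9-o'clock hour
obs_shift <- xts(coredata(observation), order.by = index(observation) - 1)
ep <- endpoints(obs_shift, "hours")
period.apply(observation, ep, sum)
# 2012-05-02 10:00:00   10   (= 1+2+3+4, i.e. 09:15 through 10:00)
# 2012-05-02 11:00:00   26   (= 5+6+7+8), and so on; labels now fall on the hour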
[R] xml package - free document / memory leak
Dear R helpers,

I have a problem releasing memory after having read .xml files using the XML package (version 3.98-1.1, R 3.0.2, Windows 7, 64-bit). The problem has appeared previously and several solutions/bug fixes have been proposed. I went through many of these and have also read (and understood?) Duncan Temple Lang's "Memory Management" page, outlining the counter-based memory release for nodes and documents. However, the problem persists, i.e.,

filed <- ...   # some PubChem .xml file path
doc <- xmlTreeParse(file = filed, useInternalNodes = TRUE)
get_data <- getNodeSet(doc, path = "//r:PC-InfoData",
                       c(r = "http://www.ncbi.nlm.nih.gov"))

will not allow me to release doc from memory using combinations/orders of rm(), free(), gc() for doc and get_data. I ended up using

.Call("RS_XML_forceFreeDoc", doc)

and monitoring the counter settings with .Call("R_getXMLRefCount", ...) - but that cannot really be the solution, can it? What am I doing wrong?

Thank you very much for your help,
Martin
[R] Power graph for two Proportion
Dear R mailing listers,

I am trying to compute power for different sample sizes with p1 = 0.2 and p2 = 0.4, at significance level 0.05 (one-sided test). I would like a graph with power on the y-axis and sample size on the x-axis. So far I have been running this command for different values of power, and collecting n and power to draw the graph:

pwr.2p.test(h = ES.h(0.4, 0.2), power = 0.87, sig.level = 0.05,
            alternative = "greater")

Is there an easier way to do this in R?

Thanks,
Suga
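A short sketch of the direct route with the pwr package (same effect size and test as above): supply n instead of power, let pwr.2p.test() return the power, and loop over a grid of sample sizes.

library(pwr)
h <- ES.h(0.4, 0.2)
n <- seq(10, 200, by = 5)                       # per-group sample sizes to evaluate
pow <- sapply(n, function(ni)
  pwr.2p.test(h = h, n = ni, sig.level = 0.05,
              alternative = "greater")$power)
plot(n, pow, type = "l", xlab = "Sample size per group", ylab = "Power")
abline(h = 0.87, lty = 2)                       # the power level used in the post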
[R] Determination lag order - problem with daily data and AR / ARIMA
Hello,

I am trying to determine a lag order for my data with the help of AIC and/or BIC, in order to conduct further tests. The data are prices measured at a daily frequency (weekends and holidays excluded).

1) My first approach was to approximate the process with an AR model using the function ar(x, ...) and a loop to try several lags, and then determine the AIC and BIC values for each lag to find the lowest one. However, when I try to use the BIC function, or AIC with k = log(length(time series)), it does not work. The error says that the model is of class "ar" and AIC cannot work with that. [This is not the loop, but just the general problem when passing an ar model to AIC:]

model <- ar(price, aic = FALSE, method = "ols")
AIC(model, k = 2)
Error in UseMethod("logLik") :
  no applicable method for 'logLik' applied to an object of class "ar"
AIC(model, k = log(length(price_G)))
Error in UseMethod("logLik") :
  no applicable method for 'logLik' applied to an object of class "ar"

2) Alternatively, I know that ar selects the lag order via the AIC criterion by default, but it suggests 40 lags, which appears quite high to me. Therefore, I wanted to check this result for robustness by applying BIC. But that doesn't work, due to the problem explained above.

3) Another option was to use an ARIMA model with order = c(lags, 0, 0) and then determine the AIC and BIC values. That does generally work, but it calculates AICs and BICs of zero for every lag. That doesn't make sense to me.

So I think I may have a problem in classifying my daily data: I just passed the numeric vector to the model-fitting functions. How can I classify the daily data as a time series correctly? (Because weekends and holidays are missing.) I've tried the zoo package, but then I can't use the ARIMA function any more.

Can anyone offer any help to make one of the three approaches work?

Thanks in advance!
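On point 1), a sketch of one workaround (price is the poster's own vector; the lag bound is arbitrary): ar() objects have no logLik method, but arima() objects do, so AIC() - and a BIC computed via the k argument - work directly on arima fits.

max_lag <- 10                                   # arbitrary upper bound on the order
ic <- t(sapply(1:max_lag, function(p) {
  fit <- arima(price, order = c(p, 0, 0), method = "ML")
  c(lag = p,
    AIC = AIC(fit),
    BIC = AIC(fit, k = log(length(price))))     # BIC = AIC with k = log(n)
}))
ic[which.min(ic[, "BIC"]), ]                    # order with the smallest BIC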
[R] Question on JAVA and R
As I mentioned today, my colleague Steven Peñaloza and I are trying to write a small program for supervised classification of images with non-conventional methods using R, connecting R libraries and our own R code to a Java environment; in our case we are using NetBeans to build the interface. The connection itself already works: it runs the functions and the classification, but only in a purely procedural way. The problem occurs when we try to implement a small GUI, so that variables created by R calls are stored on the Java side. With Double values, for example, we can move data between one language and the other, but we do not know how to do it with a .TIFF image, which is our case. The following lines of code open a file-browser window; an image (.TIFF) is opened and assigned to the variable img, but it stays inside R, and at that point we cannot, for example, attach it to a button so that the variable can be used in a Java class or event:

code.addRCode("fileName <- tclvalue(tkgetOpenFile())");
code.addRCode("img <- brick(fileName)");

Thank you very much.

--
Luis Eduardo Castillo Méndez
Mathematician - MSc in Statistics
Associate Professor, Ingeniería Catastral y Geodesia
[R] Any refit function available for 'car' package?
Suppose that I need to run a multivariate linear model Y = X B + E many times, with the same model matrix X but each time with a different response matrix Y. Is there a function available in the 'car' package, similar to refit() in the lme4 package, so that the model matrix X does not have to be reassembled each time? Runtime could also be saved by not repeatedly performing the same matrix computations, such as (X'X)^(-1)X'.

Thanks,
Gang
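Not a car-specific answer, but a base-R sketch of the reuse idea described above (X, Y1, Y2 are placeholder names): factor the fixed model matrix once with qr() and reuse the factorization for every new response matrix, so nothing like (X'X)^(-1)X' is recomputed.

qx <- qr(X)                  # one-time factorization of the fixed model matrix
B1 <- qr.coef(qx, Y1)        # coefficient matrix for response matrix Y1
B2 <- qr.coef(qx, Y2)        # ... and for Y2, with no refactorization
E1 <- qr.resid(qx, Y1)       # residuals, if error SSCP matrices are needed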
[R] R, Rserve logging
I am running R in daemon mode via Rserve (to connect to Java). It produces a constant flow of output. Is there a way to turn this off, or to set it to rotate, etc.? In other words, I want to find out how R/Rserve handles logging.
[R] glm.fit: fitted probabilities numerically 0 or 1 occurred for a continuous variable?
I have gotten this error before:

glm.fit: fitted probabilities numerically 0 or 1 occurred

and the problem was usually solved by combining one or more categories where there were no observations. I am now having this error show up for a variable that is continuous (not categorical). What could be the cause of this for a continuous variable?

Thanks,
Nick
Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred for a continuous variable?
On Jun 16, 2014, at 2:34 PM, Nwinters nicholas.wint...@mail.mcgill.ca wrote:

> I have gotten this error before:
>
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> and the problem was usually solved by combining one or more categories where
> there were no observations. I am now having this error show up for a variable
> that is continuous (not categorical). What could be the cause of this for a
> continuous variable?
>
> Thanks,
> Nick

Presuming that this is logistic regression (family = binomial), the error is suggestive of complete or near-complete separation in the association between your continuous IV and your binary response. This can occur if there is a breakpoint within the range of your IV where the dichotomous event is present on one side of the break and absent on the other side.

The resolution of the problem will depend upon first confirming its etiology and then, within the context of subject-matter expertise, making some decisions on how to proceed. If you Google "logistic regression separation", you will find some resources that can be helpful.

Regards,
Marc Schwartz
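A small illustration of complete separation with a continuous predictor (simulated data, just to show how the warning arises): every x below the break belongs to y = 0 and every x above it to y = 1, so the fitted probabilities are pushed to 0 and 1.

set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 10))   # two widely separated groups
y <- rep(0:1, each = 50)
fit <- glm(y ~ x, family = binomial)
# Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
range(fitted(fit))                                  # essentially 0 and 1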
Re: [R] prediction based on conditional logistic regression clogit
Thank you Peter. Any other suggestions are absolutely welcome!!

John

From: peter dalgaard pda...@gmail.com
Cc: r-help@r-project.org
Sent: Monday, June 16, 2014 2:22 AM
Subject: Re: [R] prediction based on conditional logistic regression clogit

[snip]
[R] Error in merge [negative length vectors are not allowed]
Hi All,

I'm trying to merge two files together using:

combinedfiles <- merge(comb1, comb2, by = c("Place", "Stall", "Menu"))

comb1 has about 2 million+ rows (158MB) and comb2 about 600K+ rows (52MB). When I try to merge using the above syntax I get the error:

Error in merge.data.frame(comb1, comb2, by = c("Place", "Stall", "Menu")) :
  negative length vectors are not allowed

Is there something I'm doing wrong? I've merged larger files together in the past without a problem, so I am curious what the problem might be here...

Thanks in advance!
~K
Re: [R] prediction based on conditional logistic regression clogit
peter dalgaard pdalgd at gmail.com writes:

> On 16 Jun 2014, at 05:22, array chip arrayprofile at yahoo.com wrote:
>
>> Hi, I am using clogit() from the survival package to do conditional logistic
>> regression. I also need to make predictions on an independent dataset to
>> calculate predicted probabilities. Here is an example:
>> [snip]
>> Can anyone suggest what's wrong here?
>
> The direct cause is that clogit() works by using the fact that the likelihood
> is equivalent to a coxph() likelihood with stratification and all observation
> lengths set to 1. The analysis is therefore formally on
> Surv(rep(1, 150L), status), and that goes belly-up if you apply the same
> formula to a data set of a different length.
>
> However, notice that there is no such thing as predict.clogit(), so you are
> attempting predict.coxph() on a purely formal Cox model. It is unclear to
> what extent predicted values, in the sense of coxph(), are compatible with
> predictions in conditional logit models.
>
> I'm rusty on this, but I think what you want is something like
>
> m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
> pp <- exp(m %*% coef(fit))
> pps <- ave(pp, dat.test$set, FUN=sum)
> pp/pps
>
> i.e. the conditional probability that an observation is a case given its
> covariates and that there is one case in each set (in the data given, you
> have sets of three with one being a case, so all predicted probabilities are
> close to 0.33). For more general matched sets, I'm not really sure what one
> wants. Real experts are welcome to chime in.

For the general situation of n cases in a stratum of size N, you want the probability that the unit in question is one of n units drawn from a stratum of size N without replacement, with unequal probabilities of selection over the units. I am *not* an expert on that, but there is plenty written on it.

Horvitz, D. G., and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47(260), 663-685.

is a place to start. The probability in question is a sum over the factorial(n)*choose(N-1, n-1) elements corresponding to the number of samples (and orders) that include a chosen element. Of course, for n = 1 there is just the one element, pp/pps.

HTH,
Chuck
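For small strata, one concrete (if brute-force) reading - assuming the exact conditional-logistic likelihood over unordered case sets rather than the ordered-draw formulation above - is to enumerate all n-subsets and sum the weights of those containing a given unit; with n = 1 this reduces to pp/pps as noted.

# eta: linear predictors for the units of one stratum; n: number of cases
case_prob <- function(eta, n) {
  N <- length(eta)
  subs <- combn(N, n)                                # all candidate case sets
  w <- apply(subs, 2, function(s) exp(sum(eta[s])))  # weight of each set
  vapply(seq_len(N),
         function(i) sum(w[colSums(subs == i) > 0]) / sum(w),
         numeric(1))                                 # P(unit i is among the cases)
}
case_prob(c(0.2, -0.1, 0.5, 0.3), n = 2)             # toy stratum of 4 with 2 cases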
Re: [R] Error in merge [negative length vectors are not allowed]
On 16/06/2014 23:21, Kate Ignatius wrote:
> Hi All,
>
> I'm trying to merge two files together using:
>
> combinedfiles <- merge(comb1, comb2, by = c("Place", "Stall", "Menu"))
>
> comb1 has about 2 million+ rows (158MB) and comb2 about 600K+ rows (52MB).
> When I try to merge using the above syntax I get the error:
>
> Error in merge.data.frame(comb1, comb2, by = c("Place", "Stall", "Menu")) :
>   negative length vectors are not allowed
>
> Is there something I'm doing wrong? I've merged larger files

Not telling us the 'at a minimum' information asked for in the posting guide.

> together in the past without a problem, so I am curious what the problem
> might be here...
>
> Thanks in advance!
> ~K

This is usually an indication that you are trying to create more than 2^31 rows in the result, which looks plausible given your data set sizes. AFAICS merge does not allow long vectors on a 64-bit system in released versions of R.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
Fax: +44 1865 272595
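A quick way to check, before running the merge, whether a many-to-many key match is blowing the result past 2^31 rows (a sketch; column names are taken from the by= argument in the original post): an inner merge produces, for each key present in both tables, the product of its row counts.

key1 <- do.call(paste, c(comb1[c("Place", "Stall", "Menu")], sep = "\r"))
key2 <- do.call(paste, c(comb2[c("Place", "Stall", "Menu")], sep = "\r"))
t1 <- table(key1)
t2 <- table(key2)
common <- intersect(names(t1), names(t2))
sum(as.numeric(t1[common]) * as.numeric(t2[common]))   # rows the merge would produce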
[R] Error: C stack usage
Hi,

I am not saying this is a shiny issue, but the error below was thrown by my shiny server code. Does it have anything to do with low memory settings? I am spawning a JVM using rJava; that JVM then uses the attach API to connect to another JVM.

Thanks,
Mohan

Error: C stack usage 140730070087404 is too close to the limit
Error: C stack usage 140730070156700 is too close to the limit
Warning: stack imbalance in '.Call', 59 then -1
Warning: stack imbalance in '{', 56 then -4

*** caught bus error ***
address 0x100583fd8, cause 'non-existent physical address'

*** caught bus error ***
address 0x100583fd8, cause 'non-existent physical address'

Traceback:
 1: run(timeoutMs)
 2: service(timeout)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: run(10)

*** caught segfault ***
address 0x20057ea40, cause 'memory not mapped'

Traceback:
 1: run(timeoutMs)
 2: service(timeout)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace