Re: [R] Limiting the scope of RNGkind/set.seed
Elizabeth,

There is a package (of mine), setRNG, on CRAN that may be a helpful example (code/tests/examples/vignette). Most of the package is testing, designed to fail if the RNG in R is changed in a way that will affect my other packages' testing. Martin's function in the previous reply has most of the important parts and adds warning suppression, so you might want to consider small adjustments on a combination of the two.

Just to summarize the issues, from memory:

0/ Using a preset default seed in a function's argument makes the function not random by default. If you are doing that then maybe you need to consider carefully whether the function should be using random number generation.

1/ It is good practice to use on.exit() in your function to reset things, so the state remains unaltered if your function fails.

2/ Saving the old seed does not work when it is unset, as it is by default in a new session, so you need to do something that ensures it is set.

3/ You may need to save and reset not only the seed but also the RNG kind and the normal.kind, and possibly the kind for some other distributions. (setRNG does not handle other distributions.) It looks like you need to save and reset sample.kind too.

4/ You should add the capability to pass all settings to your functions so that you can reproduce things when you want.

5/ I have found it useful to always pass back the settings in objects returned by functions like simulations. That way you always have a record when you discover something you want to reproduce.

6/ If parallel computing is considered, then for reproducibility you need to save the number of nodes in the cluster. (I think this point is not as widely known as it should be.)

No doubt I have forgotten a few things.

Paul Gilbert

On 4/17/19 6:00 AM, r-help-requ...@r-project.org wrote:
> Date: Tue, 16 Apr 2019 19:22:34 +0200
> From: Martin Maechler
> To: Elizabeth Purdom
> Cc: Bert Gunter, R-help
> Subject: Re: [R] Limiting the scope of RNGkind/set.seed
>
>>>>>> Elizabeth Purdom
>>>>>>     on Tue, 16 Apr 2019 09:45:45 -0700 writes:
>
> > Hi Bert, Thanks for your response. What you suggest is more or less
> > the fix I suggested in my email (my second version of .rcolors). I am
> > writing more because I was wondering if there was a better way to
> > work with RNG that would avoid doing that. It doesn't feel very
> > friendly for my package to be making changes to the user's global
> > environment, even though I am setting them back (and if it weren't
> > for the fact that setting the new R 3.6 argument
> > `sample.kind="Rounding"` creates a warning, I wouldn't have even
> > realized I was affecting the user's settings, so it seems potentially
> > hazardous that packages could be changing users' settings without
> > them being aware of it). So I was wondering if there was a way to
> > more fully isolate the command. Thanks, Elizabeth
>
> Hi Elizabeth,
>
> there's actually something better -- I think -- that you can do:
>
> You store .Random.seed before doing an RNGkind() & set.seed() setting,
> do all that, and make sure that .Random.seed is restored when leaving
> your function.
>
> This works because the (typically quite long) .Random.seed stores the
> full state of the RNG, i.e., all RNGkind() settings *and* the result of
> set.seed(), calling r<dist>(n, ..) etc.
>
> If you additionally use on.exit() instead of manually resetting things,
> you have the additional advantage that things are also reset when your
> function ends because the user interrupts its computations, or an error
> happens, etc.
>
> So, your function would more elegantly (and robustly!) look like
>
> .rcolors <- function(seed = 23589) {
>   if(!exists(".Random.seed", envir = .GlobalEnv)) {
>     message("calling runif(1)"); runif(1) }
>   old.R.s <- .Random.seed
>   ## will reset everything on exiting this function:
>   on.exit(assign(".Random.seed", old.R.s, envir = .GlobalEnv))
>   ## set seed for sample() "back compatibly":
>   suppressWarnings(RNGversion("3.5.0"))
>   set.seed(seed)
>   ## return random permutation of "my colors"
>   sample(colors()[-c(152:361)])
> }
>
> BTW, you can look at simulate() methods in standard R, e.g.,
>
>   stats:::simulate.lm
>
> to see the same method used [optionally, with slightly more
> sophistication].
>
> Best,
> Martin
>
> Martin Mächler
> ETH Zurich, Switzerland
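A minimal sketch pulling the numbered points together (the function name withLocalRNG and its defaults are illustrative, not from setRNG; note that restoring .Random.seed also restores the kinds, since they are encoded in it):

  withLocalRNG <- function(expr, seed = 42) {
    if (!exists(".Random.seed", envir = .GlobalEnv))
      runif(1)                           # point 2: make sure a seed exists
    old.seed <- get(".Random.seed", envir = .GlobalEnv)
    ## point 1: reset on exit, even after an error or interrupt; this
    ## also restores kind, normal.kind and sample.kind (point 3)
    on.exit(assign(".Random.seed", old.seed, envir = .GlobalEnv))
    set.seed(seed)                       # point 4: settings passed in
    val <- expr
    ## point 5: return the settings along with the result
    list(value = val, rng = list(kind = RNGkind(), seed = seed))
  }

  res <- withLocalRNG(sample(10), seed = 1)   # example use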
Re: [R] Random Seed Location
On 03/04/2018 07:14 PM, Henrik Bengtsson wrote:
> On Sun, Mar 4, 2018 at 3:23 PM, Duncan Murdoch wrote:
> ...
>> An issue is that .Random.seed doesn't contain the full state of the
>> RNG system, so restoring it doesn't necessarily lead to an identical
>> sequence of output. The only way to guarantee the sequence will repeat
>> is to call set.seed(n), and that only leads to a tiny fraction of
>> possible states. Expanding .Random.seed so that it does contain the
>> full state would be a good idea, and that would make your
>> preserveRandomSeed really easy to write. Here's a demo that
>> .Random.seed is not enough:
>>
>> set.seed(123, normal.kind = "Box-Muller")
>> rnorm(1)
>> [1] -0.1613431
>> save <- .Random.seed
>> rnorm(1)
>> [1] 0.6706031
>> .Random.seed <- save
>> rnorm(1)
>> [1] -0.4194403
>>
>> If .Random.seed were the complete state, the 2nd and 3rd rnorm results
>> would be the same.

To be pedantic, it is not the RNG state that is the problem, it is the state of the normal transformation "Box-Muller". And, again pedantically:

> So, this is only for some of the RNG kinds.

As I recall, it is not a problem for any of the RNG kinds; it is only a problem with the Box-Muller normal.kind. Things may have changed, and parallel adds the need to record the number of nodes.

> Is the reason for this limitation that it is not possible for R (not
> even the R internals) to get hold of some of the RNG generators? In
> other words, it is unlikely to ever be fixed?

It has been around for a very long time (since at least R 0.99) and, as I recall, it was not easy to fix. I think the more substantial reason is that Box-Muller is no longer the default or preferred normal generator. It may only be used for backward compatibility, in which case messing with it could be a disaster with very little potential benefit.

Paul
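For contrast (not from the thread, but easy to check): with the default "Inversion" normal.kind there is no cached transformation state, so the same save/restore does repeat the value:

  set.seed(123, normal.kind = "Inversion")
  rnorm(1)
  save <- .Random.seed
  rnorm(1)            # some value
  .Random.seed <- save
  rnorm(1)            # the same value again, unlike with Box-Muller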
Re: [R] Random Seed Location
On Mon, Feb 26, 2018 at 3:25 PM, Gary Black <gwblack...@sbcglobal.net> wrote:

(Sorry to be a bit slow responding.)

You have not supplied a complete example, which would be good in this case because what you are suggesting could be a serious bug in R or a package. Serious journals require reproducibility these days. For example, JSS is very clear on this point.

To your question

> My question simply is: should the location of the set.seed command
> matter, provided that it is applied before any commands which involve
> randomness (such as partitioning)?

the answer is no, it should not matter. But the proviso is important. You can determine where things are messing up using something like

  set.seed(654321)
  zk <- RNGkind()   # [1] "Mersenne-Twister" "Inversion"
  zk
  z <- runif(2)
  z

  set.seed(654321)
  # install.packages(c('caret', 'ggplot2', 'e1071'))
  library(caret)
  all(runif(2) == z)   # should be TRUE but it is not always

  set.seed(654321)
  library(ggplot2)
  all(runif(2) == z)   # should be TRUE

  set.seed(654321)
  library(e1071)
  all(runif(2) == z)   # should be TRUE

  all(RNGkind() == zk) # should be TRUE

On my computer package caret seems to sometimes, but not always, do something that advances or changes the RNG. So you will need to set the seed after that package is loaded if you want reproducibility.

As Bill Dunlap points out, parallel can introduce much more complicated issues. If you are in fact using parallel then we really need a new thread with a better subject line, and the discussion will get much messier. The short answer is that, yes, you should be able to get reproducible results with parallel computing. If you cannot, then you are almost certainly doing something wrong. To publish you really must have reproducible results.

In the example that Bill gave, I think the problem is that set.seed() only resets the seed in the main thread; the nodes continue to operate with unreset RNGs. To demonstrate this to yourself you can do

  library(parallel)
  cl <- parallel::makeCluster(3)
  parallel::clusterCall(cl, function() set.seed(100))
  parallel::clusterCall(cl, function() RNGkind())
  parallel::clusterCall(cl, function() runif(2))
  # similar result from all nodes
  # [1] 0.3077661 0.2576725

However, do *NOT* do that in real work. You will be getting the same RNG stream from each node. If you are using random numbers and parallel you need to read a lot more, and probably consider a variant of the "L'Ecuyer" generator or something designed for parallel computing.

One special point I will mention because it does not seem to be widely appreciated: the number of nodes affects the random stream, so recording the number of compute nodes along with the RNG and seed information is important for reproducible results. This has the unfortunate consequence that an experiment cannot be reproduced on a larger cluster. (If anyone knows differently I would very much like to hear.)

Paul Gilbert

> Hi all,
>
> For some odd reason when running naïve bayes, k-NN, etc., I get
> slightly different results (e.g., error rates, classification
> probabilities) from run to run even though I am using the same random
> seed.
>
> Nothing else (input-wise) is changing, but my results are somewhat
> different from run to run. The only randomness should be in the
> partitioning, and I have set the seed before this point.
>
> My question simply is: should the location of the set.seed command
> matter, provided that it is applied before any commands which involve
> randomness (such as partitioning)?
>
> If you need to see the code, it is below:
>
> Thank you,
> Gary
>
> A. Separate the original (in-sample) data from the new (out-of-sample)
> data. Set a random seed.
>
>> InvestTech <- as.data.frame(InvestTechRevised)
>> outOfSample <- InvestTech[5001:nrow(InvestTech), ]
>> InvestTech <- InvestTech[1:5000, ]
>> set.seed(654321)
>
> B. Install and load the caret, ggplot2 and e1071 packages.
>
>> install.packages("caret")
>> install.packages("ggplot2")
>> install.packages("e1071")
>> library(caret)
>> library(ggplot2)
>> library(e1071)
>
> C. Bin the predictor variables with approximately equal counts using
> the cut_number function from the ggplot2 package. We will use 20 bins.
>
>> InvestTech[, 1] <- cut_number(InvestTech[, 1], n = 20)
>> InvestTech[, 2] <- cut_number(InvestTech[, 2], n = 20)
>> outOfSample[, 1] <- cut_number(outOfSample[, 1], n = 20)
>> outOfSample[, 2] <- cut_number(outOfSample[, 2], n = 20)
>
> D. Partition the original (in-sample) data into 60% training and 40%
> validation sets.
>
>> n <- nrow(InvestTech)
>
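A sketch of the approach alluded to above, using the "L'Ecuyer-CMRG" generator through parallel::clusterSetRNGStream(), which gives each node its own reproducible substream (the iseed value is arbitrary):

  library(parallel)
  cl <- makeCluster(3)
  clusterSetRNGStream(cl, iseed = 654321)   # distinct stream per node
  res1 <- clusterCall(cl, function() runif(2))
  clusterSetRNGStream(cl, iseed = 654321)   # reset to the same streams
  res2 <- clusterCall(cl, function() runif(2))
  identical(res1, res2)                     # TRUE, but only on 3 nodes
  stopCluster(cl)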
Re: [R] Optim function returning always initial value for parameter to be optimized
On 02/10/2018 06:00 AM, r-help-requ...@r-project.org wrote:
> Did you check the gradient? I don't think so. It's zero, so of course
> you end up where you start. Try
>
> data.input = data.frame(state1 = (1:500), state2 = (201:700))
>
> err.th.scalar <- function(threshold, data){
>   state1 <- data$state1
>   state2 <- data$state2
>   op1l <- length(state1)
>   op2l <- length(state2)
>   op1.err <- sum(state1 <= threshold)/op1l
>   op2.err <- sum(state2 >= threshold)/op2l

I think this function is not smooth, and not even continuous. Gradient methods require differentiable (smooth) functions. A numerical approximation will be zero unless you are right near a jump point, so you are unlikely to move from your initial guess.

Paul

>   total.err <- (op1.err + op2.err)
>   return(total.err)
> }
>
> soln <- optim(par = 300, fn = err.th.scalar, data = data.input,
>               method = "BFGS")
> soln
>
> require("numDeriv")
> gtest <- grad(err.th.scalar, x = 300, data = data.input)
> gtest
>
> On 2018-02-09 09:05 AM, BARLAS Marios 247554 wrote:
>> data.input = data.frame(state1 = (1:500), state2 = (201:700))
>>
>> with data that partially overlap in terms of values. I want to
>> minimize the assessment error of each state by using this function:
>>
>> err.th.scalar <- function(threshold, data){
>>   state1 <- data$state1
>>   state2 <- data$state2
>>   op1l <- length(state1)
>>   op2l <- length(state2)
>>   op1.err <- sum(state1 <= threshold)/op1l
>>   op2.err <- sum(state2 >= threshold)/op2l
>>   total.err <- (op1.err + op2.err)
>>   return(total.err)
>> }
>>
>> So I'm trying to minimize the total error. This Total Error should be
>> a U shape essentially. I'm using optim as follows:
>>
>> optim(par = 300, fn = err.th.scalar, data = data.input, method = "BFGS")
>
> Maybe develop an analytic gradient if it is very small, as the numeric
> approximation can then be zero even when the true gradient is not.
>
> JN
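To illustrate the point about smoothness (this is not from the thread): one way to give BFGS something to work with is to replace the indicator counts with a smooth approximation, e.g., a logistic, where the scale s is a tuning choice:

  err.th.smooth <- function(threshold, data, s = 5) {
    op1.err <- mean(plogis((threshold - data$state1)/s))  # ~ P(state1 <= threshold)
    op2.err <- mean(plogis((data$state2 - threshold)/s))  # ~ P(state2 >= threshold)
    op1.err + op2.err
  }
  data.input <- data.frame(state1 = (1:500), state2 = (201:700))
  optim(par = 300, fn = err.th.smooth, data = data.input, method = "BFGS")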
Re: [R] require vs library (Some basic time series questions)
On 11/17/2016 04:49 PM, Jeff Newmiller wrote:
>> require(tfplot)
>> tfplot(x.ts)
>
> Would just like to point out that require() should not be treated as
> interchangeable with library(). The former returns a logical status
> indicating success or failure, while the latter throws an error if it
> fails. You should reserve use of require() for cases when you are
> implementing an alternative path of execution for failure, and in
> nearly all usual cases use the library() function instead so hapless
> users of your script don't have to sift through all the subsequent
> errors to figure out the problem.

Mea culpa. Force of habit from usually writing with an alternative path of execution. In this example I should have used library(), especially since 'tfplot' may not have been installed on the user's system, and so the library() error message would be more explicit than the warning from require().

But "in nearly all usual cases" seems a bit strong. It implies R users don't usually program much and thus do not implement an alternative path of execution for failure. (Some of us consider that the most usual case.)

Paul
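For completeness, the "alternative path of execution" pattern where require() is the appropriate choice looks something like this (the fallback is illustrative):

  if (require(tfplot)) {
    tfplot(x.ts)
  } else {
    plot(x.ts)    # fall back to base graphics if tfplot is unavailable
  }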
Re: [R] Some basic time series questions
On 11/17/2016 06:00 AM, r-help-requ...@r-project.org wrote:
> Hi,
>
> As I sit and learn how to work with time series, I've run into a
> problem that is eluding a quick easy answer (and googling for answers
> seems to really slow the process...)
>
> Question #1 -- In a simple example on R 3.3.1 (sorry my employer hasn't
> upgraded to 3.3.2 yet):
>
> x <- rnorm(26, 0, 1)
> x.ts <- ts(x, start = c(2014, 9), frequency = 12)
>
> inputting x.ts at the prompt gives me a table with the rows denoted by
> year and columns denoted by months, and everything lines up
> wonderfully. Now my problem comes when I type at the prompt plot(x.ts)
> or plot(x.ts, xlab="") or plot.ts(x.ts, xlab=""). I get a plot of the
> values, but my x-axis labels are 2015.0, 2015.5, 2016.0, and 2016.5.
> January 2015 is coming out as 2015.0... Is there a way of getting a
> more intelligible x-axis labeling? Even 2015.1 for January, etc. would
> work, or even getting an index (either September 2014 representing 0
> or 1 and it incrementally increasing each month).

There are several ways to do this. Having some bias, I would do

  require(tfplot)
  tfplot(x.ts)

> Question #2 -- If I have a time series of decadal events, how best
> should I set the frequency? It is historical data, in the form of,
> say, AD 610-619 5 events, AD 620-629 7 events, etc.

For anything other than annual, quarterly, and monthly data you probably should consider zoo. (There are other options, but I think zoo has been the most widely used for some time now.) Guessing a bit how you think of this data, I would say you want the date index to be the first year of the decade. To illustrate, I can generate a hundred decades of random data and index it thus:

  require(zoo)
  x <- zoo(round(10 * runif(100)), order.by = 10 * 61:160)

Paul

> Sorry for such basic questions. Any advice would be appreciated.
>
> Thanks in advance,
> MEH
>
> Mark E. Hall, PhD
> Assistant Field Manager
> Black Rock Field Office
> Winnemucca District Office
> 775-623-1529
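For Question #1 there is also a base-graphics route (a sketch, not from the original reply): suppress the default axis and label each observation yourself.

  x.ts <- ts(rnorm(26, 0, 1), start = c(2014, 9), frequency = 12)
  plot(x.ts, xaxt = "n", xlab = "")
  axis(1, at = as.numeric(time(x.ts)),
       labels = paste(month.abb[cycle(x.ts)], floor(time(x.ts))),
       las = 2, cex.axis = 0.7)   # e.g. "Sep 2014", "Oct 2014", ...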
Re: [R] Dealing with -Inf in a maximisation problem.
> I am trying to deal with a maximisation problem in which it is possible
> for the objective function to (quite legitimately) return the value
> -Inf,

(Just to add to the pedantic part of the discussion, by those of us who do not qualify as younger and wiser:) Setting log(0) to -Inf is often convenient, but really I think the log function is undefined at zero, so I would not refer to this as "legitimate".

> which causes the numerical optimisers that I have tried to fall over.

In theory as well as practice. You need to have a function that is defined on the whole domain.

> The -Inf values arise from expressions of the form "a * log(b)", with
> b = 0. Under the *starting* values of the parameters, a must equal 0
> whenever b = 0, so we can legitimately say that a * log(b) = 0 in

This also is undefined and not "legitimate". I think there is no reason it should equal zero. We tend to want to set it to the value we think of as the "limit", but that limit is path dependent: holding a = 0, the limit as b goes to zero would be zero, but holding b = 0, the limit of a * (-Inf) as a goes to zero is -Inf. So you really do need to avoid zero because your function is not defined there, or find a redefinition that works properly at zero. I think you have a solution from another post.

Paul

> these circumstances. However as the maximisation algorithm searches
> over parameters it is possible for b to take the value 0 for values of
> a that are strictly positive. (The values of "a" do not change during
> this search, although they *do* change between "successive searches".)
>
> Clearly if one is *maximising* the objective then -Inf is not a value
> of particular interest, and we should be able to "move away". But the
> optimising function just stops. It is also clear that "moving away" is
> not a simple task; you can't estimate a gradient or Hessian at a point
> where the function value is -Inf.
>
> Can anyone suggest a way out of this dilemma, perhaps an optimiser that
> is equipped to cope with -Inf values in some sneaky way? Various ad hoc
> kludges spring to mind, but they all seem to be fraught with peril.
>
> I have tried changing the value returned by the objective function from
> "v" to exp(v) --- which maps -Inf to 0, which is nice and finite.
> However this seemed to flatten out the objective surface too much, and
> the search stalled at the 0 value, which is the antithesis of optimal.
>
> The problem arises in a context of applying the EM algorithm where the
> M-step cannot be carried out explicitly, whence numerical optimisation.
> I can give more detail if anyone thinks that it could be relevant.
>
> I would appreciate advice from younger and wiser heads! :-)
>
> cheers,
>
> Rolf Turner
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
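One of the ad hoc kludges mentioned, spelled out (a sketch only; compute.b is a hypothetical stand-in for however b is obtained from the parameters): clamp the argument of log() away from zero so the objective stays finite, accepting a small bias near the boundary.

  safe.obj <- function(par, a, ...) {
    b <- compute.b(par, ...)        # hypothetical: returns b >= 0
    sum(a * log(pmax(b, .Machine$double.xmin)))  # finite even when some b == 0
  }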
[R] Optimization issue - Solution converges for any initial value
Narendra,

You should check the definition of your objective function; something seems to be wrong. When you evaluate the objective function with your initial guess, and then with some different value, you should expect the returned value to be different, but it is not:

  > print(Koval.Error.func(vp.kval, n=1), digits=20)
  [1] 2767.1357969742521163
  > print(Koval.Error.func(2*vp.kval, n=1), digits=20)
  [1] 2767.1357969742521163

So the objective function is flat. optim converges with the message "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL". Of course the gradient is zero, because the objective function is flat. This would normally be considered a "user error", but perhaps John's newer code can catch it?

Paul

> Date: Thu, 29 Sep 2016 14:53:08 -0500
> From: Narendra Modi
> To: "r-help@r-project.org"
> Subject: [R] Optimization issue - Solution converges for any initial
> value
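A quick pre-optimization sanity check along these lines (illustrative; uses numDeriv for the gradient):

  check.objective <- function(fn, par, ...) {
    f0 <- fn(par, ...)
    f1 <- fn(par * 1.1 + 0.1, ...)    # an arbitrary nearby point
    if (isTRUE(all.equal(f0, f1)))
      warning("objective looks flat: check for a typo or a dropped argument")
    numDeriv::grad(fn, par, ...)      # should not be ~0 away from an optimum
  }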
Re: [R] how to transform db query result into a set of timeseries
There is a utility function TSquery() in package TSsql that attempts to do this. Most of the functions in that package are for databases with a specific layout intended for storing time series, but TSquery() attempts to build a series from a somewhat arbitrary database. It is hard to be completely generic and handle every possible database structure, so you might just examine the function for hints. I think it does not handle %H:%M:%S, but the general logic should help.

The main problem is that your query is not guaranteed to return data in time order. (You may be lucky if you loaded it that way, but it can change unexpectedly.) You can do the ordering with the xts() order.by argument, but it is probably quicker to do it in the db so you need less manipulation of the data you get back. TSquery() uses ORDER BY in the sql query to ensure the order:

  q <- paste(q, " GROUP BY ", dates, " ORDER BY ", dates, " ;")

If the query result is df then I think you can construct your series simply with

  zonnen <- xts(cbind(df$M, df$G, df$N),
                order.by = as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M:%S"))

There are several other details in the function that you may find useful.

Paul Gilbert

Date: Mon, 5 Sep 2016 22:28:50 +0200
From: Stef Mientki <stef.mien...@gmail.com>

hello,

I've a number of timeseries in a database and want to display these timeseries in a graph. Now the code below works well, but as the user can select which timeseries should be shown (up to 20 timeseries) the code below should be dynamic and can become quite large and complex. Is there an easier way to convert a database result into timeseries accepted by dygraph?

  SQL <- "select Date, M, G, N from Compare_Model"
  df <- dbGetQuery(con, statement = SQL)
  zon1 <- xts(df$M, as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
  zon2 <- xts(df$G, as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
  zon3 <- xts(df$N, as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
  zonnen <- Reduce(function(...) merge(..., all=TRUE), list(zon1, zon2, zon3))
  dygraph(zonnen)

thanks,
Stef
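Putting the pieces together for this particular query (a sketch; the table and column names are from the original post, and the connection con and the xts/dygraphs packages are assumed to be set up as in the original script):

  SQL <- "SELECT Date, M, G, N FROM Compare_Model ORDER BY Date;"
  df  <- dbGetQuery(con, statement = SQL)   # rows now arrive in time order
  zonnen <- xts(cbind(df$M, df$G, df$N),
                order.by = as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M:%S"))
  dygraph(zonnen)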
Re: [R] FW: Multivariate ARIMA
See also package dse. There are examples in the guide:
http://cran.at.r-project.org/web/packages/dse/vignettes/Guide.pdf

Paul

On 02/14/2016 06:00 AM, r-help-requ...@r-project.org wrote:
> Date: Fri, 12 Feb 2016 18:12:37 +0000
> From: Thomas Lofaro
> To: "r-help@R-project.org"
> Subject: [R] FW: Multivariate ARIMA Question
>
> Hi, sorry, I will try to make this short. Essentially, what I am trying
> to do is run a multivariate ARIMA model predicting one variable with
> another. However, the only R packages I have found to accommodate such
> an analysis are ARIMAX and VAR. Unfortunately, there don't seem to be
> any good tutorials or demonstrations of how to actually perform this
> analysis in practice, on my two columns of data. I was wondering if
> anyone knew of a resource that might help (or one that provided an
> example of R code to accomplish this).
>
> Thanks sooo much!
Re: [R] Estimating MA parameters through arima or through package "dlm"
(Sorry for the delay in responding to this.)

On 01/05/2016 06:00 AM, r-help-requ...@r-project.org wrote:
> Date: Mon, 4 Jan 2016 11:31:22 -0500
> From: Mark Leeds <marklee...@gmail.com>
> To: Stefano Sofia <stefano.so...@regione.marche.it>
> Cc: "r-help@r-project.org" <r-help@r-project.org>
> Subject: Re: [R] Estimating MA parameters through arima or through
> package "dlm"
>
> Hi: I don't have time to look at the details of what you're doing but
> the "equivalence" between state space and arima (as paul gilbert
> pointed out a few weeks ago) is not a true equivalence.

To state this a bit more precisely, there is a true equivalence between the input-output equivalence classes of (linear, time invariant) state-space models with state dimension n and the input-output equivalence classes of (linear, time invariant) ARMA models with McMillan degree n. (In fact, the quotient spaces are diffeomorphic, not just isomorphic.) This means you should be able to get exactly comparable results for anything that is an equivalence class invariant, including residuals and anything calculated from residuals, such as likelihood. Model roots, and thus stability, are also invariants, but you probably will not get comparable results for most other things involving parameters. This is not just a "statistical equivalence", as is sometimes suggested; it is an algebraic equivalence between quotient spaces.

However, if you estimate a state-space model and estimate an ARMA model, there are several other things that come into play related to estimation. Comparing the two estimated models, you are unlikely to find comparable results. (Typically ARMA estimation is more robust at finding the best, in my experience.) Even with simulation testing of estimation starting with "true" models, it is problematic to get estimated models that are equivalent. If you really want to see the equivalence you need to do a conversion of a model from one form to the other. I cannot speak to dlm, but dse was built for studying this equivalence and the users' guide has examples.

If you are really interested in this topic, I recommend section 5 of http://www.bankofcanada.ca/1993/03/working-paper-199/. I realize this is getting a bit old, but if you find a more up-to-date summary I would like to hear about it. That paper also has a demonstration that the equivalent models give results that are comparable within the numerical precision of computers, not just to some statistical significance.

> if you are in an area of the parameter space that the state space
> formulation can't reach, then you won't get the same parameter
> estimates. so, what you're doing might be okay or might not be,
> depending on whether the state space formulation can reach that area of
> the parameter space.

"Can't reach" is an estimation problem. This is typically a more serious problem with state-space models. The quotient spaces are diffeomorphic, so in theory it should be possible to reach the same solution if you properly account for the fact that you are estimating on a smooth manifold and not a vector space. In practice you also have to worry about twisting of the parameter space, finite time for estimation, and gradients that may converge toward zero near boundaries of the manifold's charts.

> there's another state space formulation that is truly equivalent which
> is called the SSOE formulation or innovations representation but I
> don't know if you want to get into that.
google "SSOE state space" if you're interested. The quotient space of input-output equivalence classes for innovations form models is equivalent to the quotient space of input-output equivalence classes for non-innovations form models. You need more information to identify a non-innovations form, typically some physical understanding of the system. On the bases of only input-output data, and with no additional understanding of the physical system, there would be no reason to choose a non-innovations form for estimation. There is more discussion of this in the above mentioned summary and in the dse user's guide. BTW, estimation problems tend to be much more severe with multivariate series than with univariate series. This is not just because of the usual issues. Especially in state-space representations, the twisting of the parameter space seems to be especially bad. Paul Mark On Mon, Jan 4, 2016 at 9:25 AM, Stefano Sofia < stefano.so...@regione.marche.it> wrote: >Dear list users, >I want to use apply a MA(2) process (x=beta1*epsilon_(t-1) + >beta2*epsilon_(t-1) + epsilon_(t)) to a given time series (x), and I want >to estimate the two parameters beta1, beta2 and the variance of the random >variable epsilon_(t). >
Re: [R] Updating a Time Series After Forecast()
Lorenzo,

Berend's suggestion may be the simplest if you have univariate ts style series. If you are writing code and you want it to work with multivariate series and other time representations, then you might want to consider the splice() function in package tframe:

  library(tfplot)
  ts3 <- splice(ts2, pred2$mean)

(and tframePlus if you need zoo and other support.)

Regards,
Paul

On 01/16/2016 06:00 AM, r-help-requ...@r-project.org wrote:
> Date: Fri, 15 Jan 2016 13:02:58 +0100
> From: Berend Hasselman
> To: Lorenzo Isella
> Cc: "r-help@r-project.org"
> Subject: Re: [R] Updating a Time Series After Forecast()
>
>> On 14 Jan 2016, at 22:36, Lorenzo Isella wrote:
>>
>> Dear All,
>> Perhaps I am drowning in a cup of water, since I am positive that the
>> answer will be a one-liner. Consider the following short script:
>>
>> library(forecast)
>>
>> ts2 <- structure(c(339130, 356462, 363234, 378179, 367864, 378337,
>> 392157, 402153, 376361, 392204, 403483, 414034, 391967, 406067,
>> 419464, 434913, 410102, 424795, 437073, 448827, 415569, 430561,
>> 444719, 455764, 419892, 444190, 454648, 466312, 439922, 448963,
>> 465153, 475621, 445502, 457198, 473573, 485764, 463895, 470274,
>> 484390, 490678, 478003, 483570, 499141, 509216, 481395, 492345,
>> 511184, 513420, 483757, 490884, 514966, 515457, 497614, 510139,
>> 523467, 526406, 499784, 519033, 532009, 531260, 521539, 532590,
>> 553118, 557725, 548321, 556832, 578087, 578120, 566116, 580571,
>> 587993, 569985, 534326, 539641, 564824, 568445, 558614, 570192,
>> 594584, 598305, 593769, 598278, 620147, 615884, 611033, 609304,
>> 630458, 624325, 614356, 627192, 649324, 645988, 642965, 645125,
>> 669471, 665529, 664248, 669670, 694719),
>> na.action = structure(1:64, class = "omit"),
>> .Tsp = c(1991, 2015.5, 4), class = "ts")
>>
>> fit2 <- auto.arima(ts2, approximation=FALSE, trace=FALSE)
>>
>> pred2 <- forecast(fit2, h=2)
>>
>> So, I have an original quarterly time series ts2 and a forecast for 2
>> quarters pred2.
>>
>> I would like to combine ts2 and pred2 (just the prediction) into a new
>> time series (in other words, just stretch ts2 a bit). How can I do
>> that?
>
> A possible way is this
>
> ts3 <- ts(c(ts2, pred2$mean), start = start(ts2),
>           frequency = frequency(ts2))
>
> Most likely there are more ways of getting what you want.
>
> Berend
>
>> Many thanks
>>
>> Lorenzo
Re: [R] matrix which results singular but at the same time positive definite
Stefano,

I think in another response in this thread you got the answer to the question you asked, but you may be asking the wrong question. I'm not familiar with the specific papers you mention and you have not provided enough detail about what you are doing, so I am guessing a bit.

The term "dynamic linear model" can refer to both linear ARMA/ARIMA models and to linear state-space models; however, some authors use it to refer exclusively to state-space models, and from your phrasing I am guessing you are doing that. There would be many state-space models equivalent to a given ARMA/ARIMA model, but without specifying structural aspects of the system you will likely be using one of the innovations form state-space models that are equivalent.

In an innovations form state-space model the state is defined as an expectation. From a practical point of view, this is one of the most important differences between an innovations form and a non-innovations form state-space model. Since the expectation is known exactly, the state-tracking error is zero. That means the covariance matrix from the filter or smoother should be a zero matrix, which you should not be trying to invert. In a non-innovations form the state has a physical interpretation rather than being an expectation, so the state-tracking error should not be degenerate in that case. I mention all this because your covariance matrix looks very close to zero.

Paul Gilbert

On 12/11/2015 06:00 AM, r-help-requ...@r-project.org wrote:
> Dear John,
> thank you for your considerations. This matrix (which is a variance
> matrix) is part of an algorithm for forward-filtering and
> backward-sampling of Dynamic Linear Models (West and Harrison, 1997),
> applied to DLM representations of ARIMA processes (Petris, Petrone,
> Campagnoli). It is therefore very difficult to explain why this
> variance matrix becomes so ill conditioned. This already happens at the
> first iteration of the algorithm. I will try to work on initial
> conditions and some fixed parameters.
> Thank you again
> Stefano
Re: [R] package submission
> I was similarly bitten a couple of days ago when I submitted an update
> of my deldir package. I'm *pretty* sure that I completed all the
> requisite steps in the web-based submission procedure, but I never
> received the promised confirmation email. Being a thicko, I overlooked
> this fact, and started to wonder why the update never appeared on CRAN.
>
> Eventually I got my act together, went over the instructions and
> re-submitted the package. This time I got the confirmation email, and
> the update duly appeared on CRAN.
>
> Maybe there's a buglet in the software that affects the automated
> submission procedure.

I was similarly bitten but, paying closer attention on the second attempt, I suspect I thought I was finished at "successfully uploaded" and did not go all the way to the very bottom to see that I then had to click on submit to get the "successfully submitted" page. (Text of the messages from memory, so may not be exact.) Once successfully submitted, the email asking for confirmation is almost instantaneous.

For users like me, who push the boundaries of discovering possible user errors, it would be helpful if the first message said "successfully uploaded - now read this page and submit at the bottom".

Paul

> I would have said it's more likely that I fumble-fingered something in
> my initial submission attempt. However, hearing that someone else has
> had a similar experience makes me a wee bit more suspicious that
> there's a bug. Could be tricky to track down, given that if there is a
> bug its effect is intermittent and rare.
>
> cheers,
>
> Rolf Turner
>
> P. S. On my second try, the confirmation email arrived essentially
> instantaneously. So I would advise package submitters: if you don't get
> the confirmation email *right away* then something has very likely gone
> wrong. Check the site ftp://CRAN.R-project.org/incoming/ as advised,
> and if your package ain't there, try again.
>
> R. T.
>
> On 29/11/14 09:11, Henrik Bengtsson wrote:
>> From http://cran.r-project.org/web/packages/policies.html:
>>
>> "When submitting a package to CRAN you should use the submission form
>> at http://CRAN.R-project.org/submit.html (and not send an email). You
>> will be sent a confirmation email which needs to be accepted. (*) ...
>> In either case, you can check that the submission was received by
>> looking at ftp://CRAN.R-project.org/incoming/."
>>
>> (*) That confirmation email is sent automatically and instantaneously.
>> It confirms you as a maintainer and the submission (not the
>> acceptance).
>>
>> /Henrik
>>
>> On Fri, Nov 28, 2014 at 6:35 AM, Duncan Murdoch
>> <murdoch.dun...@gmail.com> wrote:
>>> On 28/11/2014 9:16 AM, Dan Vatnik wrote:
>>>> Hi,
>>>> I am the maintainer of the traj package. I have added a vignette to
>>>> the package this week (Wednesday) and uploaded the new tarball to
>>>> CRAN. I have not yet received a confirmation e-mail. I do not know
>>>> if the e-mail is sent automatically or if a person needs to approve
>>>> manually. I am just wondering if I did everything correctly or if
>>>> there is a mistake in my uploading procedure.
>>>
>>> The CRAN people make the decisions manually, so it may just be that
>>> nobody has got to it yet. If you don't hear from them in a week, you
>>> might consider writing to the submission email address, but I
>>> wouldn't do it sooner than that.
>
> --
> Rolf Turner
> Technical Editor ANZJS
[R] RODBC and PostgreSQL problems
I'm not sure if this was due to mailer wrap, but I think you are putting a return between the end of the table name and the quote mark, and possibly that, or a space, is being included in the table name in the sqlQuery() statement. Do you have the same problem if you put the quote mark immediately after the table name, or if you put a ; after the table name?

(BTW, you will probably get better responses to this sort of question on the r-sig-db mailing list.)

Paul

Date: Fri, 30 May 2014 18:00:06 +0000
From: Fraser D. Neiman <fnei...@monticello.org>
To: r-help@r-project.org
Subject: [R] RODBC and PostgreSQL problems

Dear All,

I am trying for the first time to run SQL queries against a remote PostgreSQL database via RODBC. I am able to establish a connection just fine, as shown by getting results back from the sqlTables(), sqlColumns() and sqlPrimaryKeys() functions in RODBC. However, when I try to run a SQL query using the sqlQuery() function I get

  [1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError
      while executing the query"
  [2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM
      tblCeramicWare"

What am I doing wrong? Here are the relevant snips from the R console. What's puzzling is that tblCeramicWare is recognized as an argument to sqlColumns() and sqlPrimaryKeys(), but NOT in sqlQuery().

Thanks for any pointers.

best,
Fraser

  library(RODBC)
  # connect to DAACS and assign a name (DAACSch) to the connection
  DRCch <- odbcConnect("postgreSQL35W", case = "nochange", uid = "XX", pwd = "XX");

  # list the tables that are available
  > sqlTables(DRCch, tableType = "TABLE")
    TABLE_QUALIFIER  TABLE_OWNER TABLE_NAME        TABLE_TYPE REMARKS
  1 daacs-production public     TempSTPTable       TABLE
  2 daacs-production public     activities         TABLE
  3 daacs-production public     articles           TABLE
  4 daacs-production public     schema_migrations  TABLE
  5 daacs-production public     tblACDistance      TABLE
  6 daacs-production public     tblArtifactBox     TABLE
  7 daacs-production public     tblArtifactImage   TABLE
  8 daacs-production public     tblBasicColor      TABLE
  9 daacs-production public     tblBead            TABLE

  > sqlColumns(DRCch, "tblCeramicWare")
    TABLE_QUALIFIER  TABLE_OWNER TABLE_NAME     COLUMN_NAME DATA_TYPE TYPE_NAME PRECISION LENGTH SCALE RADIX NULLABLE
  1 daacs-production public     tblCeramicWare  WareID              4 int4             10      4     0    10        0
  2 daacs-production public     tblCeramicWare  Ware               -9 varchar          50    100    NA    NA        1
  [remaining columns of the wide sqlColumns() result omitted]

  > sqlPrimaryKeys(DRCch, "tblCeramicWare")
    TABLE_QUALIFIER  TABLE_OWNER TABLE_NAME     COLUMN_NAME KEY_SEQ PK_NAME
  1 daacs-production public     tblCeramicWare  WareID            1 tblCeramicWare_pkey

  > sqlQuery(DRCch, paste(
  +   "SELECT * ",
  +   "FROM tblCeramicWare "
  + ))
  [1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError
      while executing the query"
  [2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM
      tblCeramicWare \n '"

Fraser D. Neiman
Department of Archaeology, Monticello
(434) 984 9812
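The concrete form of the suggestion above (a sketch):

  sqlQuery(DRCch, "SELECT * FROM tblCeramicWare;")

If the whitespace is not the culprit, note that the error reports the lower-cased name "tblceramicware": PostgreSQL folds unquoted identifiers to lower case, so a mixed-case table name may need explicit double quotes inside the query:

  sqlQuery(DRCch, 'SELECT * FROM "tblCeramicWare";')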
Re: [R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error
A post with subject "Basic R to SQL considerations" has been put on R-sig-DB to address some of the basic questions here. The R-sig-DB thread is not specific to RPostgreSQL.

> I have learned enough since this posting to be aware that my question
> was based on assumptions so false to fact that they verge on the
> nonsensical, but not enough to reframe it more sensibly. So I am just
> going to say what I believed then and what I believe now, and perhaps
> you can head off the more extreme excursions from reality in my current
> beliefs.
>
> Because I knew that libpq depended on the version of PostgreSQL that
> was installed, and that it used to be the case that you could not just
> install the package, but it is now possible because of the addition of
> libpq, I concluded that libpq contained an installer for PostgreSQL. I
> now think that this is false, and that libpq is some sort of tailored
> interface that varies with the OS and the PostgreSQL version. Tailored
> where and by what, I still don't know, but maybe I do not need to know.
>
> Since I thought I had to use the version of PostgreSQL installed by
> libpq, I did not try to independently install PostgreSQL. Now I think
> that PostgreSQL has to be installed and working before you start to
> install RPostgreSQL, in order, among other things, to get the right
> libpq contents or settings.
>
> I spent most of a day trying to figure out how to do what I was
> thinking of as a strictly local installation of PostgreSQL, meaning one
> that did not connect to the internet via a port and TCP/IP. I now
> believe that there is no such thing, at least on a Windows machine, or
> that doing so would be complex and difficult, compared to a connection
> via a machine-internal TCP/IP connection -- something I did not know
> existed until the day before yesterday.
>
> For the last few days I have been trying, so far unsuccessfully, to
> install PostgreSQL from the (misnamed) one-click installer from
> enterprisedb.com. I have figured out that the installation and error
> logs are written to stderr, and where my OS puts it. I've gone through
> the 40+ pages of log. Although the log contains more than 2 dozen
> warnings and errors, I now believe that all of the ones that matter
> derive in some way from this one:
>
>   Executing batch file 'rad3BBD8.bat'...
>   'icacls' is not recognized as an internal or external command,
>   operable program or batch file.
>
> and that this means I am having some sort of problem with the Windows
> XP/NT access control/permissions system. Unfortunately I know nothing
> whatsoever about the Windows permission system. Learning about it is
> the next thing on my list.
>
> I am really quite good at framing feasible research agendas to address
> important policy questions on a shoestring. I'm good with regression,
> and with displaying complicated argument in sensible graphics. I am new
> to using databases on my own behalf (rather than having someone to do
> it for me), but I am finding the query side of databases elegant and
> intuitive. But I am not a competent IT/sysadmin/tech support person,
> and I am not going to be one soon, or maybe ever. So I find it somewhat
> disheartening that I am spending so much time being just that.
>
> Still, I am very grateful for the help and support of this list and the
> various Stack Exchange lists, which have made possible such learning as
> I have been able to accomplish. Like the glaciers, my progress is slow
> but inexorable. Or so I like to believe.
> Warmest regards,
> andrewH
Re: [R] Plotting multiple time series with variables having different units
I'm not sure if I understand exactly what you mean, but if you want separate panels, one above the other, with a common time span but each with their own Y scale, then the function tfplot() in package tfplot does this. There are examples in the User Guide at http://cran.at.r-project.org/web/packages/tfplot/vignettes/Guide.pdf.

Of course, if you want fine control over the plot then you are better off with one of the graphics packages, as others have suggested. Those would give you complete access to the underlying details. The objective of tfplot() is to quickly give you something that is fairly good. I work mostly with economic data, where the different-scales issue is important, but I rarely work with daily data, so your mileage may vary. Please let me know if you discover problems.

(Sorry to come to this discussion a bit late, but I did not want to point you to a version I was about to replace. A new version (2014.2-1) is now on CRAN and will be on the mirrors shortly.)

Paul

On 02/03/2014 06:00 AM, r-help-requ...@r-project.org wrote:
> Date: Sun, 02 Feb 2014 14:09:38 -0500
> From: David Parkhurst <parkh...@imap.iu.edu>
> To: r-help@r-project.org
> Subject: [R] Plotting multiple time series with variables having
> different units
>
> I've tried to figure out how to do this from what I read, but haven't
> been successful. Suppose I have a dataframe with variables Date, X, and
> Y (and maybe U, V, and Z) where X, Y, etc. have different units. I'd
> like to plot Y vs. Time and X vs. Time, one above the other.
>
> For example, X is the number of gulls counted on a reservoir, and Y is
> the number of coliform bacteria counted on a petri plate from a water
> sample leaving the reservoir, so these have very different ranges. U
> and V might be numbers of geese and numbers of ducks counted on the
> same days.
>
> What commands would I use to create such a set of plots? Thank you for
> any help.
>
> David
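For the gulls/coliform example quoted above, the tfplot() approach would look something like this (the data here are invented for illustration):

  library(tfplot)
  gulls    <- ts(rpois(24, 50),   start = c(2012, 1), frequency = 12)
  coliform <- ts(rpois(24, 2000), start = c(2012, 1), frequency = 12)
  tfplot(ts.union(gulls, coliform))  # one panel per series, each with its own Y scale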
Re: [R] Package dependencies in building R packages
The responses to this seem to be assuming that you want users to have access to your tt() function, that is, you export it. Just in case the really simple case has been overlooked: if you are only using this function internally in your package, there should be no problem. Your package's namespace ensures that your package's functions find your tt(), and if you do not export it then user references to tt() will not find your version.

As others have pointed out, if you want users to have access to your tt() and they also need access to survival's tt(), then there is a problem. You either need to change the name or users will need to be explicit about which one they want.

(BTW - this is a question that could be asked on the R-devel list.)

Paul

On 13-12-31 06:00 AM, r-help-requ...@r-project.org wrote:
> Date: Mon, 30 Dec 2013 20:01:46 +0100
> From: Axel Urbiz <axel.ur...@gmail.com>
> To: Duncan Murdoch <murdoch.dun...@gmail.com>
> Cc: "R-help@R-project.org" <R-help@R-project.org>
> Subject: Re: [R] Package dependencies in building R packages
>
> Thanks for your kind response Duncan. To be more specific, I'm using
> the function mvrnorm from MASS. The issue is that MASS depends on
> survival and I have a function in my package named tt() which conflicts
> with a function in survival of the same name. I can think of 2
> alternative solutions to my problem, but I'm not an expert:
>
> 1) Copy mvrnorm into my package, which I thought was not a good idea
> 2) Rename my tt() function to something else in my package, but this is
>    painful as I have it all over the place in other functions.
>
> Any suggestions would be much appreciated.
>
> Best,
> Axel.
>
> On Mon, Dec 30, 2013 at 7:51 PM, Duncan Murdoch
> <murdoch.dun...@gmail.com> wrote:
>
>> On 13-12-30 1:24 PM, Axel Urbiz wrote:
>>> Dear users,
>>>
>>> My package {foo} depends on a function miscFUN which is in package
>>> {foo_depend}. This last package also depends on other packages, say
>>> {A, B, C}, but miscFUN is not dependent on A, B, C (only on
>>> foo_depend). In my package {foo}, is there a way to only have it
>>> depend on the function miscFUN from {foo_depend} without having the
>>> user to have installed A, B, C? (as none of those packages are needed
>>> for my package to work properly). Also, is this a best practice?
>>
>> There's no way for your package to tell R to ignore the dependencies
>> declared by foo_depend. If you really only need one function from that
>> package, simply copy the source of miscFUN into your package (assuming
>> foo_depend's license permits that). But this is not best practice,
>> unless that function is very simple.
>>
>> Best practice is to declare your dependence by importing that function
>> from foo_depend.
>>
>> Duncan Murdoch
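In NAMESPACE terms, the internal-use case above looks like this (a sketch; myMainFunction is a hypothetical exported function, and the internal tt() is simply not exported):

  importFrom(MASS, mvrnorm)
  export(myMainFunction)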
Re: [R] Should there be an R-beginners list?
On 13-11-25 06:00 AM, r-help-requ...@r-project.org wrote:
> Date: Sun, 24 Nov 2013 13:04:43 -0600
> From: Yihui Xie <x...@yihui.name>
> To: Bert Gunter <gunter.ber...@gene.com>
> Cc: "r-help@r-project.org" <r-help@r-project.org>
> Subject: Re: [R] Should there be an R-beginners list?
>
> I'm not aware of a discussion on this, but I would say no.

Just for the record, there was a discussion of this in a thread called "newbie list" in August 2001 when R-help started getting busy: http://tolstoy.newcastle.edu.au/R/help/01c/0880.html. It predates StackOverflow I think, but several of the comments may still be valid.

Paul

> Fragmentation is bad. Further fragmentation is worse.
>
> TL;DR = Actually I'd say all mailing lists except r-devel should be
> moving to StackOverflow in the future (disclaimer: I'm not affiliated
> with it). Mailing lists are good for a smaller group of people, and
> especially good when more focused on discussions on development
> (including bug reports). The better place for questions is a web forum.
>
> Both you and I have been staying in these R mailing lists for a few
> years now. You can recall how many times a user was asked to post to
> another mailing list ("this is not an appropriate list to ask your
> question; please post to r-such-and-such instead"), how many times you
> see something like "Alternative HTML removed", how many times you see a
> post "Bla Bla (was: Foo Bar)", and how many times users were reminded
> "Please read the posting guide", "Please do read", and "PLEASE do
> read". But it just does not help much even if you write PLEASE DO READ.
> Why do we have such problems in the mailing lists again and again? Is
> that simply because users are not respecting the rules? I do not think
> so. I believe that is the flaw of mailing lists.
>
> A mailing list is managed by a small team (hey, Martin, thank you). On
> StackOverflow, you simply edit the tags of a post to make it belong to
> a new "mailing list" (you can post with tags r+ubuntu+graphics, or
> r+lattice, etc). There is no need to request and wait for the system
> admin to make a decision. Users can help themselves, and help others as
> well.
>
> HTML can be good in many cases, actually. Who hates syntax highlighting
> and R plots in an R question? You are free to ask a question that is
> poorly formatted, and there are good chances that it will be
> immediately edited by another experienced user. You are free to yell in
> the comments asking for more details before posting a formal answer.
> You can express "ah, this is a bad question" by down-voting so that
> future readers know that guy screwed up and we just let the world
> ignore the noise. It is like peer review, and the reviewers can help
> you improve your post. In a mailing list, when you are done, you are
> done. You are forever written in history, right or wrong, smart or
> stupid. You want to delete your record in the history? No, no,
> gentleman, it was your fault not reading the posting guide.
>
> For me, I understand all the rationale behind the mailing list model.
> I'm just saying, the primary goal for such a service is to discuss
> issues about R, instead of issues induced by the mailing list itself.
> We could have made some issues not directly related to R go away by
> community efforts instead of giving instructions a million times, given
> an appropriate platform.
>
> Five years, 42,000 posts: http://stackoverflow.com/questions/tagged/r
> I'm not terribly worried about transition from mailing lists to SO.
> Sorry about the generalization of the original topic, but I hate using
> a new title "Should there be R mailing lists? (was: Should there be an
> R-beginners list?)"
>
> Last but not least, I probably need to clarify that I benefited a lot
> from the mailing lists in the past, and I truly appreciate it. I wrote
> this with the future in mind, not the past. The past was good, and the
> future can be better.
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyi...@gmail.com>
> Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
>
> On Sun, Nov 24, 2013 at 11:13 AM, Bert Gunter <gunter.ber...@gene.com>
> wrote:
>> Folks:
>>
>> If this has been previously discussed and settled, please say so and
>> refer me to the discussion. If you believe this to be inappropriate or
>> otherwise frivolous, also please say so, as I do not wish to waste
>> your time or this space.
>>
>> I write as a long time reader and sometimes contributor to r-help. Due
>> to R's growth in usage by a broad data analysis community (engineers,
>> scientists, social scientists, finance, informaticians, as well as
>> more traditional statisticians), this list seems to me to be becoming
>> deluged by requests for help by casual users and students for whom R
>> is not going to be regularly or extensively used. I would characterize
>> this group as having only basic statistical, programming, and data
>> analysis skills. This is not meant as a criticism, and there are
>> certainly many for whom this is inaccurate. But
Re: [R] Am I working with regularly spaced time series?
On 13-10-22 06:00 AM, Weiwu Zhang <zhangwe...@realss.com> wrote:
> My data is sampled once per minute.

At the same second each minute, or not? Regularly spaced would mean exactly one minute between observations.

> There are invalid samples, leaving a lot of holes in the samples;
> successful samples are around 80% of all minutes in a day. And during
> the last 4 months' sampling, one month's data was stored on a harddisk
> that failed, leaving a month's gap in between.

These are called missing observations. With regular spacing you need to fill in the holes with NA. With irregular spacing you can either drop the missing observations or, if you know the time at which they were missed, you could fill in with NA.

> So am I working with a regularly spaced time series or not? Should I
> pad all missing data with NAs, and start with ts(), followed by the
> forecast package (which seems to have all the functions I need in the
> beginning), or should I start with a library with irregular time series
> in mind? Also, the ts() manual didn't say how to create a time series
> with one minute as deltat. It seems to assume time series are about
> dates. So the data I have with me, is it really a time series at all?

ts() representations work best with regularly spaced monthly, quarterly, or annual data. You can use it for other things if they fit nicely into regularly spaced observations with a frequency of observation, such as 12 times per year or 60 times per hour. This usually only makes sense if the frequency has something to do with your problem, like seasonality questions. You can also use frequency 1 for one observation per period, like annual data, which in your case would be once per minute. I'm inclined to think that a zoo (see package zoo) representation would fit your problem better.

HTH,
Paul

> Newbie question indeed. Thanks.
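A sketch of the NA-padding idea with zoo (the times are invented for illustration): merge the observed series with an empty zoo series on a regular one-minute grid.

  library(zoo)
  obs  <- zoo(c(1.2, 3.4, 2.2),
              as.POSIXct(c("2013-10-01 09:00", "2013-10-01 09:02",
                           "2013-10-01 09:05"), tz = "GMT"))
  grid <- seq(start(obs), end(obs), by = "1 min")
  padded <- merge(obs, zoo(, grid))   # NA at 09:01, 09:03 and 09:04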
[R] creation of an high frequency series
(from R-help Digest, Vol 117, Issue 26) Date: Sun, 25 Nov 2012 07:49:05 -0800 (PST) From: billycorgcandi...@gmail.com To:r-help@r-project.org Subject: [R] creation of an high frequency series Message-ID:1353858545067-4650744.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii

> Hi R Users! I would like to create a high frequency series but I am experiencing some difficulties. My series should start at 09.30 a.m. each day and end at 16.00 for, let's say, 2 years. I don't care how many observations there are for each day. One observation each minute is also OK. In this case, I would have 390 observations each day. I have tried the following:
>
>     start <- ISOdatetime(year=2001, month=1, day=1, hour=9, min=0, sec=0, tz="GMT")
>     end   <- ISOdatetime(year=2001, month=1, day=1, hour=16, min=0, sec=0, tz="GMT")
>     z <- zooreg(NA, start = start, end = end, frequency = 390)

For this problem

    z <- zooreg(rep(NA, 390), start = start, end = end, frequency = 390)

works; however, if your real problem is that you have time stamps, possibly not equally spaced, then you should consider zoo() with the order.by argument, for example:

    z <- zoo(rep(NA, 390), order.by = start + 30 * 0:389)

If your data source actually provides the time stamps then this should be very easy. Note that neither of these does consistency checking between the length of the series and the specified time frame; that is up to you to do. You might get warning messages later indicating that something is not right if you made a mistake. Your zooreg() above seems to just ignore the inconsistency, whereas I think zoo() with order.by recycles the data. Paul

> but the result is a poor, single NA. And I am not considering multiple days.. How could I solve this problem?? Thank you so much!!

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
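If the target really is one value per minute from 09:30 to 16:00 over many days, one way (an untested sketch; the date range is made up) is to build the full index first and hand it to zoo():

    library(zoo)
    days   <- seq(as.Date("2001-01-01"), as.Date("2002-12-31"), by = "day")
    mins   <- lapply(days, function(d)
                seq(as.POSIXct(paste(d, "09:30:00"), tz = "GMT"),
                    by = "min", length.out = 390))
    stamps <- do.call(c, mins)            # 390 stamps per day, for all days
    z <- zoo(rep(NA, length(stamps)), order.by = stamps)

Weekends and holidays are not filtered here; drop unwanted days from 'days' as needed.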
Re: [R] hessian fails for box-constrained problems when close to boundary?
Matthieu (regarding the final question, but not the ones before that.) The short answer is that you seem to have a solution, since hessian() in numDeriv works. The long answer follows.

Outside a constraint area one cannot assume that a function will evaluate properly. Some constraints are imposed because the function is not defined outside the region. This would mean that the gradient and hessian are not defined in the usual sense at the boundary; the left and right derivatives should exist and be the same in the limit. I have had on my to-do list for some time a project to add one-sided approximations in numDeriv, but that will probably not happen soon. If your solution is actually on the boundary, you might consider some substitution which removes the constraint, and consider the hessian on the reduced, unconstrained, space.

In your situation I would also be concerned about the optimum being very close to, but not on, the boundary. You might want to check that this is not an artifact of the convergence criteria. Check that the objective is actually inferior at a point on the boundary close to your optimum. Also, the gradient should be zero at the optimum, but that may not be exactly so because of numerical convergence criteria. If the gradient is pointing toward the boundary, that is a sign that you may just have converged too soon. Also, the hessian coming from the optimization is built up as an approximation using information from the steps of the optimization. I am not familiar enough with the numerical handling of boundary constraints to know how this approximation from the optimization might be affected, but you may need to be careful in this regard. Paul

On 12-11-15 10:25 AM, Matthieu Stigler wrote:

> Hi, I am trying to recover the hessian of a problem optimised with box constraints. The problem is that in some cases my estimates are very close to the boundary, which will make optim(..., hessian=TRUE) or optimHess() fail, as they do not follow the box constraints, and hence evaluate the function in the unfeasible parameter space. As a simple example (my problem is more complex though; simple parameter transformations do not apply here), imagine estimating mu and sigma (restricted to be > 0) of simple normally distributed data, where however the true sigma is very close to zero:
>
>     LLNorm <- function(para, dat=rn) return(-sum(dnorm(dat, para[1], para[2], log=TRUE)))
>     rn2 <- c(rep(10.3, 2000), 10.31)
>     optim(c(10,1), fn=LLNorm, method="L-BFGS-B", lower=c(-Inf, 0.001), dat=rn2, hessian=TRUE)
>     Error in optim(c(10, 1), fn = LLNorm, method = "L-BFGS-B", lower = c(-Inf,  :
>       non-finite finite-difference value [2]
>
> The only solution/workaround I found is a two-step procedure: use optim() without the hessian, then use optimHess(), specifying the length of the numerical steps (arg ndeps) to be smaller than the distance between the parameter and the bound, i.e.:
>
>     op <- optim(c(10,1), fn=LLNorm, method="L-BFGS-B", lower=c(-Inf, 0.001), dat=rn2, hessian=FALSE)
>     H  <- optimHess(op$par, LLNorm, dat=rn2, control=list(ndeps=rep((op$par[2]-0)/1000, 2)))
>
> While this solution works, it is surely not elegant, and I have the following questions: 1) is this solution of using smaller steps good at all? 2) I am not sure I understand the reason why the hessian in a constrained optim problem is evaluated without the constraints, or at least why this is not turned into a warning?
> Furthermore, I realised that the hessian() function in the numDeriv package does not offer constraints for the parameters either, yet in the special case of the example above it does not fail (although it would have problems with a parameter even closer to the bound), unlike optimHess(). See:
>
>     library(numDeriv)
>     hessian(LLNorm, op$par, dat=rn2)
>
> This brings me to the final question: is there a technical reason not to allow a constrained hessian, as the fact that neither implementation in R does it seems to suggest? I found a very interesting answer by Spencer Graves to a similar question: http://tolstoy.newcastle.edu.au/R/e4/help/08/06/15061.html although this regards more the statistical implications than the numerical issues. Thanks a lot! Matthieu

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
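The substitution Paul mentions can be tried directly on this example: optimise over log(sigma), which removes the boundary entirely, and take the hessian on the unconstrained scale (an untested sketch, reusing rn2 from above):

    LLNorm2 <- function(q, dat) {           # q = c(mu, log(sigma))
      -sum(dnorm(dat, q[1], exp(q[2]), log = TRUE))
    }
    op2 <- optim(c(10, 0), LLNorm2, dat = rn2, method = "BFGS", hessian = TRUE)
    op2$hessian                             # curvature in (mu, log(sigma)) coordinates

Note the result is the hessian with respect to the transformed parameters; mapping back to (mu, sigma) needs the chain rule / delta method.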
Re: [R] Date Math
On 12-10-15 06:00 AM, r-help-requ...@r-project.org wrote:

> Jeff, My understanding is that the lag command will lag an entire time series. That isn't what I'm looking for. I just want, for example, today and 5 entries back. For example:
>
>     iter <- '2011-05-18'
>     observations[iter]             # works fine, returns the row at that date
>     index(observations[iter])      # just returns iter back
>     index(observations[iter]) - 5  # returns 2011-05-17 23:59:57 PDT, so it subtracted 3 seconds
>
> Really, I want to find: iter - 5 days.

Switching between "5 entries back" and "5 days back" has me confused about whether you have more than one entry on some days or not. If you mean you have at most one entry per day, and you want the last five days on which you have observations, then this part of your problem can be solved by switching from POSIXct to dates:

    z <- Sys.time()
    z
    [1] "2012-10-15 08:34:58 EDT"
    z - 5
    [1] "2012-10-15 08:34:53 EDT"
    as.Date(z) - 5
    [1] "2012-10-10"

(BTW, a complete example is always useful when asking for help. I think your "obviously fails" code would be pretty close to working, depending on how the dates were generated.) Paul

> -- Noah Silverman Smart Media Corp.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
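With a zoo series both readings of "5 back" are one-liners; a sketch (untested, with an invented series):

    library(zoo)
    obs <- zoo(1:9, as.Date("2011-05-09") + c(0, 1, 2, 4, 6, 7, 8, 9, 11))
    tail(obs, 5)                                       # the last 5 entries
    window(obs, start = end(obs) - 5, end = end(obs))  # everything in the last 5 days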
[R] Is there any package for Vector Auto-regressive with exogenous variable other than fastVAR?
Package dse also does vector auto-regression with exogenous variables (and also vector ARMA and state-space models with exogenous variables). But I don't understand what you mean by not taking the base in the model so I don't know if it will solve your problem. Paul On 12-10-04 06:00 AM, r-help-requ...@r-project.org wrote: Date: Wed, 3 Oct 2012 23:08:54 -0700 (PDT) From: Sankalpsankalpsing...@gmail.com To:r-help@r-project.org Subject: [R] Is there any package for Vector Auto-regressive with exogenous variable other than fastVAR? Message-ID:1349330934183-4644964.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii Is there any package for Vector Auto-regressive with exogenous variable other than fastVAR? Because it is not able to solve my problem of not taking the base in the model. Please suggest some appropriate solution -- View this message in context:http://r.789695.n4.nabble.com/Is-there-any-package-for-Vector-Auto-regressive-with-exogenous-variable-other-than-fastVAR-tp4644964.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about R performance on UNIX/LINUX with, different memory/swap configurations
On 12-09-17 06:00 AM, r-help-requ...@r-project.org wrote: Date: Sun, 16 Sep 2012 14:41:42 -0500 From: Dirk Eddelbuettele...@debian.org To: Eberle, Anthonyae...@allstate.com Cc:r-help@r-project.org Subject: Re: [R] Question about R performance on UNIX/LINUX with different memory/swap configurations Message-ID:20566.11126.175103.643...@max.nulle.part Content-Type: text/plain; charset=us-ascii

> On 16 September 2012 at 13:30, Eberle, Anthony wrote:
> | Does anyone have any guidance on swap and memory configuration when
> | running R v2.15.1 on UNIX/LINUX? Through some benchmarking across
> | multiple hardware (UNIX, LINUX, SPARC, x86, Windows, physical, virtual)
> | it seems that the smaller memory machines have an advantage.
>
> Would you be able to provide some empirical proofs for that conjecture? Please demonstrate how, say, an X gb ram machine outperforms one with Y gb where Y > X. Less memory is /never/ better.

Exceptions to the rule: there are a couple of unusual situations where less memory can be better. As someone else pointed out, if you start swapping then performance takes a hit. So, if you have enough physical memory to do your own job, but not enough to let other jobs run and push you into swapping, then you may get better performance on the single job. Second, if you have long recursions that grab lots of memory, then these will use swap. Typically one would want the job to succeed, and thus swap is good, but slow. However, if your code is bad and doing an infinite recursion, then with swap this can take a very long time to fail, whereas with only physical memory it fails much more quickly. Third, there were (are?) situations where 32-bit machines/OSes perform (very slightly) faster than 64-bit machines/OSes, and since the former typically have less memory, one could have the impression that less memory is faster. Paul

> Dirk -- Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 107, Issue 14
On 12-01-14 06:00 AM, r-help-requ...@r-project.org wrote: Date: Fri, 13 Jan 2012 02:09:09 -0800 (PST) From: statquant2statqu...@gmail.com To:r-help@r-project.org Subject: Re: [R] simulating stable VAR process Message-ID:1326449349804-4291835.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii

> Hello Paul, Thanks for the answer, but my point is not how to simulate a VAR(p) process and check that it is stable. My question is more: how can I generate a VAR(p) such that I already know that it is stable?

Just to be clear, my answer was telling you how to check that the model is stable, not how to simulate a process and check if the process is stationary, which is probably what you mean when you say "stable process". Beware that the terminology "stable model" is often used nowadays to refer to whether the parameters are changing over time. This has little to do with whether the model is stable in the sense of the modulus of the determinant of the polynomial. Elsewhere in this thread, I think John Frain has given you an algorithm for generating models.

> We know a condition that assures that it is stable (see first message) but this is not a condition on coefficients etc...

Yes, it is a condition on coefficients.

> What I want is to generate, say, 1000 random VAR(3) processes over, say, 500 time periods that will be STABLE (meaning if I run stability() all will pass the test).

I am a bit confused by what you mean here. If you are saying generate 500 time periods for each of the 1000 models, you do not have to generate any time periods unless you want simulated data; you can check whether the models are stable directly, because this is a check on the coefficients of the model. [If you are saying that the model is changing over time, then the models may all individually be stable in the determinant sense, but this is not stable (by construction) in the second sense above. And the generated process will probably not be stationary according to many of the usual definitions.]

> When I try to do that it seems that none of the VARs I am generating pass this test, so I assume that the class of stable VAR(p) is very small compared to the whole VAR(p) process.

When you say "small" I think you are talking about a measure on the space of coefficients. If you use the usual metric and generate the coefficients uniformly randomly over R^n, then the models that have determinant inside the unit ball would probably be a very small set. Even with considerable restrictions, say, for example, the coefficients all between -1 and 1, I think the stable models would be a small set. HTH, Paul

> -- View this message in context: http://r.789695.n4.nabble.com/simulating-stable-VAR-process-tp4261177p4291835.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
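The direct check on the coefficients is a few lines of base R: build the companion matrix and look at the moduli of its eigenvalues (a sketch, assuming the textbook representation y_t = A1 y_{t-1} + ... + Ap y_{t-p} + e_t with lags on the right side):

    varIsStable <- function(A) {
      ## A: list of p (k x k) lag coefficient matrices
      k <- nrow(A[[1]]); p <- length(A)
      comp <- rbind(do.call(cbind, A),
                    cbind(diag(k * (p - 1)), matrix(0, k * (p - 1), k)))
      all(Mod(eigen(comp, only.values = TRUE)$values) < 1)
    }
    A <- list(matrix(c(0.5, 0.1, 0, 0.4), 2, 2), matrix(c(0.2, 0, 0, 0.1), 2, 2))
    varIsStable(A)   # TRUE for these small coefficients

To generate models known to be stable, one can draw coefficients and keep only the draws that pass this test (or shrink them toward zero until they do).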
Re: [R] The Future of R | API to Public Databases
The situation for this kind of interface is much more advanced (for economic time series data) than has been suggested in other postings. Several of the organizations you mention support SDMX, and I believe there is a working R interface to SDMX which has not yet been made public. A more complete list of organizations that I think already have working server-side support for SDMX is: the OECD, Eurostat, the ECB, the IMF, the UN, the BIS, the Federal Reserve Board, the World Bank, the Italian statistics agency, and, to a small extent, the Bank of Canada.

I have a working API to several time series databases (TS* packages on CRAN), and a partially working interface to SDMX, but have postponed further development of that in the hope that the already working code will be made available. Please see http://tsdbi.r-forge.r-project.org/ for more details. I would, of course, be happy to have other developers involved in this project. If you think you can contribute then see r-forge.r-project.org for details on how to join projects. Paul

On 12-01-14 06:00 AM, r-help-requ...@r-project.org wrote: Date: Sat, 14 Jan 2012 02:44:07 +0530 From: Benjamin Weberm...@bwe.im To:r-help@r-project.org Subject: [R] The Future of R | API to Public Databases Message-ID: cany9q8k+zyvrkjjgbjp+jtnyaw15gqkocivyvpgwgyqa9dl...@mail.gmail.com Content-Type: text/plain; charset=UTF-8

> Dear R Users - R is a wonderful software package. CRAN provides a variety of tools to work on your data. But R is not apt to utilize all the public databases in an efficient manner. I observed the most tedious part with R is searching and downloading the data from public databases and putting it into the right format. I could not find a package on CRAN which offers exactly this fundamental capability. Imagine R is the unified interface to access (and analyze) all public data in the easiest way possible. That would create a real impact, would put R a big leap forward and would enable us to see the world with different eyes. There is a lack of a direct connection to the API of these databases, to name a few:
>
> - Eurostat
> - OECD
> - IMF
> - Worldbank
> - UN
> - FAO
> - data.gov
> - ...
>
> The ease of access to the data is the key of information processing with R. How can we handle the flow of information noise? R has to give an answer to that with an extensive API to public databases. I would love your comments and ideas as a contribution in a vital discussion. Benjamin

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] simulating stable VAR process
The simulate function in dse lets you specify the model and the distribution of the noise term (or even their values, so you can get any distribution you like). So you should be able to do what you want, with either a VAR(p) or a vector ARMA process. If you are getting a process that explodes then your model is probably not stable. If it is a dse TSmodel you can check it with stability(); see ?stability in dse. Beware that the condition "modulus < 1" depends on whether your lagged parameters are specified on the left or right side of the equation. This changes the sign of the lag parameters and inverts the condition. Dse assumes lagged terms are specified on the left side, which is a bit unusual compared to introductory text books. However, when you get to hard problems it has advantages, because the AR term is then in a matrix polynomial ring and so it is easier to apply some useful mathematics. Paul

> Date: Wed, 4 Jan 2012 05:17:05 -0800 (PST) From: statquant2statqu...@gmail.com To:r-help@r-project.org Subject: Re: [R] simulating stable VAR process Message-ID:1325683025141-4261210.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii
>
> More specifically: I know that a condition for a VAR(p) process to be stable (weakly stationary) is that the companion form of the equation (see the AWESOME Pfaff book "Analysis of Integrated and Cointegrated Time Series with R") has eigenvalues of modulus < 1. My problem is that I want to generate such processes... When I try to generate random VAR(p) processes they seem to explode (clearly they are not weakly stationary...) Is there a way somebody knows?

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
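In dse terms, the check-then-simulate route looks roughly like this (an untested sketch following the dse documentation; the coefficient values are invented, and note the minus signs because dse puts lagged terms on the left side):

    library(dse)
    A <- array(0, c(2, 2, 2))        # dimensions are (lag, row, col); lag 0 first
    A[1, , ] <- diag(2)
    A[2, , ] <- -matrix(c(0.5, 0.1, 0, 0.3), 2, 2)   # i.e. Phi1 moved to the left side
    B <- array(diag(2), c(1, 2, 2))  # white-noise MA part
    model <- ARMA(A = A, B = B)
    stability(model)                 # check before simulating
    z <- simulate(model, sampleT = 500)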
Re: [R] Varma models in the dse package
Tanja, Your TSdata object dfdata as printed in your email looks a bit funny. It should be a list with a matrix of numeric data in the element named output:

    dfdata <- TSdata(output=matrix(rnorm(200), 100, 2))
    dfdata
    output data:
          Series 1     Series 2
    1   0.01905979 -0.096441240
    2   0.73873854 -0.654558048
    ...

I'm just guessing that you specified this incorrectly somehow. If not, please try to give a reproducible example. Paul

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Tanja Krone Sent: November 22, 2011 10:36 AM To: r-help@r-project.org Subject: [R] Varma models in the dse package

> Hi, I tried to run the VARMA model in the dse package. I specified a model:
>
>     arma
>     A(L) =
>         1+0.244L1    0+0.05L1
>         0-0.325L1    1-0.234L1
>     B(L) =
>         1-0.277L1    0+0.211L1
>         0-0.206L1    1+0.238L1
>
> and have a TSdata object:
>
>     dfdata
>     output data:
>       Series 1 Series 2
>     1   difex2   difem2
>
> but I get this warning message:
>
>     estMaxLik(arma, dfdata)
>     Error in l.ARMA(setArrays(Shape, coefficients = coefficients), data, result = like, :
>       NA/NaN/Inf in foreign function call (arg 14)
>     In addition: Warning message:
>     In l.ARMA(setArrays(Shape, coefficients = coefficients), data, result = like, :
>       NAs introduced by coercion
>
> Both series are the same length (69). Is there something I am missing? Kind regards and thanks in advance, Tanja [[alternative HTML version deleted]]
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a question on R optimization functions
It seems more likely that the return value from your function is NA or NaN or Inf. This might then result in an NA parameter value being calculated for the next step. This is possible, for example, because the line search extends outside the feasible region. You can re-write your function to check for that case and return a large negative value (or positive if minimizing), or stop() if that is more appropriate. Paul

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Dajiang J. Liu Sent: March 25, 2011 3:59 AM To: r-help@r-project.org Subject: [R] a question on R optimization functions

> Dear All, I use nlminb or optim for maximizing likelihood functions. Sometimes the parameter values happen to be NA; then the program will hang there and iterate forever without stopping. No error message will be produced, so I cannot use an error-catching method such as try. Are there any suggestions for how I can circumvent this problem? Maybe I can time a function, and if the time exceeds a threshold, it will be stopped. I am not sure if this is feasible in R. As a note, I am running simulations with thousands of replicates, so there needs to be a systematic way of doing this. Are there any suggestions on how to do this? Thank you very much! Regards, Dajiang
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
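Paul's guard, spelled out for a minimiser (an untested sketch; negLL stands for whatever objective is handed to optim or nlminb):

    safeNegLL <- function(par, ...) {
      val <- try(negLL(par, ...), silent = TRUE)
      if (inherits(val, "try-error") || !is.finite(val))
        return(1e10)        # large value steers the search back into the feasible region
      val
    }
    ## then: optim(start, safeNegLL, ...) or nlminb(start, safeNegLL, ...)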
Re: [R] How to obtain seed after generating random number?
It is nearly impossible to go back after the fact and figure out the seed you started with. So, you need to be careful to record the seed first. If you are doing your simulations with a function then it is a good idea to always start in the function by saving a record of the seed, and return it with the result. Then you always have the information you need. Beware that the seed is not the only thing you need to save. You also need to record the uniform generator you are using, and the generator for the distribution you are using. Of course, if you only ever use the default then you won't have a problem here, until the default changes. You might want to look at the utilities and examples in the package setRNG. Paul

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bogaso Christofer Sent: August 24, 2010 1:12 PM To: r-help@r-project.org Subject: [R] How to obtain seed after generating random number?

> Dear all, I was doing an experiment to disprove some theory, and was therefore performing a lot of random simulation. The goal is to show the audience that although something has a very rare chance of occurring, it doesn't mean the event is impossible. In this case, after getting that rare event I need to show the same scenario multiple times to explain it to other audience members. Hence I need to somehow save the seed which generated those random numbers after doing the experiment. However, as it is a very rare event, it is not very practical to start with a fixed seed and then generate random numbers. Hence I am looking for some way to tell me the seed that was responsible for generating the particular series of random numbers responsible for the occurrence of that rare event. In short, I need to know the seed ***after*** generating the random numbers. Is there any possibility to know this? Thanks and regards, [[alternative HTML version deleted]]
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
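"Record first and return with the result" might look like this in practice (an untested sketch; mySim and the returned structure are invented for illustration):

    mySim <- function(n) {
      if (!exists(".Random.seed", envir = .GlobalEnv)) runif(1)  # make sure a seed exists
      rng <- list(seed = get(".Random.seed", envir = .GlobalEnv),
                  kind = RNGkind())
      list(result = rnorm(n), rng = rng)    # the record travels with the result
    }
    ## to replay a rare event later:
    ##   assign(".Random.seed", saved$rng$seed, envir = .GlobalEnv); mySim(n)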
Re: [R] Can RMySQL be used for a paramterized query?
Ted, I'm not sure I fully understand the question, but you may want to consider creating a temporary table with a join, which you can do with a query from your R session, and then query that table to bring the data into R. Roughly, the logic is to leave the data in the db if you are not doing any fancy calculations. You might also find "order by" useful. (This is what I use in TSdbi to make sure data comes back in the right order as a time series.) It may even be possible to get everything you want back in one step using this and "group by", rather than looping, but depending on the analysis you want to do in R, that may not be the most convenient way. BTW, I think you realize you do not have to use the TSMySQL commands to access the TSMySQL database. They are usually convenient, but you can query the tables directly with RMySQL functions. Paul

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Henrique Dallazuanna Sent: June 10, 2010 8:47 AM To: Ted Byers Cc: R-help Forum Subject: Re: [R] Can RMySQL be used for a paramterized query?

> I think you can do this:
>
>     ids <- dbGetQuery(conn, "SELECT id FROM my_table")
>     other_table <- dbGetQuery(conn,
>         sprintf("SELECT * FROM my_other_table WHERE t1_id in (%s)",
>                 paste(ids, collapse = ",")))

On Wed, Jun 9, 2010 at 11:24 PM, Ted Byers r.ted.by...@gmail.com wrote:

> I have not found anything about this except the following from the DBI documentation: "Bind variables: the interface is heavily biased towards queries, as opposed to general purpose database development. In particular we made no attempt to define bind variables; this is a mechanism by which the contents of R/S objects are implicitly moved to the database during SQL execution. For instance, the following embedded SQL statement
>
>     /* SQL */
>     SELECT * from emp_table where emp_id = :sampleEmployee
>
> would take the vector sampleEmployee and iterate over each of its elements to get the result. Perhaps the DBI could at some point in the future implement this feature." I can connect, and execute a SQL query such as "SELECT id FROM my_table", and display a frame with all the IDs from my_table. But I need also to do something like "SELECT * FROM my_other_table WHERE t1_id = x" where 'x' is one of the IDs returned by the first select statement. Actually, I have to do this in two contexts, one where the data are not ordered by time and one where they are (and thus where I'd have to use TSMySQL to execute something like "SELECT record_datetime, value FROM my_ts_table WHERE t2_id = x"). I'd like to embed this in a loop where I iterate over the IDs returned by the first select, get the appropriate data from the second for each ID, analyze that data and store results in another table in the DB, and then proceed to the next ID in the list. I suppose an alternative would be to get all the data at once, but the resulting resultset would be huge, and I don't (yet) know how to take a subset of the data in a frame based on a given value in one of the fields and analyze that. Can you point me to an example of how this is done, or do I have to use a mix of perl (to get the data) and R (to do the analysis)? Any insights on how to proceed would be appreciated. Thanks. Ted [[alternative HTML version deleted]]
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
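The temporary-table idea in RMySQL terms (an untested sketch; the table and column names are invented):

    library(RMySQL)
    con <- dbConnect(MySQL(), dbname = "mydb")    # connection details omitted
    dbSendQuery(con, "CREATE TEMPORARY TABLE tmp AS
                        SELECT t2.*
                          FROM my_table t1
                          JOIN my_other_table t2 ON t2.t1_id = t1.id")
    res <- dbGetQuery(con, "SELECT * FROM tmp ORDER BY t1_id, record_datetime")

The join runs inside the database, and ORDER BY guarantees the rows come back in a usable order.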
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading dates in R using SQL and otherwise (and some interesting behavior by the data editor)
I can use as.Date() on the result from an SQL date field, but this may depend on the backend database engine too. There may also be some sensitivity to the character set encoding used on the database and the client, and from your email it looks a bit like you could have some problems in this respect. Paul Gilbert

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Paul Miller Sent: April 8, 2010 11:14 AM To: r-help@r-project.org Subject: [R] Reading dates in R using SQL and otherwise (and some interesting behavior by the data editor)

> Hello Everyone, I am a newbie with about a month's worth of experience using R. I've just spent a little time learning how to work with date values. Mostly, this has involved converting text values into dates. So far, I've managed to figure out how to do this in R proper using as.Date. However, I'm still struggling with doing this using SQL and RODBC. In the process of learning to create date values in R proper, I noticed some interesting behavior on the part of the data editor. At first, this led me to believe that my efforts had been unsuccessful. The output from my R console below illustrates this behavior.
>
>     test <- mydata
>     test$test_date <- as.Date(test$ae_datestarted, format='%m/%d/%Y')
>     class(test$test_date)
>     [1] "Date"
>     mode(test$test_date)
>     [1] "numeric"
>     fix(test)
>
> (At this point, I clicked on the test_date column)
>
>     Warning: class discarded from column 'test_date'
>     class(test$test_date)
>     [1] "character"
>     mode(test$test_date)
>     [1] "character"
>
> When I run my code, it works correctly. But when I open the data frame in the data editor and click on the test_date column, the editor says that it is character. And beyond that, the editor discards the class for test_date. Should the editor do this? Or is it my fault for trying to look at test_date in the editor in the first place? In SAS, I'm used to creating data and then opening the dataset to look at what I've done. Maybe I shouldn't be doing this in R though. Returning to the issue of converting text values to dates using SQL (Server) and RODBC: does anyone know how to do this? I've been trying to do this using things like Cast and Convert. Usually, these attempts fail. When SQL Server does seem to be sending something back, it appears that R cannot accept it. Any help with this problem would be greatly appreciated. Thanks, Paul
>
> [[elided Yahoo spam]]
>
> [[alternative HTML version deleted]]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] new dse package
With the release of R-2.10.0 I have divided the dse bundle into packages. The bundled package dse1 and part of dse2 are now in the new package dse. The remaining part of dse2 is in a new package EvalEst. The package dse does multivariate ARMA and state space time series modelling and forecasting, while package EvalEst is now focused on the evaluation of estimation methods. To aid transition, there are packages dse1 and dse2 on CRAN. These do nothing except require the new packages dse and EvalEst respectively, but should help prevent breaking of existing code that requires() dse1 or dse2. I have also taken this opportunity to do a long overdue cleanup of the Users' Guides distributed as vignettes in the packages. Paul Gilbert

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dse model setup help
I think the problem here is that you seem to be trying to specify a non-innovations form model, for which both Q and R need to be specified, but you have only specified Q. The code guesses whether you are specifying an innovations model or a non-innovations model based on whether you specify the Kalman gain K. Since you don't specify K, it assumes a non-innovations form and expects to find both Q and R. It seems one of my error checks could be improved. Paul

Bob McCall wrote:

> I tried to specify a model in dse1 but something isn't right. Anybody have any tips?
>
>     model <- SS(F=f, G=g, H=h, Q=q, z0=z, P0=p)
>     Error in locateSS(model$R, constants$R, R, p, p, plist) :
>       The dimension of something in the SS model structure is bad.
>     dim(f)
>     [1] 5 5
>     dim(g)
>     [1] 5 1
>     dim(h)
>     [1] 1 5
>     dim(q)
>     [1] 5 5
>     dim(z)
>     [1] 5 1
>     dim(p)
>     [1] 5 5
>
> thanks, Bob
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
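Given those dimensions, supplying R as well should satisfy the check (a sketch; the value of r is invented, and since H is 1 x 5 the measurement-noise covariance R must be 1 x 1):

    r <- matrix(1, 1, 1)
    model <- SS(F=f, G=g, H=h, Q=q, R=r, z0=z, P0=p)

Alternatively, specify the Kalman gain K to get an innovations-form model, as described in the dse documentation.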
Re: [R] SQL Queries from Multiple Servers
Tom Schenk Jr wrote:

> I use RODBC as my conduit from R to SQL. It works well when the tables are stored on one channel, e.g.,
>
>     channel <- odbcConnect("data_base_01", uid = "...", dsn = "...")
>
> However, I often need to match tables across multiple databases, e.g., data_base_01 and data_base_02. However, odbcConnect() appears limited insofar as you may only query tables within a single channel, e.g., database. I do not have access to write and create new tables on the SQL servers, which is a possible solution (e.g., copy all tables into a single database). Is there any way, in RODBC or another R-friendly SQL package, to perform SQL operations across multiple databases?

I'm not sure if this can be done with odbc, but with MySQL it is possible to do joins across multiple databases, and creating temporary tables may be possible even without the write access you would need for a permanent table. I'm not sure if you can pass this kind of statement from R, because the connection usually specifies the database. However, I have constructed temporary tables with a simple mysql client and then queried them from R. They stay around as long as you don't quit the simple client. I am not really sure this is supposed to work. Another option is two connections, doing some of the comparison in R, or writing the results to an SQLite connection, on which you usually have write access. This might be slow and you may have to deal with chunks if you have big tables. Joins across databases are also possible with PostgreSQL, I'm told, but they are more difficult. Paul

> Warm regards.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
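With RMySQL, the cross-database join is just qualified table names (an untested sketch; the database and table names are invented):

    library(RMySQL)
    con <- dbConnect(MySQL(), dbname = "data_base_01")   # credentials omitted
    res <- dbGetQuery(con,
        "SELECT a.id, a.x, b.y
           FROM data_base_01.tab1 a
           JOIN data_base_02.tab2 b ON b.id = a.id")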
[R] remote browser and per-session dir
Running R on a remote system I can almost never get help.start() to work, because it passes the request to my local browser, which cannot see the links in the per-session dir created in /tmp on the remote machine. Is there a way to put the per-session dir in my home dir (which is NFS mounted), or some other fix for this feature? Paul Gilbert

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] windows vs. linux code
Gabor Grothendieck wrote:

> Try:
>
>     if (.Platform$OS.type == "windows") ... else ...

Gabor has suggested what I think is the best way to do this check, but in my experience, if you are doing this check then you are almost certainly missing some feature of R that will let you avoid doing it. For graphs, dev.new() has been mentioned. I would be curious to know when people find it necessary to do this check, other than to issue a system() command for some very specialized local system reason (i.e. something that would never be used in a package and rarely in code run on different computers - not just different OSs). Paul

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
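A minimal sketch of the legitimate system()-command case (the commands here are just illustrations):

    if (.Platform$OS.type == "windows") {
      shell("dir")        # Windows shell
    } else {
      system("ls")        # POSIX shell
    }

For most other tasks a portable feature already exists; e.g. dev.new() opens a screen device on every platform, so there is no need to branch between windows() and X11().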
[R] New international competition for the Digital Humanities
(Slightly off topic and outside my area - but this may be of interest to the R community) WASHINGTON (January 16, 2009) -- Today, a new international competition called the Digging into Data Challenge was announced by four leading research agencies: the Joint Information Systems Committee (JISC) from the United Kingdom, the National Endowment for the Humanities (NEH) and the National Science Foundation (NSF) from the United States, and the Social Sciences and Humanities Research Council (SSHRC) from Canada. The Digging into Data Challenge encourages humanities and social science research using large-scale data analysis, challenging scholars to develop international partnerships and explore vast digital resources, including electronic repositories of books, newspapers, and photographs to identify new opportunities for scholarship. Interested applicants must first submit a letter of intent by March 15, 2009. Further information about the competition and the application process can be found at http://www.diggingintodata.org . See the full press release at: http://www.sshrc-crsh.gc.ca/web/whatsnew/press_releases/2009/digging_into_data_e.asp

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R ORM?
Faheem Mitha wrote:

> Hi Hans, Thanks for the reply. On Tue, 18 Nov 2008, Hans Werner Borchers wrote:
>
>> Faheem Mitha faheem at email.unc.edu writes:
>>> Hi, Does anyone know of an R ORM (Object Relational Mapper)? I'm thinking of something similar to sqlalchemy (http://www.sqlalchemy.org/). Alternatively or additionally, can people offer suggestions about managing relational databases from R. I'm currently using postgresql, but would like a flexible solution that works across different databases, hence the enquiry about ORMs.
>>
>> Hi Faheem, I am wondering what kind of objects or classes you would like to apply the object relational mapper to. S4 classes? -- I doubt that very much.
>
> Yes. R classes. As long as they support the object oriented paradigm, it doesn't matter too much to me what kind exactly. ORMs map between classes and db tables. The attraction of an ORM is high-level abstraction capabilities as well as a helping of syntactic sugar.
>
>> The only reasonable structure for which one could use a mapper in R is the data frame. All R database packages provide support for getting and sending data in data frames to and from databases. Admittedly, this is not what someone from the Python community would accept as an object relational mapper, but it may be sufficient given that there is only this one class concerned.
>
> You mean e.g. http://rpgsql.sourceforge.net/ can be used to read and write from a postgresql db, and there are similar packages for mysql etc.?
>
>> The only way I know to become somewhat independent of specific databases is to turn to RODBC (Windows or Linux, you didn't say that).

The only way I know is to use standard SQL, and avoid product-specific extensions in all your code that uses the database. Even that is not perfect, but it is a pretty good start. I don't think RODBC really solves this problem for you. It does allow you to talk to different products, but so do the various DBI drivers. But in either case, if you use product-specific extensions then you will tie yourself to the product (which is one reason vendors like to add such useful extensions). I read the initial remark to be about porting the database itself. In my limited experience this may be a small part compared to the code for maintaining and using the database. Paul

> I use Linux (Debian). You mean http://cran.r-project.org/web/packages/RODBC/index.html? Is this a recommended approach? CCing back to list, just in case someone else is interested... Faheem.
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
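The data-frame mapping that the DBI-based packages provide looks like this (an untested sketch; the driver and database name are placeholders):

    library(DBI)
    drv <- dbDriver("PostgreSQL")              # from the RPostgreSQL package
    con <- dbConnect(drv, dbname = "testdb")
    dbWriteTable(con, "iris_copy", iris)       # data frame -> table
    d   <- dbReadTable(con, "iris_copy")       # table -> data frame
    dbDisconnect(con)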
[R] [R-pkgs] Packages for time series databases
I recently put several packages for time series databases on CRAN. The main package, TSdbi, provides a common interface to time series databases. The objective is to define a standard interface so users can retrieve time series data from various sources with a simple, common, set of commands, and so programs can be written to be portable with respect to the data source. The SQL implementations also provide a database table design, so users needing to set up a time series database have a reasonably complete way to do this easily. The interface provides for a variety of options with respect to the representation of time series in R. There is also a (not yet well tested) mechanism to handle multilingual data documentation.

- TSdbi is the main package.
- TSMySQL, TSPostgreSQL, and TSSQLite provide interfaces for these three SQL databases (using RMySQL, RSQLite, and RPostgreSQL).
- TSodbc provides an interface using RODBC (tested only with a PostgreSQL server).
- TSfame provides an interface to Fame using the fame package.
- TSpadi provides an interface to Fame using a (somewhat dated) padi server.
- TShistQuote provides an interface to get.hist.quote.

Many thanks to the authors and maintainers of the underlying packages, and several people who have commented on early versions of the TS* packages. Paul Gilbert

___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
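As a usage sketch (written from memory of the TS* documentation; the database file and series name are invented, and the exact signatures should be checked against the package help pages):

    library(TSSQLite)                      # pulls in TSdbi and RSQLite
    con <- TSconnect("SQLite", dbname = "tsdata.db")
    z <- TSget("gdp.quarterly", con)       # comes back as a time series object in R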
[R] Ottawa area R Users meeting
There will be a meeting of the Ottawa Gatineau R Users Group on Monday November 10 from 4pm to 6pm. More details are provided below. Please pass this message along to friends you think may be interested. Paul Gilbert _ Dear R friends: To get OGRUG going again I've reserved room DMS 6160 in the Desmarais Building of the University of Ottawa for 1600-1800 November 10. This is the big curved building opposite National Defence HQ at 55 Laurier E. The transitway stops are at the door or across the road. Parking exists in the basement -- bring gold or at least a gold credit card if you intend to drive! 6160 is on the 6th floor. Paul Gilbert and I went to the Use R meeting in Dortmund and will give an appreciation of what we learned at that meeting, which was very well attended. We will also try to gain a picture of the usage of R in Ottawa by exchange of information, with the aim of making OGRUG a modest but effective resource for R users in the region. As an ongoing agenda item, may I suggest that we invite anyone with a particular question, glitch, note or such about R to bring it up. For items needing some show as well as tell, please email me materials and I'll try to arrange for presentation. As an example, I'll show a 2-line script that draws multi-page graphs (e.g., stock market trades by the minute for last month over 10 by 2 sheets of paper). Please share this with others who may be interested. Cheers, John Nash [EMAIL PROTECTED]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dbConnect
Christiane Reuter wrote:

> Hi everybody, I'm having a problem with connecting to my MySQL database. Each time I try to connect
>
>     library(RMySQL)
>     m <- dbDriver("MySQL")
>     con <- dbConnect(m, host="my_host", username="my_username", password="my_password", dbname="name_of_db")
>
> it says
>
>     Fehler in mysqlNewConnection(drv, ...) : RS-DBI driver: (could not connect my_host on dbname "name_of_db"
>     Error: Can't connect to MySQL server on 'my_host' (60)
>
> I access the database via https. Might this be the problem? (I have another database that I access via http where I have no problems at all...)

If I understand correctly, you are using http to the web server, and it is accessing MySQL running on the web server machine, which means the database server is listening for localhost connections. The dbConnect goes over the network directly to the server, which means the server needs to be set up to listen on a network address/port. If it is not, you would get this message, but there are other possible causes too. Paul

> And if so, is there anyone who knows what I need to do to overcome that problem? Thank you so much! -Christiane [[alternative HTML version deleted]]
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using acf() for multiple columns
rcoder wrote:

> Hi Gary, Thanks for your reply. This works fine. Is there any way to send the ACF data to a matrix, so I could analyse the data in excel for e.g.?

The coefficients are returned in the acf element of the result, as mentioned in the documentation (?acf). The side effect of plotting can be controlled by the plot argument:

    A <- matrix(rnorm(1500), nrow=500)
    z <- acf(A, lag.max=10, plot=FALSE)$acf
    dim(z)
    [1] 11  3  3

I think what you are looking for are the elements z[,1,1], z[,2,2] and z[,3,3].

    z[,1,1]
     [1]  1.000000000  0.080656537  0.041652217 -0.051640091 -0.032176245
     [6] -0.063410768  0.033985699  0.011444829 -0.027779925  0.007840191
    [11] -0.042187141

This is doing some extra calculation compared to what you want, because it is calculating all the cross correlations. The trade-off is that the acf calculation gets passed to C code and is fast, whereas splitting out the rows may not be. For a small number of columns there is probably not much difference. If you are going to transfer this to something else by copying and pasting the result, you should print it to a higher precision. I'm not sure why you would do that to analyse ... the usual problem is getting the data into R. Paul Gilbert

> Thanks, rcoder
>
> Ling, Gary (Electronic Trading) wrote: Hi, here is one possible solution ... -gary
>
>     ### example ###
>     # create a 500x3 multi-ts
>     A <- matrix(rnorm(1500), nrow=500)
>     # misc. graphical setting
>     par(mfrow=c(ncol(A),1))
>     # then our friend -- lapply(split(...))
>     lapply(split(A, col(A)), acf)
>     # or
>     lapply(split(A, col(A)), function(ts) acf(ts, lag.max=10))
>
> -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of rcoder Sent: Wednesday, August 06, 2008 4:13 PM To: r-help@r-project.org Subject: [R] using acf() for multiple columns
>
> Hi everyone, I'm trying to use the acf() function to calculate the autocorrelation of each column in a matrix. The trouble is that I can only seem to get the function to work if I extract the data in the column into a separate matrix and then apply the acf() function to this column. I have something like this:
>
>     acf(mat, lag.max=10, na.action=na.pass)
>
> ...but I would really like to apply the function to 'mat' where 'mat' is a matrix as opposed to a vector. The function actually doesn't return an acf coefficient, but instead plots the data. So, in addition to handling matrices with multiple columns, would anyone know how to coerce the function to output the underlying data? Finally, when working with a matrix, is there a way I can specify how many plots I can display after the function executes? I managed to generate a multiple plot when I was experimenting, but the titles suggested the acf was calculated between adjacent columns in the matrix, which is something I was puzzled about. Thanks, rcoder
>
> -- View this message in context: http://www.nabble.com/using-acf%28%29-for-multiple-columns-tp18858672p18858672.html Sent from the R help mailing list archive at Nabble.com.
>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
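To get those per-column ACFs into something a spreadsheet can read, one more step does it (a sketch continuing the objects above):

    own <- sapply(seq_len(dim(z)[2]), function(i) z[, i, i])   # lag x series matrix
    colnames(own) <- paste("Series", seq_len(ncol(own)))
    write.csv(own, "acf.csv")        # full precision, ready for Excel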
Re: [R] Exception Handling
I do this kind of thing under control of make. If you organize the individual pieces as separate targets then make -k will keep going and make -j can be used to run things in parallel. R can be made to throw an error with something like this:

  egTarget:
  	@$(MKDIR) $(FLAGS)
  	@echo "library(tools); z <- doSomething(); \
  	  if (0 == length(capture.output(z))) q('no', status=0) else \
  	  {print(z); q('no', status=1)}" | R --vanilla --slave > $(FLAGS)/[EMAIL PROTECTED]
  	@mv $(FLAGS)/[EMAIL PROTECTED] $(FLAGS)/$@
  	@touch $(FLAGS)/$@  # better time stamp in some parallel situations

In this example I am assuming that printing z has zero length output if everything was ok, but you could set up different situations, and you will probably want to use try() in doSomething().

HTH. Paul Gilbert

On Sun, Jul 6, 2008 at 7:36 PM, Tudor Bodea wrote:

Dear useRs: Please provide me with your thoughts on an issue related to the design of a production level system. For example, let's suppose that I need to run the same R script for a finite sequence of items (e.g., in the energy industry, I may need to assess the profitability of all gas stations in the state of Florida). For each of the items, the R script accesses some remote databases, gets all the necessary information and processes the data locally. If the data for every item in the item list were not corrupted then the R script would easily do its job. However, every now and then, the data for some items is partially missing and the R script returns an error message in such cases (e.g., for a given gas station, 1 month worth of data is missing from a 3 month decision horizon). As of right now, using the error option of options(), whenever an exception happens, the R script sends me an email error message and kills the current R instance. As I already figured out, this is not necessarily the most efficient way to deal with such exceptions. Ideally, I would like the script to inform me of the data problem but continue the analysis with the remaining items. Based on my searches, I believe that try / stop can answer my problem. However, if any of you have already implemented such a fail-safe system I would really appreciate your taking the time to (1) share with me what you think constitutes best practice and/or (2) point me to any online material relevant to the topic. I run various versions of R on Windows and multiple UNIX platforms. The list of items is read from a .csv file and stored in a dataframe. Thank you.

Tudor

--
Tudor Dan Bodea
Georgia Institute of Technology
School of Civil and Environmental Engineering
Web: http://www.prism.gatech.edu/~gtg757i
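Within R itself, the loop-and-continue pattern Tudor describes can be done with tryCatch(). A sketch, where readStationData() and analyse() are made-up stand-ins for the database fetch and the analysis step:

  items <- read.csv("stations.csv", stringsAsFactors=FALSE)
  results <- vector("list", nrow(items))
  for (i in seq_len(nrow(items))) {
    results[[i]] <- tryCatch({
      dat <- readStationData(items$id[i])  # hypothetical database fetch
      analyse(dat)                         # hypothetical analysis step
    }, error = function(e) {
      # record the failure and carry on with the remaining items
      warning("item ", items$id[i], " failed: ", conditionMessage(e))
      NULL
    })
  }
  failed <- items$id[sapply(results, is.null)]  # items to investigate later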
Re: [R] NumDeriv - derivatives of covariance matrix
Your calculation can be thought of as a function from R^m to R^(n*n), and functions in numDeriv can be used to calculate a numerical approximation to the derivative of the function. However, the functions in numDeriv try to calculate accurate approximations, as opposed to quick approximations like one might want in an optimization problem. Given that you already have an analytic solution, I doubt that even a quick approximation will be faster. You might do better to look at converting parts of your double loop into vector or matrix calculations, or at exploiting the fact that the matrix is symmetric.

Paul Gilbert

Daomeng Gao wrote:

Hello R-help, I need to compute matrices of first derivatives of a covariance matrix C with entries given by

  c_ij = theta * exp(-0.5 * sum(eta*(x[i,]-x[j,])^2))

with respect to elements of eta, an m-dimensional vector of parameters, given an n*m data matrix x. So far, I have been computing matrices for each parameter (given by par[index]) analytically, using the following

  kmatder <- function(x, par, index) {
    ## x: n*m matrix
    ## par: vector of parameters, m = length(par) = ncol(x)
    ## compute matrix of partial derivatives wrt parameter par[index]:
    ## Cder = d C / d par[index]
    theta <- 1
    eta <- par
    n <- nrow(x)
    Cder <- matrix(0, n, n)
    for (i in 1:n) {
      for (j in i:n) {
        Cder[i,j] <- (-0.5*((x[i,index]-x[j,index])^2)) * theta *
                     exp(-0.5*sum(eta*(x[i,]-x[j,])^2))
      }
    }
    Cder <- 0.5*(Cder + t(Cder))
    Cder
  }

I was wondering whether it might be possible to speed up things using numDeriv (jacobian). If so, what would be the right way to implement a suitable method? Cheers, Gao Daomeng
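For the vectorization route, the double loop can be replaced entirely, since sum(eta*(x[i,]-x[j,])^2) is just a squared Euclidean distance after scaling the columns of x by sqrt(eta). A sketch, assuming eta (= par) is non-negative and theta = 1 as in the loop version:

  kmatderVec <- function(x, par, index) {
    ## weighted squared distances sum(eta*(x[i,]-x[j,])^2), all i and j at once
    D2 <- as.matrix(dist(sweep(x, 2, sqrt(par), "*")))^2
    ## derivative wrt par[index]: -0.5*(x[i,index]-x[j,index])^2 * c_ij
    -0.5 * outer(x[,index], x[,index], "-")^2 * exp(-0.5 * D2)
  }

The result is symmetric by construction, so the explicit symmetrization step is no longer needed.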
Re: [R] History of R
Kathy

The dedication of the developers and several other important things have already been mentioned. Here are a few points I have not seen.

- I believe S was originally open source (before the term existed and before the GPL, and license issues were probably clouded with respect to changing the code). This meant part of the community of S users had this tradition. Some, no doubt, were a bit upset about the Splus move to closed source.

- This community had also contributed significantly to Statlib, so there were some packages that could be leveraged in the beginning. This may not have been so important for what the packages did, but for the fact that they gave an extensive test suite, so one could have considerable confidence in the results.

- Purchase cost is typically not so important for corporate and institutional users, since it is usually dominated by support costs. However, young users may often feel they would prefer to have their personal investment in something they can easily take with them if they move. Some of us at the other end like the idea that we don't need a corporate account to continue research we might be interested in doing when we retire.

- All risk-averse users should like the idea that programs and acquired skills are not tied to the operating system and hardware flavor of the month. (R has excelled in this respect.)

- Help on the R lists has always been exceptionally good (sometimes even if you don't read the documentation first - but expect to be chastised). If you look at the S help list over the past 15 years, you will find many of the most difficult questions were answered by people involved with R.

- I ran my own code interchangeably in Splus and R for many years (starting with R-0.16). For a long time Splus was production and R was run so I would have a backup. For me, the defining factor in moving to R for production was the introduction of the package system. This is really special in the way that it develops the synergy of the community. By packaging your code you get to leverage all the code checking and documentation checking of the system, and you get to add your own tests that run automatically when you build your package. Not only that, but if you make your package available on CRAN you get not only useful feedback from users, but also automatic information about what is going to break in your code in the next release of R (from the daily checks on multiple platforms). This is not only useful to package developers, but provides R itself with what I would guess is the largest automatic test bed in the industry. The system is also interesting in the way that it has resolved one of the big problems of Statlib: there is an automatic mechanism for removing broken and unmaintained packages.

Paul Gilbert

Kathy Gerber wrote:

Earlier today I sent a question to Frank Harrell as an R developer with whom I am most familiar. He suggested also that I put my questions to the list for additional responses. Next month I'll be giving a talk on R as an example of high quality open source software. I think there is much to learn from R as a high quality extensible product that (at least as far as I can tell) has never been spun or hyped like so many open source fads. The question that intrigues me the most is: why is R as an open source project so incredibly successful, while other projects, say for example Octave, don't enjoy that level of success? I have some ideas of course, but I would really like to know your thoughts when you look at R from such a vantage point.
Thanks. Kathy Gerber
University of Virginia
ITC - Research Computing Support
Re: [R] Using R in a university course: dealing with proposal comments
Stas Kolenikov wrote: ... Training researchers of tomorrow might be great, but if your students get on the market at the end of the semester, they won't have the luxury of waiting until R becomes THE package of choice.

Not being a teacher, I usually follow these discussions with a bit of amusement and some befuddlement. We hire young people hoping they will bring in bright new ideas from academia, and academics are training the students based on what they think are the old things we use. Fortunately, R is already one of the packages of choice in many places. Another point that needs more emphasis is that R is actually a programming language, like Matlab and APL, so it really has more general usefulness than statistics packages that one might use in the narrower context of a statistics course.

Paul Gilbert
Re: [R] building packages for Linux vs. Windows
Erin Hodgess wrote:

Hi R People: I'm sure that this is a really easy question, but here goes: I'm trying to build a package that will run on both Linux and Windows. However, there are several commands in a section that will be different in Linux than they are in Windows. Would I be better off just to build two separate packages, please? If just one is needed, how could I determine which system is running in order to use the correct command, please? Thanks in advance, Erin

Several people have indicated how to do this, but I encourage you to be sure you really need to do it. Many things can be made to work the same way on all OSs, and packages are much easier to maintain if you do not have several variants. You might consider posting a few examples of where you find this necessary, and ask if there is an OS independent way to do it.

Paul Gilbert
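When a check really is unavoidable, base R already provides portable building blocks, so the OS-specific branch can usually be kept very small. A sketch (the shell commands are only examples):

  fname <- file.path(tempdir(), "out.txt")   # portable path construction
  dir.create(file.path(tempdir(), "work"))   # portable directory creation
  if (.Platform$OS.type == "windows") {
    shell("dir")    # genuinely OS-specific, as a last resort
  } else {
    system("ls")
  }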
Re: [R] things that are difficult/impossible to do in SAS or SPSS but simple in R
The argument for SAS (and Stata) when working with large datasets comes up fairly often. I have not had much experience in this area, but have been pleasantly surprised using R in combination with an SQL interface, in situations with modestly large, messy datasets. I would certainly appreciate comments on the relative merits from anyone who has more experience in this area.

Paul Gilbert

Walter Paczkowski wrote:

Good morning, I use SAS and R/S-Plus as my primary tools, so I have a lot of experience with these programs. By far and away, SAS is superior for handling not just the messy datasets, but also the very large ones. I work at times with datasets in the hundreds of thousands (and on occasion, millions) of records. SAS, and especially PROC SQL, are invaluable for this. But once I get to datasets manageable for R/S-Plus, then I ship it to these tools for the programming and graphics. This seems to work great. Walt Paczkowski Data Analytics Corp.

-Original Message- From: Rob Robinson Sent: Jan 17, 2008 4:31 AM Subject: Re: [R] things that are difficult/impossible to do in SAS or SPSS but simple in R

I wonder if those who complain about SAS as a programming environment have discovered SAS/IML, which provides a programming environment akin to Matlab that is more than capable (at least for those problems which can be treated with a matrix-like approach). As someone who uses both SAS and R - graphical output is so much easier in R, but for handling large 'messy' datasets SAS wins hands down... Cheers Rob

*** Want to know about Britain's birds? Try www.bto.org/birdfacts ***
Dr Rob Robinson, Senior Population Biologist
British Trust for Ornithology, The Nunnery, Thetford, Norfolk, IP24 2PU
Ph: +44 (0)1842 750050 Fx: +44 (0)1842 750030 W: http://www.bto.org
How can anyone be enlightened, when truth is so poorly lit

-Original Message- From: Jeffrey J. Hallman Sent: 16 January 2008 22:38 Subject: Re: [R] things that are difficult/impossible to do in SAS or SPSS but simple in R

SAS has no facilities for date arithmetic and no easy way to build it yourself. In fact, that's the biggest problem with SAS: it stinks as a programming environment, so it's always much more difficult than it should be to do something new. As soon as you get away from the canned procs and have to write something of your own, SAS falls down. I don't know enough about SPSS to comment. -- Jeff
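For the SQL-from-R pattern Paul mentions, the idea is to let the database do the heavy reduction and only pull a summary into R. A sketch using RSQLite (the database, table and column names here are made up):

  library(DBI)
  library(RSQLite)
  con <- dbConnect(SQLite(), dbname="big.db")
  ## the database does the grouping over millions of rows; R sees the summary
  smry <- dbGetQuery(con,
    "SELECT region, COUNT(*) AS n, AVG(amount) AS mean_amount
     FROM sales GROUP BY region")
  dbDisconnect(con)
  summary(smry)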
Re: [R] An R is slow-article
Gustaf Rydevik wrote:

Hi all, Reading the wikipedia page on R, I stumbled across the following: http://fluff.info/blog/arch/0172.htm

There are certainly situations where one would want to consider faster solutions than interpreted languages but, having been through these arguments a few times over the years, here are a few things you might consider:

1/ How much is your time worth, how much does the computer time cost, and how much does a faster computer cost when you start writing your code?

2/ How much is your time worth, how much does the computer time cost, and how much does a faster computer cost when you finish writing your code?

3/ If you tweak the code, or use someone else's private tweaks, how much do you trust the results relative to more widely used and tested versions?

4/ You should do speed comparisons with something resembling your real problem.

5/ If you want to make R look really bad, use a loop that gobbles lots of memory, so your machine starts to swap. (This is my guess of part of the problem with the script.)

6/ If you want your code to be really fast, don't do any error checking. (This also avoids the enormous amount of time you waste when you find errors.)

It does seem interesting that the C execution is that much slower from R than from a native C program. Could any of the more technically knowledgeable people explain why this is so? The author also has some thought-provoking opinions on R being no good and that you should write everything in C instead (mainly because R is slow and too good at graphics, encouraging data snooping). See http://fluff.info/blog/arch/0041.htm While I don't agree (granted, I can't really write C), it was interesting to read something from a very different perspective than I'm used to. Best regards, Gustaf

People used to say assembler; that's progress.

Paul Gilbert

_ Department of Epidemiology, Swedish Institute for Infectious Disease Control work email: gustaf.rydevik at smi dot ki dot se skype: gustaf_rydevik
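Point 5/ is easy to demonstrate: growing an object inside a loop forces R to copy it repeatedly, while preallocating (or vectorizing) does not. A small illustration:

  n <- 1e5
  x <- numeric(0)
  system.time(for (i in 1:n) x <- c(x, i^2))  # grows each step: slow, memory-hungry
  y <- numeric(n)
  system.time(for (i in 1:n) y[i] <- i^2)     # preallocated: much faster
  system.time(z <- (1:n)^2)                   # vectorized: fastest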
Re: [R] R Compilation error on Ubuntu
Taka

You might want to try R-SIG-Debian for this kind of question. Here are instructions from Dirk in response to my similar question a few days ago.

HTH, Paul

_

On 12 December 2007 at 18:17, Paul Gilbert wrote:
| I'm trying to build R from source on Ubuntu Gutsy Gibbon. I've done
| apt-get install r-base-dev and apt-get install libX11-dev, but R configure is

You're close: 'r-base-dev' gives you (about) everything to run 'install.packages()' once you have R; libX11-dev ought to give you everything but somehow doesn't so ...

| still complaining that X11 headers/libs are not available. What else
| do I need?

... use the 'apt-get build-dep $package' command, ie

  $ sudo apt-get build-dep r-base

which will pull in this list (as per the current Debian unstable package):

  Build-Depends: gcc (>= 4:4.1.0), g++ (>= 4:4.1.0), gfortran (>= 4:4.1.0),
  refblas3-dev | atlas3-base-dev, tcl8.4-dev, tk8.4-dev, bison, groff-base,
  libncurses5-dev, libreadline5-dev, debhelper (>= 5.0.0), texi2html,
  texinfo (>= 4.1-2), libbz2-dev, libpcre3-dev, tetex-bin, tetex-extra,
  xpdf-reader, zlib1g-dev, libpng12-dev, libjpeg62-dev, libx11-dev,
  libxt-dev, x-dev

So in this particular case, libxt-dev and x-dev were still missing. 'apt-get build-dep $foo' is really convenient. Try 'apt-get source $foo' as well to get sources, if you have entries like

  deb-src http://ftp.us.debian.org/debian/ unstable main contrib non-free

I use that all the time to fetch Debian sources onto Ubuntu machines at work. It may at times conflict with what packages Ubuntu has, or what names they used -- but for R and one or two dozen other Debian packages it has always been useful for me on Ubuntu too.

Cheers, Dirk

Takatsugu Kobayashi wrote:

Hi, I bought a new laptop HP dv9500 just a week ago and installed Ubuntu Gutsy Gibbon on it. I wanted to install Fedora but there are more threads on Ubuntu, so I decided to install Ubuntu. After hours of struggle in configuring the x server/graphic card stuff, I installed R for Gutsy Gibbon using 'sudo apt-get install g77 r-core'. Now when I tried to install the 'sp' package I got this error message:

  Warning in install.packages("sp") :
    argument 'lib' is missing: using '/usr/local/lib/R/site-library'
  --- Please select a CRAN mirror for use in this session ---
  Loading Tcl/Tk interface ... done
  trying URL 'http://cran.mtu.edu/src/contrib/sp_0.9-17.tar.gz'
  Content type 'application/x-gzip' length 359194 bytes
  opened URL
  ==
  downloaded 350Kb
  * Installing *source* package 'sp' ...
  ** libs
  gcc-4.2 -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -fpic -g -O2 -c gcdist.c -o gcdist.o
  /bin/bash: gcc-4.2: command not found
  make: *** [gcdist.o] Error 127
  ERROR: compilation failed for package 'sp'
  ** Removing '/usr/local/lib/R/site-library/sp'
  The downloaded packages are in /tmp/RtmpfOk3H7/downloaded_packages
  Warning message:
  installation of package 'sp' had non-zero exit status in: install.packages("sp")

Then, I removed the Gutsy Gibbon version of R and attempted to build a fresh R-2.6.1.tar.gz. When I tried to build it, I got this error message:

  configure: error: --with-readline=yes (default) and headers/libs are not available

I installed xserver-xorg-dev ok. I am using Fedora 6 on my desktop and have never seen these error messages before. Should I install Fedora and R on it? Or should I just create a link using 'ln -s'? I appreciate your help. Have a festive holiday!
taka
Re: [R] 00LOCK error with site-library
Mark W Kimpel wrote:

I have identical .Rprofiles and R_HOME directories set up on both my local machine and a remote linux cluster. To keep my libraries and R install separate, I use a site-library on both machines. The first line of my .Rprofile is:

  .libPaths(new="~/R_HOME/site-library")  # tell R where site-library is

Until R-2.6.0 this was working fine on both machines, but since then I have been unable to upgrade packages on the remote machine. Every attempt at an upgrade results in a directory '00LOCK' being placed in the site-library, and then the upgrade fails.

This directory gets placed there and then removed when the upgrade is finished, to prevent two processes from trying to simultaneously update the directory. If you terminated an upgrade in an unexpected way, say by dropping your connection, then the file will be left and you need to manually remove it before you try to upgrade again. If you have removed it, and it keeps re-appearing, it is a sign of something else going wrong (usually outside of R).

Paul

This has happened with R-2.6.0, 2.6.1, and now with R-devel (2.7.0). If I have this set up incorrectly I am puzzled as to why it works on my local machine and why it worked with versions of R prior to 2.6.0. Ideas? Thanks, Mark
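Clearing a stale lock by hand amounts to removing that directory, e.g. (adjust the path if your site-library differs):

  lock <- file.path("~/R_HOME/site-library", "00LOCK")
  if (file.exists(lock)) unlink(lock, recursive=TRUE)  # remove leftover lock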
Re: [R] kalman filter estimation
[EMAIL PROTECTED] wrote:

Hi, Following the convention below:

  y(t) = A x(t)   + B u(t) + eps(t)   # observation eq
  x(t) = C x(t-1) + D u(t) + eta(t)   # state eq

I modified the following routine (which I copied from http://www.stat.pitt.edu/stoffer/tsa2/Rcode/Kall.R) to accommodate u(t), an exogenous input to the system.

  for (i in 2:N) {
    xp[[i]] = C %*% xf[[i-1]]
    Pp[[i]] = C %*% Pf[[i-1]] %*% t(C) + Q
    siginv = A[[i]] %*% Pp[[i]] %*% t(A[[i]]) + R
    sig[[i]] = (t(siginv) + siginv)/2  # make sure sig is symmetric
    siginv = solve(sig[[i]])           # now siginv is sig[[i]]^{-1}
    K = Pp[[i]] %*% t(A[[i]]) %*% siginv
    innov[[i]] = as.matrix(yobs[i,]) - A[[i]] %*% xp[[i]]
    xf[[i]] = xp[[i]] + K %*% innov[[i]]
    Pf[[i]] = Pp[[i]] - K %*% A[[i]] %*% Pp[[i]]
    like = like + log(det(sig[[i]])) + t(innov[[i]]) %*% siginv %*% innov[[i]]
  }
  like = 0.5*like
  list(xp=xp, Pp=Pp, xf=xf, Pf=Pf, like=like, innov=innov, sig=sig, Kn=K)
  }

I tried to fit my problem and observe that I got a positive log likelihood, mainly because the log of the determinant of my variance matrix is largely negative. That's not good because they should be positive. Has anyone experienced this kind of instability?

The determinant should not be negative (the line "# make sure sig is symmetric" should look after that) but the log can be negative.

Also, I realize that I have about 800 sample points. The above routine, when plugged into optim, becomes very slow. Could anyone share a faster way to compute the Kalman filter?

The KF recursion you show is not time varying, but the code you show looks like it may allow for time varying models. (I'm just guessing; I'm not familiar with the code.) If you do not need the time varying aspect then this is probably a slow way to implement it. Package dse1 in the dse bundle implements the recursion the way you show, with an exogenous input u(t), so you don't even have to modify the code. It is implemented in R for demonstration, but also in Fortran for speed. See the dse Users Guide for more details, and also ?SS and ?l.SS. One caveat is that the way the likelihood is cumulated over i in your code above allows for time varying sig, which in theory can be important for a diffuse prior. In dse the likelihood is calculated using the residual to get a sample estimate of sig, which is faster. I have not found cases where this makes much difference (but would be interested to know of one).

And my last problem is: optim with my defined feasible space does not converge. I have about 20 variables that I need to identify using the MLE method. Is there any other way that I can try? I tried most of the methods available in optim already. They do not converge at all.. Thank you.

This used to be a well known problem for multivariate state space models, but seems to be largely forgotten. It does not seriously affect univariate models. For a review of the old literature see section 5 of Gilbert 1993, available at http://www.bank-banque-canada.ca/pgilbert/research.php#MiscResearch. The bottom line is that you are in trouble trying to use hill climbing methods and state-space models unless you have a very good starting point, or are lucky. This is a feature of the parameter space, not a reflection on the optimization routine. One solution is to start by estimating a VAR or ARMA model and then convert it to a state space model, which was one of the original purposes of dse. (See ?bft for example.)
Paul Gilbert

- adschai
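A sketch of that last route, using names from the dse bundle's documented interface (from memory, so check ?bft and ?toSS; u and y stand in for your input and output matrices):

  library(dse1)
  dat <- TSdata(input=u, output=y)
  armaFit <- bft(dat)      # automatic ARMA model selection
  ssFit <- toSS(armaFit)   # equivalent state-space representation
  summary(ssFit)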
Re: [R] kalman filter estimation
Giovanni Petris wrote:

The Kalman filter for general state space models, especially in its naive version, is known for its numerical instability. This is the reason why people developed square root filters, based on the Cholesky decomposition of variance matrices. In package dlm the implementation of the Kalman filter is based on the singular value decomposition (SVD) of the relevant variance matrices, providing a more robust algorithm than the standard square root filter. The lack of convergence of optimization algorithms in finding MLEs of unknown parameters is not surprising to me, especially if you have many parameters. When using state space models it is easy to end up with a fairly flat, or multimodal, likelihood function. These are two signs of poor identifiability of the model.

Lack of identification makes things worse, but these problems can occur even when the model is identified. The problem is actually worse than what you indicate. There can be large regions of parameter space where the likelihood increases as parameters diverge to infinity. This was the basis of the literature on chart switching procedures. Also, it is not so much the number of parameters that causes the problem as it is the dimension of the output (though they tend to be related). A univariate model with a large state is usually not too bad if it is identified and there is enough data. However, multiple series can cause problems even with relatively few parameters. You can live with it, but be aware that it is there.

My suggestion is to start the optimization from several different initial values and compare maximized values of the likelihood. Simulated annealing may be used to better explore the parameter space.

Yes. Are you aware of any work on this in R?

Paul

HTH, Giovanni
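The multiple-starting-values advice is easy to automate, and optim's "SANN" method gives a simulated annealing pass that can precede a local polish. A sketch, assuming negLogLik(par) returns the negative log-likelihood for the 20 parameters (the ranges for the random starts are made up):

  starts <- replicate(10, runif(20, -1, 1), simplify=FALSE)  # 10 random starts
  fits <- lapply(starts, function(p0) {
    rough <- optim(p0, negLogLik, method="SANN")   # explore the surface
    optim(rough$par, negLogLik, method="BFGS")     # then polish locally
  })
  best <- fits[[which.min(sapply(fits, function(f) f$value))]]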
Re: [R] Multivariate time series
Package dse does do linear, multivariate time series with multivariate exogenous inputs, using VAR, multivariate ARMA, and state-space models, just like you are describing. You can specify the model or have it automatically determined. Perhaps you could give a few more details about what you are looking for. (I assume you have looked at the dse Users Guide distributed with the bundle, or some of the help, for example ?estVARXls, ?bft, and ?forecast. Sections 6 and 7 of the guide are about the prediction problem, very much like you describe.)

Paul

Giusy wrote:

Hello to everyone! I have a question for you: I need to predict multivariate time series, for example sales of 2 products related one to the other, having the 2 prices as inputs. Is there in R a function to do it? I saw the dse package but I didn't find what I'm looking for. Could anyone help me? Thank you very much Giusy
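For Giusy's case the outline would be something like the following (a sketch with made-up object names; the argument names are from memory of the dse help pages, so check ?estVARXls and ?forecast):

  library(dse1)  # estimation half of the bundle
  library(dse2)  # forecast() lives here
  dat <- TSdata(input=prices, output=sales)  # prices, sales: 2-column matrices
  fit <- estVARXls(dat, max.lag=4)
  fc <- forecast(fit, conditioning.inputs=futurePrices)  # future prices assumed known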
Re: [R] Error: X11 cannot allocate additional graphics colours.
(This could be fixed, it has not happened to me in a long time, but I will mention it mainly because it is not something you are likely to think of.) It used to be that the X colours might be defined by the first application that needed them, so if the systems administrator happened to start up X and mozilla on the server, especially if the server had limited graphics, then that could potentially cause this problem. I never really understood why it would be like this, so probably it is fixed. (This was not an R bug.)

Paul

Marc Schwartz wrote:

This is not an X server version issue. There are Linux system (not R) display settings that will dictate the number of simultaneous colors that can be displayed. This will be dependent upon the display resolution defined and the amount of video RAM on the graphics card. The higher the display resolution, the more video memory it takes, since you have more pixels. The more colors you want to display simultaneously, the more memory per pixel it takes. Those concepts, BTW, are the same on Windows and OS X. So if you are running directly on the RHEL system, check the display setting to be sure that it is set for a sufficient number of colors. This would be an administrative task requiring root privilege. Ideally, you want so-called TrueColor or Millions of Colors to be set. This requires a pixel depth of 24 bpp or 32 bpp, depending upon the VRAM available. On the other hand, if you are connected remotely to the RHEL server, using ssh and Xvfb on a server that is not running X, then you will need to adjust (or have the SysAdmin adjust) the -pixdepths setting on the RHEL server. This controls the 'bpp' available on the server. See 'man Xvfb' for more information on the latter scenario.

HTH, Marc Schwartz

On Mon, 2007-10-15 at 20:53 +0100, michael watson (IAH-C) wrote:

Thanks for the response... My confusion about plot stems from the fact I am plotting 82 points with 82 colours, so surely all colours get plotted? As for updating X, I recently installed the latest version of XFree86 for my version of linux, RHEL 4. As for Brian's e-mail you quoted, I do try and look at things like that, but I don't know which arguments he refers to in "Run the X11 device with the arguments stated". Which arguments, and how do I run my X11 device with them? I've pretty much had this problem with every version of R on linux, all the way from SuSe 8.2 to SuSe 9.2, through to RHEL3 and now RHEL4 - I always have this problem. Perhaps you could tell me which version of X you use to generate 1000 colours without getting the error message? I'm sorry, but what I would really love is for someone to say "RHEL4? Aha, you need to install X version foo, available from http://www.foo.com/bar.rpm..."

-Original Message- From: Charles C. Berry Sent: Mon 15/10/2007 7:54 PM To: michael watson (IAH-C) Subject: Re: [R] Error: X11 cannot allocate additional graphics colours.

You knew this? http://tolstoy.newcastle.edu.au/R/e2/help/06/09/0640.html I cannot replicate your error. I use n <- 1000 on R-2.6.0, and it still works. Only a guess, but maybe your X setup is out of date. Maybe an update would help? As for why axis triggers this: axis uses all the colors, but image only uses (something like) those that fall into the bins shown here:

  hist(mat, breaks=n)

As you see there are usually some empty bins, and those colors do not get rendered till axis() does its thing.
Chuck

On Mon, 15 Oct 2007, michael watson (IAH-C) wrote:

Dear All, Another one I have touched on before with a much older OS and version. My sessionInfo() is:

  sessionInfo()
  R version 2.5.1 (2007-06-27)
  i686-redhat-linux-gnu

  locale:
  LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods
  [7] base

  other attached packages:
  DetectiV 1.1

I'm getting the following error message when plotting a quite complex graph:

  Error in axis(side = 2, at = c(min[i] - sptl, max[i] + sptl), labels = NA, :
    Error: X11 cannot allocate additional graphics colours.
  Consider using X11 with colortype="pseudo.cube" or "gray".

The problem is I only get it under certain circumstances. I have some quite convoluted test code:

  n <- 82
  mat <- matrix(rnorm(n*10), ncol=n)
  fcolors <- terrain.colors(n)
  image(z=mat, axes=FALSE)
  oneis <- 1 / ncol(mat)
  sptl <- oneis / 3
  max <- 1:n * oneis
  min <- c(0, max[1:length(max)-1])
  for (i in 1:n) {
    axis(side=2, at=c(min[i]-sptl, max[i]+sptl), labels=NA, line=0.9,
         lwd=3, lty=1, tick=TRUE, tck=0, col=fcolors[i], lend=2)
  }

Now, this code works without error on values of n up to and