Re: [R] Limiting the scope of RNGkind/set.seed

2019-04-17 Thread Paul Gilbert

Elizabeth

There is a package (of mine) setRNG on CRAN that may be a helpful 
example (code/tests/examples/vignette). Most of the package is testing 
designed to fail if the RNG in R is changed in a way that will affect my 
other package testing. Martin's function in the previous reply has most 
of the important parts and adds warning suppression, so you might want to 
consider small adjustments on a combination of the two. Just to 
summarize the issues, from memory (a combined sketch follows the list):


0/ Using a preset default seed in a function's argument makes the 
function not random by default. If you are doing that then maybe you 
need to consider carefully whether the function should be using random 
number generation.


1/ It is good practice to use on.exit() in your function to reset 
things, so the state remains unaltered if your function fails.


2/ Saving the old seed does not work when it is unset, as it is by 
default in a new session, so you need to do something that ensures it is 
set.


3/ You may need to save and reset not only the seed but also the RNG 
kind and the normal.kind, and possibly the kind for some other 
distributions. (setRNG does not handle other distributions.) It looks 
like you need to save and reset sample.kind.


4/ You should add the capability to pass all settings to your functions 
so that you can reproduce things when you want.


5/ I have found it useful to always pass back the settings in objects 
returned by simulation functions. That way you always have a 
record when you discover something you want to reproduce.


6/ If parallel computing is considered then for reproducibility you need 
to save the number of nodes in the cluster. (I think this point is not 
as widely known as it should be.)
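A minimal sketch combining points 1/-3/ (assumptions: R >= 3.6, so that 
RNGkind() also reports sample.kind; the wrapper name is hypothetical):

with_preserved_rng <- function(expr, seed = NULL) {
    if (!exists(".Random.seed", envir = .GlobalEnv))
        runif(1)                  # force the seed to be set (point 2/)
    old.seed <- .Random.seed      # full RNG state, all kinds included
    old.kind <- RNGkind()         # kind, normal.kind, sample.kind (point 3/)
    on.exit({                     # restore even on error/interrupt (point 1/)
        do.call(RNGkind, as.list(old.kind))
        assign(".Random.seed", old.seed, envir = .GlobalEnv)
    })
    if (!is.null(seed)) set.seed(seed)
    expr
}

## usage: a "random" computation that leaves the caller's RNG untouched
x <- with_preserved_rng(sample(10), seed = 123)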


No doubt I have forgotten a few things.

Paul Gilbert

On 4/17/19 6:00 AM, r-help-requ...@r-project.org wrote:
> Date: Tue, 16 Apr 2019 19:22:34 +0200
> From: Martin Maechler
> To: Elizabeth Purdom
> Cc: Bert Gunter, R-help
> 
> Subject: Re: [R] Limiting the scope of RNGkind/set.seed
> Message-ID: <23734.3930.10744.126...@stat.math.ethz.ch>
> Content-Type: text/plain; charset="utf-8"
>
>>>>>> Elizabeth Purdom
>>>>>>  on Tue, 16 Apr 2019 09:45:45 -0700 writes:
>  > Hi Bert, Thanks for your response. What you suggest is
>  > more or less the fix I suggested in my email (my second
>  > version of .rcolors). I writing more because I was
>  > wondering if there was a better way to work with RNG that
>  > would avoid doing that. It doesn’t feel very friendly for
>  > my package to be making changes to the user’s global
>  > environment, even though I am setting them back (and if it
>  > weren’t for the fact that setting the new R 3.6 argument
>  > `sample.kind=“Rounding”` creates a warning, I wouldn’t
>  > have even realized I was affecting the user’s settings, so
>  > it seems potentially hazardous that packages could be
>  > changing users settings without them being aware of
>  > it). So I was wondering if there was a way to more fully
>  > isolate the command.  Thanks, Elizabeth
>
> Hi Elizabeth,
>
> there's actually something better -- I think -- that you can do:
>
> You store .Random.seed  before doing an RNGkind() & set.seed()
> setting, do all that, and make sure that .Random.seed is
> restored when leaving your function.
>
> This works because the (typically quite long) .Random.seed
> stores the full state of the RNG, i.e., all RNGkind() settings
> *and*  the result of set.seed() , calling r(n, ..)  etc.
>
> If you additionally use on.exit() instead of manually resetting
> things, you have the additional advantage that things are also
> reset when your function ends because the user interrupts its
> computations, or an error happens, etc.
>
> So, your function would more elegantly (and robustly!)  look like
>
> .rcolors <- function(seed = 23589) {
>     if(!exists(".Random.seed", envir = .GlobalEnv)) {
>         message("calling runif(1)"); runif(1) }
>     old.R.s <- .Random.seed
>     ## will reset everything on exiting this function:
>     on.exit(assign(".Random.seed", old.R.s, envir = .GlobalEnv))
>     ## set seed for sample() "back compatibly":
>     suppressWarnings(RNGversion("3.5.0"))
>     set.seed(seed)
>     ## return random permutation of "my colors"
>     sample(colors()[-c(152:361)])
> }
>
> BTW, you can look at  simulate() methods in standard R, e.g.,
>
>    stats:::simulate.lm
>
> to see the same method used [optionally, with slightly more
> sophistication].

>
>
> Best,
> Martin
>
> Martin Mächler
> ETH Zurich, Switzerland



Re: [R] Random Seed Location

2018-03-05 Thread Paul Gilbert



On 03/04/2018 07:14 PM, Henrik Bengtsson wrote:

On Sun, Mar 4, 2018 at 3:23 PM, Duncan Murdoch  wrote:

...

An issue is that .Random.seed doesn't contain the full state of the RNG
system, so restoring it doesn't necessarily lead to an identical sequence of
output.  The only way to guarantee the sequence will repeat is to call
set.seed(n), and that only leads to a tiny fraction of possible states.

Expanding .Random.seed so that it does contain the full state would be a
good idea, and that would make your preserveRandomSeed really easy to write.

Here's a demo that .Random.seed is not enough:


set.seed(123, normal.kind = "Box-Muller")
rnorm(1)

[1] -0.1613431

save <- .Random.seed
rnorm(1)

[1] 0.6706031

.Random.seed <- save
rnorm(1)

[1] -0.4194403

If .Random.seed were the complete state, the 2nd and 3rd rnorm results would
be the same.


To be pedantic, it is not the RNG state that is the problem, it is the 
state of the normal transformation "Box-Muller". And, again to be pedantic:


> So, this is only for some of the RNG kinds.

As I recall, it is not a problem for any of the RNG kinds, it is only a 
problem with the Box-Muller normal.kind. Things may have changed, and 
parallel adds the need to record the number of nodes.
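A quick check of that claim, as a sketch: with the default "Inversion" 
normal.kind, restoring .Random.seed does reproduce the stream.

set.seed(123)                  # normal.kind defaults to "Inversion"
rnorm(1)
save <- .Random.seed
x2 <- rnorm(1)
.Random.seed <- save
identical(x2, rnorm(1))        # TRUE: the full state was in .Random.seed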



Is the reason for this
limitation that it is not possible for R (not even the R internals) to
get hold of some of the RNG generators?  In other words, it is
unlikely to ever be fixed?


It has been around for a very long time (since at least R 0.99) and, as 
I recall, it was not easy to fix. I think the more substantial reason is 
that Box-Muller is no longer the default or preferred normal generator. 
It may only be used for backward compatibility, in which case messing 
with it could be a disaster with very little potential benefit.


Paul



Re: [R] Random Seed Location

2018-03-04 Thread Paul Gilbert

On Mon, Feb 26, 2018 at 3:25 PM, Gary Black <gwblack...@sbcglobal.net>
wrote:

(Sorry to be a bit slow responding.)

You have not supplied a complete example, which would be good in this 
case because what you are suggesting could be a serious bug in R or a 
package. Serious journals require reproducibility these days. For 
example, JSS is very clear on this point.


To your question
> My question simply is: should the location of the set.seed command matter,
> provided that it is applied before any commands which involve randomness
> (such as partitioning)?

the answer is no, it should not matter. But the proviso is important.

You can determine where things are messing up using something like

 set.seed(654321)
 zk <- RNGkind()       # [1] "Mersenne-Twister" "Inversion"
 zk
 z <- runif(2)
 z
 set.seed(654321)

 # install.packages(c('caret', 'ggplot2', 'e1071'))
 library(caret)
 all(runif(2) == z)    # should be true but it is not always

 set.seed(654321)
 library(ggplot2)
 all(runif(2) == z)    # should be true

 set.seed(654321)
 library(e1071)
 all(runif(2) == z)    # should be true

 all(RNGkind() == zk)  # should be true

On my computer, package caret seems to sometimes, but not always, do 
something that advances or changes the RNG. So you will need to set the 
seed after that package is loaded if you want reproducibility.


As Bill Dunlap points out, parallel can introduce much more complicated 
issues. If you are in fact using parallel then we really need a new 
thread with a better subject line, and the discussion will get much messier.


The short answer is that, yes you should be able to get reproducible 
results with parallel computing. If you cannot then you are almost 
certainly doing something wrong. To publish you really must have 
reproducible results.


In the example that Bill gave, I think the problem is that set.seed() 
only resets the seed in the main thread, the nodes continue to operate 
with unreset RNG. To demonstrate this to yourself you can do


 library(parallel)
 cl <- parallel::makeCluster(3)
 parallel::clusterCall(cl, function() set.seed(100))
 parallel::clusterCall(cl, function() RNGkind())
 parallel::clusterCall(cl, function() runif(2)) # similar result from all nodes

   # [1] 0.3077661 0.2576725

However, do *NOT* do that in real work. You will be getting the same RNG 
stream from each node. If you are using random numbers and parallel you 
need to read a lot more, and probably consider a variant of the 
"L'Ecuyer" generator or something designed for parallel computing.


One special point I will mention because it does not seem to be widely
appreciated: the number of nodes affects the random stream, so recording 
the number of compute nodes along with the RNG and seed information is 
important for reproducible results. This has the unfortunate consequence 
that an experiment cannot be reproduced on a larger cluster. (If anyone 
knows differently I would very much like to hear.)


Paul Gilbert


> Hi all,
>
> For some odd reason when running naïve bayes, k-NN, etc., I get slightly
> different results (e.g., error rates, classification probabilities) from
> run
> to run even though I am using the same random seed.
>
> Nothing else (input-wise) is changing, but my results are somewhat different
> from run to run.  The only randomness should be in the partitioning, and I
> have set the seed before this point.
>
> My question simply is:  should the location of the set.seed command matter,
> provided that it is applied before any commands which involve randomness
> (such as partitioning)?
>
> If you need to see the code, it is below:
>
> Thank you,
> Gary
>
>
> A.  Separate the original (in-sample) data from the new (out-of-sample)
> data.  Set a random seed.
>
>> InvestTech <- as.data.frame(InvestTechRevised)
>> outOfSample <- InvestTech[5001:nrow(InvestTech), ]
>> InvestTech <- InvestTech[1:5000, ]
>> set.seed(654321)
>
> B.  Install and load the caret, ggplot2 and e1071 packages.
>
>> install.packages("caret")
>> install.packages("ggplot2")
>> install.packages("e1071")
>> library(caret)
>> library(ggplot2)
>> library(e1071)
>
> C.  Bin the predictor variables with approximately equal counts using
> the cut_number function from the ggplot2 package.  We will use 20 bins.
>
>> InvestTech[, 1] <- cut_number(InvestTech[, 1], n = 20)
>> InvestTech[, 2] <- cut_number(InvestTech[, 2], n = 20)
>> outOfSample[, 1] <- cut_number(outOfSample[, 1], n = 20)
>> outOfSample[, 2] <- cut_number(outOfSample[, 2], n = 20)
>
> D.  Partition the original (in-sample) data into 60% training and 40%
> validation sets.
>
>> n <- nrow(InvestTech)
>

Re: [R] Optim function returning always initial value for parameter to be optimized

2018-02-10 Thread Paul Gilbert



On 02/10/2018 06:00 AM, r-help-requ...@r-project.org wrote:

Did you check the gradient? I don't think so. It's zero, so of course
you end up where you start.

Try

data.input <- data.frame(state1 = 1:500, state2 = 201:700)
err.th.scalar <- function(threshold, data){

  state1 <- data$state1
  state2 <- data$state2

  op1l <- length(state1)
  op2l <- length(state2)

  op1.err <- sum(state1 <= threshold)/op1l
  op2.err <- sum(state2 >= threshold)/op2l


I think this function is not smooth, and not even continuous. Gradient 
methods require differentiable (smooth) functions. A numerical 
approximation will be zero unless you are right near a jump point, so 
you are unlikely to move from your initial guess.


Paul


  total.err <- (op1.err + op2.err)

  return(total.err)
}

soln <- optim(par = 300, fn = err.th.scalar, data = data.input,
              method = "BFGS")
soln
require("numDeriv")
gtest <- grad(err.th.scalar, x = 300, data = data.input)
gtest
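(A hedged alternative, as a sketch: since the objective is one-dimensional 
and piecewise constant, an explicit grid search sidesteps gradients 
entirely. It reuses the objects defined above.)

grid <- seq(1, 700, by = 1)
vals <- vapply(grid, err.th.scalar, numeric(1), data = data.input)
grid[which.min(vals)]   # threshold minimising the total error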


On 2018-02-09 09:05 AM, BARLAS Marios 247554 wrote:

data.input <- data.frame(state1 = 1:500, state2 = 201:700)

with data that partially overlap in terms of values.

I want to minimize the assessment error of each state by using this function:

err.th.scalar <- function(threshold, data){

   state1 <- data$state1
   state2 <- data$state2

   op1l <- length(state1)
   op2l <- length(state2)

   op1.err <- sum(state1 <= threshold)/op1l
   op2.err <- sum(state2 >= threshold)/op2l

   total.err <- (op1.err + op2.err)

   return(total.err)
}


So I'm trying to minimize the total error. This total error should be
essentially U-shaped.


I'm using optim as follows:

optim(par = 300, fn=err.th.scalar, data = data.input, method = "BFGS")


Maybe develop an analytic gradient if it is very small, as the numeric
approximation can then be zero even when the true gradient is not.

JN






Re: [R] require vs library ( Some basic time series questions)

2016-11-17 Thread Paul Gilbert



On 11/17/2016 04:49 PM, Jeff Newmiller wrote:

require(tfplot)
tfplot(x.ts)


Would just like to point out that require() should not be treated as
interchangeable with library(). The former returns a logical status
indicating success or failure, while the latter throws an error if it
fails.  You should reserve use of require() for cases when you are
implementing an alternative path of execution for failure, and in
nearly all usual cases use the library() function instead so hapless
users of your script don't have to sift through all the subsequent
errors to figure out the problem.

Mea culpa.  Force of habit from usually writing with an alternative path 
of execution. In this example I should have used library(), especially 
since 'tfplot' may not have been installed on the user's system and so 
the library() error message would be more explicit than the warning from 
require(). But "in nearly all usual cases" seems a bit strong. It 
implies R users don't usually program much and thus do not implement an 
alternative path of execution for failure. (Some of us consider that the 
most usual case.)
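A sketch of that usual pattern (the fallback branch here is hypothetical):

if (require("tfplot")) {
    tfplot(x.ts)          # preferred plot when the package is available
} else {
    warning("tfplot not installed; falling back to plot.ts()")
    plot.ts(x.ts)         # base-graphics alternative path
}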


Paul



Re: [R] Some basic time series questions

2016-11-17 Thread Paul Gilbert


On 11/17/2016 06:00 AM, r-help-requ...@r-project.org wrote:

Hi,

As I sit and learn how to work with time series, I've run into a problem
that is eluding a quick easy answer (and googling for answers seems to
really slow the process...)

Question #1--
In a simple example on R 3.3.1 (sorry my employer hasn't upgraded to 3.3.2
yet):

x=rnorm(26,0,1)
x.ts<-ts(x,start=c(2014,9),frequency=12)

inputting x.ts at the prompt gives me a table with the rows denoted by
year and columns denoted by months and everything lines up wonderfully.

Now my problem comes when I type at the prompt

plot(x.ts)  or
plot(x.ts, xlab="") or
plot.ts(x.ts,xlab="")

I get a plot of the values, but my x-axis labels are 2015.0, 2015.5,
2016.0, and 2016.5 .  January 2015 is coming out as 2015.0...

Is there a way of getting a more intelligible x-axis labeling?  Even 2015.1
for January, etc. would work, or even getting an index (either September
2014 representing 0 or 1 and it incrementally increasing each month).


There are several ways to do this. Having some bias, I would do

  require(tfplot)
  tfplot(x.ts)
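If you prefer base graphics, a hand-rolled axis is a sketch along these 
lines (assumes the monthly x.ts from above):

  plot(x.ts, xaxt = "n", xlab = "")
  tt  <- time(x.ts)
  lab <- format(as.Date(paste(floor(tt), cycle(x.ts), 1, sep = "-")), "%b %Y")
  sel <- seq(1, length(tt), by = 3)          # label every third month
  axis(1, at = tt[sel], labels = lab[sel])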



Question #2--
If I have a time series of decadal events, how best should I set the
frequency.  It is historical data, in the form of say AD 610-619 5 events,
AD 620-629 7 events, etc.


For anything other than annual, quarterly, and monthly data you probably 
should consider zoo. (There are other options, but I think zoo has been 
the most widely used for some time now.) Guessing a bit how you think of 
this data, I would say you want the date index to be the first year of 
the decade. To illustrate, I can generate a hundred decades of random 
data and index it thus:


require(zoo)
x <- zoo(round(10 * runif(100)), order.by = 10 * 61:160)

Paul



Sorry for such a basic questions.  Any advice would be appreciated.

Thanks in advance, MEH



Mark E. Hall, PhD
Assistant Field Manager
Black Rock Field Office
Winnemucca District Office
775-623-1529.




Re: [R] Dealing with -Inf in a maximisation problem.

2016-11-07 Thread Paul Gilbert


I am trying to deal with a maximisation problem in which it is possible
for the objective function to (quite legitimately) return the value
-Inf,


(Just to add to the pedantic part of the discussion by those of us that do 
not qualify as younger and wiser:)


Setting log(0) to -Inf is often convenient but really I think the log 
function is undefined at zero, so I would not refer to this as "legitimate".



which causes the numerical optimisers that I have tried to fall over.


In theory as well as practice. You need to have a function that is 
defined on the whole domain.




The -Inf values arise from expressions of the form "a * log(b)", with b
= 0.  Under the *starting* values of the parameters, a must equal equal
0 whenever b = 0, so we can legitimately say that a * log(b) = 0 in


This also is undefined and not "legitimate". I think there is no reason 
it should equal zero: the limit depends on the path taken to (0, 0). 
Holding a = 0 and letting b go to zero gives 0, but holding b fixed at 
zero (so log(b) = -Inf) and letting a go to zero gives a * (-Inf), which 
is -Inf for every positive a.


So, you really do need to avoid zero because your function is not 
defined there, or find a redefinition that works properly at zero. I 
think you have a solution from another post.
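One common redefinition, as a sketch only (not necessarily the solution 
from the other post; the function name is hypothetical):

a_log_b <- function(a, b, eps = .Machine$double.xmin) {
    out <- numeric(length(a))
    nz  <- a != 0                             # terms with a == 0 contribute 0
    out[nz] <- a[nz] * log(pmax(b[nz], eps))  # clamp b away from zero
    out
}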


Paul


these circumstances.  However as the maximisation algorithm searches
over parameters it is possible for b to take the value 0 for values of
a that are strictly positive.  (The values of "a" do not change during
this search, although they *do* change between "successive searches".)

Clearly if one is *maximising* the objective then -Inf is not a value of
particular interest, and we should be able to "move away".  But the
optimising function just stops.

It is also clear that "moving away" is not a simple task; you can't
estimate a gradient or Hessian at a point where the function value is -Inf.

Can anyone suggest a way out of this dilemma, perhaps an optimiser that
is equipped to cope with -Inf values in some sneaky way?

Various ad hoc kludges spring to mind, but they all seem to be fraught
with peril.

I have tried changing the value returned by the objective function from
"v" to exp(v) --- which maps -Inf to 0, which is nice and finite.
However this seemed to flatten out the objective surface too much, and
the search stalled at the 0 value, which is the antithesis of optimal.

The problem arises in a context of applying the EM algorithm where the
M-step cannot be carried out explicitly, whence numerical optimisation.
I can give more detail if anyone thinks that it could be relevant.

I would appreciate advice from younger and wiser heads! :-)

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics, University of Auckland
Phone: +64-9-373-7599 ext. 88276




[R] Optimization issue - Solution converges for any initial value

2016-09-30 Thread Paul Gilbert

Narendra

You should check the definition of your objective function; something 
seems to be wrong.  When you evaluate the objective function at your 
initial guess, and then at some different value, you should expect the 
returned values to differ, but they do not:


> print(Koval.Error.func(vp.kval, n=1), digits=20)
[1] 2767.1357969742521163
> print(Koval.Error.func(2*vp.kval, n=1), digits=20)
[1] 2767.1357969742521163

So the objective function is flat. optim() converges with the message 
"CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL". Of course, the 
gradient is zero because the objective function is flat.
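A generic version of that sanity check, as a sketch (the helper name is 
hypothetical):

check_not_flat <- function(fn, par, ...) {
    f0 <- fn(par, ...)
    f1 <- fn(par * 1.1 + 0.1, ...)   # perturb before optimising
    if (identical(f0, f1)) warning("objective looks flat near 'par'")
    c(f0 = f0, f1 = f1)
}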


This would normally be considered a "user error" but perhaps John's 
newer code can catch it?


Paul


Date: Thu, 29 Sep 2016 14:53:08 -0500
From: Narendra Modi 
To: "r-help@r-project.org" 
Subject: [R] Optimization issue - Solution converges for any initial value

Re: [R] how to transform db query result into a set of timeseries

2016-09-07 Thread Paul Gilbert
There is a utility function TSquery() in package TSsql that attempts to 
do this. Most of the functions in that package are for databases with a 
specific layout intended for storing time series, but TSquery() attempts 
to build a series from a somewhat arbitrary database. It is hard to be 
completely generic and handle every possible database structure, so you 
might just examine the function for hints. I think it does not handle 
%H:%M:%S but the general logic should help.


The main problem is that your query is not guaranteed to return data in 
time order. (You may be lucky if you loaded it that way, but it can 
change unexpectedly.) You can do the ordering with the xts() order.by 
argument but it is probably quicker to do it in the db so you need less 
manipulation of the data you get back. TSquery() uses   ORDER BY in the 
sql query to ensure the order:


   q <- paste(q, " GROUP BY ", dates, " ORDER BY ", dates, " ;")

If the query result is df then I think you can construct your series 
simply with


  zonnen <- xts(cbind(df$M, df$G, df$N),
                order.by = as.POSIXct(df$Date,
                                      format = "%Y-%m-%d %H:%M:%S"))

There are several other details in the function that you may find useful.

Paul Gilbert


Date: Mon, 5 Sep 2016 22:28:50 +0200
From: Stef Mientki <stef.mien...@gmail.com>
hello,

I've a number of timeseries into a database and want to display these
timeseries into graph.

Now the code below works well, but as the user can select which
timeseries should be shown (up to 20 timeseries) the code below should
be dynamic and can be quiet large and complex.

Is there an easier way to convert a database result into timeseries
accepted by dygraph ?

 SQL <- "select Date, M, G, N from Compare_Model"
 df <- dbGetQuery ( con, statement = SQL )

 zon1 <- xts ( df$M,  as.POSIXct ( df$Date, format="%Y-%m-%d
%H:%M:%S") )
 zon2 <- xts ( df$G,  as.POSIXct ( df$Date, format="%Y-%m-%d
%H:%M:%S") )
 zon3 <- xts ( df$N,  as.POSIXct ( df$Date, format="%Y-%m-%d
%H:%M:%S") )

 zonnen <- Reduce ( function(...) merge(..., all=TRUE ), list ( zon,
zon2, zon3 ))

 dygraph ( zonnen )


thanks,

Stef




Re: [R] FW: Multivariate ARIMA

2016-02-15 Thread Paul Gilbert
See also package dse.  There are examples in the guide: 
http://cran.at.r-project.org/web/packages/dse/vignettes/Guide.pdf
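For the simplest case (one output series, one input series), base R's 
arima() with an xreg term is a minimal sketch of the ARIMAX-style fit 
asked about below (simulated data and hypothetical orders):

set.seed(1)
x <- arima.sim(list(ar = 0.5), n = 120)        # explanatory series
y <- 2 * x + arima.sim(list(ar = 0.3), n = 120)
fit <- arima(y, order = c(1, 0, 0), xreg = x)  # ARMA errors plus regressor
## forecasting needs future values of the regressor:
predict(fit, n.ahead = 4, newxreg = rep(mean(x), 4))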


Paul

On 02/14/2016 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Fri, 12 Feb 2016 18:12:37 +
From: Thomas Lofaro
To: "r-help@R-project.org"
Subject: [R] FW: Multivariate ARIMA Question
Content-Type: text/plain; charset="UTF-8"

Hi, sorry, I will try to make this short. Essentially, what I am
trying to do is run a multivariate ARIMA model predicting one
variable with another.  However, the only R packages I have found to
accommodate such an analysis are ARIMAX and VAR.  Unfortunately,
there don't seem to be any good tutorials or demonstrations of how to
actually perform this analysis in practice, on my two columns of
data.  I was wondering if anyone knew of a resource that might help
(or one that provided an example of r code to accomplish this).
Thanks sooo much!




Re: [R] Estimating MA parameters through arima or through package "dlm"

2016-01-24 Thread Paul Gilbert

(Sorry for the delay in responding to this.)

On 01/05/2016 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Mon, 4 Jan 2016 11:31:22 -0500
From: Mark Leeds<marklee...@gmail.com>
To: Stefano Sofia<stefano.so...@regione.marche.it>
Cc:"r-help@r-project.org"  <r-help@r-project.org>
Subject: Re: [R] Estimating MA parameters through arima or through
package "dlm"
Message-ID:
<cahz+bwyxanjp4ksu2szmbfrlakh8_aswn07s_3ekyjeywua...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hi: I don't have time to look at the details of what you're doing but the
"equivalence"
between state space and arima ( as paul gilbert pointed out a few weeks ago
) is not a true equivalence.


To state this a bit more precisely, there is a true equivalence between 
the input-output  equivalence classes of (linear, time invariant) 
state-space models with state dimension n and the input-output 
equivalence classes of (linear, time invariant) ARMA models with 
McMillan degree n. (In fact, the quotient spaces are diffeomorphic not 
just isomorphic.) This means you should be able to get exactly 
comparable results for anything that is an equivalence class invariant, 
including residuals and anything calculated from residuals, such as 
likelihood. Model roots and thus stability are also invariants, but you 
probably will not get comparable results for most other things involving 
parameters.


This is not just a "statistical equivalence" as is sometimes suggested, 
it is an algebraic equivalence between quotient spaces.


However, if you estimate a state-space model and estimate an ARMA model, 
there are several other things that come into play related to 
estimation. Comparing the two estimated models you are unlikely to find 
comparable results. (Typically ARMA estimation is more robust at finding 
the best in my experience.) Even with simulation testing of estimation 
starting with "true" models it is problematic to get estimated models 
that are equivalent.
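A small experiment along those lines, as a sketch (assumes package 'dlm' 
and its dlmModARMA()/dlmMLE(); simulated MA(2) data):

set.seed(42)
y <- arima.sim(list(ma = c(0.6, 0.3)), n = 300)

## route 1: ARMA estimation
fit.arma <- arima(y, order = c(0, 0, 2), include.mean = FALSE)

## route 2: state-space estimation of the same model via dlm
library(dlm)
build  <- function(p) dlmModARMA(ma = p[1:2], sigma2 = exp(p[3]))
fit.ss <- dlmMLE(y, parm = c(0, 0, 0), build = build)

coef(fit.arma)    # compare with fit.ss$par[1:2]: close, rarely identical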


If you really want to see the equivalence you need to do a conversion of 
a model from one form to the other. I cannot speak to dlm, but dse was 
built for studying this equivalence and the users' guide has examples. 
If you are really interested in this topic, I recommend section 5 of 
http://www.bankofcanada.ca/1993/03/working-paper-199/,
I realize this is getting a bit old, but if you find a more up-to-date 
summary I would like to hear about it. That paper also has a 
demonstration that the equivalent models give results that are 
comparable within numerical precision of computers, not just to some 
statistical significance.


  if you are in an area of the parameter space that the state space
  formulation can't reach, then you won't get the same parameter
  estimates. So, what you're doing might be okay or might not be,
  depending on whether the state space formulation can reach that
  area of the parameter space.


"Can't reach" is an estimation problem. This is typically a more serious 
problem with state-space models. The quotient spaces are diffeomorphic 
so in theory it should be possible to reach the same solution if you 
properly account for the fact that you are estimating on a smooth 
manifold and not a vector space. In practice you have also to worry 
about twisting of the parameter space and finite time for estimation, 
and gradients that may converge toward zero near boundaries of the 
manifold's charts.



there's another state space
formulation that is truly equivalent which is called the SSOE formulation
or innovations representation but
I don't know if you want to get into that. google "SSOE state space" if
you're interested.


The quotient space of input-output equivalence classes for innovations 
form models is equivalent to the quotient space of input-output 
equivalence classes for non-innovations form models. You need more 
information to identify a non-innovations form, typically some physical 
understanding of the system. On the basis of only input-output data, and 
with no additional understanding of the physical system, there would be 
no reason to choose a non-innovations form for estimation. There is more 
discussion of this in the above mentioned summary and in the dse user's 
guide.


BTW, estimation problems tend to be much more severe with multivariate 
series than with univariate series. This is not just because of the 
usual issues. Especially in state-space representations, the twisting of 
the parameter space seems to be especially bad.


Paul




Mark


On Mon, Jan 4, 2016 at 9:25 AM, Stefano Sofia <
stefano.so...@regione.marche.it> wrote:


>Dear list users,
>I want to apply a MA(2) process (x_t = beta1*epsilon_(t-1) +
>beta2*epsilon_(t-2) + epsilon_t) to a given time series (x), and I want
>to estimate the two parameters beta1, beta2 and the variance of the random
>variable epsilon_(t).
>

Re: [R] Updating a Time Series After Forecast()

2016-01-22 Thread Paul Gilbert

Lorenzo

Berend's suggestion may be the simplest if you have univariate ts style 
series. If you are writing code and you want it to work with 
multivariate series and other time representations then you might want 
to consider the splice() function in package tframe.

  library(tfplot)
  ts3 <- splice(ts2, pred2$mean)

(and tframePlus if you need zoo and other support.)

Regards,
Paul

On 01/16/2016 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Fri, 15 Jan 2016 13:02:58 +0100
From: Berend Hasselman
To: Lorenzo Isella
Cc:"r-help@r-project.org"  
Subject: Re: [R] Updating a Time Series After Forecast()
Message-ID:
Content-Type: text/plain; charset=us-ascii



>On 14 Jan 2016, at 22:36, Lorenzo Isella  wrote:
>
>Dear All,
>Perhaps I am drowning in a cup of water, since I am positive that the
>answer will be a one-liner.
>Consider the following short script
>
>
>
>library(forecast)
>
>ts2<-structure(c(339130, 356462, 363234, 378179, 367864, 378337, 392157,
>402153, 376361, 392204, 403483, 414034, 391967, 406067, 419464,
>434913, 410102, 424795, 437073, 448827, 415569, 430561, 444719,
>455764, 419892, 444190, 454648, 466312, 439922, 448963, 465153,
>475621, 445502, 457198, 473573, 485764, 463895, 470274, 484390,
>490678, 478003, 483570, 499141, 509216, 481395, 492345, 511184,
>513420, 483757, 490884, 514966, 515457, 497614, 510139, 523467,
>526406, 499784, 519033, 532009, 531260, 521539, 532590, 553118,
>557725, 548321, 556832, 578087, 578120, 566116, 580571, 587993,
>569985, 534326, 539641, 564824, 568445, 558614, 570192, 594584,
>598305, 593769, 598278, 620147, 615884, 611033, 609304, 630458,
>624325, 614356, 627192, 649324, 645988, 642965, 645125, 669471,
>665529, 664248, 669670, 694719), na.action = structure(1:64, class =
>"omit"), .Tsp = c(1991,
>2015.5, 4), class = "ts")
>
>fit2 <- auto.arima(ts2, approximation=FALSE,trace=FALSE)
>
>pred2 <- forecast(fit2, h=2)
>
>###
>
>So, I have an original quarterly time series ts2 and a forecast for 2
>quarters pred2.
>
>I would like to combine ts2 and pred2 (just the prediction) into a new
>time series (in other words, just stretch a bit ts2).
>How can I do that?

A possible way is this

ts3 <- ts(c(ts2,pred2$mean),start=start(ts2),frequency=frequency(ts2))

Most  likely there are more ways of getting what you want.

Berend


>Many thanks
>
>Lorenzo
>




Re: [R] matrix which results singular but at the same time positive definite

2015-12-15 Thread Paul Gilbert

Stefano

I think in another response in this thread you got the answer to the 
question you asked, but you may be asking the wrong question. I'm not 
familiar with the specific papers you mention and you have not provided 
enough detail about what you are doing, so I am guessing a bit. The term 
"dynamic linear model" can refer to both linear ARMA/ARIMA models and to 
linear state-space models, however some authors use it to refer 
exclusively to state-space models and from your phrasing I am guessing 
you are doing that. There would be many state-space models equivalent to 
a given ARMA/ARIMA model, but without specifying structural aspects of 
the system you will likely be using one of the innovations form 
state-space models that are equivalent. In an innovations form 
state-space model the state is defined as an expectation. From a 
practical point of view, this is one of the most important differences 
between an innovation form and a non-innovations form state-space model. 
Since the expectation is known exactly, the state-tracking error is 
zero. That means the covariance matrix from the filter or smoother 
should be a zero matrix, which you should not be trying to invert. In a 
non-innovations form the state has a physical interpretation rather than 
being an expectation, so the state tracking error should not be 
degenerate in that case.


I mention all this because your covariance matrix looks very close to zero.
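A quick numerical diagnostic, as a sketch (V is a hypothetical stand-in 
for the filter covariance): a matrix can be positive definite in exact 
arithmetic yet numerically singular, and rcond() flags this before an 
inversion goes bad.

V <- matrix(c(1, 1, 1, 1 + 1e-12), 2, 2)   # hypothetical covariance
eigen(V, symmetric = TRUE)$values          # both positive: "positive definite"
rcond(V)                                   # ~2.5e-13: nearly singular; an
                                           # inverse would hugely amplify noise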

Paul Gilbert

On 12/11/2015 06:00 AM, r-help-requ...@r-project.org wrote:

Dear John, thank you for your considerations. This matrix (which is a
variance matrix) is part of an algorithm for forward-filtering and
backward-sampling of Dynamic Linear Models (West and Harrison, 1997),
applied to DLM representation of ARIMA processes (Petris, Petrone,
Campagnoli).  It is therefore very difficult to explain why this
variance matrix becomes so ill conditioned. This already happens at
the first iteration of the algorithm. I will try to work on initial
conditions and some fixed parameters.

Thank you again Stefano





Re: [R] package submission

2014-11-29 Thread Paul Gilbert


I was similarly bitten a couple of days ago when I submitted an update
of my deldir package.  I'm*pretty*  sure that I completed all the
requisite steps in the steps in the web-based submission procedure, but
I never received the promised confirmation email.  Being a thicko, I
overlooked this fact, and started to wonder why the update never
appeared on CRAN.  Eventually I got my act together, went over the
instructions and re-submitted the package.  This time I got the
confirmation email, and the update duly appeared on CRAN.

Maybe there's a buglet in the software that affects the automated
submission procedure.


I was similarly bitten but, paying closer attention on the second 
attempt, I suspect I thought I was finished at "successfully uploaded" 
and I did not go all the way to the very bottom to see that I then had 
to click on submit to get the "successfully submitted" page. (Text of 
the messages from memory, so may not be exact.) Once successfully 
submitted, the email asking for confirmation is almost instantaneous.


For users like me that push the boundaries of discovering possible user 
errors, it would be helpful if the first message said "successfully 
uploaded - now read this page and submit at the bottom".


Paul

I would have said it's more likely that I

fumble-fingered something in my initial submission attempt.  However,
hearing that someone else has had a similar experience makes me a wee
bit more suspicious that there's a bug.  Could be tricky to track down
but, given that if there is a bug its effect is intermittent and rare.

cheers,

Rolf Turner

P. S.  On my second try, the confirmation email arrived essentially
instantaneously.  So I would advise package submitters:  if you don't
get the confirmation email*right away*  then something has very likely
gone wrong.  Check the siteftp://CRAN.R-project.org/incoming/  as
advised and if your package ain't there, try again.

R. T.

On 29/11/14 09:11, Henrik Bengtsson wrote:

From http://cran.r-project.org/web/packages/policies.html:

"When submitting a package to CRAN you should use the submission form
at http://CRAN.R-project.org/submit.html (and not send an email). You
will be sent a confirmation email which needs to be accepted.(*)
...
In either case, you can check that the submission was received by
looking at ftp://CRAN.R-project.org/incoming/."

(*) That confirmation email is sent automatically and instantaneously.
It confirms you as a maintainer and the submission (not the
acceptance).

/Henrik

On Fri, Nov 28, 2014 at 6:35 AM, Duncan Murdoch
murdoch.dun...@gmail.com  wrote:

On 28/11/2014 9:16 AM, Dan Vatnik wrote:


Hi,

I am the maintainer of the traj package.
I have added a vignette to the package this week(Wednesday) and uploaded
the new tarball to CRAN.
I have not yet received a confirmation e-mail.I do not know if the e-mail
is sent automatically or if a person needs to approve manually. I am just
wondering if I did everything correctly
or if there is a mistake in my uploading procedure.



The CRAN people make the decisions manually, so it may just be that nobody
has got to it yet. If you don't hear from them in a week, you might
consider writing to the submission email address, but I wouldn't do it
sooner than that.

-- Rolf Turner Technical Editor ANZJS




[R] RODBC and PosgreSQL problems

2014-05-31 Thread Paul Gilbert
I'm not sure if this was due to mailer wrap, but I think you are putting 
a return between the end of the table name and the quote mark, and 
possibly that or a space is being included in the table name in the 
sqlQuery() statement. Do you have the same problem if you put the quote 
mark immediately after the table name, or if you put a ; after the table 
name?
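As code, that test is simply (a sketch, using the same connection object 
as in the quoted session below):

 sqlQuery(DRCch, "SELECT * FROM tblCeramicWare;")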


(BTW, you will probably get better responses to this sort of question on 
the r-sig-db mailing list.)


Paul


Message: 23
Date: Fri, 30 May 2014 18:00:06 +
From: Fraser D. Neiman fnei...@monticello.org
To: r-help@r-project.org
Subject: [R] RODBC and PosgreSQL problems
Message-ID: 2176ad174d58cb4abbda99f3458c20171be55...@granger.monticello.org
Content-Type: text/plain


Dear All,

I am trying for the first time to run SQL queries against a remote
PostgreSQL database via RODBC. I am able to establish a connection
just fine, as shown by getting results back from the sqlTables(),
sqlColumns() and sqlPrimary Key() functions in RODBC. However, when I
try to run a SQL query using the sqlQuery() function I get

[1] "42P01 7 ERROR: relation \"tblceramicware\" does not
exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM
tblCeramicWare

What am I doing wrong?

Here are the relevant snips from the R console.  What's puzzling is
that tblcermicWare is recognized as an argument to sqlColumns() and
sqlPrimaryKey() . But NOT in sqlQuery() .

Thanks for any pointers.

best, Fraser


library(RODBC)

# connect to DAACS and assign a name (DAACSch) to the connection
DRCch <- odbcConnect("postgreSQL35W", case = "nochange", uid = "XX", pwd = "XX")

# list the tables that are available
sqlTables(DRCch, tableType = "TABLE")

   TABLE_QUALIFIER TABLE_OWNER        TABLE_NAME TABLE_TYPE REMARKS
1 daacs-production      public      TempSTPTable      TABLE
2 daacs-production      public        activities      TABLE
3 daacs-production      public          articles      TABLE
4 daacs-production      public schema_migrations      TABLE
5 daacs-production      public     tblACDistance      TABLE
6 daacs-production      public    tblArtifactBox      TABLE
7 daacs-production      public  tblArtifactImage      TABLE
8 daacs-production      public     tblBasicColor      TABLE
9 daacs-production      public           tblBead      TABLE



sqlColumns(DRCch, "tblCeramicWare")

   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME
1 daacs-production      public tblCeramicWare      WareID         4      int4
2 daacs-production      public tblCeramicWare        Ware        -9   varchar
  [further columns of the wrapped console output, e.g. PRECISION, LENGTH,
  COLUMN_DEF (nextval('global_id_seq'::regclass)), not reconstructed]

sqlPrimaryKeys(DRCch, "tblCeramicWare")

   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME KEY_SEQ             PK_NAME
1 daacs-production      public tblCeramicWare      WareID       1 tblCeramicWare_pkey


sqlQuery(DRCch, paste("
    SELECT *
    FROM tblCeramicWare
"))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare \n'"





Fraser D. Neiman Department of Archaeology, Monticello (434) 984
9812






Re: [R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error

2014-02-24 Thread Paul Gilbert
A post with subject "Basic R to SQL considerations" has been put on 
R-sig-DB to address some of the basic questions here. The R-sig-DB 
thread is not specific to RPostgreSQL.



I have learned enough since this posting to be aware that my question was
based on assumptions so false to fact that they verge on the nonsensical,
but not enough to reframe it more sensibly. So I am just going to say what I
believed then and what I believe now, and perhaps you can head off the more
extreme excursions from reality in my current beliefs.

Because I knew that libpq depended on the version of PostgreSQL that was
installed, and that it used to be the case that you could not just install
the package, but it is now possible because of the addition of libpq, I
concluded that libpq contained an installer for PostgreSQL. I now think that
this is false, and that libpq is some sort of tailored interface that varies
with the OS and the PostgreSQL version. Tailored where and by what, I still
don't know, but maybe I do not need to know.

Since I thought I had to use the version of PostgreSQL installed by libpq, I
did not try to independently install PostgreSQL. Now I think that PostgreSQL
has to be installed and working before you start to install RPostgreSQL, in
order, among other things, to get the right libpq contents or settings.

I spent most of a day trying to figure out how to do what I was thinking of
as a strictly local installation of PostgreSQL, meaning one that did not
connect to the internet via a port and TCP/IP. I now believe that there is
no such thing, at least on a Windows machine, or that doing so would be
complex and difficult, compared to a connection via a machine-internal
TCP/IP connection -- something I did not know existed until the day before
yesterday.

For the last few days I have been trying, so far unsuccessfully, to install
PostgreSQL from the (misnamed) "one click" installer from enterprisedb.com.
I have figured out that the installation and error logs are written to
stderr, and where my OS puts it. I've gone through the 40+ pages of log.
Although the log contains more than 2 dozen warnings and errors, I now
believe that all of the ones that matter derive in some way from this one:
Executing batch file 'rad3BBD8.bat'...
'icacls' is not recognized as an internal or external command, operable
program or batch file.

. . . and that this means I am having some sort of problem with the Windows
XP/NT access control/permissions system. Unfortunately I know nothing
whatsoever about the Windows permission system. Learning about it is the
next thing on my list.

I am really quite good at framing feasible research agendas to address
important policy questions on a shoestring. I'm good with regression, with
displaying complicated argument in sensible graphics. I am new to using
databases on my own behalf (rather than having someone to do it for me), but
I am finding the query side of databases elegant and intuitive. But I am not
a competent IT/sysadmin/tech support person, and I am not going to be one
soon, or maybe ever. So I find it somewhat disheartening that I am spending
so much time being just that.

Still, I am very grateful for the help and support of this list and the
various Stack Exchange lists, which have made possible such learning as I
have been able to accomplish.  Like the glaciers, my progress is slow but
inexorable. Or so I like to believe.

Warmest regards, andrewH





Re: [R] Plotting multiple time series with variables having different units

2014-02-05 Thread Paul Gilbert
I'm not sure if I understand exactly what you mean, but if you want 
separate panels, one above the other, with a common time span but each 
with their own Y scale, then the function tfplot() in package tfplot 
does this. There are examples in the User Guide at 
http://cran.at.r-project.org/web/packages/tfplot/vignettes/Guide.pdf. Of 
course, if you want fine control over the plot then you are better off 
with one of the graphics packages, as others have suggested. Those would 
give you complete access to the underlying details. The objective of 
tfplot() is to quickly give you something that is fairly good.
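For comparison, a base-graphics sketch of the same layout (assumes a data 
frame df with columns Date, X, and Y as in the question below):

 op <- par(mfrow = c(2, 1), mar = c(2, 4, 1, 1))  # two stacked panels
 plot(df$Date, df$Y, type = "l", ylab = "Y (bacteria count)")
 plot(df$Date, df$X, type = "l", ylab = "X (gull count)")
 par(op)                                          # restore settings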


I work mostly with economic data, where the different scales are 
important, but I rarely work with daily data so your mileage may vary. 
Please let me know if you discover problems.


(Sorry to come to this discussion a bit late, but I did not want to 
point you to a version I was about to replace. A new version (2014.2-1) 
is now on CRAN and will be on the mirrors shortly.)


Paul

On 02/03/2014 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Sun, 02 Feb 2014 14:09:38 -0500
From: David Parkhurstparkh...@imap.iu.edu
To:r-help@r-project.org
Subject: [R] Plotting multiple time series with variables having
different   units
Message-ID:52ee97f2.3000...@imap.iu.edu
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I've tried to figure out how to do this from what I read, but haven't
been successful.  Suppose I have a dataframe with variables Date, X, and
Y (and maybe U, V, and Z) where X, Y, etc. have different units.  I'd
like to plot Y vs. Time above X vs. Time, above one another.

For example, X is the number of gulls counted on a reservoir, and Y is
the number of coliform bacteria counted on a petri plate from a water
sample leaving the reservoir, so these have very different ranges. U and
V might be numbers of geese and numbers of ducks counted on the same days.

What commands would I use to create such a set of plots?
Thank you for any help.
David





Re: [R] Package dependencies in building R packages

2013-12-31 Thread Paul Gilbert
The responses to this seem to be assuming that you want users to have 
access to your tt() function, that is, you export it. Just in case the 
really simple case has been overlooked: if you are only using this 
function internally in your package there should be no problem. Your 
package's namespace ensures that your package's functions find your 
tt(), and if you do not export it then user references to tt() will not 
find your version.


As others have pointed out, if you want users to have access to your 
tt() and they also need access to survival's tt(), then there is a 
problem. You either need to change the name or users will need to be 
explicit about which one they want.
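A sketch of the metadata for the internal-use case (hypothetical file 
fragments for a package using MASS's mvrnorm, as in the question below):

# DESCRIPTION (fragment)
Imports: MASS

# NAMESPACE (fragment)
importFrom(MASS, mvrnorm)  # pull in just mvrnorm, not all of MASS
# tt() is simply not exported; package code still finds it first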


(BTW - this is a question that could be asked on the R-devel list.)

Paul

On 13-12-31 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Mon, 30 Dec 2013 20:01:46 +0100
From: Axel Urbizaxel.ur...@gmail.com
To: Duncan Murdochmurdoch.dun...@gmail.com
Cc:R-help@r-project.org  R-help@r-project.org
Subject: Re: [R] Package dependencies in building R packages
Message-ID:
CAAyVsXKCN1Zn4gbG=QkD759kaDadPbgp+9e3W6nPvOLVwAsU=w...@mail.gmail.com
Content-Type: text/plain

Thanks for your kind response Duncan. To be more specific, I'm using the
function mvrnorm from MASS. The issue is that MASS depends on survival and
I have a function in my package named tt() which conflicts with a function
in survival of the same name. I can think of 2 alternative solutions to my
problem, but I'm no expert:

1) Copy mvrnorm into my package, which I thought was not a good idea
2) Rename my tt() function to something else in my package, but this is
painful as I have it all over the place in other functions.

Any suggestions would be much appreciated.

Best,
Axel.


On Mon, Dec 30, 2013 at 7:51 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:


On 13-12-30 1:24 PM, Axel Urbiz wrote:


Dear users,

My package {foo} depends on a function miscFUN which is in package
{foo_depend}. This last package also depends on other packages, say {A, B,
C}, but miscFUN is not dependent on A, B, C (only on foo_depend).

In my package {foo}, is there a way to only have it depend on the function
miscFUN from {foo_depend} without having the user to have installed A, B,
C? (as none of those packages are needed for my package to work properly).
Also, is this a best practice?



There's no way for your package to tell R to ignore the dependencies
declared by foo_depend.

If you really only need one function from that package, simply copy the
source of miscFUN into your package (assuming foo_depend's license permits
that).  But this is not best practice, unless that function is very simple.
Best practice is to declare your dependence by importing that function
from foo_depend.

Duncan Murdoch




[[alternative HTML version deleted]]





Re: [R] Should there be an R-beginners list?

2013-11-25 Thread Paul Gilbert



On 13-11-25 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Sun, 24 Nov 2013 13:04:43 -0600
From: Yihui Xiex...@yihui.name
To: Bert Guntergunter.ber...@gene.com
Cc:r-help@r-project.org  r-help@r-project.org
Subject: Re: [R] Should there be an R-beginners list?
Message-ID:
CANROs4d=usu3ofqmtxbugv9v4_t9r+c3m0s15lrdcfpdj1q...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

I'm not aware of a discussion on this, but I would say no.


Just for the record, there was a discussion of this in a thread called 
newbie list in August 2001 when R-help started getting busy:

  http://tolstoy.newcastle.edu.au/R/help/01c/0880.html
Predates StackOverflow I think, but several of the comments may still be 
valid.


Paul


Fragmentation is bad. Further fragmentation is worse.

TL;DR
=

Actually I'd say all mailing lists except r-devel should be moving to
StackOverflow in the future (disclaimer: I'm not affiliated with it).
Mailing lists are good for a smaller group of people, and especially
good when more focused on discussions on development (including bug
reports). The better place for questions is a web forum. Both you and
I have been staying in these R mailing lists for a few years now. You
can recall how many times a user was asked to post to another mailing
list ("this is not an appropriate list to ask your question; please
post to r-such-and-such instead"), how many times you see something
like "Alternative HTML removed", how many times you see a post "Bla
Bla (was: Foo Bar)", and how many times users were reminded "Please
read the posting guide", "Please do read", and "PLEASE do read". But
it just does not help much even if you write PLEASE DO READ.

Why do we have such problems in the mailing lists again and again? Is
that simply because users are not respecting the rules? I do not think
so. I believe that is the flaw of mailing lists. A mailing list is
managed by a small team (hey, Martin, thank you). On StackOverflow,
you simply edit the tags of a post to make it belong to a new mailing
list (you can post with tags r+ubuntu+graphics, or r+lattice,
etc). There is no need to request and wait for the system admin to
make a decision. Users can help themselves, and help others as well.
HTML can be good in many cases, actually. Who hates syntax
highlighting and R plots in an R question? You are free to ask a
question that is poorly formatted, and there are good chances that it
will be immediately edited by another experienced user. You are free
to yell in the comments asking for more details before posting a
formal answer. You can express "ah, this is a bad question" by
down-voting so that future readers know that guy screwed up and we
just let the world ignore the noise. It is like peer-review, and the
reviewers can help you improve your post. In a mailing list, when you
are done, you are done. You are forever written in history, right or
wrong, smart or stupid. You want to delete your record in the history?
No, no, gentleman, it was your fault not reading the post guide.

For me, I understand all the rationale behind the mailing list model.
I'm just saying, the primary goal for such a service is to discuss
issues about R, instead of issues induced by the mailing list itself.
We could have made some issues not directly related to R go away by
community efforts instead of giving instructions a million times,
given an appropriate platform.

Five years, 42,000 posts:http://stackoverflow.com/questions/tagged/r
I'm not terribly worried about transition from mailing lists to SO.

Sorry about the generalization of the original topic, but I hate using
a new title Should there be R mailing lists? (was: Should there be an
R-beginners list?)

Last but not least, I probably need to clarify that I benefited a lot
from the mailing lists in the past, and I truly appreciate it. I wrote
this with the future in mind, not the past. The past was good, and the
future can be better.

Regards,
Yihui
--
Yihui Xiexieyi...@gmail.com
Web:http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA


On Sun, Nov 24, 2013 at 11:13 AM, Bert Guntergunter.ber...@gene.com  wrote:

Folks:

If this has been previously discussed and settled, please say so and
refer me to the discussion. If you believe this to be inappropriate or
otherwise frivolous, also please say so, as I do not wish to waste
your time or this space.

I write as a long-time reader and sometimes contributor to r-help. Due
to R's growth in usage by a broad data analysis community (engineers,
scientists, social scientists, finance, informaticians, as well as
more traditional statisticians), this list seems to me to be becoming
deluged by requests for help from casual users and students for whom R
is not going to be regularly or extensively used. I would characterize
this group as having only basic statistical, programming, and data
analysis skills. This is not meant as a criticism, and there are
certainly many for whom this is inaccurate. But

Re: [R] Am I working with regularly spaced time series?

2013-10-22 Thread Paul Gilbert



On 13-10-22 06:00 AM, Weiwu Zhang <zhangwe...@realss.com> wrote:

My data is sampled once per minute.


At the same second each minute or not? "Regularly spaced" would mean
exactly one minute between observations.


There are invalid samples, leaving a lot of holes in the data;
successful samples are around 80% of all minutes in a day. And during
the last 4 months of sampling, one month's data was stored on a hard
disk that failed, leaving a month's gap in between.


This is called missing observations. With regular spacing you need to 
fill in the holes with NA. With irregular spacing you can either drop 
the missing observations or, if you know the time at which they were 
missed, you could fill in with NA.




So am I working with a regularly spaced time series or not? Should I
pad all missing data with NAs and start with ts(), followed by the
forecast package (which seems to have all the functions I need in the
beginning), or should I start with a library designed with irregular
time series in mind?

Also, the ts() manual didn't say how to create a time series with one
minute as deltat. It seems to assume time series are about dates. So
the data I have with me, is it really a time series at all?


The ts() representation works best with regularly spaced monthly,
quarterly, or annual data. You can use it for other things if they fit
nicely into regularly spaced observations with a natural frequency,
such as 12 times per year or 60 times per hour. This usually only
makes sense if the frequency has something to do with your problem,
like seasonality questions. You can also use frequency 1 for one
observation per period, like annual data, which in your case would be
once per minute. I'm inclined to think that a zoo representation (see
package zoo) would fit your problem better.
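
For example, a minimal sketch with made-up data, using zoo's
merge-with-an-empty-grid idiom to regularize the series and fill the
missed minutes with NA:

library(zoo)
# irregular one-minute samples: about 80% of the minutes in an hour
times <- seq(as.POSIXct("2013-10-22 09:00", tz = "GMT"), by = "1 min",
             length.out = 60)
keep  <- sort(sample(60, 48))
z     <- zoo(rnorm(48), order.by = times[keep])
# merge with an empty series on the full minute grid; gaps become NA
zreg  <- merge(z, zoo(, times))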


HTH,
Paul


Newbie question indeed. Thanks.





[R] creation of a high frequency series

2012-11-26 Thread Paul Gilbert


(from R-help Digest, Vol 117, Issue 26)


Date: Sun, 25 Nov 2012 07:49:05 -0800 (PST)
From: billycorgcandi...@gmail.com
To:r-help@r-project.org
Subject: [R] creation of an high frequency series
Message-ID:1353858545067-4650744.p...@n4.nabble.com
Content-Type: text/plain; charset=us-ascii

Hi R Users!

I would like to create a high-frequency series but I am experiencing
some difficulties.

My series should start at 09:30 a.m. each day and end at 16:00, for,
let's say, 2 years. I don't care how many observations there are each
day; one observation per minute is fine. In this case, I would have 390
observations each day.

I have tried the following:

start <- ISOdatetime(year=2001, month=1, day=1, hour=9, min=0, sec=0,
tz="GMT")
end   <- ISOdatetime(year=2001, month=1, day=1, hour=16, min=0, sec=0,
tz="GMT")

z <- zooreg(NA, start = start, end = end, frequency=390)


For this problem
 z <- zooreg(rep(NA,390), start = start, end = end, frequency=390)

works, however, if your real problem is that you have time stamps, 
possibly not equally spaced, then you should consider zoo() with the 
order.by argument, for example:


z <- zoo(rep(NA,390), order.by = start + 30 * 0:389)

If your data source actually provides the time stamps then this should 
be very easy.


Note that neither of these does consistency checking between the length
of the series and the specified time frame; that is up to you to do.
You might get warning messages later indicating that something is not
right if you made a mistake. Your zooreg() above seems to just ignore
the inconsistency, whereas I think zoo() with order.by recycles the data.
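
To cover the 09:30-16:00 pattern over many days, a sketch (assuming
every calendar day; restrict the days vector to trading days as needed):

days   <- seq(as.Date("2001-01-01"), by = "day", length.out = 730)
opens  <- as.POSIXct(paste(days, "09:30:00"), tz = "GMT")
# 390 one-minute stamps per day, assembled and sorted
stamps <- sort(as.POSIXct(as.vector(outer(as.numeric(opens), 60 * 0:389, "+")),
                          origin = "1970-01-01", tz = "GMT"))
z <- zoo(rep(NA_real_, length(stamps)), order.by = stamps)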


Paul



but the result is a poor, single NA. And I am not considering multiple
days..

How could I solve this problem??

Thank you so much!!




Re: [R] hessian fails for box-constrained problems when close to boundary?

2012-11-17 Thread Paul Gilbert

Matthieu

(regarding the final question, but not the ones before that.)

The short answer is that you seem to have a solution, since hessian in 
numDeriv works. The long answer follows.


Outside a constrained region one cannot assume that a function will
evaluate properly. Some constraints are imposed because the function is
not defined outside the region. This would mean that the gradient and
hessian are not defined in the usual sense at the boundary; at best a
one-sided derivative exists there. I have had on my to-do list for some
time a project to add one-sided approximations in numDeriv, but that
will probably not happen soon. If your solution is actually on the
boundary, you might consider some substitution which removes the
constraint, and consider the hessian on the reduced, unconstrained,
space.


In your situation I would also be concerned about the optimum being very 
close, but not on the boundary. You might want to check that this is not 
an artifact of the convergence criteria. Check that the objective is 
actually inferior at a point on the boundary close to your optimum. 
Also, the gradient should be zero at the optimum, but that may not be 
exactly so because of numerical convergence criteria. If the gradient is 
pointing toward the boundary, that is a sign that you may just have 
converged too soon.
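
As a quick sketch of that check, using numDeriv with the poster's
LLNorm and op from the message quoted below (a sizeable gradient
component for the bounded parameter suggests convergence stopped early):

library(numDeriv)
g <- grad(LLNorm, op$par, dat = rn2)
g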


Also, the hessian coming from the optimization is built up as an 
approximation using information from the steps of the optimization. I am 
not familiar enough with the numerical handling of boundary constraints 
to know how this approximation from the optimization might be affected, 
but you may need to be careful in this regard.


Paul

On 12-11-15 10:25 AM, Matthieu Stigler wrote:

Hi

I am trying to recover the hessian of a problem optimised with
box-constraints. The problem is that in some cases my estimates are
very close to the boundary, which will make optim(..., hessian=TRUE) or
optimHess() fail, as they do not respect the box-constraints, and
hence evaluate the function in the unfeasible parameter space.

As a simple example (my problem is more complex though, and simple
parameter transformations do not apply here), imagine estimating mu and
sigma (restricted to be > 0) of simple normally distributed data, where
however the true sigma is very close to zero:

LLNorm <- function(para, dat=rn)  return(-sum(dnorm(dat, para[1],
para[2], log=TRUE)))
rn2 <- c(rep(10.3, 2000), 10.31)

> optim(c(10,1), fn=LLNorm, method="L-BFGS-B", lower=c(-Inf, 0.001),
dat=rn2, hessian=TRUE)
Error in optim(c(10, 1), fn = LLNorm, method = "L-BFGS-B", lower =
c(-Inf,  : non-finite finite-difference value [2]

The only solution/workaround I found is a two-step procedure: use
optim() without the hessian, then use optimHess(), specifying the size
of the numerical steps (argument ndeps) as smaller than the distance
between the parameter and the bound, i.e.:

op <- optim(c(10,1), fn=LLNorm, method="L-BFGS-B", lower=c(-Inf,
0.001), dat=rn2, hessian=FALSE)

H <- optimHess(op$par, LLNorm, dat=rn2,
control=list(ndeps=rep((op$par[2]-0)/1000,2)))

While this solution works, it is surely not elegant, and I have the
following questions:
1) Is this solution of using smaller steps good at all?
2) I do not understand why the hessian in a constrained optim problem
is evaluated without the constraints; could the error at least be
turned into a warning?

Furthermore, I realised that the function hessian() in the numDeriv
package does not offer constraints for the parameters either, yet in
the special case of the example above it does not fail (although it
would have problems with a parameter even closer to the bound), unlike
optimHess(). See:
library(numDeriv)
hessian(LLNorm, op$par, dat=rn2)

This brings me to the final question: is there a technical reason not
to allow a constrained hessian, as the fact that neither implementation
in R does it seems to indicate? I found a very interesting answer by
Spencer Graves to a similar question:
http://tolstoy.newcastle.edu.au/R/e4/help/08/06/15061.html
although this concerns the statistical implications more than the
numerical issues.

Thanks a lot!

Matthieu




Re: [R] Date Math

2012-10-15 Thread Paul Gilbert



On 12-10-15 06:00 AM, r-help-requ...@r-project.org wrote:

Jeff,

My understanding is that the lag command will lag an entire time series.  That 
isn't what I'm looking for.

I just want, for example, today, and 5 entries back.

for example:

iter <- '2011-05-18'

observations[iter]   # works fine, returns the row at that date.

index(observations[iter]) # just returns the iter back

index(observations[iter]) - 5  # returns "2011-05-17 23:59:57 PDT", so it
subtracted seconds, not days.

really, I want to find: iter - 5 days.


Switching between "5 entries back" and "5 days back" has me confused
about whether you have more than one entry some days or not. If you
mean you have at most one entry per day, and you want the last five
days that you have observations, then this part of your problem can be
solved by switching from POSIXct to dates:

> z <- Sys.time()
> z
[1] "2012-10-15 08:34:58 EDT"
> z - 5
[1] "2012-10-15 08:34:53 EDT"
> as.Date(z) - 5
[1] "2012-10-10"

(BTW, a complete example is always useful when asking for help. I think
your "obviously fails" code would be pretty close to working, depending
on how the dates were generated.)


Paul




--
Noah Silverman
Smart Media Corp.




[R] Is there any package for Vector Auto-regressive with exogenous variable other than fastVAR?

2012-10-04 Thread Paul Gilbert


Package dse also does vector auto-regression with exogenous variables 
(and also vector ARMA and state-space models with exogenous variables).
But I don't understand what you mean by "not taking the base in the
model", so I don't know if it will solve your problem.
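
For what it is worth, a minimal sketch with made-up data (see
?estVARXls in dse for details; all names here are hypothetical):

library(dse)
dat   <- TSdata(input  = matrix(rnorm(100), 100, 1),   # exogenous series
                output = matrix(rnorm(200), 100, 2))   # endogenous series
model <- estVARXls(dat, max.lag = 2)   # VAR with exogenous input, by least squares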


Paul

On 12-10-04 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Wed, 3 Oct 2012 23:08:54 -0700 (PDT)
From: Sankalp <sankalpsing...@gmail.com>
To:r-help@r-project.org
Subject: [R] Is there any package for Vector Auto-regressive with
exogenous variable other than fastVAR?
Message-ID:1349330934183-4644964.p...@n4.nabble.com
Content-Type: text/plain; charset=us-ascii

Is there any package for Vector Auto-regressive with exogenous variable other
than fastVAR?
Because it is not able to solve my problem of not taking the base in the
model.
Please suggest some appropriate solution









Re: [R] Question about R performance on UNIX/LINUX with different memory/swap configurations

2012-09-17 Thread Paul Gilbert



On 12-09-17 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Sun, 16 Sep 2012 14:41:42 -0500
From: Dirk Eddelbuettel <e...@debian.org>
To: Eberle, Anthony <ae...@allstate.com>
Cc:r-help@r-project.org
Subject: Re: [R] Question about R performance on UNIX/LINUX with
different   memory/swap configurations
Message-ID:20566.11126.175103.643...@max.nulle.part
Content-Type: text/plain; charset=us-ascii


On 16 September 2012 at 13:30, Eberle, Anthony wrote:
| Does anyone have any guidance on swap and memory configuration when
| running R v2.15.1 on UNIX/LINUX?  Through some benchmarking across
| multiple hardware (UNIX, LINUX, SPARC, x86, Windows, physical, virtual)
| it seems that the smaller memory machines have an advantage.

Would you be able to provide some empirical proofs for that conjecture?
Please demonstrate how, say, an X gb ram machine outperforms one with Y gb
where Y < X.

Less memory is /never/ better.


Exceptions to the rule:
There are a couple of unusual situations where less memory can be
better. As someone else pointed out, if you start swapping then
performance takes a hit. So, if you have enough physical memory to do
your own job, but not enough to let other jobs run and push you into
swapping, then you may get better performance on the single job.
Second, if you have long recursions that grab lots of memory, then
these will use swap. Typically one would want the job to succeed, and
thus swap is good, but slow. However, if your code is bad and doing an
infinite recursion, then with swap this can take a very long time to
fail, whereas with only physical memory it fails much more quickly.
Third, there were (are?) situations where 32-bit machines/OSes perform
(very slightly) faster than 64-bit machines/OSes, and since the former
typically have less memory, one could have the impression that less
memory is faster.


Paul


Dirk

-- Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com




Re: [R] R-help Digest, Vol 107, Issue 14

2012-01-15 Thread Paul Gilbert

On 12-01-14 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Fri, 13 Jan 2012 02:09:09 -0800 (PST)
From: statquant2 <statqu...@gmail.com>
To:r-help@r-project.org
Subject: Re: [R] simulating stable VAR process
Message-ID:1326449349804-4291835.p...@n4.nabble.com
Content-Type: text/plain; charset=us-ascii

Hello Paul
Thanks for the answer but my point is not how to simulate a VAR(p) process
and check that it is stable.
My question is more how can I generate a VAR(p) such that I already know
that it is stable.


Just to be clear, my answer was telling you how to check that the model
is stable, not how to simulate a process and check if the process is
stationary, which is probably what you mean when you say "stable
process". Beware that the terminology "stable model" is often used
nowadays to refer to whether the parameters are changing over time.
This has little to do with whether the model is stable in the sense of
the moduli of the roots of the determinant of the polynomial. Elsewhere
in this thread, I think John Frain has given you an algorithm for
generating models.



We know a condition that assures that it is stable (see first message),
but this is not a condition on coefficients etc...


Yes, it is a condition on coefficients.


What I want is to generate, say, 1000 random VAR(3) processes over,
say, 500 time periods that will be STABLE (meaning if I run stability()
all will pass the test).


I am a bit confused by what you mean here. If you are saying "generate
500 time periods for each of the 1000 models", you do not have to
generate any time periods unless you want simulated data; you can check
whether the models are stable directly, because this is a check on the
coefficients of the model. [If you are saying that the model is changing
over time, then the models may all individually be stable in the
determinant sense, but this is not stable (by construction) in the
second sense above. And the generated process will probably not be
stationary according to many of the usual definitions.]


When I try to do that it seems that none of the VARs I am generating
pass this test, so I assume that the class of stable VAR(p) models is
very small compared to the class of all VAR(p) processes.


When you say "small" I think you are talking about a measure on the
space of coefficients. If you use the usual metric and generate the
coefficients uniformly at random over R^n, then the models with
companion eigenvalues inside the unit ball would probably be a very
small set. Even with considerable restrictions, say for example the
coefficients all between -1 and 1, I think the stable models would be a
small set.
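
One way to see this empirically (a sketch; the stability check is done
directly on the coefficients via the companion matrix, as discussed
above):

k <- 2; p <- 3                                   # 2 series, VAR(3)
A <- array(runif(k * k * p, -1, 1), c(k, k, p))  # random lag coefficients
comp <- rbind(matrix(A, k, k * p),               # [A1 A2 A3] over identity blocks
              cbind(diag(k * (p - 1)), matrix(0, k * (p - 1), k)))
all(Mod(eigen(comp, only.values = TRUE)$values) < 1)  # TRUE if stable

Repeating this many times shows how rarely uniform draws pass the test.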


HTH,
Paul







Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Paul Gilbert
The situation for this kind of interface is much more advanced (for 
economic time series data) than has been suggested in other postings. 
Several of the organizations you mention support SDMX and I believe 
there is a working R interface to SDMX which has not yet been made 
public. A more complete list of organizations that I think already have 
working server side support for SDMX is: the OECD, Eurostat, the ECB, 
the IMF, the UN, the BIS, the Federal Reserve Board, the World Bank, the 
Italian Statistics agency, and to a small extent by the Bank of Canada. 
 I have a working API to several time series databases (TS* packages on 
CRAN), and a partially working interface to SDMX, but have postponed 
further development of that in the hope that the already working code 
will be made available. Please see http://tsdbi.r-forge.r-project.org/ 
for more details. I would, of course, be happy to have other developers 
involved in this project. If you think you can contribute then see 
r-forge.r-project.org for details on how to join projects.


Paul

On 12-01-14 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Sat, 14 Jan 2012 02:44:07 +0530
From: Benjamin Weber <m...@bwe.im>
To:r-help@r-project.org
Subject: [R] The Future of R | API to Public Databases
Message-ID:
cany9q8k+zyvrkjjgbjp+jtnyaw15gqkocivyvpgwgyqa9dl...@mail.gmail.com
Content-Type: text/plain; charset=UTF-8

Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not apt to utilize all the public
databases in an efficient manner.
I observed the most tedious part with R is searching and downloading
the data from public databases and putting it into the right format. I
could not find a package on CRAN which offers exactly this fundamental
capability.
Imagine R is the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would put R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key of information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as a contribution in a vital discussion.

Benjamin




Re: [R] simulating stable VAR process

2012-01-05 Thread Paul Gilbert
The simulate function in dse lets you specify the model and the 
distribution of the noise term (or even their values so you can get any 
distribution you like).  So, you should be able to do what you want, 
with either a VAR(p) or a vector ARMA process.  If you are getting a 
process that explodes then your model is probably not stable. If it is a 
dse TSmodel you can check it with stability(), see ?stability in dse.


Beware that the condition "modulus < 1" depends on whether your lagged
parameters are specified on the left or right side of the equation.
This changes the sign of the lag parameters and inverts the condition.
dse assumes lagged terms are specified on the left side, which is a bit
unusual compared to introductory textbooks. However, when you get to
hard problems it has advantages, because the AR term is then in a
matrix polynomial ring and so it is easier to apply some useful
mathematics.
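
A minimal sketch of that workflow (array layout as in ?ARMA, with the
lag index first and the identity at lag 0; sampleT as in ?simulate in
dse):

library(dse)
A <- array(c(1, -0.5, 0, 0, 0, 0, 1, -0.3), c(2, 2, 2))  # AR: I at lag 0
B <- array(diag(2), c(1, 2, 2))                          # MA: identity only
model <- ARMA(A = A, B = B)
stability(model)                       # the check described above
sim <- simulate(model, sampleT = 200)  # simulated data from the model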


Paul

Date: Wed, 4 Jan 2012 05:17:05 -0800 (PST)
From: statquant2 <statqu...@gmail.com>
To:r-help@r-project.org
Subject: Re: [R] simulating stable VAR process
Message-ID:1325683025141-4261210.p...@n4.nabble.com
Content-Type: text/plain; charset=us-ascii

More specifically:
I know that a condition for a VAR(p) process to be stable (weakly
stationary) is that the companion form of the equation (see the awesome
Pfaff book, "Analysis of Integrated and Cointegrated Time Series with
R") has eigenvalues of modulus < 1.

My problem is that I want to generate such processes...

When I try to generate random VAR(p) processes they seem to explode
(clearly they are not weakly stationary...). Does somebody know a way?



Re: [R] Varma models in the dse package

2011-11-23 Thread Paul Gilbert
Tanja

Your TSdata object dfdata as printed in your email looks a bit funny.
It should be a list with a matrix of numeric data in the element named
"output":

> dfdata <- TSdata(output=matrix(rnorm(200), 100, 2))
> dfdata
output data:
      Series 1     Series 2
1   0.01905979 -0.096441240
2   0.73873854 -0.654558048
...


I'm just guessing that you specified this incorrectly somehow. If not, please 
try to give a reproducible example.

Paul

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Tanja Krone
 Sent: November 22, 2011 10:36 AM
 To: r-help@r-project.org
 Subject: [R] Varma models in the dse package
 
 Hi,
 
 I tried to run the VARMA model in the dse package. I specified a model:
  > arma
 
  A(L) =
  1+0.244L1    0+0.05L1
  0-0.325L1    1-0.234L1
 
  B(L) =
  1-0.277L1    0+0.211L1
  0-0.206L1    1+0.238L1
 
  and have a TSdata object:
 
  > dfdata
  output data:
       Series 1   Series 2
  1    difex2     difem2
 
  but I get this warning message:
  > estMaxLik(arma, dfdata)
  Error in l.ARMA(setArrays(Shape, coefficients = coefficients), data,
  result = "like",  :
    NA/NaN/Inf in foreign function call (arg 14)
  In addition: Warning message:
  In l.ARMA(setArrays(Shape, coefficients = coefficients), data, result =
  "like",  :
    NAs introduced by coercion
 
  Both series are the same length (69). Is there something I am missing?
 
 Kind regards and thanks in advance,
 
 Tanja
 




Re: [R] a question on R optimization functions

2011-03-25 Thread Paul Gilbert
It seems more likely that the return value from your function is NA or NaN or 
Inf. This might then result in an NA parameter value being calculated for the 
next step. This is possible, for example,  because the line search extends 
outside the feasible region. You can re-write your function to check for that 
case and return a large negative value (or positive if minimizing), or stop() 
if that is more appropriate.
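
A sketch of that guard (f and f.safe are hypothetical names; the sign
convention here assumes you are minimizing, e.g. a negative
log-likelihood):

f.safe <- function(par, ...) {
  val <- f(par, ...)                 # the original objective
  if (!is.finite(val)) return(1e10)  # large value instead of NA/NaN/Inf
  val
}
# then e.g. optim(start, f.safe, ...) or nlminb(start, f.safe, ...)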

Paul

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Dajiang J. Liu
 Sent: March 25, 2011 3:59 AM
 To: r-help@r-project.org
 Subject: [R] a question on R optimization functions
 
 Dear All,
 
 I use nlminb or optim for maximizing likelihood functions. Sometimes,
 the parameter values happen to be NA, then the program will hang there
 and iterate forever without stopping. No error message will be
 produced. So I can not use error catch method such as try. Are there
 any suggestions how I can circumvent this problem? Maybe I can time a
 function, and if the time exceeds a threshold, it will be stopped. I
 am not sure if this is feasible in R. As a note, I am running
 simulations with thousands of replicates, so there needs to be a
 systematic way of doing this. Are there any suggestions on how to do
 this? Thank you very much!
 
 Regards,
 Dajiang
 




Re: [R] How to obtain seed after generating random number?

2010-08-27 Thread Paul Gilbert
It is nearly impossible to go back after the fact and figure out the
seed you started with. So, you need to be careful to record the seed
first. If you are doing your simulations with a function then it is a
good idea to always start in the function by saving a record of the seed
and returning it with the result. Then you always have the information
you need.

Beware that the seed is not the only thing you need to save. You also
need to record the uniform generator you are using, and the generator
for the distribution you are using. Of course, if you always only use
the default then you won't have a problem here, until the default
changes.

You might want to look at the utilities and examples in the package
setRNG.
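
A minimal sketch of that record-first pattern (mySim is a hypothetical
stand-in for the real simulation):

mySim <- function(n) {
  if (!exists(".Random.seed", envir = .GlobalEnv)) runif(1)  # ensure a seed exists
  seed <- get(".Random.seed", envir = .GlobalEnv)
  kind <- RNGkind()                  # record the generators as well
  list(result = rnorm(n), seed = seed, rngkind = kind)
}
sim <- mySim(10)
# replay the same draws later (.Random.seed encodes the kind too):
# assign(".Random.seed", sim$seed, envir = .GlobalEnv); rnorm(10)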

Paul

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org]
On Behalf Of Bogaso Christofer
Sent: August 24, 2010 1:12 PM
To: r-help@r-project.org
Subject: [R] How to obtain seed after generating random number?

Dear all, I was doing an experiment to disprove some theory, and
therefore performing a lot of random simulation. The goal is to show
the audience that although something has a very rare chance of
occurring, that doesn't mean the event is impossible.

In this case, after getting that rare event, I need to show the same
scenario multiple times to explain it to the audience. Hence I need to
somehow save the seed which generated those random numbers after doing
the experiment. However, as it is a very rare event, it is not very
practical to start with a fixed seed and then generate random numbers.
Hence I am looking for some way to tell me the corresponding seed which
was responsible for generating the particular series of random numbers
that produced that rare event.

In short, I need to know the seed ***after*** generating the random
numbers.

Is there any possibility to know this?

Thanks and regards,






Re: [R] Can RMySQL be used for a paramterized query?

2010-06-10 Thread Paul Gilbert
Ted

I'm not sure I fully understand the question, but you may want to consider 
creating a temporary table with a join, which you can do with a query from your 
R session, and then query that table to bring the data into R. Roughly, the 
logic is to leave the data in the db if you are not doing any fancy 
calculations. You might also find "order by" useful. (This is what I use in 
TSdbi to make sure data comes back in the right order as a time series.) It may 
even be possible to get everything you want back in one step using this and 
"group by", rather than looping, but depending on the analysis you want to do 
in R, that may not be the most convenient way.

BTW, I think you realize you do not have to use the TSMySQL commands to access 
the TSMySQL database. They are usually convenient, but you can query the tables 
directly with RMySQL functions.
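
Roughly, a sketch of the temporary-table idea (table and column names
are made up):

library(RMySQL)
con <- dbConnect(MySQL(), dbname = "mydb")
dbGetQuery(con, "CREATE TEMPORARY TABLE tmp AS
                 SELECT t2.* FROM my_table AS t1
                 JOIN my_other_table AS t2 ON t2.t1_id = t1.id")
res <- dbGetQuery(con, "SELECT * FROM tmp ORDER BY t1_id")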

Paul 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Henrique Dallazuanna
Sent: June 10, 2010 8:47 AM
To: Ted Byers
Cc: R-help Forum
Subject: Re: [R] Can RMySQL be used for a paramterized query?

I think you can do this:

ids <- dbGetQuery(conn, "SELECT id FROM my_table")
other_table <- dbGetQuery(conn, sprintf("SELECT * FROM my_other_table
WHERE t1_id IN (%s)", paste(ids$id, collapse = ",")))

On Wed, Jun 9, 2010 at 11:24 PM, Ted Byers r.ted.by...@gmail.com
wrote:

 I have not found anything about this except the following from the DBI
 documentation :

 Bind variables: the interface is heavily biased towards queries, as
  opposed to general purpose database development. In particular we
  made no attempt to define bind variables; this is a mechanism by
  which the contents of R/S objects are implicitly moved to the
  database during SQL execution. For instance, the following embedded
  SQL statement
  /* SQL */
  SELECT * from emp_table where emp_id = :sampleEmployee
  would take the vector sampleEmployee and iterate over each of its
  elements to get the result. Perhaps the DBI could at some point in
  the future implement this feature.
 

 I can connect, and execute a SQL query such as "SELECT id FROM
 my_table", and display a frame with all the IDs from my_table.  But I
 need also to do something like "SELECT * FROM my_other_table WHERE
 t1_id = x" where 'x' is one of the IDs returned by the first select
 statement.  Actually, I have to do this in two contexts, one where the
 data are not ordered by time and one where it is (and thus where I'd
 have to use TSMySQL to execute something like "SELECT
 record_datetime, value FROM my_ts_table WHERE t2_id = x").

 I'd like to embed this in a loop where I iterate over the IDs returned
 by the first select, get the appropriate data from the second for each
 ID, analyze that data and store results in another table in the DB,
 and then proceed to the next ID in the list.  I suppose an alternative
 would be to get all the data at once, but the resulting resultset
 would be huge, and I don't (yet) know how to take a subset of the data
 in a frame based on a given value in one of the fields and analyze
 that.  Can you point me to an example of how this is done, or do I
 have to use a mix of perl (to get the
 data) and R (to do the analysis)?

 Any insights on how to proceed would be appreciated.  Thanks.

 Ted

[[alternative HTML version deleted]]






--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Reading dates in R using SQL and otherwise (and some interesting behavior by the data editor)

2010-04-08 Thread Paul Gilbert
I can use as.Date() on the result from an SQL date field, but this may
depend on the backend database engine too. There may also be some
sensitivity to the character set encoding used on the database and the
client, and from your email it looks a bit like you could have some
problems in this respect.
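
For example, a sketch (RODBC; as.is = TRUE keeps the field as character
so you can coerce it yourself, and the format depends on what your
server returns; the table and column names are from your message):

library(RODBC)
df <- sqlQuery(channel, "SELECT ae_datestarted FROM mydata", as.is = TRUE)
df$ae_datestarted <- as.Date(df$ae_datestarted, format = "%Y-%m-%d")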

Paul Gilbert

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Paul Miller
Sent: April 8, 2010 11:14 AM
To: r-help@r-project.org
Subject: [R] Reading dates in R using SQL and otherwise (and
someinteresting behavior by the data editor)

Hello Everyone,

I am a newbie with about a month's worth of experience using R. I've
just spent a little time learning how to work with date values. Mostly,
this has involved converting text values into dates.

So far, I've managed to figure out how to do this in R proper using
as.Date. However, I'm still struggling with doing this using SQL and
RODBC.

In the process of learning to create date values in R proper, I noticed
some interesting behavior on the part of the data editor. At first, this
led me to believe that my efforts had been unsuccessful. The output from
my R console below illustrates this behavior.

> test <- mydata
> test$test_date <- as.Date(test$ae_datestarted, format='%m/%d/%Y')
> class(test$test_date)
[1] "Date"
> mode(test$test_date)
[1] "numeric"

> fix(test)

(At this point, I clicked on the test_date column)

Warning: class discarded from column 'test_date'

> class(test$test_date)
[1] "character"
> mode(test$test_date)
[1] "character"

When I run my code, it works correctly. But when I open the data frame
in the data editor and click on the test_date column, the editor says
that it is character. And beyond that, the editor discards the class for
test_date. Should the editor do this? Or is it my fault for trying to
look at test_date in the editor in the first place? In SAS, I'm used to
creating data and then opening the dataset to look at what I've done.
Maybe I shouldn't be doing this in R though.

Returning to the issue of converting text values to dates using SQL
(Server) and RODBC: does anyone know how to do this? I've been trying to
do this using things like Cast and Convert. Usually, these attempts
fail. When SQL Server does seem to be sending something back, it
appears that R cannot accept it. Any help with this problem would be
greatly appreciated.

Thanks,

Paul
Â







[R] [R-pkgs] new dse package

2009-11-07 Thread Paul Gilbert

With the release of R-2.10.0 I have divided the dse bundle into packages. The 
bundled package dse1 and part of dse2 are now in the new package dse. The 
remaining part of dse2 is in a new package EvalEst. The package dse does 
multivariate ARMA and state space time series modelling and forecasting, while 
package EvalEst is now focused on the evaluation of estimation methods. To aid 
transition, there are packages dse1 and dse2 on CRAN. These do nothing except 
require the new packages dse and EvalEst respectively, but should help prevent 
breaking of existing code that requires() dse1 or dse2.

I have also taken this opportunity to do a long overdue cleanup of the Users' 
Guides distributed as vignettes in the packages.

Paul Gilbert





Re: [R] dse model setup help

2009-07-17 Thread Paul Gilbert
I think the problem here is that you seem to be trying to specify a 
non-innovations form model, for which both Q and R need to be specified, 
but you have only specified Q.


The code guesses whether you are specifying an innovations model or an 
non-innovations model based on whether you specify the Kalman gain K. 
Since you don't specify K, it assumes a non-innovations form and expects 
to find both Q and R. It seems one of my error checks could be improved.
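
So, a sketch using your dimensions below (R must be 1 x 1 here since H
is 1 x 5; the value is made up):

r     <- matrix(0.1, 1, 1)   # hypothetical measurement-noise covariance
model <- SS(F=f, G=g, H=h, Q=q, R=r, z0=z, P0=p)
# or specify K (and drop Q and R) for an innovations-form model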


Paul

Bob McCall wrote:

I tried to specify a model in dse1 but something isn't right. Anybody
have any tips?

 > model <- SS(F=f, G=g, H=h, Q=q, z0=z, P0=p)
Error in locateSS(model$R, constants$R, R, p, p, plist) :
  The dimension of something in the SS model structure is bad.



> dim(f)
[1] 5 5
> dim(g)
[1] 5 1
> dim(h)
[1] 1 5
> dim(q)
[1] 5 5
> dim(z)
[1] 5 1
> dim(p)
[1] 5 5

thanks,
Bob






Re: [R] SQL Queries from Multiple Servers

2009-05-14 Thread Paul Gilbert

Tom Schenk Jr wrote:

I use RODBC as my conduit from R to SQL. It works well when the tables are
stored on one channel, e.g.,

channel <- odbcConnect("data_base_01", uid="...", dsn="...")

However, I often need to match tables across multiple databases, e.g.,
data_base_01 and data_base_02. However, odbcConnect() appears limited
insofar as you may only query from tables within a single channel, e.g.,
database. I do not have access to write and create new tables on the SQL
servers, which is a possible solution (e.g., copy all tables into a single
database).

Is there any way, in RODBC or another R-friendly SQL package, to perform SQL
operations across multiple databases?
  
I'm not sure if this can be done with odbc, but with MySQL it is
possible to do joins across multiple databases, and creating temporary
tables may be possible even without the write access you would need for
a permanent table. I'm not sure if you can pass this kind of statement
from R, because the connection usually specifies the database. However,
I have constructed temporary tables with a simple mysql client and then
queried them from R; they stay around as long as you don't quit the
simple client, though I am not really sure this is supposed to work.
Another option is to use two connections and do some of the comparison
in R, or write the results to an SQLite connection, on which you
usually have write access. This might be slow, and you may have to deal
with chunks if you have big tables. Joins across databases are also
possible with PostgreSQL, I'm told, but they are more difficult.
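
For the MySQL case, a sketch of a cross-database join (qualify each
table with its database; all names here are made up):

library(RMySQL)
con <- dbConnect(MySQL(), host = "my_host", user = "my_user")
res <- dbGetQuery(con, "SELECT a.id, b.value
                        FROM data_base_01.tbl1 AS a
                        JOIN data_base_02.tbl2 AS b ON b.id = a.id")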


Paul

Warm regards.

  





[R] remote browser and per-session dir

2009-05-06 Thread Paul Gilbert
Running R on a remote system I can almost never get help.start() to
work, because it passes the request to my local browser, which cannot
see the links in the per-session directory created in /tmp on the
remote machine. Is there a way to put the per-session directory in my
home directory (which is NFS mounted), or some other fix for this
feature?


Paul Gilbert




Re: [R] windows vs. linux code

2009-02-26 Thread Paul Gilbert

Gabor Grothendieck wrote:


Try

if (.Platform$OS.type == "windows") ... else ...


Gabor has suggested what I think is the best way to do this check, but 
in my experience, if you are doing this check then you are almost 
certainly missing some feature of R that will let you avoid doing it.


For graphs, dev.new() has been mentioned.  I would be curious to know 
when people find it necessary to do this check, other than to issue a 
system() command for some very specialized local system reason (i.e. 
something that would never be used in a package and rarely in code run 
on different computers - not just different OSs).


Paul




[R] New international competition for the Digital Humanities

2009-01-23 Thread Paul Gilbert
(Slightly off topic and outside my area - but this may be of interest to 
the R community)


WASHINGTON (January 16, 2009) -- Today, a new international competition 
called the Digging into Data Challenge was announced by four leading 
research agencies: the Joint Information Systems Committee (JISC) from 
the United Kingdom, the National Endowment for the Humanities (NEH) and 
the National Science Foundation (NSF) from the United States, and the 
Social Sciences and Humanities Research Council (SSHRC) from Canada. The 
Digging into Data Challenge encourages humanities and social science 
research using large-scale data analysis, challenging scholars to 
develop international partnerships and explore vast digital resources, 
including electronic repositories of books, newspapers, and photographs 
to identify new opportunities for scholarship.


Interested applicants must first submit a letter of intent by March 15, 
2009. Further information about the competition and the application 
process can be found at http://www.diggingintodata.org .


See the full press release at: 
http://www.sshrc-crsh.gc.ca/web/whatsnew/press_releases/2009/digging_into_data_e.asp





Re: [R] R ORM?

2008-11-19 Thread Paul Gilbert



Faheem Mitha wrote:


Hi Hans,

Thanks for the reply.

On Tue, 18 Nov 2008, Hans Werner Borchers wrote:


Faheem Mitha faheem at email.unc.edu writes:


Hi,

Does anyone know of an R ORM (Object Relational Mapper)? I'm thinking of
something similar to sqlalchemy (http://www.sqlalchemy.org/).
Alternatively or additionally, can people offer suggestions about
managing relational databases from R. I'm currently using postgresql,
but would like a flexible solution that works across different
databases, hence the enquiry about ORMs.


Hi Faheem,

I am wondering, what kind of objects or classes you would like to
apply the object relational mapper to. S4 classes? -- I doubt that
very much.


Yes. R classes. As long as they support the object oriented paradigm, it 
doesn't matter too much to me what kind exactly. ORMs map between 
classes and db tables.


The attraction of an ORM is high level abstraction capabilities as well 
as a helping of syntactic sugar.



The only reasonable structure for which one could use a mapper in R is
the data frame. All R database packages provide support for getting
and sending data in data frames to and from databases. Admittedly,
this is not what someone from the Python community would accept as an
object relational mapper, but may be sufficient given that there is
only this one class concerned.


You mean e.g. http://rpgsql.sourceforge.net/ can be used to read and
write from a postgresql db, and there are similar packages for mysql etc.?



The only way I know to become a somewhat independent from specific
databases is to turn to RODBC (Windows or Linux, you didn't say that).


The only way I know is to use standard SQL, and avoid product-specific
extensions in all your code that uses the database. Even that is not
perfect, but it is a pretty good start. I don't think RODBC really
solves this problem for you. It does allow you to talk to different
products, but so do the various DBI drivers. But in either case, if you
use product-specific extensions then you will tie yourself to the
product (which is one reason vendors like to add such useful
extensions). I read the initial remark to be about porting the database
itself. In my limited experience this may be a small part compared to
the code for maintaining and using the database.


Paul


I use Linux (Debian). You mean 
http://cran.r-project.org/web/packages/RODBC/index.html? Is this a 
recommended approach?


CCing back to list, just in case someone else is interested...

Faheem.






[R] [R-pkgs] Packages for time series databases

2008-11-18 Thread Paul Gilbert
I recently put several packages for time series databases on CRAN. The 
main package, TSdbi, provides a common interface to time series 
databases. The objective is to define a standard interface so users can 
retrieve time series data from various sources with a simple, common, 
set of commands, and so programs can be written to be portable with 
respect to the data source. The SQL implementations also provide a 
database table design, so users needing to set up a time series database 
have a reasonably complete way to do this easily. The interface provides 
for a variety of options with respect to the representation of time 
series in R. There is also a (not yet well tested) mechanism to handle 
multilingual data documentation.


- TSdbi is the main package.

- TSMySQL, TSPostgreSQL, and TSSQLite provide interfaces for these
three SQL databases (using RMySQL, RPostgreSQL, and RSQLite,
respectively).

- TSodbc provides an interface using RODBC (tested only with a
PostgreSQL server).

- TSfame provides an interface to Fame using the fame package.

- TSpadi provides an interface to Fame using a (somewhat dated) padi server.

- TShistQuote provides an interface to get.hist.quote.

Many thanks to the authors and maintainers of the underlying packages,
and to several people who have commented on early versions of the TS*
packages.


Paul Gilbert




[R] Ottawa area R Users meeting

2008-11-05 Thread Paul Gilbert
There will be a meeting of the Ottawa Gatineau R Users Group on Monday 
November 10 from 4pm to 6pm. More details are provided below. Please 
pass this message along to friends you think may be interested.


Paul Gilbert
_

Dear R friends:

To get OGRUG going again I've reserved room DMS 6160 in the Desmarais 
Building of the University of Ottawa for 1600-1800 November 10. This is 
the big curved building opposite National Defence HQ at 55 Laurier E. 
The transitway stops are at the door or across the road. Parking exists 
in the basement -- bring gold or at least a gold credit card if you 
intend to drive!  6160 is on the 6th floor.


Paul Gilbert and I went to the useR! meeting in Dortmund and will give 
an appreciation of what we learned at that meeting, which was very well 
attended.


We will also try to gain a picture of the usage of R in Ottawa by 
exchange of information, with the aim of making OGRUG a modest but 
effective resource for R users in the region.


As an ongoing agenda item, may I suggest that we invite anyone with a 
particular question, glitch, note or such about R to bring it up. For 
items needing some show as well as tell, please email me materials 
and I'll try to arrange for presentation. As an example, I'll show a 
2-line script that draws multi-page graphs (e.g., stock market trades by 
the minute for last month over 10 by 2 sheets of paper).


Please share this with others who may be interested.


Cheers,

John Nash [EMAIL PROTECTED]




Re: [R] dbConnect

2008-08-15 Thread Paul Gilbert

Christiane Reuter wrote:

Hi everybody,

I'm having a problem with connecting to my MySQL database.
Each time I try to connect

library(RMySQL)
m <- dbDriver("MySQL")
con <- dbConnect(m, host="my_host", username="my_username",
                 password="my_password", dbname="name_of_db")


it says

Fehler in mysqlNewConnection(drv, ...) :
   RS-DBI driver: (could not connect my_host on dbname name_of_db
Error:Can't connect to MySQL server on 'my_host' (60)

I access the database via https. Might this be the problem?
(I have another database that I access via http where I have no 
problems at all...)
  
If I understand correctly, you are using http to the web server, and it 
is accessing MySQL running on the web server machine, which means the 
database server only needs to listen for localhost connections. The 
dbConnect call goes over the network directly to the server, which means 
the server needs to be set up to listen on a network address/port. If it 
is not, you would get this message, but there are other possible causes too.


Paul
And if so, is there anyone who knows what I need to do to overcome  
that problem?


Thank you so much!

-Christiane




and provide commented, minimal, self-contained, reproducible code.


Re: [R] using acf() for multiple columns

2008-08-11 Thread Paul Gilbert

rcoder wrote:

Hi Gary,

Thanks for your reply. This works fine. Is there any way to send the ACF
data to a matrix, so I could analyse the data in Excel, for example?
  
The coefficients are returned in the acf element of the result, as 
mentioned in the documentation (?acf). The side effect of plotting can 
be controlled by the plot argument:

  A <- matrix(rnorm(1500), nrow=500)
  z <- acf(A, lag.max=10, plot=FALSE)$acf
  dim(z)
[1] 11  3  3

I think what you are looking for are the elements z[,1,1], z[,2,2], and 
z[,3,3].

  z[,1,1]
 [1]  1.000000000  0.080656537  0.041652217 -0.051640091 -0.032176245
 [6] -0.063410768  0.033985699  0.011444829 -0.027779925  0.007840191
[11] -0.042187141

This is doing some extra calculation compared to what you want, because 
it is calculating all the cross correlations. The trade-off is that the 
acf calculation gets passed to C code and is fast, whereas splitting out 
the columns may not be. For a small number of columns there is probably 
not much difference.


If you are going to transfer this to something else by copying and 
pasting the result, you should print it to a higher precision. I'm not 
sure why you would do that for analysis, though ... the usual problem is 
getting the data into R.
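
For example, a small sketch (mine, not part of the original exchange) 
that collects the own-lag autocorrelations into a matrix and writes them 
out at full precision (the file name is illustrative):

  A <- matrix(rnorm(1500), nrow=500)
  z <- acf(A, lag.max=10, plot=FALSE)$acf
  own <- sapply(1:dim(z)[2], function(i) z[, i, i])  # column i = series i
  colnames(own) <- paste("series", 1:ncol(own), sep="")
  write.csv(own, file="acf.csv")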


Paul Gilbert

Thanks,

rcoder

Ling, Gary (Electronic Trading) wrote:
  

Hi, here is one possible solution ...
-gary

### example ###

# create a 500x3 multi-ts
A <- matrix(rnorm(1500), nrow=500)

# misc. graphical setting
par(mfrow=c(ncol(A), 1))

# then our friend -- lapply(split(...))
lapply(split(A, col(A)), acf)
# Or
lapply(split(A, col(A)), function(ts) acf(ts, lag.max=10))





-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of rcoder
Sent: Wednesday, August 06, 2008 4:13 PM
To: r-help@r-project.org
Subject: [R] using acf() for multiple columns



Hi everyone,

I'm trying to use the acf() function to calculate the autocorrelation of
each column in a matrix. The trouble is that I can only seem to get the
function to work if I extract the data in the column into a separate
matrix and then apply the acf() function to this column.

I have something like this: acf(mat, lag.max=10, na.action=na.pass)

...but I would really like to apply the function to 'mat' where 'mat' is
a matrix as opposed to a vector. The function actually doesn't return an
acf coefficient, but instead plots the data. So, in addition to handling
matrices with multiple columns, would anyone know how to coerce the
function to output the underlying data?

Finally, when working with a matrix, is there a way I can specify how
many plots I can display after the function executes? I managed to
generate a multiple plot when I was experimenting, but the titles
suggested the acf was calculated between adjacent columns in the matrix,
which is something I was puzzled about.

Thanks,

rcoder





Re: [R] Exception Handling

2008-07-07 Thread Paul Gilbert
I do this kind of thing under control of make. If you organize the 
individual pieces as separate targets then make -k will keep going and 
make -j can be used to run things in parallel.  R can be made to throw 
an error with something like this:


egTarget:
	@$(MKDIR) $(FLAGS)
	@echo "library(tools); z <- doSomething() ; \
	  if (0 == length(capture.output(z))) q('no', status=0) else \
	    {print(z); q('no', status=1)}" | R --vanilla --slave > \
	  $(FLAGS)/[EMAIL PROTECTED]
	@mv $(FLAGS)/[EMAIL PROTECTED] $(FLAGS)/$@
	@touch $(FLAGS)/$@ # better time stamp in some parallel situations

In this example I am assuming that printing z has zero length output if 
everything was ok, but you could set up different situations, and you 
will probably want to use try() in doSomething().
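
Alternatively, staying inside R, a minimal sketch of the 
continue-on-error idea (doSomething() and the input file are 
placeholders):

  items <- read.csv("items.csv")$id        # hypothetical list of items
  results <- lapply(items, function(it)
      tryCatch(doSomething(it),
               error = function(e) {
                   message("item ", it, " failed: ", conditionMessage(e))
                   NULL                    # record the failure and carry on
               }))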


HTH.
Paul Gilbert


On Sun, Jul 6, 2008 at 7:36 PM, Tudor Bodea [EMAIL PROTECTED] wrote:


Dear useRs:

Please provide me with your thoughts on an issue related to the design of a
production level system.  For example, let's suppose that I need to run the
same R script for a finite sequence of items (e.g., in the energy industry, I
may need to assess the profitability of all gas stations in the state of
Florida).  For each of the items, the R script accesses some remote databases,
gets all the necessary information and processes the data locally.  If the
data for every item in the item list were not corrupted then the R script
would easily do its job.  However, every now and then, the data for some items
is partially missing and the R script returns in such cases an error message
(e.g., for a given gas station, 1 month worth of data is missing from a 3
month decision horizon).  As of right now, using the error option of
options(), whenever an exception happens, the R script sends me an email error
message and kills the current R instance.  As I have already figured out, this
is not necessarily the most efficient way to deal with such exceptions.
Ideally, I would like the script to inform me of the data problem but continue
the analysis with the remaining of the items.  Based on my searches, I believe
that try / stop can answer my problem.  However, if any of you already
implemented such a safe fail system I would really appreciate your taking the
time to (1) share with me what you think constitutes the best practices and/or
(2) point out to me any online material relevant to the topic.  I run various
versions of R on Windows and multiple UNIX platforms.  The list of items is
read from a .csv file and stored in a dataframe.

Thank you.

Tudor


--
Tudor Dan Bodea
Georgia Institute of Technology
School of Civil and Environmental Engineering
Web: http://www.prism.gatech.edu/~gtg757i












Re: [R] NumDeriv - derivatives of covariance matrix

2008-04-30 Thread Paul Gilbert
Your calculation can be thought of as a function from R^m to R^(n*n), 
and functions in numDeriv can be used to calculate a numerical 
approximation to the derivative of the function. However, the functions 
in numDeriv try to calculate accurate approximations, as opposed to the 
quick approximations one might want in an optimization problem. Given 
that you already have an analytic solution, I doubt that even a quick 
approximation will be faster. You would do better to convert parts of 
your double loop into vector or matrix calculations, or to exploit the 
fact that the matrix is symmetric.
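
For instance, a minimal sketch of the vectorized idea (my own, using the 
same definitions as the code quoted below): replace the double loop over 
i and j with one outer() per column:

  kmatder2 <- function(x, par, index) {
      theta <- 1
      D2 <- matrix(0, nrow(x), nrow(x))
      for (k in seq(along=par))        # one pass per column, not per pair
          D2 <- D2 + par[k] * outer(x[, k], x[, k], "-")^2
      # derivative wrt par[index]; the result is already symmetric
      -0.5 * outer(x[, index], x[, index], "-")^2 * theta * exp(-0.5 * D2)
  }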


Paul Gilbert

Daomeng Gao wrote:

Hello R-help,

I need to compute matrices of first derivatives of a covariance matrix C
with entries given by c_ij=theta*exp(-0.5* sum(eta*(x[i,]-x[j,])^2)), wrt to
elements of eta, a m-dimensional vector of parameters, given a n*m data
matrix x. So far, I have been computing matrices for each parameter (given
by par[index]) analytically, using the following

kmatder <- function(x, par, index) {
    ## x: n*m matrix
    ## par: vector of parameters, m=length(par)=ncol(x)
    ## compute matrix of partial derivatives wrt parameter par[index]:
    ##   Cder = d C / d par[index]
    theta <- 1
    eta <- par
    n <- nrow(x)
    Cder <- matrix(0, n, n)
    for (i in 1:n) {
        for (j in i:n) {
            Cder[i,j] <- (-0.5*((x[i,index]-x[j,index])^2)) * theta *
                exp(-0.5*sum(eta*(x[i,]-x[j,])^2))
        }
    }
    Cder <- 0.5*(Cder + t(Cder))
    Cder
}

I was wondering whether it might be possible to speed up things using
numDeriv (jacobian). If so, what would be the right way to implement a
suitable method ?

Cheers,
Gao Daomeng






Re: [R] History of R

2008-02-22 Thread Paul Gilbert
Kathy

The dedication of the developers and several other important things have 
already been mentioned. Here are a few points I have not seen.

- I believe S was originally open source (before the term existed and 
before the GPL, and license issues were probably clouded with respect to 
changing the code). This meant parts of the community of S users had this 
tradition. Some, no doubt, were a bit upset about the Splus move to 
closed source.

- This community had also contributed significantly to Statlib, so there 
were some packages that could be leveraged in the beginning. This may 
have been important not so much for what the packages did as for the 
fact that they gave an extensive test suite, so one could have 
considerable confidence in the results.

- Purchase cost is typically not so important for corporate and 
institutional users, since it is usually dominated by support costs. 
However, young users may often feel they would prefer to have their 
personal investment in something they can easily take with them if they 
move.  Some of us at the other end like the idea that we don't need a 
corporate account to continue research we might be interested in doing 
when we retire.

- All risk averse users should like the idea that programs and acquired 
skills are not tied to the operating system and hardware flavor of the 
month. (R has excelled in this respect.)

- Help on the R lists has always been exceptionally good (sometimes even 
if you don't read the documentation first - but expect to be chastised). 
If you look at the S help list over the past 15 years, you will find 
many of the most difficult questions were answered by people involved 
with R.

- I ran my own code interchangeably in Splus and R for many years 
(starting with R-0.16). For a long time Splus was production and R was 
there so I would have a backup. For me, the defining factor in moving to 
R for production was the introduction of the package system. This is 
really special in the way that it develops the synergy of the community. 
By packaging your code you get to leverage all the code checking and 
documentation checking of the system, and you get to add your own tests 
that run automatically when you build your package. Not only that, but 
if you make your package available on CRAN you get not only useful 
feedback from users, but also automatic information about what is going 
to break in your code in the next release of R (from the daily checks on 
multiple platforms). This is not only useful to package developers, but 
provides R itself with what I would guess is the largest automatic test 
bed in the industry. The system is also interesting in the way that it 
has resolved one of the big problems of Statlib: there is an automatic 
mechanism for removing broken and unmaintained packages.

Paul Gilbert

Kathy Gerber wrote:
 Earlier today I sent a question to Frank Harrell as an R developer with 
 whom I am most familiar.  He suggested also that I put my questions to 
 the list for additional responses.  Next month I'll be giving a talk on 
 R as an example of high quality open source software.  I think there is 
 much to learn from R as a high quality extensible product that (at least 
 as far as I can tell) has never been spun or hyped like so many open 
 source fads.
 
 The question that intrigues me the most is: why is R as an open source 
 project so incredibly successful, while other projects, say for example 
 Octave, don't enjoy that level of success?
 
 I have some ideas of course, but I would really like to know your 
 thoughts when you look at R from such a vantage point.
 
 Thanks.
 Kathy Gerber
 University of Virginia
 ITC - Research Computing Support
 



Re: [R] Using R in a university course: dealing with proposal comments

2008-02-11 Thread Paul Gilbert
Stas Kolenikov wrote:
 ...
 Training researchers of tomorrow might be great, but if your students get on
 the market at the end of the semester, they won't have the luxury of waiting
 until R becomes THE package of choice.
   
Not being a teacher, I usually follow these discussions with a bit of 
amusement and some befuddlement. We hire young people hoping they will 
bring in bright new ideas from academia, and academics are training the 
students based on what they think are the old things we use. 
Fortunately, R is already one of the packages of choice in many places.

Another point that needs more emphasis is that R is actually a 
programming language, like Matlab and APL, so it really has more 
general usefulness than statistics packages that one might use in the 
narrower context of a statistics course.

Paul Gilbert




Re: [R] building packages for Linux vs. Windows

2008-02-11 Thread Paul Gilbert
Erin Hodgess wrote:
 Hi R People:

 I sure that this is a really easy question, but here goes:

 I'm trying to build a package that will run on both Linux and Windows.

 However, there are several commands in a section that will be
 different in Linux than they are in Windows.
   
Erin

Several people have indicated how to do this, but I encourage you to be 
sure you really need to do it. Many things can be made to work the same 
way on all OSs, and packages are much easier to maintain if you do not 
have several variants.  You might consider posting a few examples of 
where you find this necessary, and asking if there is an OS-independent 
way to do it.
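
If a branch really is unavoidable, run-time detection is simple (a 
minimal sketch; the commands are just examples):

  if (.Platform$OS.type == "windows") {
      files <- shell("dir", intern=TRUE)    # Windows-only command
  } else {
      files <- system("ls", intern=TRUE)    # Unix-alike equivalent
  }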

Paul Gilbert
 Would I be better off just to build two separate packages, please?
 If just one is needed, how could I determine which system is running
 in order to use the correct command, please?

 Thanks in advance,
 Erin


   




Re: [R] things that are difficult/impossible to do in SAS or SPSS but simple in R

2008-01-17 Thread Paul Gilbert
The argument for SAS (and Stata) when working with large datasets comes 
up fairly often.  I have not had much experience in this area, but I have 
been pleasantly surprised using R in combination with an SQL interface, 
in situations with modestly large, messy datasets.  I certainly would 
appreciate comments on the relative merits from anyone who has more 
experience in this area.
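
A minimal sketch of that combination (RSQLite here; the file and table 
names are hypothetical): let the database do the heavy subsetting and 
aggregation, and bring only the result into R.

  library(RSQLite)
  drv <- dbDriver("SQLite")
  con <- dbConnect(drv, dbname="big.db")
  res <- dbGetQuery(con,
      "SELECT region, COUNT(*) AS n FROM orders GROUP BY region")
  dbDisconnect(con)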

Paul Gilbert

Walter Paczkowski wrote:
 Good morning,
 
 I use SAS and R/S-Plus as my primary tools so I have a lot of experience with 
 these programs.  By far and away, SAS is superior for handling the messy 
 datasets, but also the very large ones.  I work at times with datasets in the 
 hundreds of thousands (and on occasion, millions) of records.  SAS, and 
 especially PROC SQL, are invaluable for this.  But once I get to datasets 
 manageable for R/S-Plus, then I ship to these tools for the programming and 
 graphics.  This seems to work great.
 
 Walt Paczkowski
 Data Analytics Corp.
 
 
 -Original Message-
 
From: Rob Robinson [EMAIL PROTECTED]
Sent: Jan 17, 2008 4:31 AM
To: [EMAIL PROTECTED]
Subject: Re: [R] things that are difficult/impossible to do in SAS or SPSS
but simple in R


I wonder if those who complain about SAS as a programming environment have
discovered SAS/IML which provides a programming environment akin to Matlab
which is more than capable (at least for those problems which can be treated
with a matrix like approach). As someone who uses both SAS and R - graphical
output is so much easier in R, but for handling large 'messy' datasets SAS
wins hands down...
Cheers
Rob

*** Want to know about Britain's birds? Try  www.bto.org/birdfacts ***

Dr Rob Robinson, Senior Population Biologist
British Trust for Ornithology, The Nunnery, Thetford, Norfolk, IP24 2PU
Ph: +44 (0)1842 750050 E: [EMAIL PROTECTED]
Fx: +44 (0)1842 750030 W: http://www.bto.org

 How can anyone be enlightened, when truth is so poorly lit 
 


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Jeffrey J. Hallman
Sent: 16 January 2008 22:38
To: [EMAIL PROTECTED]
Subject: Re: [R] things that are difficult/impossible to do
in SAS or SPSS but simple in R

SAS has no facilities for date arithmetic and no easy way to 
build it yourself.  In fact, that's the biggest problem with 
SAS: it stinks as a programming environment, so it's always 
much more difficult than it should be to do something new.  
As soon as you get away from the canned procs and have to 
write something of your own, SAS falls down.

I don't know enough about SPSS to comment.
--
Jeff







Re: [R] An "R is slow" article

2008-01-09 Thread Paul Gilbert

Gustaf Rydevik wrote:
 Hi all,
 
 Reading the wikipedia page on R, I stumbled across the following:
 http://fluff.info/blog/arch/0172.htm
 
There are certainly situations where one would want to consider faster 
solutions than interpreted languages but, having been through these 
arguments a few times over the years, here are a few things you might 
consider:

1/ How much is your time worth, how much does the computer time cost, 
and how much does a faster computer cost when you start writing your code?

2/ How much is your time worth, how much does the computer time cost, 
and how much does a faster computer cost when you finish writing your code?

3/ If you tweak the code, or use someone else's private tweaks, how much 
do you trust the results relative to more widely used and tested versions?

4/ You should do speed comparisons with something resembling your real 
problem.

5/ If you want to make R look really bad, use a loop that gobbles lots of 
memory, so your machine starts to swap. (This is my guess of part of the 
problem with the script; see the sketch after this list.)

6/ If you want your code to be really fast, don't do any error checking. 
(This also avoids the enormous amount of time you waste when you find 
errors.)
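
A minimal sketch of point 5 (my example, not from the blog): growing an 
object inside a loop forces repeated copying, while preallocating or 
vectorizing does not.

  n <- 1e5
  x <- numeric(0)
  system.time(for (i in 1:n) x <- c(x, i^2))   # grows x at every iteration
  y <- numeric(n)                              # preallocated
  system.time(for (i in 1:n) y[i] <- i^2)
  system.time(z <- (1:n)^2)                    # vectorized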

 It does seem interesting that the C execution is that much slower from
 R than from a native C program. Could any of the more technically
 knowledgeable people explain why this is so?
 
 The author also have some thought-provoking opinions on R being
 no-good and that you should write everything in C 

People used to say assembler; that's progress.

Paul Gilbert

instead (mainly
 because R is slow and too good at graphics, encouraging data
 snooping). See  http://fluff.info/blog/arch/0041.htm
  While I don't agree (granted, I can't really write C), it was
 interesting to read something from a very different perspective than
 I'm used to.
 
 Best regards,
 
 Gustaf
 
 _
 Department of Epidemiology,
 Swedish Institute for Infectious Disease Control
 work email: gustaf.rydevik at smi dot ki dot se
 skype:gustaf_rydevik
 




Re: [R] R Compilation error on Ubuntu

2007-12-24 Thread Paul Gilbert
Taka

You might want to try R-SIG-Debian for this kind of question. Here are 
instructions from Dirk in response to my similar question a few days ago.

HTH,
Paul
_

On 12 December 2007 at 18:17, Paul Gilbert wrote:
| I'm trying to build R from source on Ubuntu Gutsy Gibbon. I've done
| apt-get install r-base-dev and  apt-get libX11-dev,  but R configure is

You're close: 'r-base-dev' gives you (about) everything to run
'install.packages()' once you have R; libX11-dev ought to give you everything
but somehow doesn't, so ...

| still complaining about X11 headers/libs are not available.  What else
| do I need?

...  use the 'apt-get build-dep $package' command, ie

$ sudo apt-get build-dep r-base

which will pull in this list (as per the current Debian unstable package):

Build-Depends: gcc (>= 4:4.1.0), g++ (>= 4:4.1.0), gfortran (>= 
4:4.1.0), refblas3-dev | atlas3-base-dev, tcl8.4-dev, tk8.4-dev, bison, 
groff-base, libncurses5-dev, libreadline5-dev, debhelper (>= 5.0.0), 
texi2html, texinfo (>= 4.1-2), libbz2-dev, libpcre3-dev, tetex-bin, 
tetex-extra, xpdf-reader, zlib1g-dev, libpng12-dev, libjpeg62-dev, 
libx11-dev, libxt-dev, x-dev

So in this particular case, libxt-dev and x-dev were still missing.

'apt-get build-dep $foo' is really convenient. Try 'apt-get source $foo' as
well to get sources, if you have deb-src entries in your apt sources like

deb-src http://ftp.us.debian.org/debian/ unstable main contrib non-free

I use that all the time to fetch Debian sources onto Ubuntu machines at
work.  It may at times conflict with what packages Ubuntu has, or what names
they used -- but for R and one or two dozen other Debian packages it has
always been useful for me on Ubuntu too.

Cheers, Dirk

Takatsugu Kobayashi wrote:
 Hi
 
 I bought a new laptop HP dv9500 just a week ago and installed Ubuntu
 Gutsy Gibbon on this laptop. I wanted to install Fedora but there are
 more threads on Ubuntu, so I decided to install Ubuntu. After hours of
 struggle in configuring x server/graphic card stuff, I installed R for
 Gutsy Gibbon using 'sudo apt-get install g77 r-core'. 
 
 Now when I tried to install 'sp' package I got this error message: 
 
 Warning in install.packages("sp") : argument 'lib' is missing: using
 '/usr/local/lib/R/site-library'
 --- Please select a CRAN mirror for use in this session ---
 Loading Tcl/Tk interface ... done
 trying URL 'http://cran.mtu.edu/src/contrib/sp_0.9-17.tar.gz'
 Content type 'application/x-gzip' length 359194 bytes
 opened URL
 ==
 downloaded 350Kb
 
 * Installing *source* package 'sp' ...
 ** libs
 gcc-4.2 -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include
 -fpic  -g -O2 -c gcdist.c -o gcdist.o
 /bin/bash: gcc-4.2: command not found
 make: *** [gcdist.o] Error 127
 ERROR: compilation failed for package 'sp'
 ** Removing '/usr/local/lib/R/site-library/sp'
 
 The downloaded packages are in
 /tmp/RtmpfOk3H7/downloaded_packages
 Warning message:
 installation of package 'sp' had non-zero exit status in:
 install.packages("sp") 
 
 Then, I removed this Gutsy Gibbon version of R and attempted to install
 a fresh R-2.6.1.tar.gz. When I tried to build it, I got this error
 message:
 
 configure: error: --with-readline=yes (default) and headers/libs are not
 available
 
 I installed xserver-xorg-dev ok. 
 
 I am using Fedora 6 on my desktop and have never seen these above error
 messages. Should I install a Fedora and R on it? Or should I just create
 a link using 'ln -s'? 
 
 I appreciate your help.
 
 Have a festive holiday!
 
 taka
 




Re: [R] 00LOCK error with site-library

2007-12-10 Thread Paul Gilbert


Mark W Kimpel wrote:
 I have identical R.profiles and R_HOME directories set up on both my 
 local machine and a remote linux cluster. To keep my libraries and R 
 install separate, I use a site-library on both machines.
 
 The first line of my .Rprofile is:
 '.libPaths(new="~/R_HOME/site-library") # tell R where site-library is'
 
 Until R-2.6.0 this was working fine on both machines, but since then I 
 have been unable to upgrade packages on the remote machine. Every 
 attempt at an upgrade results in a directory '00LOCK' being placed in 
 the site-library and then the upgrade fails.

This directory gets placed there and then removed when the upgrade is 
finished, to prevent two processes from trying to simultaneously update 
the directory. If you terminated an upgrade in an unexpected way, say by 
dropping your connection, then the lock directory will be left behind and 
you need to remove it manually before you try to upgrade again.

If you have removed it, and it keeps re-appearing, it is a sign of 
something else going wrong (usually outside of R).
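
For example (the path is the site-library from the post; the package 
name is a placeholder):

  # remove a stale lock left by an interrupted install, then retry
  unlink("~/R_HOME/site-library/00LOCK", recursive=TRUE)
  install.packages("somePackage")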

Paul
 
 This has happened with R-2.6.0, 2.6.1, and now with R-devel (2.7.0).
 
 If I have this setup incorrectly I am puzzled as to why it works with my 
 local machine and why it worked with versions of R prior to 2.6.0.
 
 Ideas?
 Thanks,
 Mark
 




Re: [R] kalman filter estimation

2007-11-15 Thread Paul Gilbert


[EMAIL PROTECTED] wrote:
 Hi,
 
 Following convention below:
 y(t) = Ax(t)+Bu(t)+eps(t) # observation eq
 x(t) = Cx(t-1)+Du(t)+eta(t) # state eq
 
 I modified the following routine (which I copied from: 
 http://www.stat.pitt.edu/stoffer/tsa2/Rcode/Kall.R) to accommodate u(t), an 
 exogenous input to the system. 
 
for (i in 2:N){
  xp[[i]]=C%*%xf[[i-1]]
  Pp[[i]]=C%*%Pf[[i-1]]%*%t(C)+Q
  siginv=A[[i]]%*%Pp[[i]]%*%t(A[[i]])+R
  sig[[i]]=(t(siginv)+siginv)/2 # make sure sig is symmetric
  siginv=solve(sig[[i]])  # now siginv is sig[[i]]^{-1}
  K=Pp[[i]]%*%t(A[[i]])%*%siginv
  innov[[i]]=as.matrix(yobs[i,])-A[[i]]%*%xp[[i]]
  xf[[i]]=xp[[i]]+K%*%innov[[i]]
  Pf[[i]]=Pp[[i]]-K%*%A[[i]]%*%Pp[[i]]
  like= like + log(det(sig[[i]])) + t(innov[[i]])%*%siginv%*%innov[[i]]
}
like=0.5*like
list(xp=xp,Pp=Pp,xf=xf,Pf=Pf,like=like,innov=innov,sig=sig,Kn=K)
}
 
I tried to fit my problem and observed that I got a positive log 
likelihood, mainly because the log of the determinant of my variance 
matrix is largely negative. That's not good, because the determinants 
should be positive. Has anyone experienced this kind of instability?
 
The determinant should not be negative (the line "# make sure sig is 
symmetric" should look after that) but the log can be negative.

Also, I realize that I have about 800 sample points. The above routine, 
when plugged into optim, becomes very slow. Could anyone share a faster 
way to compute the Kalman filter?

The KF recursion you show is not time varying, but the code you show 
looks like it may allow for time varying models. (I'm just guessing, I'm 
not familiar with the code.) If you do not need the time varying aspect 
then this is probably a slow way to implement. Package dse1 in the dse 
bundle implements the recursion the way you show, with an exogenous 
input u(t), so you don't even have to modify the code. It is implemented 
in R for demonstration, but also in fortran for speed. See the dse Users 
Guide for more details, and also ?SS and ?l.SS. One caveat is that the 
way the likelihood is cumulated over i in your code above allows for 
time varying sig, which in theory can be important for a diffuse prior. 
In dse the likelihood is calculated using the residual to get a sample 
estimate of sig, which is faster. I have not found cases where this 
makes much difference (but would be interested to know of one).

And my last problem is that optim over my defined feasible space does not 
converge. I have about 20 variables that I need to identify using the MLE 
method. Is there any other way that I can try? I tried most of the methods 
available in optim already. They do not converge at all. Thank you.

This used to be a well known problem for multivariate state space 
models, but seems to be largely forgotten. It does not seriously affect 
univariate models. For a review of the old literature see section 5 of 
Gilbert 1993, available at 
http://www.bank-banque-canada.ca/pgilbert/research.php#MiscResearch. 
The bottom line is that you are in trouble trying to use hill climbing 
methods and state-space models unless you have a very good starting 
point, or are lucky. This is a feature of the parameter space, not a 
reflection on the optimization routine. One solution is to start by 
estimating a VAR or ARMA model and then convert it to a state space 
model, which was one of the original purposes of dse. (See ?bft for an 
example.)
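
A minimal sketch of that workflow (the dataset and functions are as I 
recall them from the dse documentation, so treat the details as 
assumptions):

  library(dse1)                    # part of the dse bundle
  data("eg1.DSE.data.diff", package="dse1")
  var.model <- estVARXls(eg1.DSE.data.diff)   # estimate a VAR first
  ss.model <- toSS(var.model)                 # convert to state-space form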

Paul Gilbert

 - adschai
 



Re: [R] kalman filter estimation

2007-11-15 Thread Paul Gilbert


Giovanni Petris wrote:
 Kalman filter for general state space models, especially in its naive
 version, is known for its numerical instability.  This is the reason
 why people developed square root filters, based on Cholesky
 decomposition of variance matrices. In package dlm the implementation
 of Kalman filter is based on the singular value decomposition (SVD) of
 the relevant variance matrices, providing for a more robust algorithm
 than the standard square root filter.
 
 
 The lack of convergence of optimization algorithms in finding MLEs of
 unknown parameters is not surprising to me, especially if you have
 many parameters. When using state space models it is easy to end up
 with a fairly flat, or multimodal likelihood function. These are two
 signs of poor identifiability of the model. 

Lack of identification makes things worse, but these problems can occur 
even when the model is identified. The problem is actually worse than 
what you indicate. There can be large regions of parameter space where 
the likelihood increases as parameters diverge to infinity. This was the 
basis of the literature on chart switching procedures. Also, it is not 
so much the number of parameters that causes the problem as the 
dimension of the output (though they tend to be related). A univariate 
model with a large state is usually not too bad if it is identified and 
there is enough data. However, multiple series can cause problems even 
with relatively few parameters.

You can live with it, but
 be aware that it is there. My suggestion is to start the optimization
 from several different initial values and compare maximized values of
 the likelihood. Simulated annealing may be used to better explore the
 parameter space. 
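
A minimal sketch of that multistart suggestion (negLogLik is a 
placeholder for the model's negative log-likelihood):

  starts <- lapply(1:10, function(i) rnorm(20))  # 20 parameters, 10 starts
  fits <- lapply(starts, function(p0) optim(p0, negLogLik))
  best <- fits[[which.min(sapply(fits, function(f) f$value))]]
  # optim(..., method="SANN") gives the simulated annealing variant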

Yes.  Are you aware of any work on this in R?

Paul
 
 HTH,
 Giovanni
 
 
 
 
Date: Thu, 15 Nov 2007 04:41:26 + (GMT)
From: [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]
Priority: normal
Precedence: list

Hi,

Following convention below:
y(t) = Ax(t)+Bu(t)+eps(t) # observation eq
x(t) = Cx(t-1)+Du(t)+eta(t) # state eq

I modified the following routine (which I copied from: 
http://www.stat.pitt.edu/stoffer/tsa2/Rcode/Kall.R) to accommodate u(t), an 
exogenous input to the system. 

for (i in 2:N){
  xp[[i]]=C%*%xf[[i-1]]
  Pp[[i]]=C%*%Pf[[i-1]]%*%t(C)+Q
  siginv=A[[i]]%*%Pp[[i]]%*%t(A[[i]])+R
  sig[[i]]=(t(siginv)+siginv)/2 # make sure sig is symmetric
  siginv=solve(sig[[i]])  # now siginv is sig[[i]]^{-1}
  K=Pp[[i]]%*%t(A[[i]])%*%siginv
  innov[[i]]=as.matrix(yobs[i,])-A[[i]]%*%xp[[i]]
  xf[[i]]=xp[[i]]+K%*%innov[[i]]
  Pf[[i]]=Pp[[i]]-K%*%A[[i]]%*%Pp[[i]]
  like= like + log(det(sig[[i]])) + t(innov[[i]])%*%siginv%*%innov[[i]]
}
like=0.5*like
list(xp=xp,Pp=Pp,xf=xf,Pf=Pf,like=like,innov=innov,sig=sig,Kn=K)
}

I tried to fit my problem and observed that I got a positive log 
likelihood, mainly because the log of the determinant of my variance 
matrix is largely negative. That's not good, because the determinants 
should be positive. Has anyone experienced this kind of instability?

Also, I realize that I have about 800 sample points. The above routine, 
when plugged into optim, becomes very slow. Could anyone share a faster 
way to compute the Kalman filter?

And my last problem is that optim over my defined feasible space does not 
converge. I have about 20 variables that I need to identify using the MLE 
method. Is there any other way that I can try? I tried most of the methods 
available in optim already. They do not converge at all. Thank you.

- adschai



 
 



Re: [R] Multivariate time series

2007-11-12 Thread Paul Gilbert
Package dse does do linear, multivariate time series with multivariate 
exogenous inputs, using VAR, multivariate ARMA, and state-space models, 
just like you are describing. You can specify the model or have it 
automatically determined. Perhaps you could give a few more details 
about what you are looking for. (I assume you have looked at the dse 
Users Guide distributed with the bundle, or some of the help, for 
example ?estVARXls, ?bft, and ?forecast. Sections 6 and 7 of the guide 
are about the prediction problem, very much like you describe.)
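
A minimal sketch of that setup (structure as in the dse Users Guide; the 
conditioning.inputs argument and the data objects are assumptions for 
illustration):

  library(dse1)                      # part of the dse bundle
  dat <- TSdata(input=prices, output=sales)  # prices, sales: your series
  model <- estVARXls(dat)
  fc <- forecast(model, conditioning.inputs=future.prices)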

Paul

Giusy wrote:
Hello to everyone!
I have a question for you... I need to predict multivariate time series,
for example sales of 2 products related to one another, with the 2
prices as inputs.
Is there a function in R to do this? I saw the dse package but I didn't
find what I'm looking for.
Could anyone help me?
Thank you very much
Giusy




Re: [R] Error: X11 cannot allocate additional graphics colours.

2007-10-15 Thread Paul Gilbert
(This may have been fixed; it has not happened to me in a long time, but 
I will mention it mainly because it is not something you are likely to 
think of.)

It used to be that the X colours might be defined by the first 
application that needed them, so if the systems administrator happened 
to start up X and Mozilla on the server, especially if the server had 
limited graphics, then that could potentially cause this problem. I 
never really understood why it would be like this, so probably it is 
fixed. (This was not an R bug.)
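
(On the R side, the workaround suggested by the error message itself can 
be tried when opening the device; the colortype argument existed in the 
X11 device of that era:

  X11(colortype="pseudo.cube")   # or colortype="gray"

but see Marc's notes below for the underlying display-depth issue.)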

Paul

Marc Schwartz wrote:
 This is not a X server version issue.
 
 There are Linux system (not R) display settings that will dictate the
 number of simultaneous colors that can be displayed. This will be
 dependent upon the display resolution defined and the amount of video
 RAM on the graphics card. The higher the display resolution the more
 video memory it takes since you have more pixels. The more colors you
 want to display simultaneously, the more memory per pixel it takes.
 
 Those concepts, BTW, are the same on Windows and OS X.
 
 So if you are running directly on the RHEL system, check the display
 setting to be sure that it is set for a sufficient number of colors.
 This would be an administrative task requiring root privilege. Ideally,
 you want so-called TrueColor or Millions of Colors to be set. This
 requires a pixel depth of 24 bpp or 32 bpp, depending upon VRAM
 available.
 
 On the other hand, if you are connected remotely to the RHEL server,
 using ssh and Xvfb on a server that is not running X, then you will need
 to adjust (or have the SysAdmin adjust) the -pixdepths setting on the
 RHEL server. This controls the 'bpp' available on the server. 
 
 See 'man Xvfb' for more information for the latter scenario.
 
 HTH,
 
 Marc Schwartz
 
 On Mon, 2007-10-15 at 20:53 +0100, michael watson (IAH-C) wrote:
 Thanks for the response...

 My confusion about plot stems from the fact I am plotting 82 points
 with 82 colours, so surely all colours get plotted?

 As for updating X, I recently installed the latest version of XFree86
 for my version of linux, RHEL 4.

 As for Brian's e-mail you quoted, I do try and look at things like
 that, but I don't know what arguments he refers to:

 Run the X11 device with the arguments stated

 Which arguments, and how do I run my X11 device with them?

 I've pretty much had this problem with every version of R on linux,
 all the way from SuSe 8.2 to SuSe 9.2, through to RHEL3 and now RHEL4
 - I always have this problem.

 Perhaps you could tell me which version of X you use to generate 1000
 colours without getting the error message?  I'm sorry, but what I
 would really love is for someone to say "RHEL4?  Aha, you need to install
 X version foo, available from http://www.foo.com/bar.rpm...")

 -Original Message-
 From: Charles C. Berry [mailto:[EMAIL PROTECTED]
 Sent: Mon 15/10/2007 7:54 PM
 To: michael watson (IAH-C)
 Cc: r-help@r-project.org
 Subject: Re: [R] Error: X11 cannot allocate additional graphics colours.
  

 You knew this?

  http://tolstoy.newcastle.edu.au/R/e2/help/06/09/0640.html

 I cannot replicate your error. I use n <- 1000 on R-2.6.0, and it still 
 works.


 Only a guess, but maybe your X setup is out of date. Maybe an update would 
 help?

 As for why axis triggers this, axis uses all the colors, but image only 
 uses (something like) those that fall into the bins shown here:

  hist( mat, breaks=n )



 As you see there are usually some empty bins and those colors do not get 
 rendered till axis() does its thing.

 Chuck

 On Mon, 15 Oct 2007, michael watson (IAH-C) wrote:

 Dear All

 Another one I have touched on before with a much older OS and version.

 My sessionInfo() is:
 sessionInfo()
 R version 2.5.1 (2007-06-27)
 i686-redhat-linux-gnu

 locale:
 LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.U
 TF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-
 8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_ID
 ENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets
 methods
 [7] base

 other attached packages:
 DetectiV
   1.1

 I'm getting the following error message when plotting a quite complex
 graph:

 Error in axis(side = 2, at = c(min[i] - sptl, max[i] + sptl), labels =
 NA,  :
Error: X11 cannot allocate additional graphics colours.
 Consider using X11 with colortype=pseudo.cube or gray.

 The problem is I only get it under certain circumstances.  I have some
 quite convoluted test code:

 n <- 82
 mat <- matrix(rnorm(n*10), ncol=n)
 fcolors <- terrain.colors(n)
 image(z=mat, axes=FALSE)
 oneis <- 1 / ncol(mat)
 sptl <- oneis / 3
 max <- 1:n * oneis
 min <- c(0, max[1:length(max)-1])
 for (i in 1:n) {
     axis(side=2, at=c(min[i]-sptl, max[i]+sptl), labels=NA, line=0.9,
          lwd=3, lty=1, tick=TRUE, tck=0, col=fcolors[i], lend=2)
 }

 Now, this code works without error on values of n up to and