Re: [math] redesigning the optimization/estimation packages (phase 2)

Luc Maisonobe Sat, 28 Feb 2009 05:40:44 -0800

I'm afraid what I propose is far more basic than what you need. I hardly
understand the concepts you use in your message (I'm not a statistician
at all, I work in space flight dynamics). I assume we use the same words
with slightly different meanings in mind (models, residuals,
observations, variables ...).


I'm clearly unable to fulfill your requirements.

Does this imply we continue with only revamping the basic setup that was
imported from mantissa for 1.2 in order to have a simplistic but
consistent framework ? Does this imply we get rid of this limited view
and take the opportunity of version 2.0 to build a complete
statistically sound estimation package ?

I can do the former (taking into account the residuals computation
issues, and probably slightly changing the meaning of the weights to
compute wi*(ri^2) rather than (wi*ri)^2 as done now). I cannot do the
latter.

Luc

Ted Dunning wrote :
> Here is what I would do in R to do a linear model with observations
> consisting of input variables x1 ... x3 and target variable y:
> 
>      m <- lm(y ~ x1 + x2 + x3, data)
> 
> The first argument lets me supply the form of the regression.  I could have
> regressed on log(x3), for instance, or used x1*x2 + x3 to include x1, x2, x3
> and the x1:x2 interaction or (x1+x2+x3)^2.
> 
> To do logistic regression, I would switch to glm (for generalized linear
> model):
> 
>      m <- glm(y ~ x1 + x2 + x3, data, family=binomial())
> 
> To do Bayesian logistic regression, I would use bayesglm from the arm
> library:
> 
>     # this happens once to install the package
>     install.packages("arm");
>     # this happens once per session
>     library(arm)
>     # and this gives me my model
>     m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial())
> 
> To give weights to observations using the k column of the data:
> 
>     m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial(), weights=data$k)
> 
> If I want to compute the posterior distribution of outputs, I can use
> sims(m)$coefs to get a bunch of samples from the coefficient posterior and
> multiply them by the observation matrix in one or two (fairly simple) lines
> of code.  If I just want outputs for MAP model estimates, then I do
> 
>     ytwiddle = predict(m, newdata)
> 
> This works for any of the models above.
> 
> If I could do anything like this using commons math in less than 10x as many
> lines of code, I would be thrilled.
> 
> Some notes about the user experience here:
> 
> a) I don't have to care about residuals, but if I do care, they are in the
> resulting model structure and I can look at them.  If I do plot(m) I get a
> bunch of interesting diagnostic plots.  If I do coefplot(m), I get my
> coefficients with error bars.
> 
> b) I don't care *how* my weights affect the algorithm.  It just works.  I
> can dig into the guts in various ways, but I don't have to.
> 
> c) I can effectively subset my data column-wise without creating new
> matrices.  I can also change the structure of my model all over the place
> very easily without changing the data.  This is (was) a controversial aspect
> of R's modeling structure and it makes it really hard to find logistic
> regression in the index of a book because the libraries are oriented around
> the generalized linear modeling view of the world (which isn't so bad after
> you get going with it).
> 
> d) lots of diagnostic info gets embedded into the resulting model.
> 
> 
> 
> 
> On Thu, Feb 26, 2009 at 12:14 PM, Luc Maisonobe <[email protected]>wrote:
> 
>> This way, the residual would not be seen and only the
>> pure observation vectors should be provided by user.
>>
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [math] redesigning the optimization/estimation packages (phase 2)

Reply via email to