Vadim and Oxana Marmer  <[EMAIL PROTECTED]> wrote:

>> You will easily be able to see that that residuals from this
>> regression are not independent.  So this isn't a counterexample to my
>> claim that "There is certainly nothing wrong with using standard
>> regression when an explanatory variable is randomly generated, from
>> whatever sort of stochastic process you please, as long as the
>> regression residuals are independent".
>
>You do not need independent residuals for regression

Yes, I know that.  You do need to change how you do significance tests
if the residuals aren't independent.

In any case, the original poster explicitly claimed that regression
with an explanatory variable that was generated by a non-stationary
process was invalid even if the residuals of the regression are
independent.  I claim that this is not true.
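
Here is a quick R sketch of that claim (an illustrative simulation, not
something from the original exchange): the regressor is a non-stationary
random walk, but the residuals of y on x are independent, and the ordinary
t-test for the slope rejects at about its nominal 5% rate.

    # x is a random walk (non-stationary); y is independent of x, so the
    # true slope is zero and the regression residuals are independent.
    set.seed(1)
    reject <- logical(500)
    for (r in 1:500) {
      x <- cumsum(rnorm(200))
      y <- rnorm(200)
      p <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
      reject[r] <- (p < 0.05)
    }
    mean(reject)   # should be close to the nominal 0.05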


>> If you account for this dependence in your test, I don't think you
>> will reject the null hypothesis that b=0.
>
>Yes you will, if you use standard regression diagnostic.

This discussion isn't going to go anywhere if you insist on making
one-line replies like this, in which you don't define what you mean by
"standard regression diagnostics". 


>> >Now the intuition. Consider two time series: 1) US GDP,
>> >2) cumulative amount of rain in Brazil. You can think that these series
>> >are independent, but try to run 2 on 1 and you will have very
>> >significant coefficients.
>>
>> The two time series may be independent, but if you fit a regression
>> model, it will be obvious that the residuals are autocorrelated, and
>> you need to adjust for this in doing your significance test.
>
>simple adjustment for autocorrelation won't help

Here's a little experiment, in R:

    > library(ts)
    > x<-rep(0,1000)
    > y<-rep(0,1000)
    > for (i in 2:1000) { x[i]<-x[i-1]+rnorm(1); y[i]<-y[i-1]+rnorm(1); }
    > m<-lm(y~x)
    > summary(m)
    
    Call:
    lm(formula = y ~ x)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -23.317  -7.335  -2.091   7.848  25.966 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) 15.51024    0.62466   24.83   <2e-16 ***
    x            0.40863    0.01898   21.52   <2e-16 ***
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
    
    Residual standard error: 10.49 on 998 degrees of freedom
    Multiple R-Squared: 0.3171, Adjusted R-squared: 0.3164 
    F-statistic: 463.3 on 1 and 998 DF,  p-value:     0 
    
    > a<-acf(residuals(m),lag.max=999)
    > 1+2*sum(a$acf[1:200])
    [1] 221.2409
    
Sure enough, the p-values found assuming independent residuals are
very small, even though there is no real relationship.  However, the
plot (not shown) of the estimated autocorrelations of the residuals
shows that the estimated autocorrelations don't get close to zero
until lag 200.  (Of course, the real autocorrelations are undefined,
since the process isn't stationary, but we're seeing here what would
happen if you looked for autocorrelation to see if you should trust the
p-values.)  If this were actually a stationary process, the effective
sample size would be reduced by roughly one plus twice the sum of
autocorrelations, which is 221 in this case.  That means that after
adjusting for autocorrelation you will conclude that you effectively
have about five data points' worth of information.  I don't think you
will reject the null hypothesis.
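
To make that concrete, here is a rough back-of-the-envelope version of the
adjustment (a sketch reusing the numbers printed above, not part of the
original session):

    # Effective sample size implied by the factor of about 221 computed above,
    # and the t statistic for x deflated by the square root of that factor.
    1000 / 221.24          # roughly 4.5 effective observations
    21.52 / sqrt(221.24)   # adjusted t statistic of about 1.45 -- not significant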


>When I say that you cannot treat regressors as fixed I mean following.
>Suppose Y=consumption, X=GDP then E(inv(X'X)X'Y) is not equal to
>inv(X'X)X'E(Y) since both X and Y are random variables, and you need a
>little bit different treatment of
>regression. So, "mechanics" of OLS changes a little bit, and of course,
>interpretation of regression is different.

Why are you interested in E(inv(X'X)X'Y)?  I think you may be trying
to find standard errors by finding the unconditional variance of the
estimators.  You shouldn't do this, however.  You should be finding
the variance conditional on the observed X, since X in itself is not
informative regarding the regression coefficients.
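
A small R sketch of what "conditional on the observed X" buys you (again
illustrative, not from the original discussion): hold one realized
random-walk x fixed, redraw only the independent residuals, and the Monte
Carlo spread of the slope estimate matches the usual OLS standard error
computed for that same fixed x.

    set.seed(1)
    x <- cumsum(rnorm(1000))   # one realized (non-stationary) regressor, held fixed
    b <- numeric(500)
    for (r in 1:500) {
      y <- 1 + 0.5*x + rnorm(1000)   # true slope 0.5, independent residuals
      b[r] <- coef(lm(y ~ x))[2]
    }
    sd(b)                            # Monte Carlo spread of the slope, given this x
    1 / sqrt(sum((x - mean(x))^2))   # usual conditional standard error (sigma = 1)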

In the example, if the residuals of the regression of consumption on
GDP are independent and of constant variance, plain ordinary
regression can indeed be used, with no change in methodology.  (This
assumption may be false, of course, as is the case whenever you use
regression.  It would be more likely to be true if you looked at log
consumption and log GDP.  Most likely, even then the residuals won't
be independent, but they may perhaps form a stationary process, in
which case adjustment for autocorrelation will work fine.)
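
For what it's worth, here is one way such an adjustment might look in R
(a sketch with simulated stand-ins for log consumption and log GDP, not
anyone's actual analysis): fit the regression with AR(1) errors using
arima() and its xreg argument.

    set.seed(1)
    x <- cumsum(rnorm(300))                              # stand-in for log GDP (random walk)
    u <- as.numeric(arima.sim(list(ar = 0.7), n = 300))  # stationary AR(1) residuals
    y <- 2 + 0.8*x + u                                   # stand-in for log consumption
    fit <- arima(y, order = c(1, 0, 0), xreg = x)
    fit$coef                              # AR(1) parameter, intercept, slope on x
    fit$coef / sqrt(diag(fit$var.coef))   # approximate z statistics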

   Radford Neal

----------------------------------------------------------------------------
Radford M. Neal                                       [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto                     http://www.cs.utoronto.ca/~radford
----------------------------------------------------------------------------

