RE: [Impute] conundrum combining pearson r and r-squared

Maarten Buis Fri, 18 Jan 2008 00:58:35 -0800

--- A Rangel wrote:
> I am hoping to get some expert opinions on a
> relatively simple problem.  Suppose that you want to
> combine Pearson r and r-squared values from 20 imputed
> data sets.  It seems that standard advice (in small to
> moderate samples) is to first transform r using
> Fisher's (1915) r-to-z transformation.  Similarly,
> ln(r-squared) seems to be an appropriate
> transformation.
>
> After combining and back-transforming, it is quite
> possible -- perhaps likely -- that you get
> inconsistent estimates.  What I mean by that is that
> squaring the combined Pearson r value can be quite
> different from the combined R-square value that you
> get from back-transforming ln(R-sq).


I would be surprised if the difference were big. The
two statistics are closely related and are computed
from the same imputed datasets. Some difference is to
be expected, because you are averaging after applying
two different non-linear transformations, but if both
transformations are reasonable then both should give
you similar results.
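
To make the source of the difference explicit, here is a sketch of the two
pooling rules, assuming m imputed datasets, a simple regression of y on x
(so that R-squared equals r squared within each dataset), and r_i denoting
the correlation in dataset i. The notation is mine, not the original
poster's.

```latex
\bar{r}_{z} = \tanh\!\left( \frac{1}{m} \sum_{i=1}^{m} \operatorname{artanh}(r_i) \right),
\qquad
\bar{R}^{2}_{\ln} = \exp\!\left( \frac{1}{m} \sum_{i=1}^{m} \ln r_i^{2} \right)
                  = \left( \prod_{i=1}^{m} |r_i| \right)^{2/m}
```

So the square root of the pooled R-squared is the geometric mean of the
|r_i|, while the pooled r is a different kind of average of the same
numbers. The two coincide when all r_i are equal and drift apart as the
r_i vary more across imputations.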

One way to get a feel for this is to run a number of
simulations. Below is some Stata code for one such
simulation. In this simulation there is a systematic
difference between the two methods, but only in the
third decimal place. I would consider that small
enough to ignore.


This simulation uses ice, Patrick Royston's Stata
port of MICE, which can be installed by typing:
ssc install ice . Note that Fisher's z
transformation is the inverse hyperbolic tangent
(Stata function atanh) and that its
back-transformation is the hyperbolic tangent
(Stata function tanh). Some of the other tricks
used are explained in:
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html
and http://home.fsw.vu.nl/m.buis/wp/discrete.html
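
For a quick feel for the two back-transformations without running any
imputation, here is a minimal sketch. The three correlations are made-up
values standing in for results from three imputed datasets; they do not
come from the simulation.

```stata
* hypothetical correlations from m = 3 imputed datasets (illustrative only)
local rs .45 .50 .55

scalar zsum = 0
scalar lsum = 0

foreach r of local rs {
        * accumulate on Fisher's z scale and on the log(r-squared) scale
        scalar zsum = zsum + atanh(`r')
        scalar lsum = lsum + ln(`r'^2)
}

* back-transform both means to the r scale
display "pooled r via Fisher's z:  " tanh(zsum/3)
display "sqrt of pooled r-squared: " sqrt(exp(lsum/3))
```

With these inputs the two numbers should come out at about 0.501 and
0.498, so they differ only in the third decimal place.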

Hope this helps,
Maarten

*------------ begin simulation ---------------------
set more off
capture program drop sim
program define sim, rclass
        drop _all
        matrix C = ( 1, .5 \ ///
              .5,  1 )
        drawnorm x y, n(1000) corr(C)

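        * make x missing with a probability that increases with y
        replace x = . if uniform() < invlogit(-1 + y)

        * working directory used in the original run; change to a writable folder
        cd h:\temp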
        ice x y using imp, m(5) replace

        use imp.dta, clear

        scalar r = 0
        scalar rsq = 0

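        forvalues i = 1/5 {
                * correlation in the i-th imputed dataset, summed on Fisher's z scale
                corr y x if _mj == `i'
                scalar r = r + atanh(`r(rho)')
                * R-squared in the same imputed dataset, summed on the log scale
                reg y x if _mj == `i'
                scalar rsq = rsq + ln(`e(r2)')
        }
        * back-transform both means and compare them on the r scale
        return scalar diff = sqrt(exp(rsq/5)) - tanh(r/5)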
end

simulate diff=r(diff), reps(1000): sim
hist diff
*---------------- end simulation ----------------------


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology 
Vrije Universiteit Amsterdam 
Boelelaan 1081 
1081 HV Amsterdam 
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434 

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------



_______________________________________________
Impute mailing list
Impute@lists.utsouthwestern.edu
http://lists.utsouthwestern.edu/mailman/listinfo/impute
