Hi everyone, apologies if the answer to this is in an obvious place.  I've
been searching for about a day and haven't found anything..


I'm trying to replicate Stata's confidence intervals in R with the survey
package, and the numbers are very very close but not exact.  My ultimate
goal is to replicate Berkeley's SDA website with R (http://sda.berkeley.edu/),
which seems to match Stata precisely.  Everything lines up, except for the
confidence intervals, and I'm wondering if there's a relatively
straightforward reason why the numbers are different.


To review the two major cross-package comparison documents on Dr. Lumley's
homepage of the survey package --

http://faculty.washington.edu/tlumley/survey/ucla-examples.pdf -- doesn't
contain confidence interval output.

http://faculty.washington.edu/tlumley/survey/YRBS-report-extension.pdf --
confirms that the confidence interval differences between SUDAAN and R in
Table 3 exist, but doesn't provide much detail about why.

I tried running the analysis example below in SUDAAN as well, which
calculated confidence intervals not matching either R or Stata -- I'm
confused why all three would be different.

I understand (as the report above quotes) these differences are
statistically inconsequential, but I aim to convince people to switch to R,
so hitting numbers dead-on might provide some reassurance to skeptics..



I've pasted some ultra-simple Stata code below and (more importantly) the
output it generates.  Below that, I've pasted some commented R code that
shows how everything matches exactly except for the confidence intervals,
and then displays a number of my failed attempts at hitting the numbers
right on the nose.


Thanks!!

Anthony Damico
Kaiser Family Foundation




* Stata/MP 11.2

* example stata code to replicate in R

use http://www.ats.ucla.edu/stat/stata/library/apiclus1, clear
svyset dnum [pw=pw], fpc(fpc)
gen ell0 = 1
replace ell0 = 0 if ell != 0
svy: mean ell0


* results of the final command above:

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     183
Number of PSUs   =      15          Population size  =  9235.4
                                    Design df        =      14

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        ell0 |   .0218579   .0226225     -.0266624    .0703783
--------------------------------------------------------------






# R x64

# simple example using example code pulled from ?svymean
options(digits=20)
library(survey)
library(foreign)

# read the data file from a website, to make sure you're using the same
data in both R and stata
x <- read.dta( "http://www.ats.ucla.edu/stat/stata/library/apiclus1.dta"; )
dclus1<-svydesign(id=~dnum, fpc=~fpc, data=x)

# mean matches exactly
coef(svymean(~I(ell==0), dclus1))

# SE matches exactly
SE(svymean(~I(ell==0), dclus1))

# design df matches exactly
degf(dclus1)

# unwtd count matches exactly
unwtd.count( ~ell, dclus1)

# wtd count matches exactly
svytotal( ~!is.na(ell), dclus1)

# number of clusters match exactly
dclus1


# none of the confidence interval options match exactly

# the standard confint() will certainly be wrong,
# since stata gives an asymmetric confidence interval

confint(svymean(~I(ell==0), dclus1))
svyciprop(~I(ell==0), dclus1, method="me")    # 2nd way to perform confint()


# but none of the other methods match either..
svyciprop(~I(ell==0), dclus1, method="li")
svyciprop(~I(ell==0), dclus1, method="lo")
svyciprop(~I(ell==0), dclus1, method="as")
svyciprop(~I(ell==0), dclus1, method="be")



In case it's of any value, here's my R session info:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252
LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] foreign_0.8-50 survey_3.28-2

loaded via a namespace (and not attached):
[1] MASS_7.3-18  tools_2.15.1

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to