Re: [R] Cox regression model for matched data with replacement

Therneau, Terry M., Ph.D. Wed, 13 Aug 2014 06:27:08 -0700

Ok, I will try to do a short tutorial answer.

1. The score statistic for a Cox model is a sum of (x - xbar), where "x" is the covariate vector of the subject who had an event, and xbar is the mean covariate vector for the population, at that event time.

  - the usual Cox model uses the mean of {everyone still at risk} as xbar

  - matched Cox models use a mean of {some subset of those at risk}, and work fine as long as that subset is an honest estimate of xbar. You do, of course, have to sample from those still at risk at the time point, since that is the xbar you are trying to estimate. Someone who dies or is censored at time 10 can't be a control at time 20.

  - in an ordinary Cox model the program figures out who belongs in each xbar average all on its own, using the time variable. In a matched model you need to supply the "who dances with who" information. The usual way is to assign each of the sets {subject who died + their controls} to a separate stratum. (If there is only one death in each stratum then the time variable will not be needed and you can plug in a dummy value; this is what clogit does.) You can have more than one control per case, by the way.
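As an illustration of the stratum-per-matched-set idea, here is a minimal sketch in R. The simulated cohort, the helper `sample_riskset`, and the column names (`set`, `case`, `dummy_time`) are all made up for this example; only `coxph`, `clogit`, `Surv` and `strata` come from the survival package. Controls are drawn from everyone still at risk at the case's event time, and each {case + controls} set becomes its own stratum with a constant dummy time.

```r
library(survival)

set.seed(1)
n <- 500
# Hypothetical cohort: one covariate x, exponential event and censoring times
cohort <- data.frame(id = seq_len(n), x = rnorm(n))
etime <- rexp(n, rate = 0.1 * exp(0.5 * cohort$x))
ctime <- rexp(n, rate = 0.05)
cohort$time   <- pmin(etime, ctime)
cohort$status <- as.integer(etime <= ctime)

# For each event, draw up to m controls from the subjects still at risk at
# that event time. Subjects with a later event remain eligible as controls.
sample_riskset <- function(d, m = 2) {
  cases <- d[d$status == 1, ]
  sets <- lapply(seq_len(nrow(cases)), function(i) {
    pool <- d$id[d$time >= cases$time[i] & d$id != cases$id[i]]
    if (length(pool) == 0) return(NULL)   # nobody left to compare against
    ctrl <- pool[sample.int(length(pool), min(m, length(pool)))]
    ids  <- c(cases$id[i], ctrl)
    data.frame(set  = i,                  # one stratum per matched set
               id   = ids,
               case = c(1, rep(0, length(ctrl))),
               x    = d$x[match(ids, d$id)])
  })
  do.call(rbind, sets)
}

matched <- sample_riskset(cohort, m = 2)
matched$dummy_time <- 1                   # one death per stratum, so time is not needed

# Stratified Cox model on the matched sets, and the equivalent clogit call
fit_strata <- coxph(Surv(dummy_time, case) ~ x + strata(set), data = matched)
fit_clogit <- clogit(case ~ x + strata(set), data = matched)

coef(fit_strata)
coef(fit_clogit)
```

With exactly one case per stratum the two fits maximize the same likelihood, so the coefficients should agree up to numerical tolerance.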

2. Variance. In the matched model you run the risk, a quite small risk, that the same person would be picked again and again as the control. If this unfortunate thing were to happen then the usual model based variance would be too optimistic --- because of its overdependence on one single subject the fit is more unstable than it looks. Three solutions: a) don't worry about it (my usual approach), b) when selecting controls, ensure that this doesn't happen (classic matched case control), c) use a robust variance. For the latter make sure that each subject in the data set has a unique value for some variable "id" and add "+ cluster(id)" to the model statement.
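Option (c) might look roughly like the following sketch. The data are simulated and the variable names (`set`, `case`, `dummy_time`) are assumptions made for the example; the point is only that `id` identifies the subject, not the matched set, so a person reused as a control in several sets carries the same `id` each time.

```r
library(survival)

set.seed(2)
n <- 300
x <- rnorm(n)
etime  <- rexp(n, rate = 0.1 * exp(0.5 * x))
ctime  <- rexp(n, rate = 0.05)
time   <- pmin(etime, ctime)
status <- as.integer(etime <= ctime)

# Build matched sets: 2 controls per case, sampled from those still at risk,
# so the same subject may be selected as a control in more than one set.
matched <- do.call(rbind, lapply(which(status == 1), function(i) {
  pool <- setdiff(which(time >= time[i]), i)
  if (length(pool) == 0) return(NULL)
  ctrl <- pool[sample.int(length(pool), min(2, length(pool)))]
  data.frame(set  = i,
             id   = c(i, ctrl),           # subject identifier, unique per person
             case = c(1, rep(0, length(ctrl))),
             x    = x[c(i, ctrl)])
}))
matched$dummy_time <- 1

# Model based variance versus robust (sandwich) variance grouped by subject
fit_model  <- coxph(Surv(dummy_time, case) ~ x + strata(set), data = matched)
fit_robust <- coxph(Surv(dummy_time, case) ~ x + strata(set) + cluster(id),
                    data = matched)

max(table(matched$id))                    # how often the most reused subject appears
summary(fit_robust)$coefficients          # reports both se(coef) and robust se
```

The coefficient is the same in both fits; only the standard error changes. If no subject is heavily reused, the two standard errors would be expected to be close.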

3. The most common mistake in matching is to exclude, at a given death time t, any subject with a future event from the list of potential controls at time t. This does not lead to an unbiased estimate of xbar, and the resulting numerical bias in the coefficients is shockingly large.

There are more clever ways to pick the subset at each event time, e.g., if you had some prior information on all the subjects that can classify them into high/medium/low risk. Survey sampling principles come into play for selection and the xbar at each time is replaced with an appropriate weighted survey estimate. See various papers by Brian Langholz.
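The mistake described in point 3 can be seen in a small simulation. This is only a sketch under assumed settings (exponential event and censoring times, a single covariate, a true log hazard ratio of 0.5, two controls per case); the helper `fit_matched` and its argument `exclude_future_cases` are hypothetical names. It fits the same stratified model twice, once sampling controls from everyone at risk and once after wrongly dropping subjects who have an event later.

```r
library(survival)

set.seed(3)
n <- 1000
x <- rnorm(n)
etime  <- rexp(n, rate = 0.1 * exp(0.5 * x))   # true log hazard ratio is 0.5
ctime  <- rexp(n, rate = 0.05)
time   <- pmin(etime, ctime)
status <- as.integer(etime <= ctime)

# Sample m controls per case and fit the stratified Cox model.
# exclude_future_cases = TRUE reproduces the mistake: subjects who go on to
# have an event are removed from the pool of potential controls.
fit_matched <- function(exclude_future_cases, m = 2) {
  sets <- lapply(which(status == 1), function(i) {
    at_risk <- setdiff(which(time >= time[i]), i)
    pool <- if (exclude_future_cases) at_risk[status[at_risk] == 0] else at_risk
    if (length(pool) == 0) return(NULL)
    ctrl <- pool[sample.int(length(pool), min(m, length(pool)))]
    data.frame(set  = i,
               case = c(1, rep(0, length(ctrl))),
               x    = x[c(i, ctrl)])
  })
  d <- do.call(rbind, sets)
  d$dummy_time <- 1
  unname(coef(coxph(Surv(dummy_time, case) ~ x + strata(set), data = d)))
}

coef_right <- fit_matched(exclude_future_cases = FALSE)
coef_wrong <- fit_matched(exclude_future_cases = TRUE)
c(right = coef_right, wrong = coef_wrong)
```

In this setup the restricted pool under-represents high-risk subjects, so the second estimate would be expected to land well above the first. The size of the gap depends on the event fraction and other settings assumed here.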


Terry T


On 08/13/2014 07:26 AM, John Pura wrote:

Hi Dr. Therneau,

The original question on the forum was:

My problem was how to build a Cox model for the matched data (1:n) with
replacement. Usually, we can use stratified Cox regression model when the
data were matched without replacement. However, if the data were matched
with replacement, due to the re-use of subjects, we should give a weight
for each pair, then how to incorporate this weight into a Cox model. I also
checked the "clogit" function in survival package, it seems suitable to the
logistic model for the matched data with replacement, rather than Cox
model. Because it sets the time to a constant. Anyone can give me some
suggestions?

I’m facing a very similar situation, in which I have multiple controls to
multiple cases. How would I go about taking that dependency into account in a
Cox model? Is this weighting appropriate, and to get robust sandwich estimates,
can I take my id variable to be the id for the unique cases?

Thanks,

John

On 08/13/2014 05:00 AM, John Purda wrote:

I am curious about this problem as well. How do you go about creating the 
weights for each pair, and are you suggesting that we can just incorporate a 
weight statement in the model as opposed to the strata statement? And Dr. 
Therneau, let's say I have 140 cases matched with replacement to 2 controls. Is 
my id variable the number of cases?




  The above has an incorrect assumption that I notice ALL survival questions on
the list -- which was false in this case.  Could you clue me in as to the
original question and discussion -- assuming that you want "Dr Therneau" to
respond intelligently :-)



Terry T.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
