My thanks to all who responded to my query.
On Feb 27, 2005, at 8:58 AM, Alan Zaslavsky wrote:
There are a couple of ways of looking at this, and it pays to think a
bit beyond the definitions to the analyses that would be required. I
agree that your first scenario is MCAR. In the second scenario,
missingness is clustered by months. An analysis that ignored the
clustering (by ignoring the dependence of missingness on the month
labels) would be likely to be wrong, especially if you were interested
in inference beyond the sampled months. However, we sometimes make a
distinction between design variables (those known before data are
collected) and data; with that distinction, the month labels are
design variables but the missingness is independent of both observed
and unobserved data values.
In any case the bottom line is that an analysis that didn't take into
account the fact that data are only collected in some months would
most likely be incorrect in some way.
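As a rough illustration of the point above, the following sketch simulates the month-sampling design and compares the actual sampling variability of the overall mean with the standard error one would compute by treating the 700 observations as independent. Everything in it is an assumption made for illustration: the additive month, person, and noise effects, their unit variances, and the function name `simulate` are invented and do not come from the data under discussion.

```python
import random
import statistics

def simulate(n_people=100, n_months=36, n_sampled=7, reps=500, seed=1):
    """Compare naive and cluster-aware standard errors under month sampling."""
    rng = random.Random(seed)
    # Hypothetical data: month effect + person effect + independent noise.
    month_eff = [rng.gauss(0, 1) for _ in range(n_months)]
    person_eff = [rng.gauss(0, 1) for _ in range(n_people)]
    y = [[month_eff[m] + person_eff[p] + rng.gauss(0, 1)
          for p in range(n_people)] for m in range(n_months)]

    means, naive_ses, cluster_ses = [], [], []
    for _ in range(reps):
        months = rng.sample(range(n_months), n_sampled)  # sample whole months
        vals = [v for m in months for v in y[m]]
        means.append(statistics.fmean(vals))
        # Naive SE: pretends all sampled observations are independent.
        naive_ses.append(statistics.stdev(vals) / len(vals) ** 0.5)
        # Cluster-aware SE: treats each sampled month's mean as one unit
        # (no finite-population correction, so slightly conservative).
        month_means = [statistics.fmean(y[m]) for m in months]
        cluster_ses.append(statistics.stdev(month_means) / n_sampled ** 0.5)

    empirical_sd = statistics.stdev(means)  # actual spread of the estimate
    return empirical_sd, statistics.fmean(naive_ses), statistics.fmean(cluster_ses)

empirical_sd, naive_se, cluster_se = simulate()
print(f"empirical SD of mean: {empirical_sd:.3f}")
print(f"average naive SE:     {naive_se:.3f}")
print(f"average cluster SE:   {cluster_se:.3f}")
```

Under these assumed month effects, the naive standard error comes out several times smaller than the actual spread of the estimate, while the month-level calculation is of the right order. If months did not differ at all, the two would be much closer.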
Date: Fri, 25 Feb 2005 15:39:12 -0700
From: Melissa Roberts <[EMAIL PROTECTED]>
Subject: [Impute] MCAR & MAR assessment
I have monthly event data that covers many individuals over several
years. For discussion purposes say I have 100 people and 36 months
of data for them, so I have 3600 observations. Events are not
consistent from month to month, but there is some consistency in
events across months at an individual level.
If I randomly sample from those 3600 observations - for example take
20% using a uniform random number generator - then I am confident in
saying the unsampled data can be characterized as MCAR - missing
completely at random. The mechanism for being unsampled has nothing
to do with variables in the data.
NOW, another sampling method is to randomly sample the MONTHS in the
dataset. I take a 20% sample of the months (producing 7 months), and
take all the people represented in those months (100 each month), for
a total of 700 observations.
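For concreteness, the two sampling schemes described so far could be sketched as follows. This is only an illustration: the function names are made up, person and month labels are plain integers, and the first scheme is implemented as a fixed-size draw of 20% of the cells, which is one of several ways to "take 20%".

```python
import random

N_PEOPLE, N_MONTHS = 100, 36  # sizes used in the example above

def sample_observations(frac=0.20, seed=0):
    """Scheme 1: sample individual person-month cells uniformly at random."""
    rng = random.Random(seed)
    cells = [(p, m) for p in range(N_PEOPLE) for m in range(N_MONTHS)]
    return set(rng.sample(cells, round(frac * len(cells))))

def sample_months(frac=0.20, seed=0):
    """Scheme 2: sample whole months, then keep every person in those months."""
    rng = random.Random(seed)
    months = rng.sample(range(N_MONTHS), round(frac * N_MONTHS))
    return {(p, m) for p in range(N_PEOPLE) for m in months}

obs_sample = sample_observations()
month_sample = sample_months()

# Number of sampled cells and number of distinct months represented.
print(len(obs_sample), len({m for _, m in obs_sample}))
print(len(month_sample), len({m for _, m in month_sample}))
```

The first scheme yields 720 cells spread over the months; the second yields 700 cells concentrated in 7 months, so 29 months contribute no observations at all.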
Can I assert that this second sample is also MCAR? The mechanism for
not being sampled is based solely on a random number generator.
Is the fact that some months are not represented a problem? Would it
be just MAR because the nature of the events in those unsampled
months could differ from those in the sampled months? Would
characterizing it as MAR be a problem also?
_______________________________________________
Impute mailing list
Impute@lists.utsouthwestern.edu
http://lists.utsouthwestern.edu/mailman/listinfo/impute
_________________________________
Melissa H. Roberts
Energy, Economic and Environmental Consultants
E3c, Inc.
5600 Wyoming Blvd. NE, Suite 225
Albuquerque, NM 87109
(505) 822-9760 (voice)
(505) 822-9762 (fax)
[EMAIL PROTECTED]
__________________________________