Bob Hayden wrote:
> While I agree that online surveys are of dubious value, some of the
> opposition sounds too moralistic or contentious for my taste. Here is
> a parable.
> (excellent example of the spherical Statue of Liberty omitted)
> So I don't mind if you use an online sample, but you need to develop
> the theory and formulae for inference from such samples, just as
> others have developed the theory and formulae for cluster sampling,
> stratified sampling, etc. That work has not been done yet.
Can it ever be done, in a way that will permit definite outcomes? What
seems to be needed is a way of inferring the value of a parameter for a
population based on a sample that omits a large nonrandom majority of
the population.
If I attempt to survey 1000 people and 950 answer, of whom 600 give a
positive response, I can consider the extremes of 650 in 1000 (all 50
nonrespondents positive) and 600 in 1000 (all 50 negative), create
confidence intervals, and say (e.g.) that _in_any_case_ the proportion
of positives in the population at large is over 50%. But if I attempt
to survey 1000 people and 100 answer with 60 positives, I can only
consider bounds of 60 in 1000 and 960 in 1000. Neither of these
extremes is impossible if the response and the probability of
responding are strongly correlated. With such numbers I can do nothing;
the correct outcome is a failure to reject any null hypothesis: the
data do not support any definite conclusion at all.
If I make some sort of estimate of the range of possible rates of
potential positive responses among those who did not respond, my final
conclusion will be based almost entirely upon my estimate, not upon my
data.
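The arithmetic behind those two cases can be written out as a minimal
sketch. The function name worst_case_bounds is made up for this
illustration, and the bounds cover only nonresponse: sampling error
would widen them further and is not computed here.

```python
def worst_case_bounds(attempted, responded, positives):
    """Return (lower, upper) bounds on the share of positives among
    everyone approached, assuming nothing about the nonrespondents.
    (Hypothetical helper, written only to illustrate the argument.)"""
    if attempted <= 0 or not 0 <= positives <= responded <= attempted:
        raise ValueError("need 0 <= positives <= responded <= attempted, attempted > 0")
    nonrespondents = attempted - responded
    lower = positives / attempted                     # every nonrespondent is negative
    upper = (positives + nonrespondents) / attempted  # every nonrespondent is positive
    return lower, upper


for responded, positives in [(950, 600), (100, 60)]:
    lo, hi = worst_case_bounds(1000, responded, positives)
    print(f"{responded} of 1000 answered: {lo:.2f} to {hi:.2f}")
```

With 950 responses the interval runs from 0.60 to 0.65 and stays above
one half; with 100 responses it runs from 0.06 to 0.96 and excludes
almost nothing.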
I would argue that for _most_ questions it is _a_priori_ plausible that
the outcome correlates significantly with the probability of response.
Recall the famous self-selected survey done by (Ann Landers/Dear Abby)
in which readers with children were asked whether they would make the
same decision again WRT having children. Something like half the
respondents said "no", they would not have kids if they had a second
chance. Somebody (sorry, no reference) checked this with a randomized
phone survey and concluded that the true incidence was much lower
(around 10%?).
Again, suppose somebody asks "Does any member of your family suffer
from diabetes?" The response rate among those who do have a family
history of diabetes and thus a stronger personal stake in such research
would probably be higher. How much? No way to tell, and it would
probably depend upon many unmeasurable factors including the exact setup
of the questionnaire. It is entirely plausible that phrasings of
questions that minimized differential nonresponse might not be those
that minimized phrasing-induced bias.
Now suppose the question is "Does any member of your family suffer from
AIDS?" Some people will decide to take the time to answer because a
family member does suffer from AIDS, and would not have responded
otherwise. Others may choose not to respond for the same reason. Which
group is bigger? By how much? Small differences here can lead to big
changes in the proportion of positive responses. And the interaction
structure is subtle; the scenario in which you understand it but do not
know the answer to the original question seems rather artificial.
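To see how sensitive the observed proportion is, here is a rough
sketch of the arithmetic, assuming each person responds independently
with a probability that depends only on whether their true answer is
positive. The function name and all the rates below are invented for
illustration; none of them comes from a real survey.

```python
def observed_positive_share(true_share, resp_rate_pos, resp_rate_neg):
    """Expected share of positives among respondents when positives and
    negatives respond at different rates. (Hypothetical helper.)"""
    pos = true_share * resp_rate_pos          # responding positives, per person approached
    neg = (1.0 - true_share) * resp_rate_neg  # responding negatives, per person approached
    if pos + neg == 0:
        raise ValueError("nobody responds under these rates")
    return pos / (pos + neg)


# Assumed true share of 5%; only the direction of the nonresponse differs.
eager = observed_positive_share(0.05, resp_rate_pos=0.02, resp_rate_neg=0.01)
reluctant = observed_positive_share(0.05, resp_rate_pos=0.01, resp_rate_neg=0.02)
print(f"positives respond twice as often: {eager:.3f}")
print(f"positives respond half as often:  {reluctant:.3f}")
```

Under these made-up rates the same true 5% shows up as roughly 9.5% or
roughly 2.6% of responses, even though either group's response rate
moves by only one percentage point.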
For all these reasons, I would hazard a guess that any rigorous theory
of internet or other "shotgun" surveys would, for any _realistic_
response figures, return the answer "no conclusion can be drawn".
Moreover, I would suggest that this is already predictable without doing
the fancy algebra.
> Using a formula for something else won't do.
Exactly. And - pardon me if I'm incorrect, moralistic, or contentious
<grin> - that seems to me to be exactly what was being proposed (given
that SPSS does not AFAIK have a module for correct handling of such
surveys), and what was being defended as acceptable practice. And, as
you say, it won't do.
-Robert Dawson