On Wed, 21 Oct 2015, Ravi Varadhan wrote:
Hi, I am dealing with a regression problem where the response variable,
time (seconds) to walk 15 ft, is rounded to the nearest integer. I do
not care for the regression coefficients per se, but my main interest is
in getting the prediction equation for walking speed, given the
predictors (age, height, sex, etc.), where the predictions will be real
numbers, and not integers. The hope is that these predictions should
provide unbiased estimates of the "unrounded" walking speed. This
sounds like a measurement error problem, where the measurement error is
due to rounding and hence would be uniformly distributed on (-0.5, 0.5).
Not the usual "measurement error model" problem, though, where the errors
are in X and not independent of XB.
Look back at the proof of the unbiasedness of least squares under the
Gauss-Markov setup. The errors in Y need to have expectation zero.
From your description (but see caveat below) this is true of walking
*time*, but not exactly true of walking *speed* (modulo the usual
assumptions if they apply to time). In fact if E(epsilon) = 0 were true of
unrounded time, it would not be true of unrounded speed (and vice versa).
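
The time-versus-speed point can be checked with a small simulation. This is only a sketch under assumed values: the Normal(5, 1) distribution for unrounded times, the 1.5 s cutoff, and all variable names are illustrative and do not come from the original data. It looks only at the rounding component of the error, and is written in Python with the standard library; the same steps translate directly to R.

```python
import random
import statistics

random.seed(1)

DISTANCE_FT = 15.0

# Assumption for illustration: unrounded times are Normal(mean 5 s, sd 1 s),
# truncated below at 1.5 s so that no rounded time can be zero.
true_times = []
while len(true_times) < 100_000:
    t = random.gauss(5.0, 1.0)
    if t >= 1.5:
        true_times.append(t)

# Times as recorded: rounded to the nearest second.
rounded_times = [float(round(t)) for t in true_times]

# Rounding error on the time scale: observed minus unrounded.
time_bias = statistics.fmean(
    r - t for r, t in zip(rounded_times, true_times)
)

# Rounding error on the speed scale: speed from rounded time minus true speed.
speed_bias = statistics.fmean(
    DISTANCE_FT / r - DISTANCE_FT / t
    for r, t in zip(rounded_times, true_times)
)

print(f"mean rounding error in time : {time_bias:+.4f} s")
print(f"mean rounding error in speed: {speed_bias:+.4f} ft/s")
```

Under these assumptions the mean error in time should be close to zero, while the mean error in speed should come out slightly positive, because 1/t is convex (Jensen's inequality). The size of the effect depends entirely on the assumed distribution.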
Are there any canonical approaches for handling this type of problem?
Work out the bias analytically? Parametric bootstrap? Data augmentation
and friends?
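
As one possible reading of the parametric bootstrap suggestion, here is a minimal sketch for a single predictor. Everything specific in it is an assumption made for illustration: the simulated ages, the coefficients 1.0 and 0.06, the Normal errors, and the helper `ols`. It is written in Python with the standard library only, and it is not a method taken from the thread.

```python
import random
import statistics

random.seed(3)


def ols(x, y):
    """Simple least squares: return (intercept, slope, residual sd)."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (n - 2)) ** 0.5


# Hypothetical data: age in years, walking time recorded to the nearest second.
ages = [random.uniform(60, 90) for _ in range(500)]
observed = [round(1.0 + 0.06 * a + random.gauss(0, 0.8)) for a in ages]

# Fit to the rounded responses. The residual sd also absorbs the rounding
# noise, which is an approximation this sketch accepts.
b0, b1, s = ols(ages, observed)

# Parametric bootstrap: simulate unrounded times from the fitted model,
# round them, refit, and compare each refit with the generating values.
n_boot = 200
diff0, diff1 = [], []
for _rep in range(n_boot):
    y_star = [round(b0 + b1 * a + random.gauss(0, s)) for a in ages]
    c0, c1, _sd = ols(ages, y_star)
    diff0.append(c0 - b0)
    diff1.append(c1 - b1)

bias0 = statistics.fmean(diff0)
bias1 = statistics.fmean(diff1)
print(f"estimated bias from rounding, intercept: {bias0:+.4f}")
print(f"estimated bias from rounding, slope    : {bias1:+.5f}")
```

The estimated biases apply to the coefficients on the time scale under the assumed model. Predictions of speed would need the same exercise repeated on the speed scale.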
What is wrong with just doing the standard linear regression?
Well, what do the actual values look like?
If half the subjects have a value of 5 seconds and the rest are split
between 4 and 6, your assertion that rounding induces an error of
dunif(epsilon,-0.5,0.5) is surely wrong (more positive errors in the 6
second group and more negative errors in the 4 second group under any
plausible model).
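
The 4/5/6 example can be reproduced with a short simulation. This is a sketch under an assumed model: unrounded times are taken to be Normal(5, 0.74), chosen only so that roughly half of the rounded values equal 5. The error is defined as observed minus unrounded, and the names are illustrative. It is written in Python with the standard library only.

```python
import random
import statistics
from collections import defaultdict

random.seed(2)

# Assumption for illustration: unrounded times ~ Normal(mean 5 s, sd 0.74 s),
# so about half of the rounded values are 5 and most of the rest are 4 or 6.
true_times = [random.gauss(5.0, 0.74) for _ in range(200_000)]

# Group the rounding errors by the recorded (rounded) value.
errors_by_value = defaultdict(list)
for t in true_times:
    observed = round(t)
    errors_by_value[observed].append(observed - t)  # observed minus unrounded

share = {v: len(errors_by_value[v]) / len(true_times) for v in (4, 5, 6)}
mean_error = {v: statistics.fmean(errors_by_value[v]) for v in (4, 5, 6)}

for v in (4, 5, 6):
    print(f"rounded = {v}: share = {share[v]:.3f}, "
          f"mean error = {mean_error[v]:+.3f}")
```

Under this model the mean error should be negative in the 4 second group, near zero in the 5 second group, and positive in the 6 second group. So the error is not uniform on (-0.5, 0.5) within a group, and it is not independent of the recorded value.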
HTH,
Chuck