On Wed, 21 Oct 2015, Ravi Varadhan wrote:

Hi, I am dealing with a regression problem where the response variable, time (in seconds) to walk 15 ft, is rounded to the nearest integer. I do not care for the regression coefficients per se, but my main interest is in getting the prediction equation for walking speed, given the predictors (age, height, sex, etc.), where the predictions will be real numbers, and not integers. The hope is that these predictions should provide unbiased estimates of the "unrounded" walking speed. This sounds like a measurement error problem, where the measurement error is due to rounding and hence would be uniformly distributed on (-0.5, 0.5).


Not the usual "measurement error model" problem, though, where the errors are in X and not independent of XB.

Look back at the proof of the unbiasedness of least squares under the Gauss-Markov setup. The errors in Y need to have expectation zero.

From your description (but see the caveat below), this is true of walking *time*, but not exactly true of walking *speed* (modulo the usual assumptions, if they apply to time). In fact, if E(epsilon) = 0 were true of unrounded time, it would not be true of unrounded speed, because speed (15/time) is a nonlinear function of time (and vice versa).
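A quick simulation can illustrate the point. This is only a sketch under made-up assumptions: the true times are drawn from a uniform distribution on (3.5, 8.5) seconds, chosen purely so that the rounding error in time is close to uniform on (-0.5, 0.5); all variable names are hypothetical.

```r
set.seed(1)
n <- 1e6

# Hypothetical unrounded walking times in seconds (an assumption, not real data)
time_true <- runif(n, 3.5, 8.5)
time_obs  <- round(time_true)              # what gets recorded

# Rounding error on the time scale and on the speed scale (ft/s)
err_time  <- time_obs - time_true
err_speed <- 15 / time_obs - 15 / time_true

mean_err_time  <- mean(err_time)
mean_err_speed <- mean(err_speed)
print(c(time = mean_err_time, speed = mean_err_speed))
```

Under these assumptions the mean error in time should sit near zero, while the mean error in speed should be small but systematically negative, since 1/time is convex.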


Are there any canonical approaches for handling this type of problem?

Work out the bias analytically? Parametric bootstrap? Data augmentation and friends?
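As one rough illustration of the parametric bootstrap idea, here is a sketch on simulated data. Everything in it is an assumption made for the example: the single predictor `age`, the linear model for time, normal errors, and the choice of 500 replicates. It is not a recommendation of this particular scheme over the alternatives.

```r
set.seed(1)
n <- 200

# Made-up data: one predictor, normal errors, response rounded to integers
age       <- runif(n, 60, 90)
time_true <- 2 + 0.05 * age + rnorm(n, sd = 0.8)
time_obs  <- round(time_true)

# Naive fit to the rounded response
fit       <- lm(time_obs ~ age)
mu_hat    <- fitted(fit)
sigma_hat <- summary(fit)$sigma   # note: also absorbs the rounding noise

# Parametric bootstrap: simulate unrounded times from the fitted model,
# round them the same way, and refit
B <- 500
boot_coef <- replicate(B, {
  y_star <- mu_hat + rnorm(n, sd = sigma_hat)
  coef(lm(round(y_star) ~ age))
})

# Estimated bias in the coefficients attributable to rounding
bias_hat <- rowMeans(boot_coef) - coef(fit)
print(bias_hat)
```

The same loop could track predictions at covariate values of interest instead of coefficients, which is closer to what you say you care about.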

What is wrong with just doing the standard linear regression?


Well, what do the actual values look like?

If half the subjects have a value of 5 seconds and the rest are split between 4 and 6, your assertion that rounding induces an error of dunif(epsilon,-0.5,0.5) is surely wrong (more positive errors in the 6 second group and more negative errors in the 4 second group under any plausible model).
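To make that concrete, here is a small sketch with invented numbers: true times are drawn from a normal distribution with mean 5 and standard deviation 0.74, picked only because it puts roughly half the rounded values at 5. The error is defined here as rounded minus true time.

```r
set.seed(1)
n <- 1e5

# Invented true times: about half of them round to 5 seconds
time_true <- rnorm(n, mean = 5, sd = 0.74)
time_obs  <- round(time_true)
err       <- time_obs - time_true

# Share of subjects at each recorded value
print(round(prop.table(table(time_obs)), 3))

# Mean rounding error within each recorded value
mean_err_by_value <- tapply(err, time_obs, mean)
print(round(mean_err_by_value, 3))
```

Within the 6 second group the true times pile up near 5.5, so the errors lean positive; within the 4 second group they pile up near 4.5, so the errors lean negative. Neither group looks like dunif(epsilon,-0.5,0.5).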


HTH,

Chuck
