"DaveM" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
(Snip...)

> I want to correlate the OUTSIDE (easily observed) to the WHOLE (time
> consuming). I want to develop a regression line (equation) where I
> could in the field quickly observe the OUTSIDE and then express the
> level of infestation as percent infested of the WHOLE.
>
> The WHOLE variable is what I would call the true infestation level.
> The OUTSIDE is some lesser part of that. Here are some questions?
>
> Q1. Is it correct to call the WHOLE the independent variable, and
> assign it as x on the regression graph? OUTSIDE would be the dependent
> variable?

"Standard" regression, as in (probably) your stat package, assumes that the dependent variable is a random variable, and that the independent variable is not a random variable. There are other statistical models for situations in which both dependent and independent variables are random. To the extent that WHOLE is a good measure of the true level of infestation (that is, measured both accurately and precisely), standard regression may be good enough for your purposes.

(Snipped...)

> Q2. Do I force thru origin. Biologically is that appropriate? With
> this plant/insect system if WHOLE is zero, then OUTSIDE is zero. If
> Whole >zero OUTSIDE can be zero. In fact, at the lower infestation
> levels WHOLE needs to get to values of >= 5% before OUTSIDE
> infestation starts to consistently show. The fit of the line looks
> better to me w/o going thru zero.

As both Rich and Jay have suggested, you do not want to force the regression through the origin. Your biological explanation suggests that the intercept may--biologically--be nonzero. Regardless, both statistical concerns and biological concerns are usually better addressed with an intercept-included model.

(Snipped...)

> Q4. The end use of this is to go out to the field, observe the OUTSIDE
> and use that to predict WHOLE via a regression equation or graph.
> I make my field observations (say count 100 scales), find my OUTSIDE
> value on the y-axis, run across to the regression line and drop down
> to the x-axis WHOLE value. Any problem with that setup?

This is known as the "calibration problem" or the "inverse prediction" problem. Both Zar (Biostatistical Analysis) and Sokal and Rohlf (Biometry) have short sections on inverse prediction, including how to obtain a CI on that prediction. You'll want to compute a CI--it gives you a nicely intuitive sense of just how good (or poor) your inverse prediction is.

I must admit, I don't see the point in looking at (WHOLE-OUTSIDE) versus OUTSIDE. Perhaps I'm missing something this morning.

In situations like yours, I plot WHOLE against OUTSIDE and include the 1:1 reference line (i.e., OUTSIDE plotted against OUTSIDE), and I look at how the scatterplot relates to the reference line. The very first thing I look for is whether the pattern of the data points appears linear. If so, then the analysis is simpler. If it's not linear (and there's no reason why it must be linear), then things get more complicated. I also look for weird points, for whether the variability in the scatterplot is about equal across the full range of OUTSIDE (i.e., homogeneity of variance), etc.

Assuming linearity....

If the data points fall "on" (allowing for noise, of course) the reference line, then that implies that WHOLE and OUTSIDE have a "perfect" relationship.

If the data points fall along a line (allowing for noise, still) that is roughly parallel to the reference line (i.e., intercept not equal to zero, slope equal to one), then that implies that there is a consistent, _absolute_ difference between WHOLE and OUTSIDE, and that the magnitude of that difference does not vary with the value of OUTSIDE. It also suggests that a single variable--the difference WHOLE-OUTSIDE--could replace the two variables WHOLE and OUTSIDE in an analysis, although you don't have to take a simplified approach.
If the data points fall along a line that is not parallel to the reference line and that passes through the origin (i.e., intercept equal to zero, slope not equal to one), then that implies a _relative_ difference between WHOLE and OUTSIDE, and that the magnitude of the absolute difference varies with the value of OUTSIDE. This pattern is common in biology/ecology--responses are often proportional, not absolute. A single variable--the ratio WHOLE/OUTSIDE--could replace the two variables WHOLE and OUTSIDE in an analysis, although you don't have to take a simplified approach.

Finally, you could have data points along a line with a nonzero intercept and a slope not equal to one. It's the most complicated pattern of this lot, and is usually addressed by regression of WHOLE on OUTSIDE.

Hope this helps,
Susan

--
Susan Durham
Utah State University Ecology Center
[EMAIL PROTECTED]
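
P.S. In case it helps to see the inverse prediction from Q4 written out, here is a minimal sketch in plain Python. It follows your setup (OUTSIDE on the y-axis regressed on WHOLE on the x-axis), reads a new field value of OUTSIDE back to WHOLE, and attaches a CI obtained by inverting the usual prediction interval for a single new observation. I believe this is the same form given in Zar and in Sokal and Rohlf, but please check it against the book before relying on it. The data values, the function names fit_line and inverse_predict, and the dictionary keys are all made up for illustration, and the critical t value (n - 2 degrees of freedom) is assumed to be looked up in a table and passed in.

```python
import math


def fit_line(x, y):
    """Least-squares regression of y on x, plus the sums needed for the CI."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # intercept is estimated, not forced to zero
    resid_var = (syy - slope * sxy) / (n - 2)  # residual mean square
    return {"n": n, "x_bar": x_bar, "y_bar": y_bar, "sxx": sxx,
            "slope": slope, "intercept": intercept, "resid_var": resid_var}


def inverse_predict(fit, y0, t_crit):
    """Return (x_hat, lower, upper) for one newly observed y0.

    t_crit is the two-sided critical t value for n - 2 degrees of freedom.
    """
    b = fit["slope"]
    # k = b^2 - t^2 * s_b^2; it is positive only when the slope is
    # significantly different from zero at the chosen level.
    k = b ** 2 - t_crit ** 2 * fit["resid_var"] / fit["sxx"]
    if k <= 0:
        raise ValueError("slope not significant; no finite interval exists")
    dev = y0 - fit["y_bar"]
    x_hat = fit["x_bar"] + dev / b  # same as (y0 - intercept) / slope
    centre = fit["x_bar"] + b * dev / k
    half = (t_crit / k) * math.sqrt(
        fit["resid_var"] * (dev ** 2 / fit["sxx"] + k * (1 + 1 / fit["n"]))
    )
    return x_hat, centre - half, centre + half


# Made-up calibration data (percent infested); NOT from the original post.
whole = [5, 10, 20, 30, 40, 50]     # x: WHOLE, the slow but "true" measure
outside = [0, 3, 8, 12, 19, 23]     # y: OUTSIDE, the quick field measure

fit = fit_line(whole, outside)
T_CRIT = 2.776  # two-sided 95% critical t for n - 2 = 4 degrees of freedom

x_hat, lower, upper = inverse_predict(fit, 10.0, T_CRIT)
print(f"OUTSIDE = 10% -> WHOLE about {x_hat:.1f}% (95% CI {lower:.1f} to {upper:.1f})")
```

Note that the interval is not symmetric about the point estimate, and that it only exists when the slope is clearly different from zero; the sketch raises an error otherwise. With only a handful of calibration points the interval will be wide, which is exactly the "how good (or poor)" information you want before taking this to the field.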
