You might want to consider the implications of using data with different supports - 3x3 neighborhoods and points. The 3x3 neighborhoods are probably larger than the areas associated with the LAI field values. Thus the 3x3 mean NDVIs can be considered estimates, with associated error, of the NDVIs at the points. Error in the independent (x) variable leads to underestimated correlation and (ordinary least squares, OLS, regression) slope. There are a number of alternatives to OLS, such as MA (major axis) and RMA (reduced major axis) regression, that might lead to improved slope estimates, but I prefer to correct the correlation and slope estimates using estimates of the precision of the x variable.
You can roughly estimate the precision of the point NDVI by:
1) calculating the standard deviation of the nine pixel NDVIs associated with each field observation,
2) plotting the standard deviations versus the means to check for dependence of the variation on magnitude, and
3) estimating the standard error of NDVI as the mean standard deviation if there is no dependence; otherwise consider regression of LAI versus log(NDVI), or log-log regression.
Note that if the "point" area is much smaller than a pixel,
the x error
will be underestimated - but fixing this would involve either a
geostatistical analysis of point values for NDVI
or estimation involving fractal analysis.
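Steps 1) - 3) might look like this in S (a minimal sketch; ndvi9 is a hypothetical name for a 46 x 9 matrix holding the nine pixel NDVIs for each field plot - build it however your image software allows):

m <- apply(ndvi9, 1, mean)   # 3x3 mean NDVI for each plot (the x variable)
s <- apply(ndvi9, 1, sd)     # step 1: std. dev. of the nine pixel NDVIs
plot(m, s, xlab = "mean NDVI", ylab = "std. dev. of 3x3 NDVI")  # step 2
abline(lsfit(m, s))          # a roughly flat line suggests no dependence
sd.x <- mean(s)              # step 3: standard error estimate for point NDVI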
The error estimate can then be used to correct the estimates of correlation and slope (my apologies for cutting and pasting from a Word file so that subscripts and superscripts are lost, and maybe also the Greek font; I illustrate with S commands):
Preliminaries: First, consider regression of variable x (the independent variable) versus y (the dependent variable). The usual formula for the slope is:

\sum_i [(x_i - m_x)(y_i - m_y)] / \sum_i [(x_i - m_x)^2]    (1)
where summation is over the index i for the individual data points, and the means are m_x and m_y. This formula (section 1.2 in N. Draper and H. Smith, Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966) is correct, and computationally simple and accurate, that is, it works well to preserve floating point accuracy. However, formulae involving descriptive statistics (the correlation or covariance of x and y, and the standard deviations of x and y) convey more information about the factors related to the slope:
cor(x,y) s_y / s_x,  or  cov(x,y) / s_x^2    (2)
where one can see that the magnitude of the slope increases with the correlation and with the range of the dependent variable y (as measured by its standard deviation), and decreases with the range of the independent variable. If one of the formulae in (2) is used with n data points, it will be accurate (unbiased) if multiplied by the square root of (n-2)/(n-1), to correct for the effect of using estimated rather than "true" means, and if the usual assumptions, including accurate values for the independent variable, are correct. If the range of the independent variable is inflated by errors, the slope will decrease, that is, it will be biased low.
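As a quick check in S that (1) and (2) agree with the built-in least squares fit (x and y here are made-up illustrative data, not the LAI/NDVI data):

set.seed(1)
x <- runif(46)                               # pretend NDVI
y <- 4.9 * x + 0.24 + rnorm(46, sd = 0.5)    # pretend LAI
sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # formula (1)
cor(x, y) * sd(y) / sd(x)                    # formula (2), first form
cov(x, y) / var(x)                           # formula (2), second form
coef(lsfit(x, y))["X"]                       # least squares slope, identical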
Now consider predicting the slope when precise values of the independent variable x are replaced by estimated or measured values x', following Section 29.56 in M. Kendall and A. Stuart, The Advanced Theory of Statistics, Volume 2: Inference and Relationship, 4th Edition, Charles Griffin & Company Limited, London, 1979 (copy in your mailbox). Let's assume that the measurements are made without bias and with a precision represented as an error standard deviation: the observed measurements (x'_1, x'_2, ...) of the independent variable can be considered sums of the true values (x_1, x_2, ...) and errors (d_1, d_2, ...) with mean 0 and standard deviation s_d. The least squares regression slope is then cov(x,y)/s_{x'}^2 = cov(x,y)/(s_x^2 + s_d^2), where cov(x,y) is the covariance between x and y, i.e. the correlation times the product of the standard deviations of x and y. Now if the least squares slope with no errors is cov(x,y)/s_x^2 = 1, then the slope with the errors is:
cov(x,y)/(s_x^2 + s_d^2) = [cov(x,y)/s_x^2] [s_x^2/(s_x^2 + s_d^2)] = s_x^2/(s_x^2 + s_d^2)    (3)
The expression on the right is a function of the relative magnitude s_d/s_x of the measurement error compared to the data range for the independent variable, with the standard deviation as the metric. The range term s_x can be approximated with s_{x'}, the standard deviation of the measured values, if the range of the measurements is large compared to the measurement errors. Otherwise, correct for the effect of measurement error by using sqrt(s_{x'}^2 - s_d^2) as the estimate of s_x, leading (as you have noted) to (s_{x'}^2 - s_d^2)/s_{x'}^2 as the predicted slope. The estimate for s_d is generally known from an independent source, such as instrument specs or a calibration analysis.
You will note that the slope predicted with (3) is always less than 1. The mathematical cause is the inflation of the denominator from s_x^2 to s_x^2 + s_d^2. Perhaps what is counter-intuitive is that the slope is biased by mean-zero errors. Shouldn't the errors just degrade the precision of the least squares slope? No - because least squares regression is asymmetrical in the way the independent and dependent variables are treated: the sum of squares to be minimized is the sum of squared residuals, the distances from the points to the regression line in the y direction. One way to correct the problem, i.e. to predict the slope one would have if the true x values were known, is to multiply the estimated slope by 1 + (s_d/s_x)^2. Another approach is to use a least squares technique based on the distances from the data points to the nearest point on the line.
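A small simulation in S illustrates the attenuation and the correction (all numbers here are arbitrary illustrative choices):

set.seed(2)
n <- 1000
x <- rnorm(n)                     # true x, variance about 1
y <- x + rnorm(n, sd = 0.2)       # true slope is 1
sd.d <- 0.5                       # std. dev. of the x measurement error
xm <- x + rnorm(n, sd = sd.d)     # measured x, with mean-zero error
coef(lsfit(x, y))["X"]            # close to 1
coef(lsfit(xm, y))["X"]           # biased low, near 1/(1 + sd.d^2) = 0.8
coef(lsfit(xm, y))["X"] * (1 + sd.d^2/(var(xm) - sd.d^2))  # corrected, near 1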
You may also note that the inflation of the variance and standard deviation of the independent variable leads to degradation in the correlation between the independent and dependent variables:

cor(x',y) = cov(x,y)/(s_{x'} s_y) = cor(x,y) s_x/sqrt(s_x^2 + s_d^2)
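Continuing the simulation above, the degraded correlation matches this prediction:

cor(xm, y)                                # degraded correlation
cor(x, y) * sd(x)/sqrt(var(x) + sd.d^2)   # predicted degradation, about the same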
If a variable in the regression is a transformed variable, then the standard deviations are statistics of the transformed variable rather than the original variable. For example, your independent variable was the log-transformed predicted leaf area per vine, log(LA), predicted with a calibration equation from a regression analysis with this same variable, log(LA), as the dependent variable and pruning weight as the independent variable. So the standard deviation of the calibration regression residuals is a good estimate of s_d. This predicted log(LA) is the independent variable in the validation regression you are concerned with, that is, the one with a slope of less than one. So a good estimate of s_x is either the standard deviation s of the validation values for log(LA) or the corrected value sqrt(s^2 - s_d^2).
Example calculation:
measurement std. dev. s_d = 0.326
std. dev. of independent variable s_{x'} = 0.616
> 1/(1 + .326^2/.616^2)   # predicted slope, using s_{x'} in place of s_x
[1] 0.7812045
> sqrt(.616^2 - .326^2)   # corrected std. dev. of independent variable
[1] 0.5226662
> 1/(1 + .326^2/.522^2)   # predicted slope, using the corrected value
[1] 0.7194107
The least squares slope was 0.67 +/- 0.14, consistent with either prediction.
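Conversely, the correction factor mentioned above can be applied to the observed slope (a rough illustration, using the corrected standard deviation 0.522 as the estimate of s_x):

> 0.67 * (1 + .326^2/.522^2)   # the slope predicted for error-free x
[1] 0.9313178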
Chris
I'd like to know if the approach I used to derive LAI from NDVI is correct.
STEP 1: I've got 46 field point values of LAI (leaf area index, namely the cover of plant leaves).
STEP 2: I derived the NDVI index from a multispectral image.
STEP 3: For every field plot I calculated the mean NDVI of the 3x3 neighbouring cells.
STEP 4: I made a regression between mean NDVI and LAI.
STEP 5: r^2 was low (0.34), r being 0.70, but t, measured as r/sqrt((1-r^2)/(n-2)), was over the minimum t of 2.7, my t being 5.75.
STEP 6: Since the correlation was highly significant (p < 0.01), I applied the equation of the regression line, y = 4.9053x + 0.2406, where y was LAI and x was NDVI, to the NDVI map, obtaining the LAI map.
STEP 7: I made a check on the accuracy of the model by measuring the MBE (mean bias error), calculated as the mean of the single errors for every plot (46 measures): the mean of P - O, where P was the estimated LAI value and O the observed, obtaining an MBE of 0.03249587.
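In S, steps 3-7 might look like this (a sketch; ndvi9 and lai are hypothetical names for a 46 x 9 matrix of neighborhood NDVIs and the vector of field LAI values):

mean.ndvi <- apply(ndvi9, 1, mean)       # step 3: 3x3 mean NDVI per plot
fit <- lsfit(mean.ndvi, lai)             # step 4: regression of LAI on mean NDVI
r <- cor(mean.ndvi, lai)                 # step 5: correlation
t.stat <- r/sqrt((1 - r^2)/(46 - 2))     # and its t statistic
pred.lai <- fit$coef["Intercept"] + fit$coef["X"] * mean.ndvi  # step 6 at the plots
mbe <- mean(pred.lai - lai)  # step 7: with unrounded coefficients this is 0 by construction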
Questions:
1. Could I apply the equation as in step 6?
2. Could I check my model by using the same observed values as input, as in step 7?
Thanks
Duccio
***************************************
Chris Hlavka
NASA/Ames Research Center 242-4
Moffett Field, CA 94035-1000
(650)604-3328 FAX 604-4680
[EMAIL PROTECTED]
***************************************
