Thank you Oscar for that extensive lesson. I will study it over and see if I can apply something similar in my case.
One thing to note: x1, x2, and x3 are highly correlated. Generally speaking, I'm trying to estimate some unknown function f, where y = f(x1, x2, x3) (see note below). But if I use only one input, say x1, there are distinct trends in the fitted response (due to how x1 'sees', or doesn't see, the real data). There are similar trends if you plot x2 or x3 against the 'true' y we are fitting to (hence the correlation between the three inputs). But the relationships between each of the three inputs and y are different enough that when I then add terms in x2, x3, and higher orders of all three inputs, the trends are greatly reduced (a good thing!). In other words, x2 and x3 tell us something new that x1 couldn't detect.

The problem, as stated in my original post, is how to select these terms based on an 'importance' factor. I would also like the addition of more terms not to significantly affect the lower-order coefficients. You described a possible way to handle this in option (2) below. I will give that a try and see if it works for my situation.

Thanks heaps for your help.

Pat

Note: I said I'm trying to estimate some unknown function f, where y = f(x1, x2, x3). There is really no such function, due to the physical nature of the problem. x1, x2, x3, and y are all basically weighted averages of a (higher-resolution) set of data, but the weights are all different. We are trying to estimate y (which has one set of weights) using x1, x2, and x3, which all have weights different from each other and from y. So you can see it's not possible to perfectly describe how y relates to the three inputs, but for a particular set of data we can use regression to approximate y, assuming we will only be applying this transformation to a similar set of data.

"Oscar Lanzi III" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

> It is possible to obtain orthogonal polynomials in more than one
> variable by using the Gram-Schmidt technique, but there are some
> subtle aspects that do not occur with one variable. Let's consider the
> following table of independent variable values. Here x1 and x2 are the
> independent variables; I have left off the dependent variable values,
> which do not enter into the orthogonalization.
>
> Data point   x1   x2
>
>     1         1    1
>     2         2    2
>     3         1    2
>     4         2    3
>
> We define our set of orthogonal polynomials:
>
> phi_0 = 1
> phi_1 = x1 + a10*phi_0
> phi_2 = x2 + a21*phi_1 + a20*phi_0
> phi_3 = x1*x2 + a32*phi_2 + ... + a30*phi_0
>
> (if I don't use the ellipsis I can't get the expression for phi_3 on one
> line on my screen.)
>
> We get a10 by plugging the expression for phi_1 into the orthogonality
> condition
>
> sum(phi_0*phi_1) = 0
>
> as in the usual G-S algorithm. Thus a10 = -1.5, meaning phi_1 = x1 - 1.5
> (of course 1.5 is the average value of x1).
>
> Similarly, for phi_2 we need to satisfy
>
> sum(phi_0*phi_2) = 0
> sum(phi_1*phi_2) = 0
>
> with phi_0 and phi_1 both known. Of course the coefficients a21 and a20
> are decoupled by virtue of phi_1 and phi_0 being orthogonal, so the
> system is easy to solve and we get:
>
> a20 = -2
> a21 = -1
>
> and thus
>
> phi_2 = x2 - x1 - 0.5
>
> For phi_3 we use the form given above and plug into the orthogonality
> relations
>
> sum(phi_3*phi_j) = 0;  j = 0, 1, 2
>
> and the known values for phi_0, phi_1, and phi_2. The result is:
>
> phi_3 = x1*x2 - 1.5*x2 - 2*x1 + 2.75
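For my own understanding, here is a minimal numerical sketch of the construction above in Python (assuming NumPy is available; the variable names are just mine). It reproduces phi_1, phi_2, and phi_3 at the four data points and confirms they are mutually orthogonal.

import numpy as np

x1 = np.array([1., 2., 1., 2.])
x2 = np.array([1., 2., 2., 3.])

# columns in the order 1, x1, x2, x1*x2
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

phi = []
for col in X.T:
    p = col.copy()
    for q in phi:
        # subtract the projection of the new column onto each earlier phi
        p = p - (p @ q) / (q @ q) * q
    phi.append(p)
phi = np.array(phi)

print(np.round(phi @ phi.T, 10))   # off-diagonal entries are 0: orthogonal
print(phi[1])                      # equals x1 - 1.5 at the data points
print(phi[2])                      # equals x2 - x1 - 0.5
print(phi[3])                      # equals x1*x2 - 1.5*x2 - 2*x1 + 2.75

Because each earlier phi is already orthogonal to the others, subtracting the projections one at a time is the same as solving the decoupled system for a20, a21, etc.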
> Let us take a closer look at just phi_2. We see that since there is an
> inherent nonorthogonality between x1 and x2, we can't cleanly separate
> their effects with orthogonal polynomials. So we're stuck with phi_2
> containing a rather clumsy mixture of the two.
>
> There is also another problem, related to the first. In the above
> development we introduced the terms in the order x1, x2, x1*x2. But what
> if we were to choose a different order? Suppose we select x2 first and
> then x1. Then
>
> phi_1 = x2 - 2
> phi_2 = x1 - 0.5*x2 - 0.5
>
> and these fail to match any of the polynomials given above! We do get
> the same polynomial as before for phi_3, but the cat is out of the bag:
> not only are we stuck with a "mixed" linear polynomial but we also find
> that our set of polynomials depends on the order in which we introduce
> the terms.
>
> So you have to do one of two things.
>
> 1) Set up your independent variable values beforehand, in a designed
> experiment, so that they are "orthogonal and balanced", to use
> statistical experimental design language. Suppose we had started with
> the following independent variable values:
>
> Data point   x1   x2
>
>     1         1    1
>     2         2    1
>     3         1    3
>     4         2    3
>
> Now orthogonality and balance are satisfied; geometrically (for this very
> simple example) we can place the independent variable values in a nice,
> symmetrical rectangle. No matter how we introduce the terms we get a
> unique set of orthogonal polynomials. Putting x1 first, then x2, then
> x1*x2, we get:
>
> phi_0 = 1
> phi_1 = x1 - 1.5
> phi_2 = x2 - 2
> phi_3 = x1*x2 - 2*x1 - 1.5*x2 + 3 = phi_1*phi_2
>
> If we use the order x2, x1, x1*x2 we get the same polynomials except that
> phi_1 and phi_2 switch labels. Not only is the set unique, but the
> linear effects are truly decoupled and the interaction term is a product
> of the components -- in an orthogonal and balanced design, interaction
> polynomials are always such products of components.
>
> 2) If we can't get an orthogonal and balanced design because our lab
> apparatus blew up in the middle of the experiment, or (more practically)
> because we have plant data in which the "independent" variables are
> constrained by process and product conditions, then we have to think
> about the order in which we introduce terms. We have to guess at which
> terms are most likely to be important (or "cheat" by seeing what the
> data are strongly correlated with, using univariate correlations) and
> introduce those first. In our nonorthogonal example, we would use the
> x1-x2-x1x2 ordering, giving phi_1 = x1 - 1.5, if we thought x1 would be
> more important than x2. In the opposite case we'd use the x2-x1-x1x2
> ordering and thus start with phi_1 = x2 - 2. In real (more complicated)
> situations you have to consider such questions as:
>
> Surely x1 is more important than x2, but is the QUADRATIC effect of x1
> probably more important than the LINEAR effect of x2?
>
> Is an inherently weaker factor, x3, more or less important than the
> interaction between x1 and x2?
>
> You may even want to try different orderings to see which one gives the
> best correlation with the simplest equation. Bring on Occam's Razor!
>
> Well, that's my basic lesson in orthogonalizing multivariate data. I
> hope you can see that the math is a lot easier than the upfront work and
> decisionmaking that will surely be involved.
>
> --OL
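As a small follow-up sketch of the order-dependence point (again Python with NumPy, purely illustrative): with the nonorthogonal data the linear polynomials change when x1 and x2 swap places in the ordering, while the orthogonal and balanced design gives the same set either way.

import numpy as np

def gram_schmidt(cols):
    """Orthogonalize the columns in the order given; return them as rows."""
    phi = []
    for col in cols:
        p = np.asarray(col, dtype=float).copy()
        for q in phi:
            p -= (p @ q) / (q @ q) * q
        phi.append(p)
    return np.array(phi)

ones = np.ones(4)

# the nonorthogonal data from the example
x1 = np.array([1., 2., 1., 2.])
x2 = np.array([1., 2., 2., 3.])
print(gram_schmidt([ones, x1, x2]))   # rows: 1, x1 - 1.5, x2 - x1 - 0.5
print(gram_schmidt([ones, x2, x1]))   # rows: 1, x2 - 2, x1 - 0.5*x2 - 0.5

# the orthogonal and balanced design: the ordering no longer matters
x2b = np.array([1., 1., 3., 3.])
print(gram_schmidt([ones, x1, x2b]))  # rows: 1, x1 - 1.5, x2 - 2
print(gram_schmidt([ones, x2b, x1]))  # same polynomials, labels swapped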
