Hi Kevin,

On 29 March 2010 01:38, Kevin Dunn wrote:
>> Message: 5
>> Date: Sun, 28 Mar 2010 00:24:01 +0000
>> From: Andrea Gavana <andrea.gav...@gmail.com>
>> Subject: [Numpy-discussion] Interpolation question
>> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
>> Message-ID: <d5ff27201003271724o6c82ec75v225d819c84140...@mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi All,
>>
>> I have an interpolation problem and I am having some difficulties
>> in tackling it. I hope I can explain myself clearly enough.
>>
>> Basically, I have a whole bunch of 3D fluid flow simulations (close to
>> 1000), and they are a result of different combinations of parameters.
>> I was planning to use the Radial Basis Functions in scipy, but for the
>> moment let's assume, to simplify things, that I am dealing only with
>> one parameter (x). In 1000 simulations, this parameter x has 1000
>> values, obviously. The problem is, the outcome of every single
>> simulation is a vector of oil production over time (let's say 40
>> values per simulation, one per year), and I would like to be able to
>> interpolate my x parameter (1000 values) against all the simulations
>> (1000x40) and get an approximating function that, given another x
>> parameter (of size 1x1) will give me back an interpolated production
>> profile (of size 1x40).
>
> [I posted the following earlier but forgot to change the subject - it
> appears as a new thread called "NumPy-Discussion Digest, Vol 42, Issue
> 85" - please ignore that thread]
>
> Andrea, may I suggest a different approach to RBFs.
>
> Realize that the 40 values in each row of Y are not independent of
> each other (they will be correlated). First build a principal
> component analysis (PCA) model on this 1000 x 40 matrix and reduce it
> down to a 1000 x A matrix, called your scores matrix, where A is the
> number of independent components. A is selected so that it adequately
> summarizes Y without over-fitting and you will find A << 40, maybe
> A = 2 or 3. There are tools, such as cross-validation, that will help
> select a reasonable value of A.
>
> Then you can relate your single column of X to these A independent
> columns using a tool such as least squares: one least squares model
> per column in the scores matrix. This works because each column of
> the scores matrix is orthogonal to (uncorrelated with) the others.
> But I would be surprised if this works well enough, unless A = 1.
>
> But it sounds like you don't just have a single column in your
> X-variables (you hinted that the single column was just for
> simplification). In that case, I would build a projection to latent
> structures (PLS) model: a single latent-variable model that
> simultaneously models the X-matrix and the Y-matrix while maximizing
> the covariance between these two matrices.
>
> If you need some references and an outline of code, then I can readily
> provide these.
>
> This is a standard problem with data from spectroscopic instruments
> and with batch processes. They produce hundreds, sometimes thousands,
> of values per row. PCA and PLS are very effective at summarizing these
> down to a much smaller number of independent columns, very often just
> a handful, and relating them (i.e. building a predictive model) to
> other data matrices.
>
> I also just saw the suggestions of others to center the data by
> subtracting the mean from each column in Y and scaling (by dividing
> through by the standard deviation). This is a standard data
> preprocessing step, called autoscaling, and makes sense for any data
> analysis, as you already discovered.
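For reference, the single-parameter recipe quoted above (autoscale Y, PCA
down to A score columns, one least-squares model per score column) could be
sketched in NumPy roughly as below. This is only a minimal sketch under
stated assumptions, not Kevin's code: the function names
fit_pca_regression and predict_profile are made up for illustration, the
data are synthetic, the regression is a straight line in x, and
n_components=2 is picked by hand rather than by cross-validation.

```python
import numpy as np


def fit_pca_regression(x, Y, n_components=2):
    """Autoscale Y, compress it with PCA, then regress each score column on x.

    Hypothetical helper for illustration; n_components plays the role of A.
    """
    x = np.asarray(x, dtype=float).ravel()
    Y = np.asarray(Y, dtype=float)

    # Autoscaling: subtract each column's mean, divide by its standard deviation.
    y_mean = Y.mean(axis=0)
    y_std = Y.std(axis=0, ddof=1)
    y_std[y_std == 0] = 1.0  # guard against constant columns
    Yc = (Y - y_mean) / y_std

    # PCA via SVD: rows of Vt are the loadings, U * s are the scores.
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    loadings = Vt[:n_components]                      # shape (A, n_years)
    scores = U[:, :n_components] * s[:n_components]   # shape (n_sims, A)

    # One least-squares model per score column: score_a ~ b0 + b1 * x.
    # (Assumption: a straight line in x is adequate for the scores.)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)  # shape (2, A)

    return {"y_mean": y_mean, "y_std": y_std, "loadings": loadings, "coef": coef}


def predict_profile(model, x_new):
    """Predict a full profile (one value per year) for a single new x."""
    design = np.array([1.0, float(x_new)])
    scores_new = design @ model["coef"]               # shape (A,)
    scaled = scores_new @ model["loadings"]           # shape (n_years,)
    return model["y_mean"] + model["y_std"] * scaled  # undo the autoscaling


# Synthetic stand-in for the simulations: 1000 runs, 40 yearly values each.
rng = np.random.default_rng(0)
n_sims, n_years = 1000, 40
x = rng.uniform(0.0, 1.0, n_sims)
decline = np.exp(-np.arange(n_years) / 10.0)
Y = 100.0 * np.outer(1.0 + 0.5 * x, decline)
Y += rng.normal(0.0, 0.5, (n_sims, n_years))

model = fit_pca_regression(x, Y, n_components=2)
profile = predict_profile(model, 0.3)  # interpolated profile, length 40
```

With several input parameters, the same structure would apply with more
columns in the design matrix, although the PLS approach mentioned in the
quoted text is the one suggested for that case.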

I have had some success using time-based RBF interpolation, but I am
always open to other possible implementations (the one I am using can
easily fail for strange combinations of input parameters).
Unfortunately, my understanding of your explanation is very limited: I
am not an expert at all, so it's a bit hard for me to translate the
mathematical details into something I can understand. If you have some
example code (even a very trivial one) for me to study so that I can
understand what the code is actually doing, I would be more than
grateful for your help :-)

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

==> Never *EVER* use RemovalGroup for your house removal. You'll regret it forever.
http://thedoomedcity.blogspot.com/2010/03/removal-group-nightmare.html <==