The last time I worked on a large financial modeling system, we passed all the data around as a map and configured multiple processing steps as simple transformations on the map.
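To make that concrete, here is a minimal sketch of the style I mean (the names and the scaling step are made up for illustration, not code from that system): each record is a Map from variable name to value, and each processing step is just a function from map to map.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.UnaryOperator;

    public class MapPipeline {
        // One processing step: scale a single field, leaving everything
        // else in the record untouched.
        static UnaryOperator<Map<String, Double>> scale(String field, double factor) {
            return record -> {
                Map<String, Double> out = new HashMap<>(record);
                out.computeIfPresent(field, (k, v) -> v * factor);
                return out;
            };
        }

        public static void main(String[] args) {
            Map<String, Double> record = new HashMap<>();
            record.put("income", 50000.0);
            record.put("age", 37.0);
            record.put("mysteryField", 1.0); // unknown items just ride along

            // income is scaled; age and mysteryField pass through untouched
            System.out.println(scale("income", 0.001).apply(record));
        }
    }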
That approach was particularly nice when it came time for variable selection, because it is easy to ignore items in a map that you don't know about. The performance was entirely acceptable since the highest throughput needed was a few thousand decisions per second.

I would expect that for high-end modeling this approach might be a bit slow (though perhaps fast enough for a sparse matrix app), but more importantly, a fully general map is probably much too heavyweight to build large-scale applications around. One way to avoid this would be to have an input format that splits abstract sparse matrices into views, each of which contains the column labels (once) but stores the data in a terse format. That would let the application maintain the fiction of a map-based implementation while avoiding the memory and object-creation overhead (a rough sketch is at the end of this message).

On 2/15/08 10:21 AM, "Isabel Drost" <[EMAIL PROTECTED]> wrote:

> On Friday 15 February 2008, Lukas Vlcek wrote:
>> Speaking about various implementations of some algorithm it reminds me that
>> folks in commons-math faced the same situation and they decided to use
>> Strategy design pattern for this (good choice I think). See
>> http://commons.apache.org/math/index.html#summary item #4.
>
> I would even guess, we should look into the Chain of Commands lib. Usually you
> do not want to do only one pre processing step but you want to do a
> configurable chain. Something along the lines of first selecting only
> relevant features, then scaling the data...
>
> Cheers,
> Isabel
>
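Coming back to the sparse view idea above, here is a rough sketch of what I have in mind (all names hypothetical): the column labels are stored once per view, the row data lives in terse parallel arrays, and a map-style lookup preserves the fiction that everything is still a map.

    import java.util.Arrays;

    public class SparseView {
        private final String[] columns; // labels stored once for the view
        private final int[] indices;    // column index of each stored value
        private final double[] values;  // the terse row data

        public SparseView(String[] columns, int[] indices, double[] values) {
            this.columns = columns;
            this.indices = indices;
            this.values = values;
        }

        // Map-style lookup: a column that isn't stored simply reads as 0,
        // so callers can still ignore items they don't know about.
        public double get(String column) {
            int col = Arrays.asList(columns).indexOf(column);
            for (int i = 0; i < indices.length; i++) {
                if (indices[i] == col) {
                    return values[i];
                }
            }
            return 0.0;
        }
    }

The point is that the labels are paid for once per view rather than once per entry, which is exactly where the fully general map loses.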

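And for what it's worth, the configurable chain Isabel describes falls out almost for free in the map world. A minimal sketch (again with made-up names), with the variable selection step first and any scaling steps after it:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.UnaryOperator;

    public class PreprocessingChain {
        // A chain is just an ordered list of map transformations applied in turn.
        static Map<String, Double> run(List<UnaryOperator<Map<String, Double>>> steps,
                                       Map<String, Double> record) {
            for (UnaryOperator<Map<String, Double>> step : steps) {
                record = step.apply(record);
            }
            return record;
        }

        // The variable selection step: keep only the named features.
        static UnaryOperator<Map<String, Double>> select(String... keep) {
            return record -> {
                Map<String, Double> out = new HashMap<>();
                for (String key : keep) {
                    if (record.containsKey(key)) {
                        out.put(key, record.get(key));
                    }
                }
                return out;
            };
        }
    }

If I remember right, the Context in Commons Chain is itself a Map, so that library would give you much the same shape with less hand-rolled plumbing.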