The last time I worked on a large financial modeling system, we passed all the data around as a map and configured multiple processing steps as simple transformations on the map.
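To make that concrete, here is a minimal sketch of the style I mean (the names and the scaling step are made up for illustration, not code from that system): each record is a Map from variable name to value, and each processing step is just a function from map to map.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.UnaryOperator;

    public class MapPipeline {
        // One processing step: scale a single field, leaving everything
        // else in the record untouched.
        static UnaryOperator<Map<String, Double>> scale(String field, double factor) {
            return record -> {
                Map<String, Double> out = new HashMap<>(record);
                out.computeIfPresent(field, (k, v) -> v * factor);
                return out;
            };
        }

        public static void main(String[] args) {
            Map<String, Double> record = new HashMap<>();
            record.put("income", 50000.0);
            record.put("age", 37.0);
            record.put("mysteryField", 1.0); // unknown items just ride along

            // income is scaled; age and mysteryField pass through untouched
            System.out.println(scale("income", 0.001).apply(record));
        }
    }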
That approach was particularly nice when it came time for variable selection, because it is easy to ignore items in a map that you don't know about. The performance was entirely acceptable since the highest throughput needed was a few thousand decisions per second.

I would expect that for high-end modeling this approach might be a bit slow (though perhaps fast enough for a sparse matrix app), but more importantly, a fully general map is probably much too heavyweight to build large-scale applications around. One way to avoid this would be to have an input format that splits abstract sparse matrices into views, each of which contains the column labels (once) but stores the data in a terse format. That would let the application maintain the fiction of a map-based implementation while avoiding the memory and object-creation overhead (a rough sketch is at the end of this message).

On 2/15/08 10:21 AM, "Isabel Drost" <[EMAIL PROTECTED]> wrote:

> On Friday 15 February 2008, Lukas Vlcek wrote:
>> Speaking about various implementations of some algorithm it reminds me that
>> folks in commons-math faced the same situation and they decided to use
>> Strategy design pattern for this (good choice I think). See
>> http://commons.apache.org/math/index.html#summary item #4.
>
> I would even guess, we should look into the Chain of Commands lib. Usually you
> do not want to do only one pre processing step but you want to do a
> configurable chain. Something along the lines of first selecting only
> relevant features, then scaling the data...
>
> Cheers,
> Isabel
>
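Coming back to the sparse view idea above, here is a rough sketch of what I have in mind (all names hypothetical): the column labels are stored once per view, the row data lives in terse parallel arrays, and a map-style lookup preserves the fiction that everything is still a map.

    import java.util.Arrays;

    public class SparseView {
        private final String[] columns; // labels stored once for the view
        private final int[] indices;    // column index of each stored value
        private final double[] values;  // the terse row data

        public SparseView(String[] columns, int[] indices, double[] values) {
            this.columns = columns;
            this.indices = indices;
            this.values = values;
        }

        // Map-style lookup: a column that isn't stored simply reads as 0,
        // so callers can still ignore items they don't know about.
        public double get(String column) {
            int col = Arrays.asList(columns).indexOf(column);
            for (int i = 0; i < indices.length; i++) {
                if (indices[i] == col) {
                    return values[i];
                }
            }
            return 0.0;
        }
    }

The point is that the labels are paid for once per view rather than once per entry, which is exactly where the fully general map loses.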

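And for what it's worth, the configurable chain Isabel describes falls out almost for free in the map world. A minimal sketch (again with made-up names), with the variable selection step first and any scaling steps after it:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.UnaryOperator;

    public class PreprocessingChain {
        // A chain is just an ordered list of map transformations applied in turn.
        static Map<String, Double> run(List<UnaryOperator<Map<String, Double>>> steps,
                                       Map<String, Double> record) {
            for (UnaryOperator<Map<String, Double>> step : steps) {
                record = step.apply(record);
            }
            return record;
        }

        // The variable selection step: keep only the named features.
        static UnaryOperator<Map<String, Double>> select(String... keep) {
            return record -> {
                Map<String, Double> out = new HashMap<>();
                for (String key : keep) {
                    if (record.containsKey(key)) {
                        out.put(key, record.get(key));
                    }
                }
                return out;
            };
        }
    }

If I remember right, the Context in Commons Chain is itself a Map, so that library would give you much the same shape with less hand-rolled plumbing.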