Re: Mahalanobis Distance Implementation

Ted Dunning Sat, 17 Jul 2010 14:41:04 -0700

On Sat, Jul 17, 2010 at 2:07 PM, Nicolas Maillot <[email protected]> wrote:


> Hi all,
>
> I would like to contribute to Mahout by starting with a simple task.
>
> I had the idea of implementing the Mahalanobis distance:
> http://en.wikipedia.org/wiki/Mahalanobis_distance
> This should be quite easy and could be a useful feature.
>
> After looking at the Mahout code for a couple of hours, I have four
> questions:
>
> 1) Generally speaking, what is the level of reliability of algorithms
> implemented in matrix.linalg ?
>

They should be reasonably good, but they mostly lack unit tests.  Further,
most of them have not yet been converted to use Vector and Matrix, which will
probably be necessary over time.  I have done a little of this conversion,
but quite a bit more is needed.

As you find bits that you want to use, you should convert them over.  This
will be a bit like pulling a thread on a sweater in that it will cause lots
of additional bits to need conversion.  I will help with this.  As part of
the conversion, we should add unit tests as well.


> 2) As inverting a covariance matrix is required, is using the method
> inverse(DoubleMatrix2D A) from class Algebra a good way to achieve this ?
>

Generally it is considered very bad practice to invert a matrix.  Instead,
you should use some sort of easily solved matrix decomposition.  Commonly,
QR decomposition is used, but SVD might be useful in certain situations.

As I remember Mahalanobis, it uses moments to estimate principal components
in the form of the inverse covariance matrix.  Mahalanobis distances are
then computed in this reference frame.  This should be amenable to QR
decomposition.
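
To make the "decompose, don't invert" point concrete, here is a rough sketch
of the squared distance d^T S^-1 d computed by solving S z = d through a QR
factorization and then taking d . z.  Plain double[][] arrays stand in for
Mahout's Matrix and Vector, the class and method names are made up for
illustration, the QR step is a simple modified Gram-Schmidt, and the 1e-12
singularity threshold is an arbitrary assumption.  Treat it as a sketch, not
as the way Mahout should do it.

```java
// Sketch only: plain arrays stand in for Mahout's Matrix and Vector types.
// Class and method names are hypothetical.
final class MahalanobisSketch {

    /** Solves S z = d using S = QR (modified Gram-Schmidt); never forms the inverse of S. */
    static double[] solveByQr(double[][] s, double[] d) {
        int n = d.length;
        double[][] q = new double[n][n]; // q[j] holds column j of Q
        double[][] r = new double[n][n]; // upper triangular factor
        for (int j = 0; j < n; j++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) {
                v[i] = s[i][j]; // copy column j of S
            }
            // Remove the components along the columns of Q found so far.
            for (int k = 0; k < j; k++) {
                double dot = 0.0;
                for (int i = 0; i < n; i++) {
                    dot += q[k][i] * v[i];
                }
                r[k][j] = dot;
                for (int i = 0; i < n; i++) {
                    v[i] -= dot * q[k][i];
                }
            }
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                norm += v[i] * v[i];
            }
            norm = Math.sqrt(norm);
            if (norm < 1e-12) { // assumed threshold, not tuned
                throw new IllegalArgumentException("covariance matrix is singular or nearly so");
            }
            r[j][j] = norm;
            for (int i = 0; i < n; i++) {
                q[j][i] = v[i] / norm;
            }
        }
        // Back substitution on R z = Q^T d.
        double[] z = new double[n];
        for (int j = n - 1; j >= 0; j--) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += q[j][i] * d[i]; // entry j of Q^T d
            }
            for (int k = j + 1; k < n; k++) {
                sum -= r[j][k] * z[k];
            }
            z[j] = sum / r[j][j];
        }
        return z;
    }

    /** Mahalanobis distance between a and b for the given covariance matrix. */
    static double distance(double[][] covariance, double[] a, double[] b) {
        int n = a.length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++) {
            d[i] = a[i] - b[i];
        }
        double[] z = solveByQr(covariance, d);
        double squared = 0.0;
        for (int i = 0; i < n; i++) {
            squared += d[i] * z[i];
        }
        return Math.sqrt(Math.max(squared, 0.0)); // clamp tiny negative rounding error
    }
}
```

In a real DistanceMeasure the factorization would be computed once when the
covariance is set and reused for every distance call, rather than redone per
pair as it is here.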


> 3) Does it look reasonable if the computation of the distance is based on
> dense matrices (inside the distance(Vector v1, Vector v2) method of the
> future MahalanobisDistanceMeasure class)?
>

For use in recommendations, it should be assumed that the inputs are sparse,
but internal products might well be dense if they are fixed in size and
relatively small.  The computation of distance should accept sparse vectors,
but projecting them down to small dense vectors to do the distance
computation is just fine.
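
As a rough illustration of accepting a sparse input and projecting it down to
a small dense vector, the sketch below multiplies a small dense k x n matrix
by a sparse vector, touching only the non-zero entries.  A
Map<Integer, Double> stands in for a Mahout sparse Vector, and the projection
matrix p (for example, a few leading principal directions) is a hypothetical
input.  Neither is taken from the Mahout API.

```java
import java.util.Map;

// Sketch only: a Map stands in for a sparse Vector; names are hypothetical.
final class SparseProjectionSketch {

    /** Computes y = P x, where x is sparse (index -> value) and P is a dense k x n matrix. */
    static double[] project(double[][] p, Map<Integer, Double> sparse) {
        double[] y = new double[p.length];
        // Only the non-zero entries of x contribute, so the cost is
        // k * (number of non-zeros) rather than k * n.
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            int col = e.getKey();
            double value = e.getValue();
            for (int row = 0; row < p.length; row++) {
                y[row] += p[row][col] * value;
            }
        }
        return y;
    }
}
```

The resulting k-element arrays are small and dense, so the distance itself
can then be computed with ordinary dense arithmetic.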


> 4) What is the best way to create a DoubleMatrix1D from a Vector ?
>

Don't.

Just convert the code using the DoubleMatrix1D as mentioned above.
