kpretterhofer opened a new pull request #1153:
URL: https://github.com/apache/systemds/pull/1153


   This implementation computes a simple Gaussian classifier, i.e., it outputs 
the respective parameters that are needed for classification.
   
   As input, the function receives a feature matrix and a target vector, plus 
a small value for smoothing the variances, which prevents numerical errors. 
The function computes and returns:
   
   - prior probability
   - means
   - determinants
   - inverse covariance matrix
   
   per class. 
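   For classification, one can compute: p(C=c | x) ∝ p(x | c) * p(c)
   where p(x | c) is the (multivariate) Gaussian PDF for class c, and p(c) is 
the prior probability for class c. The normalizer p(x) is the same for every 
class, so it can be dropped when picking the most probable class. 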
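   To make the returned parameters and the classification rule concrete, here 
is a minimal sketch in plain Python. It is not the DML code from this PR: the 
function names (`fit_gaussian_classifier`, `log_posterior`, `predict`), the 
dictionary layout, the use of the sample covariance (dividing by k - 1), and 
adding the smoothing value to the diagonal of the covariance matrix are all 
assumptions made for illustration. Scores are computed in log space, which 
gives the same argmax as p(x | c) * p(c).

```python
import math

def mat_inverse_and_det(a):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, det)."""
    n = len(a)
    # Augment a copy of `a` with the identity matrix.
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        p = m[col][col]
        det *= p
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m], det

def fit_gaussian_classifier(X, y, var_smoothing=1e-9):
    """Per class: prior, mean, determinant and inverse of the covariance."""
    n, d = len(X), len(X[0])
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        k = len(rows)
        mean = [sum(col) / k for col in zip(*rows)]
        # Assumption: sample covariance, smoothing added to the diagonal.
        cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (k - 1)
                + (var_smoothing if i == j else 0.0)
                for j in range(d)] for i in range(d)]
        inv_cov, det = mat_inverse_and_det(cov)
        params[c] = {"prior": k / n, "mean": mean, "det": det, "inv_cov": inv_cov}
    return params

def log_posterior(x, p):
    """log p(x | c) + log p(c), i.e. the unnormalized log posterior."""
    d = len(x)
    diff = [xi - mi for xi, mi in zip(x, p["mean"])]
    maha = sum(diff[i] * p["inv_cov"][i][j] * diff[j]
               for i in range(d) for j in range(d))
    log_pdf = -0.5 * (d * math.log(2 * math.pi) + math.log(p["det"]) + maha)
    return log_pdf + math.log(p["prior"])

def predict(x, params):
    """Return the class with the highest unnormalized log posterior."""
    return max(params, key=lambda c: log_posterior(x, params[c]))
```

   With a feature matrix `X` given as a list of rows and a label list `y`, 
`predict(x, fit_gaussian_classifier(X, y))` returns the most probable label 
for a new row `x`. Each class needs at least two samples here, since the 
sample covariance divides by k - 1.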
   
   Please let me know if and how I can improve the code so that it fits well 
into SystemDS. 
   
   One thing I was quite unsure about was the unit tests. Since calculating 
determinants and inverses of the covariance matrices can introduce floating 
point errors, I was not sure how to compare the results. As suggested on the 
mailing list, I compared most of them using the avg. bit distance, with a 
fairly high maxUnitsOfLeastPrecision. 
   Although the values of the inverse covariance matrices can differ a lot 
(SystemDS vs. R), I am pretty sure that the computation is correct, since 
multiplying the inverse with the covariance matrix itself yields the identity 
(which I tested during development). 
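   The two checks mentioned above can be sketched in plain Python as follows. 
This is only an illustration of the idea, not the SystemDS test utilities: 
the helper names (`ulp_distance`, `avg_ulp_distance`, `max_identity_residual`) 
are hypothetical. The bit distance counts how many representable doubles lie 
between two values, and the residual check measures how far A * A^-1 is from 
the identity. The latter is usually the more robust test for an inverse, 
because the individual entries of the inverse of a poorly conditioned matrix 
can differ noticeably between implementations.

```python
import struct

def ulp_distance(a, b):
    """Number of representable doubles between a and b (0 if identical)."""
    def ordered(x):
        # Reinterpret the IEEE 754 double as a signed 64-bit integer.
        bits = struct.unpack("<q", struct.pack("<d", x))[0]
        # Map the sign-magnitude pattern to a monotonically ordered integer.
        return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)
    return abs(ordered(a) - ordered(b))

def avg_ulp_distance(expected, actual):
    """Average bit distance over all entries of two equally sized matrices."""
    dists = [ulp_distance(e, a)
             for row_e, row_a in zip(expected, actual)
             for e, a in zip(row_e, row_a)]
    return sum(dists) / len(dists)

def max_identity_residual(a, a_inv):
    """Largest absolute entry of A * A^-1 - I."""
    n = len(a)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(a[i][k] * a_inv[k][j] for k in range(n))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(entry - target))
    return worst
```

   A test could then assert that `max_identity_residual(cov, inv_cov)` stays 
below a small tolerance, in addition to (or instead of) comparing the inverse 
entry by entry against R.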


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

