Baunsgaard commented on pull request #1123:
URL: https://github.com/apache/systemds/pull/1123#issuecomment-743764247


   1. I changed the NaN replacement because the NaN values are introduced in 
cases of division by zero. Therefore it made sense to apply the replacement to 
the scale factor instead. This will of course not remove NaN values that 
already exist in the matrix, but I would say it is fair to expect such a 
cleanup before calling PCA, even if it changes the external behavior.
   2. The changes in the API allow us to write a PCA predict function, so that 
you can "train" a PCA using the methods already provided, reconstruct an 
approximation of the original data using the extra returned parameters, and 
apply the PCA to unseen data. If colMeans and the scale factor are not 
returned, this is impossible. Furthermore, the end user can simply ignore those 
return values with no impact on performance.
   3. I wanted a measure of how much a PCA is affected by lossy compression, 
since this gives a fair measure of the lost data. Therefore I needed the extra 
outputs in the API.
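
To illustrate the three points above, here is a minimal NumPy sketch. It is not
the SystemDS DML implementation: the function names (pca_fit, pca_predict,
pca_reconstruct, reconstruction_error) are hypothetical, and scaling by the
column standard deviation is an assumption. It only shows why patching the
scale factor avoids the division-by-zero NaN, and why colMeans and the scale
factor must be returned for prediction and reconstruction.

```python
import numpy as np

def pca_fit(X, k):
    """Hypothetical fit: returns the projection plus everything needed to reuse it."""
    col_means = X.mean(axis=0)
    Xc = X - col_means
    # Assumed scaling: sample standard deviation per column.
    scale_factor = np.sqrt((Xc ** 2).sum(axis=0) / (X.shape[0] - 1))
    # A constant column has a zero scale factor, and 0/0 would yield NaN.
    # Patch the scale factor itself instead of replacing NaN afterwards.
    scale_factor[scale_factor == 0] = 1.0
    Xs = Xc / scale_factor
    cov = (Xs.T @ Xs) / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]   # top-k eigenvectors
    components = eigvecs[:, order]
    return Xs @ components, components, col_means, scale_factor

def pca_predict(X_new, components, col_means, scale_factor):
    """Project unseen data with the statistics of the training data."""
    return ((X_new - col_means) / scale_factor) @ components

def pca_reconstruct(Z, components, col_means, scale_factor):
    """Map projected data back to an approximation of the original space."""
    return (Z @ components.T) * scale_factor + col_means

def reconstruction_error(X, X_hat):
    """Relative Frobenius-norm error, one possible measure of lost information."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 3] = 2.0                               # constant column: zero variance

Z, C, mu, s = pca_fit(X, k=2)
X_hat = pca_reconstruct(Z, C, mu, s)
err = reconstruction_error(X, X_hat)        # nonzero because k < number of columns
```

A caller who only wants the projection can ignore the extra return values.
Comparing the reconstruction error for a fit on the original matrix against a
fit on a lossily compressed copy is one way to quantify what the compression
lost.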
   
   

