Baunsgaard commented on pull request #1123: URL: https://github.com/apache/systemds/pull/1123#issuecomment-743764247
1. I changed the NaN replacement because NaN values are introduced by division by zero, so it made sense to apply the replacement to the scale factor instead. This of course does not remove NaN values already present in the matrix, but I would say it is fair to expect that cleanup before calling PCA, even if it changes the external behavior.
2. The API changes allow us to write a PCA predict function: you can "train" a PCA using the methods already provided, reconstruct an approximation of the original data from the extra returned parameters, and apply the PCA to unseen data. If colMeans and scale are not returned, this is impossible. Furthermore, the end user can simply ignore those return values with no impact on performance.
3. I wanted a measure of how much a PCA is affected by lossy compression, since this gives a fair measure of the lost data; therefore I needed the extra outputs in the API.
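To illustrate the three points, here is a minimal sketch in Python with NumPy rather than DML. It is not the SystemDS implementation: the function names `pca_fit`, `pca_predict`, and `pca_reconstruct` are hypothetical, and replacing a zero or NaN scale factor with 1 is an assumption about what a sensible replacement value would be.

```python
import numpy as np


def pca_fit(X, k):
    """Fit PCA; also return the column means and scale factor (hypothetical API)."""
    X = np.asarray(X, dtype=float)
    col_means = X.mean(axis=0)
    Xc = X - col_means
    scale_factor = Xc.std(axis=0, ddof=1)
    # A constant column has zero spread, and dividing by it would yield NaN (0/0).
    # Assumption: patch the scale factor itself (with 1) instead of patching
    # NaN values in the data afterwards.
    bad = (scale_factor == 0) | np.isnan(scale_factor)
    scale_factor[bad] = 1.0
    Xs = Xc / scale_factor
    cov = (Xs.T @ Xs) / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    return Xs @ components, components, col_means, scale_factor


def pca_predict(X_new, components, col_means, scale_factor):
    """Project unseen data using the centering and scaling learned during fit."""
    X_new = np.asarray(X_new, dtype=float)
    return ((X_new - col_means) / scale_factor) @ components


def pca_reconstruct(Z, components, col_means, scale_factor):
    """Map projected data back to an approximation in the original space."""
    return (Z @ components.T) * scale_factor + col_means


# Small example: the third column is constant, so its scale factor would be 0.
X_train = np.array([[1.0, 2.0, 5.0],
                    [2.0, 1.0, 5.0],
                    [3.0, 4.0, 5.0],
                    [4.0, 3.0, 5.0]])
X_new = np.array([[2.5, 2.0, 5.0]])

Z_train, components, col_means, scale_factor = pca_fit(X_train, k=2)
Z_new = pca_predict(X_new, components, col_means, scale_factor)
X_approx = pca_reconstruct(Z_train, components, col_means, scale_factor)

# Mean squared reconstruction error: one possible measure of how much
# information a lossy step (or dropping components) has removed.
reconstruction_error = np.mean((X_train - X_approx) ** 2)
```

Without `col_means` and `scale_factor` in the return values, neither `pca_predict` nor `pca_reconstruct` could be written, and a caller who does not need them can simply discard them.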
