Baunsgaard commented on pull request #1123:
URL: https://github.com/apache/systemds/pull/1123#issuecomment-743772534


   > ad 1) besides the changed overall behavior, the comment referred to 
`replace(target=ScaleFactor, pattern=NaN, replacement=1e-16);`, which would 
need to replace zero as this happens before the division.
   
   Yes, and exactly why i replace all the 0 values, to avoid introducing NaN. 
instead of removing the NaN on the full matrix after the division operation.
    
   > ad 2/3) for local experiments this is fine, but for the hierarchy of 
builtin functions, we should aim for simple and stable APIs and find other ways 
to automatically remove redundancy (e.g., by lineage-based reuse).
   
   I agree simplicity is of utmost importance, but with the current PCA, we are 
limited to only apply the standard version without scaling if we want to be 
able to reuse the model on unseen data. This results in a limited system 
overall. The redundancy of the extra return values, should be handled already 
by the system.
   
   > Passing all these intermediates around would quickly become really messy 
and confusing for both users and developers. Which PCA predict do you refer to 
- is there already a builtin function for it?
   
   I agree that it becomes messy, and it have been for a while, especially if 
you consider the neural network part of the system. where each layers weight 
and bias is parsed back. But these returns are necessary for inference on 
unseen data. I have added a PR with the PCA predict #1124 . Maybe it could be 
nice containing all the methods for an algorithm in the same file, such that 
the predict is located inside the same file?
   
   > Generally, please don't rewrite the builtin functions just to make them 
more amenable for compressed operations.
   
   I would argue that these changes are for the "greater good", and therefore 
also better for compression, I am avoiding changing things just for 
compressions sake, but thanks for the reminder :smile: .
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to