Baunsgaard commented on pull request #1123:
URL: https://github.com/apache/systemds/pull/1123#issuecomment-743772534
> ad 1) besides the changed overall behavior, the comment referred to
`replace(target=ScaleFactor, pattern=NaN, replacement=1e-16);`, which would
need to replace zero as this happens before the division.
Yes, and exactly why i replace all the 0 values, to avoid introducing NaN.
instead of removing the NaN on the full matrix after the division operation.
> ad 2/3) for local experiments this is fine, but for the hierarchy of
builtin functions, we should aim for simple and stable APIs and find other ways
to automatically remove redundancy (e.g., by lineage-based reuse).
I agree simplicity is of utmost importance, but with the current PCA, we are
limited to only apply the standard version without scaling if we want to be
able to reuse the model on unseen data. This results in a limited system
overall. The redundancy of the extra return values, should be handled already
by the system.
> Passing all these intermediates around would quickly become really messy
and confusing for both users and developers. Which PCA predict do you refer to
- is there already a builtin function for it?
I agree that it becomes messy, and it have been for a while, especially if
you consider the neural network part of the system. where each layers weight
and bias is parsed back. But these returns are necessary for inference on
unseen data. I have added a PR with the PCA predict #1124 . Maybe it could be
nice containing all the methods for an algorithm in the same file, such that
the predict is located inside the same file?
> Generally, please don't rewrite the builtin functions just to make them
more amenable for compressed operations.
I would argue that these changes are for the "greater good", and therefore
also better for compression, I am avoiding changing things just for
compressions sake, but thanks for the reminder :smile: .
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]