Baunsgaard commented on a change in pull request #1124:
URL: https://github.com/apache/systemds/pull/1124#discussion_r541622750
##########
File path: scripts/builtin/scale.dml
##########
@@ -19,29 +19,48 @@
#
#-------------------------------------------------------------
-# Scale and center individual features in the input matrix
-# (column-wise) using z-score to scale the values.
-# -----------------------------------------------------------------------------
-# NAME TYPE DEFAULT MEANING
-# -----------------------------------------------------------------------------
-# X Matrix --- Input feature matrix
-# center Boolean TRUE Indicates whether or not to center the feature matrix
-# scale Boolean TRUE Indicates whether or not to scale the feature matrix
-# -----------------------------------------------------------------------------
-# Y Matrix --- Output feature matrix with K columns
-# -----------------------------------------------------------------------------
+# Scale and center individual features in the input matrix (column-wise) using z-score to scale the values.
+# ---------------------------------------------------------------------------------------------
+# NAME TYPE DEFAULT MEANING
+# ---------------------------------------------------------------------------------------------
+# X Matrix --- Input feature matrix
+# center Boolean TRUE Indicates whether or not to center the feature matrix
+# scale Boolean TRUE Indicates whether or not to scale the feature matrix
+# ---------------------------------------------------------------------------------------------
+# Y Matrix --- Output feature matrix with the same dimensions as X
+# ColMean Matrix --- The column means of the input, subtracted if center was TRUE
+# ScaleFactor Matrix --- The scaling factor of each column, used to give every dimension a similar value range
+# ---------------------------------------------------------------------------------------------
-m_scale = function(Matrix[Double] X, Boolean center, Boolean scale) return (Matrix[Double] Y) {
- if( center )
- X = X - colMeans(X);
+m_scale = function(Matrix[Double] X, Boolean center, Boolean scale)
+ return (Matrix[Double] Y, Matrix[Double] ColMean, Matrix[Double] ScaleFactor)
+{
+ if(center){
+ ColMean = colMeans(X)
+ X = X - ColMean
+ }
+ else {
+ ColMean = matrix(0,rows=0,cols=0)
+ }
if (scale) {
- cvars = colSums(X^2)/(nrow(X)-1);
+ N = nrow(X);
+
+ ScaleFactor = sqrt(colSums(X^2)/(N-1))
- #scale by std-dev and replace NaNs with 0's
- X = replace(target=X/sqrt(cvars),
- pattern=NaN, replacement=0);
+ # Replace entries in the scale factor that are 0 with a small eps
+ # to avoid division by 0, which would introduce NaN into the output.
+ ScaleFactor = replace(target=ScaleFactor,
+ pattern=NaN, replacement=1e-16);
Review comment:
Alternatively, we could replace these values with 1. That would probably be
better, since a factor of 1 removes the scaling of that dimension entirely.
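The difference between the two replacement values shows up once the returned ScaleFactor is reused on new data, which the new ColMean/ScaleFactor outputs appear to be intended for (an assumption). Below is a minimal pure-Python sketch, not SystemDS code: `scale_columns`, `apply_scale`, and the `fill` parameter are hypothetical names. It replaces factors that are exactly 0 as well as NaN, since `replace(..., pattern=NaN, ...)` only matches NaN, while a factor can be exactly 0 (for example, a constant column after centering).

```python
import math


def scale_columns(X, center=True, scale=True, fill=1.0):
    """Column-wise z-score scaling of a row-major list-of-lists matrix.

    Returns (Y, col_mean, scale_factor), mirroring the three outputs of the
    DML function under review. `fill` (hypothetical parameter) replaces any
    scale factor that is 0 or NaN.
    """
    n, m = len(X), len(X[0])
    Y = [row[:] for row in X]
    col_mean = []
    if center:
        col_mean = [sum(Y[i][j] for i in range(n)) / n for j in range(m)]
        Y = [[Y[i][j] - col_mean[j] for j in range(m)] for i in range(n)]
    scale_factor = []
    if scale:
        for j in range(m):
            ss = sum(Y[i][j] ** 2 for i in range(n))
            # Same formula as sqrt(colSums(X^2)/(N-1)); Python raises on
            # division by zero, so N == 1 is treated as NaN here.
            sf = math.sqrt(ss / (n - 1)) if n > 1 else float("nan")
            if sf == 0.0 or math.isnan(sf):
                sf = fill  # 0 or NaN factor: substitute the fill value
            scale_factor.append(sf)
        Y = [[Y[i][j] / scale_factor[j] for j in range(m)] for i in range(n)]
    return Y, col_mean, scale_factor


def apply_scale(X, col_mean, scale_factor):
    """Hypothetical helper: reuse the fitted outputs on new rows."""
    m = len(X[0])
    mean = col_mean or [0.0] * m          # empty means "not centered"
    factor = scale_factor or [1.0] * m    # empty means "not scaled"
    return [[(row[j] - mean[j]) / factor[j] for j in range(m)] for row in X]


train = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # second column is constant
test = [[2.0, 6.0]]                             # unseen value in that column

for fill in (1e-16, 1.0):
    _, mu, sf = scale_columns(train, fill=fill)
    print(fill, apply_scale(test, mu, sf))
```

With `fill=1e-16`, the unseen value in the constant column is amplified to roughly 1e16; with `fill=1.0`, that column is only shifted by its mean, which is the "remove the scaling of that dimension" behavior suggested above.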