Baunsgaard commented on a change in pull request #1124:
URL: https://github.com/apache/systemds/pull/1124#discussion_r541622750



##########
File path: scripts/builtin/scale.dml
##########
@@ -19,29 +19,48 @@
 #
 #-------------------------------------------------------------
 
-# Scale and center individual features in the input matrix 
-# (column-wise) using z-score to scale the values.
-# -----------------------------------------------------------------------------
-# NAME   TYPE    DEFAULT  MEANING
-# -----------------------------------------------------------------------------
-# X      Matrix  ---      Input feature matrix
-# center Boolean TRUE     Indicates whether or not to center the feature matrix
-# scale  Boolean TRUE     Indicates whether or not to scale the feature matrix
-# -----------------------------------------------------------------------------
-# Y            Matrix ---  Output feature matrix with K columns
-# -----------------------------------------------------------------------------
+# Scale and center individual features in the input matrix (column wise.) using z-score to scale the values.
+# ---------------------------------------------------------------------------------------------
+# NAME         TYPE      DEFAULT  MEANING
+# ---------------------------------------------------------------------------------------------
+# X            Matrix    ---      Input feature matrix
+# Center       Boolean   TRUE     Indicates whether or not to center the feature matrix
+# Scale        Boolean   TRUE     Indicates whether or not to scale the feature matrix
+# ---------------------------------------------------------------------------------------------
+# Y            Matrix    ---      Output feature matrix with K columns
+# ColMean      Matrix    ---      The column means of the input, subtracted if Center was TRUE
+# ScaleFactor  Matrix    ---      The Scaling of the values, to make each dimension have similar value ranges
+# ---------------------------------------------------------------------------------------------
 
-m_scale = function(Matrix[Double] X, Boolean center, Boolean scale) return (Matrix[Double] Y) {
-  if( center )
-    X = X - colMeans(X);
+m_scale = function(Matrix[Double] X, Boolean center, Boolean scale) 
+  return (Matrix[Double] Y, Matrix[Double] ColMean, Matrix[Double] ScaleFactor)
+{
+  if(center){
+    ColMean = colMeans(X)
+    X =  X - ColMean
+  }
+  else {
+    ColMean = matrix(0,rows=0,cols=0)
+  }
 
   if (scale) {
-    cvars = colSums(X^2)/(nrow(X)-1);
+    N = nrow(X);
+
+    ScaleFactor = sqrt(colSums(X^2)/(N-1))
 
-    #scale by std-dev and replace NaNs with 0's
-    X = replace(target=X/sqrt(cvars),
-      pattern=NaN, replacement=0);
+    # Replace entries in the scale factor that are 0 with an small eps
+    # To avoid division by 0, and thereby introducing NaN to the ouput.
+    ScaleFactor = replace(target=ScaleFactor,
+      pattern=NaN, replacement=1e-16);

Review comment:
       Alternatively, we could replace these values with 1. That might be 
better, since dividing by 1 leaves that dimension unscaled.
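
To illustrate the difference, here is a minimal DML sketch. It is not the PR's code. The toy matrices `X` and `Xnew` are hypothetical, and it assumes the returned `ScaleFactor` is later applied to other data. It also assumes the zero entries are the ones to be replaced, as the code comment says, so it uses `pattern=0` where the diff uses `pattern=NaN`.

```dml
# Hypothetical 3x2 input: column 1 varies, column 2 is constant.
X = matrix("1 5 2 5 3 5", rows=3, cols=2)
colMean = colMeans(X)
Xc = X - colMean

# Column-wise standard deviation; the constant column yields 0.
sf = sqrt(colSums(Xc^2) / (nrow(X) - 1))

# Option A: replace zero scale factors with a tiny epsilon.
sfEps = replace(target=sf, pattern=0, replacement=1e-16)
# Option B: replace zero scale factors with 1 (no scaling).
sfOne = replace(target=sf, pattern=0, replacement=1)

# On X itself both options give 0 in column 2, because the
# centered values there are exactly 0.
print(toString(Xc / sfEps))
print(toString(Xc / sfOne))

# Hypothetical new row whose second feature deviates slightly.
Xnew = matrix("2 5.1", rows=1, cols=2)

# With epsilon the small deviation is divided by 1e-16 and becomes
# a value on the order of 1e15; with 1 it stays close to 0.1.
print(toString((Xnew - colMean) / sfEps))
print(toString((Xnew - colMean) / sfOne))
```

On the input matrix itself the two choices behave the same. They only differ once the scale factor is reused on data that is not constant in that column.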




