Baunsgaard commented on a change in pull request #1124:
URL: https://github.com/apache/systemds/pull/1124#discussion_r541622750
##########
File path: scripts/builtin/scale.dml
##########
@@ -19,29 +19,48 @@
 #
 #-------------------------------------------------------------
-# Scale and center individual features in the input matrix
-# (column-wise) using z-score to scale the values.
-# -----------------------------------------------------------------------------
-# NAME         TYPE     DEFAULT  MEANING
-# -----------------------------------------------------------------------------
-# X            Matrix   ---      Input feature matrix
-# center       Boolean  TRUE     Indicates whether or not to center the feature matrix
-# scale        Boolean  TRUE     Indicates whether or not to scale the feature matrix
-# -----------------------------------------------------------------------------
-# Y            Matrix   ---      Output feature matrix with K columns
-# -----------------------------------------------------------------------------
+# Scale and center individual features in the input matrix (column wise.) using z-score to scale the values.
+# ---------------------------------------------------------------------------------------------
+# NAME         TYPE     DEFAULT  MEANING
+# ---------------------------------------------------------------------------------------------
+# X            Matrix   ---      Input feature matrix
+# Center       Boolean  TRUE     Indicates whether or not to center the feature matrix
+# Scale        Boolean  TRUE     Indicates whether or not to scale the feature matrix
+# ---------------------------------------------------------------------------------------------
+# Y            Matrix   ---      Output feature matrix with K columns
+# ColMean      Matrix   ---      The column means of the input, subtracted if Center was TRUE
+# ScaleFactor  Matrix   ---      The Scaling of the values, to make each dimension have similar value ranges
+# ---------------------------------------------------------------------------------------------
-m_scale = function(Matrix[Double] X, Boolean center, Boolean scale) return (Matrix[Double] Y) {
-  if( center )
-    X = X - colMeans(X);
+m_scale = function(Matrix[Double] X, Boolean center, Boolean scale)
+  return (Matrix[Double] Y, Matrix[Double] ColMean, Matrix[Double] ScaleFactor)
+{
+  if(center){
+    ColMean = colMeans(X)
+    X = X - ColMean
+  }
+  else {
+    ColMean = matrix(0,rows=0,cols=0)
+  }
   if (scale) {
-    cvars = colSums(X^2)/(nrow(X)-1);
+    N = nrow(X);
+
+    ScaleFactor = sqrt(colSums(X^2)/(N-1))
-    #scale by std-dev and replace NaNs with 0's
-    X = replace(target=X/sqrt(cvars),
-      pattern=NaN, replacement=0);
+    # Replace entries in the scale factor that are 0 with an small eps
+    # To avoid division by 0, and thereby introducing NaN to the ouput.
+    ScaleFactor = replace(target=ScaleFactor,
+      pattern=NaN, replacement=1e-16);

Review comment:
Alternatively, we could replace these values with 1. That may be better, since dividing by 1 leaves that dimension unscaled.
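
To make the difference between the two replacement values concrete, here is a minimal DML sketch. It is not part of the PR. It assumes that the returned `ColMean` and `ScaleFactor` are later reused on other data, and it matches zero entries explicitly with `pattern=0`, whereas the hunk above uses `pattern=NaN`. The names `Xc`, `sf`, `sfEps`, `sfOne`, and `Xnew` are hypothetical.

```dml
# Training data whose second column is constant, so its centered
# values are all 0 and its standard deviation is 0.
X = matrix("1 5 2 5 3 5", rows=3, cols=2);
ColMean = colMeans(X);
Xc = X - ColMean;
sf = sqrt(colSums(Xc^2)/(nrow(X)-1));   # second entry is 0

# Two candidate replacements for the zero entry (assumption: match 0, not NaN).
sfEps = replace(target=sf, pattern=0, replacement=1e-16);
sfOne = replace(target=sf, pattern=0, replacement=1);

# Hypothetical new row that deviates slightly in the constant column.
Xnew = matrix("2 5.5", rows=1, cols=2);
print(toString((Xnew - ColMean) / sfEps));  # second entry is 0.5/1e-16, which is very large
print(toString((Xnew - ColMean) / sfOne));  # second entry stays 0.5
```

On the training data itself both choices give the same result for such a column, because the centered values are all 0. The choice only matters once the factor is applied to values that differ from the training mean.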