Repository: incubator-systemml Updated Branches: refs/heads/master fb55a74d1 -> 651725651
[SYSTEMML-1463] Rename `batch_norm.dml` and `spatial_batch_norm.dml` Rename `batch_norm.dml` and `spatial_batch_norm.dml` to `batch_norm1d.dml` and `batch_norm2d.dml`. Closes #453. Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/f5ef628c Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/f5ef628c Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/f5ef628c Branch: refs/heads/master Commit: f5ef628c0dbe4e5ce8dec61f5e05c5597e341c95 Parents: fb55a74 Author: Mike Dusenberry <[email protected]> Authored: Mon Apr 10 17:20:13 2017 -0700 Committer: Mike Dusenberry <[email protected]> Committed: Mon Apr 10 17:20:13 2017 -0700 ---------------------------------------------------------------------- .../SystemML-NN/nn/layers/batch_norm.dml | 209 ---------------- .../SystemML-NN/nn/layers/batch_norm1d.dml | 210 ++++++++++++++++ .../SystemML-NN/nn/layers/batch_norm2d.dml | 238 +++++++++++++++++++ .../nn/layers/spatial_batch_norm.dml | 235 ------------------ .../staging/SystemML-NN/nn/test/grad_check.dml | 68 +++--- .../staging/SystemML-NN/nn/test/run_tests.dml | 8 +- scripts/staging/SystemML-NN/nn/test/test.dml | 24 +- 7 files changed, 495 insertions(+), 497 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/f5ef628c/scripts/staging/SystemML-NN/nn/layers/batch_norm.dml ---------------------------------------------------------------------- diff --git a/scripts/staging/SystemML-NN/nn/layers/batch_norm.dml b/scripts/staging/SystemML-NN/nn/layers/batch_norm.dml deleted file mode 100644 index caad100..0000000 --- a/scripts/staging/SystemML-NN/nn/layers/batch_norm.dml +++ /dev/null @@ -1,209 +0,0 @@ -#------------------------------------------------------------- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -#------------------------------------------------------------- - -/* - * Batch Normalization layer. - */ - -forward = function(matrix[double] X, matrix[double] gamma, matrix[double] beta, - string mode, matrix[double] ema_mean, matrix[double] ema_var, - double mu, double epsilon) - return (matrix[double] out, matrix[double] ema_mean_upd, matrix[double] ema_var_upd, - matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm) { - /* - * Computes the forward pass for a batch normalization layer. - * - * A batch normalization layer uses the per-feature sample mean and - * per-feature uncorrected sample variance during training to - * normalize each feature of the input data. Additionally, it - * introduces learnable parameters (gamma, beta) to control the - * amount of normalization. 
- * - * `y = ((x-mean) / sqrt(var+eps)) * gamma + beta` - * - * This implementation maintains exponential moving averages of the - * mean and variance during training for use during testing. - * - * Reference: - * - Batch Normalization: Accelerating Deep Network Training by - * Reducing Internal Covariate Shift, S. Ioffe & C. Szegedy, 2015 - * - https://arxiv.org/abs/1502.03167 - * - * Inputs: - * - X: Inputs, of shape (N, D). - * - gamma: Scale parameters, of shape (1, D). - * - beta: Shift parameters, of shape (1, D). - * - mode: 'train' or 'test' to indicate if the model is currently - * being trained or tested. During training, the current batch - * mean and variance will be used to normalize the inputs, while - * during testing, the exponential average of the mean and - * variance over all previous batches will be used. - * - ema_mean: Exponential moving average of the mean, of - * shape (1, D). - * - ema_var: Exponential moving average of the variance, of - * shape (1, D). - * - mu: Momentum value for moving averages. - * Typical values are in the range of [0.9, 0.999]. - * - epsilon: Smoothing term to avoid divide by zero errors. - * Typical values are in the range of [1e-5, 1e-3]. - * - * Outputs: - * - out: Outputs, of shape (N, D). - * - ema_mean_upd: Updated exponential moving average of the mean, - * of shape (1, D). - * - ema_var_upd: Updated exponential moving average of the variance, - * of shape (1, D). - * - cache_mean: Cache of the batch mean, of shape (1, D). - * Note: This is used for performance during training. - * - cache_var: Cache of the batch variance, of shape (1, D). - * Note: This is used for performance during training. - * - cache_norm: Cache of the normalized inputs, of shape (N, D). - * Note: This is used for performance during training. - */ - N = nrow(X) - - if(mode == 'train') { - # Compute feature-wise mean and variance - mean = colMeans(X) # shape (1, D) - # var = (1/N) * colSums((X-mean)^2) - var = colVars(X) * ((N-1)/N) # compute uncorrected variance, of shape (1, D) - # Update moving averages - ema_mean_upd = mu*ema_mean + (1-mu)*mean - ema_var_upd = mu*ema_var + (1-mu)*var - } - else { - # Use moving averages of mean and variance during testing - mean = ema_mean - var = ema_var - ema_mean_upd = ema_mean - ema_var_upd = ema_var - } - - # Normalize, shift, and scale - # norm = (X-mean)*(var+epsilon)^(-1/2) - norm = (X-mean) / sqrt(var+epsilon) # shape (N, D) - out = norm*gamma + beta # shape (N, D) - - # Save variable for backward pass - cache_mean = mean - cache_var = var - cache_norm = norm -} - -backward = function(matrix[double] dout, matrix[double] out, - matrix[double] ema_mean_upd, matrix[double] ema_var_upd, - matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm, - matrix[double] X, matrix[double] gamma, matrix[double] beta, - string mode, matrix[double] ema_mean, matrix[double] ema_var, - double mu, double epsilon) - return (matrix[double] dX, matrix[double] dgamma, matrix[double] dbeta) { - /* - * Computes the backward pass for a batch normalization layer. - * - * Inputs: - * - dout: Gradient wrt `out` from upstream, of shape (N, D). - * - out: Outputs from the forward pass, of shape (N, D). - * - ema_mean_upd: Updated exponential moving average of the mean - * from the forward pass, of shape (1, D). - * - ema_var_upd: Updated exponential moving average of the variance - * from the forward pass, of shape (1, D). - * - cache_mean: Cache of the batch mean from the forward pass, of - * shape (1, D). 
Note: This is used for performance during - * training. - * - cache_var: Cache of the batch variance from the forward pass, - * of shape (1, D). Note: This is used for performance during - * training. - * - cache_norm: Cache of the normalized inputs from the forward - * pass, of shape (N, D). Note: This is used for performance - * during training. - * - X: Inputs, of shape (N, D). - * - gamma: Scale parameters, of shape (1, D). - * - beta: Shift parameters, of shape (1, D). - * - mode: 'train' or 'test' to indicate if the model is currently - * being trained or tested. During training, the current batch - * mean and variance will be used to normalize the inputs, while - * during testing, the exponential average of the mean and - * variance over all previous batches will be used. - * - ema_mean: Exponential moving average of the mean, of - * shape (1, D). - * - ema_var: Exponential moving average of the variance, of - * shape (1, D). - * - mu: Momentum value for moving averages. - * Typical values are in the range of [0.9, 0.999]. - * - epsilon: Smoothing term to avoid divide by zero errors. - * Typical values are in the range of [1e-5, 1e-3]. - * - * Outputs: - * - dX: Gradient wrt `X`, of shape (N, D). - * - dgamma: Gradient wrt `W`, of shape (1, D). - * - dbeta: Gradient wrt `b`, of shape (1, D). - * - */ - N = nrow(X) - mean = cache_mean - var = cache_var - norm = cache_norm - centered = X-mean - - if (mode == 'train') { - # Compute gradients during training - dgamma = colSums(norm*dout) # shape (1, D) - dbeta = colSums(dout) # shape (1, D) - dnorm = dout * gamma # shape (N, D) - dvar = (-1/2) * colSums(centered * (var+epsilon)^(-3/2) * dnorm) # shape (1, D) - dmean = colSums((-dnorm/sqrt(var+epsilon)) + ((-2/N)*centered*dvar)) # shape (1, D) - dX = (dnorm/sqrt(var+epsilon)) + ((2/N)*centered*dvar) + ((1/N)*dmean) # shape (N, D) - } - else { - # Compute gradients during testing - dgamma = colSums(norm*dout) # shape (1, D) - dbeta = colSums(dout) # shape (1, D) - dnorm = dout * gamma # shape (N, D) - dX = dnorm / sqrt(var+epsilon) # shape (N, D) - } -} - -init = function(int D) - return (matrix[double] gamma, matrix[double] beta, - matrix[double] ema_mean, matrix[double] ema_var) { - /* - * Initialize the parameters of this layer. - * - * Note: This is just a convenience function, and parameters - * may be initialized manually if needed. - * - * Inputs: - * - D: Dimensionality of the input features (number of features). - * - * Outputs: - * - gamma: Scale parameters, of shape (1, D). - * - beta: Shift parameters, of shape (1, D). - * - ema_mean: Exponential moving average of the mean, of - * shape (1, D). - * - ema_var: Exponential moving average of the variance, of - * shape (1, D). - */ - gamma = matrix(1, rows=1, cols=D) - beta = matrix(0, rows=1, cols=D) - ema_mean = matrix(0, rows=1, cols=D) - ema_var = matrix(1, rows=1, cols=D) -} - http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/f5ef628c/scripts/staging/SystemML-NN/nn/layers/batch_norm1d.dml ---------------------------------------------------------------------- diff --git a/scripts/staging/SystemML-NN/nn/layers/batch_norm1d.dml b/scripts/staging/SystemML-NN/nn/layers/batch_norm1d.dml new file mode 100644 index 0000000..9ecbd77 --- /dev/null +++ b/scripts/staging/SystemML-NN/nn/layers/batch_norm1d.dml @@ -0,0 +1,210 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +/* + * 1D Batch Normalization layer. + */ + +forward = function(matrix[double] X, matrix[double] gamma, matrix[double] beta, + string mode, matrix[double] ema_mean, matrix[double] ema_var, + double mu, double epsilon) + return (matrix[double] out, matrix[double] ema_mean_upd, matrix[double] ema_var_upd, + matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm) { + /* + * Computes the forward pass for a 1D batch normalization layer. + * The input data has N examples, each with D features. + * + * A batch normalization layer uses the per-feature sample mean and + * per-feature uncorrected sample variance during training to + * normalize each feature of the input data. Additionally, it + * introduces learnable parameters (gamma, beta) to control the + * amount of normalization. + * + * `y = ((x-mean) / sqrt(var+eps)) * gamma + beta` + * + * This implementation maintains exponential moving averages of the + * mean and variance during training for use during testing. + * + * Reference: + * - Batch Normalization: Accelerating Deep Network Training by + * Reducing Internal Covariate Shift, S. Ioffe & C. Szegedy, 2015 + * - https://arxiv.org/abs/1502.03167 + * + * Inputs: + * - X: Inputs, of shape (N, D). + * - gamma: Scale parameters, of shape (1, D). + * - beta: Shift parameters, of shape (1, D). + * - mode: 'train' or 'test' to indicate if the model is currently + * being trained or tested. During training, the current batch + * mean and variance will be used to normalize the inputs, while + * during testing, the exponential average of the mean and + * variance over all previous batches will be used. + * - ema_mean: Exponential moving average of the mean, of + * shape (1, D). + * - ema_var: Exponential moving average of the variance, of + * shape (1, D). + * - mu: Momentum value for moving averages. + * Typical values are in the range of [0.9, 0.999]. + * - epsilon: Smoothing term to avoid divide by zero errors. + * Typical values are in the range of [1e-5, 1e-3]. + * + * Outputs: + * - out: Outputs, of shape (N, D). + * - ema_mean_upd: Updated exponential moving average of the mean, + * of shape (1, D). + * - ema_var_upd: Updated exponential moving average of the variance, + * of shape (1, D). + * - cache_mean: Cache of the batch mean, of shape (1, D). + * Note: This is used for performance during training. + * - cache_var: Cache of the batch variance, of shape (1, D). + * Note: This is used for performance during training. + * - cache_norm: Cache of the normalized inputs, of shape (N, D). + * Note: This is used for performance during training. 
+ */ + N = nrow(X) + + if(mode == 'train') { + # Compute feature-wise mean and variance + mean = colMeans(X) # shape (1, D) + # var = (1/N) * colSums((X-mean)^2) + var = colVars(X) * ((N-1)/N) # compute uncorrected variance, of shape (1, D) + # Update moving averages + ema_mean_upd = mu*ema_mean + (1-mu)*mean + ema_var_upd = mu*ema_var + (1-mu)*var + } + else { + # Use moving averages of mean and variance during testing + mean = ema_mean + var = ema_var + ema_mean_upd = ema_mean + ema_var_upd = ema_var + } + + # Normalize, shift, and scale + # norm = (X-mean)*(var+epsilon)^(-1/2) + norm = (X-mean) / sqrt(var+epsilon) # shape (N, D) + out = norm*gamma + beta # shape (N, D) + + # Save variable for backward pass + cache_mean = mean + cache_var = var + cache_norm = norm +} + +backward = function(matrix[double] dout, matrix[double] out, + matrix[double] ema_mean_upd, matrix[double] ema_var_upd, + matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm, + matrix[double] X, matrix[double] gamma, matrix[double] beta, + string mode, matrix[double] ema_mean, matrix[double] ema_var, + double mu, double epsilon) + return (matrix[double] dX, matrix[double] dgamma, matrix[double] dbeta) { + /* + * Computes the backward pass for a 1D batch normalization layer. + * + * Inputs: + * - dout: Gradient wrt `out` from upstream, of shape (N, D). + * - out: Outputs from the forward pass, of shape (N, D). + * - ema_mean_upd: Updated exponential moving average of the mean + * from the forward pass, of shape (1, D). + * - ema_var_upd: Updated exponential moving average of the variance + * from the forward pass, of shape (1, D). + * - cache_mean: Cache of the batch mean from the forward pass, of + * shape (1, D). Note: This is used for performance during + * training. + * - cache_var: Cache of the batch variance from the forward pass, + * of shape (1, D). Note: This is used for performance during + * training. + * - cache_norm: Cache of the normalized inputs from the forward + * pass, of shape (N, D). Note: This is used for performance + * during training. + * - X: Inputs, of shape (N, D). + * - gamma: Scale parameters, of shape (1, D). + * - beta: Shift parameters, of shape (1, D). + * - mode: 'train' or 'test' to indicate if the model is currently + * being trained or tested. During training, the current batch + * mean and variance will be used to normalize the inputs, while + * during testing, the exponential average of the mean and + * variance over all previous batches will be used. + * - ema_mean: Exponential moving average of the mean, of + * shape (1, D). + * - ema_var: Exponential moving average of the variance, of + * shape (1, D). + * - mu: Momentum value for moving averages. + * Typical values are in the range of [0.9, 0.999]. + * - epsilon: Smoothing term to avoid divide by zero errors. + * Typical values are in the range of [1e-5, 1e-3]. + * + * Outputs: + * - dX: Gradient wrt `X`, of shape (N, D). + * - dgamma: Gradient wrt `W`, of shape (1, D). + * - dbeta: Gradient wrt `b`, of shape (1, D). 
+ * + */ + N = nrow(X) + mean = cache_mean + var = cache_var + norm = cache_norm + centered = X-mean + + if (mode == 'train') { + # Compute gradients during training + dgamma = colSums(dout*norm) # shape (1, D) + dbeta = colSums(dout) # shape (1, D) + dnorm = dout * gamma # shape (N, D) + dvar = (-1/2) * colSums(centered * (var+epsilon)^(-3/2) * dnorm) # shape (1, D) + dmean = colSums((-dnorm/sqrt(var+epsilon)) + ((-2/N)*centered*dvar)) # shape (1, D) + dX = (dnorm/sqrt(var+epsilon)) + ((2/N)*centered*dvar) + ((1/N)*dmean) # shape (N, D) + } + else { + # Compute gradients during testing + dgamma = colSums(dout*norm) # shape (1, D) + dbeta = colSums(dout) # shape (1, D) + dnorm = dout * gamma # shape (N, D) + dX = dnorm / sqrt(var+epsilon) # shape (N, D) + } +} + +init = function(int D) + return (matrix[double] gamma, matrix[double] beta, + matrix[double] ema_mean, matrix[double] ema_var) { + /* + * Initialize the parameters of this layer. + * + * Note: This is just a convenience function, and parameters + * may be initialized manually if needed. + * + * Inputs: + * - D: Dimensionality of the input features (number of features). + * + * Outputs: + * - gamma: Scale parameters, of shape (1, D). + * - beta: Shift parameters, of shape (1, D). + * - ema_mean: Exponential moving average of the mean, of + * shape (1, D). + * - ema_var: Exponential moving average of the variance, of + * shape (1, D). + */ + gamma = matrix(1, rows=1, cols=D) + beta = matrix(0, rows=1, cols=D) + ema_mean = matrix(0, rows=1, cols=D) + ema_var = matrix(1, rows=1, cols=D) +} + http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/f5ef628c/scripts/staging/SystemML-NN/nn/layers/batch_norm2d.dml ---------------------------------------------------------------------- diff --git a/scripts/staging/SystemML-NN/nn/layers/batch_norm2d.dml b/scripts/staging/SystemML-NN/nn/layers/batch_norm2d.dml new file mode 100644 index 0000000..fb25b2c --- /dev/null +++ b/scripts/staging/SystemML-NN/nn/layers/batch_norm2d.dml @@ -0,0 +1,238 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +/* + * 2D (Spatial) Batch Normalization layer. + */ +source("nn/util.dml") as util + +forward = function(matrix[double] X, matrix[double] gamma, matrix[double] beta, + int C, int Hin, int Win, string mode, + matrix[double] ema_mean, matrix[double] ema_var, + double mu, double epsilon) + return (matrix[double] out, matrix[double] ema_mean_upd, matrix[double] ema_var_upd, + matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm) { + /* + * Computes the forward pass for a 2D (spatial) batch normalization + * layer. 
The input data has N examples, each represented as a 3D + * volume unrolled into a single vector. + * + * A spatial batch normalization layer uses the per-channel sample + * mean and per-channel uncorrected sample variance during training + * to normalize each channel of the input data. Additionally, it + * introduces learnable parameters (gamma, beta) to control the + * amount of normalization. + * + * `y = ((x-mean) / sqrt(var+eps)) * gamma + beta` + * + * This implementation maintains exponential moving averages of the + * mean and variance during training for use during testing. + * + * Reference: + * - Batch Normalization: Accelerating Deep Network Training by + * Reducing Internal Covariate Shift, S. Ioffe & C. Szegedy, 2015 + * - https://arxiv.org/abs/1502.03167 + * + * Inputs: + * - X: Inputs, of shape (N, C*Hin*Win). + * - gamma: Scale parameters, of shape (C, 1). + * - beta: Shift parameters, of shape (C, 1). + * - C: Number of input channels (dimensionality of input depth). + * - Hin: Input height. + * - Win: Input width. + * - mode: 'train' or 'test' to indicate if the model is currently + * being trained or tested. During training, the current batch + * mean and variance will be used to normalize the inputs, while + * during testing, the exponential average of the mean and + * variance over all previous batches will be used. + * - ema_mean: Exponential moving average of the mean, of + * shape (C, 1). + * - ema_var: Exponential moving average of the variance, of + * shape (C, 1). + * - mu: Momentum value for moving averages. + * Typical values are in the range of [0.9, 0.999]. + * - epsilon: Smoothing term to avoid divide by zero errors. + * Typical values are in the range of [1e-5, 1e-3]. + * + * Outputs: + * - out: Outputs, of shape (N, C*Hin*Win). + * - ema_mean_upd: Updated exponential moving average of the mean, + * of shape (C, 1). + * - ema_var_upd: Updated exponential moving average of the variance, + * of shape (C, 1). + * - cache_mean: Cache of the batch mean, of shape (C, 1). + * Note: This is used for performance during training. + * - cache_var: Cache of the batch variance, of shape (C, 1). + * Note: This is used for performance during training. + * - cache_norm: Cache of the normalized inputs, of + * shape (C, N*Hin*Win). Note: This is used for performance + * during training. + */ + N = nrow(X) + + if(mode == 'train') { + # Compute channel-wise mean and variance + # Since we don't have tensors, we will compute the means and variances in a piece-wise fashion. 
+ # - mean of total group is mean of subgroup means + # - variance is the mean of the subgroup variances + the variance of the subgroup means + subgrp_means = matrix(colMeans(X), rows=C, cols=Hin*Win) + subgrp_vars = matrix(colVars(X) * ((N-1)/N), rows=C, cols=Hin*Win) # uncorrected variances + mean = rowMeans(subgrp_means) # shape (C, 1) + var = rowMeans(subgrp_vars) + rowVars(subgrp_means)*(((Hin*Win)-1)/(Hin*Win)) # shape (C, 1) + # Update moving averages + ema_mean_upd = mu*ema_mean + (1-mu)*mean + ema_var_upd = mu*ema_var + (1-mu)*var + } + else { + # Use moving averages of mean and variance during testing + mean = ema_mean + var = ema_var + ema_mean_upd = ema_mean + ema_var_upd = ema_var + } + + # Normalize, shift, and scale + # norm = (X-mean)*(var+epsilon)^(-1/2) + # = (X-mean) / sqrt(var+epsilon) + centered = bias_add(X, -mean) # shape (N, C*Hin*Win) + norm = bias_multiply(centered, 1/sqrt(var+epsilon)) # shape (N, C*Hin*Win) + # out = norm*gamma + beta + scaled = bias_multiply(norm, gamma) # shape (N, C*Hin*Win) + out = bias_add(scaled, beta) # shape (N, C*Hin*Win) + + # Save variable for backward pass + cache_mean = mean + cache_var = var + cache_norm = norm +} + +backward = function(matrix[double] dout, matrix[double] out, + matrix[double] ema_mean_upd, matrix[double] ema_var_upd, + matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm, + matrix[double] X, matrix[double] gamma, matrix[double] beta, + int C, int Hin, int Win, string mode, + matrix[double] ema_mean, matrix[double] ema_var, + double mu, double epsilon) + return (matrix[double] dX, matrix[double] dgamma, matrix[double] dbeta) { + /* + * Computes the backward pass for a 2D (spatial) batch normalization + * layer. + * + * Inputs: + * - dout: Gradient wrt `out` from upstream, of shape (N, C*Hin*Win). + * - out: Outputs from the forward pass, of shape (N, C*Hin*Win). + * - ema_mean_upd: Updated exponential moving average of the mean + * from the forward pass, of shape (C, 1). + * - ema_var_upd: Updated exponential moving average of the variance + * from the forward pass, of shape (C, 1). + * - cache_mean: Cache of the batch mean from the forward pass, of + * shape (C, 1). Note: This is used for performance during + * training. + * - cache_var: Cache of the batch variance from the forward pass, + * of shape (C, 1). Note: This is used for performance during + * training. + * - cache_norm: Cache of the normalized inputs from the forward + * pass, of shape (C, N*Hin*Win). Note: This is used for + * performance during training. + * - X: Input data matrix to the forward pass, of + * shape (N, C*Hin*Win). + * - gamma: Scale parameters, of shape (C, 1). + * - beta: Shift parameters, of shape (C, 1). + * - C: Number of input channels (dimensionality of input depth). + * - Hin: Input height. + * - Win: Input width. + * - mode: 'train' or 'test' to indicate if the model is currently + * being trained or tested. During training, the current batch + * mean and variance will be used to normalize the inputs, while + * during testing, the exponential average of the mean and + * variance over all previous batches will be used. + * - ema_mean: Exponential moving average of the mean, of + * shape (C, 1). + * - ema_var: Exponential moving average of the variance, of + * shape (C, 1). + * - mu: Momentum value for moving averages. + * Typical values are in the range of [0.9, 0.999]. + * - epsilon: Smoothing term to avoid divide by zero errors. + * Typical values are in the range of [1e-5, 1e-3]. 
+ * + * Outputs: + * - dX: Gradient wrt `X`, of shape (N, C*Hin*Win). + * - dgamma: Gradient wrt `W`, of shape (C, 1). + * - dbeta: Gradient wrt `b`, of shape (C, 1). + * + */ + N = nrow(X) + mean = cache_mean + var = cache_var + norm = cache_norm + centered = bias_add(X, -mean) # shape (N, C*Hin*Win) + + if (mode == 'train') { + # Compute gradients during training + dgamma = util::channel_sums(dout*norm, C, Hin, Win) # shape (C, 1) + dbeta = util::channel_sums(dout, C, Hin, Win) # shape (C, 1) + dnorm = bias_multiply(dout, gamma) # shape (N, C*Hin*Win) + dvar = util::channel_sums((-1/2) * bias_multiply(centered, (var+epsilon)^(-3/2)) * dnorm, + C, Hin, Win) # shape (C, 1) + dmean_norm_branch = util::channel_sums(bias_multiply(dnorm, -1/sqrt(var+epsilon)), C, Hin, Win) + dmean_var_branch = util::channel_sums((-2/(N*Hin*Win)) * centered, C, Hin, Win) + dmean_var_branch = dmean_var_branch * dvar # we can't use a function within an expression yet + dmean = dmean_norm_branch + dmean_var_branch # shape (C, 1) + dX_norm_branch = bias_multiply(dnorm, 1/sqrt(var+epsilon)) + dX_mean_branch = (1/(N*Hin*Win)) * bias_add(matrix(0, rows=1, cols=C*Hin*Win), dmean) + dX_var_branch = (2/(N*Hin*Win)) * bias_multiply(centered, dvar) + dX = dX_norm_branch + dX_mean_branch + dX_var_branch # shape (N, C*Hin*Win) + } + else { + # Compute gradients during testing + dgamma = util::channel_sums(dout*norm, C, Hin, Win) # shape (C, 1) + dbeta = util::channel_sums(dout, C, Hin, Win) # shape (C, 1) + dnorm = bias_multiply(dout, gamma) # shape (N, C*Hin*Win) + dX = bias_multiply(dnorm, 1/sqrt(var+epsilon)) # shape (N, C*Hin*Win) + } +} + +init = function(int C) + return (matrix[double] gamma, matrix[double] beta, + matrix[double] ema_mean, matrix[double] ema_var) { + /* + * Initialize the parameters of this layer. + * + * Note: This is just a convenience function, and parameters + * may be initialized manually if needed. + * + * Inputs: + * - C: Number of input channels (dimensionality of input depth). + * + * Outputs: + * - gamma: Scale parameters, of shape (C, 1). + * - beta: Shift parameters, of shape (C, 1). + * - ema_mean: Exponential moving average of the mean, of + * shape (C, 1). + * - ema_var: Exponential moving average of the variance, of + * shape (C, 1). + */ + gamma = matrix(1, rows=C, cols=1) + beta = matrix(0, rows=C, cols=1) + ema_mean = matrix(0, rows=C, cols=1) + ema_var = matrix(1, rows=C, cols=1) +} + http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/f5ef628c/scripts/staging/SystemML-NN/nn/layers/spatial_batch_norm.dml ---------------------------------------------------------------------- diff --git a/scripts/staging/SystemML-NN/nn/layers/spatial_batch_norm.dml b/scripts/staging/SystemML-NN/nn/layers/spatial_batch_norm.dml deleted file mode 100644 index 6e57b05..0000000 --- a/scripts/staging/SystemML-NN/nn/layers/spatial_batch_norm.dml +++ /dev/null @@ -1,235 +0,0 @@ -#------------------------------------------------------------- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -#------------------------------------------------------------- - -/* - * Spatial Batch Normalization layer. - */ -source("nn/util.dml") as util - -forward = function(matrix[double] X, matrix[double] gamma, matrix[double] beta, - int C, int Hin, int Win, string mode, - matrix[double] ema_mean, matrix[double] ema_var, - double mu, double epsilon) - return (matrix[double] out, matrix[double] ema_mean_upd, matrix[double] ema_var_upd, - matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm) { - /* - * Computes the forward pass for a spatial batch normalization layer. - * - * A spatial batch normalization layer uses the per-channel sample - * mean and per-channel uncorrected sample variance during training - * to normalize each channel of the input data. Additionally, it - * introduces learnable parameters (gamma, beta) to control the - * amount of normalization. - * - * `y = ((x-mean) / sqrt(var+eps)) * gamma + beta` - * - * This implementation maintains exponential moving averages of the - * mean and variance during training for use during testing. - * - * Reference: - * - Batch Normalization: Accelerating Deep Network Training by - * Reducing Internal Covariate Shift, S. Ioffe & C. Szegedy, 2015 - * - https://arxiv.org/abs/1502.03167 - * - * Inputs: - * - X: Inputs, of shape (N, C*Hin*Win). - * - gamma: Scale parameters, of shape (C, 1). - * - beta: Shift parameters, of shape (C, 1). - * - C: Number of input channels (dimensionality of input depth). - * - Hin: Input height. - * - Win: Input width. - * - mode: 'train' or 'test' to indicate if the model is currently - * being trained or tested. During training, the current batch - * mean and variance will be used to normalize the inputs, while - * during testing, the exponential average of the mean and - * variance over all previous batches will be used. - * - ema_mean: Exponential moving average of the mean, of - * shape (C, 1). - * - ema_var: Exponential moving average of the variance, of - * shape (C, 1). - * - mu: Momentum value for moving averages. - * Typical values are in the range of [0.9, 0.999]. - * - epsilon: Smoothing term to avoid divide by zero errors. - * Typical values are in the range of [1e-5, 1e-3]. - * - * Outputs: - * - out: Outputs, of shape (N, C*Hin*Win). - * - ema_mean_upd: Updated exponential moving average of the mean, - * of shape (C, 1). - * - ema_var_upd: Updated exponential moving average of the variance, - * of shape (C, 1). - * - cache_mean: Cache of the batch mean, of shape (C, 1). - * Note: This is used for performance during training. - * - cache_var: Cache of the batch variance, of shape (C, 1). - * Note: This is used for performance during training. - * - cache_norm: Cache of the normalized inputs, of - * shape (C, N*Hin*Win). Note: This is used for performance - * during training. - */ - N = nrow(X) - - if(mode == 'train') { - # Compute channel-wise mean and variance - # Since we don't have tensors, we will compute the means and variances in a piece-wise fashion. 
- # - mean of total group is mean of subgroup means - # - variance is the mean of the subgroup variances + the variance of the subgroup means - subgrp_means = matrix(colMeans(X), rows=C, cols=Hin*Win) - subgrp_vars = matrix(colVars(X) * ((N-1)/N), rows=C, cols=Hin*Win) # uncorrected variances - mean = rowMeans(subgrp_means) # shape (C, 1) - var = rowMeans(subgrp_vars) + rowVars(subgrp_means)*(((Hin*Win)-1)/(Hin*Win)) # shape (C, 1) - # Update moving averages - ema_mean_upd = mu*ema_mean + (1-mu)*mean - ema_var_upd = mu*ema_var + (1-mu)*var - } - else { - # Use moving averages of mean and variance during testing - mean = ema_mean - var = ema_var - ema_mean_upd = ema_mean - ema_var_upd = ema_var - } - - # Normalize, shift, and scale - # norm = (X-mean)*(var+epsilon)^(-1/2) - # = (X-mean) / sqrt(var+epsilon) - centered = bias_add(X, -mean) # shape (N, C*Hin*Win) - norm = bias_multiply(centered, 1/sqrt(var+epsilon)) # shape (N, C*Hin*Win) - # out = norm*gamma + beta - scaled = bias_multiply(norm, gamma) # shape (N, C*Hin*Win) - out = bias_add(scaled, beta) # shape (N, C*Hin*Win) - - # Save variable for backward pass - cache_mean = mean - cache_var = var - cache_norm = norm -} - -backward = function(matrix[double] dout, matrix[double] out, - matrix[double] ema_mean_upd, matrix[double] ema_var_upd, - matrix[double] cache_mean, matrix[double] cache_var, matrix[double] cache_norm, - matrix[double] X, matrix[double] gamma, matrix[double] beta, - int C, int Hin, int Win, string mode, - matrix[double] ema_mean, matrix[double] ema_var, - double mu, double epsilon) - return (matrix[double] dX, matrix[double] dgamma, matrix[double] dbeta) { - /* - * Computes the backward pass for a spatial batch normalization layer. - * - * Inputs: - * - dout: Gradient wrt `out` from upstream, of shape (N, C*Hin*Win). - * - out: Outputs from the forward pass, of shape (N, C*Hin*Win). - * - ema_mean_upd: Updated exponential moving average of the mean - * from the forward pass, of shape (C, 1). - * - ema_var_upd: Updated exponential moving average of the variance - * from the forward pass, of shape (C, 1). - * - cache_mean: Cache of the batch mean from the forward pass, of - * shape (C, 1). Note: This is used for performance during - * training. - * - cache_var: Cache of the batch variance from the forward pass, - * of shape (C, 1). Note: This is used for performance during - * training. - * - cache_norm: Cache of the normalized inputs from the forward - * pass, of shape (C, N*Hin*Win). Note: This is used for - * performance during training. - * - X: Input data matrix to the forward pass, of - * shape (N, C*Hin*Win). - * - gamma: Scale parameters, of shape (C, 1). - * - beta: Shift parameters, of shape (C, 1). - * - C: Number of input channels (dimensionality of input depth). - * - Hin: Input height. - * - Win: Input width. - * - mode: 'train' or 'test' to indicate if the model is currently - * being trained or tested. During training, the current batch - * mean and variance will be used to normalize the inputs, while - * during testing, the exponential average of the mean and - * variance over all previous batches will be used. - * - ema_mean: Exponential moving average of the mean, of - * shape (C, 1). - * - ema_var: Exponential moving average of the variance, of - * shape (C, 1). - * - mu: Momentum value for moving averages. - * Typical values are in the range of [0.9, 0.999]. - * - epsilon: Smoothing term to avoid divide by zero errors. - * Typical values are in the range of [1e-5, 1e-3]. 
- * - * Outputs: - * - dX: Gradient wrt `X`, of shape (N, C*Hin*Win). - * - dgamma: Gradient wrt `W`, of shape (C, 1). - * - dbeta: Gradient wrt `b`, of shape (C, 1). - * - */ - N = nrow(X) - mean = cache_mean - var = cache_var - norm = cache_norm - centered = bias_add(X, -mean) # shape (N, C*Hin*Win) - - if (mode == 'train') { - # Compute gradients during training - dgamma = util::channel_sums(norm*dout, C, Hin, Win) # shape (C, 1) - dbeta = util::channel_sums(dout, C, Hin, Win) # shape (C, 1) - dnorm = bias_multiply(dout, gamma) # shape (N, C*Hin*Win) - dvar = util::channel_sums((-1/2) * bias_multiply(centered, (var+epsilon)^(-3/2)) * dnorm, - C, Hin, Win) # shape (C, 1) - dmean_norm_branch = util::channel_sums(bias_multiply(dnorm, -1/sqrt(var+epsilon)), C, Hin, Win) - dmean_var_branch = util::channel_sums((-2/(N*Hin*Win)) * centered, C, Hin, Win) - dmean_var_branch = dmean_var_branch * dvar # we can't use a function within an expression yet - dmean = dmean_norm_branch + dmean_var_branch # shape (C, 1) - dX_norm_branch = bias_multiply(dnorm, 1/sqrt(var+epsilon)) - dX_mean_branch = (1/(N*Hin*Win)) * bias_add(matrix(0, rows=1, cols=C*Hin*Win), dmean) - dX_var_branch = (2/(N*Hin*Win)) * bias_multiply(centered, dvar) - dX = dX_norm_branch + dX_mean_branch + dX_var_branch # shape (N, C*Hin*Win) - } - else { - # Compute gradients during testing - dgamma = util::channel_sums(norm*dout, C, Hin, Win) # shape (C, 1) - dbeta = util::channel_sums(dout, C, Hin, Win) # shape (C, 1) - dnorm = bias_multiply(dout, gamma) # shape (N, C*Hin*Win) - dX = bias_multiply(dnorm, 1/sqrt(var+epsilon)) # shape (N, C*Hin*Win) - } -} - -init = function(int C) - return (matrix[double] gamma, matrix[double] beta, - matrix[double] ema_mean, matrix[double] ema_var) { - /* - * Initialize the parameters of this layer. - * - * Note: This is just a convenience function, and parameters - * may be initialized manually if needed. - * - * Inputs: - * - C: Number of input channels (dimensionality of input depth). - * - * Outputs: - * - gamma: Scale parameters, of shape (C, 1). - * - beta: Shift parameters, of shape (C, 1). - * - ema_mean: Exponential moving average of the mean, of - * shape (C, 1). - * - ema_var: Exponential moving average of the variance, of - * shape (C, 1). - */ - gamma = matrix(1, rows=C, cols=1) - beta = matrix(0, rows=C, cols=1) - ema_mean = matrix(0, rows=C, cols=1) - ema_var = matrix(1, rows=C, cols=1) -} - http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/f5ef628c/scripts/staging/SystemML-NN/nn/test/grad_check.dml ---------------------------------------------------------------------- diff --git a/scripts/staging/SystemML-NN/nn/test/grad_check.dml b/scripts/staging/SystemML-NN/nn/test/grad_check.dml index 1b42b67..f21811c 100644 --- a/scripts/staging/SystemML-NN/nn/test/grad_check.dml +++ b/scripts/staging/SystemML-NN/nn/test/grad_check.dml @@ -23,7 +23,8 @@ * Gradient checks for various architectures. 
*/ source("nn/layers/affine.dml") as affine -source("nn/layers/batch_norm.dml") as batch_norm +source("nn/layers/batch_norm1d.dml") as batch_norm1d +source("nn/layers/batch_norm2d.dml") as batch_norm2d source("nn/layers/conv2d.dml") as conv2d source("nn/layers/conv2d_builtin.dml") as conv2d_builtin source("nn/layers/cross_entropy_loss.dml") as cross_entropy_loss @@ -40,7 +41,6 @@ source("nn/layers/relu.dml") as relu source("nn/layers/rnn.dml") as rnn source("nn/layers/sigmoid.dml") as sigmoid source("nn/layers/softmax.dml") as softmax -source("nn/layers/spatial_batch_norm.dml") as spatial_batch_norm source("nn/layers/tanh.dml") as tanh source("nn/test/conv2d_simple.dml") as conv2d_simple source("nn/test/max_pool2d_simple.dml") as max_pool2d_simple @@ -125,11 +125,11 @@ affine = function() { } } -batch_norm = function() { +batch_norm1d = function() { /* - * Gradient check for the batch normalization layer. + * Gradient check for the 1D batch normalization layer. */ - print("Grad checking the batch normalization layer with L2 loss.") + print("Grad checking the 1D batch normalization layer with L2 loss.") # Generate data N = 3 # num examples @@ -142,7 +142,7 @@ batch_norm = function() { beta = rand(rows=1, cols=D) ema_mean = rand(rows=1, cols=D) ema_var = rand(rows=1, cols=D) - #[dummy, dummy, ema_mean, ema_var] = batch_norm::init(D) + #[dummy, dummy, ema_mean, ema_var] = batch_norm1d::init(D) # Check training & testing modes for (i in 1:2) { @@ -154,11 +154,11 @@ batch_norm = function() { # Compute analytical gradients of loss wrt parameters [out, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) dout = l2_loss::backward(out, y) - [dX, dgamma, dbeta] = batch_norm::backward(dout, out, ema_mean_upd, ema_var_upd, - cache_mean, cache_var, cache_norm, - X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + [dX, dgamma, dbeta] = batch_norm1d::backward(dout, out, ema_mean_upd, ema_var_upd, + cache_mean, cache_var, cache_norm, + X, gamma, beta, mode, ema_mean, ema_var, mu, eps) # Grad check h = 1e-5 @@ -169,11 +169,11 @@ batch_norm = function() { old = as.scalar(X[i,j]) X[i,j] = old - h [outmh, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) lossmh = l2_loss::forward(outmh, y) X[i,j] = old + h [outph, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) lossph = l2_loss::forward(outph, y) X[i,j] = old # reset dX_num = (lossph-lossmh) / (2*h) # numerical derivative @@ -190,11 +190,11 @@ batch_norm = function() { old = as.scalar(gamma[i,j]) gamma[i,j] = old - h [outmh, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) lossmh = l2_loss::forward(outmh, y) gamma[i,j] = old + h [outph, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) lossph = l2_loss::forward(outph, y) gamma[i,j] = old # reset dgamma_num = (lossph-lossmh) 
/ (2*h) # numerical derivative @@ -212,11 +212,11 @@ batch_norm = function() { old = as.scalar(beta[i,j]) beta[i,j] = old - h [outmh, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) lossmh = l2_loss::forward(outmh, y) beta[i,j] = old + h [outph, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) lossph = l2_loss::forward(outph, y) beta[i,j] = old # reset dbeta_num = (lossph-lossmh) / (2*h) # numerical derivative @@ -1276,11 +1276,11 @@ softmax = function() { } } -spatial_batch_norm = function() { +batch_norm2d = function() { /* - * Gradient check for the spatial batch normalization layer. + * Gradient check for the 2D (spatial) batch normalization layer. */ - print("Grad checking the spatial batch normalization layer with L2 loss.") + print("Grad checking the 2D (spatial) batch normalization layer with L2 loss.") # Generate data N = 3 # num examples @@ -1296,7 +1296,7 @@ spatial_batch_norm = function() { beta = rand(rows=C, cols=1) ema_mean = rand(rows=C, cols=1) ema_var = rand(rows=C, cols=1) - #[dummy, dummy, ema_mean, ema_var] = spatial_batch_norm::init(C) + #[dummy, dummy, ema_mean, ema_var] = batch_norm2d::init(C) # Check training & testing modes for (i in 1:2) { @@ -1308,12 +1308,12 @@ spatial_batch_norm = function() { # Compute analytical gradients of loss wrt parameters [out, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) dout = l2_loss::backward(out, y) - [dX, dgamma, dbeta] = spatial_batch_norm::backward(dout, out, ema_mean_upd, ema_var_upd, - cache_mean, cache_var, cache_norm, - X, gamma, beta, C, Hin, Win, mode, - ema_mean, ema_var, mu, eps) + [dX, dgamma, dbeta] = batch_norm2d::backward(dout, out, ema_mean_upd, ema_var_upd, + cache_mean, cache_var, cache_norm, + X, gamma, beta, C, Hin, Win, mode, + ema_mean, ema_var, mu, eps) # Grad check h = 1e-5 @@ -1324,13 +1324,11 @@ spatial_batch_norm = function() { old = as.scalar(X[i,j]) X[i,j] = old - h [outmh, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, - ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) lossmh = l2_loss::forward(outmh, y) X[i,j] = old + h [outph, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, - ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) lossph = l2_loss::forward(outph, y) X[i,j] = old # reset dX_num = (lossph-lossmh) / (2*h) # numerical derivative @@ -1347,13 +1345,11 @@ spatial_batch_norm = function() { old = as.scalar(gamma[i,j]) gamma[i,j] = old - h [outmh, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, - ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) lossmh = l2_loss::forward(outmh, y) gamma[i,j] = old + h [outph, ema_mean_upd, ema_var_upd, cache_mean, 
cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, - ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) lossph = l2_loss::forward(outph, y) gamma[i,j] = old # reset dgamma_num = (lossph-lossmh) / (2*h) # numerical derivative @@ -1371,13 +1367,11 @@ spatial_batch_norm = function() { old = as.scalar(beta[i,j]) beta[i,j] = old - h [outmh, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, - ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) lossmh = l2_loss::forward(outmh, y) beta[i,j] = old + h [outph, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, - ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) lossph = l2_loss::forward(outph, y) beta[i,j] = old # reset dbeta_num = (lossph-lossmh) / (2*h) # numerical derivative http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/f5ef628c/scripts/staging/SystemML-NN/nn/test/run_tests.dml ---------------------------------------------------------------------- diff --git a/scripts/staging/SystemML-NN/nn/test/run_tests.dml b/scripts/staging/SystemML-NN/nn/test/run_tests.dml index dc53cb9..0279363 100644 --- a/scripts/staging/SystemML-NN/nn/test/run_tests.dml +++ b/scripts/staging/SystemML-NN/nn/test/run_tests.dml @@ -37,7 +37,8 @@ tmp = grad_check::log_loss() # Other layers tmp = grad_check::affine() -tmp = grad_check::batch_norm() +tmp = grad_check::batch_norm1d() +tmp = grad_check::batch_norm2d() tmp = grad_check::conv2d_simple() tmp = grad_check::conv2d() tmp = grad_check::conv2d_builtin() @@ -52,7 +53,6 @@ tmp = grad_check::relu() tmp = grad_check::rnn() tmp = grad_check::sigmoid() tmp = grad_check::softmax() -tmp = grad_check::spatial_batch_norm() tmp = grad_check::tanh() # Example model @@ -69,13 +69,13 @@ print("") print("Starting other tests.") print("---") -tmp = test::batch_norm() +tmp = test::batch_norm1d() +tmp = test::batch_norm2d() tmp = test::im2col() tmp = test::padding() tmp = test::conv2d() tmp = test::cross_entropy_loss() tmp = test::max_pool2d() -tmp = test::spatial_batch_norm() tmp = test::tanh() print("---") http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/f5ef628c/scripts/staging/SystemML-NN/nn/test/test.dml ---------------------------------------------------------------------- diff --git a/scripts/staging/SystemML-NN/nn/test/test.dml b/scripts/staging/SystemML-NN/nn/test/test.dml index 958c2c5..3928fac 100644 --- a/scripts/staging/SystemML-NN/nn/test/test.dml +++ b/scripts/staging/SystemML-NN/nn/test/test.dml @@ -22,24 +22,24 @@ /* * Various tests, not including gradient checks. 
*/ -source("nn/layers/batch_norm.dml") as batch_norm +source("nn/layers/batch_norm1d.dml") as batch_norm1d +source("nn/layers/batch_norm2d.dml") as batch_norm2d source("nn/layers/conv2d.dml") as conv2d source("nn/layers/conv2d_builtin.dml") as conv2d_builtin source("nn/layers/cross_entropy_loss.dml") as cross_entropy_loss source("nn/layers/max_pool2d.dml") as max_pool2d source("nn/layers/max_pool2d_builtin.dml") as max_pool2d_builtin -source("nn/layers/spatial_batch_norm.dml") as spatial_batch_norm source("nn/layers/tanh.dml") as tanh source("nn/test/conv2d_simple.dml") as conv2d_simple source("nn/test/max_pool2d_simple.dml") as max_pool2d_simple source("nn/test/util.dml") as test_util source("nn/util.dml") as util -batch_norm = function() { +batch_norm1d = function() { /* - * Test for the batch normalization function. + * Test for the 1D batch normalization function. */ - print("Testing the batch normalization function.") + print("Testing the 1D batch normalization function.") # Generate data N = 4 # Number of examples @@ -50,11 +50,11 @@ batch_norm = function() { X = matrix(seq(1,16), rows=N, cols=D) # Create layer - [gamma, beta, ema_mean, ema_var] = batch_norm::init(D) + [gamma, beta, ema_mean, ema_var] = batch_norm1d::init(D) # Forward [out, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - batch_norm::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) + batch_norm1d::forward(X, gamma, beta, mode, ema_mean, ema_var, mu, eps) # Equivalency check target = matrix("-1.34160721 -1.34160721 -1.34160733 -1.34160709 @@ -428,11 +428,11 @@ max_pool2d = function() { tmp = test_util::check_all_equal(out_builtin, target) } -spatial_batch_norm = function() { +batch_norm2d = function() { /* - * Test for the spatial batch normalization function. + * Test for the 2D (spatial) batch normalization function. */ - print("Testing the spatial batch normalization function.") + print("Testing the 2D (spatial) batch normalization function.") # Generate data N = 2 # Number of examples @@ -474,11 +474,11 @@ spatial_batch_norm = function() { 55 58 52 0 99", rows=N, cols=C*Hin*Win) # Create layer - [gamma, beta, ema_mean, ema_var] = spatial_batch_norm::init(C) + [gamma, beta, ema_mean, ema_var] = batch_norm2d::init(C) # Forward [out, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] = - spatial_batch_norm::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) + batch_norm2d::forward(X, gamma, beta, C, Hin, Win, mode, ema_mean, ema_var, mu, eps) # Equivalency check target = matrix("0.86215019 -0.76679718 -1.00517964 0.26619387 0.94161105
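For reference, a minimal usage sketch of the renamed layers, mirroring the call signatures exercised in `nn/test/test.dml` above; the data shapes and hyperparameter values below are illustrative assumptions, not part of the commit.

source("nn/layers/batch_norm1d.dml") as batch_norm1d
source("nn/layers/batch_norm2d.dml") as batch_norm2d

# 1D batch normalization over N examples with D features each
N = 4
D = 4
mu = 0.9    # momentum for the moving averages (assumed value)
eps = 1e-5  # smoothing term (assumed value)
X = matrix(seq(1, 16), rows=N, cols=D)  # 16 = N*D
[gamma, beta, ema_mean, ema_var] = batch_norm1d::init(D)
[out, ema_mean_upd, ema_var_upd, cache_mean, cache_var, cache_norm] =
    batch_norm1d::forward(X, gamma, beta, 'train', ema_mean, ema_var, mu, eps)

# 2D (spatial) batch normalization over N examples, each a C x Hin x Win
# volume unrolled into a single row of length C*Hin*Win
C = 3
Hin = 4
Win = 5
X2 = rand(rows=2, cols=C*Hin*Win)
[gamma2, beta2, ema_mean2, ema_var2] = batch_norm2d::init(C)
[out2, ema_mean_upd2, ema_var_upd2, cache_mean2, cache_var2, cache_norm2] =
    batch_norm2d::forward(X2, gamma2, beta2, C, Hin, Win, 'train', ema_mean2, ema_var2, mu, eps)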