[SYSTEMML-445] Added builtin functions for efficient computation of batch normalization and lstm layers.
- The following builtin functions are added: lstm, batch_norm2d and batch_norm2d_backward.
- The DML language documentation and the NN layers are also updated.
- Since the builtin function for lstm backward data/weights is not added in this commit, the NN layer for lstm is not updated. Instead, a new lstm_staging.dml is added, which will eventually replace lstm.dml.
- The above builtin functions are only supported on GPU via CuDNN. The CP and Spark implementations will be added in subsequent commits.

Closes #773.

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/f5ae0596
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/f5ae0596
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/f5ae0596

Branch: refs/heads/gh-pages
Commit: f5ae0596d60798930ad06988ad00a5dddc0069a5
Parents: c130c47
Author: Niketan Pansare
Authored: Fri Jun 1 10:49:46 2018 -0700
Committer: Niketan Pansare
Committed: Fri Jun 1 10:49:46 2018 -0700

----------------------------------------------------------------------
 dml-language-reference.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/systemml/blob/f5ae0596/dml-language-reference.md
----------------------------------------------------------------------
diff --git a/dml-language-reference.md b/dml-language-reference.md
index b4ed9c8..3212806 100644
--- a/dml-language-reference.md
+++ b/dml-language-reference.md
@@ -1511,16 +1511,18 @@ The images are assumed to be stored NCHW format, where N = batch size, C = #chan
 Hence, the images are internally represented as a matrix with dimension (N, C * H * W).
-| Function name | Input matrices | Dimension of first input matrix | Dimension of second input matrix (if applicable) | Dimension of output matrix | Input Parameters | Notes |
-|---------------------------------------------|----------------|-----------------------------------------------------------|-----------------------------------------------------------|------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
-| conv2d | input, filter | [batch_size X num_channels* height_image* width_image] | [num_filters X num_channels* height_filter* width_filter] | [batch_size X num_channels_out* height_out* width_out] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], filter_shape=[num_filters, num_channels, height_filter, width_filter] | Performs 2D convolution operation |
-| conv2d_backward_filter | input, dout | [batch_size X num_channels* height_image* width_image] | [batch_size X num_channels_out* height_out* width_out] | [num_filters X num_channels* height_filter* width_filter] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], filter_shape=[num_filters, num_channels, height_filter, width_filter] | Computes the gradients wrt filter of 2D convolution |
-| conv2d_backward_data | filter, dout | [num_filters X num_channels* height_filter* width_filter] | [batch_size X num_channels_out* height_out* width_out] | [batch_size X num_channels* height_image* width_image] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], filter_shape=[num_filters, num_channels, height_filter, width_filter] | Computes the gradients wrt input of 2D convolution |
-| max_pool, avg_pool | input | [batch_size X num_channels* height_image* width_image] | | [batch_size X num_channels* height_out* width_out] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], pool_size=[height_pool, width_pool] | Performs max/average pooling operation |
-| max_pool_backward, avg_pool_backward | input, dout | [batch_size X num_channels* height_image* width_image] | [batch_size X num_channels* height_out* width_out] | [batch_size X num_channels* height_image* width_image] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], pool_size=[height_pool, width_pool] | Computes the gradients wrt input of 2D max pooling, average pooling |
-| bias_add | input, bias | [batch_size X num_channels* height_image* width_image] | [num_channels X 1] | [batch_size X num_channels* height_image* width_image] | | Adds the bias (row vector of size num_channels) to input with the given num_channels |
-| bias_multiply | input, bias | [batch_size X num_channels* height_image* width_image] | [num_channels X 1] | [batch_size X num_channels* height_image* width_image] | | Multiplies the bias (row vector of size num_channels) to input with the given num_channels |
-
+| Function name | Input matrices | Dimension of first input matrix | Dimension of second input matrix (if applicable) | Dimension of (first) output matrix | Input Parameters | Notes |
+|---------------------------------------------|--------------------------|-----------------------------------------------------------|-----------------------------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
+| conv2d | input, filter | [batch_size X num_channels* height_image* width_image] | [num_filters X num_channels* height_filter* width_filter] | [batch_size X num_channels_out* height_out* width_out] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], filter_shape=[num_filters, num_channels, height_filter, width_filter] | Performs 2D convolution operation |
+| conv2d_backward_filter | input, dout | [batch_size X num_channels* height_image* width_image] | [batch_size X num_channels_out* height_out* width_out] | [num_filters X num_channels* height_filter* width_filter] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], filter_shape=[num_filters, num_channels, height_filter, width_filter] | Computes the gradients wrt filter of 2D convolution |
+| conv2d_backward_data | filter, dout | [num_filters X num_channels* height_filter* width_filter] | [batch_size X num_channels_out* height_out* width_out] | [batch_size X num_channels* height_image* width_image] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], filter_shape=[num_filters, num_channels, height_filter, width_filter] | Computes the gradients wrt input of 2D convolution |
+| max_pool, avg_pool | input | [batch_size X num_channels* height_image* width_image] | | [batch_size X num_channels* height_out* width_out] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], pool_size=[height_pool, width_pool] | Performs max/average pooling operation |
+| max_pool_backward, avg_pool_backward | input, dout | [batch_size X num_channels* height_image* width_image] | [batch_size X num_channels* height_out* width_out] | [batch_size X num_channels* height_image* width_image] | stride=[stride_h, stride_w], padding=[pad_h, pad_w], input_shape=[batch_size, num_channels, height_image, width_image], pool_size=[height_pool, width_pool] | Computes the gradients wrt input of 2D max pooling, average pooling |
+| bias_add | input, bias | [batch_size X num_channels* height_image* width_image] | [num_channels X 1] | [batch_size X num_channels* height_image* width_image] | | Adds the bias (row vector of size num_channels) to input with the given num_channels |
+| bias_multiply | input, bias | [batch_size X num_channels* height_image* width_image] | [num_channels X 1] | [batch_size X num_channels* height_image* width_image] | | Multiplies the bias (row vector of size num_channels) to input with the given num_channels |
+| lstm | X, W, bias, out0, c0 | [batch_size X seq_length*num_features] | [num_features+hidden_size X 4*hidden_size] | [batch_size X seq_length*hidden_size] if return_sequences else [batch_size X hidden_size] | return_sequences | Performs computation for single-layer unidirectional LSTM (outputs: out, carryOut, reserveSpace) |
+| batch_norm2d | input | [batch_size X num_channels* height_image* width_image] | | [batch_size X num_channels* height_image* width_image] | scale, shift, exponentialMovingAverage_Mean, exponentialMovingAverage_Variance, mode, epsilon, momentum | Performs batch normalization operation (outputs: updated exponential moving average mean and variance, cache of the batch mean and variance) |
+| batch_norm2d_backward | input, dout | [batch_size X num_channels* height_image* width_image] | [batch_size X num_channels* height_image* width_image] | [batch_size X num_channels* height_image* width_image] | scale, epsilon, cache_mean (from forward), cache_inv_var (from forward) | Computes the backpropagation error for batch normalization operation |
 Examples:
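The new lstm row documents only matrix shapes. As a hedged illustration of what those shapes imply, here is a minimal pure-Python reference for a single-layer unidirectional LSTM forward pass. The names `lstm_forward` and `sigmoid` and the explicit size arguments `T`, `D`, `M` (seq_length, num_features, hidden_size) are hypothetical and not part of SystemML. The gate order inside W's 4*hidden_size columns (input, forget, output, candidate) is an assumption modeled on the DML nn layer; the CuDNN-backed builtin may lay out weights differently, and the CuDNN-specific reserveSpace output is not modeled.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_forward(X, W, b, out0, c0, T, D, M, return_sequences):
    """Hypothetical reference for a single-layer unidirectional LSTM.

    X:    N rows of length T*D; time step t occupies columns t*D .. t*D+D-1
    W:    D+M rows of length 4*M, applied to the concatenation [x_t, out_prev]
    b:    bias, a list of length 4*M
    out0, c0: initial hidden and carry states, N rows of length M
    Returns (out, carry): out is N x T*M if return_sequences else N x M.
    """
    N = len(X)
    out_prev = [row[:] for row in out0]  # copy so the caller's lists are untouched
    c_prev = [row[:] for row in c0]
    seq = [[] for _ in range(N)]
    for t in range(T):
        for n in range(N):
            z = X[n][t * D:(t + 1) * D] + out_prev[n]  # 1 x (D+M)
            a = [b[j] + sum(z[k] * W[k][j] for k in range(D + M)) for j in range(4 * M)]
            # assumed gate order inside the 4*M columns: input, forget, output, candidate
            i = [sigmoid(v) for v in a[0:M]]
            f = [sigmoid(v) for v in a[M:2 * M]]
            o = [sigmoid(v) for v in a[2 * M:3 * M]]
            g = [math.tanh(v) for v in a[3 * M:4 * M]]
            c_prev[n] = [f[m] * c_prev[n][m] + i[m] * g[m] for m in range(M)]
            out_prev[n] = [o[m] * math.tanh(c_prev[n][m]) for m in range(M)]
            seq[n].extend(out_prev[n])
    return (seq if return_sequences else out_prev), c_prev
```

With return_sequences false, only the last time step's hidden state is returned, which matches the [batch_size X hidden_size] output shape listed in the table; the last hidden_size columns of the sequence output hold the same values.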

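To make the batch_norm2d and batch_norm2d_backward rows concrete, the sketch below is a minimal pure-Python reference operating on the NCHW matrix layout described above (channel c of each row occupies a contiguous block of height_image*width_image columns). The function names, the explicit `C` and `HW` arguments, and several conventions are assumptions rather than the builtin's documented behavior: the batch variance is the biased (population) variance, the moving averages are updated as momentum*old + (1 - momentum)*batch (note that cuDNN's own exponentialAverageFactor weights the new batch statistic instead), and cache_inv_var is interpreted as 1/sqrt(var + epsilon), so epsilon is already folded into the cache and is not passed to the backward sketch.

```python
import math


def batch_norm2d(X, scale, shift, ema_mean, ema_var, mode, eps, momentum, C, HW):
    """Hypothetical reference for batch_norm2d on an NCHW matrix.

    X has N rows of length C*HW; channel c occupies columns c*HW .. c*HW+HW-1.
    Returns (out, updated_ema_mean, updated_ema_var, cache_mean, cache_inv_std).
    """
    N = len(X)
    cnt = N * HW  # number of values that share one channel's statistics
    if mode == "train":
        mean = [sum(X[n][c * HW + k] for n in range(N) for k in range(HW)) / cnt
                for c in range(C)]
        # biased (population) variance over N*H*W values (an assumption)
        var = [sum((X[n][c * HW + k] - mean[c]) ** 2 for n in range(N) for k in range(HW)) / cnt
               for c in range(C)]
        # assumed EMA convention: new = momentum * old + (1 - momentum) * batch
        upd_mean = [momentum * ema_mean[c] + (1 - momentum) * mean[c] for c in range(C)]
        upd_var = [momentum * ema_var[c] + (1 - momentum) * var[c] for c in range(C)]
    else:
        # inference: normalize with the running statistics and leave them unchanged
        mean, var = list(ema_mean), list(ema_var)
        upd_mean, upd_var = list(ema_mean), list(ema_var)
    inv_std = [1.0 / math.sqrt(v + eps) for v in var]
    out = [[scale[c] * (X[n][c * HW + k] - mean[c]) * inv_std[c] + shift[c]
            for c in range(C) for k in range(HW)]
           for n in range(N)]
    return out, upd_mean, upd_var, mean, inv_std


def batch_norm2d_backward(X, dout, scale, cache_mean, cache_inv_std, C, HW):
    """Gradients of a training-mode batch_norm2d; returns (dX, dscale, dshift)."""
    N = len(X)
    cnt = N * HW
    dX = [[0.0] * (C * HW) for _ in range(N)]
    dscale, dshift = [0.0] * C, [0.0] * C
    for c in range(C):
        cols = range(c * HW, (c + 1) * HW)
        # normalized input, recomputed from the cached statistics
        xhat = [[(X[n][j] - cache_mean[c]) * cache_inv_std[c] for j in cols] for n in range(N)]
        sum_d = sum(dout[n][j] for n in range(N) for j in cols)
        sum_dx = sum(dout[n][j] * xhat[n][i] for n in range(N) for i, j in enumerate(cols))
        dshift[c], dscale[c] = sum_d, sum_dx
        for n in range(N):
            for i, j in enumerate(cols):
                # standard batch-norm input gradient with batch statistics
                dX[n][j] = (scale[c] * cache_inv_std[c] / cnt
                            * (cnt * dout[n][j] - sum_d - xhat[n][i] * sum_dx))
    return dX, dscale, dshift
```

The backward sketch recomputes the normalized input from the cached mean and inverse standard deviation rather than storing it, which mirrors why the table lists cache_mean and cache_inv_var as inputs to batch_norm2d_backward.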