[MINOR] Update `nn` library formatting
Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/d49ab981
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/d49ab981
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/d49ab981

Branch: refs/heads/master
Commit: d49ab98116054c98aa65a1f0d173a9c181b1f32f
Parents: 324bea5
Author: Mike Dusenberry <[email protected]>
Authored: Thu Jun 15 15:20:47 2017 -0700
Committer: Mike Dusenberry <[email protected]>
Committed: Thu Jun 15 15:20:47 2017 -0700

----------------------------------------------------------------------
 scripts/nn/examples/mnist_lenet.dml    |   4 +-
 scripts/nn/layers/conv2d_transpose.dml | 149 +++++++++++++---------------
 2 files changed, 71 insertions(+), 82 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/d49ab981/scripts/nn/examples/mnist_lenet.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/examples/mnist_lenet.dml b/scripts/nn/examples/mnist_lenet.dml
index 986d797..57b8ba6 100644
--- a/scripts/nn/examples/mnist_lenet.dml
+++ b/scripts/nn/examples/mnist_lenet.dml
@@ -114,8 +114,8 @@ train = function(matrix[double] X, matrix[double] Y,
 
       # Compute forward pass
       ## layer 1: conv1 -> relu1 -> pool1
-      [outc1, Houtc1, Woutc1] = conv2d::forward(X_batch, W1, b1, C, Hin, Win, Hf, Wf, stride, stride,
-                                                pad, pad)
+      [outc1, Houtc1, Woutc1] = conv2d::forward(X_batch, W1, b1, C, Hin, Win, Hf, Wf,
+                                                stride, stride, pad, pad)
       outr1 = relu::forward(outc1)
       [outp1, Houtp1, Woutp1] = max_pool2d::forward(outr1, F1, Houtc1, Woutc1, Hf=2, Wf=2,
                                                     strideh=2, stridew=2, pad=0, pad=0)

http://git-wip-us.apache.org/repos/asf/systemml/blob/d49ab981/scripts/nn/layers/conv2d_transpose.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/conv2d_transpose.dml b/scripts/nn/layers/conv2d_transpose.dml
index 0838563..eee19a5 100644
--- a/scripts/nn/layers/conv2d_transpose.dml
+++ b/scripts/nn/layers/conv2d_transpose.dml
@@ -63,25 +63,24 @@ forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
   Hout = strideh * (Hin-1) - 2*padh + Hf + out_padh
   Wout = stridew * (Win-1) - 2*padw + Wf + out_padw
 
-  /*
-   * Transpose convolution aims to go in the other direction of
-   * (direct) convolution, i.e., given input X, produce output O such
-   * that running convolution on O recovers X. This is achieved by
-   * conv2d_backward_data (since the derivative wrt data must produce
-   * output of same size as the input to conv2d). By reusing a built-in
-   * operator we achieve efficiency and restrict the number of built-in
-   * operators to manageable levels. Plus, most other deep-learning
-   * packages make use of the same strategy which means this
-   * implementation of transpose convolution is 'in-sync' with them.
-   *
-   * One potential downside of reusing conv2d_backward_data is the fact
-   * that it rotates the filter by 180 degrees before applying it. This
-   * needs to be kept in mind when interpreting the output of transpose
-   * convolution.
-   */
+  # Transpose convolution aims to go in the other direction of
+  # (direct) convolution, i.e., given input X, produce output O such
+  # that running convolution on O recovers X. This is achieved by
+  # conv2d_backward_data (since the derivative wrt data must produce
+  # output of same size as the input to conv2d). By reusing a built-in
+  # operator we achieve efficiency and restrict the number of built-in
+  # operators to manageable levels. Plus, most other deep-learning
+  # packages make use of the same strategy which means this
+  # implementation of transpose convolution is 'in-sync' with them.
+  #
+  # One potential downside of reusing conv2d_backward_data is the fact
+  # that it rotates the filter by 180 degrees before applying it. This
+  # needs to be kept in mind when interpreting the output of transpose
+  # convolution.
   out = conv2d_backward_data(W, X, stride=[strideh,stridew], padding=[padh,padw],
                              input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf])
+  # Add bias term to each output filter
   out = bias_add(out, b)
 }
 
@@ -120,36 +119,33 @@ backward = function(matrix[double] dout, int Hout, int Wout,
   N = nrow(X)
   F = nrow(b)
 
-  /*
-   * conv2d_backward_filter takes the input and delta map as first and
-   * second args, respectively. Given that we need to compute the
-   * grad (wrt to filter) for transpose convolution where the roles of
-   * the input and output are reversed, we reverse the order of the
-   * args (along with setting input_shape to the delta map shape).
-   * Effectively, we are running a direct convolution with X as the
-   * filter and the dout as the input. To convince oneself that the
-   * interconnections between the cells of the filter, input and delta
-   * map are preserved please keep in mind that the forward of
-   * convolution transpose rotates the filter by 180 degrees before
-   * applying it.
-   */
+  # conv2d_backward_filter takes the input and delta map as first and
+  # second args, respectively. Given that we need to compute the
+  # grad (wrt to filter) for transpose convolution where the roles of
+  # the input and output are reversed, we reverse the order of the
+  # args (along with setting input_shape to the delta map shape).
+  # Effectively, we are running a direct convolution with X as the
+  # filter and the dout as the input. To convince oneself that the
+  # interconnections between the cells of the filter, input and delta
+  # map are preserved please keep in mind that the forward of
+  # convolution transpose rotates the filter by 180 degrees before
+  # applying it.
   dW = conv2d_backward_filter(dout, X, stride=[strideh,stridew], padding=[padh,padw],
                               input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf])
-  /*
-   * Since the forward for transpose convolution makes a call to
-   * conv2d_backward_data, to compute its derivative wrt to data
-   * we can run conv2d by applying the filter on the delta
-   * map (this makes sense because convolution transpose is the
-   * 'reverse' of convolution). Its easy to see that this will produce
-   * output of the required size. To convince oneself that conv2d will
-   * respect the interconnections between the cells in the delta map
-   * and the filter, keep in mind that the forward function rotates the
-   * filter by 180 degrees before applying it.
-   */
+  # Since the forward for transpose convolution makes a call to
+  # conv2d_backward_data, to compute its derivative wrt to data
+  # we can run conv2d by applying the filter on the delta
+  # map (this makes sense because convolution transpose is the
+  # 'reverse' of convolution). Its easy to see that this will produce
+  # output of the required size. To convince oneself that conv2d will
+  # respect the interconnections between the cells in the delta map
+  # and the filter, keep in mind that the forward function rotates the
+  # filter by 180 degrees before applying it.
   dX = conv2d(dout, W, input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf],
               stride=[strideh,stridew], padding=[padh,padw])
+  # Partial derivatives for bias vector
   db = rowSums(matrix(colSums(dout), rows=F, cols=Hout*Wout))
 }
 
@@ -202,47 +198,40 @@ init_bilinear = function(int C, int K)
   vect = 1 - abs(seq(0, K-1) / factor_up - center)
   weights = matrix(vect %*% t(vect), rows=1, cols=K*K)
 
-  /*
-   * To create a multi-channel channel-independent upsampling filter,
-   * we need to intersperse the filter weights with 0s. For instance,
-   * consider the case of 2X upsampling. In this case, K=4 and we have
-   * K^2=16 weights to include into the 3D tensor representing the
-   * filter which should look like the following (assuming 3 channels):
-   *
-   *   <-16 weights-> <---------32 0s--------->
-   *   X X ...... X X 0 0 0 ............. 0 0 0
-   *   0 .......... 0 X X .... X X 0 ...... 0 0
-   *   0 0 0 ............... 0 0 0 X X .... X X
-   *
-   * To be clear, the second row should have 16 0s followed by 16
-   * weights followed by 16 0s.
-   *
-   * To create the above filter, we take advantage of the fact that
-   * between two sets of non-zero weights, there is always a sequence
-   * of C*K*K 0s. In the above example, C*K^2 = 48 (e.g., 32 trailing
-   * 0s in the first row and 16 leading 0s in the second row).
-   *
-   * Note that, in the special case of C=1 we do not need to
-   * intersperse with 0s (no question of being channel-wise independent
-   * since we have only 1 channel).
-   */
-  #if(C > 1){
-  /*
-   * Append C*K*K trailing 0s to the K*K kernel and replicate the
-   * resulting row C times
-   */
-  repl_weights = matrix(1, rows=C, cols=1) %*% cbind(weights, matrix(0, rows=1, cols=C*K*K))
+  # To create a multi-channel channel-independent upsampling filter,
+  # we need to intersperse the filter weights with 0s. For instance,
+  # consider the case of 2X upsampling. In this case, K=4 and we have
+  # K^2=16 weights to include into the 3D tensor representing the
+  # filter which should look like the following (assuming 3 channels):
+  #
+  #   <-16 weights-> <---------32 0s--------->
+  #   X X ...... X X 0 0 0 ............. 0 0 0
+  #   0 .......... 0 X X .... X X 0 ...... 0 0
+  #   0 0 0 ............... 0 0 0 X X .... X X
+  #
+  # To be clear, the second row should have 16 0s followed by 16
+  # weights followed by 16 0s.
+  #
+  # To create the above filter, we take advantage of the fact that
+  # between two sets of non-zero weights, there is always a sequence
+  # of C*K*K 0s. In the above example, C*K^2 = 48 (e.g., 32 trailing
+  # 0s in the first row and 16 leading 0s in the second row).
+  #
+  # Note that, in the special case of C=1 we do not need to
+  # intersperse with 0s (no question of being channel-wise independent
+  # since we have only 1 channel).
+
+  # Append C*K*K trailing 0s to the K*K kernel and replicate the
+  # resulting row C times
+  repl_weights = matrix(1, rows=C, cols=1) %*% cbind(weights, matrix(0, rows=1, cols=C*K*K))
-  /*
-   * The above operation added extra C*K*K trailing 0s in the last row
-   * that we do not need. Thus, we need to:
-   * 1) reshape the resulting matrix into a row
-   * 2) 'Clip off' the last few 0s using indexing and reshape the
-   *    result into the expected filter shape ([C, C, K, K])
-   */
-  repl_weights_row = matrix(repl_weights, rows=1, cols=C*(C+1)*K^2)
-  W = matrix(repl_weights_row[1,1:(C*K)^2], rows=C, cols=C*K^2)
-  #}else W = weights
+  # The above operation added extra C*K*K trailing 0s in the last row
+  # that we do not need. Thus, we need to:
+  # 1) reshape the resulting matrix into a row
+  # 2) 'Clip off' the last few 0s using indexing and reshape the
+  #    result into the expected filter shape ([C, C, K, K])
+  repl_weights_row = matrix(repl_weights, rows=1, cols=C*(C+1)*K^2)
+  W = matrix(repl_weights_row[1,1:(C*K)^2], rows=C, cols=C*K^2)
   b = matrix(0, rows=C, cols=1)
 }
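As a rough DML usage sketch of the layer touched above (the argument and return
order beyond X, W, and b is assumed here to mirror the other nn layers), a 2X
bilinear upsampling pass and its backward call might look like:

  source("nn/layers/conv2d_transpose.dml") as conv2d_transpose

  # Hypothetical sizes: 2 images, 3 channels, 8x8 inputs, 4x4 filter, stride 2, pad 1
  N = 2
  C = 3
  Hin = 8
  Win = 8
  Hf = 4
  Wf = 4
  stride = 2
  pad = 1
  X = rand(rows=N, cols=C*Hin*Win)

  # Channel-wise bilinear filter and zero bias
  [W, b] = conv2d_transpose::init_bilinear(C, Hf)

  # Forward: Hout = strideh*(Hin-1) - 2*padh + Hf + out_padh = 2*7 - 2 + 4 + 0 = 16
  [out, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf,
                                                stride, stride, pad, pad, 0, 0)

  # Backward: dW comes from conv2d_backward_filter with the argument roles reversed,
  # and dX from a direct conv2d of the filter over the upstream gradient dout
  dout = rand(rows=N, cols=C*Hout*Wout)
  [dX, dW, db] = conv2d_transpose::backward(dout, Hout, Wout, X, W, b, C, Hin, Win,
                                            Hf, Wf, stride, stride, pad, pad)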

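The zero-interspersing step in init_bilinear above can also be checked in isolation
with a tiny, hypothetical kernel (C=2 channels and K=2, so the "kernel" is just the
four values 1 2 3 4); each row of the result holds one K*K block, shifted K*K
columns to the right of the previous row's block, i.e., one block per channel:

  C = 2
  K = 2
  weights = matrix("1 2 3 4", rows=1, cols=K*K)  # stand-in for the K*K bilinear kernel

  # Append C*K*K trailing 0s to the kernel row and replicate it C times
  repl_weights = matrix(1, rows=C, cols=1) %*% cbind(weights, matrix(0, rows=1, cols=C*K*K))

  # Flatten, clip the surplus trailing 0s, and reshape to [C, C*K*K]
  repl_weights_row = matrix(repl_weights, rows=1, cols=C*(C+1)*K^2)
  W = matrix(repl_weights_row[1,1:(C*K)^2], rows=C, cols=C*K^2)

  print(toString(W))
  # row 1: 1 2 3 4 followed by four 0s
  # row 2: four 0s followed by 1 2 3 4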