[SYSTEMML-1493] [SYSTEMML-1500] Added TanH and Euclidean Loss in Caffe2DML - Added the reference documentation - Added a new layer softmax_loss.dml - Added compute_loss_accuracy to l2_loss.dml - Updated tanh layer to invoke newly added builtin function
Closes #672. Project: http://git-wip-us.apache.org/repos/asf/systemml/repo Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/61dcc85e Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/61dcc85e Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/61dcc85e Branch: refs/heads/master Commit: 61dcc85e48a390c1bb63ee4c42aad9a3fade7d06 Parents: b5ef21f Author: Niketan Pansare <npan...@us.ibm.com> Authored: Thu Sep 28 10:54:56 2017 -0700 Committer: Niketan Pansare <npan...@us.ibm.com> Committed: Thu Sep 28 10:56:03 2017 -0700 ---------------------------------------------------------------------- docs/beginners-guide-caffe2dml.md | 539 +--------- docs/index.md | 4 +- docs/reference-guide-caffe2dml.md | 986 +++++++++++++++++++ scripts/nn/layers/l2_loss.dml | 1 - scripts/nn/layers/tanh.dml | 15 +- scripts/nn/test/run_tests.dml | 2 + scripts/nn/test/test.dml | 53 + .../org/apache/sysml/api/dl/CaffeLayer.scala | 131 ++- .../org/apache/sysml/api/dl/CaffeNetwork.scala | 3 + 9 files changed, 1171 insertions(+), 563 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/docs/beginners-guide-caffe2dml.md ---------------------------------------------------------------------- diff --git a/docs/beginners-guide-caffe2dml.md b/docs/beginners-guide-caffe2dml.md index 220e02c..4d6b7fd 100644 --- a/docs/beginners-guide-caffe2dml.md +++ b/docs/beginners-guide-caffe2dml.md @@ -27,12 +27,10 @@ limitations under the License. <br/> -## Introduction - -Caffe2DML is an **experimental API** that converts an Caffe specification to DML. +Caffe2DML is an **experimental API** that converts a Caffe specification to DML. It is designed to fit well into the mllearn framework and hence supports NumPy, Pandas as well as PySpark DataFrame. -### Training Lenet +# Training Lenet To create a Caffe2DML object, one needs to create a solver and network file that conforms to the [Caffe specification](http://caffe.berkeleyvision.org/). @@ -148,7 +146,7 @@ Iter:2000, validation loss:173.66147359346, validation accuracy:97.4897540983606 0.97399999999999998 ``` -### Additional Configuration +# Additional Configuration - Print the generated DML script along with classification report: `lenet.set(debug=True)` - Print the heavy hitters instruction and the execution plan (advanced users): `lenet.setStatistics(True).setExplain(True)` @@ -168,7 +166,7 @@ and `allreduce`). Here are some common settings: | Distributed prediction | `lenet.set(test_algo="allreduce")` | | | Distributed synchronous training | `lenet.set(train_algo="allreduce_parallel_batches", parallel_batches=num_cluster_cores)` | Ensure that `batch_size` is set to appropriate value (for example: 64) | -### Saving the trained model +# Saving the trained model ```python lenet.fit(X_train, y_train) @@ -178,7 +176,7 @@ new_lenet.load('trained_weights') new_lenet.score(X_test, y_test) ``` -### Loading a pretrained caffemodel +# Loading a pretrained caffemodel We provide a converter utility to convert `.caffemodel` trained using Caffe to SystemML format. @@ -210,529 +208,4 @@ vgg.predict(X_test) # OR Fine-Tuning: vgg.fit(X_train, y_train) ``` -## Frequently asked questions - -#### What is the purpose of Caffe2DML API ? - -Most deep learning experts are more likely to be familiar with the Caffe's specification -rather than DML language. For these users, the Caffe2DML API reduces the learning curve to using SystemML. -Instead of requiring the users to write a DML script for training, fine-tuning and testing the model, -Caffe2DML takes as an input a network and solver specified in the Caffe specification -and automatically generates the corresponding DML. - -#### With Caffe2DML, does SystemML now require Caffe to be installed ? - -Absolutely not. We only support Caffe's API for convenience of the user as stated above. -Since the Caffe's API is specified in the protobuf format, we are able to generate the java parser files -and donot require Caffe to be installed. This is also true for Tensorboard feature of Caffe2DML. - -``` -Dml.g4 ---> antlr ---> DmlLexer.java, DmlListener.java, DmlParser.java ---> parse foo.dml -caffe.proto ---> protoc ---> target/generated-sources/caffe/Caffe.java ---> parse caffe_network.proto, caffe_solver.proto -``` - -Again, the SystemML engine doesnot invoke (or depend on) Caffe for any of its runtime operators. -Since the grammar files for the respective APIs (i.e. `caffe.proto`) are used by SystemML, -we include their licenses in our jar files. - -#### How can I speedup the training with Caffe2DML ? - -- Enable native BLAS to improve the performance of CP convolution and matrix multiplication operators. -If you are using OpenBLAS, please ensure that it was built with `USE_OPENMP` flag turned on. -For more detail see http://apache.github.io/systemml/native-backend - -```python -caffe2dmlObject.setConfigProperty("sysml.native.blas", "auto") -``` - -- Turn on the experimental codegen feature. This should help reduce unnecessary allocation cost after every binary operation. - -```python -caffe2dmlObject.setConfigProperty("sysml.codegen.enabled", "true").setConfigProperty("sysml.codegen.plancache", "true") -``` - -- Tuned the [Garbage Collector](http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning). - -- Enable GPU support (described below). - -#### How to enable GPU support in Caffe2DML ? - -To be consistent with other mllearn algorithms, we recommend that you use following method instead of setting -the `solver_mode` in solver file. - -```python -# The below method tells SystemML optimizer to use a GPU-enabled instruction if the operands fit in the GPU memory -caffe2dmlObject.setGPU(True) -# The below method tells SystemML optimizer to always use a GPU-enabled instruction irrespective of the memory requirement -caffe2dmlObject.setForceGPU(True) -``` - -#### What is lr_policy in the solver specification ? - -The parameter `lr_policy` specifies the learning rate decay policy. Caffe2DML supports following policies: -- `fixed`: always return `base_lr`. -- `step`: return `base_lr * gamma ^ (floor(iter / step))` -- `exp`: return `base_lr * gamma ^ iter` -- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)` -- `poly`: the effective learning rate follows a polynomial decay, to be zero by the max_iter. return `base_lr (1 - iter/max_iter) ^ (power)` -- `sigmoid`: the effective learning rate follows a sigmod decay return b`ase_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))` - -#### How to set batch size ? - -Batch size is set in `data_param` of the Data layer: - -``` -layer { - name: "mnist" - type: "Data" - top: "data" - top: "label" - data_param { - source: "mnist_train" - batch_size: 64 - backend: LMDB - } -} -``` - -#### How to set maximum number of iterations for training ? - -The maximum number of iterations can be set in the solver specification - -```bash -# The maximum number of iterations -max_iter: 2000 -``` - -#### How to set the size of the validation dataset ? - -The size of the validation dataset is determined by the parameters `test_iter` and the batch size. For example: If the batch size is 64 and -`test_iter` is 10, then the validation size is 640. This setting generates following DML code internally: - -```python -num_images = nrow(y_full) -BATCH_SIZE = 64 -num_validation = 10 * BATCH_SIZE -X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,] -X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,] -num_images = nrow(y) -``` - -#### How to monitor loss via command-line ? - -To monitor loss, please set following parameters in the solver specification - -``` -# Display training loss and accuracy every 100 iterations -display: 100 -# Carry out validation every 500 training iterations and display validation loss and accuracy. -test_iter: 10 -test_interval: 500 -``` - -#### How to pass a single jpeg image to Caffe2DML for prediction ? - -To convert a jpeg into NumPy matrix, you can use the [pillow package](https://pillow.readthedocs.io/) and -SystemML's `convertImageToNumPyArr` utility function. The below pyspark code demonstrates the usage: - -```python -from PIL import Image -import systemml as sml -from systemml.mllearn import Caffe2DML -img_shape = (3, 224, 224) -input_image = sml.convertImageToNumPyArr(Image.open(img_file_path), img_shape=img_shape) -resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', weights='ResNet_50_pretrained_weights', input_shape=img_shape) -resnet.predict(input_image) -``` - -#### How to prepare a directory of jpeg images for training with Caffe2DML ? - -The below pyspark code assumes that the input dataset has 2 labels `cat` and `dogs` and the filename has these labels as prefix. -We iterate through the directory and convert each jpeg image into pyspark.ml.linalg.Vector using pyspark. -These vectors are stored as DataFrame and randomized using Spark SQL's `orderBy(rand())` function. -The DataFrame is then saved in parquet format to reduce the cost of preprocessing for repeated training. - -```python -from systemml.mllearn import Caffe2DML -from pyspark.sql import SQLContext -import numpy as np -import urllib, os, scipy.ndimage -from pyspark.ml.linalg import Vectors -from pyspark import StorageLevel -import systemml as sml -from pyspark.sql.functions import rand -# ImageNet specific parameters -img_shape = (3, 224, 224) -train_dir = '/home/biuser/dogs_vs_cats/train' -def getLabelFeatures(filename): - from PIL import Image - vec = Vectors.dense(sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:]) - if filename.lower().startswith('cat'): - return (1, vec) - elif filename.lower().startswith('dog'): - return (2, vec) - else: - raise ValueError('Expected the filename to start with either cat or dog') -list_jpeg_files = os.listdir(train_dir) -# 10 files per partition -train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : getLabelFeatures(filename)).toDF(['label', 'features']).orderBy(rand()) -# Optional: but helps seperates conversion-related from training -# Alternatively, this dataframe can be passed directly to `caffe2dml_model.fit(train_df)` -train_df.write.parquet('kaggle-cats-dogs.parquet') -``` - -An alternative way to load images into a PySpark DataFrame for prediction, is to use MLLib's LabeledPoint class: - -```python -list_jpeg_files = os.listdir(train_dir) -train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : LabeledPoint(0, sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:])).toDF().select('features') -# Note: convertVectorColumnsToML has an additional serialization cost -train_df = MLUtils.convertVectorColumnsToML(train_df) -``` - - -#### Can I use Caffe2DML via Scala ? - -Though we recommend using Caffe2DML via its Python interfaces, it is possible to use it by creating an object of the class -`org.apache.sysml.api.dl.Caffe2DML`. It is important to note that Caffe2DML's scala API is packaged in `systemml-*-extra.jar`. - -#### How can I get summary information of my network ? - - -```python -lenet.summary() -``` - -Output: - -``` -+-----+---------------+--------------+------------+---------+-----------+---------+ -| Name| Type| Output| Weight| Bias| Top| Bottom| -+-----+---------------+--------------+------------+---------+-----------+---------+ -|mnist| Data| (, 1, 28, 28)| | |mnist,mnist| | -|conv1| Convolution|(, 32, 28, 28)| [32 X 25]| [32 X 1]| conv1| mnist| -|relu1| ReLU|(, 32, 28, 28)| | | relu1| conv1| -|pool1| Pooling|(, 32, 14, 14)| | | pool1| relu1| -|conv2| Convolution|(, 64, 14, 14)| [64 X 800]| [64 X 1]| conv2| pool1| -|relu2| ReLU|(, 64, 14, 14)| | | relu2| conv2| -|pool2| Pooling| (, 64, 7, 7)| | | pool2| relu2| -| ip1| InnerProduct| (, 512, 1, 1)|[3136 X 512]|[1 X 512]| ip1| pool2| -|relu3| ReLU| (, 512, 1, 1)| | | relu3| ip1| -|drop1| Dropout| (, 512, 1, 1)| | | drop1| relu3| -| ip2| InnerProduct| (, 10, 1, 1)| [512 X 10]| [1 X 10]| ip2| drop1| -| loss|SoftmaxWithLoss| (, 10, 1, 1)| | | loss|ip2,mnist| -+-----+---------------+--------------+------------+---------+-----------+---------+ -``` - -#### How can I view the script generated by Caffe2DML ? - -To view the generated DML script (and additional debugging information), please set the `debug` parameter to True. - -```python -lenet.set(debug=True) -``` - -Output: -``` -001|debug = TRUE -002|source("nn/layers/softmax.dml") as softmax -003|source("nn/layers/cross_entropy_loss.dml") as cross_entropy_loss -004|source("nn/layers/conv2d_builtin.dml") as conv2d_builtin -005|source("nn/layers/relu.dml") as relu -006|source("nn/layers/max_pool2d_builtin.dml") as max_pool2d_builtin -007|source("nn/layers/affine.dml") as affine -008|source("nn/layers/dropout.dml") as dropout -009|source("nn/optim/sgd_momentum.dml") as sgd_momentum -010|source("nn/layers/l2_reg.dml") as l2_reg -011|X_full_path = ifdef($X, " ") -012|X_full = read(X_full_path) -013|y_full_path = ifdef($y, " ") -014|y_full = read(y_full_path) -015|num_images = nrow(y_full) -016|# Convert to one-hot encoding (Assumption: 1-based labels) -017|y_full = table(seq(1,num_images,1), y_full, num_images, 10) -018|weights = ifdef($weights, " ") -019|# Initialize the layers and solvers -020|X_full = X_full * 0.00390625 -021|BATCH_SIZE = 64 -022|[conv1_weight,conv1_bias] = conv2d_builtin::init(32,1,5,5) -023|[conv2_weight,conv2_bias] = conv2d_builtin::init(64,32,5,5) -024|[ip1_weight,ip1_bias] = affine::init(3136,512) -025|[ip2_weight,ip2_bias] = affine::init(512,10) -026|conv1_weight_v = sgd_momentum::init(conv1_weight) -027|conv1_bias_v = sgd_momentum::init(conv1_bias) -028|conv2_weight_v = sgd_momentum::init(conv2_weight) -029|conv2_bias_v = sgd_momentum::init(conv2_bias) -030|ip1_weight_v = sgd_momentum::init(ip1_weight) -031|ip1_bias_v = sgd_momentum::init(ip1_bias) -032|ip2_weight_v = sgd_momentum::init(ip2_weight) -033|ip2_bias_v = sgd_momentum::init(ip2_bias) -034|num_validation = 10 * BATCH_SIZE -035|# Sanity check to ensure that validation set is not too large -036|if(num_validation > ceil(0.3 * num_images)) { -037| max_test_iter = floor(ceil(0.3 * num_images) / BATCH_SIZE) -038| stop("Too large validation size. Please reduce test_iter to " + max_test_iter) -039|} -040|X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]; X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]; num_images = nrow(y) -041|num_iters_per_epoch = ceil(num_images / BATCH_SIZE) -042|max_epochs = ceil(2000 / num_iters_per_epoch) -043|iter = 0 -044|lr = 0.01 -045|for(e in 1:max_epochs) { -046| for(i in 1:num_iters_per_epoch) { -047| beg = ((i-1) * BATCH_SIZE) %% num_images + 1; end = min(beg + BATCH_SIZE - 1, num_images); Xb = X[beg:end,]; yb = y[beg:end,]; -048| iter = iter + 1 -049| # Perform forward pass -050| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2) -051| out4 = relu::forward(out3) -052| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0) -053| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2) -054| out7 = relu::forward(out6) -055| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0) -056| out9 = affine::forward(out8,ip1_weight,ip1_bias) -057| out10 = relu::forward(out9) -058| [out11,mask11] = dropout::forward(out10,0.5,-1) -059| out12 = affine::forward(out11,ip2_weight,ip2_bias) -060| out13 = softmax::forward(out12) -061| # Perform backward pass -062| dProbs = cross_entropy_loss::backward(out13,yb); dOut13 = softmax::backward(dProbs,out12); dOut13_12 = dOut13; dOut13_2 = dOut13; -063| [dOut12,ip2_dWeight,ip2_dBias] = affine::backward(dOut13_12,out11,ip2_weight,ip2_bias); dOut12_11 = dOut12; -064| dOut11 = dropout::backward(dOut12_11,out10,0.5,mask11); dOut11_10 = dOut11; -065| dOut10 = relu::backward(dOut11_10,out9); dOut10_9 = dOut10; -066| [dOut9,ip1_dWeight,ip1_dBias] = affine::backward(dOut10_9,out8,ip1_weight,ip1_bias); dOut9_8 = dOut9; -067| dOut8 = max_pool2d_builtin::backward(dOut9_8,7,7,out7,64,14,14,2,2,2,2,0,0); dOut8_7 = dOut8; -068| dOut7 = relu::backward(dOut8_7,out6); dOut7_6 = dOut7; -069| [dOut6,conv2_dWeight,conv2_dBias] = conv2d_builtin::backward(dOut7_6,14,14,out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2); dOut6_5 = dOut6; -070| dOut5 = max_pool2d_builtin::backward(dOut6_5,14,14,out4,32,28,28,2,2,2,2,0,0); dOut5_4 = dOut5; -071| dOut4 = relu::backward(dOut5_4,out3); dOut4_3 = dOut4; -072| [dOut3,conv1_dWeight,conv1_dBias] = conv2d_builtin::backward(dOut4_3,28,28,Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2); dOut3_2 = dOut3; -073| # Update the parameters -074| conv1_dWeight_reg = l2_reg::backward(conv1_weight, 5.000000237487257E-4) -075| conv1_dWeight = conv1_dWeight + conv1_dWeight_reg -076| [conv1_weight,conv1_weight_v] = sgd_momentum::update(conv1_weight,conv1_dWeight,(lr * 1.0),0.8999999761581421,conv1_weight_v) -077| [conv1_bias,conv1_bias_v] = sgd_momentum::update(conv1_bias,conv1_dBias,(lr * 2.0),0.8999999761581421,conv1_bias_v) -078| conv2_dWeight_reg = l2_reg::backward(conv2_weight, 5.000000237487257E-4) -079| conv2_dWeight = conv2_dWeight + conv2_dWeight_reg -080| [conv2_weight,conv2_weight_v] = sgd_momentum::update(conv2_weight,conv2_dWeight,(lr * 1.0),0.8999999761581421,conv2_weight_v) -081| [conv2_bias,conv2_bias_v] = sgd_momentum::update(conv2_bias,conv2_dBias,(lr * 2.0),0.8999999761581421,conv2_bias_v) -082| ip1_dWeight_reg = l2_reg::backward(ip1_weight, 5.000000237487257E-4) -083| ip1_dWeight = ip1_dWeight + ip1_dWeight_reg -084| [ip1_weight,ip1_weight_v] = sgd_momentum::update(ip1_weight,ip1_dWeight,(lr * 1.0),0.8999999761581421,ip1_weight_v) -085| [ip1_bias,ip1_bias_v] = sgd_momentum::update(ip1_bias,ip1_dBias,(lr * 2.0),0.8999999761581421,ip1_bias_v) -086| ip2_dWeight_reg = l2_reg::backward(ip2_weight, 5.000000237487257E-4) -087| ip2_dWeight = ip2_dWeight + ip2_dWeight_reg -088| [ip2_weight,ip2_weight_v] = sgd_momentum::update(ip2_weight,ip2_dWeight,(lr * 1.0),0.8999999761581421,ip2_weight_v) -089| [ip2_bias,ip2_bias_v] = sgd_momentum::update(ip2_bias,ip2_dBias,(lr * 2.0),0.8999999761581421,ip2_bias_v) -090| # Compute training loss & accuracy -091| if(iter %% 100 == 0) { -092| loss = 0 -093| accuracy = 0 -094| tmp_loss = cross_entropy_loss::forward(out13,yb) -095| loss = loss + tmp_loss -096| true_yb = rowIndexMax(yb) -097| predicted_yb = rowIndexMax(out13) -098| accuracy = mean(predicted_yb == true_yb)*100 -099| training_loss = loss -100| training_accuracy = accuracy -101| print("Iter:" + iter + ", training loss:" + training_loss + ", training accuracy:" + training_accuracy) -102| if(debug) { -103| num_rows_error_measures = min(10, ncol(yb)) -104| error_measures = matrix(0, rows=num_rows_error_measures, cols=5) -105| for(class_i in 1:num_rows_error_measures) { -106| tp = sum( (true_yb == predicted_yb) * (true_yb == class_i) ) -107| tp_plus_fp = sum( (predicted_yb == class_i) ) -108| tp_plus_fn = sum( (true_yb == class_i) ) -109| precision = tp / tp_plus_fp -110| recall = tp / tp_plus_fn -111| f1Score = 2*precision*recall / (precision+recall) -112| error_measures[class_i,1] = class_i -113| error_measures[class_i,2] = precision -114| error_measures[class_i,3] = recall -115| error_measures[class_i,4] = f1Score -116| error_measures[class_i,5] = tp_plus_fn -117| } -118| print("class \tprecision\trecall \tf1-score\tnum_true_labels\n" + toString(error_measures, decimal=7, sep="\t")) -119| } -120| } -121| # Compute validation loss & accuracy -122| if(iter %% 500 == 0) { -123| loss = 0 -124| accuracy = 0 -125| validation_loss = 0 -126| validation_accuracy = 0 -127| for(iVal in 1:num_iters_per_epoch) { -128| beg = ((iVal-1) * BATCH_SIZE) %% num_validation + 1; end = min(beg + BATCH_SIZE - 1, num_validation); Xb = X_val[beg:end,]; yb = y_val[beg:end,]; -129| # Perform forward pass -130| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2) -131| out4 = relu::forward(out3) -132| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0) -133| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2) -134| out7 = relu::forward(out6) -135| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0) -136| out9 = affine::forward(out8,ip1_weight,ip1_bias) -137| out10 = relu::forward(out9) -138| [out11,mask11] = dropout::forward(out10,0.5,-1) -139| out12 = affine::forward(out11,ip2_weight,ip2_bias) -140| out13 = softmax::forward(out12) -141| tmp_loss = cross_entropy_loss::forward(out13,yb) -142| loss = loss + tmp_loss -143| true_yb = rowIndexMax(yb) -144| predicted_yb = rowIndexMax(out13) -145| accuracy = mean(predicted_yb == true_yb)*100 -146| validation_loss = validation_loss + loss -147| validation_accuracy = validation_accuracy + accuracy -148| } -149| validation_accuracy = validation_accuracy / num_iters_per_epoch -150| print("Iter:" + iter + ", validation loss:" + validation_loss + ", validation accuracy:" + validation_accuracy) -151| } -152| } -153| # Learning rate -154| lr = (0.009999999776482582 * 0.949999988079071^e) -155|} - -Iter:100, training loss:0.24014199350958168, training accuracy:87.5 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 3.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -3.0000000 0.8888889 0.8888889 0.8888889 9.0000000 -4.0000000 0.7500000 0.7500000 0.7500000 4.0000000 -5.0000000 0.7500000 1.0000000 0.8571429 3.0000000 -6.0000000 0.8333333 1.0000000 0.9090909 5.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -8.0000000 0.8571429 0.7500000 0.8000000 8.0000000 -9.0000000 1.0000000 0.5714286 0.7272727 7.0000000 -10.0000000 0.7272727 0.8888889 0.8000000 9.0000000 - -Iter:200, training loss:0.09555593867171894, training accuracy:98.4375 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 10.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 3.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -7.0000000 1.0000000 0.6666667 0.8000000 3.0000000 -8.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -9.0000000 0.8571429 1.0000000 0.9230769 6.0000000 -10.0000000 1.0000000 1.0000000 1.0000000 3.0000000 - -Iter:300, training loss:0.058686794512570216, training accuracy:98.4375 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -6.0000000 1.0000000 0.8750000 0.9333333 8.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -8.0000000 1.0000000 1.0000000 1.0000000 2.0000000 -9.0000000 0.8888889 1.0000000 0.9411765 8.0000000 -10.0000000 1.0000000 1.0000000 1.0000000 8.0000000 - -Iter:400, training loss:0.08742103541529415, training accuracy:96.875 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -2.0000000 0.8000000 1.0000000 0.8888889 8.0000000 -3.0000000 1.0000000 0.8333333 0.9090909 6.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -8.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -10.0000000 1.0000000 0.9230769 0.9600000 13.0000000 - -Iter:500, training loss:0.05873836245880005, training accuracy:98.4375 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 3.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -6.0000000 1.0000000 0.8571429 0.9230769 7.0000000 -7.0000000 0.8571429 1.0000000 0.9230769 6.0000000 -8.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 10.0000000 -10.0000000 1.0000000 1.0000000 1.0000000 5.0000000 - -Iter:500, validation loss:260.1580978627665, validation accuracy:96.43954918032787 -Iter:600, training loss:0.07584116043829209, training accuracy:98.4375 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -8.0000000 1.0000000 0.9230769 0.9600000 13.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -10.0000000 0.8333333 1.0000000 0.9090909 5.0000000 - -Iter:700, training loss:0.07973166944626336, training accuracy:98.4375 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 10.0000000 -8.0000000 0.8000000 1.0000000 0.8888889 4.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -10.0000000 1.0000000 0.9166667 0.9565217 12.0000000 - -Iter:800, training loss:0.0063778595034221855, training accuracy:100.0 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -8.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 2.0000000 -10.0000000 1.0000000 1.0000000 1.0000000 6.0000000 - -Iter:900, training loss:0.019673112167879484, training accuracy:100.0 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 3.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 3.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 10.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -8.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 12.0000000 -10.0000000 1.0000000 1.0000000 1.0000000 7.0000000 - -Iter:1000, training loss:0.06137978002508307, training accuracy:96.875 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 8.0000000 -4.0000000 0.8333333 0.8333333 0.8333333 6.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 5.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 10.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 3.0000000 -8.0000000 0.8888889 0.8888889 0.8888889 9.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -10.0000000 1.0000000 1.0000000 1.0000000 4.0000000 - -Iter:1000, validation loss:238.62301345198944, validation accuracy:97.02868852459017 -Iter:1100, training loss:0.023325103696013115, training accuracy:100.0 -class precision recall f1-score num_true_labels -1.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -2.0000000 1.0000000 1.0000000 1.0000000 10.0000000 -3.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 -5.0000000 1.0000000 1.0000000 1.0000000 2.0000000 -6.0000000 1.0000000 1.0000000 1.0000000 10.0000000 -7.0000000 1.0000000 1.0000000 1.0000000 7.0000000 -8.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -9.0000000 1.0000000 1.0000000 1.0000000 9.0000000 -10.0000000 1.0000000 1.0000000 1.0000000 6.0000000 -... -``` +Please see [Caffe2DML's reference guide](http://apache.github.io/systemml/reference-guide-caffe2dml) for more details. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/docs/index.md ---------------------------------------------------------------------- diff --git a/docs/index.md b/docs/index.md index d1dded7..1178009 100644 --- a/docs/index.md +++ b/docs/index.md @@ -50,8 +50,9 @@ for running SystemML from Spark via Scala, Python, or Java. * [Standalone](standalone-guide) - Standalone mode allows data scientists to rapidly prototype algorithms on a single machine in R-like and Python-like declarative languages. * [JMLC](jmlc) - Java Machine Learning Connector. -* *Experimental* [Caffe2DML API](beginners-guide-caffe2dml) for Deep Learning. +* *Experimental* Caffe2DML API for Deep Learning ([beginner's guide](beginners-guide-caffe2dml), [reference guide](reference-guide-caffe2dml)) - Converts a Caffe specification to DML. * *Experimental* [Keras2DML API](beginners-guide-keras2dml) for Deep Learning. + ## Language Guides * [Python API Reference](python-reference) - API Reference Guide for Python users. @@ -79,3 +80,4 @@ command-line interface. * [Engine Developer Guide](engine-dev-guide) - Guide for internal SystemML engine development. * [Troubleshooting Guide](troubleshooting-guide) - Troubleshoot various issues related to SystemML. * [Release Process](release-process) - Description of the SystemML release process. +* [Using Native BLAS](native-backend) in SystemML. http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/docs/reference-guide-caffe2dml.md ---------------------------------------------------------------------- diff --git a/docs/reference-guide-caffe2dml.md b/docs/reference-guide-caffe2dml.md new file mode 100644 index 0000000..24d5753 --- /dev/null +++ b/docs/reference-guide-caffe2dml.md @@ -0,0 +1,986 @@ +--- +layout: global +title: Beginner's Guide for Caffe2DML users +description: Beginner's Guide for Caffe2DML users +--- +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +* This will become a table of contents (this text will be scraped). +{:toc} + +<br/> + + +# Layers supported in Caffe2DML + +Caffe2DML to be as compatible with [the Caffe specification](http://caffe.berkeleyvision.org/tutorial/layers.html) as possible. +The main differences are given below along with the usage guide that mirrors the Caffe specification. + +## Vision Layers + +### Convolution Layer + +Invokes [nn/layers/conv2d_builtin.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_builtin.dml) +or [nn/layers/conv2d_depthwise.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_depthwise.dml) layer. + +**Required Parameters:** + +- num_output: the number of filters +- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter + +**Optional Parameters:** + +- bias_term (default true): specifies whether to learn and apply a set of additive biases to the filter outputs +- pad (or pad_h and pad_w) (default 0): specifies the number of pixels to (implicitly) add to each side of the input +- stride (or stride_h and stride_w) (default 1): specifies the intervals at which to apply the filters to the input +- group (g) (default 1): If g > 1, we restrict the connectivity of each filter to a subset of the input. +Specifically, the input and output channels are separated into g groups, +and the ith output group channels will be only connected to the ith input group channels. +Note: we only support depthwise convolution, hence `g` should be divisible by number of channels + +**Parameters that are ignored:** + +- weight_filler: We use the heuristic by He et al., which limits the magnification of inputs/gradients +during forward/backward passes by scaling unit-Gaussian weights by a factor of sqrt(2/n), +under the assumption of relu neurons. +- bias_filler: We use `constant bias_filler` with `value:0` + +**Sample Usage:** +``` +layer { + name: "conv1" + type: "Convolution" + bottom: "data" + top: "conv1" + # learning rate and decay multipliers for the filters + param { lr_mult: 1 decay_mult: 1 } + # learning rate and decay multipliers for the biases + param { lr_mult: 2 decay_mult: 0 } + convolution_param { + num_output: 96 # learn 96 filters + kernel_size: 11 # each filter is 11x11 + stride: 4 # step 4 pixels between each filter application + weight_filler { + type: "xavier" # initialize the filters from a Gaussian + } + bias_filler { + type: "constant" # initialize the biases to zero (0) + value: 0 + } + } + } + ``` + +### Pooling Layer + +Invokes [nn/layers/max_pool2d_builtin.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/max_pool2d_builtin.dml) layer. + +**Required Parameters:** + +- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter + +**Optional Parameters:** +- pool (default MAX): the pooling method. Currently, we only support MAX, not AVE, or STOCHASTIC. +- pad (or pad_h and pad_w) (default 0): specifies the number of pixels to (implicitly) add to each side of the input +- stride (or stride_h and stride_w) (default 1): specifies the intervals at which to apply the filters to the input + +**Sample Usage:** +``` +layer { + name: "pool1" + type: "Pooling" + bottom: "conv1" + top: "pool1" + pooling_param { + pool: MAX + kernel_size: 3 # pool over a 3x3 region + stride: 2 # step two pixels (in the bottom blob) between pooling regions + } +} +``` + +### Deconvolution Layer + +Invokes [nn/layers/conv2d_transpose.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_transpose.dml) +or [nn/layers/conv2d_transpose_depthwise.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_transpose_depthwise.dml) layer. + +**Required Parameters:** + +- num_output: the number of filters +- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter + +**Optional Parameters:** + +- bias_term (default true): specifies whether to learn and apply a set of additive biases to the filter outputs +- pad (or pad_h and pad_w) (default 0): specifies the number of pixels to (implicitly) add to each side of the input +- stride (or stride_h and stride_w) (default 1): specifies the intervals at which to apply the filters to the input +- group (g) (default 1): If g > 1, we restrict the connectivity of each filter to a subset of the input. +Specifically, the input and output channels are separated into g groups, +and the ith output group channels will be only connected to the ith input group channels. +Note: we only support depthwise convolution, hence `g` should be divisible by number of channels + +**Parameters that are ignored:** + +- weight_filler: We use the heuristic by He et al., which limits the magnification of inputs/gradients +during forward/backward passes by scaling unit-Gaussian weights by a factor of sqrt(2/n), +under the assumption of relu neurons. +- bias_filler: We use `constant bias_filler` with `value:0` + +**Sample Usage:** +``` +layer { + name: "upconv_d5c_u4a" + type: "Deconvolution" + bottom: "u5d" + top: "u4a" + param { + lr_mult: 0.0 + decay_mult: 0.0 + } + convolution_param { + num_output: 190 + bias_term: false + pad: 1 + kernel_size: 4 + group: 190 + stride: 2 + weight_filler { + type: "bilinear" + } + } +} +``` + + +## Common Layers + +### Inner Product / Fully Connected Layer + +Invokes [nn/layers/affine.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/affine.dml) layer. + +**Required Parameters:** + +- num_output: the number of filters + +**Parameters that are ignored:** +- weight_filler (default type: 'constant' value: 0): We use the heuristic by He et al., which limits the magnification +of inputs/gradients during forward/backward passes by scaling unit-Gaussian weights by a factor of sqrt(2/n), under the +assumption of relu neurons. +- bias_filler (default type: 'constant' value: 0): We use the default type and value. +- bias_term (default true): specifies whether to learn and apply a set of additive biases to the filter outputs. We use `bias_term=true`. + +**Sample Usage:** +``` +layer { + name: "fc8" + type: "InnerProduct" + # learning rate and decay multipliers for the weights + param { lr_mult: 1 decay_mult: 1 } + # learning rate and decay multipliers for the biases + param { lr_mult: 2 decay_mult: 0 } + inner_product_param { + num_output: 1000 + weight_filler { + type: "xavier" + } + bias_filler { + type: "constant" + value: 0 + } + } + bottom: "fc7" + top: "fc8" +} +``` + +### Dropout Layer + +Invokes [nn/layers/dropout.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/dropout.dml) layer. + +**Optional Parameters:** + +- dropout_ratio(default = 0.5): dropout ratio + +**Sample Usage:** +``` +layer { + name: "drop1" + type: "Dropout" + bottom: "relu3" + top: "drop1" + dropout_param { + dropout_ratio: 0.5 + } +} +``` + +## Normalization Layers + +### BatchNorm Layer + +This is used in combination with Scale layer. + +Invokes [nn/layers/batch_norm2d.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/batch_norm2d.dml) layer. + +**Optional Parameters:** +- moving_average_fraction (default = .999): Momentum value for moving averages. Typical values are in the range of [0.9, 0.999]. +- eps (default = 1e-5): Smoothing term to avoid divide by zero errors. Typical values are in the range of [1e-5, 1e-3]. + +**Parameters that are ignored:** +- use_global_stats: If false, normalization is performed over the current mini-batch +and global statistics are accumulated (but not yet used) by a moving average. +If true, those accumulated mean and variance values are used for the normalization. +By default, it is set to false when the network is in the training phase and true when the network is in the testing phase. + +**Sample Usage:** +``` +layer { + bottom: "conv1" + top: "conv1" + name: "bn_conv1" + type: "BatchNorm" + batch_norm_param { + use_global_stats: true + } +} +layer { + bottom: "conv1" + top: "conv1" + name: "scale_conv1" + type: "Scale" + scale_param { + bias_term: true + } +} +``` + +## Activation / Neuron Layers + +In general, activation / Neuron layers are element-wise operators, taking one bottom blob and producing one top blob of the same size. +In the layers below, we will ignore the input and out sizes as they are identical. + +### ReLU / Rectified-Linear Layer + +Invokes [nn/layers/relu.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/relu.dml) layer. + +**Parameters that are ignored:** +- negative_slope (default 0): specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0. + +**Sample Usage:** +``` +layer { + name: "relu1" + type: "ReLU" + bottom: "conv1" + top: "conv1" +} +``` + +### TanH Layer + +Invokes [nn/layers/tanh.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/tanh.dml) layer. + +**Sample Usage:** +``` +layer { + name: "tanh1" + type: "TanH" + bottom: "conv1" + top: "conv1" +} +``` + +### Sigmoid Layer + +Invokes [nn/layers/sigmoid.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/sigmoid.dml) layer. + +**Sample Usage:** +``` +layer { + name: "sigmoid1" + type: "Sigmoid" + bottom: "conv1" + top: "conv1" +} +``` + + +### Threshold Layer + +Computes `X > threshold` + +**Parameters that are ignored:** +- threshold (default: 0):Strictly positive values + +**Sample Usage:** +``` +layer { + name: "threshold1" + type: "Threshold" + bottom: "conv1" + top: "conv1" +} +``` + +## Utility Layers + +### Eltwise Layer + +Element-wise operations such as product or sum between two blobs. + +**Parameters that are ignored:** +- operation(default: SUM): element-wise operation. only SUM supported for now. +- table_prod_grad(default: true): Whether to use an asymptotically slower (for >2 inputs) but stabler method +of computing the gradient for the PROD operation. (No effect for SUM op.) + +**Sample Usage:** +``` +layer { + bottom: "res2a_branch1" + bottom: "res2a_branch2c" + top: "res2a" + name: "res2a" + type: "Eltwise" +} +``` + +### Concat Layer + +**Inputs:** +- `n_i * c_i * h * w` for each input blob i from 1 to K. + +**Outputs:** +- out: Outputs, of shape + - if axis = 0: `(n_1 + n_2 + ... + n_K) * c_1 * h * w`, and all input `c_i` should be the same. + - if axis = 1: `n_1 * (c_1 + c_2 + ... + c_K) * h * w`, and all input `n_i` should be the same. + +**Optional Parameters:** +- axis (default: 1): The axis along which to concatenate. + +**Sample Usage:** +``` +layer { + name: "concat_d5cc_u5a-b" + type: "Concat" + bottom: "u5a" + bottom: "d5c" + top: "u5b" +} +``` + +### Softmax Layer + +Invokes [nn/layers/softmax.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/softmax.dml) layer. + +Computes the forward pass for a softmax classifier. The inputs +are interpreted as unnormalized, log-probabilities for each of +N examples, and the softmax function transforms them to normalized +probabilities. + +This can be interpreted as a generalization of the sigmoid +function to multiple classes. + +`probs_ij = e^scores_ij / sum(e^scores_i)` + +**Parameters that are ignored:** +- axis (default: 1): The axis along which to perform the softmax. + +**Sample Usage:** +``` +layer { + name: "sm" + type: "Softmax" + bottom: "score" + top: "sm" +} +``` + +## Loss Layers + +Loss drives learning by comparing an output to a target and assigning cost to minimize. +The loss itself is computed by the forward pass and the gradient w.r.t. to the loss is computed by the backward pass. + +### Softmax with Loss Layer + +The softmax loss layer computes the multinomial logistic loss of the softmax of its inputs. +Itâs conceptually identical to a softmax layer followed by a multinomial logistic loss layer, but provides a more numerically stable gradient. + +Invokes [nn/layers/softmax.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/softmax.dml) +and [nn/layers/cross_entropy_loss.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/cross_entropy_loss.dml) +for classification problems. + +For image segmentation problems, invokes [nn/layers/softmax2d_loss.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/softmax2d_loss.dml) layer. + +**Sample Usage:** +``` +layer { + name: "loss" + type: "SoftmaxWithLoss" + bottom: "ip2" + bottom: "label" + top: "loss" +} +``` + +### Euclidean layer + +The Euclidean loss layer computes the sum of squares of differences of its two inputs. + +Invokes [nn/layers/l2_loss.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/l2_loss.dml) layer. + +**Sample Usage:** +``` +layer { + name: "loss" + type: "EuclideanLoss" + bottom: "ip2" + bottom: "label" + top: "loss" +} +``` + + +# Frequently asked questions + +#### What is the purpose of Caffe2DML API ? + +Most deep learning experts are more likely to be familiar with the Caffe's specification +rather than DML language. For these users, the Caffe2DML API reduces the learning curve to using SystemML. +Instead of requiring the users to write a DML script for training, fine-tuning and testing the model, +Caffe2DML takes as an input a network and solver specified in the Caffe specification +and automatically generates the corresponding DML. + +#### With Caffe2DML, does SystemML now require Caffe to be installed ? + +Absolutely not. We only support Caffe's API for convenience of the user as stated above. +Since the Caffe's API is specified in the protobuf format, we are able to generate the java parser files +and donot require Caffe to be installed. This is also true for Tensorboard feature of Caffe2DML. + +``` +Dml.g4 ---> antlr ---> DmlLexer.java, DmlListener.java, DmlParser.java ---> parse foo.dml +caffe.proto ---> protoc ---> target/generated-sources/caffe/Caffe.java ---> parse caffe_network.proto, caffe_solver.proto +``` + +Again, the SystemML engine doesnot invoke (or depend on) Caffe for any of its runtime operators. +Since the grammar files for the respective APIs (i.e. `caffe.proto`) are used by SystemML, +we include their licenses in our jar files. + +#### How can I speedup the training with Caffe2DML ? + +- Enable native BLAS to improve the performance of CP convolution and matrix multiplication operators. +If you are using OpenBLAS, please ensure that it was built with `USE_OPENMP` flag turned on. +For more detail see http://apache.github.io/systemml/native-backend + +```python +caffe2dmlObject.setConfigProperty("sysml.native.blas", "auto") +``` + +- Turn on the experimental codegen feature. This should help reduce unnecessary allocation cost after every binary operation. + +```python +caffe2dmlObject.setConfigProperty("sysml.codegen.enabled", "true").setConfigProperty("sysml.codegen.plancache", "true") +``` + +- Tuned the [Garbage Collector](http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning). + +- Enable GPU support (described below). + +#### How to enable GPU support in Caffe2DML ? + +To be consistent with other mllearn algorithms, we recommend that you use following method instead of setting +the `solver_mode` in solver file. + +```python +# The below method tells SystemML optimizer to use a GPU-enabled instruction if the operands fit in the GPU memory +caffe2dmlObject.setGPU(True) +# The below method tells SystemML optimizer to always use a GPU-enabled instruction irrespective of the memory requirement +caffe2dmlObject.setForceGPU(True) +``` + +#### What is lr_policy in the solver specification ? + +The parameter `lr_policy` specifies the learning rate decay policy. Caffe2DML supports following policies: +- `fixed`: always return `base_lr`. +- `step`: return `base_lr * gamma ^ (floor(iter / step))` +- `exp`: return `base_lr * gamma ^ iter` +- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)` +- `poly`: the effective learning rate follows a polynomial decay, to be zero by the max_iter. return `base_lr (1 - iter/max_iter) ^ (power)` +- `sigmoid`: the effective learning rate follows a sigmod decay return b`ase_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))` + +#### How to set batch size ? + +Batch size is set in `data_param` of the Data layer: + +``` +layer { + name: "mnist" + type: "Data" + top: "data" + top: "label" + data_param { + source: "mnist_train" + batch_size: 64 + backend: LMDB + } +} +``` + +#### How to set maximum number of iterations for training ? + +The maximum number of iterations can be set in the solver specification + +```bash +# The maximum number of iterations +max_iter: 2000 +``` + +#### How to set the size of the validation dataset ? + +The size of the validation dataset is determined by the parameters `test_iter` and the batch size. For example: If the batch size is 64 and +`test_iter` is 10, then the validation size is 640. This setting generates following DML code internally: + +```python +num_images = nrow(y_full) +BATCH_SIZE = 64 +num_validation = 10 * BATCH_SIZE +X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,] +X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,] +num_images = nrow(y) +``` + +#### How to monitor loss via command-line ? + +To monitor loss, please set following parameters in the solver specification + +``` +# Display training loss and accuracy every 100 iterations +display: 100 +# Carry out validation every 500 training iterations and display validation loss and accuracy. +test_iter: 10 +test_interval: 500 +``` + +#### How to pass a single jpeg image to Caffe2DML for prediction ? + +To convert a jpeg into NumPy matrix, you can use the [pillow package](https://pillow.readthedocs.io/) and +SystemML's `convertImageToNumPyArr` utility function. The below pyspark code demonstrates the usage: + +```python +from PIL import Image +import systemml as sml +from systemml.mllearn import Caffe2DML +img_shape = (3, 224, 224) +input_image = sml.convertImageToNumPyArr(Image.open(img_file_path), img_shape=img_shape) +resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', weights='ResNet_50_pretrained_weights', input_shape=img_shape) +resnet.predict(input_image) +``` + +#### How to prepare a directory of jpeg images for training with Caffe2DML ? + +The below pyspark code assumes that the input dataset has 2 labels `cat` and `dogs` and the filename has these labels as prefix. +We iterate through the directory and convert each jpeg image into pyspark.ml.linalg.Vector using pyspark. +These vectors are stored as DataFrame and randomized using Spark SQL's `orderBy(rand())` function. +The DataFrame is then saved in parquet format to reduce the cost of preprocessing for repeated training. + +```python +from systemml.mllearn import Caffe2DML +from pyspark.sql import SQLContext +import numpy as np +import urllib, os, scipy.ndimage +from pyspark.ml.linalg import Vectors +from pyspark import StorageLevel +import systemml as sml +from pyspark.sql.functions import rand +# ImageNet specific parameters +img_shape = (3, 224, 224) +train_dir = '/home/biuser/dogs_vs_cats/train' +def getLabelFeatures(filename): + from PIL import Image + vec = Vectors.dense(sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:]) + if filename.lower().startswith('cat'): + return (1, vec) + elif filename.lower().startswith('dog'): + return (2, vec) + else: + raise ValueError('Expected the filename to start with either cat or dog') +list_jpeg_files = os.listdir(train_dir) +# 10 files per partition +train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : getLabelFeatures(filename)).toDF(['label', 'features']).orderBy(rand()) +# Optional: but helps seperates conversion-related from training +# Alternatively, this dataframe can be passed directly to `caffe2dml_model.fit(train_df)` +train_df.write.parquet('kaggle-cats-dogs.parquet') +``` + +An alternative way to load images into a PySpark DataFrame for prediction, is to use MLLib's LabeledPoint class: + +```python +list_jpeg_files = os.listdir(train_dir) +train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : LabeledPoint(0, sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:])).toDF().select('features') +# Note: convertVectorColumnsToML has an additional serialization cost +train_df = MLUtils.convertVectorColumnsToML(train_df) +``` + + +#### Can I use Caffe2DML via Scala ? + +Though we recommend using Caffe2DML via its Python interfaces, it is possible to use it by creating an object of the class +`org.apache.sysml.api.dl.Caffe2DML`. It is important to note that Caffe2DML's scala API is packaged in `systemml-*-extra.jar`. + +#### How can I get summary information of my network ? + + +```python +lenet.summary() +``` + +Output: + +``` ++-----+---------------+--------------+------------+---------+-----------+---------+ +| Name| Type| Output| Weight| Bias| Top| Bottom| ++-----+---------------+--------------+------------+---------+-----------+---------+ +|mnist| Data| (, 1, 28, 28)| | |mnist,mnist| | +|conv1| Convolution|(, 32, 28, 28)| [32 X 25]| [32 X 1]| conv1| mnist| +|relu1| ReLU|(, 32, 28, 28)| | | relu1| conv1| +|pool1| Pooling|(, 32, 14, 14)| | | pool1| relu1| +|conv2| Convolution|(, 64, 14, 14)| [64 X 800]| [64 X 1]| conv2| pool1| +|relu2| ReLU|(, 64, 14, 14)| | | relu2| conv2| +|pool2| Pooling| (, 64, 7, 7)| | | pool2| relu2| +| ip1| InnerProduct| (, 512, 1, 1)|[3136 X 512]|[1 X 512]| ip1| pool2| +|relu3| ReLU| (, 512, 1, 1)| | | relu3| ip1| +|drop1| Dropout| (, 512, 1, 1)| | | drop1| relu3| +| ip2| InnerProduct| (, 10, 1, 1)| [512 X 10]| [1 X 10]| ip2| drop1| +| loss|SoftmaxWithLoss| (, 10, 1, 1)| | | loss|ip2,mnist| ++-----+---------------+--------------+------------+---------+-----------+---------+ +``` + +#### How can I view the script generated by Caffe2DML ? + +To view the generated DML script (and additional debugging information), please set the `debug` parameter to True. + +```python +lenet.set(debug=True) +``` + +Output: +``` +001|debug = TRUE +002|source("nn/layers/softmax.dml") as softmax +003|source("nn/layers/cross_entropy_loss.dml") as cross_entropy_loss +004|source("nn/layers/conv2d_builtin.dml") as conv2d_builtin +005|source("nn/layers/relu.dml") as relu +006|source("nn/layers/max_pool2d_builtin.dml") as max_pool2d_builtin +007|source("nn/layers/affine.dml") as affine +008|source("nn/layers/dropout.dml") as dropout +009|source("nn/optim/sgd_momentum.dml") as sgd_momentum +010|source("nn/layers/l2_reg.dml") as l2_reg +011|X_full_path = ifdef($X, " ") +012|X_full = read(X_full_path) +013|y_full_path = ifdef($y, " ") +014|y_full = read(y_full_path) +015|num_images = nrow(y_full) +016|# Convert to one-hot encoding (Assumption: 1-based labels) +017|y_full = table(seq(1,num_images,1), y_full, num_images, 10) +018|weights = ifdef($weights, " ") +019|# Initialize the layers and solvers +020|X_full = X_full * 0.00390625 +021|BATCH_SIZE = 64 +022|[conv1_weight,conv1_bias] = conv2d_builtin::init(32,1,5,5) +023|[conv2_weight,conv2_bias] = conv2d_builtin::init(64,32,5,5) +024|[ip1_weight,ip1_bias] = affine::init(3136,512) +025|[ip2_weight,ip2_bias] = affine::init(512,10) +026|conv1_weight_v = sgd_momentum::init(conv1_weight) +027|conv1_bias_v = sgd_momentum::init(conv1_bias) +028|conv2_weight_v = sgd_momentum::init(conv2_weight) +029|conv2_bias_v = sgd_momentum::init(conv2_bias) +030|ip1_weight_v = sgd_momentum::init(ip1_weight) +031|ip1_bias_v = sgd_momentum::init(ip1_bias) +032|ip2_weight_v = sgd_momentum::init(ip2_weight) +033|ip2_bias_v = sgd_momentum::init(ip2_bias) +034|num_validation = 10 * BATCH_SIZE +035|# Sanity check to ensure that validation set is not too large +036|if(num_validation > ceil(0.3 * num_images)) { +037| max_test_iter = floor(ceil(0.3 * num_images) / BATCH_SIZE) +038| stop("Too large validation size. Please reduce test_iter to " + max_test_iter) +039|} +040|X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]; X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]; num_images = nrow(y) +041|num_iters_per_epoch = ceil(num_images / BATCH_SIZE) +042|max_epochs = ceil(2000 / num_iters_per_epoch) +043|iter = 0 +044|lr = 0.01 +045|for(e in 1:max_epochs) { +046| for(i in 1:num_iters_per_epoch) { +047| beg = ((i-1) * BATCH_SIZE) %% num_images + 1; end = min(beg + BATCH_SIZE - 1, num_images); Xb = X[beg:end,]; yb = y[beg:end,]; +048| iter = iter + 1 +049| # Perform forward pass +050| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2) +051| out4 = relu::forward(out3) +052| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0) +053| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2) +054| out7 = relu::forward(out6) +055| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0) +056| out9 = affine::forward(out8,ip1_weight,ip1_bias) +057| out10 = relu::forward(out9) +058| [out11,mask11] = dropout::forward(out10,0.5,-1) +059| out12 = affine::forward(out11,ip2_weight,ip2_bias) +060| out13 = softmax::forward(out12) +061| # Perform backward pass +062| dProbs = cross_entropy_loss::backward(out13,yb); dOut13 = softmax::backward(dProbs,out12); dOut13_12 = dOut13; dOut13_2 = dOut13; +063| [dOut12,ip2_dWeight,ip2_dBias] = affine::backward(dOut13_12,out11,ip2_weight,ip2_bias); dOut12_11 = dOut12; +064| dOut11 = dropout::backward(dOut12_11,out10,0.5,mask11); dOut11_10 = dOut11; +065| dOut10 = relu::backward(dOut11_10,out9); dOut10_9 = dOut10; +066| [dOut9,ip1_dWeight,ip1_dBias] = affine::backward(dOut10_9,out8,ip1_weight,ip1_bias); dOut9_8 = dOut9; +067| dOut8 = max_pool2d_builtin::backward(dOut9_8,7,7,out7,64,14,14,2,2,2,2,0,0); dOut8_7 = dOut8; +068| dOut7 = relu::backward(dOut8_7,out6); dOut7_6 = dOut7; +069| [dOut6,conv2_dWeight,conv2_dBias] = conv2d_builtin::backward(dOut7_6,14,14,out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2); dOut6_5 = dOut6; +070| dOut5 = max_pool2d_builtin::backward(dOut6_5,14,14,out4,32,28,28,2,2,2,2,0,0); dOut5_4 = dOut5; +071| dOut4 = relu::backward(dOut5_4,out3); dOut4_3 = dOut4; +072| [dOut3,conv1_dWeight,conv1_dBias] = conv2d_builtin::backward(dOut4_3,28,28,Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2); dOut3_2 = dOut3; +073| # Update the parameters +074| conv1_dWeight_reg = l2_reg::backward(conv1_weight, 5.000000237487257E-4) +075| conv1_dWeight = conv1_dWeight + conv1_dWeight_reg +076| [conv1_weight,conv1_weight_v] = sgd_momentum::update(conv1_weight,conv1_dWeight,(lr * 1.0),0.8999999761581421,conv1_weight_v) +077| [conv1_bias,conv1_bias_v] = sgd_momentum::update(conv1_bias,conv1_dBias,(lr * 2.0),0.8999999761581421,conv1_bias_v) +078| conv2_dWeight_reg = l2_reg::backward(conv2_weight, 5.000000237487257E-4) +079| conv2_dWeight = conv2_dWeight + conv2_dWeight_reg +080| [conv2_weight,conv2_weight_v] = sgd_momentum::update(conv2_weight,conv2_dWeight,(lr * 1.0),0.8999999761581421,conv2_weight_v) +081| [conv2_bias,conv2_bias_v] = sgd_momentum::update(conv2_bias,conv2_dBias,(lr * 2.0),0.8999999761581421,conv2_bias_v) +082| ip1_dWeight_reg = l2_reg::backward(ip1_weight, 5.000000237487257E-4) +083| ip1_dWeight = ip1_dWeight + ip1_dWeight_reg +084| [ip1_weight,ip1_weight_v] = sgd_momentum::update(ip1_weight,ip1_dWeight,(lr * 1.0),0.8999999761581421,ip1_weight_v) +085| [ip1_bias,ip1_bias_v] = sgd_momentum::update(ip1_bias,ip1_dBias,(lr * 2.0),0.8999999761581421,ip1_bias_v) +086| ip2_dWeight_reg = l2_reg::backward(ip2_weight, 5.000000237487257E-4) +087| ip2_dWeight = ip2_dWeight + ip2_dWeight_reg +088| [ip2_weight,ip2_weight_v] = sgd_momentum::update(ip2_weight,ip2_dWeight,(lr * 1.0),0.8999999761581421,ip2_weight_v) +089| [ip2_bias,ip2_bias_v] = sgd_momentum::update(ip2_bias,ip2_dBias,(lr * 2.0),0.8999999761581421,ip2_bias_v) +090| # Compute training loss & accuracy +091| if(iter %% 100 == 0) { +092| loss = 0 +093| accuracy = 0 +094| tmp_loss = cross_entropy_loss::forward(out13,yb) +095| loss = loss + tmp_loss +096| true_yb = rowIndexMax(yb) +097| predicted_yb = rowIndexMax(out13) +098| accuracy = mean(predicted_yb == true_yb)*100 +099| training_loss = loss +100| training_accuracy = accuracy +101| print("Iter:" + iter + ", training loss:" + training_loss + ", training accuracy:" + training_accuracy) +102| if(debug) { +103| num_rows_error_measures = min(10, ncol(yb)) +104| error_measures = matrix(0, rows=num_rows_error_measures, cols=5) +105| for(class_i in 1:num_rows_error_measures) { +106| tp = sum( (true_yb == predicted_yb) * (true_yb == class_i) ) +107| tp_plus_fp = sum( (predicted_yb == class_i) ) +108| tp_plus_fn = sum( (true_yb == class_i) ) +109| precision = tp / tp_plus_fp +110| recall = tp / tp_plus_fn +111| f1Score = 2*precision*recall / (precision+recall) +112| error_measures[class_i,1] = class_i +113| error_measures[class_i,2] = precision +114| error_measures[class_i,3] = recall +115| error_measures[class_i,4] = f1Score +116| error_measures[class_i,5] = tp_plus_fn +117| } +118| print("class \tprecision\trecall \tf1-score\tnum_true_labels\n" + toString(error_measures, decimal=7, sep="\t")) +119| } +120| } +121| # Compute validation loss & accuracy +122| if(iter %% 500 == 0) { +123| loss = 0 +124| accuracy = 0 +125| validation_loss = 0 +126| validation_accuracy = 0 +127| for(iVal in 1:num_iters_per_epoch) { +128| beg = ((iVal-1) * BATCH_SIZE) %% num_validation + 1; end = min(beg + BATCH_SIZE - 1, num_validation); Xb = X_val[beg:end,]; yb = y_val[beg:end,]; +129| # Perform forward pass +130| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2) +131| out4 = relu::forward(out3) +132| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0) +133| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2) +134| out7 = relu::forward(out6) +135| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0) +136| out9 = affine::forward(out8,ip1_weight,ip1_bias) +137| out10 = relu::forward(out9) +138| [out11,mask11] = dropout::forward(out10,0.5,-1) +139| out12 = affine::forward(out11,ip2_weight,ip2_bias) +140| out13 = softmax::forward(out12) +141| tmp_loss = cross_entropy_loss::forward(out13,yb) +142| loss = loss + tmp_loss +143| true_yb = rowIndexMax(yb) +144| predicted_yb = rowIndexMax(out13) +145| accuracy = mean(predicted_yb == true_yb)*100 +146| validation_loss = validation_loss + loss +147| validation_accuracy = validation_accuracy + accuracy +148| } +149| validation_accuracy = validation_accuracy / num_iters_per_epoch +150| print("Iter:" + iter + ", validation loss:" + validation_loss + ", validation accuracy:" + validation_accuracy) +151| } +152| } +153| # Learning rate +154| lr = (0.009999999776482582 * 0.949999988079071^e) +155|} + +Iter:100, training loss:0.24014199350958168, training accuracy:87.5 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 3.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +3.0000000 0.8888889 0.8888889 0.8888889 9.0000000 +4.0000000 0.7500000 0.7500000 0.7500000 4.0000000 +5.0000000 0.7500000 1.0000000 0.8571429 3.0000000 +6.0000000 0.8333333 1.0000000 0.9090909 5.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +8.0000000 0.8571429 0.7500000 0.8000000 8.0000000 +9.0000000 1.0000000 0.5714286 0.7272727 7.0000000 +10.0000000 0.7272727 0.8888889 0.8000000 9.0000000 + +Iter:200, training loss:0.09555593867171894, training accuracy:98.4375 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 10.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 3.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +7.0000000 1.0000000 0.6666667 0.8000000 3.0000000 +8.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +9.0000000 0.8571429 1.0000000 0.9230769 6.0000000 +10.0000000 1.0000000 1.0000000 1.0000000 3.0000000 + +Iter:300, training loss:0.058686794512570216, training accuracy:98.4375 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +6.0000000 1.0000000 0.8750000 0.9333333 8.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +8.0000000 1.0000000 1.0000000 1.0000000 2.0000000 +9.0000000 0.8888889 1.0000000 0.9411765 8.0000000 +10.0000000 1.0000000 1.0000000 1.0000000 8.0000000 + +Iter:400, training loss:0.08742103541529415, training accuracy:96.875 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +2.0000000 0.8000000 1.0000000 0.8888889 8.0000000 +3.0000000 1.0000000 0.8333333 0.9090909 6.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +8.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +10.0000000 1.0000000 0.9230769 0.9600000 13.0000000 + +Iter:500, training loss:0.05873836245880005, training accuracy:98.4375 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 3.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +6.0000000 1.0000000 0.8571429 0.9230769 7.0000000 +7.0000000 0.8571429 1.0000000 0.9230769 6.0000000 +8.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 10.0000000 +10.0000000 1.0000000 1.0000000 1.0000000 5.0000000 + +Iter:500, validation loss:260.1580978627665, validation accuracy:96.43954918032787 +Iter:600, training loss:0.07584116043829209, training accuracy:98.4375 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +8.0000000 1.0000000 0.9230769 0.9600000 13.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +10.0000000 0.8333333 1.0000000 0.9090909 5.0000000 + +Iter:700, training loss:0.07973166944626336, training accuracy:98.4375 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 10.0000000 +8.0000000 0.8000000 1.0000000 0.8888889 4.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +10.0000000 1.0000000 0.9166667 0.9565217 12.0000000 + +Iter:800, training loss:0.0063778595034221855, training accuracy:100.0 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +8.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 2.0000000 +10.0000000 1.0000000 1.0000000 1.0000000 6.0000000 + +Iter:900, training loss:0.019673112167879484, training accuracy:100.0 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 3.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 3.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 10.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +8.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 12.0000000 +10.0000000 1.0000000 1.0000000 1.0000000 7.0000000 + +Iter:1000, training loss:0.06137978002508307, training accuracy:96.875 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 8.0000000 +4.0000000 0.8333333 0.8333333 0.8333333 6.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 5.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 10.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 3.0000000 +8.0000000 0.8888889 0.8888889 0.8888889 9.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +10.0000000 1.0000000 1.0000000 1.0000000 4.0000000 + +Iter:1000, validation loss:238.62301345198944, validation accuracy:97.02868852459017 +Iter:1100, training loss:0.023325103696013115, training accuracy:100.0 +class precision recall f1-score num_true_labels +1.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +2.0000000 1.0000000 1.0000000 1.0000000 10.0000000 +3.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +4.0000000 1.0000000 1.0000000 1.0000000 4.0000000 +5.0000000 1.0000000 1.0000000 1.0000000 2.0000000 +6.0000000 1.0000000 1.0000000 1.0000000 10.0000000 +7.0000000 1.0000000 1.0000000 1.0000000 7.0000000 +8.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +9.0000000 1.0000000 1.0000000 1.0000000 9.0000000 +10.0000000 1.0000000 1.0000000 1.0000000 6.0000000 +... +``` + http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/layers/l2_loss.dml ---------------------------------------------------------------------- diff --git a/scripts/nn/layers/l2_loss.dml b/scripts/nn/layers/l2_loss.dml index 0482f25..67b9870 100644 --- a/scripts/nn/layers/l2_loss.dml +++ b/scripts/nn/layers/l2_loss.dml @@ -69,4 +69,3 @@ backward = function(matrix[double] pred, matrix[double] y) N = nrow(y) dpred = (pred-y) / N } - http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/layers/tanh.dml ---------------------------------------------------------------------- diff --git a/scripts/nn/layers/tanh.dml b/scripts/nn/layers/tanh.dml index d849d70..23fd106 100644 --- a/scripts/nn/layers/tanh.dml +++ b/scripts/nn/layers/tanh.dml @@ -29,10 +29,9 @@ forward = function(matrix[double] X) /* * Computes the forward pass for a tanh nonlinearity layer. * - * ``` - * tanh(x) = (e^x - e^-x) / (e^x + e^-x) - * = 2 * sigmoid(2x) - 1 - * ``` + * ``` + * tanh(x) = (e^x - e^-x) / (e^x + e^-x) + * ``` * * Inputs: * - X: Inputs, of shape (any, any). @@ -40,10 +39,7 @@ forward = function(matrix[double] X) * Outputs: * - out: Outputs, of same shape as `X`. */ - # out = (exp(X) - exp(-X)) / (exp(X) + exp(-X)) - # Simplification of the above formulation to use the sigmoid function: - sigma2X = sigmoid::forward(2*X) - out = 2*sigma2X - 1 + out = tanh(X) } backward = function(matrix[double] dout, matrix[double] X) @@ -58,8 +54,7 @@ backward = function(matrix[double] dout, matrix[double] X) * Outputs: * - dX: Gradient wrt `X`, of same shape as `X`. */ - sigma2X = sigmoid::forward(2*X) - out = 2*sigma2X - 1 + out = tanh(X) dX = (1-out^2) * dout } http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/test/run_tests.dml ---------------------------------------------------------------------- diff --git a/scripts/nn/test/run_tests.dml b/scripts/nn/test/run_tests.dml index 0f42816..27d6a4a 100644 --- a/scripts/nn/test/run_tests.dml +++ b/scripts/nn/test/run_tests.dml @@ -105,6 +105,8 @@ test::top_k_row() test::top_k() test::top_k2d() test::softmax2d() +test::compare_tanh_builtin_forward_with_old() +test::compare_tanh_builtin_backward_with_old() print("---") print("Other tests complete -- look for any ERRORs or WARNINGs.") http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/test/test.dml ---------------------------------------------------------------------- diff --git a/scripts/nn/test/test.dml b/scripts/nn/test/test.dml index 06f4632..2a04f97 100644 --- a/scripts/nn/test/test.dml +++ b/scripts/nn/test/test.dml @@ -39,6 +39,7 @@ source("nn/test/conv2d_simple.dml") as conv2d_simple source("nn/test/max_pool2d_simple.dml") as max_pool2d_simple source("nn/test/util.dml") as test_util source("nn/util.dml") as util +source("nn/layers/sigmoid.dml") as sigmoid batch_norm1d = function() { /* @@ -825,6 +826,58 @@ tanh = function() { } } +compare_tanh_builtin_forward_with_old = function() { + /* + * Test for the `tanh` forward function. + */ + print("Testing the tanh forward function.") + + # Generate data + N = 2 # num examples + C = 3 # num channels + X = rand(rows=N, cols=C, pdf="normal") + + out = tanh::forward(X) + + sigma2X = sigmoid::forward(2*X) + out_ref = 2*sigma2X - 1 + + # Equivalency check + for (i in 1:nrow(out)) { + for (j in 1:ncol(out)) { + rel_error = test_util::check_rel_error(as.scalar(out[i,j]), as.scalar(out_ref[i,j]), + 1e-10, 1e-12) + } + } +} + +compare_tanh_builtin_backward_with_old = function() { + /* + * Test for the `tanh` forward function. + */ + print("Testing the tanh forward function.") + + # Generate data + N = 2 # num examples + C = 3 # num channels + X = rand(rows=N, cols=C, pdf="normal") + dout = rand(rows=N, cols=C, pdf="normal") + + sigma2X = sigmoid::forward(2*X) + out = 2*sigma2X - 1 + out_ref = (1-out^2) * dout + + out = tanh::backward(dout, X) + + # Equivalency check + for (i in 1:nrow(out)) { + for (j in 1:ncol(out)) { + rel_error = test_util::check_rel_error(as.scalar(out[i,j]), as.scalar(out_ref[i,j]), + 1e-10, 1e-12) + } + } +} + threshold = function() { /* * Test for the threshold function.