Repository: systemml
Updated Branches:
  refs/heads/gh-pages 05792e0e9 -> 0ff267404


[SYSTEMML-540] Extended Caffe2DML to support image segmentation problems

- This commit extends Caffe2DML to support image segmentation problems,
  depthwise convolution, and includes a couple of bug fixes related to
  loading existing Caffe models.
- Additionally, we have added a summary() method to Caffe2DML to print
  the network.

Closes #527.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/0ff26740
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/0ff26740
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/0ff26740

Branch: refs/heads/gh-pages
Commit: 0ff267404237640807915ffbf6b7a49791bda6c5
Parents: 05792e0
Author: Niketan Pansare <[email protected]>
Authored: Wed Jul 5 11:02:57 2017 -0700
Committer: Niketan Pansare <[email protected]>
Committed: Wed Jul 5 11:02:57 2017 -0700

----------------------------------------------------------------------
 beginners-guide-caffe2dml.md | 534 ++++++++++++++++++++++++++++++++++----
 python-reference.md          |   4 +
 2 files changed, 487 insertions(+), 51 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/0ff26740/beginners-guide-caffe2dml.md
----------------------------------------------------------------------
diff --git a/beginners-guide-caffe2dml.md b/beginners-guide-caffe2dml.md
index f15e025..7671c32 100644
--- a/beginners-guide-caffe2dml.md
+++ b/beginners-guide-caffe2dml.md
@@ -32,24 +32,14 @@ limitations under the License.
 Caffe2DML is an **experimental API** that converts a Caffe specification to DML.
 It is designed to fit well into the mllearn framework and hence supports NumPy, Pandas as well as PySpark DataFrames.
 
-## Examples
+### Training Lenet 
 
-### Train Lenet on MNIST dataset
-
-#### MNIST dataset
-
-The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST). The training set consists of handwritten digits from 250 different people, 50 percent high school students, and 50 percent employees from the Census Bureau. Note that the test set contains handwritten digits from different people following the same split.
-In the below example, we are using mlxtend package to load the mnist dataset into Python NumPy arrays, but you are free to download it directly from http://yann.lecun.com/exdb/mnist/.
-
-```bash
-pip install mlxtend
-```
-
-#### Lenet network
-
-Lenet is a simple convolutional neural network, proposed by Yann LeCun in 1998. It has 2 convolutions/pooling and fully connected layer. 
+To create a Caffe2DML object, one needs to create solver and network files that conform to the [Caffe specification](http://caffe.berkeleyvision.org/).
+In this example, we will train Lenet, a simple convolutional neural network proposed by Yann LeCun in 1998.
+It has two convolution/pooling layers and two fully connected layers.
 Similar to Caffe, the network has been modified to add dropout. 
-For more detail, please see http://yann.lecun.com/exdb/lenet/
+For more detail, please see [http://yann.lecun.com/exdb/lenet/](http://yann.lecun.com/exdb/lenet/).
 
 The [solver specification](https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/mnist_lenet/lenet_solver.proto)
 specifies to Caffe2DML to use the following configuration when generating the training DML script:
@@ -58,67 +48,167 @@ specifies to Caffe2DML to use following configuration when generating the traini
 - `display: 100`: Display training loss after every 100 iterations.
 - `test_interval: 500`: Display validation loss after every 500 iterations.
 - `test_iter: 10`: Validation data size = 10 * BATCH_SIZE.
- 
+
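+For reference, the solver file roughly takes the following form. This is a sketch only: `display`, `test_interval`, and `test_iter` come from the list above, while `max_iter`, the learning rate, momentum, weight decay, and the exponential learning-rate policy are inferred from the generated training script shown later; the actual file may differ.
+
+```
+net: "lenet.proto"
+display: 100
+test_interval: 500
+test_iter: 10
+max_iter: 2000
+base_lr: 0.01
+momentum: 0.9
+weight_decay: 0.0005
+lr_policy: "exp"
+gamma: 0.95
+```
+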
+```python
+from systemml.mllearn import Caffe2DML
+import urllib
+
+# Download the Lenet network
+urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/mnist_lenet/lenet.proto', 'lenet.proto')
+urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/mnist_lenet/lenet_solver.proto', 'lenet_solver.proto')
+
+# Train Lenet on MNIST using a scikit-learn-like API.
+# The MNIST dataset contains 28 X 28 gray-scale images (number of channels = 1).
+lenet = Caffe2DML(spark, solver='lenet_solver.proto', input_shape=(1, 28, 28))
+lenet.summary()
+```
+
+Output:
+
+```
++-----+---------------+--------------+------------+---------+-----------+---------+
+| Name|           Type|        Output|      Weight|     Bias|        Top|   Bottom|
++-----+---------------+--------------+------------+---------+-----------+---------+
+|mnist|           Data| (, 1, 28, 28)|            |         |mnist,mnist|         |
+|conv1|    Convolution|(, 32, 28, 28)|   [32 X 25]| [32 X 1]|      conv1|    mnist|
+|relu1|           ReLU|(, 32, 28, 28)|            |         |      relu1|    conv1|
+|pool1|        Pooling|(, 32, 14, 14)|            |         |      pool1|    relu1|
+|conv2|    Convolution|(, 64, 14, 14)|  [64 X 800]| [64 X 1]|      conv2|    pool1|
+|relu2|           ReLU|(, 64, 14, 14)|            |         |      relu2|    conv2|
+|pool2|        Pooling|  (, 64, 7, 7)|            |         |      pool2|    relu2|
+|  ip1|   InnerProduct| (, 512, 1, 1)|[3136 X 512]|[1 X 512]|        ip1|    pool2|
+|relu3|           ReLU| (, 512, 1, 1)|            |         |      relu3|      ip1|
+|drop1|        Dropout| (, 512, 1, 1)|            |         |      drop1|    relu3|
+|  ip2|   InnerProduct|  (, 10, 1, 1)|  [512 X 10]| [1 X 10]|        ip2|    drop1|
+| loss|SoftmaxWithLoss|  (, 10, 1, 1)|            |         |       loss|ip2,mnist|
++-----+---------------+--------------+------------+---------+-----------+---------+
+```
+
+To train the above Lenet model, we use the MNIST dataset.
+The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST).
+The training set consists of handwritten digits from 250 different people, 50 percent high school students, and 50 percent employees from the Census Bureau. Note that the test set contains handwritten digits from different people following the same split.
+In this example, we use the mlxtend package to load the MNIST dataset into Python NumPy arrays, but you are free to download it directly from http://yann.lecun.com/exdb/mnist/.
+
+```bash
+pip install mlxtend
+```
+
+We first split the MNIST dataset into train and test.  
 
 ```python
 from mlxtend.data import mnist_data
 import numpy as np
 from sklearn.utils import shuffle
-import urllib
-from systemml.mllearn import Caffe2DML
-
 # Download the MNIST dataset
 X, y = mnist_data()
 X, y = shuffle(X, y)
-
 # Split the data into training and test
 n_samples = len(X)
 X_train = X[:int(.9 * n_samples)]
 y_train = y[:int(.9 * n_samples)]
 X_test = X[int(.9 * n_samples):]
 y_test = y[int(.9 * n_samples):]
+```
 
-# Download the Lenet network
-urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/mnist_lenet/lenet.proto', 'lenet.proto')
-urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/mnist_lenet/lenet_solver.proto', 'lenet_solver.proto')
+Finally, we use the training and test datasets to perform training and prediction using the scikit-learn-like API.
 
-# Train Lenet On MNIST using scikit-learn like API
-# MNIST dataset contains 28 X 28 gray-scale (number of channel=1).
-lenet = Caffe2DML(sqlCtx, solver='lenet_solver.proto', input_shape=(1, 28, 28))
+```python
+# Since Caffe2DML is an mllearn API, it allows scikit-learn-like methods for training.
+lenet.fit(X_train, y_train)
+# Either perform prediction, lenet.predict(X_test), or scoring:
+lenet.score(X_test, y_test)
+```
 
-# debug=True prints will print the generated DML script along with classification report. Please donot test this flag in production.
-lenet.set(debug=True)
+Output:
+```
+Iter:100, training loss:0.189008481420049, training accuracy:92.1875
+Iter:200, training loss:0.21657020576713149, training accuracy:96.875
+Iter:300, training loss:0.05780939180052287, training accuracy:98.4375
+Iter:400, training loss:0.03406193840071965, training accuracy:100.0
+Iter:500, training loss:0.02847187709112875, training accuracy:100.0
+Iter:500, validation loss:222.736109642486, validation accuracy:96.49077868852459
+Iter:600, training loss:0.04867848427394318, training accuracy:96.875
+Iter:700, training loss:0.043060905384304224, training accuracy:98.4375
+Iter:800, training loss:0.01861298388336358, training accuracy:100.0
+Iter:900, training loss:0.03495462005933769, training accuracy:100.0
+Iter:1000, training loss:0.04598737325942163, training accuracy:98.4375
+Iter:1000, validation loss:180.04232316810746, validation accuracy:97.28483606557377
+Iter:1100, training loss:0.05630274512793694, training accuracy:98.4375
+Iter:1200, training loss:0.027278141291535066, training accuracy:98.4375
+Iter:1300, training loss:0.04356275106270366, training accuracy:98.4375
+Iter:1400, training loss:0.00780793048139091, training accuracy:100.0
+Iter:1500, training loss:0.004135965492374173, training accuracy:100.0
+Iter:1500, validation loss:156.61636761709374, validation accuracy:97.48975409836065
+Iter:1600, training loss:0.007939063305475983, training accuracy:100.0
+Iter:1700, training loss:0.0025769653351162196, training accuracy:100.0
+Iter:1800, training loss:0.0023251742357435204, training accuracy:100.0
+Iter:1900, training loss:0.0016795711023936644, training accuracy:100.0
+Iter:2000, training loss:0.03676045262879483, training accuracy:98.4375
+Iter:2000, validation loss:173.66147359346, validation accuracy:97.48975409836065
+0.97399999999999998
+```
 
-# If you want to see the statistics as well as the plan
-lenet.setStatistics(True).setExplain(True)
+### Additional Configuration
 
-# If you want to force GPU execution. Please make sure the required dependency are available.
-# lenet.setGPU(True).setForceGPU(True)
-# Example usage of train_algo, test_algo. Assume 2 gpus on driver
-# lenet.set(train_algo="allreduce_parallel_batches", test_algo="minibatch", parallel_batches=2)
+- Print the generated DML script along with the classification report: `lenet.set(debug=True)` (see the combined sketch below)
+- Print the heavy hitters instructions and the execution plan (advanced users): `lenet.setStatistics(True).setExplain(True)`
+- (Optional but recommended) Enable [native BLAS](http://apache.github.io/systemml/native-backend): `lenet.setConfigProperty("native.blas", "auto")`
+- Enable experimental features such as codegen: `lenet.setConfigProperty("codegen.enabled", "true").setConfigProperty("codegen.plancache", "true")`
+- Force GPU execution (please make sure the required jcuda dependencies are included): `lenet.setGPU(True).setForceGPU(True)`
 
-# (Optional but recommended) Enable native BLAS. 
-lenet.setConfigProperty("native.blas", "auto")
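+Taken together, a typical single-machine debug configuration might look like the following sketch (each call is one of the setters listed above; the combination itself is illustrative):
+
+```python
+# Print the generated DML script and the classification report
+lenet.set(debug=True)
+# Collect runtime statistics and print the execution plan
+lenet.setStatistics(True).setExplain(True)
+# Use native BLAS when the shared libraries are available
+lenet.setConfigProperty("native.blas", "auto")
+```
+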
+Unlike Caffe, where the default train and test algorithm is `minibatch`, you can specify the algorithm using the parameters `train_algo` and `test_algo` (valid values are: `minibatch`, `allreduce_parallel_batches`, and `allreduce`). Here are some common settings:
 
-# In case you want to enable experimental feature such as codegen
-# lenet.setConfigProperty("codegen.enabled", "true").setConfigProperty("codegen.plancache", "true")
+|                                                                          | PySpark script                                                                                                                       | Changes to Network/Solver                                                 |
+|--------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
+| Single-node CPU execution (similar to Caffe with solver_mode: CPU)       | `lenet.set(train_algo="minibatch", test_algo="minibatch")`                                                                           | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
+| Single-node single-GPU execution                                         | `lenet.set(train_algo="minibatch", test_algo="minibatch").setGPU(True).setForceGPU(True)`                                            | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
+| Single-node multi-GPU execution (similar to Caffe with solver_mode: GPU) | `lenet.set(train_algo="allreduce_parallel_batches", test_algo="minibatch", parallel_batches=num_gpu).setGPU(True).setForceGPU(True)` | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
+| Distributed prediction                                                   | `lenet.set(test_algo="allreduce")`                                                                                                   |                                                                            |
+| Distributed synchronous training                                         | `lenet.set(train_algo="allreduce_parallel_batches", parallel_batches=num_cluster_cores)`                                             | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
 
-# Since Caffe2DML is a mllearn API, it allows for scikit-learn like method for training.
+### Saving the trained model
+
+```python
 lenet.fit(X_train, y_train)
-lenet.predict(X_test)
+# Save the trained weights into the 'trained_weights' directory
+lenet.save('trained_weights')
+# Load the saved weights into a new estimator and evaluate it
+new_lenet = Caffe2DML(spark, solver='lenet_solver.proto', input_shape=(1, 28, 28))
+new_lenet.load('trained_weights')
+new_lenet.score(X_test, y_test)
 ```
 
-For more detail on enabling native BLAS, please see the documentation for the [native backend](http://apache.github.io/systemml/native-backend).
+### Loading a pretrained caffemodel
 
-Common settings for `train_algo` and `test_algo` parameters:
+We provide a converter utility to convert a `.caffemodel` file trained using Caffe into the SystemML format.
 
-|                                                                          | PySpark script                                                                                                                           | Changes to Network/Solver                                              |
-|--------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
-| Single-node CPU execution (similar to Caffe with solver_mode: CPU)       | `caffe2dml.set(train_algo="minibatch", test_algo="minibatch")`                                                                           | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-| Single-node single-GPU execution                                         | `caffe2dml.set(train_algo="minibatch", test_algo="minibatch").setGPU(True).setForceGPU(True)`                                            | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-| Single-node multi-GPU execution (similar to Caffe with solver_mode: GPU) | `caffe2dml.set(train_algo="allreduce_parallel_batches", test_algo="minibatch", parallel_batches=num_gpu).setGPU(True).setForceGPU(True)` | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-| Distributed prediction                                                   | `caffe2dml.set(test_algo="allreduce")`                                                                                                   |                                                                         |
-| Distributed synchronous training                                         | `caffe2dml.set(train_algo="allreduce_parallel_batches", parallel_batches=num_cluster_cores)`                                             | Ensure that `batch_size` is set to appropriate value (for example: 64) |
+```python
+# First download the deploy file and the caffemodel
+import urllib
+urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_deploy.proto', 'VGG_ILSVRC_19_layers_deploy.proto')
+urllib.urlretrieve('http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel', 'VGG_ILSVRC_19_layers.caffemodel')
+# Save the weights into the trained_vgg_weights directory
+import systemml as sml
+sml.convert_caffemodel(sc, 'VGG_ILSVRC_19_layers_deploy.proto', 'VGG_ILSVRC_19_layers.caffemodel', 'trained_vgg_weights')
+```
+
+We can then use the `trained_vgg_weights` directory for performing prediction or fine-tuning.
+
+```python
+# Download the VGG network
+urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_network.proto', 'VGG_ILSVRC_19_layers_network.proto')
+urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_solver.proto', 'VGG_ILSVRC_19_layers_solver.proto')
+# Storing labels.txt in the weights directory allows predict to return a label (for example: 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor') rather than the column index of the one-hot encoded vector (for example: 287).
+import os
+urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/labels.txt', os.path.join('trained_vgg_weights', 'labels.txt'))
+from systemml.mllearn import Caffe2DML
+vgg = Caffe2DML(spark, solver='VGG_ILSVRC_19_layers_solver.proto', input_shape=(3, 224, 224))
+vgg.load('trained_vgg_weights')
+# We can then perform prediction:
+from PIL import Image
+X_test = sml.convertImageToNumPyArr(Image.open('test.jpg'), img_shape=(3, 224, 224))
+vgg.predict(X_test)
+# OR fine-tuning: vgg.fit(X_train, y_train)
+```
 
 ## Frequently asked questions
 
@@ -291,16 +381,358 @@ train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lam
 train_df.write.parquet('kaggle-cats-dogs.parquet')
 ```
 
+An alternative way to load images into a PySpark DataFrame for prediction is to use MLlib's LabeledPoint class:
+
+```python
+from pyspark.mllib.regression import LabeledPoint
+from pyspark.mllib.util import MLUtils
+list_jpeg_files = os.listdir(train_dir)
+train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : LabeledPoint(0, sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:])).toDF().select('features')
+# Note: convertVectorColumnsToML has an additional serialization cost
+train_df = MLUtils.convertVectorColumnsToML(train_df)
+```
+
 #### Can I use Caffe2DML via Scala?
 
 Though we recommend using Caffe2DML via its Python interfaces, it is possible to use it by creating an object of the class `org.apache.sysml.api.dl.Caffe2DML`. It is important to note that Caffe2DML's Scala API is packaged in `systemml-*-extra.jar`.
 
+#### How can I get summary information of my network?
+
+```python
+lenet.summary()
+```
+
+Output:
+
+```
++-----+---------------+--------------+------------+---------+-----------+---------+
+| Name|           Type|        Output|      Weight|     Bias|        Top|   Bottom|
++-----+---------------+--------------+------------+---------+-----------+---------+
+|mnist|           Data| (, 1, 28, 28)|            |         |mnist,mnist|         |
+|conv1|    Convolution|(, 32, 28, 28)|   [32 X 25]| [32 X 1]|      conv1|    mnist|
+|relu1|           ReLU|(, 32, 28, 28)|            |         |      relu1|    conv1|
+|pool1|        Pooling|(, 32, 14, 14)|            |         |      pool1|    relu1|
+|conv2|    Convolution|(, 64, 14, 14)|  [64 X 800]| [64 X 1]|      conv2|    pool1|
+|relu2|           ReLU|(, 64, 14, 14)|            |         |      relu2|    conv2|
+|pool2|        Pooling|  (, 64, 7, 7)|            |         |      pool2|    relu2|
+|  ip1|   InnerProduct| (, 512, 1, 1)|[3136 X 512]|[1 X 512]|        ip1|    pool2|
+|relu3|           ReLU| (, 512, 1, 1)|            |         |      relu3|      ip1|
+|drop1|        Dropout| (, 512, 1, 1)|            |         |      drop1|    relu3|
+|  ip2|   InnerProduct|  (, 10, 1, 1)|  [512 X 10]| [1 X 10]|        ip2|    drop1|
+| loss|SoftmaxWithLoss|  (, 10, 1, 1)|            |         |       loss|ip2,mnist|
++-----+---------------+--------------+------------+---------+-----------+---------+
+```
 
 #### How can I view the script generated by Caffe2DML?
 
 To view the generated DML script (and additional debugging information), please set the `debug` parameter to True.
 
 ```python
-caffe2dmlObject.set(debug=True)
+lenet.set(debug=True)
+```
+
+Output:
+```
+001|debug = TRUE
+002|source("nn/layers/softmax.dml") as softmax
+003|source("nn/layers/cross_entropy_loss.dml") as cross_entropy_loss
+004|source("nn/layers/conv2d_builtin.dml") as conv2d_builtin
+005|source("nn/layers/relu.dml") as relu
+006|source("nn/layers/max_pool2d_builtin.dml") as max_pool2d_builtin
+007|source("nn/layers/affine.dml") as affine
+008|source("nn/layers/dropout.dml") as dropout
+009|source("nn/optim/sgd_momentum.dml") as sgd_momentum
+010|source("nn/layers/l2_reg.dml") as l2_reg
+011|X_full_path = ifdef($X, " ")
+012|X_full = read(X_full_path)
+013|y_full_path = ifdef($y, " ")
+014|y_full = read(y_full_path)
+015|num_images = nrow(y_full)
+016|# Convert to one-hot encoding (Assumption: 1-based labels)
+017|y_full = table(seq(1,num_images,1), y_full, num_images, 10)
+018|weights = ifdef($weights, " ")
+019|# Initialize the layers and solvers
+020|X_full = X_full * 0.00390625
+021|BATCH_SIZE = 64
+022|[conv1_weight,conv1_bias] = conv2d_builtin::init(32,1,5,5)
+023|[conv2_weight,conv2_bias] = conv2d_builtin::init(64,32,5,5)
+024|[ip1_weight,ip1_bias] = affine::init(3136,512)
+025|[ip2_weight,ip2_bias] = affine::init(512,10)
+026|conv1_weight_v = sgd_momentum::init(conv1_weight)
+027|conv1_bias_v = sgd_momentum::init(conv1_bias)
+028|conv2_weight_v = sgd_momentum::init(conv2_weight)
+029|conv2_bias_v = sgd_momentum::init(conv2_bias)
+030|ip1_weight_v = sgd_momentum::init(ip1_weight)
+031|ip1_bias_v = sgd_momentum::init(ip1_bias)
+032|ip2_weight_v = sgd_momentum::init(ip2_weight)
+033|ip2_bias_v = sgd_momentum::init(ip2_bias)
+034|num_validation = 10 * BATCH_SIZE
+035|# Sanity check to ensure that validation set is not too large
+036|if(num_validation > ceil(0.3 * num_images)) {
+037|    max_test_iter = floor(ceil(0.3 * num_images) / BATCH_SIZE)
+038|    stop("Too large validation size. Please reduce test_iter to " + max_test_iter)
+039|}
+040|X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]; X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]; num_images = nrow(y)
+041|num_iters_per_epoch = ceil(num_images / BATCH_SIZE)
+042|max_epochs = ceil(2000 / num_iters_per_epoch)
+043|iter = 0
+044|lr = 0.01
+045|for(e in 1:max_epochs) {
+046|    for(i in 1:num_iters_per_epoch) {
+047|            beg = ((i-1) * BATCH_SIZE) %% num_images + 1; end = min(beg + BATCH_SIZE - 1, num_images); Xb = X[beg:end,]; yb = y[beg:end,];
+048|            iter = iter + 1
+049|            # Perform forward pass
+050|            [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2)
+051|            out4 = relu::forward(out3)
+052|            [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0)
+053|            [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2)
+054|            out7 = relu::forward(out6)
+055|            [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0)
+056|            out9 = affine::forward(out8,ip1_weight,ip1_bias)
+057|            out10 = relu::forward(out9)
+058|            [out11,mask11] = dropout::forward(out10,0.5,-1)
+059|            out12 = affine::forward(out11,ip2_weight,ip2_bias)
+060|            out13 = softmax::forward(out12)
+061|            # Perform backward pass
+062|            dProbs = cross_entropy_loss::backward(out13,yb); dOut13 = softmax::backward(dProbs,out12); dOut13_12 = dOut13; dOut13_2 = dOut13;
+063|            [dOut12,ip2_dWeight,ip2_dBias] = affine::backward(dOut13_12,out11,ip2_weight,ip2_bias); dOut12_11 = dOut12;
+064|            dOut11 = dropout::backward(dOut12_11,out10,0.5,mask11); dOut11_10 = dOut11;
+065|            dOut10 = relu::backward(dOut11_10,out9); dOut10_9 = dOut10;
+066|            [dOut9,ip1_dWeight,ip1_dBias] = affine::backward(dOut10_9,out8,ip1_weight,ip1_bias); dOut9_8 = dOut9;
+067|            dOut8 = max_pool2d_builtin::backward(dOut9_8,7,7,out7,64,14,14,2,2,2,2,0,0); dOut8_7 = dOut8;
+068|            dOut7 = relu::backward(dOut8_7,out6); dOut7_6 = dOut7;
+069|            [dOut6,conv2_dWeight,conv2_dBias] = conv2d_builtin::backward(dOut7_6,14,14,out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2); dOut6_5 = dOut6;
+070|            dOut5 = max_pool2d_builtin::backward(dOut6_5,14,14,out4,32,28,28,2,2,2,2,0,0); dOut5_4 = dOut5;
+071|            dOut4 = relu::backward(dOut5_4,out3); dOut4_3 = dOut4;
+072|            [dOut3,conv1_dWeight,conv1_dBias] = conv2d_builtin::backward(dOut4_3,28,28,Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2); dOut3_2 = dOut3;
+073|            # Update the parameters
+074|            conv1_dWeight_reg = l2_reg::backward(conv1_weight, 5.000000237487257E-4)
+075|            conv1_dWeight = conv1_dWeight + conv1_dWeight_reg
+076|            [conv1_weight,conv1_weight_v] = sgd_momentum::update(conv1_weight,conv1_dWeight,(lr * 1.0),0.8999999761581421,conv1_weight_v)
+077|            [conv1_bias,conv1_bias_v] = sgd_momentum::update(conv1_bias,conv1_dBias,(lr * 2.0),0.8999999761581421,conv1_bias_v)
+078|            conv2_dWeight_reg = l2_reg::backward(conv2_weight, 5.000000237487257E-4)
+079|            conv2_dWeight = conv2_dWeight + conv2_dWeight_reg
+080|            [conv2_weight,conv2_weight_v] = sgd_momentum::update(conv2_weight,conv2_dWeight,(lr * 1.0),0.8999999761581421,conv2_weight_v)
+081|            [conv2_bias,conv2_bias_v] = sgd_momentum::update(conv2_bias,conv2_dBias,(lr * 2.0),0.8999999761581421,conv2_bias_v)
+082|            ip1_dWeight_reg = l2_reg::backward(ip1_weight, 5.000000237487257E-4)
+083|            ip1_dWeight = ip1_dWeight + ip1_dWeight_reg
+084|            [ip1_weight,ip1_weight_v] = sgd_momentum::update(ip1_weight,ip1_dWeight,(lr * 1.0),0.8999999761581421,ip1_weight_v)
+085|            [ip1_bias,ip1_bias_v] = sgd_momentum::update(ip1_bias,ip1_dBias,(lr * 2.0),0.8999999761581421,ip1_bias_v)
+086|            ip2_dWeight_reg = l2_reg::backward(ip2_weight, 5.000000237487257E-4)
+087|            ip2_dWeight = ip2_dWeight + ip2_dWeight_reg
+088|            [ip2_weight,ip2_weight_v] = sgd_momentum::update(ip2_weight,ip2_dWeight,(lr * 1.0),0.8999999761581421,ip2_weight_v)
+089|            [ip2_bias,ip2_bias_v] = sgd_momentum::update(ip2_bias,ip2_dBias,(lr * 2.0),0.8999999761581421,ip2_bias_v)
+090|            # Compute training loss & accuracy
+091|            if(iter  %% 100 == 0) {
+092|                    loss = 0
+093|                    accuracy = 0
+094|                    tmp_loss = cross_entropy_loss::forward(out13,yb)
+095|                    loss = loss + tmp_loss
+096|                    true_yb = rowIndexMax(yb)
+097|                    predicted_yb = rowIndexMax(out13)
+098|                    accuracy = mean(predicted_yb == true_yb)*100
+099|                    training_loss = loss
+100|                    training_accuracy = accuracy
+101|                    print("Iter:" + iter + ", training loss:" + training_loss + ", training accuracy:" + training_accuracy)
+102|                    if(debug) {
+103|                            num_rows_error_measures = min(10, ncol(yb))
+104|                            error_measures = matrix(0, rows=num_rows_error_measures, cols=5)
+105|                            for(class_i in 1:num_rows_error_measures) {
+106|                                    tp = sum( (true_yb == predicted_yb) * (true_yb == class_i) )
+107|                                    tp_plus_fp = sum( (predicted_yb == class_i) )
+108|                                    tp_plus_fn = sum( (true_yb == class_i) )
+109|                                    precision = tp / tp_plus_fp
+110|                                    recall = tp / tp_plus_fn
+111|                                    f1Score = 2*precision*recall / (precision+recall)
+112|                                    error_measures[class_i,1] = class_i
+113|                                    error_measures[class_i,2] = precision
+114|                                    error_measures[class_i,3] = recall
+115|                                    error_measures[class_i,4] = f1Score
+116|                                    error_measures[class_i,5] = tp_plus_fn
+117|                            }
+118|                            print("class    \tprecision\trecall  \tf1-score\tnum_true_labels\n" + toString(error_measures, decimal=7, sep="\t"))
+119|                    }
+120|            }
+121|            # Compute validation loss & accuracy
+122|            if(iter  %% 500 == 0) {
+123|                    loss = 0
+124|                    accuracy = 0
+125|                    validation_loss = 0
+126|                    validation_accuracy = 0
+127|                    for(iVal in 1:num_iters_per_epoch) {
+128|                            beg = ((iVal-1) * BATCH_SIZE) %% num_validation + 1; end = min(beg + BATCH_SIZE - 1, num_validation); Xb = X_val[beg:end,]; yb = y_val[beg:end,];
+129|                            # Perform forward pass
+130|                            [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2)
+131|                            out4 = relu::forward(out3)
+132|                            [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0)
+133|                            [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2)
+134|                            out7 = relu::forward(out6)
+135|                            [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0)
+136|                            out9 = affine::forward(out8,ip1_weight,ip1_bias)
+137|                            out10 = relu::forward(out9)
+138|                            [out11,mask11] = dropout::forward(out10,0.5,-1)
+139|                            out12 = affine::forward(out11,ip2_weight,ip2_bias)
+140|                            out13 = softmax::forward(out12)
+141|                            tmp_loss = cross_entropy_loss::forward(out13,yb)
+142|                            loss = loss + tmp_loss
+143|                            true_yb = rowIndexMax(yb)
+144|                            predicted_yb = rowIndexMax(out13)
+145|                            accuracy = mean(predicted_yb == true_yb)*100
+146|                            validation_loss = validation_loss + loss
+147|                            validation_accuracy = validation_accuracy + accuracy
+148|                    }
+149|                    validation_accuracy = validation_accuracy / num_iters_per_epoch
+150|                    print("Iter:" + iter + ", validation loss:" + validation_loss + ", validation accuracy:" + validation_accuracy)
+151|            }
+152|    }
+153|    # Learning rate
+154|    lr = (0.009999999776482582 * 0.949999988079071^e)
+155|}
+
+Iter:100, training loss:0.24014199350958168, training accuracy:87.5
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       3.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+3.0000000       0.8888889       0.8888889       0.8888889       9.0000000
+4.0000000       0.7500000       0.7500000       0.7500000       4.0000000
+5.0000000       0.7500000       1.0000000       0.8571429       3.0000000
+6.0000000       0.8333333       1.0000000       0.9090909       5.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+8.0000000       0.8571429       0.7500000       0.8000000       8.0000000
+9.0000000       1.0000000       0.5714286       0.7272727       7.0000000
+10.0000000      0.7272727       0.8888889       0.8000000       9.0000000
+
+Iter:200, training loss:0.09555593867171894, training accuracy:98.4375
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       10.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       3.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+7.0000000       1.0000000       0.6666667       0.8000000       3.0000000
+8.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+9.0000000       0.8571429       1.0000000       0.9230769       6.0000000
+10.0000000      1.0000000       1.0000000       1.0000000       3.0000000
+
+Iter:300, training loss:0.058686794512570216, training accuracy:98.4375
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+6.0000000       1.0000000       0.8750000       0.9333333       8.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+8.0000000       1.0000000       1.0000000       1.0000000       2.0000000
+9.0000000       0.8888889       1.0000000       0.9411765       8.0000000
+10.0000000      1.0000000       1.0000000       1.0000000       8.0000000
+
+Iter:400, training loss:0.08742103541529415, training accuracy:96.875
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+2.0000000       0.8000000       1.0000000       0.8888889       8.0000000
+3.0000000       1.0000000       0.8333333       0.9090909       6.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+8.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+10.0000000      1.0000000       0.9230769       0.9600000       13.0000000
+
+Iter:500, training loss:0.05873836245880005, training accuracy:98.4375
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       3.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+6.0000000       1.0000000       0.8571429       0.9230769       7.0000000
+7.0000000       0.8571429       1.0000000       0.9230769       6.0000000
+8.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       10.0000000
+10.0000000      1.0000000       1.0000000       1.0000000       5.0000000
+
+Iter:500, validation loss:260.1580978627665, validation accuracy:96.43954918032787
+Iter:600, training loss:0.07584116043829209, training accuracy:98.4375
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+8.0000000       1.0000000       0.9230769       0.9600000       13.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+10.0000000      0.8333333       1.0000000       0.9090909       5.0000000
+
+Iter:700, training loss:0.07973166944626336, training accuracy:98.4375
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       10.0000000
+8.0000000       0.8000000       1.0000000       0.8888889       4.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+10.0000000      1.0000000       0.9166667       0.9565217       12.0000000
+
+Iter:800, training loss:0.0063778595034221855, training accuracy:100.0
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+8.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       2.0000000
+10.0000000      1.0000000       1.0000000       1.0000000       6.0000000
+
+Iter:900, training loss:0.019673112167879484, training accuracy:100.0
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       3.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       3.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       10.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+8.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       12.0000000
+10.0000000      1.0000000       1.0000000       1.0000000       7.0000000
+
+Iter:1000, training loss:0.06137978002508307, training accuracy:96.875
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       8.0000000
+4.0000000       0.8333333       0.8333333       0.8333333       6.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       5.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       10.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       3.0000000
+8.0000000       0.8888889       0.8888889       0.8888889       9.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+10.0000000      1.0000000       1.0000000       1.0000000       4.0000000
+
+Iter:1000, validation loss:238.62301345198944, validation accuracy:97.02868852459017
+Iter:1100, training loss:0.023325103696013115, training accuracy:100.0
+class           precision       recall          f1-score        num_true_labels
+1.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+2.0000000       1.0000000       1.0000000       1.0000000       10.0000000
+3.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+4.0000000       1.0000000       1.0000000       1.0000000       4.0000000
+5.0000000       1.0000000       1.0000000       1.0000000       2.0000000
+6.0000000       1.0000000       1.0000000       1.0000000       10.0000000
+7.0000000       1.0000000       1.0000000       1.0000000       7.0000000
+8.0000000       1.0000000       1.0000000       1.0000000       6.0000000
+9.0000000       1.0000000       1.0000000       1.0000000       9.0000000
+10.0000000      1.0000000       1.0000000       1.0000000       6.0000000
+...
 ```

http://git-wip-us.apache.org/repos/asf/systemml/blob/0ff26740/python-reference.md
----------------------------------------------------------------------
diff --git a/python-reference.md b/python-reference.md
index 7de3fb0..119c1d0 100644
--- a/python-reference.md
+++ b/python-reference.md
@@ -406,6 +406,10 @@ model.transform(df_test)
 </div>
 </div>
 
+Please note that when training using the mllearn API (i.e. `model.fit(X_df)`), SystemML
+expects that the labels have been converted to 1-based values.
+This avoids unnecessary decoding overhead for large datasets if the label column has already been decoded.
+For the scikit-learn API, there is no such requirement.
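+For example, a minimal sketch of shifting a hypothetical 0-based `label` column to 1-based values before calling `model.fit(X_df)`:
+
+```python
+from pyspark.sql.functions import col
+
+# X_df is assumed to be a DataFrame with a 'features' column and a 0-based 'label' column
+X_df = X_df.withColumn('label', col('label') + 1)
+model.fit(X_df)
+```
+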
 
 The table below describes the parameters available for the mllearn algorithms:
 
