[SYSTEMML-540] Added optimizer support in Keras2DML

- Also updated the documentation.
- Added a controlled error when the batch size is not a multiple of the
  number of training data points in lstm.
- Added a perform_one_hot_encoding flag to deal with non-label data.
- Fixed a bug in the EuclideanLoss layer in Caffe2DML.
- Added regularization support in Caffe2DML.

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/573427fb
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/573427fb
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/573427fb

Branch: refs/heads/gh-pages
Commit: 573427fb5a36264699806e1d18b6301c26df7ecb
Parents: d4b723e
Author: Niketan Pansare <[email protected]>
Authored: Thu Jan 11 15:14:25 2018 -0800
Committer: Niketan Pansare <[email protected]>
Committed: Thu Jan 11 15:18:21 2018 -0800

----------------------------------------------------------------------
 beginners-guide-keras2dml.md | 82 +++++++++++++++++++++++++++++++++++++--
 reference-guide-caffe2dml.md | 29 +++++++++++++-
 2 files changed, 106 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/573427fb/beginners-guide-keras2dml.md
----------------------------------------------------------------------
diff --git a/beginners-guide-keras2dml.md b/beginners-guide-keras2dml.md
index fd2af87..c99334e 100644
--- a/beginners-guide-keras2dml.md
+++ b/beginners-guide-keras2dml.md
@@ -53,10 +53,84 @@ from systemml.mllearn import Keras2DML
 import keras
 from keras.applications.resnet50 import preprocess_input, decode_predictions, ResNet50
 
-model = ResNet50(weights='imagenet',include_top=True,pooling='None',input_shape=(224,224,3))
-model.compile(optimizer='sgd', loss= 'categorical_crossentropy')
+keras_model = ResNet50(weights='imagenet',include_top=True,pooling='None',input_shape=(224,224,3))
+keras_model.compile(optimizer='sgd', loss='categorical_crossentropy')
 
-resnet = Keras2DML(spark,model,input_shape=(3,224,224))
-resnet.summary()
+sysml_model = Keras2DML(spark, keras_model, input_shape=(3,224,224))
+sysml_model.summary()
 ```
 
+# Frequently asked questions
+
+#### What is the mapping between Keras' parameters and Caffe's solver specification?
+
+| | Specified via the given parameter in the Keras2DML constructor | From the input Keras model | Corresponding parameter in the Caffe solver file |
+|---|---|---|---|
+| Solver type | | `type(keras_model.optimizer)`. Supported types: `keras.optimizers.{SGD, Adagrad, Adam}` | `type` |
+| Maximum number of iterations | `max_iter` | The `epochs` parameter of the `fit` method is not supported. | `max_iter` |
+| Validation dataset | `test_iter` (explained in the section below) | The `validation_data` parameter of the `fit` method is not supported. | `test_iter` |
+| Monitoring the loss | `display, test_interval` (explained in the section below) | The `LossHistory` callback of the `fit` method is not supported. | `display, test_interval` |
+| Learning rate schedule | `lr_policy` | The `LearningRateScheduler` callback of the `fit` method is not supported. | `lr_policy` (default: `step`) |
+| Base learning rate | | `keras_model.optimizer.lr` | `base_lr` |
+| Learning rate decay over each update | | `keras_model.optimizer.decay` | `gamma` |
+| Global regularizer to use for all layers | `regularization_type, weight_decay` | The current version of Keras2DML does not support custom regularizers per layer. | `regularization_type, weight_decay` |
+| If the optimizer is of type `keras.optimizers.SGD` | | `momentum, nesterov` | `momentum, type` |
+| If the optimizer is of type `keras.optimizers.Adam` | | `beta_1, beta_2, epsilon`. The parameter `amsgrad` is not supported. | `momentum, momentum2, delta` |
+| If the optimizer is of type `keras.optimizers.Adagrad` | | `epsilon` | `delta` |
+
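+For example, here is a minimal sketch (reusing the `spark` session and the `keras_model` from the example above) that compiles the model with an Adam optimizer; per the table, Keras2DML would translate `beta_1`, `beta_2` and `epsilon` into the Caffe solver parameters `momentum`, `momentum2` and `delta`:
+
+```python
+import keras
+from systemml.mllearn import Keras2DML
+
+# lr maps to base_lr and decay maps to gamma (see the table above).
+keras_model.compile(loss='categorical_crossentropy',
+                    optimizer=keras.optimizers.Adam(lr=0.001, beta_1=0.9,
+                                                    beta_2=0.999, epsilon=1e-8))
+sysml_model = Keras2DML(spark, keras_model, input_shape=(3,224,224))
+```
+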
+#### How do I specify the batch size and the number of epochs?
+
+Since Keras2DML is an mllearn API, it does not accept the batch size and the number of epochs as parameters of the `fit` method.
+Instead, these values are passed via the `batch_size` and `max_iter` parameters of the Keras2DML constructor.
+For example, the equivalent Python code for `keras_model.fit(features, labels, epochs=10, batch_size=64)` is as follows:
+
+```python
+import math
+from systemml.mllearn import Keras2DML
+
+epochs = 10
+batch_size = 64
+num_samples = features.shape[0]
+# Number of mini-batch iterations corresponding to the given number of epochs
+max_iter = int(epochs*math.ceil(num_samples/batch_size))
+sysml_model = Keras2DML(spark, keras_model, batch_size=batch_size, max_iter=max_iter, ...)
+sysml_model.fit(features, labels)
+```
+
+#### What optimizer and loss does Keras2DML use by default if `keras_model` is not compiled?
+
+If the user does not `compile` the Keras model, then we use the cross-entropy loss and the SGD optimizer with Nesterov momentum:
+
+```python
+keras_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.95, decay=5e-4, nesterov=True))
+```
+
+#### What is the learning rate schedule used?
+
+Keras2DML does not support the `LearningRateScheduler` callback.
+Instead, one can select one of the following learning rate schedules via the `lr_policy` parameter of the constructor (a sketch of these formulas follows the list):
+- `step`: return `base_lr * gamma ^ (floor(iter / step))` (default schedule)
+- `fixed`: always return `base_lr`
+- `exp`: return `base_lr * gamma ^ iter`
+- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
+- `poly`: the effective learning rate follows a polynomial decay, reaching zero at `max_iter`; return `base_lr * (1 - iter/max_iter) ^ power`
+- `sigmoid`: the effective learning rate follows a sigmoid decay; return `base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))`
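+
+As noted above, here is a minimal Python sketch of these schedules; the helper `effective_lr` and its default values are purely illustrative and not part of the Keras2DML API:
+
+```python
+import math
+
+def effective_lr(policy, base_lr, iteration, gamma=0.95, stepsize=100000, power=0.75, max_iter=2000):
+    # Illustrative re-implementation of the lr_policy formulas listed above.
+    if policy == 'fixed':
+        return base_lr
+    elif policy == 'step':
+        return base_lr * gamma ** math.floor(iteration / stepsize)
+    elif policy == 'exp':
+        return base_lr * gamma ** iteration
+    elif policy == 'inv':
+        return base_lr * (1 + gamma * iteration) ** (-power)
+    elif policy == 'poly':
+        return base_lr * (1 - float(iteration) / max_iter) ** power
+    elif policy == 'sigmoid':
+        return base_lr * (1 / (1 + math.exp(-gamma * (iteration - stepsize))))
+    else:
+        raise ValueError('Unsupported lr_policy: ' + policy)
+```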
+
+#### How to set the size of the validation dataset?
+
+The size of the validation dataset is determined by the parameter `test_iter` and the batch size. For example, if the batch size is 64 and
+`test_iter` is set to 10 in the Keras2DML constructor, then the validation set has 640 data points. This setting internally generates the following DML code:
+
+```dml
+# Hold out the first test_iter * batch_size rows as the validation set
+num_images = nrow(y_full)
+BATCH_SIZE = 64
+num_validation = 10 * BATCH_SIZE
+X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]
+X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]
+num_images = nrow(y)
+```
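+
+A minimal sketch of the corresponding Keras2DML constructor call for this setting (`batch_size` and `test_iter` are the parameters discussed above):
+
+```python
+sysml_model = Keras2DML(spark, keras_model, input_shape=(3,224,224),
+                        batch_size=64, test_iter=10)
+```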
+
+#### How to monitor the loss via the command line?
+
+To monitor the loss, set the parameters `display`, `test_iter` and `test_interval` in the Keras2DML constructor.
+For example, with `Keras2DML(..., display=100, test_iter=10, test_interval=500)`, we
+- display the training loss and accuracy every 100 iterations, and
+- carry out validation every 500 training iterations and display the validation loss and accuracy.
+

http://git-wip-us.apache.org/repos/asf/systemml/blob/573427fb/reference-guide-caffe2dml.md
----------------------------------------------------------------------
diff --git a/reference-guide-caffe2dml.md b/reference-guide-caffe2dml.md
index be8c078..0e191dd 100644
--- a/reference-guide-caffe2dml.md
+++ b/reference-guide-caffe2dml.md
@@ -578,7 +578,34 @@ The parameter `lr_policy` specifies the learning rate decay policy. Caffe2DML su
 - `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
 - `poly`: the effective learning rate follows a polynomial decay, reaching zero at `max_iter`; return `base_lr * (1 - iter/max_iter) ^ power`
 - `sigmoid`: the effective learning rate follows a sigmoid decay; return `base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))`
-      
+
+
+The parameters `base_lr` and `lr_policy` are required; the other parameters are optional:
+```
+lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
+                  # by a factor of gamma every stepsize iterations (required)
+base_lr: 0.01     # begin training at a learning rate of 0.01 (required)
+gamma: 0.95       # drop the learning rate by the given factor (optional, default value: 0.95)
+stepsize: 100000  # drop the learning rate every 100K iterations (optional, default value: 100000)
+power: 0.75       # (optional, default value: 0.75)
+```
+
+#### How do I regularize the weight matrices in the neural network?
+
+The user can specify the type of regularization using the parameter `regularization_type` in the solver file.
+The valid values are `L2` (default) and `L1`.
+Caffe2DML then invokes the backward function of the layer `nn/layers/l2_reg.dml` or `nn/layers/l1_reg.dml`, respectively.
+The regularization strength is set using the property `weight_decay` in the solver file:
+```
+regularization_type: "L2"
+weight_decay: 5e-4
+```
+
+As with the learning rate, you can customize the regularization strength of a given layer by specifying the property `decay_mult` in the network file:
+```
+param { lr_mult: 1 decay_mult: 1 }
+```
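+
+For intuition, here is a minimal NumPy sketch of the gradient that each regularizer contributes (illustrative only; Caffe2DML performs this computation in the generated DML via the layers named above). The effective strength for a layer is `weight_decay * decay_mult`:
+
+```python
+import numpy as np
+
+def reg_backward(W, reg_type='L2', weight_decay=5e-4, decay_mult=1.0):
+    """Gradient of the regularization term with respect to the weights W."""
+    lam = weight_decay * decay_mult
+    if reg_type == 'L2':
+        return lam * W            # d/dW of (lam/2) * sum(W^2)
+    elif reg_type == 'L1':
+        return lam * np.sign(W)   # subgradient of lam * sum(|W|)
+    else:
+        raise ValueError('Unsupported regularization_type: ' + reg_type)
+```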
+
 #### How to set batch size ?
 
 Batch size is set in `data_param` of the Data layer:
