[SYSTEMML-540] Added optimizer support in Keras2DML - Also, updated the documentation. - Added a controlled error when the batch size is not a multiple of the number of training data points in lstm. - Added a perform_one_hot_encoding flag to deal with non-label data. - Bug fix for the EuclideanLoss layer in Caffe2DML. - Added regularization support in Caffe2DML.
Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/573427fb
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/573427fb
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/573427fb

Branch: refs/heads/gh-pages
Commit: 573427fb5a36264699806e1d18b6301c26df7ecb
Parents: d4b723e
Author: Niketan Pansare <[email protected]>
Authored: Thu Jan 11 15:14:25 2018 -0800
Committer: Niketan Pansare <[email protected]>
Committed: Thu Jan 11 15:18:21 2018 -0800

----------------------------------------------------------------------
 beginners-guide-keras2dml.md | 82 +++++++++++++++++++++++++++++++++++++--
 reference-guide-caffe2dml.md | 29 +++++++++++++-
 2 files changed, 106 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/573427fb/beginners-guide-keras2dml.md
----------------------------------------------------------------------
diff --git a/beginners-guide-keras2dml.md b/beginners-guide-keras2dml.md
index fd2af87..c99334e 100644
--- a/beginners-guide-keras2dml.md
+++ b/beginners-guide-keras2dml.md
@@ -53,10 +53,84 @@ from systemml.mllearn import Keras2DML
 import keras
 from keras.applications.resnet50 import preprocess_input, decode_predictions, ResNet50
 
-model = ResNet50(weights='imagenet',include_top=True,pooling='None',input_shape=(224,224,3))
-model.compile(optimizer='sgd', loss= 'categorical_crossentropy')
+keras_model = ResNet50(weights='imagenet',include_top=True,pooling='None',input_shape=(224,224,3))
+keras_model.compile(optimizer='sgd', loss= 'categorical_crossentropy')
 
-resnet = Keras2DML(spark,model,input_shape=(3,224,224))
-resnet.summary()
+sysml_model = Keras2DML(spark, keras_model,input_shape=(3,224,224))
+sysml_model.summary()
 ```
+# Frequently asked questions
+
+#### What is the mapping between Keras' parameters and Caffe's solver specification ?
+
+| | Specified via the given parameter in the Keras2DML constructor | From the input Keras model | Corresponding parameter in the Caffe solver file |
+|---|---|---|---|
+| Solver type | | `type(keras_model.optimizer)`. Supported types: `keras.optimizers.{SGD, Adagrad, Adam}` | `type` |
+| Maximum number of iterations | `max_iter` | The `epochs` parameter of the `fit` method is not supported. | `max_iter` |
+| Validation dataset | `test_iter` (explained in the section below) | The `validation_data` parameter of the `fit` method is not supported. | `test_iter` |
+| Monitoring the loss | `display, test_interval` (explained in the section below) | The `LossHistory` callback of the `fit` method is not supported. | `display, test_interval` |
+| Learning rate schedule | `lr_policy` | The `LearningRateScheduler` callback of the `fit` method is not supported. | `lr_policy` (default: step) |
+| Base learning rate | | `keras_model.optimizer.lr` | `base_lr` |
+| Learning rate decay over each update | | `keras_model.optimizer.decay` | `gamma` |
+| Global regularizer to use for all layers | `regularization_type, weight_decay` | The current version of Keras2DML does not support custom regularizers per layer. | `regularization_type, weight_decay` |
+| If the type of the optimizer is `keras.optimizers.SGD` | | `momentum, nesterov` | `momentum, type` |
+| If the type of the optimizer is `keras.optimizers.Adam` | | `beta_1, beta_2, epsilon`. The parameter `amsgrad` is not supported. | `momentum, momentum2, delta` |
+| If the type of the optimizer is `keras.optimizers.Adagrad` | | `epsilon` | `delta` |
+
+#### How do I specify the batch size and the number of epochs ?
+
+Since Keras2DML is an mllearn API, it does not accept the batch size and the number of epochs as parameters of the `fit` method.
+Instead, these are passed via the `batch_size` and `max_iter` parameters of the Keras2DML constructor.
+For example, the equivalent Python code for `keras_model.fit(features, labels, epochs=10, batch_size=64)` is as follows:
+
+```python
+import math
+from systemml.mllearn import Keras2DML
+epochs = 10
+batch_size = 64
+num_samples = features.shape[0]
+max_iter = int(epochs*math.ceil(num_samples/batch_size))
+sysml_model = Keras2DML(spark, keras_model, batch_size=batch_size, max_iter=max_iter, ...)
+sysml_model.fit(features, labels)
+```
+
+#### What optimizer and loss does Keras2DML use by default if `keras_model` is not compiled ?
+
+If the user does not `compile` the Keras model, then we use cross-entropy loss and the SGD optimizer with Nesterov momentum:
+
+```python
+keras_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.95, decay=5e-4, nesterov=True))
+```
+
+#### What is the learning rate schedule used ?
+
+Keras2DML does not support the `LearningRateScheduler` callback.
+Instead, one can set the learning rate schedule to one of the following by using the `lr_policy` parameter of the constructor:
+- `step`: return `base_lr * gamma ^ (floor(iter / step))` (default schedule)
+- `fixed`: always return `base_lr`.
+- `exp`: return `base_lr * gamma ^ iter`
+- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
+- `poly`: the effective learning rate follows a polynomial decay, reaching zero at `max_iter`. return `base_lr * (1 - iter/max_iter) ^ (power)`
+- `sigmoid`: the effective learning rate follows a sigmoid decay. return `base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))`
+
+#### How to set the size of the validation dataset ?
+
+The size of the validation dataset is determined by the parameter `test_iter` and the batch size. For example: if the batch size is 64 and
+`test_iter` is set to 10 in the `Keras2DML` constructor, then the validation set size is 640. This setting generates the following DML code internally:
+
+```python
+num_images = nrow(y_full)
+BATCH_SIZE = 64
+num_validation = 10 * BATCH_SIZE
+X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]
+X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]
+num_images = nrow(y)
+```
+
+#### How to monitor loss via the command line ?
+
+To monitor the loss, set the parameters `display`, `test_iter` and `test_interval` in the `Keras2DML` constructor (a combined example follows this list).
+For example: for the expression `Keras2DML(..., display=100, test_iter=10, test_interval=500)`, we
+- display the training loss and accuracy every 100 iterations and
+- carry out validation every 500 training iterations and display validation loss and accuracy.
+
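For convenience, here is a sketch that combines the parameters discussed in the FAQ above into a single training setup. It is illustrative only: the parameter names (`input_shape`, `batch_size`, `max_iter`, `display`, `test_iter`, `test_interval`, `regularization_type`, `weight_decay`) are the ones listed in this guide, but the surrounding objects (`spark`, `keras_model`, `features`, `labels`) and the chosen values are assumptions, and the exact constructor signature may differ across SystemML versions.

```python
import math
from systemml.mllearn import Keras2DML

# Assumed to exist already: spark (SparkSession), a compiled keras_model,
# and features/labels as NumPy arrays.
epochs = 10
batch_size = 64
num_samples = features.shape[0]
max_iter = int(epochs * math.ceil(num_samples / batch_size))  # epochs expressed as iterations

sysml_model = Keras2DML(spark, keras_model,
                        input_shape=(3, 224, 224),   # channels, height, width of one sample
                        batch_size=batch_size,
                        max_iter=max_iter,
                        display=100,                 # print training loss/accuracy every 100 iterations
                        test_iter=10,                # validation set of 10 * batch_size samples
                        test_interval=500,           # run validation every 500 iterations
                        regularization_type="L2",    # global regularizer for all layers
                        weight_decay=5e-4)           # regularization strength
sysml_model.fit(features, labels)
```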

http://git-wip-us.apache.org/repos/asf/systemml/blob/573427fb/reference-guide-caffe2dml.md
----------------------------------------------------------------------
diff --git a/reference-guide-caffe2dml.md b/reference-guide-caffe2dml.md
index be8c078..0e191dd 100644
--- a/reference-guide-caffe2dml.md
+++ b/reference-guide-caffe2dml.md
@@ -578,7 +578,34 @@ The parameter `lr_policy` specifies the learning rate decay policy. Caffe2DML su
 - `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
 - `poly`: the effective learning rate follows a polynomial decay, reaching zero at `max_iter`. return `base_lr * (1 - iter/max_iter) ^ (power)`
 - `sigmoid`: the effective learning rate follows a sigmoid decay. return `base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))`
-
+
+
+The parameters `base_lr` and `lr_policy` are required and the other parameters are optional:
+```
+lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
+                  # by a factor of gamma every stepsize iterations (required)
+base_lr: 0.01     # begin training at a learning rate of 0.01 (required)
+gamma: 0.95       # drop the learning rate by the given factor (optional, default value: 0.95)
+stepsize: 100000  # drop the learning rate every 100K iterations (optional, default value: 100000)
+power: 0.75       # (optional, default value: 0.75)
+```
+
+#### How do I regularize weight matrices in the neural network ?
+
+The user can specify the type of regularization using the parameter `regularization_type` in the solver file.
+The valid values are `L2` (default) and `L1`.
+Caffe2DML then invokes the backward function of the layers `nn/layers/l2_reg.dml` and `nn/layers/l1_reg.dml`, respectively.
+The regularization strength is set using the property `weight_decay` in the solver file:
+```
+regularization_type: "L2"
+weight_decay: 5e-4
+```
+
+As with the learning rate, you can customize the regularization strength of a given layer by specifying the property `decay_mult` in the network file:
+```
+param { lr_mult: 1 decay_mult: 1 }
+```
+
 #### How to set batch size ?
 
 Batch size is set in `data_param` of the Data layer:
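Below is a minimal, illustrative sketch of such a `Data` layer in Caffe's prototxt format; the layer name, top blob names, data source and the batch size value 64 are placeholders rather than values taken from this guide:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "train_lmdb"   # path to the training data (placeholder)
    batch_size: 64         # number of samples per mini-batch
    backend: LMDB
  }
}
```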
