Repository: systemml
Updated Branches:
  refs/heads/master 45eec2d25 -> 9dc354ac2


[SYSTEMML-540] Added optimizer support in Keras2DML

- Also, updated the documentation.
- Added a controlled error in lstm when the batch size is not a multiple of the number of training data points.
- Added perform_one_hot_encoding flag to deal with non-label data.
- Bug fix for EuclideanLoss layer in Caffe2DML.
- Added regularization support in Caffe2DML.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/9dc354ac
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/9dc354ac
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/9dc354ac

Branch: refs/heads/master
Commit: 9dc354ac2f1ca0378a1ec76317d3df72cfb6f380
Parents: 45eec2d
Author: Niketan Pansare <[email protected]>
Authored: Thu Jan 11 15:14:25 2018 -0800
Committer: Niketan Pansare <[email protected]>
Committed: Thu Jan 11 15:18:21 2018 -0800

----------------------------------------------------------------------
 docs/beginners-guide-keras2dml.md               |  82 ++++++++++++-
 docs/reference-guide-caffe2dml.md               |  29 ++++-
 scripts/nn/layers/lstm.dml                      |   9 +-
 src/main/python/systemml/mllearn/estimators.py  |  28 +++--
 src/main/python/systemml/mllearn/keras2caffe.py | 117 +++++++++++++++----
 .../org/apache/sysml/api/dl/Caffe2DML.scala     |  36 ++++--
 .../org/apache/sysml/api/dl/CaffeLayer.scala    |  96 +++++++++++----
 .../org/apache/sysml/api/dl/CaffeSolver.scala   |  95 +++++++++++++--
 .../org/apache/sysml/api/dl/DMLGenerator.scala  |   9 +-
 .../scala/org/apache/sysml/api/dl/Utils.scala   |  10 +-
 .../sysml/api/ml/BaseSystemMLClassifier.scala   |   2 +-
 11 files changed, 428 insertions(+), 85 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/docs/beginners-guide-keras2dml.md
----------------------------------------------------------------------
diff --git a/docs/beginners-guide-keras2dml.md b/docs/beginners-guide-keras2dml.md
index fd2af87..c99334e 100644
--- a/docs/beginners-guide-keras2dml.md
+++ b/docs/beginners-guide-keras2dml.md
@@ -53,10 +53,84 @@ from systemml.mllearn import Keras2DML
 import keras
 from keras.applications.resnet50 import preprocess_input, decode_predictions, ResNet50
 
-model = ResNet50(weights='imagenet',include_top=True,pooling='None',input_shape=(224,224,3))
-model.compile(optimizer='sgd', loss= 'categorical_crossentropy')
+keras_model = ResNet50(weights='imagenet',include_top=True,pooling='None',input_shape=(224,224,3))
+keras_model.compile(optimizer='sgd', loss= 'categorical_crossentropy')
 
-resnet = Keras2DML(spark,model,input_shape=(3,224,224))
-resnet.summary()
+sysml_model = Keras2DML(spark, keras_model,input_shape=(3,224,224))
+sysml_model.summary()
 ```
 
+# Frequently asked questions
+
+#### What is the mapping between Keras' parameters and Caffe's solver specification?
+
+|                                                        | Specified via the given parameter in the Keras2DML constructor | From input Keras' model                                                                 | Corresponding parameter in the Caffe solver file |
+|--------------------------------------------------------|----------------------------------------------------------------|-----------------------------------------------------------------------------------------|--------------------------------------------------|
+| Solver type                                            |                                                                | `type(keras_model.optimizer)`. Supported types: `keras.optimizers.{SGD, Adagrad, Adam}` | `type`                                           |
+| Maximum number of iterations                           | `max_iter`                                                     | The `epoch` parameter in the `fit` method is not supported.                             | `max_iter`                                       |
+| Validation dataset                                     | `test_iter` (explained in the below section)                   | The `validation_data` parameter in the `fit` method is not supported.                   | `test_iter`                                      |
+| Monitoring the loss                                    | `display, test_interval` (explained in the below section)      | The `LossHistory` callback in the `fit` method is not supported.                        | `display, test_interval`                         |
+| Learning rate schedule                                 | `lr_policy`                                                    | The `LearningRateScheduler` callback in the `fit` method is not supported.              | `lr_policy` (default: step)                      |
+| Base learning rate                                     |                                                                | `keras_model.optimizer.lr`                                                              | `base_lr`                                        |
+| Learning rate decay over each update                   |                                                                | `keras_model.optimizer.decay`                                                           | `gamma`                                          |
+| Global regularizer to use for all layers               | `regularization_type,weight_decay`                             | The current version of Keras2DML does not support custom regularizers per layer.        | `regularization_type,weight_decay`               |
+| If type of the optimizer is `keras.optimizers.SGD`     |                                                                | `momentum, nesterov`                                                                    | `momentum, type`                                 |
+| If type of the optimizer is `keras.optimizers.Adam`    |                                                                | `beta_1, beta_2, epsilon`. The parameter `amsgrad` is not supported.                    | `momentum, momentum2, delta`                     |
+| If type of the optimizer is `keras.optimizers.Adagrad` |                                                                | `epsilon`                                                                               | `delta`                                          |
+
+#### How do I specify the batch size and the number of epochs?
+
+Since Keras2DML is an mllearn API, it does not accept the batch size and the number of epochs as parameters of the `fit` method.
+Instead, these parameters are passed via the `batch_size` and `max_iter` parameters of the Keras2DML constructor.
+For example, the equivalent Python code for `keras_model.fit(features, labels, epochs=10, batch_size=64)` is as follows:
+
+```python
+import math
+from systemml.mllearn import Keras2DML
+epochs = 10
+batch_size = 64
+num_samples = features.shape[0]
+max_iter = int(epochs*math.ceil(num_samples/batch_size))
+sysml_model = Keras2DML(spark, keras_model, batch_size=batch_size, max_iter=max_iter, ...)
+sysml_model.fit(features, labels)
+```
+
+#### What optimizer and loss does Keras2DML use by default if `keras_model` is not compiled?
+
+If the user does not `compile` the Keras model, then we use the cross-entropy loss and the SGD optimizer with Nesterov momentum:
+
+```python
+keras_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.95, decay=5e-4, nesterov=True))
+```
+
+#### What is the learning rate schedule used?
+
+Keras2DML does not support the `LearningRateScheduler` callback.
+Instead, one can set a custom learning rate schedule by passing one of the following values to the `lr_policy` parameter of the constructor:
+- `step`: return `base_lr * gamma ^ (floor(iter / step))` (default schedule)
+- `fixed`: always return `base_lr`
+- `exp`: return `base_lr * gamma ^ iter`
+- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
+- `poly`: the effective learning rate follows a polynomial decay, reaching zero at `max_iter`; return `base_lr * (1 - iter/max_iter) ^ (power)`
+- `sigmoid`: the effective learning rate follows a sigmoid decay; return `base_lr * (1/(1 + exp(-gamma * (iter - stepsize))))`
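For intuition, the schedules above can be written out as a small Python helper (an illustrative sketch, not part of this patch; the `gamma`, `power`, and `stepsize` defaults mirror the solver defaults shown in the Caffe2DML reference guide):

```python
import math

def learning_rate(lr_policy, base_lr, iter, gamma=0.95, power=0.75, stepsize=100000, max_iter=2000):
    # Mirrors the lr_policy formulas listed above
    if lr_policy == 'fixed':
        return base_lr
    elif lr_policy == 'step':
        return base_lr * gamma ** math.floor(iter / float(stepsize))
    elif lr_policy == 'exp':
        return base_lr * gamma ** iter
    elif lr_policy == 'inv':
        return base_lr * (1 + gamma * iter) ** (-power)
    elif lr_policy == 'poly':
        return base_lr * (1 - iter / float(max_iter)) ** power
    elif lr_policy == 'sigmoid':
        return base_lr * (1 / (1 + math.exp(-gamma * (iter - stepsize))))
    else:
        raise ValueError('Unsupported lr_policy: ' + lr_policy)
```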
+
+#### How to set the size of the validation dataset?
+
+The size of the validation dataset is determined by the parameter `test_iter` and the batch size. For example: if the batch size is 64 and
+`test_iter` is set to 10 in the `Keras2DML` constructor, then the validation set size is 640. This setting generates the following DML code internally:
+
+```
+num_images = nrow(y_full)
+BATCH_SIZE = 64
+num_validation = 10 * BATCH_SIZE
+X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]
+X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]
+num_images = nrow(y)
+```
+
+#### How to monitor loss via command-line?
+
+To monitor the loss, set the parameters `display`, `test_iter`, and `test_interval` in the `Keras2DML` constructor.
+For example: for the expression `Keras2DML(..., display=100, test_iter=10, test_interval=500)`, we
+- display the training loss and accuracy every 100 iterations, and
+- carry out validation every 500 training iterations and display the validation loss and accuracy.
+

http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/docs/reference-guide-caffe2dml.md
----------------------------------------------------------------------
diff --git a/docs/reference-guide-caffe2dml.md b/docs/reference-guide-caffe2dml.md
index be8c078..0e191dd 100644
--- a/docs/reference-guide-caffe2dml.md
+++ b/docs/reference-guide-caffe2dml.md
@@ -578,7 +578,34 @@ The parameter `lr_policy` specifies the learning rate decay policy. Caffe2DML su
 - `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
 - `poly`: the effective learning rate follows a polynomial decay, reaching zero at `max_iter`; return `base_lr * (1 - iter/max_iter) ^ (power)`
 - `sigmoid`: the effective learning rate follows a sigmoid decay; return `base_lr * (1/(1 + exp(-gamma * (iter - stepsize))))`
-      
+
+
+The parameters `base_lr` and `lr_policy` are required; the other parameters are optional:
+```
+lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
+                  # by a factor of gamma every stepsize iterations (required)
+base_lr: 0.01     # begin training at a learning rate of 0.01 (required)
+gamma: 0.95       # drop the learning rate by the given factor (optional, default value: 0.95)
+stepsize: 100000  # drop the learning rate every 100K iterations (optional, default value: 100000)
+power: 0.75       # (optional, default value: 0.75)
+```
+
+#### How do I regularize weight matrices in the neural network?
+
+The user can specify the type of regularization using the parameter `regularization_type` in the solver file.
+The valid values are `L2` (default) and `L1`.
+Caffe2DML then invokes the backward function of the layers `nn/layers/l2_reg.dml` and `nn/layers/l1_reg.dml`, respectively.
+The regularization strength is set using the property `weight_decay` in the solver file:
+```
+regularization_type: "L2"
+weight_decay: 5e-4
+```
+
+As with the learning rate, you can customize the regularization strength of a given layer by specifying the property `decay_mult` in the network file:
+```
+param { lr_mult: 1 decay_mult: 1 }
+```
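For example, a network-file layer that keeps the default regularization on its weights but disables it for the bias could look as follows (an illustrative Caffe prototxt snippet; note that, per the solver code in this commit, Caffe2DML reads the `decay_mult` of the first `param` block only):

```
layer {
  name: "fc1"                          # hypothetical fully-connected layer
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  param { lr_mult: 1 decay_mult: 1 }   # weights: regularized with weight_decay * decay_mult
  param { lr_mult: 2 decay_mult: 0 }   # bias: decay_mult 0 disables regularization (standard Caffe, currently ignored by Caffe2DML)
  inner_product_param { num_output: 512 }
}
```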
+
 #### How to set batch size ?
 
 Batch size is set in `data_param` of the Data layer:

http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/scripts/nn/layers/lstm.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/lstm.dml b/scripts/nn/layers/lstm.dml
index 696ed88..664a1e2 100644
--- a/scripts/nn/layers/lstm.dml
+++ b/scripts/nn/layers/lstm.dml
@@ -168,10 +168,11 @@ backward = function(matrix[double] dout, matrix[double] dc,
   N = nrow(X)
   M = as.integer(ncol(W)/4)
   N1 = nrow(out0)
-  if(N < N1) {
-    # Allow for smaller out0 for last batch
-    out0 = out0[1:N,]
-    c0 = c0[1:N,]
+  if(N != N1) {
+    # Allow for smaller out0 for last batch 
+    # out0 = out0[1:N,]
+    # c0 = c0[1:N,]
+    stop("Unsupported operation: The batch size of previous iteration " + N1 + 
" is different than the batch size of current iteration " + N)
   }
   dX = matrix(0, rows=N, cols=T*D)
   dW = matrix(0, rows=D+M, cols=4*M)

http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/python/systemml/mllearn/estimators.py
----------------------------------------------------------------------
diff --git a/src/main/python/systemml/mllearn/estimators.py b/src/main/python/systemml/mllearn/estimators.py
index f1e793b..72a6f55 100644
--- a/src/main/python/systemml/mllearn/estimators.py
+++ b/src/main/python/systemml/mllearn/estimators.py
@@ -213,11 +213,13 @@ class BaseSystemMLEstimator(Estimator):
         if y is None:
             return self._fit(X)
         elif y is not None and isinstance(X, SUPPORTED_TYPES) and isinstance(y, SUPPORTED_TYPES):
-            y = self.encode(y)
+            # Do not encode if y is a numpy matrix => useful for segmentation
+            skipEncodingY = len(y.shape) == 2 and y.shape[0] != 1 and y.shape[1] != 1
+            y = y if skipEncodingY else self.encode(y)
             if self.transferUsingDF:
                 pdfX = convertToPandasDF(X)
                 pdfY = convertToPandasDF(y)
-                if getNumCols(pdfY) != 1:
+                if getNumCols(pdfY) != 1 and not skipEncodingY:
                     raise Exception('y should be a column vector')
                 if pdfX.shape[0] != pdfY.shape[0]:
                     raise Exception('Number of rows of X and y should match')
@@ -227,7 +229,7 @@ class BaseSystemMLEstimator(Estimator):
                 self.fit_df(df)
             else:
                 numColsy = getNumCols(y)
-                if numColsy != 1:
+                if numColsy != 1 and not skipEncodingY:
                     raise Exception('Expected y to be a column vector')
                 self.fit_numpy(X, y)
             if self.setOutputRawPredictionsToFalse:
@@ -842,7 +844,7 @@ class Caffe2DML(BaseSystemMLClassifier):
         if ignore_weights is not None:
             self.estimator.setWeightsToIgnore(ignore_weights)
             
-    def set(self, debug=None, train_algo=None, test_algo=None, parallel_batches=None, output_activations=None):
+    def set(self, debug=None, train_algo=None, test_algo=None, parallel_batches=None, output_activations=None, perform_one_hot_encoding=None):
         """
         Set input to Caffe2DML
         
@@ -853,12 +855,14 @@ class Caffe2DML(BaseSystemMLClassifier):
         test_algo: can be minibatch, batch, allreduce_parallel_batches or allreduce (default: minibatch)
         parallel_batches: number of parallel batches
         output_activations: (developer flag) directory to output activations of each layer as csv while prediction. To be used only in batch mode (default: None)
+        perform_one_hot_encoding: whether to perform one-hot encoding in DML using the table function (default: True)
         """
         if debug is not None: self.estimator.setInput("$debug", str(debug).upper())
         if train_algo is not None: self.estimator.setInput("$train_algo", str(train_algo).lower())
         if test_algo is not None: self.estimator.setInput("$test_algo", str(test_algo).lower())
         if parallel_batches is not None: self.estimator.setInput("$parallel_batches", str(parallel_batches))
         if output_activations is not None: self.estimator.setInput("$output_activations", str(output_activations))
+        if perform_one_hot_encoding is not None: self.estimator.setInput("$perform_one_hot_encoding", str(perform_one_hot_encoding).lower())
         return self
     
     def summary(self):
@@ -884,7 +888,7 @@ class Keras2DML(Caffe2DML):
 
     """
 
-    def __init__(self, sparkSession, keras_model, input_shape, transferUsingDF=False, weights=None, labels=None):
+    def __init__(self, sparkSession, keras_model, input_shape, transferUsingDF=False, weights=None, labels=None, batch_size=64, max_iter=2000, test_iter=10, test_interval=500, display=100, lr_policy="step", weight_decay=5e-4, regularization_type="L2"):
         """
         Performs training/prediction for a given keras model.
 
@@ -896,6 +900,14 @@ class Keras2DML(Caffe2DML):
         transferUsingDF: whether to pass the input dataset via PySpark DataFrame (default: False)
         weights: directory where learned weights are stored (default: None)
         labels: file containing mapping between index and string labels (default: None)
+        batch_size: size of the input batch (default: 64)
+        max_iter: maximum number of iterations (default: 2000)
+        test_iter: test_iter for caffe solver (default: 10)
+        test_interval: test_interval for caffe solver (default: 500)
+        display: display for caffe solver (default: 100)
+        lr_policy: learning rate policy for caffe solver (default: "step")
+        weight_decay: regularization strength (default: 5e-4)
+        regularization_type: regularization type (default: "L2")
         """
         from .keras2caffe import *
         import tempfile
@@ -906,8 +918,10 @@ class Keras2DML(Caffe2DML):
             keras_model = keras_model.model
         self.name = keras_model.name
         createJavaObject(sparkSession._sc, 'dummy')
-        convertKerasToCaffeNetwork(keras_model, self.name + ".proto")
-        convertKerasToCaffeSolver(keras_model, self.name + ".proto", self.name + "_solver.proto")
+        if not hasattr(keras_model, 'optimizer'):
+            keras_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.95, decay=5e-4, nesterov=True))
+        convertKerasToCaffeNetwork(keras_model, self.name + ".proto", int(batch_size))
+        convertKerasToCaffeSolver(keras_model, self.name + ".proto", self.name + "_solver.proto", int(max_iter), int(test_iter), int(test_interval), int(display), lr_policy, weight_decay, regularization_type)
         self.weights = tempfile.mkdtemp() if weights is None else weights
         convertKerasToSystemMLModel(sparkSession, keras_model, self.weights)
         if labels is not None and (labels.startswith('https:') or labels.startswith('http:')):

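Putting the new constructor and `set` parameters together, a hypothetical end-to-end invocation (the model and data are placeholders; all keyword arguments below are the ones introduced in this commit):

```python
from systemml.mllearn import Keras2DML
# keras_model, features and labels are assumed to be defined as in the docs above.
sysml_model = Keras2DML(spark, keras_model, input_shape=(3,224,224),
                        batch_size=64, max_iter=2000, test_iter=10, test_interval=500,
                        display=100, lr_policy="step", weight_decay=5e-4,
                        regularization_type="L2")
# Optional: skip the DML-side one-hot encoding, e.g. for segmentation targets.
sysml_model.set(perform_one_hot_encoding=False)
sysml_model.fit(features, labels)
```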
http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/python/systemml/mllearn/keras2caffe.py
----------------------------------------------------------------------
diff --git a/src/main/python/systemml/mllearn/keras2caffe.py b/src/main/python/systemml/mllearn/keras2caffe.py
index 3cf710c..81fb63a 100755
--- a/src/main/python/systemml/mllearn/keras2caffe.py
+++ b/src/main/python/systemml/mllearn/keras2caffe.py
@@ -28,6 +28,8 @@ from itertools import chain, imap
 from ..converters import *
 from ..classloader import *
 import keras
+from keras import backend as K
+from keras.layers import Activation
 
 try:
     import py4j.java_gateway
@@ -137,7 +139,7 @@ def _parseKerasLayer(layer):
        param = layerParamMapping[layerType](layer)
        paramName = param.keys()[0]
        if layerType == keras.layers.InputLayer:
-               ret = { 'layer': { 'name':layer.name, 'type':'Data', 'top':layer.name, paramName:param[paramName] } }
+               ret = { 'layer': { 'name':layer.name, 'type':'Data', paramName:param[paramName], 'top':layer.name, 'top':'label' } }
        else:
                ret = { 'layer': { 'name':layer.name, 'type':supportedLayers[layerType], 'bottom':_getBottomLayers(layer), 'top':layer.name, paramName:param[paramName] } }
        return [ ret, _parseActivation(layer, layer.name + '_activation') ] if _shouldParseActivation(layer) else [ ret ]
@@ -155,8 +157,6 @@ specialLayers = {
     keras.layers.BatchNormalization: _parseBatchNorm
     }
        
-batchSize = 64
-
 def getConvParam(layer):
        stride = (1, 1) if layer.strides is None else layer.strides
       padding = [layer.kernel_size[0] / 2, layer.kernel_size[1] / 2] if layer.padding == 'same' else [0, 0]
@@ -181,7 +181,7 @@ def getRecurrentParam(layer):
 # TODO: Update AveragePooling2D when we add maxpooling support 
 layerParamMapping = {
     keras.layers.InputLayer: lambda l: \
-        {'data_param': {'batch_size': batchSize}},
+        {'data_param': {'batch_size': l.batch_size}},
     keras.layers.Dense: lambda l: \
         {'inner_product_param': {'num_output': l.units}},
     keras.layers.Dropout: lambda l: \
@@ -210,16 +210,58 @@ def _checkIfValid(myList, fn, errorMessage):
        if len(unsupported_elems) != 0:
                raise ValueError(errorMessage + str(np.array(myList)[unsupported_elems]))
 
-def convertKerasToCaffeNetwork(kerasModel, outCaffeNetworkFilePath):
+def _transformLayer(layer, batch_size):
+       if type(layer) == keras.layers.InputLayer:
+               layer.batch_size = batch_size
+       return [ layer ]
+
+def _appendKerasLayers(fileHandle, kerasLayers, batch_size):
+       if len(kerasLayers) >= 1:
+               transformedLayers = list(chain.from_iterable(imap(lambda layer: _transformLayer(layer, batch_size), kerasLayers)))
+               jsonLayers = list(chain.from_iterable(imap(lambda layer: _parseKerasLayer(layer), transformedLayers)))
+               parsedLayers = list(chain.from_iterable(imap(lambda layer: _parseJSONObject(layer), jsonLayers)))
+               fileHandle.write(''.join(parsedLayers))
+               fileHandle.write('\n')
+
+def lossLayerStr(layerType, bottomLayer):
+       return 'layer {\n  name: "loss"\n  type: "' + layerType + '"\n  bottom: "' + bottomLayer + '"\n  bottom: "label"\n  top: "loss"\n}\n'
+
+def _appendKerasLayerWithoutActivation(fileHandle, layer, batch_size):
+       if type(layer) != keras.layers.Activation:
+               lastLayerActivation = layer.activation
+               layer.activation = keras.activations.linear
+               _appendKerasLayers(fileHandle, [layer], batch_size)
+               layer.activation = lastLayerActivation
+
+def _getExactlyOneBottomLayer(layer):
+       bottomLayers = _getBottomLayers(layer)
+       if len(bottomLayers) != 1:
+               raise Exception('Expected only one bottom layer for ' + str(layer.name) + ', but found ' + str(bottomLayers))
+       return bottomLayers[0]
+
+def _isMeanSquaredError(loss):
+       return loss == 'mean_squared_error' or loss == 'mse' or loss == 'MSE'
+
+def convertKerasToCaffeNetwork(kerasModel, outCaffeNetworkFilePath, batch_size):
        _checkIfValid(kerasModel.layers, lambda layer: False if type(layer) in supportedLayers else True, 'Unsupported Layers:')
-       #unsupported_layers = np.array([False if type(layer) in supportedLayers else True for layer in kerasModel.layers])
-       #if len(np.where(unsupported_layers)[0]) != 0:
-       #       raise TypeError('Unsupported Layers:' + str(np.array(kerasModel.layers)[np.where(unsupported_layers)[0]]))
-       # Core logic: model.layers.flatMap(layer => _parseJSONObject(_parseKerasLayer(layer)))
-       jsonLayers = list(chain.from_iterable(imap(lambda layer: _parseKerasLayer(layer), kerasModel.layers)))
-       parsedLayers = list(chain.from_iterable(imap(lambda layer: _parseJSONObject(layer), jsonLayers)))
        with open(outCaffeNetworkFilePath, 'w') as f:
-               f.write(''.join(parsedLayers))
+               # Write the parsed layers for all but the last layer
+               _appendKerasLayers(f, kerasModel.layers[:-1], batch_size)
+               # Now process the last layer with loss
+               lastLayer = kerasModel.layers[-1]
+               if _isMeanSquaredError(kerasModel.loss):
+                       _appendKerasLayers(f, [ lastLayer ], batch_size)
+                       f.write(lossLayerStr('EuclideanLoss', lastLayer.name))
+               elif kerasModel.loss == 'categorical_crossentropy':
+                       _appendKerasLayerWithoutActivation(f, lastLayer, batch_size)
+                       bottomLayer = _getExactlyOneBottomLayer(lastLayer) if type(lastLayer) == keras.layers.Activation else lastLayer.name
+                       lastLayerActivation = str(keras.activations.serialize(lastLayer.activation))
+                       if lastLayerActivation == 'softmax' and kerasModel.loss == 'categorical_crossentropy':
+                               f.write(lossLayerStr('SoftmaxWithLoss', bottomLayer))
+                       else:
+                               raise Exception('Unsupported loss layer ' + str(kerasModel.loss) + ' (where last layer activation ' + lastLayerActivation + ').')
+               else:
+                       raise Exception('Unsupported loss layer ' + str(kerasModel.loss) + '.')
 
 
 def getNumPyMatrixFromKerasWeight(param):
@@ -234,23 +276,52 @@ def getNumPyMatrixFromKerasWeight(param):
 
 
 defaultSolver = """
-base_lr: 0.01
-momentum: 0.9
-weight_decay: 5e-4
-lr_policy: "exp"
-gamma: 0.95
-display: 100
 solver_mode: CPU
-type: "SGD"
-max_iter: 2000
-test_iter: 10
-test_interval: 500
 """
 
-def convertKerasToCaffeSolver(kerasModel, caffeNetworkFilePath, outCaffeSolverFilePath):
+def evaluateValue(val):
+       if type(val) == int or type(val) == float:
+               return float(val)
+       else:
+               return K.eval(val)
+
+def convertKerasToCaffeSolver(kerasModel, caffeNetworkFilePath, outCaffeSolverFilePath, max_iter, test_iter, test_interval, display, lr_policy, weight_decay, regularization_type):
+       if type(kerasModel.optimizer) == keras.optimizers.SGD:
+               solver = 'type: "Nesterov"\n' if kerasModel.optimizer.nesterov else 'type: "SGD"\n'
+       elif type(kerasModel.optimizer) == keras.optimizers.Adagrad:
+               solver = 'type: "Adagrad"\n'
+       elif type(kerasModel.optimizer) == keras.optimizers.Adam:
+               solver = 'type: "Adam"\n'
+       else:
+               raise Exception('Only sgd (with/without momentum/nesterov), Adam and Adagrad supported.')
+       base_lr = evaluateValue(kerasModel.optimizer.lr) if hasattr(kerasModel.optimizer, 'lr') else 0.01
+       gamma = evaluateValue(kerasModel.optimizer.decay) if hasattr(kerasModel.optimizer, 'decay') else 0.0
        with open(outCaffeSolverFilePath, 'w') as f:
                f.write('net: "' + caffeNetworkFilePath + '"\n')
                f.write(defaultSolver)
+               f.write(solver)
+               f.write('lr_policy: "' + lr_policy + '"\n')
+               f.write('regularization_type: "' + str(regularization_type) + '"\n')
+               f.write('weight_decay: ' + str(weight_decay) + '\n')
+               f.write('max_iter: ' + str(max_iter) + '\ntest_iter: ' + str(test_iter) + '\ntest_interval: ' + str(test_interval) + '\n')
+               f.write('display: ' + str(display) + '\n')
+               f.write('base_lr: ' + str(base_lr) + '\n')
+               f.write('gamma: ' + str(gamma) + '\n')
+               if type(kerasModel.optimizer) == keras.optimizers.SGD:
+                       momentum = evaluateValue(kerasModel.optimizer.momentum) if hasattr(kerasModel.optimizer, 'momentum') else 0.0
+                       f.write('momentum: ' + str(momentum) + '\n')
+               elif type(kerasModel.optimizer) == keras.optimizers.Adam:
+                       momentum = evaluateValue(kerasModel.optimizer.beta_1) if hasattr(kerasModel.optimizer, 'beta_1') else 0.9
+                       momentum2 = evaluateValue(kerasModel.optimizer.beta_2) if hasattr(kerasModel.optimizer, 'beta_2') else 0.999
+                       delta = evaluateValue(kerasModel.optimizer.epsilon) if hasattr(kerasModel.optimizer, 'epsilon') else 1e-8
+                       f.write('momentum: ' + str(momentum) + '\n')
+                       f.write('momentum2: ' + str(momentum2) + '\n')
+                       f.write('delta: ' + str(delta) + '\n')
+               elif type(kerasModel.optimizer) == keras.optimizers.Adagrad:
+                       delta = evaluateValue(kerasModel.optimizer.epsilon) if hasattr(kerasModel.optimizer, 'epsilon') else 1e-8
+                       f.write('delta: ' + str(delta) + '\n')
+               else:
+                       raise Exception('Only sgd (with/without momentum/nesterov), Adam and Adagrad supported.')
 
 
 def getInputMatrices(layer):

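To make the mapping concrete: for a model compiled with, say, `keras.optimizers.Adam(lr=0.001)` and the constructor defaults above, `convertKerasToCaffeSolver` would emit a solver file along these lines (illustrative; the exact float formatting depends on the Keras version):

```
net: "model.proto"

solver_mode: CPU
type: "Adam"
lr_policy: "step"
regularization_type: "L2"
weight_decay: 0.0005
max_iter: 2000
test_iter: 10
test_interval: 500
display: 100
base_lr: 0.001
gamma: 0.0
momentum: 0.9
momentum2: 0.999
delta: 1e-08
```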
http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala
----------------------------------------------------------------------
diff --git a/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala b/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala
index 789d08a..0a215b1 100644
--- a/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala
+++ b/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala
@@ -357,6 +357,10 @@ class Caffe2DML(val sc: SparkContext,
     
    System.out.println("* => memory in megabytes assuming the parameters (input, output activations, weights and backpropagation errors) are in double precision and in dense format.")
   }
+  
+  def setDebugFlags(isDebug:Boolean):Unit = {
+    net.getLayers.map(layer => {net.getCaffeLayer(layer).debugLayer = isDebug})
+  }
 
  // ================================================================================================
  // The below method parses the provided network and solver file and generates DML script.
@@ -368,9 +372,11 @@ class Caffe2DML(val sc: SparkContext,
     // Flags passed by user
     val DEBUG_TRAINING = if (inputs.containsKey("$debug")) inputs.get("$debug").toLowerCase.toBoolean else false
     assign(tabDMLScript, "debug", if (DEBUG_TRAINING) "TRUE" else "FALSE")
+    setDebugFlags(DEBUG_TRAINING)
 
     appendHeaders(net, solver, true) // Appends DML corresponding to source and externalFunction statements.
-    readInputData(net, true)         // Read X_full and y_full
+    val performOneHotEncoding = !inputs.containsKey("$perform_one_hot_encoding") || inputs.get("$perform_one_hot_encoding").toBoolean
+    readInputData(net, true, performOneHotEncoding)         // Read X_full and y_full
     // Initialize the layers and solvers. Reads weights and bias if $weights is set.
     initWeights(net, solver, inputs.containsKey("$weights"), layersToIgnore)
 
@@ -389,7 +395,7 @@ class Caffe2DML(val sc: SparkContext,
     // ----------------------------------------------------------------------------
     // Main logic
     forBlock("iter", "1", "max_iter") {
-      performTrainingIter(lossLayers, shouldValidate)
+      performTrainingIter(lossLayers, shouldValidate, performOneHotEncoding)
       if (getTrainAlgo.toLowerCase.equals("batch")) {
         assign(tabDMLScript, "e", "iter")
         tabDMLScript.append("# Learning rate\n")
@@ -414,11 +420,14 @@ class Caffe2DML(val sc: SparkContext,
     val script = dml(trainingScript).in(inputs)
     net.getLayers.map(net.getCaffeLayer(_)).filter(_.weight != null).map(l => script.out(l.weight))
     net.getLayers.map(net.getCaffeLayer(_)).filter(_.bias != null).map(l => script.out(l.bias))
+    
+    setDebugFlags(false)
+    
     (script, "X_full", "y_full")
   }
  // ================================================================================================
 
-  private def performTrainingIter(lossLayers: List[IsLossLayer], shouldValidate: Boolean): Unit =
+  private def performTrainingIter(lossLayers: List[IsLossLayer], shouldValidate: Boolean, performOneHotEncoding:Boolean): Unit =
     getTrainAlgo.toLowerCase match {
       case "minibatch" =>
         getTrainingBatch(tabDMLScript)
@@ -426,14 +435,14 @@ class Caffe2DML(val sc: SparkContext,
         // Perform forward, backward and update on minibatch
         forward; backward; update
         // -------------------------------------------------------
-        displayLoss(lossLayers(0), shouldValidate)
+        displayLoss(lossLayers(0), shouldValidate, performOneHotEncoding)
         performSnapshot
       case "batch" => {
         // -------------------------------------------------------
         // Perform forward, backward and update on entire dataset
         forward; backward; update
         // -------------------------------------------------------
-        displayLoss(lossLayers(0), shouldValidate)
+        displayLoss(lossLayers(0), shouldValidate, performOneHotEncoding)
         performSnapshot
       }
       case "allreduce_parallel_batches" => {
@@ -469,7 +478,7 @@ class Caffe2DML(val sc: SparkContext,
           // -------------------------------------------------------
           assign(tabDMLScript, "Xb", "X_group_batch")
           assign(tabDMLScript, "yb", "y_group_batch")
-          displayLoss(lossLayers(0), shouldValidate)
+          displayLoss(lossLayers(0), shouldValidate, performOneHotEncoding)
           performSnapshot
         }
       }
@@ -496,7 +505,7 @@ class Caffe2DML(val sc: SparkContext,
         // -------------------------------------------------------
         assign(tabDMLScript, "Xb", "X_group_batch")
         assign(tabDMLScript, "yb", "y_group_batch")
-        displayLoss(lossLayers(0), shouldValidate)
+        displayLoss(lossLayers(0), shouldValidate, performOneHotEncoding)
         performSnapshot
       }
       case _ => throw new DMLRuntimeException("Unsupported train algo:" + getTrainAlgo)
@@ -537,7 +546,7 @@ class Caffe2DML(val sc: SparkContext,
     }
 
   // Append the DML to display training and validation loss
-  private def displayLoss(lossLayer: IsLossLayer, shouldValidate: Boolean): Unit = {
+  private def displayLoss(lossLayer: IsLossLayer, shouldValidate: Boolean, performOneHotEncoding:Boolean): Unit = {
     if (solverParam.getDisplay > 0) {
       // Append the DML to compute training loss
       if (!getTrainAlgo.toLowerCase.startsWith("allreduce")) {
@@ -550,7 +559,9 @@ class Caffe2DML(val sc: SparkContext,
          tabDMLScript.append(
            print(dmlConcat(asDMLString("Iter:"), "iter", asDMLString(", training loss:"), "training_loss", asDMLString(", training accuracy:"), "training_accuracy"))
           )
-          printClassificationReport
+          if(performOneHotEncoding) {
+            printClassificationReport
+          }
         }
       } else {
        Caffe2DML.LOG.info("Training loss is not printed for train_algo=" + getTrainAlgo)
@@ -743,9 +754,11 @@ class Caffe2DMLModel(val numClasses: String, val sc: SparkContext, val solver: C
 
     val DEBUG_PREDICTION = if (estimator.inputs.containsKey("$debug")) estimator.inputs.get("$debug").toLowerCase.toBoolean else false
     assign(tabDMLScript, "debug", if (DEBUG_PREDICTION) "TRUE" else "FALSE")
+    estimator.setDebugFlags(DEBUG_PREDICTION)
 
     appendHeaders(net, solver, false) // Appends DML corresponding to source and externalFunction statements.
-    readInputData(net, false)         // Read X_full and y_full
+    val performOneHotEncoding = !estimator.inputs.containsKey("$perform_one_hot_encoding") || estimator.inputs.get("$perform_one_hot_encoding").toBoolean
+    readInputData(net, false, performOneHotEncoding)         // Read X_full and y_full
     assign(tabDMLScript, "X", "X_full")
 
     // Initialize the layers and solvers. Reads weights and bias if readWeights is true.
@@ -837,6 +850,9 @@ class Caffe2DMLModel(val numClasses: String, val sc: SparkContext, val solver: C
       net.getLayers.map(net.getCaffeLayer(_)).filter(_.weight != null).map(l => script.in(l.weight, estimator.mloutput.getMatrix(l.weight)))
       net.getLayers.map(net.getCaffeLayer(_)).filter(_.bias != null).map(l => script.in(l.bias, estimator.mloutput.getMatrix(l.bias)))
     }
+    
+    estimator.setDebugFlags(false)
+    
     (script, "X_full")
   }
  // ================================================================================================

http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/scala/org/apache/sysml/api/dl/CaffeLayer.scala
----------------------------------------------------------------------
diff --git a/src/main/scala/org/apache/sysml/api/dl/CaffeLayer.scala b/src/main/scala/org/apache/sysml/api/dl/CaffeLayer.scala
index cdecdce..65a9921 100644
--- a/src/main/scala/org/apache/sysml/api/dl/CaffeLayer.scala
+++ b/src/main/scala/org/apache/sysml/api/dl/CaffeLayer.scala
@@ -42,6 +42,22 @@ trait CaffeLayer extends BaseDMLGenerator {
     computedOutputShape
   }
   // -------------------------------------------------
+  var debugLayer = false
+  def validateDimensions(dmlScript: StringBuilder, mat:String, expectedNumRows:String, expectedNumCols:String, optionalString:String=""):Unit = {
+    if(debugLayer) {
+      val msg = " in " + sourceFileName + "(" + optionalString + ") script."
+      if(expectedNumRows != null) {
+        dmlScript.append("\nif( " + expectedNumRows + " != nrow(" + mat + ")) {\n")
+        dmlScript.append("\tstop(\"Incorrect number of rows for " + mat + msg + " Expected:\" + " + expectedNumRows + " + \" but found \" +  nrow(" + mat + ") )")
+        dmlScript.append("\n}\n")
+      }
+      if(expectedNumCols != null) {
+        dmlScript.append("\nif( " + expectedNumCols + " != ncol(" + mat + ")) {\n")
+        dmlScript.append("\tstop(\"Incorrect number of columns for " + mat + msg + " Expected:\" + " + expectedNumCols + " + \" but found \" +  ncol(" + mat + ") )")
+        dmlScript.append("\n}\n")
+      }
+    }
+  }
   var computedBottomLayerOutputShape: (String, String, String) = null
   def bottomLayerOutputShape: (String, String, String) = {
     if (computedBottomLayerOutputShape == null) {
@@ -532,27 +548,25 @@ class Concat(val param: LayerParameter, val id: Int, val net: CaffeNetwork) exte
 
 // L2 loss function.
 class EuclideanLoss(val param: LayerParameter, val id: Int, val net: CaffeNetwork) extends CaffeLayer with IsLossLayer {
-  override def sourceFileName: String = if (!isSegmentationProblem()) "l2_loss" else throw new DMLRuntimeException("Segmentation is not supported for EuclideanLoss in Caffe2DML yet")
+  override def sourceFileName: String = "l2_loss"
   override def weightShape(): Array[Int] = null
   override def biasShape(): Array[Int]   = null
   
-  override def forward(dmlScript: StringBuilder, isPrediction: Boolean) =
-    invokeForward(dmlScript, List[String](out), scores, "yb")
+  override def forward(dmlScript: StringBuilder, isPrediction: Boolean) =
+    assign(dmlScript, out, scores)
   
-  override def backward(dmlScript: StringBuilder,outSuffix: String): Unit =
+  override def backward(dmlScript: StringBuilder,outSuffix: String): Unit =  {
+      invokeForward(dmlScript, List[String](out), scores, "yb")
       invokeBackward(dmlScript, outSuffix, List[String]("dOut" + id + outSuffix), scores, "yb")
-  
-  override def computeLoss(dmlScript: StringBuilder,numTabs: Int): Unit =
-    if (!isSegmentationProblem()) {
-      val tabBuilder = new StringBuilder
-      for (i <- 0 until numTabs) tabBuilder.append("\t")
-      val tabs = tabBuilder.toString
-      dmlScript.append("tmp_loss = l2_loss::forward(" + commaSep(out, "yb") + ")\n")
-      dmlScript.append(tabs).append("loss = loss + tmp_loss\n")
-      dmlScript.append(tabs).append("accuracy = -1\n")
-    } else {
-      throw new RuntimeException("Computation of loss for SoftmaxWithLoss is not implemented for segmentation problem")
-    }
+  }
+  override def computeLoss(dmlScript: StringBuilder,numTabs: Int): Unit = {
+    val tabBuilder = new StringBuilder
+    for (i <- 0 until numTabs) tabBuilder.append("\t")
+    val tabs = tabBuilder.toString
+    invokeForward(dmlScript, List[String]("tmp_loss"), scores, "yb")
+    dmlScript.append(tabs).append("loss = loss + tmp_loss\n")
+    dmlScript.append(tabs).append("accuracy = -1\n")
+  }
 }
 
 class SoftmaxWithLoss(val param: LayerParameter, val id: Int, val net: CaffeNetwork) extends CaffeLayer with IsLossLayer {
@@ -853,8 +867,14 @@ class InnerProduct(val param: LayerParameter, val id: Int, val net: CaffeNetwork
    * Outputs:
    *  - out: Outputs, of shape (N, M).
    */
-  override def forward(dmlScript: StringBuilder, isPrediction: Boolean) =
+  override def forward(dmlScript: StringBuilder, isPrediction: Boolean) = {
+    val D = numFeatures
+    val M = numNeurons
+    validateDimensions(dmlScript, X, null, D)
+    validateDimensions(dmlScript, weight, D, M, "forward")
+    validateDimensions(dmlScript, bias, "1", M)
     invokeForward(dmlScript, List[String](out), X, weight, bias)
+  }
   /*
    * Computes the backward pass for a fully-connected (affine) layer
    * with M neurons.
@@ -870,8 +890,15 @@ class InnerProduct(val param: LayerParameter, val id: Int, val net: CaffeNetwork
    *  - dW: Gradient wrt `W`, of shape (D, M).
    *  - db: Gradient wrt `b`, of shape (1, M).
    */
-  override def backward(dmlScript: StringBuilder, outSuffix: String) =
+  override def backward(dmlScript: StringBuilder, outSuffix: String) = {
+    val D = numFeatures
+    val M = numNeurons
+    validateDimensions(dmlScript, dout, null, M)
+    validateDimensions(dmlScript, X, null, D)
+    validateDimensions(dmlScript, weight, D, M, "backward")
+    validateDimensions(dmlScript, bias, "1", M)
     invokeBackward(dmlScript, outSuffix, List[String]("dOut" + id, dWeight, dBias), dout, X, weight, bias)
+  }
   // -------------------------------------------------
   // num_output (c_o): the number of filters
   def numNeurons  = param.getInnerProductParam.getNumOutput.toString
@@ -935,15 +962,44 @@ class LSTM(val param: LayerParameter, val id: Int, val net: CaffeNetwork) extend
   
   override def init(dmlScript: StringBuilder) = {
     invokeInit(dmlScript, List[String](weight, bias, out0, c0), Caffe2DML.batchSize, input_features, M)
+    // Also, initialize gradient wrt `c` to empty matrix
+    dmlScript.append(dc0 + " = matrix(0, rows=" + Caffe2DML.batchSize + ", cols=" + M + ")\n")
   }
   
   override def forward(dmlScript: StringBuilder, isPrediction: Boolean) = {
-    invokeForward(dmlScript, List[String](out, c, cache_out, cache_c, cache_ifog), X, weight, bias, timesteps, input_features, return_sequences.toString.toUpperCase, out0, c0)
+    val N:String = null // output_features.toString
+    val T = timesteps()
+    val D = input_features()
+    validateDimensions(dmlScript, X, N, T + "*" + D)
+    validateDimensions(dmlScript, out0, N, M)
+    validateDimensions(dmlScript, c0, N, M)
+    validateDimensions(dmlScript, weight, D + "+" + M, 4 + "*" + M)
+    validateDimensions(dmlScript, bias, "1", 4 + "*" + M)
+    invokeForward(dmlScript, List[String](out, c, cache_out, cache_c, cache_ifog), X, weight, bias, T, D, return_sequences.toString.toUpperCase, out0, c0)
+    // This validates whether the output is of correct dimensions
+    validateDimensions(dmlScript, out, null, int_mult(outputShape._1, outputShape._2, outputShape._3))
   }
   
   override def backward(dmlScript: StringBuilder, outSuffix: String) = {
+    val T = timesteps()
+    val D = input_features()
+    if(return_sequences) {
+      validateDimensions(dmlScript, dout, null, T + "*" + M)
+    }
+    else {
+      validateDimensions(dmlScript, dout, null, M)
+    }
+    validateDimensions(dmlScript, dc0, null, M)
+    validateDimensions(dmlScript, X, null, T + "*" + D)
+    validateDimensions(dmlScript, out0, null, M)
+    validateDimensions(dmlScript, c0, null, M)
+    validateDimensions(dmlScript, cache_out, T, null)
+    validateDimensions(dmlScript, cache_c, T, null)
+    validateDimensions(dmlScript, cache_ifog, T, null)
+    validateDimensions(dmlScript, weight, D + "+" + M, 4 + "*" + M)
+    validateDimensions(dmlScript, bias, "1", 4 + "*" + M)
     invokeBackward(dmlScript, outSuffix, List[String]("dOut" + id, dWeight, dBias, dout0, dc0), dout, dc0, X, weight, bias,
-        timesteps, input_features, return_sequences.toString.toUpperCase, out0, c0, cache_out, cache_c, cache_ifog)
+        T, D, return_sequences.toString.toUpperCase, out0, c0, cache_out, cache_c, cache_ifog)
   }
   
   val cache_out = "cache_out_" + id

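With `debug=True`, `validateDimensions` wraps layer invocations in generated DML guards of roughly this shape (illustrative output for a hypothetical weight matrix `W_3` with `D` features and `M` neurons; the source-file name shown is a placeholder):

```
if( D != nrow(W_3)) {
	stop("Incorrect number of rows for W_3 in affine(forward) script. Expected:" + D + " but found " + nrow(W_3) )
}
if( M != ncol(W_3)) {
	stop("Incorrect number of columns for W_3 in affine(forward) script. Expected:" + M + " but found " + ncol(W_3) )
}
```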
http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/scala/org/apache/sysml/api/dl/CaffeSolver.scala
----------------------------------------------------------------------
diff --git a/src/main/scala/org/apache/sysml/api/dl/CaffeSolver.scala b/src/main/scala/org/apache/sysml/api/dl/CaffeSolver.scala
index a61ff10..d0d738e 100644
--- a/src/main/scala/org/apache/sysml/api/dl/CaffeSolver.scala
+++ b/src/main/scala/org/apache/sysml/api/dl/CaffeSolver.scala
@@ -52,12 +52,25 @@ trait CaffeSolver {
       ret
     }
 
-  def l2reg_update(lambda: Double, dmlScript: StringBuilder, layer: CaffeLayer): Unit =
+  def regularization_update(regularizationType:String, lambda: Double, dmlScript: StringBuilder, layer: CaffeLayer): Unit = {
     // val donotRegularizeLayers:Boolean = layer.isInstanceOf[BatchNorm] || layer.isInstanceOf[Scale];
+    val regularizationSource =
+      if(regularizationType.toLowerCase.equals("l2")) "l2_reg"
+      else if(regularizationType.toLowerCase.equals("l1")) "l1_reg"
+      else null
+    if(regularizationSource == null) {
+      throw new DMLRuntimeException("Unsupported regularization_type:" + regularizationType + ". Please use either L2 or L1.")
+    }
+    
     if (lambda != 0 && layer.shouldUpdateWeight) {
-      dmlScript.append("\t").append(layer.dWeight + "_reg = l2_reg::backward(" + layer.weight + ", " + lambda + ")\n")
+      // Use layer-specific decay multiplier, if param { lr_mult: 1 decay_mult: 1 } is specified in the network file
+      val hasDecayMult = layer.param.getParamList != null && layer.param.getParamList.size >= 1 && layer.param.getParamList.get(0).hasDecayMult
+      val newLambda = if(hasDecayMult) layer.param.getParamList.get(0).getDecayMult * lambda else lambda
+      
+      dmlScript.append("\t").append(layer.dWeight + "_reg = " + regularizationSource + "::backward(" + layer.weight + ", " + newLambda + ")\n")
       dmlScript.append("\t").append(layer.dWeight + " = " + layer.dWeight + " + " + layer.dWeight + "_reg\n")
     }
+  }
 }
 
 class LearningRatePolicy(lr_policy: String = "exp", base_lr: Double = 0.01) {
@@ -87,7 +100,7 @@ class LearningRatePolicy(lr_policy: String = "exp", base_lr: Double = 0.01) {
   }
 }
 
-class SGD(lambda: Double = 5e-04, momentum: Double = 0.9) extends CaffeSolver {
+class SGD(regularizationType:String = "L2", lambda: Double = 5e-04, momentum: Double = 0.9) extends CaffeSolver {
   /*
    * Performs an SGD update with momentum.
    *
@@ -112,7 +125,7 @@ class SGD(lambda: Double = 5e-04, momentum: Double = 0.9) extends CaffeSolver {
    *      input `X`.
    */
   def update(dmlScript: StringBuilder, layer: CaffeLayer): Unit = {
-    l2reg_update(lambda, dmlScript, layer)
+    regularization_update(regularizationType, lambda, dmlScript, layer)
     if (momentum == 0) {
       // Use sgd
       if (layer.shouldUpdateWeight) dmlScript.append("\t").append(layer.weight + " = sgd::update(" + commaSep(layer.weight, layer.dWeight, getWeightLr(layer)) + ")\n")
@@ -143,7 +156,7 @@ class SGD(lambda: Double = 5e-04, momentum: Double = 0.9) extends CaffeSolver {
   def sourceFileName: String = if (momentum == 0) "sgd" else "sgd_momentum"
 }
 
-class AdaGrad(lambda: Double = 5e-04, epsilon: Double = 1e-6) extends CaffeSolver {
+class AdaGrad(regularizationType:String = "L2", lambda: Double = 5e-04, epsilon: Double = 1e-6) extends CaffeSolver {
   /*
    * Performs an Adagrad update.
    *
@@ -172,7 +185,7 @@ class AdaGrad(lambda: Double = 5e-04, epsilon: Double = 1e-6) extends CaffeSolve
    *      gradients, of same shape as `X`.
    */
   def update(dmlScript: StringBuilder, layer: CaffeLayer): Unit = {
-    l2reg_update(lambda, dmlScript, layer)
+    regularization_update(regularizationType, lambda, dmlScript, layer)
     if (layer.shouldUpdateWeight)
       dmlScript
         .append("\t")
@@ -195,7 +208,73 @@ class AdaGrad(lambda: Double = 5e-04, epsilon: Double = 1e-6) extends CaffeSolve
   def sourceFileName: String = "adagrad"
 }
 
-class Nesterov(lambda: Double = 5e-04, momentum: Double = 0.9) extends CaffeSolver {
+class Adam(regularizationType:String = "L2", lambda: Double = 5e-04, momentum:Double = 0.9, momentum2:Double = 0.999, delta:Double = 1e-8) extends CaffeSolver {
+  /*
+   * Performs an Adam update.
+   *
+   * Reference:
+   *  - Adam: A Method for Stochastic Optimization, Kingma, Ba.
+   *    - http://arxiv.org/abs/1412.6980
+   *
+   * Inputs:
+   *  - X: Parameters to update, of shape (any, any).
+   *  - dX: Gradient wrt `X` of a loss function being optimized, of
+   *      same shape as `X`.
+   *  - lr: Learning rate.  Recommended value is 0.001.
+   *  - beta1: Exponential decay rate for the 1st moment estimates.
+   *      Recommended value is 0.9.
+   *  - beta2: Exponential decay rate for the 2nd moment estimates.
+   *      Recommended value is 0.999.
+   *  - epsilon: Smoothing term to avoid divide by zero errors.
+   *      Recommended value is 1e-8.
+   *  - t: Timestep, starting at 0.
+   *  - m: State containing the 1st moment (mean) estimate by
+   *      maintaining exponential moving averages of the gradients, of
+   *      same shape as `X`.
+   *  - v: State containing the 2nd raw moment (uncentered variance)
+   *      estimate by maintaining exponential moving averages of the
+   *      squared gradients, of same shape as `X`.
+   *
+   * Outputs:
+   *  - X: Updated parameters `X`, of same shape as input `X`.
+   *  - m: Updated state containing the 1st moment (mean) estimate by
+   *      maintaining exponential moving averages of the gradients, of
+   *      same shape as `X`.
+   *  - v: Updated state containing the 2nd raw moment (uncentered
+   *      variance) estimate by maintaining exponential moving averages
+   *      of the squared gradients, of same shape as `X`.
+   */
+  def update(dmlScript: StringBuilder, layer: CaffeLayer): Unit = {
+    regularization_update(regularizationType, lambda, dmlScript, layer)
+    val t:String = "iter - 1" // iter starts at 1; adam::update expects a timestep starting at 0
+    // X, dX, double lr, double beta1, double beta2, epsilon, int t, matrix[double] m, matrix[double] v
+    if (layer.shouldUpdateWeight)
+      dmlScript
+        .append("\t")
+        .append(
+          "[" + commaSep(layer.weight, layer.weight + "_m", layer.weight + "_v") + "] " +
+          "= adam::update(" + commaSep(layer.weight, layer.dWeight, getWeightLr(layer),
+              momentum.toString, momentum2.toString, delta.toString,  t,
+              layer.weight + "_m", layer.weight + "_v") + ")\n"
+        )
+    if (layer.shouldUpdateBias)
+      dmlScript
+        .append("\t")
+        .append(
+          "[" + commaSep(layer.bias, layer.bias + "_m", layer.bias + "_v") + "] " +
+          "= adam::update(" + commaSep(layer.bias, layer.dBias, getBiasLr(layer),
+              momentum.toString, momentum2.toString, delta.toString,  t,
+              layer.bias + "_m", layer.bias + "_v") + ")\n"
+        )
+  }
+  def init(dmlScript: StringBuilder, layer: CaffeLayer): Unit = {
+    if (layer.shouldUpdateWeight) dmlScript.append("[ " + layer.weight + "_m, " + layer.weight + "_v ] = adam::init(" + layer.weight + ")\n")
+    if (layer.shouldUpdateBias) dmlScript.append("[ " + layer.bias + "_m, " + layer.bias + "_v ] = adam::init(" + layer.bias + ")\n")
+  }
+  def sourceFileName: String = "adam"
+}
+
+class Nesterov(regularizationType:String = "L2", lambda: Double = 5e-04, momentum: Double = 0.9) extends CaffeSolver {
   /*
    * Performs an SGD update with Nesterov momentum.
    *
@@ -232,7 +311,7 @@ class Nesterov(lambda: Double = 5e-04, momentum: Double = 0.9) extends CaffeSolv
     val fn            = if (Caffe2DML.USE_NESTEROV_UDF) "update_nesterov" else "sgd_nesterov::update"
     val lastParameter = if (Caffe2DML.USE_NESTEROV_UDF) (", " + lambda) else ""
     if (!Caffe2DML.USE_NESTEROV_UDF) {
-      l2reg_update(lambda, dmlScript, layer)
+      regularization_update(regularizationType, lambda, dmlScript, layer)
     }
     if (layer.shouldUpdateWeight)
       dmlScript

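For a hypothetical weight matrix `W_1` with gradient `dW_1`, the new Adam solver emits DML of roughly this form (the learning rate comes from `getWeightLr`, and the `_m`/`_v` state is created by `adam::init`):

```
# once, during initialization:
[ W_1_m, W_1_v ] = adam::init(W_1)
# each iteration, inside the update step:
[W_1, W_1_m, W_1_v] = adam::update(W_1, dW_1, lr, 0.9, 0.999, 1.0E-8, iter - 1, W_1_m, W_1_v)
```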
http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/scala/org/apache/sysml/api/dl/DMLGenerator.scala
----------------------------------------------------------------------
diff --git a/src/main/scala/org/apache/sysml/api/dl/DMLGenerator.scala b/src/main/scala/org/apache/sysml/api/dl/DMLGenerator.scala
index 304f788..0231354 100644
--- a/src/main/scala/org/apache/sysml/api/dl/DMLGenerator.scala
+++ b/src/main/scala/org/apache/sysml/api/dl/DMLGenerator.scala
@@ -261,6 +261,7 @@ trait DMLGenerator extends SourceDMLGenerator with NextBatchGenerator {
   // Appends DML corresponding to source and externalFunction statements.
   def appendHeaders(net: CaffeNetwork, solver: CaffeSolver, isTraining: Boolean): Unit = {
     // Append source statements for layers as well as solver
+    source(net, solver, if (isTraining) Array[String]("l1_reg") else null)
     source(net, solver, if (isTraining) Array[String]("l2_reg") else null)
 
     if (isTraining) {
@@ -284,14 +285,16 @@ trait DMLGenerator extends SourceDMLGenerator with NextBatchGenerator {
     assign(tabDMLScript, varName, "read(" + pathVar + ")")
   }
 
-  def readInputData(net: CaffeNetwork, isTraining: Boolean): Unit = {
+  def readInputData(net: CaffeNetwork, isTraining: Boolean, performOneHotEncoding:Boolean): Unit = {
     // Read and convert to one-hot encoding
     readMatrix("X_full", "$X")
     if (isTraining) {
       readMatrix("y_full", "$y")
       tabDMLScript.append(Caffe2DML.numImages).append(" = nrow(y_full)\n")
-      tabDMLScript.append("# Convert to one-hot encoding (Assumption: 1-based labels) \n")
-      tabDMLScript.append("y_full = table(seq(1," + Caffe2DML.numImages + ",1), y_full, " + Caffe2DML.numImages + ", " + Utils.numClasses(net) + ")\n")
+      if(performOneHotEncoding) {
+        tabDMLScript.append("# Convert to one-hot encoding (Assumption: 1-based labels) \n")
+        tabDMLScript.append("y_full = table(seq(1," + Caffe2DML.numImages + ",1), y_full, " + Caffe2DML.numImages + ", " + Utils.numClasses(net) + ")\n")
+      }
     } else {
       tabDMLScript.append(Caffe2DML.numImages + " = nrow(X_full)\n")
     }

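The one-hot conversion that `performOneHotEncoding` now guards is the standard `table` idiom in DML; for 1-based labels it expands the label vector into an indicator matrix (illustrative, with a hypothetical `num_classes`):

```
# y_full starts as an (num_images x 1) vector of 1-based class labels
num_images = nrow(y_full)
# after this call, row i has a single 1 in the column given by its original label
y_full = table(seq(1, num_images, 1), y_full, num_images, num_classes)
```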
http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/scala/org/apache/sysml/api/dl/Utils.scala
----------------------------------------------------------------------
diff --git a/src/main/scala/org/apache/sysml/api/dl/Utils.scala b/src/main/scala/org/apache/sysml/api/dl/Utils.scala
index 19596c3..5939cf1 100644
--- a/src/main/scala/org/apache/sysml/api/dl/Utils.scala
+++ b/src/main/scala/org/apache/sysml/api/dl/Utils.scala
@@ -71,12 +71,14 @@ object Utils {
     val momentum = if (solver.hasMomentum) solver.getMomentum else 0.0
     val lambda   = if (solver.hasWeightDecay) solver.getWeightDecay else 0.0
     val delta    = if (solver.hasDelta) solver.getDelta else 0.0
+    val regularizationType = if(solver.hasRegularizationType) solver.getRegularizationType else "L2"
 
     solver.getType.toLowerCase match {
-      case "sgd"      => new SGD(lambda, momentum)
-      case "adagrad"  => new AdaGrad(lambda, delta)
-      case "nesterov" => new Nesterov(lambda, momentum)
-      case _          => throw new DMLRuntimeException("The solver type is not 
supported: " + solver.getType + ". Try: SGD, AdaGrad or Nesterov.")
+      case "sgd"      => new SGD(regularizationType, lambda, momentum)
+      case "adagrad"  => new AdaGrad(regularizationType, lambda, delta)
+      case "nesterov" => new Nesterov(regularizationType, lambda, momentum)
+      case "adam"        => new Adam(regularizationType, lambda, momentum, 
if(solver.hasMomentum2) solver.getMomentum2 else 0.0, delta)
+      case _          => throw new DMLRuntimeException("The solver type is not 
supported: " + solver.getType + ". Try: SGD, AdaGrad or Nesterov or Adam.")
     }
 
   }

http://git-wip-us.apache.org/repos/asf/systemml/blob/9dc354ac/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala
----------------------------------------------------------------------
diff --git a/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala b/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala
index ce92321..97abe9e 100644
--- a/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala
+++ b/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala
@@ -257,7 +257,7 @@ trait BaseSystemMLClassifierModel extends BaseSystemMLEstimatorModel {
     val freeMem = Runtime.getRuntime().freeMemory();
     if(freeMem < OptimizerUtils.getLocalMemBudget()) {
        val LOG = LogFactory.getLog(classOf[BaseSystemMLClassifierModel].getName())
-       LOG.warn("SystemML local memory budget:" + OptimizerUtils.toMB(OptimizerUtils.getLocalMemBudget()) + " mb. Approximate free memory abailable:" + OptimizerUtils.toMB(freeMem));
+       LOG.warn("SystemML local memory budget:" + OptimizerUtils.toMB(OptimizerUtils.getLocalMemBudget()) + " mb. Approximate free memory available:" + OptimizerUtils.toMB(freeMem));
     }
     val ret = (new MLContext(sc)).execute(script1).getMatrix("Prediction").toMatrixBlock
 
