Ishitori commented on a change in pull request #11651: Add logistic regression tutorial
URL: https://github.com/apache/incubator-mxnet/pull/11651#discussion_r204193164
##########
File path: docs/tutorials/gluon/logistic_regression_explained.md
##########
@@ -120,96 +108,115 @@ for e in range(epochs):
# Calculate gradients
loss_result.backward()
- # Change parameters of the network
+ # Update parameters of the network
trainer.step(batch_size)
- # Since we calculate loss per single batch, but want to display it per epoch
- # we sum losses of every batch per an epoch into a single variable
+ # Sum losses of every batch to get the loss per epoch
cumulative_train_loss += nd.sum(loss_result).asscalar()
+
+ return cumulative_train_loss
+```
+
+## Validating the model
+
Our validation function is very similar to the training one. The main difference is that we want to calculate the accuracy of the model. We use the [Accuracy metric](https://mxnet.incubator.apache.org/api/python/model.html#mxnet.metric.Accuracy) to do so.
+
+The `Accuracy` metric requires 2 arguments: 1) a vector of ground-truth classes and 2) a vector or matrix of predictions. When predictions are of the same shape as the vector of ground-truth classes, the `Accuracy` class assumes that the prediction vector contains predicted classes. So, it converts the vector to `Int32` and compares each item of the ground-truth classes to the prediction vector.
+
+Because of the behaviour above, you will get an unexpected result if you just apply the [Sigmoid](https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html#mxnet.ndarray.sigmoid) function to the network output and pass it to the `Accuracy` metric. As mentioned before, we need to apply the `Sigmoid` function to the output of the neuron to get the probability of belonging to class 1. But the `Sigmoid` function produces output in the range [0; 1], and every number in that range is going to be cast to 0, even a value as high as 0.99. To avoid this, we write a custom bit of code on line 12 that:
- # iterate over all batches of validation data and calculate validation loss
+1. Calculates the sigmoid of the network output using the `Sigmoid` function
+
+2. Subtracts a threshold from the sigmoid output. Usually the threshold is equal to 0.5, but it can be higher if you want to increase the certainty that an item belongs to class 1.
+
+3. Uses the [mx.nd.ceil](https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html#mxnet.ndarray.ceil) function, which rounds up: since the shifted values lie in [-0.5; 0.5], all negative values become 0 and all positive values become 1
+
+After these transformations we can pass the result to the `Accuracy.update()` method and expect it to behave properly.
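To see why the threshold-and-ceiling trick is needed, here is a minimal sketch of the conversion, using plain NumPy as a stand-in for `NDArray` (the sample probabilities below are illustrative, not from the tutorial's dataset):

```python
import numpy as np

# Hypothetical sigmoid outputs for 4 validation samples
probs = np.array([0.99, 0.70, 0.40, 0.05])

# Pitfall: casting probabilities in [0, 1] straight to Int32
# truncates every value below 1.0 to 0, even 0.99
naive_classes = probs.astype(np.int32)     # [0, 0, 0, 0]

# Fix: subtract the threshold first, then take the ceiling,
# so 0 becomes the class boundary
threshold = 0.5
classes = np.ceil(probs - threshold)       # compares equal to [1, 1, 0, 0]
```

The same two steps appear on the `prediction` line of the validation function below, only expressed with `mx.nd` operations.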
+
+
+```python
+def validate_model(threshold):
+ cumulative_val_loss = 0
+
for i, (val_data, val_ground_truth_class) in enumerate(val_dataloader):
# Do forward pass on a batch of validation data
output = net(val_data)
 # Similar to cumulative training loss, calculate cumulative validation loss
 cumulative_val_loss += nd.sum(loss(output, val_ground_truth_class)).asscalar()
- # Applying sigmoid function, to get data in range [0, 1] and then
- # subtracting threshold, to make 0 serve as a class boundary: below 0 - class 0, above 0 - class 1
- # Apply mx.nd.ceil to get classes: convert negative values to 0 and positive to 1.
+ # Converting neuron outputs to classes
prediction = mx.nd.ceil(net(val_data).sigmoid() - threshold)
- # Reshape predictions to match dimension of val_ground_truth_class
- # and update accuracy with the results for that batch.
+ # Update validation accuracy
accuracy.update(val_ground_truth_class, prediction.reshape(-1))
+
+ return cumulative_val_loss
+```
+
+## Putting it all together
+
+Using the functions defined above, we can finally write our main training loop.
+
+
+```python
+epochs = 10
+threshold = 0.5
+
+for e in range(epochs):
+ cumulative_train_loss = train_model()
+ cumulative_val_loss = validate_model(threshold)
- # in the end of epoch, we print out current values for epoch, training and validation losses, and accuracy
 print("Epoch: %s, Training loss: %.2f, Validation loss: %.2f, Validation accuracy: %s" %
(e, cumulative_train_loss, cumulative_val_loss, accuracy.get()[1]))
- # we reset accuracy, so the new epoch's accuracy would be calculate from the blank state
+ # Reset accuracy, so the next epoch's accuracy is calculated from a blank state
accuracy.reset()
-
```
- Epoch: 0, Training loss: 446.68, Validation loss: 40.19, Validation accuracy: 0.85 <!--notebook-skip-line-->
-
- Epoch: 1, Training loss: 343.15, Validation loss: 30.82, Validation accuracy: 0.85 <!--notebook-skip-line-->
-
- Epoch: 2, Training loss: 187.40, Validation loss: 11.76, Validation accuracy: 0.96 <!--notebook-skip-line-->
-
- Epoch: 3, Training loss: 90.18, Validation loss: 10.13, Validation accuracy: 0.98 <!--notebook-skip-line-->
-
- Epoch: 4, Training loss: 68.51, Validation loss: 8.69, Validation accuracy: 0.97 <!--notebook-skip-line-->
-
- Epoch: 5, Training loss: 67.43, Validation loss: 6.71, Validation accuracy: 0.99 <!--notebook-skip-line-->
-
- Epoch: 6, Training loss: 54.76, Validation loss: 7.45, Validation accuracy: 0.98 <!--notebook-skip-line-->
-
- Epoch: 7, Training loss: 48.29, Validation loss: 8.56, Validation accuracy: 0.97 <!--notebook-skip-line-->
-
- Epoch: 8, Training loss: 50.50, Validation loss: 7.24, Validation accuracy: 0.98 <!--notebook-skip-line-->
-
- Epoch: 9, Training loss: 49.42, Validation loss: 7.46, Validation accuracy: 0.97 <!--notebook-skip-line-->
+ Epoch: 0, Training loss: 447.90, Validation loss: 40.13, Validation accuracy: 0.85 <!--notebook-skip-line-->
+ Epoch: 1, Training loss: 356.33, Validation loss: 34.38, Validation accuracy: 0.85 <!--notebook-skip-line-->
-## Tip 1: Use only one neuron in the output layer
+ Epoch: 2, Training loss: 238.26, Validation loss: 16.34, Validation accuracy: 0.93 <!--notebook-skip-line-->
-Despite that there are 2 classes, there should be only one output neuron, because `SigmoidBinaryCrossEntropyLoss` accepts only one feature as an input.
+ Epoch: 3, Training loss: 106.45, Validation loss: 13.55, Validation accuracy: 0.95 <!--notebook-skip-line-->
-In case when there are 3 or more classes, one cannot use a single Logistic regression, but should do multiclass regression. The solution would be to increase the number of output neurons to the number of classes and use `SoftmaxCrossEntropyLoss`.
+ Epoch: 4, Training loss: 77.17, Validation loss: 8.87, Validation accuracy: 0.97 <!--notebook-skip-line-->
-## Tip 2: Encode classes as 0 and 1
+ Epoch: 5, Training loss: 60.52, Validation loss: 10.60, Validation accuracy: 0.96 <!--notebook-skip-line-->
-`Sigmoid` function produces values from 0 to 1. `SigmoidBinaryCrossEntropyLoss` uses these values to calculate the loss by essentially subtracting values and class labels from 1. [Here is the formula](https://mxnet.incubator.apache.org/api/python/gluon/loss.html?highlight=sigmoidbinarycrossentropyloss#mxnet.gluon.loss.SigmoidBinaryCrossEntropyLoss) used for that calculation (we use default version with `from_sigmoid` is False). That's why it is numerically better to have classes encoded in the same range as a `Sigmoid` output with 0 and 1.
+ Epoch: 6, Training loss: 55.00, Validation loss: 8.23, Validation accuracy: 0.97 <!--notebook-skip-line-->
-If your data comes with a label encoded in a different format, such as -1 and 1, then you can either recode it to 0 and 1 by comparing the initial class to 0, or use another function instead of `Sigmoid`, like [`Tanh`](https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html?highlight=tanh#mxnet.ndarray.tanh), to produce output in range [-1; 1].
+ Epoch: 7, Training loss: 56.08, Validation loss: 10.59, Validation accuracy: 0.96 <!--notebook-skip-line-->
-## Tip 3: Use SigmoidBinaryCrossEntropyLoss instead of LogisticRegressionOutput
+ Epoch: 8, Training loss: 56.10, Validation loss: 6.74, Validation accuracy: 0.97 <!--notebook-skip-line-->
-NDArray API has two options to calculate logistic regression loss. One is `SigmoidBinaryCrossEntropyLoss`, which I used in the example. This class inherits from the `Loss` class and is intended to be used as a loss function for logistic regression. But there is also a function called `LogisticRegressionOutput`, which can be applied to any `NDArray`. Mathematically speaking, this function does the same thing as `SigmoidBinaryCrossEntropyLoss`.
+ Epoch: 9, Training loss: 51.81, Validation loss: 7.48, Validation accuracy: 0.98 <!--notebook-skip-line-->
-My recommendation would be to use `SigmoidBinaryCrossEntropyLoss`, because this class properly inherits from `Loss` class, while `LogisticRegressionOutput` is just a regular function. `LogisticRegressionOutput` is a function to go when implementing logistic regression using Symbol API, but in case of using Gluon API, there are no benefits using it. The only case when you may want to consider using `LogisticRegressionOutput` as your loss, is when you need to have a support for sparse matrices.
-## Tip 4: Convert probabilities to classes before calculating Accuracy
+In our case we easily reach a validation accuracy of 0.98.
-`Accuracy` metric requires 2 arguments: 1) a vector of ground-truth classes and 2) A tensor of predictions. When tensor of predictions is of the same shape as the vector of ground-truth classes, `Accuracy` class assumes that it should contain predicted classes. So, it converts the vector to `Int32` and compare each item of ground-truth classes to prediction vector.
+## Tip 1: Use only one neuron in the output layer
-Because of the behaviour above, you will get an unexpected result if you just pass the output of `Sigmoid` function as is. `Sigmoid` function produces output in range [0; 1], and all numbers in that range are going to be casted to 0, even if it is as high as 0.99. To avoid this we write a custom bit of code, that:
+Even though there are 2 classes, there should be only one output neuron, because `SigmoidBinaryCrossEntropyLoss` accepts only one feature as input.
-1. Subtracts a threshold from the original prediction. Usually, the threshold is equal to 0.5, but it can be higher, if you want to increase certainty of an item to belong to class 1.
+When there are 3 or more classes, a single logistic regression is not enough; you should do multiclass logistic regression instead. The solution is to increase the number of output neurons to the number of classes and use `SoftmaxCrossEntropyLoss`.
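The contrast above (one neuron with a 0/1 label for 2 classes, one neuron per class for 3 or more) can be sketched with a NumPy reimplementation of the per-sample formulas the two losses compute. This is an illustrative sketch, not the Gluon implementation; it assumes the default `from_sigmoid=False` behaviour linked above:

```python
import numpy as np

def sigmoid_bce(logit, label):
    # One output neuron: a single logit per sample, label encoded as 0 or 1.
    # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(logit, 0) - logit * label + np.log1p(np.exp(-abs(logit)))

def softmax_ce(logits, label):
    # One output neuron per class: a logit vector per sample,
    # label is a class index. Loss is -log softmax(logits)[label].
    shifted = logits - logits.max()          # shift for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# Binary case: a single logit per sample
binary_loss = sigmoid_bce(0.0, 1)            # log(2), maximal uncertainty

# Three classes: a logit vector per sample
multi_loss = softmax_ce(np.array([2.0, 0.5, -1.0]), 0)
```

Either way, the loss is smallest when the neuron(s) assigned to the true class receive the largest logit, which is why switching from 2 to 3+ classes only requires widening the output layer and swapping the loss.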
Review comment:
Removed