Ishitori commented on a change in pull request #11651: Add logistic regression tutorial
URL: https://github.com/apache/incubator-mxnet/pull/11651#discussion_r204193164
##########
File path: docs/tutorials/gluon/logistic_regression_explained.md
##########
@@ -120,96 +108,115 @@ for e in range(epochs):
# Calculate gradients
loss_result.backward()
- # Change parameters of the network
+ # Update parameters of the network
trainer.step(batch_size)
- # Since we calculate loss per single batch, but want to display it per epoch
- # we sum losses of every batch per an epoch into a single variable
+ # Sum losses of every batch to get the loss per epoch
cumulative_train_loss += nd.sum(loss_result).asscalar()
+
+ return cumulative_train_loss
+```
+
+## Validating the model
+
Our validation function is very similar to the training one. The main difference is that we want to calculate the accuracy of the model. We use the [Accuracy metric](https://mxnet.incubator.apache.org/api/python/model.html#mxnet.metric.Accuracy) to do so.
+
+The `Accuracy` metric requires 2 arguments: 1) a vector of ground-truth classes and 2) a vector or matrix of predictions. When predictions are of the same shape as the vector of ground-truth classes, the `Accuracy` class assumes that the prediction vector contains predicted classes. So, it converts the vector to `Int32` and compares each item of the ground-truth classes to the prediction vector.
+
+Because of the behaviour above, you will get an unexpected result if you just apply the [Sigmoid](https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html#mxnet.ndarray.sigmoid) function to the network output and pass it to the `Accuracy` metric. As mentioned before, we need to apply the `Sigmoid` function to the output of the neuron to get the probability of belonging to class 1. But the `Sigmoid` function produces output in the range [0; 1], and every number in that range is going to be cast to 0, even a value as high as 0.99. To avoid this, we write a custom bit of code on line 12 that:
- # iterate over all batches of validation data and calculate validation loss
+1. Calculates the sigmoid of the network output using the `Sigmoid` function
+
+2. Subtracts a threshold from the sigmoid output. Usually the threshold is equal to 0.5, but it can be higher if you want to increase the certainty that an item belongs to class 1.
+
+3. Uses the [mx.nd.ceil](https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html#mxnet.ndarray.ceil) function, which rounds up: since the shifted values lie in [-0.5; 0.5], all negative values become 0 and all positive values become 1
+
+After these transformations we can pass the result to the `Accuracy.update()` method and expect it to behave properly.
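To see why the threshold-and-ceiling trick is needed, here is a minimal sketch of the conversion, using plain NumPy as a stand-in for `NDArray` (the sample probabilities below are illustrative, not from the tutorial's dataset):

```python
import numpy as np

# Hypothetical sigmoid outputs for 4 validation samples
probs = np.array([0.99, 0.70, 0.40, 0.05])

# Pitfall: casting probabilities in [0, 1] straight to Int32
# truncates every value below 1.0 to 0, even 0.99
naive_classes = probs.astype(np.int32)     # [0, 0, 0, 0]

# Fix: subtract the threshold first, then take the ceiling,
# so 0 becomes the class boundary
threshold = 0.5
classes = np.ceil(probs - threshold)       # compares equal to [1, 1, 0, 0]
```

The same two steps appear on the `prediction` line of the validation function below, only expressed with `mx.nd` operations.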
+
+
+```python
+def validate_model(threshold):
+ cumulative_val_loss = 0
+
for i, (val_data, val_ground_truth_class) in enumerate(val_dataloader):
# Do forward pass on a batch of validation data
output = net(val_data)
 # Similar to cumulative training loss, calculate cumulative validation loss
 cumulative_val_loss += nd.sum(loss(output, val_ground_truth_class)).asscalar()
- # Applying sigmoid function, to get data in range [0, 1] and then
- # subtracting threshold, to make 0 serve as a class boundary: below 0 - class 0, above 0 - class 1
- # Apply mx.nd.ceil to get classes: convert negative values to 0 and positive to 1.
+ # Converting neuron outputs to classes
prediction = mx.nd.ceil(net(val_data).sigmoid() - threshold)
- # Reshape predictions to match dimension of val_ground_truth_class
- # and update accuracy with the results for that batch.
+ # Update validation accuracy
accuracy.update(val_ground_truth_class, prediction.reshape(-1))
+
+ return cumulative_val_loss
+```
+
+## Putting it all together
+
+Using the functions defined above, we can finally write our main training loop.
+
+
+```python
+epochs = 10
+threshold = 0.5
+
+for e in range(epochs):
+ cumulative_train_loss = train_model()
+ cumulative_val_loss = validate_model(threshold)
- # in the end of epoch, we print out current values for epoch, training and validation losses, and accuracy
 print("Epoch: %s, Training loss: %.2f, Validation loss: %.2f, Validation accuracy: %s" %
(e, cumulative_train_loss, cumulative_val_loss, accuracy.get()[1]))
- # we reset accuracy, so the new epoch's accuracy would be calculate from the blank state
+ # Reset accuracy, so the next epoch's accuracy is calculated from a blank state
accuracy.reset()
-
```
- Epoch: 0, Training loss: 446.68, Validation loss: 40.19, Validation accuracy: 0.85 <!--notebook-skip-line-->
-
- Epoch: 1, Training loss: 343.15, Validation loss: 30.82, Validation accuracy: 0.85 <!--notebook-skip-line-->
-
- Epoch: 2, Training loss: 187.40, Validation loss: 11.76, Validation accuracy: 0.96 <!--notebook-skip-line-->
-
- Epoch: 3, Training loss: 90.18, Validation loss: 10.13, Validation accuracy: 0.98 <!--notebook-skip-line-->
-
- Epoch: 4, Training loss: 68.51, Validation loss: 8.69, Validation accuracy: 0.97 <!--notebook-skip-line-->
-
- Epoch: 5, Training loss: 67.43, Validation loss: 6.71, Validation accuracy: 0.99 <!--notebook-skip-line-->
-
- Epoch: 6, Training loss: 54.76, Validation loss: 7.45, Validation accuracy: 0.98 <!--notebook-skip-line-->
-
- Epoch: 7, Training loss: 48.29, Validation loss: 8.56, Validation accuracy: 0.97 <!--notebook-skip-line-->
-
- Epoch: 8, Training loss: 50.50, Validation loss: 7.24, Validation accuracy: 0.98 <!--notebook-skip-line-->
-
- Epoch: 9, Training loss: 49.42, Validation loss: 7.46, Validation accuracy: 0.97 <!--notebook-skip-line-->
+ Epoch: 0, Training loss: 447.90, Validation loss: 40.13, Validation accuracy: 0.85 <!--notebook-skip-line-->
+ Epoch: 1, Training loss: 356.33, Validation loss: 34.38, Validation accuracy: 0.85 <!--notebook-skip-line-->
-## Tip 1: Use only one neuron in the output layer
+ Epoch: 2, Training loss: 238.26, Validation loss: 16.34, Validation accuracy: 0.93 <!--notebook-skip-line-->
-Despite that there are 2 classes, there should be only one output neuron, because `SigmoidBinaryCrossEntropyLoss` accepts only one feature as an input.
+ Epoch: 3, Training loss: 106.45, Validation loss: 13.55, Validation accuracy: 0.95 <!--notebook-skip-line-->
-In case when there are 3 or more classes, one cannot use a single Logistic regression, but should do multiclass regression. The solution would be to increase the number of output neurons to the number of classes and use `SoftmaxCrossEntropyLoss`.
+ Epoch: 4, Training loss: 77.17, Validation loss: 8.87, Validation accuracy: 0.97 <!--notebook-skip-line-->
-## Tip 2: Encode classes as 0 and 1
+ Epoch: 5, Training loss: 60.52, Validation loss: 10.60, Validation accuracy: 0.96 <!--notebook-skip-line-->
-`Sigmoid` function produces values from 0 to 1. `SigmoidBinaryCrossEntropyLoss` uses these values to calculate the loss by essentially subtracting values and class labels from 1. [Here is the formula](https://mxnet.incubator.apache.org/api/python/gluon/loss.html?highlight=sigmoidbinarycrossentropyloss#mxnet.gluon.loss.SigmoidBinaryCrossEntropyLoss) used for that calculation (we use default version with `from_sigmoid` is False). That's why it is numerically better to have classes encoded in the same range as a `Sigmoid` output with 0 and 1.
+ Epoch: 6, Training loss: 55.00, Validation loss: 8.23, Validation accuracy: 0.97 <!--notebook-skip-line-->
-If your data comes with a label encoded in a different format, such as -1 and 1, then you can either recode it to 0 and 1 by comparing the initial class to 0, or use another function instead of `Sigmoid`, like [`Tanh`](https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html?highlight=tanh#mxnet.ndarray.tanh), to produce output in range [-1; 1].
+ Epoch: 7, Training loss: 56.08, Validation loss: 10.59, Validation accuracy: 0.96 <!--notebook-skip-line-->
-## Tip 3: Use SigmoidBinaryCrossEntropyLoss instead of LogisticRegressionOutput
+ Epoch: 8, Training loss: 56.10, Validation loss: 6.74, Validation accuracy: 0.97 <!--notebook-skip-line-->
-NDArray API has two options to calculate logistic regression loss. One is `SigmoidBinaryCrossEntropyLoss`, which I used in the example. This class inherits from the `Loss` class and is intended to be used as a loss function for logistic regression. But there is also a function called `LogisticRegressionOutput`, which can be applied to any `NDArray`. Mathematically speaking, this function does the same thing as `SigmoidBinaryCrossEntropyLoss`.
+ Epoch: 9, Training loss: 51.81, Validation loss: 7.48, Validation accuracy: 0.98 <!--notebook-skip-line-->
-My recommendation would be to use `SigmoidBinaryCrossEntropyLoss`, because this class properly inherits from `Loss` class, while `LogisticRegressionOutput` is just a regular function. `LogisticRegressionOutput` is a function to go when implementing logistic regression using Symbol API, but in case of using Gluon API, there are no benefits using it. The only case when you may want to consider using `LogisticRegressionOutput` as your loss, is when you need to have a support for sparse matrices.
-## Tip 4: Convert probabilities to classes before calculating Accuracy
+In our case we easily reach a validation accuracy of 0.98.
-`Accuracy` metric requires 2 arguments: 1) a vector of ground-truth classes and 2) A tensor of predictions. When tensor of predictions is of the same shape as the vector of ground-truth classes, `Accuracy` class assumes that it should contain predicted classes. So, it converts the vector to `Int32` and compare each item of ground-truth classes to prediction vector.
+## Tip 1: Use only one neuron in the output layer
-Because of the behaviour above, you will get an unexpected result if you just pass the output of `Sigmoid` function as is. `Sigmoid` function produces output in range [0; 1], and all numbers in that range are going to be casted to 0, even if it is as high as 0.99. To avoid this we write a custom bit of code, that:
+Even though there are 2 classes, there should be only one output neuron, because `SigmoidBinaryCrossEntropyLoss` accepts only one feature as input.
-1. Subtracts a threshold from the original prediction. Usually, the threshold is equal to 0.5, but it can be higher, if you want to increase certainty of an item to belong to class 1.
+When there are 3 or more classes, a single logistic regression is not enough; you should do multiclass logistic regression instead. The solution is to increase the number of output neurons to the number of classes and use `SoftmaxCrossEntropyLoss`.
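The contrast above (one neuron with a 0/1 label for 2 classes, one neuron per class for 3 or more) can be sketched with a NumPy reimplementation of the per-sample formulas the two losses compute. This is an illustrative sketch, not the Gluon implementation; it assumes the default `from_sigmoid=False` behaviour linked above:

```python
import numpy as np

def sigmoid_bce(logit, label):
    # One output neuron: a single logit per sample, label encoded as 0 or 1.
    # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(logit, 0) - logit * label + np.log1p(np.exp(-abs(logit)))

def softmax_ce(logits, label):
    # One output neuron per class: a logit vector per sample,
    # label is a class index. Loss is -log softmax(logits)[label].
    shifted = logits - logits.max()          # shift for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# Binary case: a single logit per sample
binary_loss = sigmoid_bce(0.0, 1)            # log(2), maximal uncertainty

# Three classes: a logit vector per sample
multi_loss = softmax_ce(np.array([2.0, 0.5, -1.0]), 0)
```

Either way, the loss is smallest when the neuron(s) assigned to the true class receive the largest logit, which is why switching from 2 to 3+ classes only requires widening the output layer and swapping the loss.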
Review comment:
Removed