acphile opened a new issue #18046: Proposal to mxnet.metric
URL: https://github.com/apache/incubator-mxnet/issues/18046
 
 
   ## Motivation
   
   mxnet.metric provides various metrics for users to evaluate the performance of models. However, the current implementation has several shortcomings that need to be addressed. We propose to refactor the metrics interface to fix these issues and to place the new interface under mx.gluon.metrics.
   
   In general, we want to make the following improvements:
   
   1. Move the API to the gluon namespace
   2. Make the API more user-friendly and Pythonic
   3. Structure the API so that hybridizing the complete training loop becomes feasible in the future
   
   ### 1. Inconsistency in computational granularity of metrics
   
   Currently there are two computational granularities in mxnet.metric:
   
   1. “macro” level: average the performance per batch, as in the implementation of [MAE](http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#mxnet.metric.MAE)
   2. “micro” level: average the performance per sample, as in the implementations of [Accuracy](http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#mxnet.metric.Accuracy) and [CrossEntropy](http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#mxnet.metric.CrossEntropy)
   
   Generally, the “micro” level is more useful, because we usually care about the average performance over the samples in the test set rather than over the test batches. Therefore we need to make the granularity consistent across these metrics.
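   To make the difference concrete, here is a small numpy sketch (the batch sizes and error values are made up for illustration) computing MAE at both granularities:

```python
import numpy as np

# Absolute errors from two hypothetical batches of different sizes.
batch_errors = [np.array([1.0, 1.0]),            # batch of 2 samples, MAE = 1.0
                np.array([4.0, 4.0, 4.0, 4.0])]  # batch of 4 samples, MAE = 4.0

# "macro" level: average the per-batch MAEs, so every batch counts equally.
macro_mae = np.mean([e.mean() for e in batch_errors])   # (1.0 + 4.0) / 2 = 2.5

# "micro" level: average over all samples, so every sample counts equally.
micro_mae = np.concatenate(batch_errors).mean()         # 18.0 / 6 = 3.0

print(macro_mae, micro_mae)  # 2.5 vs 3.0 -- they differ whenever batch sizes differ
```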
   
   ### 2. For future hybridization of the complete training loop
   
   Currently, the metrics in mxnet.metric receive a list of NDArrays and compute their results with numpy. In fact, many metrics' computations could be implemented as nn.HybridBlock. Using HybridBlock.hybridize(), the computation could be done in the backend, which can be faster. By refactoring mxnet.metric, we could one day compile the model together with the metric, as TensorFlow does, and run the complete training loop, including evaluation, fully in the backend. Our new API design therefore takes the hybridization use case into account, so that hybridizing the complete training loop will be possible once the backend support is there.
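   As a minimal sketch of this idea (the HybridMAE name and its interface are assumptions, not the proposed design), a metric expressed as a gluon.HybridBlock can be hybridized like any other block:

```python
import mxnet as mx
from mxnet import gluon

class HybridMAE(gluon.HybridBlock):
    """Hypothetical per-batch MAE written with hybridizable operators only."""
    def hybrid_forward(self, F, pred, label):
        # F is mx.nd in imperative mode and mx.sym after hybridize(),
        # so the whole computation can run in the backend.
        return F.mean(F.abs(pred - label))

metric = HybridMAE()
metric.hybridize()  # compile the metric computation into the backend graph
mae = metric(mx.nd.array([0.1, 0.9, 0.4]), mx.nd.array([0.0, 1.0, 0.5]))
print(mae.asscalar())  # 0.1
```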
   
   ### 3. Lacking some useful metrics
   
   Although many metrics are already included, some still need to be implemented.
   
   Apart from the metrics already provided in mxnet.metric (http://mxnet.incubator.apache.org/api/python/docs/api/metric/index.html?highlight=metric#module-mxnet.metric), we plan to add the following metrics:
   
   1. F-beta score: (1+beta^2)*precision*recall/(beta^2*precision+recall) (see the sketch after this list)
   2. Binary accuracy with threshold: use a confidence threshold to decide whether an example is classified as positive or negative
   3. MeanCosineSimilarity: return the average cosine similarity between predictions and ground truth
   4. MeanPairwiseDistance: return the average pairwise distance between predictions and ground truth
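   A rough numpy sketch of items 1 and 2 above (the function names and defaults are placeholders; the final API may differ):

```python
import numpy as np

def fbeta_score(precision, recall, beta=1.0):
    """F-beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall)."""
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

def binary_accuracy(pred, label, threshold=0.5):
    """Binary accuracy with a confidence threshold: pred holds P(positive)."""
    pred_label = (np.asarray(pred) > threshold).astype(np.int64)
    return (pred_label == np.asarray(label)).mean()

print(fbeta_score(0.8, 0.5, beta=2.0))              # beta > 1 weights recall more heavily
print(binary_accuracy([0.1, 0.3, 0.7], [0, 1, 1]))  # -> 2/3
```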
   
   ### 4. Fixing issues in the existing metrics
   
   Some special cases and input shapes need to be examined and fixed.
   About EvalMetric (the base class in metric.py):
   
   1. Distinction between local and global accumulators:
       a. Currently, for metrics in metric.py, both the local accumulator and the global accumulator are updated with the same value when update() is called.
       b. The global accumulator may be useful when the evaluation consists of several parts (for example, joint training on different datasets). You may want to get the evaluation result of one part and call “reset_local()” to continue the evaluation on the next part. In the end, you can call “get_global()” to obtain the overall evaluation performance (see the usage sketch after this list).
       c. You may also define how the local and global results are updated in your own metric (a subclass of EvalMetric).
   2. Parameters “output_names”, “label_names” and method “update_dict”:
       a. I only find “update_dict” used in “https://github.com/apache/incubator-mxnet/blob/48e9e2c6a1544843ba860124f4eaa8e7bac6100b/python/mxnet/module/executor_group.py”, where I think using “update” would also be reasonable.
       b. I do not know where the corresponding parameters "output_names" and "label_names" could be used, since there are no corresponding examples.
   3. get_name_value():
       a. Returns pairs of the metric’s name and its evaluation value.
       b. It is helpful when using CompositeEvalMetric.
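   A short usage sketch of the local/global distinction described in 1.b, assuming the current “reset_local()” / “get_global()” behaviour carries over (the labels and predictions are made up):

```python
import mxnet as mx

acc = mx.metric.Accuracy()

# Part 1 of a hypothetical joint evaluation.
acc.update(labels=mx.nd.array([0, 1, 1]),
           preds=mx.nd.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]))
print('part 1:', acc.get())  # result from the local accumulator
acc.reset_local()            # clear the local accumulator, keep the global one

# Part 2, e.g. a second dataset.
acc.update(labels=mx.nd.array([1, 0]),
           preds=mx.nd.array([[0.1, 0.9], [0.6, 0.4]]))
print('part 2:', acc.get())

print('overall:', acc.get_global())  # aggregated over both parts
```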
   
   Here are the detailed changes to be made:
   
   1. Improve class MAE (and MSE, RMSE)
       a. Add a parameter “average”, default average=“macro”
             i. “macro” represents averaging per batch
             ii. “micro” represents averaging per example
       b. Add the micro-level calculation
   2. Improve class _BinaryClassification
       a. Support the case len(pred.shape)==1
             i. For binary classification, we only need to output a confidence score of being positive, e.g. pred=[0.1,0.3,0.7] or pred=[[0.1],[0.3],[0.7]]
       b. Add a parameter “threshold”, default: threshold=0.5
             i. Sometimes we need a threshold such that when confidence(positive) > threshold, the example is classified as positive, otherwise as negative
       c. Add a parameter “beta”, default: beta=1
             i. Update the “fscore” calculation to F-beta = (1+beta^2)*precision*recall/(beta^2*precision+recall), which is more general
       d. Add a method binary_accuracy
             i. Calculation: (true_positives+true_negatives)/total_examples
   3. Improve class TopKAccuracy
       a. Line 578-579: self.global_sum_metric should be accumulated
   4. Add class MeanCosineSimilarity(axis=-1, eps=1e-12) (a sketch of classes 4 and 5 follows after this list)
   5. Add class MeanPairwiseDistance(p=2)
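   Below is a rough sketch of classes 4 and 5 on top of the current EvalMetric base class; the micro-level accumulation and the attribute names used here are assumptions and open for discussion:

```python
import mxnet as mx
from mxnet.metric import EvalMetric

class MeanCosineSimilarity(EvalMetric):
    """Sketch: average cosine similarity between predictions and ground truth."""
    def __init__(self, axis=-1, eps=1e-12):
        super(MeanCosineSimilarity, self).__init__('cos_sim')
        self.axis = axis
        self.eps = eps

    def update(self, labels, preds):
        for label, pred in zip(labels, preds):
            dot = (label * pred).sum(axis=self.axis)
            norm = ((label ** 2).sum(axis=self.axis).sqrt() *
                    (pred ** 2).sum(axis=self.axis).sqrt()) + self.eps
            sim = (dot / norm).sum().asscalar()   # sum of per-sample similarities
            num = pred.shape[0]
            self.sum_metric += sim
            self.global_sum_metric += sim
            self.num_inst += num
            self.global_num_inst += num

class MeanPairwiseDistance(EvalMetric):
    """Sketch: average p-norm distance between predictions and ground truth."""
    def __init__(self, p=2):
        super(MeanPairwiseDistance, self).__init__('pairwise_dist')
        self.p = p

    def update(self, labels, preds):
        for label, pred in zip(labels, preds):
            dist = (((pred - label).abs() ** self.p).sum(axis=-1)) ** (1.0 / self.p)
            total = dist.sum().asscalar()         # sum of per-sample distances
            num = pred.shape[0]
            self.sum_metric += total
            self.global_sum_metric += total
            self.num_inst += num
            self.global_num_inst += num

# Usage: both expect lists of NDArrays, like the existing metrics.
m = MeanCosineSimilarity()
m.update([mx.nd.array([[1., 0.], [0., 1.]])], [mx.nd.array([[1., 0.], [1., 0.]])])
print(m.get())  # ('cos_sim', 0.5)
```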
   
   ## Comparisons with other frameworks
   
   ### Compared with PyTorch Ignite
   
   Reference: https://pytorch.org/ignite/metrics.html
   The base class for metrics is implemented independently. Metrics in ignite.metrics use the .attach() method to consume the output of the engine’s process_function; this is done by having the engine call add_event_handler.
   Metric arithmetic is supported, which is similar to mxnet.metric.CustomMetric.
   Some metrics currently not included in ours:
   
   1. 
[ConfusionMatrix](https://pytorch.org/ignite/metrics.html#ignite.metrics.ConfusionMatrix)
   2. 
[DiceCoefficient()](https://pytorch.org/ignite/metrics.html#ignite.metrics.DiceCoefficient)
   3. [IoU()](https://pytorch.org/ignite/metrics.html#ignite.metrics.IoU)
   4. [mIoU()](https://pytorch.org/ignite/metrics.html#ignite.metrics.mIoU)
   5. 
[MeanPairwiseDistance](https://pytorch.org/ignite/metrics.html#ignite.metrics.MeanPairwiseDistance)
   
   ### Compared with TensorFlow Keras
   
   Reference: https://tensorflow.google.cn/api_docs/python/tf/keras/metrics?hl=en
   The base class for metrics inherits from tf.keras.engine.base_layer.Layer, which is also the class from which all layers inherit. Metric functions in tf.keras.metrics can be supplied via the metrics parameter when a model is compiled.
   Generally, metric functions in tf.keras.metrics take a *sample_weight* input that defines the contributing weights when updating the states.
   tf.keras.metrics uses [Accuracy](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Accuracy) and [SparseCategoricalAccuracy](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/SparseCategoricalAccuracy) to distinguish the case where y_pred is a predicted label from the case where y_pred is a probability distribution, which I think may be to avoid internal shape checking. Currently we could combine them in one metric (see the sketch below).
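   To make that last point concrete, here is a hedged sketch of how a single accuracy metric can cover both cases with a shape check (roughly what mxnet.metric.Accuracy already does internally):

```python
import numpy as np

def accuracy(labels, preds):
    """Accept either predicted class indices or a probability distribution."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    if preds.ndim == labels.ndim + 1:
        # preds carries one probability per class -> reduce to class indices
        preds = preds.argmax(axis=-1)
    return (preds == labels).mean()

print(accuracy([0, 1, 1], [0, 1, 0]))                             # predicted labels
print(accuracy([0, 1, 1], [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]))  # probabilities
```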
   Some metrics currently not included in ours:
   
   1. [AUC](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/AUC)
   2. 
[BinaryAccuracy](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/BinaryAccuracy)
   3. Hinge-related, like [SquaredHinge](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/SquaredHinge), [Hinge](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Hinge), [CategoricalHinge](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/CategoricalHinge)
   4. 
[CosineSimilarity](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/CosineSimilarity)
   5. 
[KLDivergence](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/KLDivergence)
   6. [LogCoshError](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/LogCoshError): logcosh = log((exp(x) + exp(-x))/2), where x is the error (y_pred - y_true)
   7. 
[MeanIoU](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/MeanIoU)
   8. 
[Poisson](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Poisson)
   9. 
[SensitivityAtSpecificity](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/SensitivityAtSpecificity)
   
   
   
