[ 
https://issues.apache.org/jira/browse/LUCENE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084775#comment-14084775
 ] 

Gergő Törcsvári commented on LUCENE-5699:
-----------------------------------------

So why are the normalized and normalizedList functions useful?

First of all, why normalized?
When I first tried to use Lucene Classification, one of the bigger problems was
that the scores that come back mean nothing. Basically, the classifier returns
the class and an essentially arbitrary number. If you have two texts and you
push them through the classifier, the scores don't help you figure out which
result is more trustworthy.
Normalized values make that possible. If you want to tell the user how sure you
are, the normalized values help you out.
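A minimal sketch of what "normalized" can mean here, assuming the classifier
already has a non-negative raw score for every class: divide each score by the
sum, so the scores add up to 1.0 and can be compared across documents.
ScoreNormalizer and its map-based signature are hypothetical, not the Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Lucene's classification module:
// rescales raw, non-negative per-class scores so that they sum to 1.0.
final class ScoreNormalizer {

    private ScoreNormalizer() {}

    static Map<String, Double> normalize(Map<String, Double> rawScores) {
        double sum = 0.0;
        for (double score : rawScores.values()) {
            if (score < 0.0) {
                throw new IllegalArgumentException("scores must be non-negative: " + score);
            }
            sum += score;
        }
        Map<String, Double> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : rawScores.entrySet()) {
            // If every score is zero, fall back to a uniform distribution.
            double value = sum == 0.0 ? 1.0 / rawScores.size() : e.getValue() / sum;
            normalized.put(e.getKey(), value);
        }
        return normalized;
    }
}
```

With raw scores of 3.0 for "sport" and 1.0 for "politics", this yields 0.75 and
0.25, independent of how large the raw numbers were.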

Second, why lists?
If you can tell the user how sure you are, it's a short step to also telling
them what the other options are: the 3 or 5 most relevant classes.
Most of the classification algorithms already compute those scores anyway.
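Since the per-class scores are already there, returning the top few is mostly a
sort. A minimal sketch, assuming scores are held in a Map from class label to
score; TopClasses and topK are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: returns the k highest-scoring classes, best first.
final class TopClasses {

    private TopClasses() {}

    static List<Map.Entry<String, Double>> topK(Map<String, Double> scores, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Sort by descending score so the most likely class comes first.
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        // Copy the prefix so callers don't hold a view into our working list.
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }
}
```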

The problem with normalization and lists:
Sadly, not all classification algorithms produce lists; some just drop the
other classes. So these functions can't go straight into the API, because some
classification methods never have a list or a score.


I have two API suggestions:
First, the Classifier interface gets the normalized and normalizedList
functions, and the implementations that can't support them throw exceptions if
somebody wants to use them.
Or second, the Classifier interface doesn't get them, but the classifiers that
can support them provide these functions.
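A rough sketch of the two options, using hypothetical types (Scored,
ClassifierWithDefaults, BasicClassifier, RankingClassifier) rather than Lucene's
real Classifier and ClassificationResult. It uses Java 8 default methods for
brevity; an abstract base class would serve the same purpose on older Java.

```java
import java.util.List;

// Hypothetical result type; Lucene has its own ClassificationResult class.
final class Scored<T> {
    final T assignedClass;
    final double score;

    Scored(T assignedClass, double score) {
        this.assignedClass = assignedClass;
        this.score = score;
    }
}

// Option 1: every classifier exposes the methods; unsupported ones throw.
interface ClassifierWithDefaults<T> {
    Scored<T> assignClass(String text);

    default Scored<T> normalized(String text) {
        throw new UnsupportedOperationException("normalized scores not supported");
    }

    default List<Scored<T>> normalizedList(String text) {
        throw new UnsupportedOperationException("result lists not supported");
    }
}

// Option 2: the base interface stays small; capable classifiers opt in.
interface BasicClassifier<T> {
    Scored<T> assignClass(String text);
}

interface RankingClassifier<T> extends BasicClassifier<T> {
    Scored<T> normalized(String text);

    List<Scored<T>> normalizedList(String text);
}
```

The trade-off: option 1 keeps one type but moves the failure to run time, while
option 2 lets callers check `instanceof RankingClassifier` (or accept only that
type) so the missing capability is visible at compile time.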

> Lucene classification score calculation normalize and return lists
> ------------------------------------------------------------------
>
>                 Key: LUCENE-5699
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5699
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: modules/classification
>            Reporter: Gergő Törcsvári
>            Assignee: Tommaso Teofili
>         Attachments: 06-06-5699.patch, 0730.patch, 0803-base.patch
>
>
> Currently the classifiers can return only the "best matching" class. If
> somebody wants to use them for more complex tasks, he needs to modify these
> classes to get the second and third results too. If it is possible to return a
> list and it doesn't cost a lot of resources, why don't we do that? (We iterate
> over a list anyway.)
> The Bayes classifier produces very small return values, and there was a bug
> with floats underflowing to zero. It was fixed by using logarithms. It would be
> nice to scale the class scores so that they sum to one; then we could compare
> the returned scores and relevance of two documents. (If we don't do this, the
> word count of the test documents affects the result score.)
> With bullet points:
> * In the Bayes classifier, normalize the score values and return result lists.
> * In the KNN classifier, add the possibility to return a result list.
> * Make ClassificationResult Comparable for list sorting.
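One common way to scale log-space Bayes scores to sum to one without
reintroducing the underflow is to subtract the maximum log-score before
exponentiating (the log-sum-exp trick). A minimal sketch under that assumption,
together with a Comparable result type for sorting; ClassScore and
LogScoreNormalizer are hypothetical, and this is not necessarily what the
attached patches do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical result type that sorts best-first via Comparable.
final class ClassScore implements Comparable<ClassScore> {
    final String label;
    final double score;

    ClassScore(String label, double score) {
        this.label = label;
        this.score = score;
    }

    @Override
    public int compareTo(ClassScore other) {
        // Descending order: the higher score sorts first.
        return Double.compare(other.score, this.score);
    }
}

final class LogScoreNormalizer {

    private LogScoreNormalizer() {}

    // Turns per-class log-likelihoods (assumed finite) into scores that sum
    // to 1.0, sorted best first. Subtracting the maximum before exp() keeps
    // the largest term at exp(0) = 1, so the sum cannot underflow to zero.
    static List<ClassScore> normalize(List<ClassScore> logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (ClassScore c : logScores) {
            max = Math.max(max, c.score);
        }
        double[] exps = new double[logScores.size()];
        double sum = 0.0;
        for (int i = 0; i < exps.length; i++) {
            exps[i] = Math.exp(logScores.get(i).score - max);
            sum += exps[i];
        }
        List<ClassScore> result = new ArrayList<>();
        for (int i = 0; i < exps.length; i++) {
            result.add(new ClassScore(logScores.get(i).label, exps[i] / sum));
        }
        Collections.sort(result);
        return result;
    }
}
```

For example, log-scores of -1000 and -1001 both underflow to 0.0 with a naive
Math.exp, but here they normalize to roughly 0.73 and 0.27.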



--
This message was sent by Atlassian JIRA
(v6.2#6252)
