[ 
https://issues.apache.org/jira/browse/LUCENE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044506#comment-14044506
 ] 

Gergő Törcsvári commented on LUCENE-5699:
-----------------------------------------

Yes, the compiler error was something like that. I pressed Ctrl+Shift+O to 
organize imports and it vanished in Eclipse. (But it built in Eclipse without 
error...) My bad.

In the KNN classifier there was a maximum search. Building a list, sorting it 
and picking the first element is not cost efficient if you have a huge number 
of classes, that's totally true. But if you have a huge number of classes, the 
list building and Collections.sort will be the least of your problems in the 
cost calculation :P If you have few classes, the list building and the max 
search have the same complexity, and Collections.sort is the time you waste, 
but it will be fast because of the small number of elements. That's the reason 
why I did it this way; I don't think the search time increases significantly.
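
To make the trade-off concrete, here is a minimal sketch of the two approaches. 
ScoredClass and KnnVoteSketch are hypothetical names made up for this example; 
they are not the classes from the patch or from Lucene, and the vote map stands 
in for whatever the KNN classifier collects per class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not Lucene's ClassificationResult.
class ScoredClass implements Comparable<ScoredClass> {
    final String label;
    final double score;

    ScoredClass(String label, double score) {
        this.label = label;
        this.score = score;
    }

    // Higher scores sort first.
    @Override
    public int compareTo(ScoredClass other) {
        return Double.compare(other.score, this.score);
    }
}

class KnnVoteSketch {
    // Single pass over the votes: O(n), but only the best class is returned.
    static ScoredClass maxSearch(Map<String, Integer> votes) {
        ScoredClass best = null;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (best == null || e.getValue() > best.score) {
                best = new ScoredClass(e.getKey(), e.getValue());
            }
        }
        return best;
    }

    // List building is O(n), sorting adds O(n log n); every class is returned, best first.
    static List<ScoredClass> sortedList(Map<String, Integer> votes) {
        List<ScoredClass> results = new ArrayList<>();
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            results.add(new ScoredClass(e.getKey(), e.getValue()));
        }
        Collections.sort(results);
        return results;
    }
}
```

With only a handful of classes, n is tiny and the extra sort is negligible, 
while the caller gets the second and third candidates as well.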

The public functions that are "not in the Classifier" interface are there 
because not all classifiers can return lists, but for those that can, this 
could be a huge usability boost. There are two ways to go: add a new function 
to Classifier, with the non-listing classifiers returning a one-element list, 
or make an additional interface. As far as I can see, these are the only 
public functions of this kind.
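
A rough sketch of the two options, using simplified, hypothetical types 
(SketchClassifier, SketchListingClassifier and the two implementations are 
invented for this example and are not Lucene's actual API):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified types for illustration only; not Lucene's actual API.
interface SketchClassifier {
    String assignClass(String text);
}

// Option 2: an additional interface, implemented only by classifiers that can rank classes.
interface SketchListingClassifier extends SketchClassifier {
    List<String> getClasses(String text); // best match first
}

// Option 1 would instead declare getClasses on SketchClassifier itself; a classifier
// that cannot rank would then wrap its single answer in a one-element list, like this.
class SingleAnswerClassifier implements SketchClassifier {
    @Override
    public String assignClass(String text) {
        return text.isEmpty() ? "empty" : "non-empty"; // toy decision rule
    }

    public List<String> getClasses(String text) {
        return Collections.singletonList(assignClass(text));
    }
}

// A classifier that can rank: the single-class answer is just the head of the list.
class RankingClassifier implements SketchListingClassifier {
    @Override
    public List<String> getClasses(String text) {
        return Arrays.asList("a", "b", "c"); // fixed toy ranking
    }

    @Override
    public String assignClass(String text) {
        return getClasses(text).get(0);
    }
}
```

The first option keeps a single interface at the cost of touching every 
existing classifier; the second leaves Classifier unchanged but callers have 
to check which interface they were given.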



> Lucene classification score calculation normalize and return lists
> ------------------------------------------------------------------
>
>                 Key: LUCENE-5699
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5699
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: modules/classification
>            Reporter: Gergő Törcsvári
>            Assignee: Tommaso Teofili
>         Attachments: 06-06-5699.patch
>
>
> Currently the classifiers can return only the "best matching" class. If 
> somebody wants to use them for more complex tasks, they need to modify these 
> classes to get the second and third results too. If it is possible to return 
> a list and it does not cost a lot of resources, why don't we do that? (We 
> iterate over a list anyway.)
> The Bayes classifier gets return values that are too small, and there was a 
> bug with the zero floats. It was fixed with logarithms. It would be nice to 
> scale the sum of the class scores to one, and then we could compare the 
> return scores and relevance of two documents. (If we don't do this, the word 
> count of the test documents affects the result score.)
> With bullet points:
> * In the Bayes classification, normalize the score values and return result 
> lists.
> * In the KNN classifier, add the possibility to return a result list.
> * Make the ClassificationResult Comparable for list sorting.
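
One common way to scale log-domain scores so that they sum to one is to 
subtract the maximum before exponentiating. The sketch below shows that 
technique only; LogScoreNormalizerSketch is a hypothetical name, and this is 
not necessarily what the attached patch does. It assumes at least one score is 
finite.

```java
class LogScoreNormalizerSketch {
    // Turns per-class log scores into values that sum to 1.
    // Assumption: the input is non-empty and contains at least one finite value.
    static double[] normalize(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        // Subtracting the maximum keeps Math.exp from underflowing to zero
        // for very negative log scores (long documents, many terms).
        double[] out = new double[logScores.length];
        double sum = 0.0;
        for (int i = 0; i < logScores.length; i++) {
            out[i] = Math.exp(logScores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }
}
```

Because only the differences between log scores matter after the shift, 
documents of different lengths end up with scores on the same 0..1 scale.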



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
