[ 
https://issues.apache.org/jira/browse/LUCENE-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090547#comment-14090547
 ] 

Gergő Törcsvári commented on LUCENE-5698:
-----------------------------------------

Yes, I split ClassificationTestBase into TestDataset and TestAssertion, which 
is why there is duplication. It is a work-in-progress idea. I think it could be 
a good direction, but it needs some suggestions. I'm working on the publicly 
available datasets.

I have a question about that:
What is the preferred approach for the public test set? I could build a 
separate project with example code snippets for training and evaluating (and 
maybe some measurement summary). Or I could make a project that builds the 
index, which can be plugged into the Lucene tests. The second approach has some 
drawbacks, such as the test needing to fail if it classifies badly, and some 
open questions, such as where I should put the initial indexes (or source 
documents).

(I started a separate project, but of course it can be merged.)

> Evaluate Lucene classification on publicly available datasets
> -------------------------------------------------------------
>
>                 Key: LUCENE-5698
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5698
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: modules/classification
>            Reporter: Gergő Törcsvári
>            Assignee: Tommaso Teofili
>         Attachments: 0803-test.patch
>
>
> The Lucene classification module needs some publicly available datasets to 
> keep track of the development.
> For now it would be nice to have some generated fast test sets, and some 
> bigger real-world datasets too.



