[ https://issues.apache.org/jira/browse/LUCENE-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090547#comment-14090547 ]
Gergő Törcsvári commented on LUCENE-5698:
-----------------------------------------

Yes, I split ClassificationTestBase into TestDataset and TestAssertion; that's why there is duplication. It is a work-in-progress idea. I think it could be a good direction, but it needs some suggestions.

I'm working on the publicly available datasets, and I have a question about that: what is the preferred way to provide the public test set?

I could build a separate project with example code snippets for training and evaluating (and maybe a summary of measurements). Or I could make a project that builds the index, which can then be plugged into the Lucene tests. The second approach has some drawbacks: the test needs to fail if it classifies badly, and it raises questions such as where to put the initial indexes (or source documents).

(I started a separate project, but of course it can be merged.)

> Evaluate Lucene classification on publicly available datasets
> -------------------------------------------------------------
>
>                 Key: LUCENE-5698
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5698
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: modules/classification
>            Reporter: Gergő Törcsvári
>            Assignee: Tommaso Teofili
>         Attachments: 0803-test.patch
>
>
> The Lucene classification module needs some publicly available datasets to
> keep track of development.
> Now it would be nice to have some generated, fast test sets, and some bigger
> real-world datasets too.