[
https://issues.apache.org/jira/browse/LUCENE-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090547#comment-14090547
]
Gergő Törcsvári commented on LUCENE-5698:
-----------------------------------------
Yes, I split ClassificationTestBase into TestDataset and TestAssertion; that's
why there is some duplication. It is a work-in-progress idea. I think it could
be a good direction, but it needs some suggestions. I'm working on the publicly
available datasets.
I have a question about that:
What is the preferred way to provide the public test set? I could build a
separate project with example code snippets for training and evaluating (and
maybe a summary of the measurements). Or I could make a project that builds the
index, which can be plugged into the Lucene tests. The second approach has some
drawbacks: for example, a test would need to fail if it classifies badly, and
there are open questions such as where to put the initial indexes (or source
documents).
(I started a separate project, but of course it can be merged.)
> Evaluate Lucene classification on publicly available datasets
> -------------------------------------------------------------
>
> Key: LUCENE-5698
> URL: https://issues.apache.org/jira/browse/LUCENE-5698
> Project: Lucene - Core
> Issue Type: Sub-task
> Components: modules/classification
> Reporter: Gergő Törcsvári
> Assignee: Tommaso Teofili
> Attachments: 0803-test.patch
>
>
> The Lucene classification module needs some publicly available datasets to
> keep track of its development.
> It would be nice to have some fast generated test sets, as well as some
> bigger real-world datasets.
--
This message was sent by Atlassian JIRA
(v6.2#6252)