[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167891#comment-15167891 ]
Ken Krugler commented on TIKA-1855:
-----------------------------------
The things I don't like about this approach are that (a) core becomes a dumping
ground for everyone's test data, and (b) it couples module development to
core. Plus I'm waiting for the next crazy parser to arrive with 100MB of binary
test data, which will create an el grande jar that everybody is going to be
unzipping. So I guess I'd add scalability as another concern.
I haven't looked into where the test files end up, but I suspect that many of
the core tests that would have to move to parsers because of data dependencies
aren't really tests that should run in core. MIME-type detection is one case
where I can see wanting one document of each type, as are (maybe) some of the
app/server tests, so I'd be fine with having a tika-test-corpus (or whatever
you want to call it) that holds a good sampling of the docs used in those
situations.
Finally, to make myself really popular: I'd prefer that we use the jar as a
test dependency (vs. zipping/unzipping it), and in cases where we need an
actual file, use some utility code to extract/create the file.
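A minimal sketch of the kind of "utility code" meant here, assuming the shared test documents ship in a jar (such as the tika-test-corpus idea above) that is declared as a test dependency, so tests read them as classpath resources rather than unzipping anything. The class name `TestResourceFiles` and its methods are hypothetical, not an existing Tika API; only standard JDK calls (`ClassLoader.getResourceAsStream`, `Files.createTempFile`, `Files.copy`) are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical test helper (not an existing Tika class). It assumes the shared
// test documents are packaged in a jar on the test classpath, so tests read
// them as classpath resources instead of unzipping the jar.
final class TestResourceFiles {

    private TestResourceFiles() {
        // static utility class, not meant to be instantiated
    }

    // Opens a test document straight out of the jar; most tests only need a stream.
    public static InputStream open(ClassLoader loader, String resourceName) throws IOException {
        InputStream in = loader.getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Test resource not found on classpath: " + resourceName);
        }
        return in;
    }

    // For tests that need a real file: copies the resource to a temp file and
    // returns its path. The temp file keeps the resource's extension so that
    // name-based detection still sees it; the caller must delete it afterwards.
    public static Path extractToTempFile(ClassLoader loader, String resourceName) throws IOException {
        String fileName = resourceName.substring(resourceName.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        String suffix = dot >= 0 ? fileName.substring(dot) : ".tmp";
        Path tmp = Files.createTempFile("tika-test-", suffix);
        try (InputStream in = open(loader, resourceName)) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave empty temp files behind
            throw e;
        }
        return tmp;
    }
}
```

In a test, something like `TestResourceFiles.extractToTempFile(getClass().getClassLoader(), "test-documents/example.pdf")` would yield a `Path` (that resource name is made up for illustration). The design keeps streams as the default and materializes a file only on demand, which is the split suggested above.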
Maybe we should have a Skype chat to discuss VF2F :)
> Tika 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules
> ----------------------------------------------------------------------------------------------
>
> Key: TIKA-1855
> URL: https://issues.apache.org/jira/browse/TIKA-1855
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Assignee: Tim Allison
>
> Undo TIKA-1851, and divide test docs to appropriate parser modules.