[ https://issues.apache.org/jira/browse/TIKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071846#comment-13071846 ]
Christopher Currie commented on TIKA-686:
-----------------------------------------
I admit up front I'm biased toward the dependency management case. From my
perspective it's a pain to have to dig into the dependencies and exclude all
the ones I don't want.
In the end, I think the key question is "what's the common case?" Is it more
common to need a lot of parsers, or just one or two? If it's the former, I
think keeping a single jar makes a lot of sense. If it's the latter, then I
think separate jars are better, because end users have a clear path: if you
only care about AutoCAD, take the DWGParser jar and you're done.
Alternatively, there are other Maven-level options that could be considered
that would be an improvement on the current state:
1. Make all of the dependencies of tika-parsers 'optional', except for
tika-core. This more closely matches the non-dependency-managed scenario, where
the end user is responsible for making sure he or she has all the required
dependencies for the parser in question.
2. Create pom-only modules for each parser that pre-document the dependency
filter. In other words, for each parser 'foo', create a tika-parser-foo pom
that depends on tika-parsers but excludes the dependencies that are not needed
by that parser. This saves each end user from the work of figuring out the
exclusion list by themselves.
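To make option 2 concrete, here is a rough sketch of what such a pom-only
module might look like. The artifact name tika-parser-foo is hypothetical, and
the two exclusions are only illustrative; the real list for any given parser
would have to be worked out from what that parser actually needs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical pom-only module: "tika-parser-foo" is an assumed name. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-foo</artifactId>
  <version>0.9</version>
  <!-- No jar is built; the module exists only to carry the dependency filter. -->
  <packaging>pom</packaging>

  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.9</version>
      <exclusions>
        <!-- Illustrative exclusions only: assumed not to be needed by 'foo'. -->
        <exclusion>
          <groupId>org.apache.pdfbox</groupId>
          <artifactId>pdfbox</artifactId>
        </exclusion>
        <exclusion>
          <groupId>edu.ucar</groupId>
          <artifactId>netcdf</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>
```

An end user would then depend on tika-parser-foo with <type>pom</type> and
inherit the filtered dependency set transitively. Option 1 would instead mark
each third-party dependency in the tika-parsers pom itself with
<optional>true</optional>, so nothing beyond tika-core is pulled in unless the
end user declares it.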
Since I'm making the request, I'm happy to volunteer myself for some of the
grunt-work for any of these solutions, if resources are needed to get them done.
> Split tika-parsers into separate components
> -------------------------------------------
>
> Key: TIKA-686
> URL: https://issues.apache.org/jira/browse/TIKA-686
> Project: Tika
> Issue Type: Wish
> Components: parser
> Affects Versions: 0.9
> Reporter: Christopher Currie
> Priority: Minor
>
> The email thread [1] from two years ago that led to splitting Tika into
> separate components also suggested splitting tika-parsers into separate
> components based on dependencies. This would be extremely useful, especially
> in cases where a given parser has no dependencies beyond tika-core. Please
> consider refactoring the parsers into separate components for 1.0.
> [1] http://markmail.org/message/tavirkqhn6r2szrz