[ https://issues.apache.org/jira/browse/TIKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071721#comment-13071721 ]
Jukka Zitting commented on TIKA-686:
------------------------------------
We already did quite a bit of work towards making Tika degrade gracefully when
some dependencies are not present, so for now I'd rather encourage people to
exclude those dependencies they don't want instead of having to deal with an
explosion of dependencies.
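
As a rough illustration of what "degrading gracefully" can mean, here is a minimal Java sketch. It assumes a parser counts as usable only when a marker class from its optional library can be loaded. This is not Tika's actual mechanism. `OptionalParserSpec` and `GracefulParserRegistry` are hypothetical names, and the class names passed in are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical: describes a parser that needs a class from an optional library.
final class OptionalParserSpec {
    final String name;
    final String requiredClass; // any class from the optional dependency (placeholder)

    OptionalParserSpec(String name, String requiredClass) {
        this.name = name;
        this.requiredClass = requiredClass;
    }
}

// Hypothetical registry: parsers whose dependency is missing are skipped, not fatal.
final class GracefulParserRegistry {
    private final List<String> available = new ArrayList<>();
    private final List<String> skipped = new ArrayList<>();

    GracefulParserRegistry(List<OptionalParserSpec> specs) {
        ClassLoader loader = GracefulParserRegistry.class.getClassLoader();
        for (OptionalParserSpec spec : specs) {
            if (isOnClasspath(spec.requiredClass, loader)) {
                available.add(spec.name);
            } else {
                // Degrade gracefully: this format is unsupported, everything else still works.
                skipped.add(spec.name);
            }
        }
    }

    private static boolean isOnClasspath(String className, ClassLoader loader) {
        try {
            // initialize = false: check presence without running static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError (e.g. NoClassDefFoundError) can occur when the class exists
            // but a class it extends is missing from the classpath.
            return false;
        }
    }

    List<String> availableParsers() {
        return Collections.unmodifiableList(available);
    }

    List<String> skippedParsers() {
        return Collections.unmodifiableList(skipped);
    }
}
```

With this approach, excluding a dependency removes support for the corresponding formats instead of breaking everything else.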
My original idea for the Parser interface was that upstream parser libraries
could actually implement the interface directly, so that we wouldn't even need
any code in tika-parsers. So far we haven't done much of that because the
Parser interface was still evolving, but with the AbstractParser class and the
proposed cleanup of the Parser interface in 1.0, we should be in a good position
to start pushing the Parser implementations upstream.
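
To make the idea concrete, the following self-contained Java sketch shows an upstream library shipping its own parser against a small core contract. `SimpleParser` and `SimpleAbstractParser` are simplified stand-ins for tika-core's Parser and AbstractParser, not their real signatures (the real interface works with a SAX ContentHandler, Metadata and ParseContext). `UpstreamPlainTextParser` and `parseToString` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for a core parser contract (NOT the real tika-core Parser signature).
interface SimpleParser {
    Set<String> getSupportedTypes();

    void parse(InputStream stream, StringBuilder text, Map<String, String> metadata)
            throws IOException;
}

// Simplified stand-in for a base class that supplies shared plumbing to implementors.
abstract class SimpleAbstractParser implements SimpleParser {
    // Hypothetical convenience helper built only on top of parse().
    String parseToString(byte[] data, Map<String, String> metadata) throws IOException {
        StringBuilder text = new StringBuilder();
        try (InputStream in = new ByteArrayInputStream(data)) {
            parse(in, text, metadata);
        }
        return text.toString();
    }
}

// Hypothetical parser that an upstream library could ship in its own jar,
// depending on nothing but the core contract above.
class UpstreamPlainTextParser extends SimpleAbstractParser {
    @Override
    public Set<String> getSupportedTypes() {
        return Collections.singleton("text/plain");
    }

    @Override
    public void parse(InputStream stream, StringBuilder text, Map<String, String> metadata)
            throws IOException {
        // Not closed here: whoever opened the stream is responsible for closing it.
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            text.append(buffer, 0, n);
        }
        metadata.put("Content-Type", "text/plain; charset=UTF-8");
    }
}
```

The point of the design is that the upstream jar depends only on the core contract, so no separate glue code is needed in a parsers module.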
For example, with POI we could push the entire o.a.tika.parser.microsoft
package up to be maintained and included inside POI as something like
o.a.poi.tika, either inside one of the existing POI jars (with tika-core as an
optional dependency) or as a separate poi-tika jar. Then people could get MS
Office support with dependencies on nothing but tika-core and POI. The
tika-parsers component would still exist as a composite that mostly just brings
together all known Apache-compatible parser implementations.
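
The composite role described above can be sketched as a simple router from media type to component parser. This is a hedged, self-contained Java sketch loosely based on the idea behind tika-core's CompositeParser. `FormatParser`, `CompositeFormatParser`, `PlainTextParser` and `HypotheticalOfficeParser` are hypothetical stand-ins, and the office parser is a placeholder that does not read real Office files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Simplified stand-in for a parser contract living in a core module (not a real Tika API).
interface FormatParser {
    Set<String> supportedTypes();

    String parse(String mediaType, byte[] document);
}

// Hypothetical component parser with no dependencies beyond the core contract.
class PlainTextParser implements FormatParser {
    @Override
    public Set<String> supportedTypes() {
        return Collections.singleton("text/plain");
    }

    @Override
    public String parse(String mediaType, byte[] document) {
        return new String(document, StandardCharsets.UTF_8);
    }
}

// Placeholder for a parser an upstream project might ship (e.g. in a poi-tika jar);
// it does not actually read Office files.
class HypotheticalOfficeParser implements FormatParser {
    @Override
    public Set<String> supportedTypes() {
        return Collections.singleton("application/msword");
    }

    @Override
    public String parse(String mediaType, byte[] document) {
        return "office:" + document.length + " bytes";
    }
}

// Composite that only routes each media type to the component that claims it.
final class CompositeFormatParser implements FormatParser {
    private final Map<String, FormatParser> byType = new LinkedHashMap<>();

    CompositeFormatParser(List<FormatParser> components) {
        for (FormatParser component : components) {
            for (String type : component.supportedTypes()) {
                byType.putIfAbsent(type, component); // first component wins on conflicts
            }
        }
    }

    Optional<FormatParser> parserFor(String mediaType) {
        return Optional.ofNullable(byType.get(mediaType));
    }

    @Override
    public Set<String> supportedTypes() {
        return Collections.unmodifiableSet(byType.keySet());
    }

    @Override
    public String parse(String mediaType, byte[] document) {
        FormatParser delegate = parserFor(mediaType)
                .orElseThrow(() -> new IllegalArgumentException("No parser for " + mediaType));
        return delegate.parse(mediaType, document);
    }
}
```

Leaving `HypotheticalOfficeParser` out of the list simply removes `application/msword` from the supported types. This mirrors how someone who needs only some formats could depend on tika-core plus just those upstream jars.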
> Split tika-parsers into separate components
> -------------------------------------------
>
> Key: TIKA-686
> URL: https://issues.apache.org/jira/browse/TIKA-686
> Project: Tika
> Issue Type: Wish
> Components: parser
> Affects Versions: 0.9
> Reporter: Christopher Currie
> Priority: Minor
>
> The email thread [1] from two years ago that led to splitting Tika into
> separate components also suggested splitting tika-parsers into separate
> components based on dependencies. This would be extremely useful, especially
> in cases where a given parser has no dependencies beyond tika-core. Please
> consider refactoring the parsers into separate components for 1.0.
> [1] http://markmail.org/message/tavirkqhn6r2szrz