[ https://issues.apache.org/jira/browse/TIKA-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218553#comment-14218553 ]
Ken Krugler commented on TIKA-1484:
-----------------------------------
1. I assume you can exclude the Boilerpipe jar from the Tika dependency as a
workaround (though only if you don't need Boilerpipe). Or is that not working?
2. Do you have a candidate for replacing Boilerpipe?
3. Another possibility is that we create a facade that lets you plug in the
implementation. This would let us remove the explicit dependency on Boilerpipe.
Though anyone who's dealt with this kind of pluggable lookup for XML parsers
knows that it can also cause pain and suffering.
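
To make option 3 concrete, here is a rough sketch of what such a facade could look like. Everything in it is an assumption for discussion: BoilerplateRemovers, BoilerplateRemover, PassThroughRemover and extractMainText are hypothetical names, not existing Tika or Boilerpipe APIs. The only real API used is java.util.ServiceLoader from the JDK, which is the same style of classpath lookup that makes XML parser selection painful.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical facade: none of these type names exist in Tika or Boilerpipe.
public final class BoilerplateRemovers {

    /** The pluggable contract that callers would code against. */
    public interface BoilerplateRemover {
        String extractMainText(String html);
    }

    /** Fallback used when no implementation is registered on the classpath. */
    static final class PassThroughRemover implements BoilerplateRemover {
        @Override
        public String extractMainText(String html) {
            // No boilerplate removal available, so return the input unchanged.
            return html;
        }
    }

    private BoilerplateRemovers() {
    }

    /** Returns the first registered provider, or the pass-through fallback. */
    public static BoilerplateRemover load() {
        Iterator<BoilerplateRemover> providers =
                ServiceLoader.load(BoilerplateRemover.class).iterator();
        return providers.hasNext() ? providers.next() : new PassThroughRemover();
    }

    public static void main(String[] args) {
        BoilerplateRemover remover = load();
        System.out.println(remover.extractMainText("<p>Hello</p>"));
    }
}
```

A Boilerpipe-backed implementation would then live in its own optional jar and register itself through a META-INF/services entry, so the core would carry no hard dependency on it. The downside is the one noted above: which provider wins depends on classpath order.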
> Boilerpipe dependency is evil
> -----------------------------
>
> Key: TIKA-1484
> URL: https://issues.apache.org/jira/browse/TIKA-1484
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Reporter: Ben McCann
>
> The Boilerpipe project bundles two classes from org.cyberneko.html inside
> its jar. We're already using NekoHTML in our project. Depending on which
> library shows up first on our classpath, certain parts of our project will
> either work or not. I'd really love it if Boilerpipe could be fixed or
> replaced with some other library that is a better citizen.
> I see I'm not the first person to run into this as another Tika user has
> filed a bug on the Boilerpipe project:
> https://code.google.com/p/boilerpipe/issues/detail?id=62
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)