[ https://issues.apache.org/jira/browse/TIKA-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218582#comment-14218582 ]
Ben McCann commented on TIKA-1484:
----------------------------------
Yes, it turns out I can exclude Boilerpipe. I wasn't sure at first because I
didn't know how Tika was using it. I had to check out and read the source
code to determine whether excluding it was safe.
I don't have any candidates for replacing it.
Maybe it could be moved to a separate boilerpipe-parser project?
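
For anyone hitting the same conflict, here is a minimal sketch (not part of
Tika or Boilerpipe) that lists every classpath location providing a given
class, which is one way to confirm that two jars ship the same
org.cyberneko.html class. The class name ClasspathDuplicates, its helper
methods, and the default class org.cyberneko.html.HTMLConfiguration are
assumptions for illustration only; the issue does not say which two classes
Boilerpipe bundles, so pass the class you suspect as the first argument.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika, Boilerpipe, or NekoHTML.
class ClasspathDuplicates {

    // Converts a fully qualified class name to its class-file resource path.
    static String toResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every location visible to the loader that provides the class file.
    static List<URL> locate(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(toResourceName(className)));
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: a NekoHTML class chosen only as an example.
        String name = args.length > 0 ? args[0] : "org.cyberneko.html.HTMLConfiguration";
        List<URL> hits = locate(name, ClasspathDuplicates.class.getClassLoader());

        System.out.println(name + " found in " + hits.size() + " location(s):");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
        if (hits.size() > 1) {
            System.out.println("More than one copy is on the classpath; which one loads depends on ordering.");
        }
    }
}
```

Run it with the same classpath as the affected application. More than one
location in the output means the class is duplicated, and which copy gets
loaded depends on classpath order.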
> Boilerpipe dependency is evil
> -----------------------------
>
> Key: TIKA-1484
> URL: https://issues.apache.org/jira/browse/TIKA-1484
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Reporter: Ben McCann
>
> The Boilerpipe project bundles two classes from org.cyberneko.html inside
> its jar. We're already using NekoHTML in our project. Depending on which
> library shows up first on our classpath, certain parts of our project will
> either work or not. I'd really love it if Boilerpipe could be fixed or
> replaced with some other library that is a better citizen.
> I see I'm not the first person to run into this as another Tika user has
> filed a bug on the Boilerpipe project:
> https://code.google.com/p/boilerpipe/issues/detail?id=62