[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771170#comment-17771170 ]

Tim Allison commented on NUTCH-2959:
------------------------------------

I've continued to stub my toes on this issue this morning.

The best option, which I realize might not be acceptable, seems to be to create 
a separate (temporary!) shim project that shades commons-io for Tika and POI 
and removes xerces/xml-apis.
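To confirm which copy of commons-io (and whether any xerces/xml-apis classes) actually ends up on a job's classpath, before or after such a shim, one option is to ask the JVM where each class was loaded from. This is a minimal diagnostic sketch, not part of the shim. The class `WhichJar` is hypothetical, and so is the relocated name `shaded.org.apache.commons.io.IOUtils`. The real prefix depends on the shim's shade configuration.

```java
import java.security.CodeSource;

public class WhichJar {

    // Reports the jar/directory a class was loaded from, plus its manifest
    // Implementation-Version if present. Classes from the bootstrap loader
    // (e.g. JDK classes) have no code source.
    static String locate(String className) {
        try {
            // initialize=false: just resolve the class, don't run static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return className + " -> (no code source; bootstrap/JDK class)";
            }
            Package p = c.getPackage();
            String version = (p == null) ? null : p.getImplementationVersion();
            return className + " -> " + src.getLocation() + " (version: " + version + ")";
        } catch (ClassNotFoundException e) {
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.commons.io.IOUtils",
            // Hypothetical relocated name; adjust to the shim's actual relocation prefix.
            "shaded.org.apache.commons.io.IOUtils",
            "org.apache.xerces.parsers.SAXParser",
            "javax.xml.parsers.SAXParserFactory"
        };
        for (String n : names) {
            System.out.println(locate(n));
        }
    }
}
```

Running it with the same classpath as the Nutch/Hadoop job (class names can also be passed as arguments) shows whether the unshaded commons-io comes from Hadoop's jars or from Tika/POI's, and whether a stray xerces jar is still present.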

The shaded fat tika-app jar didn't work because of xerces/xml-apis.

The current shim project is here: https://github.com/tballison/hadoop-safe-tika

If this is something we want to pursue, I can run through the full tests, etc.,
and then publish to Maven Central. I also have to add the language detector.
The repo is purely a proof of concept and shouldn't even be built or tested
locally yet.

The goal would be to use this until Apache Tika, Apache POI and Apache Hadoop 
can all get to a compatible version of commons-io.

This solution would allow us to avoid the messy shading of commons-io in 
tika-app on the actual Apache Tika project.

WDYT?

> Upgrade to Apache Tika 2.9.0
> ----------------------------
>
>                 Key: NUTCH-2959
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2959
>             Project: Nutch
>          Issue Type: Task
>    Affects Versions: 1.19
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.20
>
>         Attachments: NUTCH-2959.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
