[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771170#comment-17771170
]
Tim Allison edited comment on NUTCH-2959 at 10/2/23 3:51 PM:
-------------------------------------------------------------
I've continued to stub my toes on this issue this morning.
The best option, which I acknowledge might not be acceptable, seems to be to
create a separate (temporary!) shim project that shades commons-io for Tika and
POI and removes xerces/xml-apis.
The shaded fat tika-app jar didn't work because of xerces/xml-apis. I could
have done some ugly jar rewriting in Ant to delete org/apache/xerces, etc., but
that felt really awful.
The current shim project is here: https://github.com/tballison/hadoop-safe-tika
If this is something we want to pursue, I can run through the full tests, etc.,
and then publish to Maven Central. I also still have to add the language
detector. The repo is purely a proof of concept and shouldn't be built or
tested locally yet.
The goal would be to use this until Apache Tika, Apache POI and Apache Hadoop
can all get to a compatible version of commons-io.
This solution would allow us to avoid the messy shading of commons-io in
tika-app on the actual Apache Tika project.
WDYT?
> Upgrade to Apache Tika 2.9.0
> ----------------------------
>
> Key: NUTCH-2959
> URL: https://issues.apache.org/jira/browse/NUTCH-2959
> Project: Nutch
> Issue Type: Task
> Affects Versions: 1.19
> Reporter: Markus Jelsma
> Priority: Major
> Fix For: 1.20
>
> Attachments: NUTCH-2959.patch
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)