[ 
https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaniv Kunda updated TIKA-1706:
------------------------------
    Attachment: TIKA-1706-2.patch
                TIKA-1706-1.patch

Proposed patches, following [~grossws]'s suggestion on the dev mailing list.
The first patch contains the following:
- creation of the secondary jar using maven-shade-plugin:
-- uses the *uber* classifier, set via {{<shadedClassifierName>}}; 
alternatives include shaded, nodep, all, etc. Which one is best?
-- commons-io is shaded under 
{{shaded.commons-io.$\{commons.io.version\}.org.apache.commons.io}} to avoid 
potential conflicts with other dependencies that shade commons-io, e.g. 
org.ops4j.pax.url:pax-url-aether:2.3.0
-- unused classes are removed automatically using {{<minimizeJar>}}
- deprecated all classes that were copied from commons-io and modified them to 
extend their new counterparts 
- deprecated all constructors
- removed all identical or functionally identical methods
- modified all remaining methods to call existing JDK/commons-io 
alternatives, deprecated them and referred to the alternatives used
_Note: this was done only in IOUtils, where many methods that have the same 
signatures as the ones in commons-io were modified along the way to use UTF-8 
instead of the platform default._
- everything should remain backward-compatible, with one exception: 
org.apache.tika.io.TaggedIOException(IOException, Object) will now throw a 
ClassCastException if the Object is not Serializable
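
To illustrate the deprecate-and-delegate approach and the one incompatibility 
listed above, here is a minimal sketch. It is not taken from the patch: the 
class names UpstreamTaggedIOException, LegacyTaggedIOException and 
LegacyIOUtils are hypothetical stand-ins (for the commons-io class, the 
deprecated tika-core class and the deprecated IOUtils respectively), and the 
sketch assumes the commons-io constructor accepts a Serializable tag.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the commons-io exception; assumed to take a Serializable tag.
class UpstreamTaggedIOException extends IOException {
    private final Serializable tag;

    UpstreamTaggedIOException(IOException original, Serializable tag) {
        super(original.getMessage(), original);
        this.tag = tag;
    }

    Serializable getTag() {
        return tag;
    }
}

// Hypothetical stand-in for the deprecated tika-core class that now extends its counterpart.
@Deprecated
class LegacyTaggedIOException extends UpstreamTaggedIOException {
    // The old (IOException, Object) signature is kept for source compatibility.
    // The cast is where a non-Serializable tag fails with ClassCastException.
    LegacyTaggedIOException(IOException original, Object tag) {
        super(original, (Serializable) tag);
    }
}

// Hypothetical stand-in for a deprecated IOUtils method that keeps its UTF-8 behaviour.
@Deprecated
final class LegacyIOUtils {
    private LegacyIOUtils() {
    }

    /**
     * @deprecated callers should move to an alternative that takes an explicit charset
     */
    @Deprecated
    static String toString(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        // Decode explicitly as UTF-8 rather than with the platform default charset.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With these stand-ins, new LegacyTaggedIOException(e, "some tag") works as 
before, while new LegacyTaggedIOException(e, new Object()) fails at 
construction time with a ClassCastException.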

The second patch contains trivial import changes in tika-core, from 
org.apache.tika.io to org.apache.commons.io.

> Bring back commons-io to tika-core
> ----------------------------------
>
>                 Key: TIKA-1706
>                 URL: https://issues.apache.org/jira/browse/TIKA-1706
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yaniv Kunda
>            Priority: Minor
>             Fix For: 1.11
>
>         Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the 
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is 
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which 
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt 
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset 
> objects instead of encoding names, being able to use StringBuilder instead of 
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes 
> with commons-io classes if this is accepted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
