[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yaniv Kunda updated TIKA-1706:
------------------------------
    Attachment: TIKA-1706-2.patch
                TIKA-1706-1.patch

A proposed patch per [~grossws]'s suggestion from the dev mailing list.

The first patch contains the following:
- creation of the secondary jar using maven-shade-plugin:
-- used the *uber* classifier via <shadedClassifierName> (alternatives: shaded, nodep, all, etc.; which one is best?)
-- commons-io shaded under {{shaded.commons-io.$\{commons.io.version\}.org.apache.commons.io}} to avoid potential conflicts with other dependencies that also shade commons-io, e.g. org.ops4j.pax.url:pax-url-aether:2.3.0
-- automatic removal of unused classes using <minimizeJar>
- deprecated all classes that were copied from commons-io and modified them to extend their new counterparts
- deprecated all constructors
- removed all identical or functionally identical methods
- modified all remaining methods to call existing JDK/commons-io alternatives, deprecated them, and referred to the alternatives used
_*Note:* this was done only in IOUtils, where many methods with the same signatures as those in commons-io were modified along the way to use UTF-8 instead of the platform default._
- everything should remain backward-compatible, except for one thing: org.apache.tika.io.TaggedIOException(IOException, Object) will now throw a ClassCastException if the Object is not Serializable

The second patch contains trivial import changes in tika-core, from org.apache.tika.io to org.apache.commons.io.

> Bring back commons-io to tika-core
> ----------------------------------
>
>                 Key: TIKA-1706
>                 URL: https://issues.apache.org/jira/browse/TIKA-1706
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yaniv Kunda
>            Priority: Minor
>             Fix For: 1.11
>
>         Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which
> code should be used
> - Having the inlined classes causes more maintenance and/or technology debt
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset
> objects instead of encoding names, being able to use StringBuilder instead of
> StringBuffer, and so on.
>
> I'll be happy to provide a patch to replace usages of the inlined classes
> with commons-io classes if this is accepted.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
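The one backward incompatibility listed in the patch comment follows from making the deprecated Tika class extend its commons-io counterpart: commons-io's TaggedIOException constructor takes an (IOException, Serializable) pair, while Tika's takes (IOException, Object), so the old constructor has to cast its tag. The minimal Java sketch below illustrates that under stated assumptions. CommonsTaggedIOException and TikaTaggedIOException are hypothetical stand-ins, not the real classes and not the code in the attached patches, so that the example compiles without commons-io on the classpath.

```java
import java.io.IOException;
import java.io.Serializable;

public class TaggedIOExceptionSketch {

    /** Hypothetical stand-in for the commons-io class: the tag must be Serializable. */
    static class CommonsTaggedIOException extends IOException {
        private final Serializable tag;

        CommonsTaggedIOException(IOException original, Serializable tag) {
            super(original.getMessage(), original);
            this.tag = tag;
        }

        Serializable getTag() {
            return tag;
        }
    }

    /**
     * Hypothetical stand-in for the deprecated Tika class after the change:
     * it keeps the old (IOException, Object) signature but now extends the
     * commons-io counterpart, so the tag has to be cast to Serializable.
     */
    @Deprecated
    static class TikaTaggedIOException extends CommonsTaggedIOException {
        TikaTaggedIOException(IOException original, Object tag) {
            // The cast throws ClassCastException when tag is not Serializable.
            super(original, (Serializable) tag);
        }
    }

    public static void main(String[] args) {
        IOException cause = new IOException("read failed");

        // A Serializable tag (here a String) keeps working as before.
        TikaTaggedIOException ok = new TikaTaggedIOException(cause, "stream-1");
        System.out.println("tag = " + ok.getTag());

        // A non-Serializable tag (a plain Object) is the one incompatibility.
        try {
            new TikaTaggedIOException(cause, new Object());
            System.out.println("no exception");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```

Running main prints the tag for the String case and then reports the ClassCastException for the plain Object tag. Callers that already tag exceptions with Serializable objects such as Strings are unaffected by the change.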