[ https://issues.apache.org/jira/browse/TIKA-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-3650.
-------------------------------
Fix Version/s: 2.2.2
Resolution: Fixed
> Removal of duplicate javax/* classes from tika-app jar
> -------------------------------------------------------
>
> Key: TIKA-3650
> URL: https://issues.apache.org/jira/browse/TIKA-3650
> Project: Tika
> Issue Type: Improvement
> Environment: Java 8
> Reporter: Aravinth
> Priority: Major
> Fix For: 2.2.2
>
>
> The javax.xml.parsers.DocumentBuilderFactory class is present both in rt.jar
> (from the JDK) and in tika-app.jar.
> We use a child-first classloader to isolate tika-app.jar from the rest of the
> classpath for file parsing, so the child-first classloader loads
> DocumentBuilderFactory (an abstract class) from tika-app.jar.
> If tika-app.jar did not contain DocumentBuilderFactory, the class would be
> loaded from rt.jar instead.
> During the service loader lookup, there is a check that the factory type and
> the implementation class are assignable to each other. That check fails for
> us, because DocumentBuilderFactory is loaded from tika-app.jar.
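The child-first delegation described above can be sketched in Java as follows. This is a hypothetical minimal loader (the name ChildFirstClassLoader is illustrative, not the reporter's actual class): it searches its own URLs before the parent and delegates only java.* names, which the JVM refuses to let application loaders define. Because javax.* is not delegated, a javax/xml/parsers copy inside tika-app.jar would win over the one in rt.jar, which is the situation reported here.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first classloader: classes are looked up in this loader's
 * own URLs first and in the parent only if they are not found there.
 */
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            // java.* must come from the JDK; note that javax.* is NOT excluded here.
            if (c == null && !name.startsWith("java.")) {
                try {
                    // Child first: search our own URLs (e.g. tika-app.jar).
                    c = findClass(name);
                } catch (ClassNotFoundException notInChild) {
                    // Not in the child; fall back to the parent below.
                }
            }
            if (c == null) {
                // Parent last: JDK classes such as those in rt.jar are found here.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Embedding applications that use such a loader sometimes also delegate javax.xml.* to the parent; that is a choice made by the host application, not something the jar itself controls.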
>
>     public static DocumentBuilderFactory newInstance() {
>         return FactoryFinder.find(
>                 /* The default property name according to the JAXP spec */
>                 DocumentBuilderFactory.class, // "javax.xml.parsers.DocumentBuilderFactory"
>                 /* The fallback implementation class name */
>                 "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
>     }
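The assignability check mentioned above amounts to a Class.isAssignableFrom test before the provider is instantiated. Below is a simplified model of it; newProvider and ProviderCheck are hypothetical names, and the JDK's internal FactoryFinder differs in its details.

```java
/**
 * Simplified, hypothetical model of the check a JAXP factory lookup performs
 * before instantiating a provider class.
 */
final class ProviderCheck {

    private ProviderCheck() {}

    static <T> T newProvider(Class<T> type, Class<?> providerClass) {
        // Class identity includes the defining classloader, so a provider that
        // extends a *different copy* of the factory type fails this test.
        if (!type.isAssignableFrom(providerClass)) {
            throw new ClassCastException(
                    providerClass.getName() + " cannot be cast to " + type.getName());
        }
        try {
            return type.cast(providerClass.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + providerClass.getName(), e);
        }
    }
}
```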
>
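> DocumentBuilderFactory.class: the class literal is resolved through the
> classloader that defined the class containing it, regardless of which
> classloader (for example, the thread context classloader) is current. Since
> the code above is in rt.jar, it always returns the Class object for the copy
> of DocumentBuilderFactory defined by the default (bootstrap) classloader.
>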
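A quick way to confirm which loader defined the factory class is to print its classloader, assuming a standard JDK where DocumentBuilderFactory comes from the boot class path (rt.jar on Java 8, the java.xml module on later releases). The bootstrap loader is reported as null. The class name WhoLoadedIt is illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;

/** Compares the loader of the JDK's DocumentBuilderFactory with the context loader. */
final class WhoLoadedIt {

    private WhoLoadedIt() {}

    /** Describes a loader; the bootstrap loader is represented by null. */
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap" : loader.toString();
    }

    static String jaxpFactoryLoader() {
        // The literal is resolved via the loader that defined WhoLoadedIt, which
        // delegates JDK classes upward; the thread context loader is not consulted.
        return describe(DocumentBuilderFactory.class.getClassLoader());
    }

    public static void main(String[] args) {
        System.out.println("DocumentBuilderFactory: " + jaxpFactoryLoader());
        System.out.println("context loader: "
                + describe(Thread.currentThread().getContextClassLoader()));
    }
}
```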
> During Tika parsing, however, the current classloader is our child-first
> classloader rather than the default one, and it loads both
> DocumentBuilderFactory and its implementation from tika-app.jar.
> So DocumentBuilderFactory.class comes from the default classloader, while the
> implementation class org.apache.xerces.jaxp.DocumentBuilderFactoryImpl, along
> with the copy of DocumentBuilderFactory it extends, is loaded by the
> child-first classloader. As a result, the two are not assignable to each
> other.
>
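The failure can be reproduced without Tika or Xerces. The sketch below uses hypothetical stand-ins, ParserFactory (playing DocumentBuilderFactory) and ParserFactoryImpl (playing the Xerces implementation), plus a small IsolatingLoader that defines its own copies of both from the same bytes. It assumes the classes are compiled to a directory or jar, so their .class files are readable as resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for DocumentBuilderFactory (also an abstract class). */
abstract class ParserFactory {}

/** Hypothetical stand-in for the Xerces DocumentBuilderFactoryImpl. */
class ParserFactoryImpl extends ParserFactory {}

/** Defines its own copy of the named classes instead of asking the parent. */
final class IsolatingLoader extends ClassLoader {
    private final Set<String> isolated;

    IsolatingLoader(ClassLoader parent, Set<String> isolated) {
        super(parent);
        this.isolated = isolated;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!isolated.contains(name)) {
            return super.loadClass(name, resolve); // ordinary parent-first lookup
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] bytes = readClassBytes(name);
                c = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Reads the .class file bytes through the parent's resources. */
    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

final class TwoCopiesDemo {
    private TwoCopiesDemo() {}

    /** Loads ParserFactoryImpl, and the ParserFactory it extends, in a separate loader. */
    static Class<?> isolatedImpl() {
        Set<String> names = new HashSet<>(Arrays.asList(
                ParserFactory.class.getName(), ParserFactoryImpl.class.getName()));
        try {
            return new IsolatingLoader(ParserFactory.class.getClassLoader(), names)
                    .loadClass(ParserFactoryImpl.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> impl = isolatedImpl();
        // Same name, different defining loader: not a subtype of *our* ParserFactory.
        System.out.println(ParserFactory.class.isAssignableFrom(impl));
        // Inside the isolating loader the hierarchy is consistent.
        System.out.println(impl.getSuperclass().isAssignableFrom(impl));
    }
}
```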
> In the usual scenario (most users, I assume, use a parent-first classloader),
> javax.xml.parsers.DocumentBuilderFactory is always loaded from rt.jar, which
> includes it in Java 8. The copy of javax.xml.parsers.DocumentBuilderFactory
> inside tika-app.jar is therefore redundant.
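To find which entries in a jar duplicate classes the JDK already provides, one option is to ask the bootstrap loader for each class name. RedundantJdkClasses is a hypothetical helper, not part of Tika's build. On Java 9 and later it only recognizes classes in modules defined to the bootstrap loader (java.xml is one of them), so JDK classes from platform modules would be missed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: lists jar entries whose classes the JDK already ships. */
final class RedundantJdkClasses {
    private RedundantJdkClasses() {}

    /** True if the bootstrap loader can already load the class behind this entry. */
    static boolean providedByJdk(String entryName) {
        if (!entryName.endsWith(".class") || entryName.startsWith("META-INF/")) {
            return false;
        }
        String className = entryName
                .substring(0, entryName.length() - ".class".length())
                .replace('/', '.');
        try {
            // A null loader means the bootstrap loader; initialize=false skips static init.
            Class.forName(className, false, null);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Keeps only the entry names that duplicate JDK classes. */
    static List<String> filter(Iterable<String> entryNames) {
        List<String> hits = new ArrayList<>();
        for (String name : entryNames) {
            if (providedByJdk(name)) {
                hits.add(name);
            }
        }
        return hits;
    }

    /** Reads all entry names from a jar file and filters them. */
    static List<String> scan(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return filter(names);
    }

    public static void main(String[] args) throws IOException {
        for (String name : scan(args[0])) {
            System.out.println(name);
        }
    }
}
```

Run against a jar (for example `java RedundantJdkClasses tika-app.jar`, where the path is a placeholder), it would list an entry such as javax/xml/parsers/DocumentBuilderFactory.class if the jar still bundles that class.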
--
This message was sent by Atlassian Jira
(v8.20.1#820001)