[
https://issues.apache.org/jira/browse/TIKA-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aravinth updated TIKA-3650:
---------------------------
Summary: Removal of duplicate javax/* classes from tika-app jar (was:
Removal of duplicate classes from Xerces in tika-app jar)
> Removal of duplicate javax/* classes from tika-app jar
> -------------------------------------------------------
>
> Key: TIKA-3650
> URL: https://issues.apache.org/jira/browse/TIKA-3650
> Project: Tika
> Issue Type: Improvement
> Environment: Java 8
> Reporter: Aravinth
> Priority: Major
>
> javax.xml.parsers.DocumentBuilderFactory.class is present in both rt.jar
> (from the JDK) and tika-app.jar.
> We use a child-first classloader to isolate the tika-app jar from the rest
> of the classpath during file parsing, so the child-first classloader loads
> the DocumentBuilderFactory class from the tika-app jar.
> If tika-app.jar did not contain the DocumentBuilderFactory class, the
> class would be loaded from rt.jar.
> Inside the service lookup, there is a check that validates whether the
> implementation class is assignable to the factory class. This check fails
> for us because the factory class is loaded from the tika-app jar.
> {{public static DocumentBuilderFactory newInstance() {}}
> {{ return FactoryFinder.find(}}
> {{ /* The default property name according to the JAXP spec */}}
> {{ DocumentBuilderFactory.class, // "javax.xml.parsers.DocumentBuilderFactory"}}
> {{ /* The fallback implementation class name */}}
> {{ "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");}}
> {{}}}
>
> DocumentBuilderFactory.class: this class literal is resolved by the
> classloader of the code that contains it, here the default classloader,
> regardless of which classloader is current at the time of the call.
> So it always returns the Class object from the default classloader.
> During Tika parsing, however, the active classloader is the child-first
> classloader rather than the default one, and it loads both the factory
> class and the implementation from the tika-app jar.
> Because DocumentBuilderFactory.class comes from the default classloader
> while the implementation class
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl is created in a different
> classloader (which also loads its own copy of the factory class),
> the two are not assignable to each other.
> In the normal scenario (I assume most of us use a parent-first
> classloader), javax.xml.parsers.DocumentBuilderFactory.class is always
> loaded from rt.jar (which Java 8 ships). The copy of
> javax.xml.parsers.DocumentBuilderFactory inside the tika-app jar is
> therefore redundant.
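
The assignability failure described above can be reproduced in isolation.
The following is only a minimal sketch, not code from Tika or the JDK: the
names ClassIdentityDemo, Factory and ChildFirstLoader are hypothetical, and
Factory merely stands in for a class that exists on both the parent and the
child classpath (as DocumentBuilderFactory does here). It assumes the
compiled class files are readable as resources from the parent classloader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassIdentityDemo {

    /** Hypothetical stand-in for a class present in both classpaths. */
    public static class Factory {}

    /** Hypothetical child-first loader: defines the target class itself. */
    static class ChildFirstLoader extends ClassLoader {
        private final String target;

        ChildFirstLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(target)) {
                // Everything else is delegated parent-first as usual.
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Child-first: define our own copy from the same bytes.
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a second copy of Factory through the child-first loader. */
    static Class<?> loadChildCopy() throws ClassNotFoundException {
        ClassLoader parent = ClassIdentityDemo.class.getClassLoader();
        ClassLoader child = new ChildFirstLoader(parent, Factory.class.getName());
        return Class.forName(Factory.class.getName(), false, child);
    }

    /** True only if the parent's Factory accepts the child's copy. */
    static boolean assignableAcrossLoaders() throws ClassNotFoundException {
        return Factory.class.isAssignableFrom(loadChildCopy());
    }

    public static void main(String[] args) throws Exception {
        Class<?> childCopy = loadChildCopy();
        System.out.println("same name:   "
                + childCopy.getName().equals(Factory.class.getName()));
        System.out.println("same class:  " + (childCopy == Factory.class));
        System.out.println("assignable:  " + assignableAcrossLoaders());
    }
}
```

Both copies share the same fully qualified name, but the JVM treats a class
as the pair of its name and its defining classloader, so the two Class
objects are distinct and the assignability check between them fails. If the
duplicate were absent from the child's classpath, the lookup would fall
through to the parent and only one copy would exist.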
--
This message was sent by Atlassian Jira
(v8.20.1#820001)