[ https://issues.apache.org/jira/browse/TIKA-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478058#comment-17478058 ]

Hudson commented on TIKA-3650:
------------------------------
UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #425 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/425/])
TIKA-3650 - Removal of duplicate javax/* classes from tika-app jar (#481)
(github:
[https://github.com/apache/tika/commit/40c79ac98ddc9c882c6274d09aba9ca3d5c112e9])
* (edit) tika-app/pom.xml
> Removal of duplicate javax/* classes from tika-app jar
> -------------------------------------------------------
>
> Key: TIKA-3650
> URL: https://issues.apache.org/jira/browse/TIKA-3650
> Project: Tika
> Issue Type: Improvement
> Environment: Java 8
> Reporter: Aravinth
> Priority: Major
> Fix For: 2.2.2
>
>
> javax.xml.parsers.DocumentBuilderFactory.class is present both in rt.jar
> (from the JDK) and in tika-app.jar.
> We use a child-first classloader to isolate the tika-app jar from the
> classpath for file parsing, so the child-first classloader loads the
> DocumentBuilderFactory interface from the tika-app jar.
> If tika-app.jar did not contain the DocumentBuilderFactory class, the
> class would be loaded from rt.jar.
> Inside the service loader there is a check that validates whether the
> implementation class is assignable to the interface. This check fails
> for us because the interface is loaded from the tika-app jar.
>
>     public static DocumentBuilderFactory newInstance() {
>         return FactoryFinder.find(
>                 /* The default property name according to the JAXP spec */
>                 DocumentBuilderFactory.class, // "javax.xml.parsers.DocumentBuilderFactory"
>                 /* The fallback implementation class name */
>                 "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
>     }
>
> DocumentBuilderFactory.class: the .class literal is resolved by the
> classloader of the code that contains it (here the default classloader),
> regardless of which classloader is current for the thread.
> So it always returns the class object from the default classloader.
> During Tika parsing, however, the active classloader is the child-first
> classloader rather than the default one, and it loads both the interface
> and the implementation from the tika-app jar.
> Because DocumentBuilderFactory.class comes from the default classloader
> while the implementation class
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl is created in a different
> classloader (which also loaded its own copy of the interface), the two
> are not assignable to each other.
> In a normal scenario (I assume most of us use a parent-first classloader),
> javax.xml.parsers.DocumentBuilderFactory.class is always loaded from
> rt.jar (which Java 8 ships). The copy of
> javax.xml.parsers.DocumentBuilderFactory inside the tika-app jar is
> therefore redundant.
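
The assignability failure described in the issue can be reproduced without
Tika. What follows is a minimal sketch, not code from the issue or from Tika:
ChildFirstDemo, Factory, FactoryImpl and ChildFirstLoader are hypothetical
names, with Factory standing in for javax.xml.parsers.DocumentBuilderFactory
and FactoryImpl for the Xerces implementation. It assumes the compiled .class
files are reachable as classpath resources, so that the child loader can
define its own copies of them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChildFirstDemo {

    /** Hypothetical stand-in for javax.xml.parsers.DocumentBuilderFactory. */
    public interface Factory {}

    /** Hypothetical stand-in for the bundled implementation class. */
    public static class FactoryImpl implements Factory {}

    /** Names starting with this prefix are defined by the child, not the parent. */
    private static final String PREFIX = ChildFirstDemo.class.getName() + "$Factory";

    /** Minimal child-first loader: defines the demo classes itself. */
    static class ChildFirstLoader extends ClassLoader {
        ChildFirstLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                if (!name.startsWith(PREFIX)) {
                    // Everything else (java.lang.Object, ...) goes parent-first.
                    return super.loadClass(name, resolve);
                }
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file bytes through the parent's resource lookup. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads FactoryImpl through a fresh child-first loader. */
    public static Class<?> loadImplChildFirst() {
        try {
            ClassLoader child =
                    new ChildFirstLoader(ChildFirstDemo.class.getClassLoader());
            return child.loadClass(FactoryImpl.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> impl = loadImplChildFirst();
        Class<?> childFactory = impl.getInterfaces()[0];

        // Interface from the default loader vs. implementation from the child.
        System.out.println(Factory.class.isAssignableFrom(impl));  // false
        // Interface and implementation both from the child loader.
        System.out.println(childFactory.isAssignableFrom(impl));   // true
    }
}
```

Both Class objects for the interface carry the same name, but they are
distinct types because they were defined by different loaders. If the child
loader delegated the interface to its parent instead of defining its own
copy, the first check would also print true, which is the effect of dropping
the duplicate javax/* classes from the jar.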