[ https://issues.apache.org/jira/browse/TIKA-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3650.
-------------------------------
    Fix Version/s: 2.2.2
       Resolution: Fixed

> Removal of duplicate  javax/* classes from tika-app jar
> -------------------------------------------------------
>
>                 Key: TIKA-3650
>                 URL: https://issues.apache.org/jira/browse/TIKA-3650
>             Project: Tika
>          Issue Type: Improvement
>         Environment: Java 8 
>            Reporter: Aravinth
>            Priority: Major
>             Fix For: 2.2.2
>
>
> The class javax.xml.parsers.DocumentBuilderFactory is present both in rt.jar
> (shipped with the JDK) and in tika-app.jar.
> We use a child-first classloader to isolate the tika-app jar from the rest of
> the classpath for file parsing, so that classloader loads the
> DocumentBuilderFactory abstract class from the tika-app jar.
> If tika-app.jar did not contain DocumentBuilderFactory, the class would be
> loaded from rt.jar instead.
> Inside the ServiceLoader, a check verifies that the implementation class is
> assignable to the API class. That check fails for us, because the API class
> is loaded from the tika-app jar.
>  
> {code:java}
> // JDK 8 source of javax.xml.parsers.DocumentBuilderFactory (in rt.jar)
> public static DocumentBuilderFactory newInstance() {
>     return FactoryFinder.find(
>             /* The default property name according to the JAXP spec */
>             DocumentBuilderFactory.class, // "javax.xml.parsers.DocumentBuilderFactory"
>             /* The fallback implementation class name */
>             "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
> }
> {code}
>  
> The class literal DocumentBuilderFactory.class is resolved by the classloader
> that defined the code containing it (here, the JDK's loader for rt.jar), no
> matter which classloader is current for the thread. So it always returns the
> Class object from the JDK's classloader.
> During Tika parsing, however, the classloader in use is the child-first one,
> and it loads both the API class and the implementation from the tika-app jar.
> Because DocumentBuilderFactory.class comes from the JDK's classloader, while
> the implementation class org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
> is defined by the child-first classloader (which also loaded its own copy of
> the API class), the two are not assignable to each other.
> In the normal scenario (I assume most users have a parent-first
> classloader), javax.xml.parsers.DocumentBuilderFactory is always loaded
> from rt.jar, which Java 8 provides. The copy of
> javax.xml.parsers.DocumentBuilderFactory inside the tika-app jar is
> therefore redundant.
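
To make the child-first delegation described in the report concrete, here is a minimal sketch of such a loader. ChildFirstClassLoader is a hypothetical name, not the reporter's actual class or a Tika API; it assumes the isolated jars (e.g. tika-app.jar) are passed in as its URLs. Only java.* is always delegated to the parent, because the JVM refuses to let a user classloader define classes in java.* packages. javax.* has no such protection, which is how a bundled javax/xml/parsers/DocumentBuilderFactory.class can take precedence over the rt.jar copy.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first classloader, for illustration only. It searches its
 * own URLs (e.g. tika-app.jar) before asking its parent, the reverse of the
 * JDK's default parent-first delegation.
 */
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // The JVM forbids user loaders from defining java.* classes,
                    // so those always come from the parent. javax.* has no such rule.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: a javax/* class bundled in the jar wins here.
                        c = findClass(name);
                    } catch (ClassNotFoundException notInChild) {
                        // Only if the jar lacks the class, fall back to the parent.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With a jar that does not bundle DocumentBuilderFactory, this loader falls through to the parent and returns the rt.jar class, which matches the report's point that removing the bundled copy avoids the problem.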

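The assignability failure itself can be reproduced without Tika or Xerces. In the sketch below every class name is hypothetical: ParserApi stands in for DocumentBuilderFactory, ParserImpl for DocumentBuilderFactoryImpl, and IsolatingLoader for a loader holding a second copy of the API class. The same class bytes are defined in two loaders, and a Class.isAssignableFrom check like the one described in the report then fails. It assumes the classes are compiled to a normal classpath, so their .class bytes can be read back as resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-ins: ParserApi plays DocumentBuilderFactory,
// ParserImpl plays org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.
class ParserApi {}

class ParserImpl extends ParserApi {}

/**
 * Hypothetical loader that defines its own private copy of one class from that
 * class's bytecode (as a child-first loader does with a bundled javax/* class)
 * and delegates every other class to its parent as usual.
 */
class IsolatingLoader extends ClassLoader {
    private final Class<?> template;

    IsolatingLoader(Class<?> template) {
        super(template.getClassLoader());
        this.template = template;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(template.getName())) {
            return super.loadClass(name, resolve); // normal parent-first path
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] bytes = readBytecode(name);
                // Same name, same bytes, different defining loader => a different type.
                c = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private byte[] readBytecode(String name) throws ClassNotFoundException {
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

ParserImpl was linked against the parent's ParserApi, so it is assignable to that copy but not to the isolated one, even though both copies share the same fully qualified name.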

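The issue title asks for removing duplicate javax/* classes from the tika-app jar. A check along the following lines could flag such entries in any jar. It is a hypothetical helper, not part of Tika's build: it lists javax/*.class entries that the running JDK already serves through the application loader's parent (the extension loader on Java 8, the platform loader on Java 9+). Results therefore depend on which JDK runs it; for example, javax/xml/bind classes count as duplicates on Java 8 but not on Java 11, where JAXB is no longer part of the JDK.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical build-time check: lists javax/* class entries in a jar that the
 * running JDK already provides through its own loaders.
 */
class DuplicateJdkClassScanner {

    static List<String> findDuplicates(File jar) throws IOException {
        // Parent of the application loader (extension loader on Java 8, platform
        // loader on Java 9+); it sees JDK classes but not the application classpath.
        ClassLoader jdkLoader = ClassLoader.getSystemClassLoader().getParent();
        List<String> duplicates = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Flag only javax/* class files that the JDK can already supply.
                if (name.startsWith("javax/") && name.endsWith(".class")
                        && jdkLoader.getResource(name) != null) {
                    duplicates.add(name);
                }
            }
        }
        return duplicates;
    }
}
```

Run against a jar such as tika-app.jar, an entry like javax/xml/parsers/DocumentBuilderFactory.class would be reported on a standard JDK, while javax/* packages the JDK does not ship are left alone.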

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
