[ 
https://issues.apache.org/jira/browse/TIKA-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477767#comment-17477767
 ] 

ASF GitHub Bot commented on TIKA-3650:
--------------------------------------

imaravin opened a new pull request #481:
URL: https://github.com/apache/tika/pull/481


   The pull request for https://issues.apache.org/jira/browse/TIKA-3650




> Removal of duplicate javax/* classes from tika-app jar
> ------------------------------------------------------
>
>                 Key: TIKA-3650
>                 URL: https://issues.apache.org/jira/browse/TIKA-3650
>             Project: Tika
>          Issue Type: Improvement
>         Environment: Java 8 
>            Reporter: Aravinth
>            Priority: Major
>
> The class javax.xml.parsers.DocumentBuilderFactory is present both in rt.jar 
> from the JDK and in tika-app.jar. 
> We use a child-first classloader to isolate the tika-app jar from the rest 
> of the classpath for file parsing, so the child-first classloader loads the 
> DocumentBuilderFactory interface from the tika-app jar. 
> If tika-app.jar did not contain the DocumentBuilderFactory class, the 
> class would be loaded from rt.jar. 
> Inside the service loader, a check validates that the implementation class 
> is assignable to the interface. This check fails for us because the 
> interface is loaded from the tika-app jar. 
>  
> public static DocumentBuilderFactory newInstance() {
>     return FactoryFinder.find(
>         /* The default property name according to the JAXP spec */
>         DocumentBuilderFactory.class, // "javax.xml.parsers.DocumentBuilderFactory"
>         /* The fallback implementation class name */
>         "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
> }
>  
> DocumentBuilderFactory.class: this class literal is resolved through the 
> classloader of the class that contains it, not through the thread's current 
> (context) classloader. 
> In the JDK's code it therefore always returns the class object from the 
> default classloader. 
> During Tika parsing, however, the active classloader is the child-first 
> classloader rather than the default one, and it loads both the interface 
> and the implementation from the tika-app jar. 
> Because DocumentBuilderFactory.class comes from the default classloader, 
> while the implementation class 
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl (together with its own 
> copy of the interface) is loaded by the child-first classloader, 
> the two are not assignable to each other. 
> In the normal scenario (I assume most of us use a parent-first 
> classloader), javax.xml.parsers.DocumentBuilderFactory is always loaded 
> from rt.jar, which Java 8 ships. The copy of 
> javax.xml.parsers.DocumentBuilderFactory inside the tika-app jar is 
> therefore redundant. 
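
The assignability failure described in the issue can be reproduced in isolation. The following is a minimal sketch, not Tika or JDK code: ChildFirstDemo, Factory, FactoryImpl and ChildFirstLoader are hypothetical names, with Factory standing in for javax.xml.parsers.DocumentBuilderFactory and FactoryImpl for an implementation such as DocumentBuilderFactoryImpl. It assumes the demo's class files can be read as resources from the loader that started the program.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChildFirstDemo {
    /** Hypothetical stand-in for javax.xml.parsers.DocumentBuilderFactory. */
    public abstract static class Factory {}

    /** Hypothetical stand-in for an implementation class. */
    public static class FactoryImpl extends Factory {}

    /** Child-first loader: defines the two demo classes itself instead of delegating. */
    static class ChildFirstLoader extends ClassLoader {
        ChildFirstLoader(ClassLoader parent) {
            super(parent);
        }

        private static boolean isOwn(String name) {
            return name.equals(Factory.class.getName())
                    || name.equals(FactoryImpl.class.getName());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null && isOwn(name)) {
                    c = defineFromParentBytes(name); // child first
                }
                if (c == null) {
                    return super.loadClass(name, resolve); // everything else: parent first
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file through the parent and defines a second copy here. */
        private Class<?> defineFromParentBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Parent's Factory versus the child's FactoryImpl: two different loaders. */
    static boolean assignableAcrossLoaders() throws ClassNotFoundException {
        ClassLoader child = new ChildFirstLoader(ChildFirstDemo.class.getClassLoader());
        Class<?> implFromChild = Class.forName(FactoryImpl.class.getName(), false, child);
        return Factory.class.isAssignableFrom(implFromChild);
    }

    /** Child's Factory versus the child's FactoryImpl: the same loader. */
    static boolean assignableWithinChild() throws ClassNotFoundException {
        ClassLoader child = new ChildFirstLoader(ChildFirstDemo.class.getClassLoader());
        Class<?> implFromChild = Class.forName(FactoryImpl.class.getName(), false, child);
        Class<?> factoryFromChild = Class.forName(Factory.class.getName(), false, child);
        return factoryFromChild.isAssignableFrom(implFromChild);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("across loaders: " + assignableAcrossLoaders());
        System.out.println("within child:   " + assignableWithinChild());
    }
}
```

Two classes with the same name are distinct types when different classloaders define them, so the first check is expected to fail and the second to pass. In this sketch, removing Factory from the set of classes the child defines itself makes the child fall back to the parent's copy, which is the effect the issue asks for by dropping the duplicate javax/* classes from the tika-app jar.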



