Hi all,
Currently there is no classpath isolation in Sqoop 2. This causes problems in the following two cases:
1) if the dependencies of downstream users of the Sqoop 2 client conflict with the dependencies of the Sqoop 2 client
2) if the dependencies of third-party connectors conflict with the dependencies of the Sqoop 2 server or with those of other third-party connectors

I'd like to provide classpath isolation in Sqoop 2 and have taken some time to investigate the status of classpath isolation in Hadoop. Here is a short summary of the problem and of the solution proposed by Sean in HADOOP-11656<https://issues.apache.org/jira/browse/HADOOP-11656>:

The problems HADOOP-11656 tries to solve:
1) Client side classpath isolation: between Hadoop and its downstream applications which talk directly to HDFS or submit YARN applications.
2) Framework level classpath isolation: between the YARN server and the ApplicationMaster, or between YARN and user applications. Hadoop already has a solution for this issue, a webapp-style (parent-last) classloader named ApplicationClassLoader.

The solution proposed in HADOOP-11656 by Sean:
1) For client side classpath isolation, Sean proposes to use the Maven Shade Plugin to expose only the public API to clients, and to use the plugin's relocation capability to relocate other dependencies under the package org.apache.hadoop.shaded. (Refer to HADOOP-11804<https://issues.apache.org/jira/browse/HADOOP-11804> for details.)
2) For framework level classpath isolation, Sean pointed out that the existing webapp-style classloader doesn't provide much upgrade help for applications that rely on classes found in the fallback case. That is to say, if user code relies on a Hadoop dependency implicitly and Hadoop upgrades it to an incompatible version, the user code may break. Sean proposes to use an OSGi container to export different sets of dependencies in different Hadoop versions to solve this issue.
(More discussion about this can be found here<https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14540773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14540773>.)

Based on the above, the classpath isolation problem in Sqoop 2 can be separated into two parts:
1) client side classpath isolation
2) isolation for connectors

And we have three options to consider:
1) Maven Shade Plugin
2) Webapp-style classloader
3) OSGi

I'd like to use the Maven Shade Plugin to solve the client side classpath isolation problem, in a similar way to what was done in HADOOP-11804<https://issues.apache.org/jira/browse/HADOOP-11804>.

For connector isolation, the Maven Shade Plugin isn't an option: it isolates the classpath by relocating packages at build time, so it can't relocate the dependencies of connectors at runtime. Between the webapp-style classloader and OSGi, we may need to choose OSGi if we want to upgrade Sqoop 2 dependencies without affecting third-party connectors that rely on some Sqoop 2 dependencies implicitly. But if we think that requiring third-party connectors to upgrade accordingly is acceptable, I would prefer the webapp-style classloader, as it is easier to implement than OSGi.

Please feel free to share your opinions. Thanks a lot.

Regards,
Dian
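
P.S. To make the webapp-style (parent-last) option a bit more concrete, here is a minimal sketch of what such a classloader could look like. It is only an illustration under my own assumptions. It is not Hadoop's ApplicationClassLoader and not a proposed Sqoop 2 implementation. The class name ParentLastClassLoader, the isSystemClass helper and its java./javax. prefix list are all hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical parent-last classloader for connector jars.
 * Illustrative sketch only; not Hadoop's or Sqoop's actual code.
 */
public class ParentLastClassLoader extends URLClassLoader {

    public ParentLastClassLoader(URL[] connectorJars, ClassLoader parent) {
        super(connectorJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Parent-last: look in the connector's own jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the connector jars; fall through to the parent.
                }
            }
            if (c == null) {
                // Fallback case: normal parent-first delegation
                // (throws ClassNotFoundException if nobody has the class).
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Assumed prefix list: JDK classes always come from the parent.
    private static boolean isSystemClass(String name) {
        return name.startsWith("java.") || name.startsWith("javax.");
    }
}
```

With something like this, each connector would get its own loader instance built from that connector's jars, with the Sqoop 2 server's loader as parent. A class the connector does not bundle falls through to the parent, which is exactly the fallback case that makes implicit dependencies fragile on upgrade.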
