Thanks for stepping up and taking on this huge and important effort. I believe that both suggestions, using shading for the client and a custom classloader for connector code, are completely reasonable, so +1 from my side.
Jarcec

> On Sep 15, 2015, at 7:30 PM, Fu, Dian <[email protected]> wrote:
>
> Hi all,
>
> Currently there is no classpath isolation in Sqoop 2. This causes problems in the following two cases:
>
> 1) if the dependencies of downstream users of the Sqoop 2 client conflict with the dependencies of the Sqoop 2 client
>
> 2) if the dependencies of third-party connectors conflict with the dependencies of the Sqoop 2 server or with those of other third-party connectors
>
> I'd like to provide classpath isolation in Sqoop 2 and have taken some time to investigate the status of classpath isolation in Hadoop. Here is a simple summary of the problem and the solution proposed by Sean in HADOOP-11656 <https://issues.apache.org/jira/browse/HADOOP-11656>:
>
> The problems HADOOP-11656 tries to solve:
>
> 1) Client side classpath isolation: between Hadoop and its downstream applications which talk directly with HDFS or submit YARN applications.
>
> 2) Framework level classpath isolation: between the YARN server and the ApplicationMaster, or between YARN and the user application. Hadoop already has a solution for this issue: a webapp-style (parent last) classloader named ApplicationClassLoader.
>
> The solution proposed in HADOOP-11656 by Sean:
>
> 1) For the client side classpath isolation, Sean proposes to use the Maven Shade Plugin to expose only the public API to clients and to use the plugin's relocation capability to relocate other dependencies under the package org.apache.hadoop.shaded. (Refer to HADOOP-11804 <https://issues.apache.org/jira/browse/HADOOP-11804> for details)
>
> 2) For the existing webapp-style classloader solution for framework level classpath isolation, Sean pointed out that it doesn't provide much upgrade help for applications that rely on the classes found in the fallback case. That is to say, if user code relies on a Hadoop dependency implicitly and Hadoop upgrades it to an incompatible version, the user code will break. Sean proposes to use an OSGi container to export different sets of dependencies in different Hadoop versions to solve this issue. (More discussion about this can be found here: <https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14540773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14540773>)
>
> Based on the above, the classpath isolation problem in Sqoop 2 can be separated into two parts:
>
> 1) client side classpath isolation
>
> 2) isolation for connectors
>
> And we have three options to consider:
>
> 1) Maven Shade Plugin
>
> 2) Webapp-style classloader
>
> 3) OSGi
>
> I'd like to use the Maven Shade Plugin to solve the client side classpath isolation problem in a way similar to what was done in HADOOP-11804 <https://issues.apache.org/jira/browse/HADOOP-11804>.
>
> For the isolation of connectors, the Maven Shade Plugin isn't an option, as it isolates the classpath via relocation at build time and can't relocate connector dependencies at runtime. Between the webapp-style classloader and OSGi, we may need to choose OSGi if we want to upgrade Sqoop 2 dependencies without affecting third-party connectors in the case that third-party connectors rely on some Sqoop 2 dependencies implicitly. But if we think that requiring third-party connectors to upgrade accordingly is acceptable, I would prefer the webapp-style classloader, as it is easier to implement than OSGi.
>
> Please feel free to provide your opinions, thanks a lot.
>
> Regards,
>
> Dian
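
For readers unfamiliar with the "webapp-style (parent last)" classloader mentioned in the quoted mail, the idea can be sketched roughly as below. This is only an illustrative sketch, not Hadoop's ApplicationClassLoader and not anything that exists in Sqoop 2: the class name ParentLastClassLoader and the rule of always delegating "java." and "javax." classes to the parent are assumptions made for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a parent-last ("webapp-style") classloader.
// It looks in its own URLs (e.g. a connector's jars) first and only
// falls back to the parent (e.g. the server classpath) on a miss.
class ParentLastClassLoader extends URLClassLoader {

    ParentLastClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.") || name.startsWith("javax.")) {
                    // Assumption: platform classes always come from the parent.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Parent last: try our own URLs first.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // The "fallback case": delegate to the parent.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs, every lookup ends up in the fallback path.
        ParentLastClassLoader cl = new ParentLastClassLoader(
                new URL[0], ParentLastClassLoader.class.getClassLoader());
        Class<?> c = cl.loadClass(ParentLastClassLoader.class.getName());
        System.out.println(c.getName() + " loaded via fallback: "
                + (c == ParentLastClassLoader.class));
    }
}
```

The fallback branch is the part the quoted mail warns about: a connector that silently picks up a class from the parent depends on whatever version the server ships, which is why a dependency upgrade on the server side can break it.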
