Hi all,

Currently there is no classpath isolation in Sqoop 2. This will cause problems 
in the following two cases:

1)  if the dependencies of downstream users of the Sqoop 2 client conflict 
with the dependencies of the Sqoop 2 client

2)  if the dependencies of third-party connectors conflict with the 
dependencies of the Sqoop 2 server or with those of other third-party connectors



I'd like to provide classpath isolation in Sqoop 2 and have taken some time to 
investigate the status of classpath isolation in Hadoop. Here is a simple 
summary of the problem and the solution proposed by Sean in 
HADOOP-11656<https://issues.apache.org/jira/browse/HADOOP-11656>:

The problems HADOOP-11656 tries to solve:

   1) Client side classpath isolation: between Hadoop and its downstream 
applications which talk directly with HDFS or submit YARN applications.

   2) Framework level classpath isolation: between the YARN server and the 
ApplicationMaster, or between YARN and user applications. Hadoop already has a 
solution for this issue: a webapp-style (parent-last) classloader named 
ApplicationClassLoader.

The solution proposed in HADOOP-11656 by Sean:

   1) For client side classpath isolation, Sean proposes to use the Maven Shade 
Plugin to expose only the public API to clients and to use its relocation 
feature to move all other dependencies under the package 
org.apache.hadoop.shaded. (Refer to 
HADOOP-11804<https://issues.apache.org/jira/browse/HADOOP-11804> for details)

   2) For the existing webapp-style classloader solution for framework level 
classpath isolation, Sean pointed out that it doesn't provide much upgrade help 
for applications that rely on the classes found in the fallback case. That is 
to say, if user code implicitly relies on a Hadoop dependency and Hadoop 
upgrades it to an incompatible version, the user code will break. Sean proposes 
to use an OSGi container to export different sets of dependencies in different 
Hadoop versions to solve this issue. (More discussion about this can be found 
here<https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14540773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14540773>)



Based on the above, the classpath isolation problem in Sqoop 2 can be 
separated into two parts:

1)  client side classpath isolation

2)  isolation for connectors

And we have three options to consider:

1)  Maven Shade Plugin

2)  Webapp-style classloader

3)  OSGi

I'd like to use the Maven Shade Plugin to solve the client side classpath 
isolation problem, in a way similar to what was done in 
HADOOP-11804<https://issues.apache.org/jira/browse/HADOOP-11804>.

For connector isolation, the Maven Shade Plugin is not an option: it isolates 
the classpath by relocating packages at build time and cannot relocate 
connector dependencies at runtime. That leaves the webapp-style classloader and 
OSGi. We may need to choose OSGi if we want to upgrade Sqoop 2 dependencies 
without affecting third-party connectors that implicitly rely on some of those 
dependencies. But if we think that requiring third-party connectors to upgrade 
accordingly is acceptable, I would prefer the webapp-style classloader, as it 
is easier to implement than OSGi.
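
To make the webapp-style option more concrete, here is a rough sketch of what 
a parent-last classloader for a connector could look like. This is only an 
illustration, not existing Sqoop 2 or Hadoop code: the class name 
ParentLastClassLoader and the systemPrefixes parameter are hypothetical, and 
the choice of which packages must always come from the parent (for example 
the connector SPI) is an assumption that would need discussion.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical parent-last ("webapp-style") classloader sketch.
 * Classes are looked up in the connector's own jars first and only then
 * delegated to the parent, except for "system" prefixes that must be shared.
 */
public class ParentLastClassLoader extends URLClassLoader {

    // Assumed list of package prefixes that always resolve via the parent,
    // e.g. the API shared between the server and its connectors.
    private final String[] systemPrefixes;

    public ParentLastClassLoader(URL[] urls, ClassLoader parent,
                                 String... systemPrefixes) {
        super(urls, parent);
        this.systemPrefixes = systemPrefixes;
    }

    private boolean isSystemClass(String name) {
        // JDK classes must never be redefined by a child loader.
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return true;
        }
        for (String prefix : systemPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Parent-last: try the connector's own jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the connector's jars; fall back to the parent.
                }
            }
            if (c == null) {
                // Fallback case: normal parent-first delegation.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With something like this, the server would create one such loader per 
connector, pointing it at that connector's jars. The fallback branch is 
exactly where the upgrade concern applies: a connector that silently picks up 
a Sqoop 2 dependency through the parent will see whatever version the server 
ships.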



Please feel free to provide your opinions, thanks a lot.



Regards,

Dian
