Thanks for stepping up and taking on this huge and important effort.

I believe that both suggestions, shading for the client and a custom 
classloader for connector code, are completely reasonable, so +1 from my side.

Jarcec

> On Sep 15, 2015, at 7:30 PM, Fu, Dian <[email protected]> wrote:
> 
> Hi all,
> 
> 
> Currently there is no classpath isolation in Sqoop 2. This will cause 
> problems in the following two cases:
> 
> 1)  if the dependencies of downstream users of the Sqoop 2 client 
> conflict with the dependencies of the Sqoop 2 client
> 
> 2)  if the dependencies of third-party connectors conflict with the 
> dependencies of the Sqoop 2 server or with those of other third-party 
> connectors
> 
> 
> 
> I'd like to provide classpath isolation in Sqoop 2 and have taken some time 
> to investigate the status of classpath isolation in Hadoop. Here is a brief 
> summary of the problems and of the solution proposed by Sean in 
> HADOOP-11656<https://issues.apache.org/jira/browse/HADOOP-11656>:
> 
> The problems HADOOP-11656 tries to solve:
> 
>   1) Client-side classpath isolation: between Hadoop and its downstream 
> applications, which talk directly to HDFS or submit YARN applications.
> 
>   2) Framework-level classpath isolation: between the YARN server and the 
> ApplicationMaster, or between YARN and user applications. Hadoop already has 
> a solution for this issue: a webapp-style (parent-last) classloader named 
> ApplicationClassLoader.
> 
> The solution proposed in HADOOP-11656 by Sean:
> 
>   1) For client-side classpath isolation, Sean proposes using the Maven 
> Shade Plugin to expose only the public API to clients, and using the plugin's 
> relocation capability to relocate other dependencies under the package 
> org.apache.hadoop.shaded. (Refer to 
> HADOOP-11804<https://issues.apache.org/jira/browse/HADOOP-11804> for details.)
> 
>   2) For the existing webapp-style classloader solution for framework-level 
> classpath isolation, Sean pointed out that it doesn't provide much upgrade 
> help for applications that rely on the classes found in the fallback case. 
> That is to say, if user code relies on a Hadoop dependency implicitly and 
> Hadoop upgrades it to an incompatible version, the user code can break. Sean 
> proposes using an OSGi container to export a different set of dependencies in 
> different Hadoop versions to solve this issue. (More discussion about this 
> can be found 
> here<https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14540773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14540773>.)
> 
> 
> 
> Based on the above, the classpath isolation problem in Sqoop 2 can be 
> separated into two parts:
> 
> 1)  client side classpath isolation
> 
> 2)  isolation for connectors
> 
> And we have three options to consider:
> 
> 1)  Maven Shade Plugin
> 
> 2)  Webapp-style classloader
> 
> 3)  OSGi
> 
> I'd like to use the Maven Shade Plugin to solve the client-side classpath 
> isolation problem, in a way similar to 
> HADOOP-11804<https://issues.apache.org/jira/browse/HADOOP-11804>.
> 
> For connector isolation, the Maven Shade Plugin is not an option: it isolates 
> the classpath by relocating packages at build time and cannot relocate 
> connectors' dependencies at runtime. Between the webapp-style classloader and 
> OSGi, we may need to choose OSGi if we want to upgrade Sqoop 2 dependencies 
> without affecting third-party connectors that rely on some Sqoop 2 
> dependencies implicitly. But if we think that requiring third-party 
> connectors to upgrade accordingly is acceptable, I would prefer the 
> webapp-style classloader, as it is easier to implement than OSGi.
> 
> 
> 
> Please feel free to share your opinions. Thanks a lot.
> 
> 
> 
> Regards,
> 
> Dian
> 
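
For illustration, the parent-last ("webapp-style") delegation discussed above 
for connector code could look roughly like the sketch below. This is neither 
Hadoop's ApplicationClassLoader nor existing Sqoop 2 code: the class name 
ParentLastClassLoader, the isParentFirst helper, and the list of prefixes 
that are always delegated to the parent are assumptions made up for this 
example.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical parent-last class loader for connector jars (illustration only).
 * Classes are looked up in the connector's own jars first and only then in the
 * parent, except for a small set of prefixes that always go to the parent.
 */
public class ParentLastClassLoader extends URLClassLoader {

    // Assumed list: packages that must never be overridden by a connector jar.
    private static final String[] PARENT_FIRST_PREFIXES = { "java.", "javax." };

    public ParentLastClassLoader(URL[] connectorJars, ClassLoader parent) {
        super(connectorJars, parent);
    }

    /** Returns true if the class must always be loaded through the parent. */
    public static boolean isParentFirst(String className) {
        for (String prefix : PARENT_FIRST_PREFIXES) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Parent-last: try the connector's own jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the connector jars; fall back to the parent below.
                }
            }
            if (c == null) {
                // Fallback case: standard delegation to the parent loader.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

A server would create one such loader per connector, passing that connector's 
jar URLs and the server's own classloader as parent. The fallback branch is 
exactly the case Sean's comment is about: a connector that silently relies on 
a class found only in the parent is exposed to server dependency upgrades.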
