[ 
https://issues.apache.org/jira/browse/SQOOP-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997833#comment-14997833
 ] 

Dian Fu commented on SQOOP-2634:
--------------------------------

Hi [~jarcec],
Thanks a lot for the comments.
{quote}
I'm not so much concerned about shipping one jar to the execution engine twice. 
I'm more concerned about how we load classes from shared jars inside the 
running code - whether each connector will load its own instance of classes 
inside those shared jars or not.
The reason for that is that I see a possible problem with exchanging data from 
one connector to the other. As far as I know, class equivalency doesn't cross 
ClassLoader boundaries. E.g. {{ClassUtils.loadClass(A.class, FirstClassLoader) 
!= ClassUtils.loadClass(B.class, SecondClassLoader)}}, which could be a 
potential problem here.
{quote}
The constructor of {{ConnectorClassLoader}} takes two parameters, {{urls}} and 
{{systemClasses}}. From the implementation of {{ConnectorClassLoader}}, we can 
see that only classes which are in {{urls}} and not in {{systemClasses}} are 
loaded by {{ConnectorClassLoader}}. All other classes are loaded by the parent 
classloader.
Regarding loading classes from shared jars: since those jars aren't specific to 
a single connector, they won't be put in the {{urls}} parameter of 
{{ConnectorClassLoader}}. This means they won't be loaded by 
{{ConnectorClassLoader}}, but by the parent classloader instead.
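As a rough illustration of that delegation rule (the class name, constructor 
signature, and prefix-based {{systemClasses}} matching below are hypothetical, 
not the actual {{ConnectorClassLoader}} implementation), a child-first 
classloader that handles only its own {{urls}} and always delegates system 
classes could look like this:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical sketch of the delegation policy described above: classes from
// the connector's own urls are loaded child-first, unless they match a
// systemClasses prefix, in which case the parent classloader handles them.
public class ConnectorFirstLoader extends URLClassLoader {
    // Package prefixes treated as system classes, e.g. "java.", "org.apache.sqoop."
    private final List<String> systemClasses;

    public ConnectorFirstLoader(URL[] urls, ClassLoader parent, List<String> systemClasses) {
        super(urls, parent);
        this.systemClasses = systemClasses;
    }

    @Override
    protected synchronized Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            boolean isSystem = systemClasses.stream().anyMatch(name::startsWith);
            if (!isSystem) {
                try {
                    c = findClass(name); // look in the connector's own urls first
                } catch (ClassNotFoundException ignored) {
                    // not in urls: fall through to the parent below
                }
            }
            if (c == null) {
                c = getParent().loadClass(name); // system classes and shared jars
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}
```

With an empty {{urls}} array, every lookup falls through to the parent, which 
is exactly the behavior described for shared jars.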
Moreover, the FROM connector and the TO connector exchange data indirectly: the 
FROM connector writes data into an intermediate file and the TO connector reads 
data from that intermediate file. There is no direct communication between 
connectors. So even if a class is loaded by two different classloaders, one for 
the FROM connector and one for the TO connector, it won't cause problems.
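For completeness, the classloader-boundary concern is real in general: the same 
class defined by two different loaders yields two distinct {{Class}} objects. A 
small self-contained demonstration (nothing here is Sqoop code):

```java
import java.io.InputStream;

// Self-contained demonstration that class identity does not cross classloader
// boundaries: two loaders that each define Payload from the same bytes
// produce two distinct Class objects.
public class ClassIdentityDemo {
    public static class Payload {}

    // A loader that re-defines Payload itself instead of delegating to its parent.
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(ClassIdentityDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.startsWith("ClassIdentityDemo")) {
                return super.loadClass(name, resolve); // delegate JDK classes etc.
            }
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try (InputStream in = getResourceAsStream(name.replace('.', '/') + ".class")) {
                    byte[] bytes = in.readAllBytes();
                    c = defineClass(name, bytes, 0, bytes.length);
                } catch (Exception e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static Class<?> loadIsolated() throws Exception {
        return new IsolatingLoader().loadClass("ClassIdentityDemo$Payload");
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        System.out.println(a == b);              // false: two loaders, two classes
        System.out.println(a == Payload.class);  // false as well
        // Data exchanged as bytes/text (e.g. an intermediate file) never hits
        // this problem, which is why the indirect exchange above is safe.
    }
}
```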
{quote}
Can we include how this actually will be implemented in the design doc? 
{quote}
IMO, we don't need to do anything special for this. As the HDFS libraries are 
part of the Hadoop cluster, jobs running on the Hadoop cluster will pick up 
these libraries automatically. Thoughts?
{quote}
So far I don't see any provision to add dependency jars from Server's own 
classpath to the connector's classpath.
{quote}
Do you mean classes such as {{IntermediateDataFormat}}, {{SqoopConnector}}, 
etc.? If so, these jars and classes are handled by the existing code: the jars 
are discovered in {{JobManager#createJobRequest}} and distributed to the 
cluster nodes in {{MapreduceSubmissionEngine#submit}}.
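The "find which jar provides a class" part of that flow can be sketched with a 
standard JVM technique, the code-source lookup (illustrative only, not the 
actual Sqoop implementation):

```java
import java.security.CodeSource;

// Illustrative helper: locate the jar (or class directory) a class was loaded
// from, the kind of lookup a submission engine can use before shipping
// dependencies to the cluster nodes. Not the actual Sqoop code.
public class JarLocator {
    public static String jarFor(Class<?> klass) {
        CodeSource src = klass.getProtectionDomain().getCodeSource();
        // Bootstrap classes (e.g. java.lang.String) have no code source.
        return src == null ? null : src.getLocation().toExternalForm();
    }

    public static void main(String[] args) {
        System.out.println(jarFor(JarLocator.class)); // jar/dir this class came from
        System.out.println(jarFor(String.class));     // null: JDK bootstrap class
    }
}
```

The submission engine can then pass each located jar to the execution engine so 
it ends up on the task classpath on the cluster nodes.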

> Sqoop2: Allow connectors to express jar dependencies
> ----------------------------------------------------
>
>                 Key: SQOOP-2634
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2634
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>             Fix For: 1.99.7
>
>         Attachments: SQOOP-2634.001.patch, SQOOP-2634.002.patch, 
> SQOOP-2634.003.patch, SQOOP-2634.004.patch, SQOOP-2634.005.patch, 
> SQOOP-2634.006.patch, SQOOP-2634.007.patch, SQOOP-2634.008.patch, 
> SQOOP-2634.009.patch, SQOOP-2634.010.patch, design-doc-v1.pdf, 
> design-doc-v2.pdf, design-doc-v3.pdf
>
>
> Currently Sqoop 2 already provides the ability to configure jar dependencies 
> with the property "org.apache.sqoop.classpath.extra". The limitation of this 
> property is that all the dependencies have to be put together; it can't 
> express jar dependencies for a specific connector. That capability is useful 
> because some connectors may have conflicting jar dependencies, and putting 
> all the dependencies from different connectors together may cause problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)