[ 
https://issues.apache.org/jira/browse/SQOOP-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995253#comment-14995253
 ] 

Dian Fu commented on SQOOP-2634:
--------------------------------

Hi [~jarcec],
Thanks a lot for the detailed summary. 
{quote}
* What classes/jars should be put on connector's classpath by Sqoop 2 server. 
Couple of interesting classes/jars:
** {{joda-time}} is part of our API to exchange data (e.g. all date-time 
objects passed from IDFs should be encoded in joda-time)?
** {{connector-sdk}}: ?
** {{sqoop-common}}: ?
** *HDFS Connector*: Since we're running on Hadoop cluster should we use HDFS 
libraries from the cluster or not?
{quote}
As a first step, we could put all of a connector's dependencies on its 
classpath. We could then improve on this, since some jars are needed by all 
connectors, such as the ones you listed: {{joda-time}}, {{connector-sdk}}, 
{{sqoop-common}}, etc. We could add these common dependencies to the 
"tmpjars" configuration before submitting jobs so that they are shared by 
all connectors. For the *HDFS Connector*, we should use the HDFS libraries 
from the cluster, as before.
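
As a rough sketch of the "tmpjars" idea above (not the actual patch): the 
value is a comma-separated list of jars, so the shared dependencies can be 
merged into it without duplicates. A plain {{Map}} stands in for the Hadoop 
job configuration here, and the class name, method name, and jar paths are 
hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TmpJarsSketch {
    // Configuration key named in the comment above.
    static final String TMPJARS = "tmpjars";

    // Merges jars into the comma-separated "tmpjars" value,
    // keeping insertion order and skipping duplicates and empty entries.
    // ASSUMPTION: conf is a stand-in for the real job configuration object.
    static void addTmpJars(Map<String, String> conf, List<String> jars) {
        Set<String> merged = new LinkedHashSet<>();
        String existing = conf.get(TMPJARS);
        if (existing != null) {
            for (String jar : existing.split(",")) {
                String trimmed = jar.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        for (String jar : jars) {
            String trimmed = jar.trim();
            if (!trimmed.isEmpty()) {
                merged.add(trimmed);
            }
        }
        conf.put(TMPJARS, String.join(",", merged));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Hypothetical locations of the jars shared by all connectors.
        addTmpJars(conf, Arrays.asList(
                "file:///opt/sqoop/lib/joda-time.jar",
                "file:///opt/sqoop/lib/connector-sdk.jar",
                "file:///opt/sqoop/lib/sqoop-common.jar"));
        System.out.println(conf.get(TMPJARS));
    }
}
```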
{quote}
Couple of additional ideas to explore:
* One jar allows to create single jar with all dependencies inside that jar.
* Class-Path in Manifest.mf file might be also a viable solution.
{quote}
Thanks a lot for the suggestions. 
Regarding {{one jar}}, it is usually used to produce an executable jar, so 
I'm afraid it is not usable for us. But we could consider the 
{{maven shade plugin}}, which can package the dependencies inside the jar. 
So far I haven't seen any drawback to this approach, so it may be a good 
solution. One thing to note is that it would require all external 
connectors to package their dependencies with the {{maven shade plugin}}, 
and it does not solve issues of the JDBC driver kind.
Regarding Class-Path in {{Manifest.mf}}, it seems that it only saves us the 
cost of configuring the connector classpath in sqoopconnector.properties. We 
would still need to separate out the connector dependencies, add them to the 
"tmpfiles" configuration so that the MapReduce framework distributes them to 
the cluster nodes, and load them with the connector's own classloader. Is my 
understanding correct?
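
To illustrate the last step, here is a minimal sketch of loading each 
connector's dependencies with its own classloader, using only the JDK's 
{{URLClassLoader}}. It is not the code from the patch; the class name, method 
name, and jar file names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class ConnectorClassLoaderSketch {
    // Builds a classloader that sees only the given connector's jars
    // plus whatever the parent classloader already provides.
    static URLClassLoader forConnector(List<Path> jars, ClassLoader parent) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            try {
                urls[i] = jars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars.get(i), e);
            }
        }
        return new URLClassLoader(urls, parent);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ConnectorClassLoaderSketch.class.getClassLoader();
        // Hypothetical jar names; each connector gets a separate loader,
        // so conflicting versions of a library do not see each other.
        try (URLClassLoader a = forConnector(
                     Arrays.asList(Paths.get("connector-a-deps.jar")), parent);
             URLClassLoader b = forConnector(
                     Arrays.asList(Paths.get("connector-b-deps.jar")), parent)) {
            System.out.println(Arrays.toString(a.getURLs()));
            System.out.println(Arrays.toString(b.getURLs()));
        }
    }
}
```

Note that {{URLClassLoader}} delegates to its parent first, so any class 
already visible to the parent takes precedence over a copy in the 
connector's jars. Full isolation would need a different delegation order.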

The summary is very detailed. Thanks again.


> Sqoop2: Allow connectors to express jar dependencies
> ----------------------------------------------------
>
>                 Key: SQOOP-2634
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2634
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>             Fix For: 1.99.7
>
>         Attachments: SQOOP-2634.001.patch, SQOOP-2634.002.patch, 
> SQOOP-2634.003.patch, SQOOP-2634.004.patch, SQOOP-2634.005.patch, 
> SQOOP-2634.006.patch, SQOOP-2634.007.patch, SQOOP-2634.008.patch, 
> SQOOP-2634.009.patch, SQOOP-2634.010.patch, design-doc-v1.pdf, 
> design-doc-v2.pdf
>
>
> Currently Sqoop 2 already provides the ability to configure jar dependencies 
> with the property "org.apache.sqoop.classpath.extra". The limitation of this 
> property is that we have to put all the dependencies together. It can't 
> express jar dependencies for a specific connector. This capability is useful 
> as some connectors may have conflicting jar dependencies. Putting all the 
> dependencies from different connectors together may cause problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
