[
https://issues.apache.org/jira/browse/SQOOP-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995253#comment-14995253
]
Dian Fu commented on SQOOP-2634:
--------------------------------
Hi [~jarcec],
Thanks a lot for the detailed summary.
{quote}
* What classes/jars should be put on connector's classpath by Sqoop 2 server.
Couple of interesting classes/jars:
** {{joda-time}} is part of our API to exchange data (e.g. all date-time
objects passed from IDFs should be encoded in joda-time)?
** {{connector-sdk}}: ?
** {{sqoop-common}}: ?
** *HDFS Connector*: Since we're running on Hadoop cluster should we use HDFS
libraries from the cluster or not?
{quote}
As a first step, we could put all of a connector's dependencies on its
classpath. We could then improve on this, since some jars are needed by all
connectors, such as the ones you listed: {{joda-time}}, {{connector-sdk}},
{{sqoop-common}}, etc. We could add these common dependencies to the
"tmpjars" configuration before submitting jobs so that they are shared by
all connectors. For the *HDFS Connector*, we should keep using the HDFS
libraries from the cluster, as before.
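A minimal sketch of the "tmpjars" idea, assuming the value is a comma-separated list of jar URIs as Hadoop's {{-libjars}} handling uses. The class name, method name, and jar paths are hypothetical and not part of Sqoop. With Hadoop on the classpath, the merged string would presumably be passed to {{Configuration.set("tmpjars", ...)}} before job submission.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Sqoop or Hadoop.
class TmpJarsSketch {

    // Merges an existing comma-separated "tmpjars" value with the jars shared
    // by all connectors. Keeps first-seen order, drops duplicates and blanks.
    static String mergeTmpJars(String existing, List<String> commonJars) {
        Set<String> merged = new LinkedHashSet<>();
        if (existing != null) {
            for (String entry : existing.split(",")) {
                String trimmed = entry.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        for (String jar : commonJars) {
            if (jar != null && !jar.trim().isEmpty()) {
                merged.add(jar.trim());
            }
        }
        return String.join(",", merged);
    }

    public static void main(String[] args) {
        // Assumed paths; a real server would resolve these from its lib directory.
        String existing = "file:/opt/sqoop/lib/joda-time.jar";
        List<String> common = Arrays.asList(
                "file:/opt/sqoop/lib/joda-time.jar",
                "file:/opt/sqoop/lib/connector-sdk.jar",
                "file:/opt/sqoop/lib/sqoop-common.jar");
        System.out.println(mergeTmpJars(existing, common));
    }
}
```

De-duplicating here matters because several connectors may each list the same shared jar, and the merged value should ship it only once.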
{quote}
Couple of additional ideas to explore:
* One jar allows to create single jar with all dependencies inside that jar.
* Class-Path in Manifest.mf file might be also a viable solution.
{quote}
Thanks a lot for the suggestions.
Regarding {{one jar}}, it's usually used to produce an executable jar, so
I'm afraid it's not usable for us. But we could consider the
{{maven shade plugin}}, which can package the dependencies inside the jar.
So far I haven't seen any drawback to this approach, so it may be a good
solution. One thing to note is that it would require all external
connectors to package their dependencies with the {{maven shade plugin}}, and
it doesn't solve the JDBC driver kind of issue.
Regarding Class-Path in {{Manifest.mf}}, it seems that it only saves us the
cost of configuring the connector classpath in sqoopconnector.properties. We
would still need to separate the connector dependencies, add them to the
"tmpfiles" configuration so that the MapReduce framework distributes them to
the cluster nodes, and load them with the connector's own classloader. Is my
understanding correct?
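To make the steps above concrete, here is a rough sketch using only JDK classes: it reads the Class-Path entries from manifest text (so they can be listed for distribution) and builds a dedicated {{URLClassLoader}} for one connector. The class name, method names, and paths are hypothetical. Resolving entries against the jar's directory is a simplification, since manifest entries are really relative URLs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for illustration only; not part of Sqoop.
class ConnectorClasspathSketch {

    // Returns the space-separated Class-Path entries found in the manifest
    // text, or an empty list when the attribute is absent.
    static List<String> classPathEntries(String manifestText) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            String value = manifest.getMainAttributes()
                    .getValue(Attributes.Name.CLASS_PATH);
            List<String> entries = new ArrayList<>();
            if (value != null) {
                for (String entry : value.trim().split("\\s+")) {
                    if (!entry.isEmpty()) {
                        entries.add(entry);
                    }
                }
            }
            return entries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a loader that sees the connector jar plus its listed dependencies,
    // resolved against the directory containing the connector jar.
    static URLClassLoader connectorLoader(Path connectorJar, List<String> entries,
                                          ClassLoader parent) {
        try {
            Path absoluteJar = connectorJar.toAbsolutePath();
            List<URL> urls = new ArrayList<>();
            urls.add(absoluteJar.toUri().toURL());
            for (String entry : entries) {
                urls.add(absoluteJar.getParent().resolve(entry).toUri().toURL());
            }
            return new URLClassLoader(urls.toArray(new URL[0]), parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed manifest content and jar location.
        String manifestText =
                "Manifest-Version: 1.0\nClass-Path: lib/a.jar lib/b.jar\n\n";
        List<String> entries = classPathEntries(manifestText);
        try (URLClassLoader loader = connectorLoader(
                Paths.get("/tmp/connector.jar"), entries,
                ConnectorClasspathSketch.class.getClassLoader())) {
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```

The explicit entry list is what would still have to be added to the job configuration, which is why the manifest alone does not remove the distribution and classloading work.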
The summary is very detailed. Thanks again.
> Sqoop2: Allow connectors to express jar dependencies
> ----------------------------------------------------
>
> Key: SQOOP-2634
> URL: https://issues.apache.org/jira/browse/SQOOP-2634
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Dian Fu
> Assignee: Dian Fu
> Fix For: 1.99.7
>
> Attachments: SQOOP-2634.001.patch, SQOOP-2634.002.patch,
> SQOOP-2634.003.patch, SQOOP-2634.004.patch, SQOOP-2634.005.patch,
> SQOOP-2634.006.patch, SQOOP-2634.007.patch, SQOOP-2634.008.patch,
> SQOOP-2634.009.patch, SQOOP-2634.010.patch, design-doc-v1.pdf,
> design-doc-v2.pdf
>
>
> Currently Sqoop 2 already provides the ability to configure jar dependencies
> with the property "org.apache.sqoop.classpath.extra". The limitation of this
> property is that all the dependencies have to be put together: it can't
> express the jar dependencies of a specific connector. That ability is useful
> because some connectors may have conflicting jar dependencies, and putting
> the dependencies of different connectors together may cause problems.