[
https://issues.apache.org/jira/browse/SQOOP-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662539#comment-14662539
]
Jarek Jarcec Cecho commented on SQOOP-2465:
-------------------------------------------
I'm generally supportive of adding APIs to help connector developers build
connectors :) I have a few high-level thoughts though:
* The current design is such that the "From" and "To" sides are independent
from the connector developer's perspective, and we should keep that. Hence the
"From" side should not know how many loaders are configured and, vice versa,
the "To" side should not care how many extractors are running.
* Propagating the number of extractors to the "From" initializer seems like a
completely valid request. Similarly for the number of loaders to the "To"
initializer.
* I kind of like the idea of creating an optional "To" "partitioner", even
though I would not call it a partitioner per se as that's kind of confusing -
we're not partitioning data in any way, it's more about pre-creating temporary
objects for each loader. I think that this one is big on its own, so perhaps we
should track it in a separate JIRA. I would love to see a more detailed
proposal :)
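
To make the "pre-creating temporary objects for each loader" point concrete,
here is a minimal Java sketch of what an initializer and destroyer could do once
they are told the loader count. It does not use the Sqoop connector API: the
class LoaderTempTables, its methods, and the names EMP and SQOOP_TMP are all
hypothetical, and the CREATE TABLE ... AS SELECT form is only one possible way
to make an empty copy of the target table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Sqoop connector API.
final class LoaderTempTables {
    private LoaderTempTables() {}

    // One temporary table name per loader: baseName_0 .. baseName_(numLoaders-1).
    static List<String> namesFor(String baseName, int numLoaders) {
        if (numLoaders < 1) {
            throw new IllegalArgumentException("numLoaders must be >= 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numLoaders; i++) {
            names.add(baseName + "_" + i);
        }
        return names;
    }

    // Statements a "To" initializer might run: an empty copy of the target
    // table for each loader (assumed approach, for illustration only).
    static List<String> createStatements(String targetTable, String baseName, int numLoaders) {
        List<String> sql = new ArrayList<>();
        for (String name : namesFor(baseName, numLoaders)) {
            sql.add("CREATE TABLE " + name + " AS SELECT * FROM " + targetTable + " WHERE 1 = 0");
        }
        return sql;
    }

    // Statements a "To" destroyer might run; it needs the same loader count
    // to find every table the initializer created.
    static List<String> dropStatements(String baseName, int numLoaders) {
        List<String> sql = new ArrayList<>();
        for (String name : namesFor(baseName, numLoaders)) {
            sql.add("DROP TABLE " + name);
        }
        return sql;
    }

    public static void main(String[] args) {
        createStatements("EMP", "SQOOP_TMP", 3).forEach(System.out::println);
        dropStatements("SQOOP_TMP", 3).forEach(System.out::println);
    }
}
```

Both sides derive the table names from the same base name and loader count,
which is why the destroyer needs the number just as much as the initializer.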
> Initializer and Destroyer should know how many executors will run
> -----------------------------------------------------------------
>
> Key: SQOOP-2465
> URL: https://issues.apache.org/jira/browse/SQOOP-2465
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.99.6
> Reporter: David Robson
>
> Looking at a job to load data into Oracle as an example - depending on the
> way the user wants to load data, we may be loading data into temporary
> tables. For maximum performance we need to create a separate temporary table
> for each loader - so when the initializer is running we need to know how many
> loaders will run so we can create these temporary tables. Again when the
> destroyer is run we will need to drop these temporary tables - so it will
> need to know as well.
> Another example where we need to know this in the initializer - Oracle
> databases may be Real Application Clusters where there are multiple instances
> across multiple machines. For both FROM and TO jobs we spread the load across
> these instances during the initialization phase - so we need to know how many
> loaders / extractors will run.
> In the case of a FROM job we could do this in the partition phase - but there
> is no way to achieve this for a TO job. It seems we could either add the
> information into the initialize phase - or add a new partition phase on the
> TO side that is called after the partition phase on the FROM side. It could
> take the details of the partitioned output and match it up to the other side.
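
The instance-spreading case in the description above can be illustrated with a
small Java sketch. It assumes a simple round-robin policy, which the issue does
not specify, and it does not use the Sqoop connector API: the class
InstanceAssignment, its assign method, and the instance names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Sqoop connector API.
final class InstanceAssignment {
    private InstanceAssignment() {}

    // Returns, for each worker index 0..numWorkers-1, the instance it should
    // connect to. Round-robin is an assumed policy for illustration.
    static List<String> assign(List<String> instances, int numWorkers) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        if (numWorkers < 1) {
            throw new IllegalArgumentException("numWorkers must be >= 1");
        }
        List<String> assignment = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            assignment.add(instances.get(i % instances.size()));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Five loaders spread over two made-up instance names.
        System.out.println(assign(List.of("orcl1", "orcl2"), 5));
    }
}
```

The assignment can only be computed once the number of loaders or extractors is
known, which is the information the initializer currently lacks.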