[jira] [Commented] (SQOOP-2465) Initializer and Destroyer should know how many executors will run

Jarek Jarcec Cecho (JIRA) Fri, 07 Aug 2015 15:13:07 -0700

    [ 
https://issues.apache.org/jira/browse/SQOOP-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662539#comment-14662539
 ]


Jarek Jarcec Cecho commented on SQOOP-2465:
-------------------------------------------

I'm generally supportive of adding APIs that help connector developers build 
connectors :) I have a few high-level thoughts, though:

* The current design keeps the "From" and "To" sides independent from the 
connector developer's perspective, and we should keep it that way. Hence the 
"From" side should not know how many loaders are configured and, vice versa, 
the "To" side should not care how many extractors are running. 
* Propagating the number of extractors to the "From" initializer seems like a 
completely valid request. Similarly for the number of loaders to the "To" 
initializer.
* I kind of like the idea of creating an optional "To" "partitioner", even 
though I would not call it a partitioner per se, as that is confusing: we're 
not partitioning data in any way, it's more about pre-creating temporary 
objects for each loader. I think this one is big in itself, so perhaps we 
should track it in a separate JIRA. I would love to see a more detailed 
proposal :)
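
To make the first two points concrete, here is a rough sketch of what 
per-direction propagation could look like. Everything in it (Direction, 
InitializerInfo, forFrom, forTo) is a hypothetical name made up for 
illustration and not part of the existing connector SDK. The only point is 
that each side sees its own executor count and nothing about the other side.

```java
// Hypothetical sketch: none of these types exist in the Sqoop connector SDK.
enum Direction { FROM, TO }

final class InitializerInfo {
    private final Direction direction;
    private final int executorCount; // extractors for FROM, loaders for TO

    private InitializerInfo(Direction direction, int executorCount) {
        if (executorCount < 1) {
            throw new IllegalArgumentException("executorCount must be >= 1");
        }
        this.direction = direction;
        this.executorCount = executorCount;
    }

    // The FROM initializer is told only how many extractors will run.
    static InitializerInfo forFrom(int extractors) {
        return new InitializerInfo(Direction.FROM, extractors);
    }

    // The TO initializer is told only how many loaders will run.
    static InitializerInfo forTo(int loaders) {
        return new InitializerInfo(Direction.TO, loaders);
    }

    Direction direction() {
        return direction;
    }

    int executorCount() {
        return executorCount;
    }
}
```

With something like this, the framework would build InitializerInfo.forFrom(4) 
for the "From" initializer and InitializerInfo.forTo(2) for the "To" 
initializer, and neither object carries the other side's number.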

> Initializer and Destroyer should know how many executors will run
> -----------------------------------------------------------------
>
>                 Key: SQOOP-2465
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2465
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.99.6
>            Reporter: David Robson
>
> Looking at a job to load data into Oracle as an example - depending on the 
> way the user wants to load data, we may be loading data into temporary 
> tables. For maximum performance we need to create a separate temporary table 
> for each loader - so when the initializer is running we need to know how many 
> loaders will run so we can create these temporary tables. Again when the 
> destroyer is run we will need to drop these temporary tables - so it will 
> need to know as well.
> Another example where we need to know this in the initializer - Oracle 
> databases may be Real Application Clusters where there are multiple instances 
> across multiple machines. For both FROM and TO jobs we spread the load across 
> these instances during the initialization phase - so we need to know how many 
> loaders / extractors will run.
> In the case of a FROM job we could do this in the partition phase - but there 
> is no way to achieve this for a TO job. It seems we could either add the 
> information into the initialize phase - or add a new partition phase on the 
> TO side that is called after the partition phase on the FROM side. It could 
> take the details of the partitioned output and match it up to the other side.
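
As an illustration of the temporary-table case in the description above, the 
following sketch derives one table name per loader and the statements an 
initializer and a destroyer might issue. The class name TempTablePlan, the 
naming scheme, and the exact SQL are assumptions for the example only. It 
builds SQL strings rather than talking to a database, so it shows just the 
dependency on the loader count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Sqoop or of any Oracle connector.
final class TempTablePlan {

    // One temporary table name per loader, e.g. PREFIX_0 .. PREFIX_(n-1).
    static List<String> tableNames(String prefix, int loaders) {
        if (loaders < 1) {
            throw new IllegalArgumentException("loaders must be >= 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < loaders; i++) {
            names.add(prefix + "_" + i);
        }
        return names;
    }

    // Statements the initializer could run: empty copies of the target table.
    static List<String> createStatements(String targetTable, List<String> names) {
        List<String> sql = new ArrayList<>();
        for (String name : names) {
            sql.add("CREATE TABLE " + name
                    + " AS SELECT * FROM " + targetTable + " WHERE 1 = 0");
        }
        return sql;
    }

    // Statements the destroyer could run to clean up the same tables.
    static List<String> dropStatements(List<String> names) {
        List<String> sql = new ArrayList<>();
        for (String name : names) {
            sql.add("DROP TABLE " + name);
        }
        return sql;
    }
}
```

Both the initializer and the destroyer have to call tableNames with the same 
loader count to agree on the set of tables, which is why both would need that 
number.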



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
