[ 
https://issues.apache.org/jira/browse/SQOOP-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Veena Basavaraj updated SQOOP-1601:
-----------------------------------
    Fix Version/s:     (was: 1.99.5)
                   2.0.0

> Sqoop2: To part of the Connector API to support balancing/ re-partioning step
> -----------------------------------------------------------------------------
>
>                 Key: SQOOP-1601
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1601
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 2.0.0
>
>
> Today the Sqoop job lifecycle looks like this.
> To recap:
> Step 1: Initializers for both the FROM and TO data sources
> Step 2: Partitioner (for the data from the FROM data source)
> Step 3: Extractor (the actual reading from the FROM data source)
> Step 4: Loader (for the TO data source, i.e. writing the data to it)
> Step 5: Destroyers for both data sources
> Both Extractors and Loaders are parallelized in themselves, so we can set the 
> numExtractors and numLoaders to use via the driver config.
> But in cases where there is an imbalance between the extractors and loaders, we 
> may need an intermediate step to rebalance, repartition, or shuffle the data as 
> the writing happens in the Loaders. Today we do not support this step. It 
> might be good to provide another step that some connectors can add for 
> better control over the load step.
> We should also discuss whether this step can be a generic one that can 
> operate on or transform the output as it is written to the TO data source.
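
To make the proposed intermediate step concrete, here is a minimal sketch of one possible rebalancing strategy: records from a skewed set of extractor outputs are redistributed round-robin across a fixed number of loader inputs. It is an illustration only, not the Sqoop connector API. The class name RebalanceSketch and the method rebalance are hypothetical, and the numLoaders parameter merely mirrors the driver config value mentioned above. A real step would work on streamed records, not on in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are part of the Sqoop connector API.
public class RebalanceSketch {

    /**
     * Redistributes the records produced by each extractor across numLoaders
     * buckets in round-robin order, so bucket sizes differ by at most one.
     */
    public static <T> List<List<T>> rebalance(List<List<T>> extractorOutputs, int numLoaders) {
        if (numLoaders <= 0) {
            throw new IllegalArgumentException("numLoaders must be positive");
        }
        // One empty bucket per loader.
        List<List<T>> loaderInputs = new ArrayList<>();
        for (int i = 0; i < numLoaders; i++) {
            loaderInputs.add(new ArrayList<>());
        }
        // Deal records out one at a time, cycling through the loaders.
        int next = 0;
        for (List<T> output : extractorOutputs) {
            for (T record : output) {
                loaderInputs.get(next).add(record);
                next = (next + 1) % numLoaders;
            }
        }
        return loaderInputs;
    }

    public static void main(String[] args) {
        // Three extractors with skewed output sizes (5, 1, and 0 records).
        List<List<String>> extracted = List.of(
            List.of("a", "b", "c", "d", "e"),
            List.of("f"),
            List.<String>of());
        List<List<String>> balanced = rebalance(extracted, 2);
        for (int i = 0; i < balanced.size(); i++) {
            System.out.println("loader " + i + " -> " + balanced.get(i));
        }
    }
}
```

Round-robin is chosen here only because it is the simplest way to even out record counts. A connector that needs related records to land in the same loader would more likely partition by a key instead.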



