Veena Basavaraj created SQOOP-1602:
--------------------------------------
Summary: Sqoop2: Fix the current balancing to Loaders is internal
to Sqoop
Key: SQOOP-1602
URL: https://issues.apache.org/jira/browse/SQOOP-1602
Project: Sqoop
Issue Type: Task
Reporter: Veena Basavaraj
Assignee: Veena Basavaraj
Today the job lifecycle in Sqoop looks like this.
To recap:
Step 1: Initializers for both data sources (FROM and TO)
Step 2: Partitioner (for the data from the FROM data source)
Step 3: Extractor (actual reading from the FROM data source)
Step 4: Loader (for the TO data source, i.e. writing data to it)
Step 5: Destroyers for both data sources
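
As a rough illustration of the five steps above, here is a minimal sketch in
plain Java. It is not the Sqoop connector API: the class name, method names,
and the in-memory "rows" are all hypothetical, and the extractors run
sequentially here only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lifecycle steps; not the real Sqoop classes.
final class LifecycleSketch {

    // Step 2: split the row range [0, totalRows) into numExtractors
    // contiguous partitions, each given as {startInclusive, endExclusive}.
    static List<int[]> partition(int totalRows, int numExtractors) {
        if (numExtractors <= 0) {
            throw new IllegalArgumentException("numExtractors must be positive");
        }
        List<int[]> partitions = new ArrayList<>();
        int base = totalRows / numExtractors;
        int remainder = totalRows % numExtractors;
        int start = 0;
        for (int i = 0; i < numExtractors; i++) {
            // The first `remainder` partitions take one extra row.
            int size = base + (i < remainder ? 1 : 0);
            partitions.add(new int[] {start, start + size});
            start += size;
        }
        return partitions;
    }

    // Step 3: read the rows of one partition from the FROM side (simulated).
    static List<String> extract(int[] partition) {
        List<String> rows = new ArrayList<>();
        for (int i = partition[0]; i < partition[1]; i++) {
            rows.add("row-" + i);
        }
        return rows;
    }

    // Runs all five steps in order and returns a log of what happened.
    static List<String> run(int totalRows, int numExtractors) {
        List<String> log = new ArrayList<>();
        log.add("initialize FROM");                  // Step 1
        log.add("initialize TO");
        List<String> target = new ArrayList<>();     // stands in for the TO data source
        for (int[] p : partition(totalRows, numExtractors)) {
            List<String> rows = extract(p);          // Step 3
            target.addAll(rows);                     // Step 4: load into TO
            log.add("extracted and loaded " + rows.size() + " rows");
        }
        log.add("destroy FROM");                     // Step 5
        log.add("destroy TO");
        return log;
    }

    public static void main(String[] args) {
        for (String line : run(10, 3)) {
            System.out.println(line);
        }
    }
}
```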
Both Extractors and Loaders are parallelized in themselves, so we can specify
the numExtractors and numLoaders to use via the driver config.
But when there is an imbalance between the extractors and loaders, we may
need an intermediate step to rebalance, repartition, or shuffle the data as it
is being written in the Loaders. Today we do not support this step. It might be
good to provide another step that connectors can add where relevant, for better
control over the load step.
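
To make the idea of such an intermediate step concrete, here is a minimal
sketch of one possible rebalancing strategy: round-robin redistribution of
extractor output across the loaders. This is an assumption for illustration
only. The issue does not prescribe a strategy, and the class and method names
are hypothetical, not part of Sqoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical rebalancing step placed between Extractors and Loaders.
final class RebalanceSketch {

    // Redistributes the records produced by the extractors across
    // numLoaders buckets in round-robin order, so that loader input sizes
    // differ by at most one record regardless of how skewed the extractor
    // outputs were.
    static <T> List<List<T>> rebalance(List<List<T>> extractorOutputs, int numLoaders) {
        if (numLoaders <= 0) {
            throw new IllegalArgumentException("numLoaders must be positive");
        }
        List<List<T>> loaderInputs = new ArrayList<>();
        for (int i = 0; i < numLoaders; i++) {
            loaderInputs.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> output : extractorOutputs) {
            for (T record : output) {
                loaderInputs.get(next).add(record);
                next = (next + 1) % numLoaders;
            }
        }
        return loaderInputs;
    }

    public static void main(String[] args) {
        // Two extractors with skewed output (5 records vs. 1), three loaders.
        List<List<String>> skewed = Arrays.asList(
                Arrays.asList("a", "b", "c", "d", "e"),
                Arrays.asList("f"));
        System.out.println(rebalance(skewed, 3)); // prints [[a, d], [b, e], [c, f]]
    }
}
```

Round-robin keeps the sketch short and evens out record counts. A connector
that needs related records to land in the same Loader would more likely
repartition by a key instead.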
We should also discuss whether this step can be a generic one that can operate
on or transform the output as it is written to the TO data source.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)