David Robson created SQOOP-2465:
-----------------------------------
Summary: Initializer and Destroyer should know how many executors will run
Key: SQOOP-2465
URL: https://issues.apache.org/jira/browse/SQOOP-2465
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.99.6
Reporter: David Robson
Take a job that loads data into Oracle as an example. Depending on how the
user wants to load data, we may load it into temporary tables first. For
maximum performance we need a separate temporary table for each loader, so
the initializer needs to know how many loaders will run in order to create
these tables. Likewise, the destroyer has to drop these temporary tables when
it runs, so it needs to know the loader count as well.
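To make the dependency on the loader count concrete, here is a minimal sketch
of how the table names and SQL could be derived. TempTablePlan, the _TMP_
naming scheme and the exact statements are assumptions for illustration only;
they are not taken from the Sqoop code base or the Oracle connector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Sqoop connector API.
final class TempTablePlan {

    // One temporary table name per loader: <target>_TMP_0 .. <target>_TMP_(n-1).
    static List<String> tableNames(String targetTable, int loaderCount) {
        if (loaderCount < 1) {
            throw new IllegalArgumentException("loaderCount must be >= 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < loaderCount; i++) {
            names.add(targetTable + "_TMP_" + i);
        }
        return names;
    }

    // Statements the initializer would run: empty copies of the target table.
    static List<String> createStatements(String targetTable, int loaderCount) {
        List<String> sql = new ArrayList<>();
        for (String name : tableNames(targetTable, loaderCount)) {
            sql.add("CREATE TABLE " + name
                    + " AS SELECT * FROM " + targetTable + " WHERE 1 = 0");
        }
        return sql;
    }

    // Statements the destroyer would run once the load has finished.
    static List<String> dropStatements(String targetTable, int loaderCount) {
        List<String> sql = new ArrayList<>();
        for (String name : tableNames(targetTable, loaderCount)) {
            sql.add("DROP TABLE " + name + " PURGE");
        }
        return sql;
    }
}
```

Both methods take the same loaderCount, which is the point: unless the
initializer and the destroyer see the same number, the destroyer cannot tell
which tables to drop.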
Another case where the initializer needs this information: an Oracle database
may be a Real Application Clusters (RAC) deployment, with multiple instances
across multiple machines. For both FROM and TO jobs we spread the load across
these instances during the initialization phase, so we need to know how many
loaders / extractors will run.
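As a rough illustration of spreading executors across instances, the sketch
below assigns each loader / extractor to an instance round-robin. The class
name, the round-robin policy and the instance names are assumptions for the
example, not a description of what the Oracle connector actually does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Sqoop connector API.
final class InstanceAssignment {

    // Returns, for executor i, the instance it should connect to.
    // Executors are distributed round-robin over the available instances.
    static List<String> assign(List<String> instances, int executorCount) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        List<String> assignment = new ArrayList<>();
        for (int i = 0; i < executorCount; i++) {
            assignment.add(instances.get(i % instances.size()));
        }
        return assignment;
    }
}
```

The assignment can only be computed when executorCount is known, which is
exactly the value the initializer does not have today.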
For a FROM job we could do this in the partition phase, but there is no way
to achieve it for a TO job. It seems we could either pass the information to
the initialize phase, or add a new partition phase on the TO side that is
called after the partition phase on the FROM side. The new phase could take
the details of the partitioned output and match them up with the TO side.
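The first option could look roughly like the sketch below: the context handed
to the initializer and the destroyer exposes the executor count. All types
here (PhaseContext, ToInitializer, ToDestroyer, getExecutorCount) are
hypothetical and defined only for this example; they are not the existing
Sqoop 1.99.6 interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context: the only addition is the executor count.
interface PhaseContext {
    int getExecutorCount();
}

// Hypothetical stand-ins for the TO-side initializer and destroyer.
interface ToInitializer {
    void initialize(PhaseContext context);
}

interface ToDestroyer {
    void destroy(PhaseContext context);
}

// Simple context implementation carrying a fixed executor count.
final class FixedContext implements PhaseContext {
    private final int executorCount;

    FixedContext(int executorCount) {
        this.executorCount = executorCount;
    }

    @Override
    public int getExecutorCount() {
        return executorCount;
    }
}

// Example connector that records what it would set up and tear down.
final class RecordingConnector implements ToInitializer, ToDestroyer {
    final List<String> log = new ArrayList<>();

    @Override
    public void initialize(PhaseContext context) {
        for (int i = 0; i < context.getExecutorCount(); i++) {
            log.add("create " + i);
        }
    }

    @Override
    public void destroy(PhaseContext context) {
        for (int i = 0; i < context.getExecutorCount(); i++) {
            log.add("drop " + i);
        }
    }
}
```

This only covers the first option. The second option, a TO-side partition
phase, would need a way to hand over the FROM-side partitions and is not
sketched here.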