David Robson created SQOOP-2465:
-----------------------------------

             Summary: Initializer and Destroyer should know how many executors will run
                 Key: SQOOP-2465
                 URL: https://issues.apache.org/jira/browse/SQOOP-2465
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.99.6
            Reporter: David Robson


Take a job that loads data into Oracle as an example. Depending on how the 
user wants to load data, we may load it into temporary tables first. For 
maximum performance we need a separate temporary table for each loader, so 
the initializer needs to know how many loaders will run in order to create 
these tables. The destroyer has to drop the same temporary tables when it 
runs, so it needs to know the loader count as well.
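
To make the temporary table case concrete, here is a minimal sketch of what 
the initializer and destroyer would both need to compute. The class name, 
method name and the "_TMP_<n>" naming scheme are hypothetical and not part of 
the Sqoop connector API; the point is only that both sides need the same 
loader count to derive the same list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Sqoop connector API.
class TempTablePlan {

    // Derive one temporary table name per loader: BASE_TMP_0 .. BASE_TMP_(n-1).
    // The naming scheme is an assumption made for this example.
    static List<String> tempTableNames(String baseTable, int loaderCount) {
        if (loaderCount < 1) {
            throw new IllegalArgumentException("loaderCount must be >= 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < loaderCount; i++) {
            names.add(baseTable + "_TMP_" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        // The initializer would create each of these tables, and the
        // destroyer would drop the same list once the job has finished.
        for (String name : tempTableNames("SALES", 3)) {
            System.out.println(name);
        }
    }
}
```

Without the loader count, neither phase can build this list, which is the gap 
this issue describes.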

A second case where the initializer needs this information: an Oracle 
database may be a Real Application Clusters (RAC) deployment, with multiple 
instances running across multiple machines. For both FROM and TO jobs we 
spread the load across these instances during the initialization phase, so we 
need to know how many loaders / extractors will run.
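
As an illustration of the RAC case, the sketch below assigns each loader or 
extractor to an instance in round-robin order. Round-robin is an assumption 
chosen for simplicity (the connector may balance differently), and the class, 
method and instance names are hypothetical. The assignment cannot be computed 
at all without the executor count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: round-robin is an assumed policy, not the connector's
// documented behaviour, and none of these names come from Sqoop.
class RacAssignment {

    // Return the instance for each executor index, cycling through the
    // available instances so the load is spread as evenly as possible.
    static List<String> assignInstances(List<String> instances, int executorCount) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < executorCount; i++) {
            assigned.add(instances.get(i % instances.size()));
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> instances = new ArrayList<>();
        instances.add("orcl1");
        instances.add("orcl2");
        // Five executors over two instances.
        System.out.println(assignInstances(instances, 5));
    }
}
```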

For a FROM job we could do this in the partition phase, but there is no way 
to achieve this for a TO job. It seems we could either pass the information 
to the initialize phase, or add a new partition phase on the TO side that is 
called after the partition phase on the FROM side. The latter could take the 
details of the partitioned output and match them up to the TO side.
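
The second option could look roughly like the sketch below. Everything here 
is hypothetical: Sqoop has no ToPartitioner interface, the FROM partitions 
are represented as plain strings, and the one-table-per-partition matching 
assumes the number of loaders equals the number of FROM partitions, which may 
not hold in practice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a TO-side partition phase. No such interface exists
// in Sqoop; it only illustrates the shape of the proposal.
class ToSidePartitionSketch {

    // Would be called after the FROM side has produced its partitions.
    interface ToPartitioner {
        Map<String, String> match(List<String> fromPartitions);
    }

    // Example policy (an assumption): one temporary table per FROM partition.
    // FROM partition labels are assumed to be unique.
    static ToPartitioner oneTablePerPartition(String baseTable) {
        return fromPartitions -> {
            Map<String, String> plan = new LinkedHashMap<>();
            for (int i = 0; i < fromPartitions.size(); i++) {
                plan.put(fromPartitions.get(i), baseTable + "_TMP_" + i);
            }
            return plan;
        };
    }

    public static void main(String[] args) {
        List<String> from = java.util.Arrays.asList("partition-a", "partition-b");
        Map<String, String> plan = oneTablePerPartition("SALES").match(from);
        plan.forEach((src, target) -> System.out.println(src + " -> " + target));
    }
}
```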



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
