[ 
https://issues.apache.org/jira/browse/SQOOP-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284432#comment-14284432
 ] 

Veena Basavaraj commented on SQOOP-1938:
----------------------------------------

[~jarcec] thanks

[~hshreedharan] gotcha. Good explanation, and trying to start the loader in 
parallel with the extractor is very cool. I missed this point; it is 
parallelization in another dimension. Now I can write a pretty diagram to 
explain this. I never thought about this angle when trying to get this working 
on Spark. 

The only con I can see is that if we want to do some form of merging across the 
data from different extractors before doing the load, then it is too late. 
Since we do not support this now, it seems like a very good optimization to 
parallelize the loading process with the extraction process.
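To make the diagram concrete, the loader-running-concurrently-with-the-extractor idea can be sketched as a plain producer/consumer pipeline over a bounded queue. This is only an illustration of the concept; the class and record names below are made up for the example and are not Sqoop's actual internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedTransfer {
    // Sentinel record telling the loader the extractor is done (illustrative).
    private static final String END_OF_DATA = "__END__";

    public static void main(String[] args) throws Exception {
        // Small bounded buffer: the extractor blocks when the loader lags,
        // so both sides run concurrently without unbounded memory growth.
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(4);
        List<String> loaded = new ArrayList<>();

        // "Loader" thread starts consuming before extraction has finished.
        Thread loader = new Thread(() -> {
            try {
                String record;
                while (!(record = buffer.take()).equals(END_OF_DATA)) {
                    loaded.add(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        loader.start();

        // "Extractor": produces records while the loader drains the queue.
        for (int i = 0; i < 10; i++) {
            buffer.put("record-" + i);
        }
        buffer.put(END_OF_DATA);

        loader.join();
        System.out.println(loaded.size());
    }
}
```

Note the trade-off the comment above describes: once records stream straight from the queue into the loader, there is no point at which output from *different* extractors could be merged before loading.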

> DOC:update the sqoop MR engine implementation details
> -----------------------------------------------------
>
>                 Key: SQOOP-1938
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1938
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+MR+Execution+Engine
> 1. Why do we need SqoopWritable, and what can be done in the future?
> 2. Even though we call Sqoop a map-only job, is that how it always works? What 
> happens when numLoaders is non-zero?
> {code}
>       // Set number of reducers as number of configured loaders  or suppress
>       // reduce phase entirely if loaders are not set at all.
>       if(request.getLoaders() != null) {
>         job.setNumReduceTasks(request.getLoaders());
>       } else {
>         job.setNumReduceTasks(0);
>       }
> {code}
> 3. Internals of SqoopNullOutputFormat and how SqoopWritable is used in it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
