[ https://issues.apache.org/jira/browse/SQOOP-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281885#comment-14281885 ]
Jarek Jarcec Cecho commented on SQOOP-1938:
-------------------------------------------
Very nice wiki page [~vybs], good work on the summary :)
* Why do we have the ability to run a reduce phase, and why is it part of throttling?
The original idea was that you want to throttle the “From” and “To” sides
independently. For example, if I’m exporting data from HBase to a relational
database, I might want to have one extractor (=mapper) per HBase region, but
the number of regions will very likely be higher than the number of concurrent
transactions pumping data into my database that I am willing to allow, so I
might want to specify a different number of loaders to throttle that down.
However, a reduce phase means serializing all data and transferring it across
the network, so we do not run a reduce phase unless the user explicitly sets
the number of loaders (which then becomes the number of reducers).
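A minimal sketch of the throttling decision described above, assuming that a map-only job runs one loader inside each mapper and that an explicit loader count runs one loader per reducer. ThrottlePlan and plan() are hypothetical names for illustration, not Sqoop API; only the null check mirrors the setNumReduceTasks snippet quoted in the issue description.

```java
// Hypothetical sketch: how an optional loader count maps onto Hadoop task counts.
// ThrottlePlan and plan() are illustrative names, not part of Sqoop's API.
public class ThrottlePlan {
    public final int mapTasks;          // one extractor per map task
    public final int reduceTasks;       // 0 means the reduce phase is suppressed
    public final int concurrentLoaders; // parallel writers hitting the "To" side

    ThrottlePlan(int mapTasks, int reduceTasks, int concurrentLoaders) {
        this.mapTasks = mapTasks;
        this.reduceTasks = reduceTasks;
        this.concurrentLoaders = concurrentLoaders;
    }

    public static ThrottlePlan plan(int extractors, Integer loaders) {
        if (loaders != null) {
            // Explicit loader count: records are shuffled to this many reducers,
            // each assumed to run one loader.
            return new ThrottlePlan(extractors, loaders, loaders);
        }
        // Map-only job: no shuffle, each mapper is assumed to run its own loader.
        return new ThrottlePlan(extractors, 0, extractors);
    }

    @Override
    public String toString() {
        return mapTasks + " maps, " + reduceTasks + " reduces, "
                + concurrentLoaders + " concurrent loaders";
    }

    public static void main(String[] args) {
        // e.g. 120 HBase regions -> 120 extractors
        System.out.println("map-only: " + plan(120, null));
        // prints "map-only: 120 maps, 0 reduces, 120 concurrent loaders"
        System.out.println("throttled: " + plan(120, 8));
        // prints "throttled: 120 maps, 8 reduces, 8 concurrent loaders"
    }
}
```

With 120 regions and 8 loaders, the database would see 8 concurrent writers instead of 120, paid for with a full shuffle of the data.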
* Why all the threading magic when passing data from the extractor to the loader?
[~bleeapache] or [~hshreedharan] would be best placed to answer the details, as
they have the most context (they wrote the code). The premise is that Hadoop
does a lot of work under the hood (sorting) that we do not need; it wastes
cycles and might even lead to spills to disk that would further hurt
performance. Hence we exchange the data ourselves (and therefore all the
producer-consumer code).
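A minimal sketch of the general producer-consumer idea, assuming one extractor thread and one loader thread per task that hand records over through a small bounded buffer instead of going through Hadoop's sort/spill path. HandoffSketch, run() and the END sentinel are hypothetical; Sqoop's actual classes and synchronization primitives may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of an in-task extractor -> loader handoff that bypasses
// Hadoop's sort/spill machinery. Not Sqoop's actual implementation.
public class HandoffSketch {
    // Unique sentinel object telling the loader that extraction has finished.
    private static final Object END = new Object();

    public static List<String> run(List<String> source) {
        // Capacity 1: the extractor blocks whenever the loader has not yet taken
        // the previous record, so nothing accumulates in memory or spills to disk.
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1);
        List<String> loaded = new ArrayList<>();

        // Consumer: stands in for the loader writing to the "To" side.
        Thread loader = new Thread(() -> {
            try {
                while (true) {
                    Object record = queue.take();
                    if (record == END) {
                        break;
                    }
                    loaded.add((String) record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        loader.start();

        // Producer: stands in for the extractor, running on the calling thread.
        try {
            for (String record : source) {
                queue.put(record);
            }
            queue.put(END);
            loader.join(); // join() makes the loader's writes to 'loaded' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("handoff interrupted", e);
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c"))); // prints [a, b, c]
    }
}
```

The bounded queue is the design point: the extractor is throttled by the loader's pace, so records are never buffered in bulk or written to disk between the two sides.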
> DOC:update the sqoop MR engine implementation details
> -----------------------------------------------------
>
> Key: SQOOP-1938
> URL: https://issues.apache.org/jira/browse/SQOOP-1938
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+MR+Execution+Engine
> 1. Why do we need SqoopWritable, and what can be done in the future?
> 2. Even though we call Sqoop a map-only job, is that how it always works? What
> happens when numLoaders is non-zero?
> {code}
> // Set number of reducers as number of configured loaders or suppress
> // reduce phase entirely if loaders are not set at all.
> if (request.getLoaders() != null) {
>   job.setNumReduceTasks(request.getLoaders());
> } else {
>   job.setNumReduceTasks(0);
> }
> {code}
> 3. Internals of SqoopNullOutputFormat and how SqoopWritable is used in it