[ https://issues.apache.org/jira/browse/HADOOP-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716864#action_12716864 ]
Hadoop QA commented on HADOOP-5967:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12409836/single-mapper.patch
  against trunk revision 782083.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath.  The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/console

This message is automatically generated.

> Sqoop should only use a single map task
> ---------------------------------------
>
>                 Key: HADOOP-5967
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5967
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: single-mapper.patch
>
>
> The current DBInputFormat implementation uses SELECT ... LIMIT ... OFFSET
> statements to read from a database table. This actually results in several
> queries all accessing the same table at the same time. Most database
> implementations will actually use a full table scan for each such query,
> starting at row 1 and scanning down until the OFFSET is reached before
> emitting data to the client. The upshot of this is that we see O(n^2)
> performance in the size of the table when using a large number of mappers,
> when a single mapper would read through the table in O(n) time in the number
> of rows.
> This patch sets the number of map tasks to 1 in the MapReduce job sqoop
> launches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
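
The cost argument in the issue description above can be illustrated with a small back-of-the-envelope model. This is only a sketch under a simplifying assumption: each LIMIT/OFFSET query is taken to scan from row 1 through OFFSET + LIMIT, as the description says most databases do. The class and method names (OffsetScanCost, rowsScanned) are hypothetical and are not part of Hadoop or Sqoop; no database is contacted.

```java
public class OffsetScanCost {
    /**
     * Rows touched when a table of n rows is read by m LIMIT/OFFSET queries,
     * assuming (hypothetically) that each query scans from row 1 up to the
     * end of its own slice.
     */
    public static long rowsScanned(long n, int m) {
        long chunk = (n + m - 1) / m;   // rows per mapper, rounded up
        long total = 0;
        for (int i = 0; i < m; i++) {
            long offset = (long) i * chunk;
            if (offset >= n) {
                break;                  // no rows left for this mapper
            }
            long limit = Math.min(chunk, n - offset);
            total += offset + limit;    // rows skipped plus rows emitted
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        for (int m : new int[] {1, 10, 100}) {
            System.out.println(m + " mappers: " + rowsScanned(n, m) + " rows scanned");
        }
    }
}
```

Under this model the total is roughly n * (m + 1) / 2 rows, so one mapper touches n rows while the work grows linearly with the number of mappers. If the number of mappers is scaled up with the table size, the total becomes quadratic in n, which matches the O(n^2) behaviour described in the issue.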