[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078711#comment-14078711 ]
Hadoop QA commented on MAPREDUCE-6012:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12658549/MAPREDUCE-6012-branch-1.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4776//console

This message is automatically generated.

> DBInputSplit creates invalid ranges on Oracle
> ---------------------------------------------
>
>                 Key: MAPREDUCE-6012
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6012
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.2.1, 2.4.1
>            Reporter: Julien Serdaru
>            Assignee: Wei Yan
>         Attachments: HADOOP-9530.patch, MAPREDUCE-6012-branch-1.patch
>
>
> The DBInputFormat on Oracle does not create valid ranges.
> The method getSplits, line 263, is as follows:
>     split = new DBInputSplit(i * chunkSize, (i * chunkSize) + chunkSize);
> So the first split will have a start value of 0 (0*chunkSize).
> However, the OracleDBRecordReader, line 84, is as follows:
>     if (split.getLength() > 0 && split.getStart() > 0){
> Since the start value of the first range is equal to 0, we will skip the
> block that partitions the input set. As a result, one of the map tasks will
> process the entire data set rather than its partition.
> I'm assuming the fix is trivial and would involve removing the second check
> in the if block.
> Also, I believe the OracleDBRecordReader paging query is incorrect.
> Line 92 should read:
>     query.append(" ) WHERE dbif_rno > ").append(split.getStart());
> instead of (note > instead of >=)
>     query.append(" ) WHERE dbif_rno >= ").append(split.getStart());
> Otherwise some rows will be ignored and some counted more than once.
> A map/reduce job that counts the number of rows based on a predicate will
> highlight the incorrect behavior.
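
To make the two reported problems concrete, here is a minimal standalone sketch in plain Java that models the split arithmetic and the paging predicate without touching Hadoop or a database. It rests on assumptions that are not in the report: the class and method names are hypothetical, row numbers (dbif_rno) are taken to be 1-based, and the inner query is assumed to cap rows at start + length. Only the split expression, the if guard, and the lower-bound comparison come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behaviour described in the report; not Hadoop code.
class SplitRangeSketch {

    // Builds {start, end} pairs with the expression quoted from getSplits.
    static List<long[]> splits(long chunkSize, int chunks) {
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            out.add(new long[] {i * chunkSize, (i * chunkSize) + chunkSize});
        }
        return out;
    }

    // Counts the rows one split would read from a table of totalRows rows.
    // requireStartPositive models the "split.getStart() > 0" check in the guard.
    // inclusiveLower selects "dbif_rno >= start" (true) or "dbif_rno > start" (false).
    static long rowsRead(long start, long length, long totalRows,
                         boolean requireStartPositive, boolean inclusiveLower) {
        boolean paged = length > 0 && (!requireStartPositive || start > 0);
        if (!paged) {
            return totalRows; // paging block skipped: the whole table is read
        }
        long count = 0;
        for (long rno = 1; rno <= totalRows; rno++) { // ASSUMPTION: 1-based row numbers
            boolean belowUpper = rno <= start + length; // ASSUMPTION: upper bound of the inner query
            boolean aboveLower = inclusiveLower ? rno >= start : rno > start;
            if (belowUpper && aboveLower) {
                count++;
            }
        }
        return count;
    }

    // Sums the rows read by all splits; equals totalRows only if the ranges tile the table.
    static long totalRowsRead(long totalRows, int chunks,
                              boolean requireStartPositive, boolean inclusiveLower) {
        long chunkSize = totalRows / chunks;
        long total = 0;
        for (long[] s : splits(chunkSize, chunks)) {
            total += rowsRead(s[0], s[1] - s[0], totalRows, requireStartPositive, inclusiveLower);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("reported behaviour: " + totalRowsRead(12, 3, true, true));
        System.out.println("proposed change:    " + totalRowsRead(12, 3, false, false));
    }
}
```

Under these assumptions, with 12 rows and 3 splits, the reported behaviour reads 22 rows in total (the first split reads all 12, and each later split re-reads one boundary row), while dropping the start check and using a strict > reads exactly 12. This is only a model of the description, not a verification of the attached patch.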
--
This message was sent by Atlassian JIRA
(v6.2#6252)