[ 
https://issues.apache.org/jira/browse/SQOOP-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459948#comment-13459948
 ] 

Hudson commented on SQOOP-603:
------------------------------

Integrated in Sqoop-ant-jdk-1.6-hadoop20 #74 (See 
[https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/74/])
    SQOOP-603 Support small intervals in IntegerSplitter implementation 
(Revision 5616152ac4c96d6c0589768b982cf67f3277df74)

     Result = SUCCESS
cheolsoo : 
https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=5616152ac4c96d6c0589768b982cf67f3277df74
Files : 
* src/java/org/apache/sqoop/mapreduce/db/IntegerSplitter.java
* src/test/org/apache/sqoop/mapreduce/db/TestIntegerSplitter.java
* src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBInputFormat.java

                
> Support small intervals in IntegerSplitter implementation
> ---------------------------------------------------------
>
>                 Key: SQOOP-603
>                 URL: https://issues.apache.org/jira/browse/SQOOP-603
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.4.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.3
>
>         Attachments: SQOOP-603.patch
>
>
> IntegerSplitter currently creates splits of the following form:
> {code}
> minimal value <= x < splitPoint1
> splitPoint1 <= x < splitPoint2
> ...
> splitPointN <= x <= maximal value
> {code}
> Note that the upper bound always uses the condition "<", with the 
> exception of the last split, which uses "<=". This is perfectly fine when 
> creating a reasonable number of splits over a very large interval.
> This approach, however, causes problems on very small intervals. For example, 
> the following splits are created on the interval [0, 5] with 5 splits:
> * 0 <= x < 1
> * 1 <= x < 2 
> * 2 <= x < 3 
> * 3 <= x < 4 
> * 4 <= x <= 5
> Notice that every split covers exactly one number except the last one, which 
> covers two: 4 and 5. This becomes a significant problem when, for example, 
> the user needs one split per partition: one mapper ends up moving two 
> partitions, so it takes twice as long as the other mappers and the entire 
> job is delayed.
> Jarcec
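
The imbalance described above can be reproduced with a small standalone sketch. This is not the Sqoop implementation: the class `SplitSketch`, its nested `Range` type, and the simple integer step are hypothetical and only mirror the boundary conditions quoted in the description ("<" on every upper bound except the last, which uses "<=").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from IntegerSplitter.java.
class SplitSketch {

    /** One split: inclusive lower bound, upper bound, and whether the upper bound is inclusive. */
    static final class Range {
        final long lo;
        final long hi;
        final boolean hiInclusive;

        Range(long lo, long hi, boolean hiInclusive) {
            this.lo = lo;
            this.hi = hi;
            this.hiInclusive = hiInclusive;
        }

        /** Number of integer values covered by this split. */
        long count() {
            return hiInclusive ? hi - lo + 1 : hi - lo;
        }

        @Override
        public String toString() {
            return lo + " <= x " + (hiInclusive ? "<= " : "< ") + hi;
        }
    }

    /** Builds splits over [min, max] using the boundary scheme from the issue description. */
    static List<Range> splits(long min, long max, int numSplits) {
        // Assumption: evenly spaced split points with a plain integer step.
        long step = Math.max(1, (max - min) / numSplits);
        List<Range> out = new ArrayList<>();
        long lo = min;
        // All splits but the last use an exclusive upper bound.
        for (int i = 0; i < numSplits - 1 && lo + step < max; i++) {
            long hi = lo + step;
            out.add(new Range(lo, hi, false));
            lo = hi;
        }
        // The last split closes the interval with an inclusive upper bound.
        out.add(new Range(lo, max, true));
        return out;
    }

    public static void main(String[] args) {
        for (Range r : splits(0, 5, 5)) {
            System.out.println(r + " -> " + r.count() + " value(s)");
        }
    }
}
```

For the interval [0, 5] with 5 splits, the first four ranges each cover one value and the last range covers two (4 and 5), which matches the list in the description.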

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
