Jarek Jarcec Cecho created SQOOP-603:
----------------------------------------
Summary: Support small intervals in IntegerSplitter implementation
Key: SQOOP-603
URL: https://issues.apache.org/jira/browse/SQOOP-603
Project: Sqoop
Issue Type: Improvement
Affects Versions: 1.4.2
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Fix For: 1.4.3
IntegerSplitter currently creates splits of the following form:
{code}
minimal value <= x < splitPoint1
splitPoint1 <= x < splitPoint2
...
splitPointN <= x <= maximal value
{code}
Note that the upper bound always uses the condition "<", except for the last
split, which uses "<=". This is perfectly fine when creating a reasonable
number of splits over a very large interval.
This approach, however, causes issues on very small intervals. For example,
the following splits are created on the interval [0, 5] with 5 splits:
* 0 <= x < 1
* 1 <= x < 2
* 2 <= x < 3
* 3 <= x < 4
* 4 <= x <= 5
Notice that every split covers exactly one value except the last one, which
covers two: 4 and 5. This becomes a serious issue when, for example, the user
needs one split per partition: one mapper will end up moving two partitions,
so it will take twice as long as the other mappers and hold up the entire job.
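The imbalance on [0, 5] can be reproduced with a minimal sketch. This is not
the actual IntegerSplitter code: the class SplitSketch and the helpers
splitPoints and valuesPerSplit are hypothetical names used only for
illustration, and the even-step computation of split points is an assumption.
The sketch only applies the bounds described above ("<" for every split, "<="
for the last one) and counts how many integers each split covers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the Sqoop implementation.
class SplitSketch {

    // Hypothetical helper: evenly spaced split points from min to max.
    // Assumes min < max and numSplits >= 1.
    public static List<Long> splitPoints(long min, long max, int numSplits) {
        List<Long> points = new ArrayList<>();
        long step = Math.max(1, (max - min) / numSplits);
        for (long cur = min; cur < max; cur += step) {
            points.add(cur);
        }
        points.add(max); // the maximal value closes the last split
        return points;
    }

    // Counts the integers covered by each split between adjacent points.
    public static List<Long> valuesPerSplit(List<Long> points) {
        List<Long> counts = new ArrayList<>();
        for (int i = 1; i < points.size(); i++) {
            long lo = points.get(i - 1);
            long hi = points.get(i);
            boolean last = (i == points.size() - 1);
            // Every split is "lo <= x < hi"; only the last is "lo <= x <= hi".
            counts.add(last ? hi - lo + 1 : hi - lo);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Long> points = splitPoints(0, 5, 5);
        System.out.println(valuesPerSplit(points)); // prints [1, 1, 1, 1, 2]
    }
}
```

With one value per partition, the final count of 2 corresponds to the mapper
that ends up moving two partitions.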
Jarcec