[ https://issues.apache.org/jira/browse/TAJO-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084310#comment-14084310 ]
ASF GitHub Bot commented on TAJO-966:
-------------------------------------
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/91#discussion_r15740968
--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java ---
@@ -94,25 +100,81 @@ public UniformRangePartition(TupleRange range, SortSpec [] sortSpecs) {
}
List<TupleRange> ranges = Lists.newArrayList();
- BigDecimal term = reverseCardsForDigit[0].divide(
- new BigDecimal(partNum), RoundingMode.CEILING);
- BigDecimal reminder = reverseCardsForDigit[0];
- Tuple last = range.getStart();
- while(reminder.compareTo(new BigDecimal(0)) > 0) {
+
+ BigDecimal x = new BigDecimal(reverseCardsForDigit[0]);
+
+ BigInteger term = x.divide(BigDecimal.valueOf(partNum), RoundingMode.CEILING).toBigInteger();
+ BigInteger reminder = reverseCardsForDigit[0];
+ Tuple last = mergedRange.getStart();
+ TupleRange tupleRange;
+ while(reminder.compareTo(BigInteger.ZERO) > 0) {
if (reminder.compareTo(term) <= 0) { // final one is inclusive
- ranges.add(new TupleRange(sortSpecs, last, range.getEnd()));
+ tupleRange = new TupleRange(sortSpecs, last, mergedRange.getEnd());
} else {
- Tuple next = increment(last, term.longValue(), variableId);
- ranges.add(new TupleRange(sortSpecs, last, next));
+ Tuple next = increment(last, term, variableId);
+ tupleRange = new TupleRange(sortSpecs, last, next);
}
+
+ ranges.add(tupleRange);
last = ranges.get(ranges.size() - 1).getEnd();
reminder = reminder.subtract(term);
}
+ for (TupleRange r : ranges) {
+ denormalize(sortSpecs, r);
+ }
+
return ranges.toArray(new TupleRange[ranges.size()]);
}
/**
+ * It normalizes the start and end keys to have the same length bytes if they are texts or bytes.
+ *
+ * @param sortSpecs The sort specs
+ * @param range Tuple range to be normalize
+ */
+ public static void normalize(final SortSpec [] sortSpecs, TupleRange range) {
+ // normalize text fields to have same bytes length
+ for (int i = 0; i < sortSpecs.length; i++) {
+ if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
--- End diff --
According to your comment, the type can be BLOB as well as TEXT, right? If so, shouldn't this condition also check for TajoDataTypes.Type.BLOB?
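The diff stops before the body of normalize, so the exact padding rule is not visible here. Below is a minimal sketch of same-length normalization that treats BLOB keys exactly like TEXT keys, as the question above suggests. It assumes keys are raw byte arrays right-padded with zero bytes; NormalizeSketch, KeyType, needsPadding, and normalize(KeyType, byte[], byte[]) are hypothetical names for illustration, not Tajo's API.

```java
import java.util.Arrays;

public class NormalizeSketch {
  // Hypothetical stand-in for TajoDataTypes.Type, reduced to the cases discussed here.
  enum KeyType { INT8, TEXT, BLOB }

  // Variable-length, byte-oriented key types that need padding (assumption:
  // BLOB is handled the same way as TEXT).
  static boolean needsPadding(KeyType type) {
    return type == KeyType.TEXT || type == KeyType.BLOB;
  }

  // Returns {start, end}. For TEXT and BLOB keys, both are right-padded with
  // zero bytes to the longer of the two lengths; other types are returned unchanged.
  static byte[][] normalize(KeyType type, byte[] start, byte[] end) {
    if (!needsPadding(type)) {
      return new byte[][] {start, end};
    }
    int len = Math.max(start.length, end.length);
    // Arrays.copyOf fills the extra positions with (byte) 0.
    return new byte[][] {Arrays.copyOf(start, len), Arrays.copyOf(end, len)};
  }

  public static void main(String[] args) {
    byte[][] r = normalize(KeyType.BLOB, new byte[] {0x41}, new byte[] {0x42, 0x43, 0x44});
    System.out.println(r[0].length + " " + r[1].length); // prints "3 3"
    System.out.println(Arrays.toString(r[0]));           // prints "[65, 0, 0]"
  }
}
```

Zero padding never makes a smaller key compare greater than a larger one under unsigned lexicographic byte order, so the start and end bounds keep their relative order after normalization.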
> Range partition should support split of multiple characters.
> ------------------------------------------------------------
>
> Key: TAJO-966
> URL: https://issues.apache.org/jira/browse/TAJO-966
> Project: Tajo
> Issue Type: Improvement
> Components: data shuffle
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.9.0
>
>
> Currently, range partitioning does not support splitting on multiple characters.
> As a result, it considers only the first character when Tajo range-partitions
> TEXT or VARCHAR fields. This sometimes produces skewed ranges, which degrades
> performance.
> We should fix it.
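To make the problem above concrete, here is a minimal sketch (not Tajo's implementation) that assumes keys are compared as unsigned UTF-8 bytes and reads a fixed-width prefix of each key as a number. MultiCharSplitSketch, toNumber, and split are hypothetical names. With a one-byte prefix, "apple", "apricot", and "avocado" all become 97, so no split point can separate them. With a three-byte prefix they become distinct numbers, which a BigInteger loop using ceiling division, similar to the one in the diff, can divide into several ranges.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiCharSplitSketch {
  // Reads the first `width` UTF-8 bytes of `key` as an unsigned big-endian
  // number; keys shorter than `width` are right-padded with zero bytes.
  static BigInteger toNumber(String key, int width) {
    byte[] prefix = Arrays.copyOf(key.getBytes(StandardCharsets.UTF_8), width);
    return new BigInteger(1, prefix); // signum 1: treat the bytes as a magnitude
  }

  // Splits [start, end] into consecutive ranges of width
  // ceil((end - start) / partNum); partNum must be positive. As in the diff,
  // the final range is closed at `end` ("final one is inclusive").
  static List<BigInteger[]> split(BigInteger start, BigInteger end, int partNum) {
    List<BigInteger[]> ranges = new ArrayList<>();
    BigInteger total = end.subtract(start);
    if (total.signum() <= 0) { // empty or inverted range: returned as-is
      ranges.add(new BigInteger[] {start, end});
      return ranges;
    }
    BigInteger[] qr = total.divideAndRemainder(BigInteger.valueOf(partNum));
    // Ceiling division: round the quotient up when there is a remainder.
    BigInteger term = qr[1].signum() == 0 ? qr[0] : qr[0].add(BigInteger.ONE);
    BigInteger last = start;
    BigInteger remainder = total;
    while (remainder.signum() > 0) {
      BigInteger next = remainder.compareTo(term) <= 0 ? end : last.add(term);
      ranges.add(new BigInteger[] {last, next});
      last = next;
      remainder = remainder.subtract(term);
    }
    return ranges;
  }

  public static void main(String[] args) {
    // One-byte prefix: all three keys map to 97 ('a'); three-byte prefix: distinct values.
    for (String k : new String[] {"apple", "apricot", "avocado"}) {
      System.out.println(k + ": " + toNumber(k, 1) + " vs " + toNumber(k, 3));
    }
    // With a three-byte prefix, the key range can be split into 4 sub-ranges.
    List<BigInteger[]> ranges = split(toNumber("apple", 3), toNumber("avocado", 3), 4);
    for (BigInteger[] r : ranges) {
      System.out.println(r[0] + " .. " + r[1]);
    }
  }
}
```

A wider prefix gives finer split points at the cost of larger BigInteger arithmetic; how many characters the actual patch considers is not visible in this excerpt.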
--
This message was sent by Atlassian JIRA
(v6.2#6252)