Jihoon Son created TAJO-1995:
--------------------------------
Summary: Improve range partitioning using histogram
Key: TAJO-1995
URL: https://issues.apache.org/jira/browse/TAJO-1995
Project: Tajo
Issue Type: New Feature
Components: QueryMaster
Reporter: Jihoon Son
Assignee: Jihoon Son
Fix For: 0.12.0
The currently implemented range repartition algorithm has two major problems:
* It assumes that the data distribution is uniform, so it is fragile for
skewed data distributions.
* Given floating-point values, it ignores the digits to the right of the
decimal point, so it is difficult to guess the proper partition number.
One solution to these problems is to use a histogram. With a histogram, we
can determine the data distribution and handle floating-point values
properly.
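
As a rough illustration of the idea (not Tajo's actual implementation), the
sketch below derives range boundaries from a histogram so that each partition
receives roughly the same number of rows. The function names, the
(lower, upper, count) bucket layout, and the assumption that values are spread
evenly inside a single bucket are all hypothetical choices made for this
example.

```python
from bisect import bisect_right


def range_boundaries(histogram, num_partitions):
    """Return num_partitions - 1 split points that balance row counts.

    histogram is a hypothetical list of (lower, upper, count) buckets,
    sorted by lower bound and covering contiguous value ranges.
    """
    total = sum(count for _, _, count in histogram)
    target = total / num_partitions  # rows each partition should receive
    boundaries = []
    seen = 0.0  # rows in the buckets already consumed
    for lower, upper, count in histogram:
        if count == 0:
            continue  # an empty bucket cannot contain a split point
        # Emit every split point whose cumulative row target falls in this bucket.
        while (len(boundaries) < num_partitions - 1
               and seen + count >= target * (len(boundaries) + 1)):
            needed = target * (len(boundaries) + 1) - seen
            # Assumption: values are uniform within a bucket, so interpolate.
            boundaries.append(lower + (upper - lower) * (needed / count))
        seen += count
    return boundaries


def partition_of(value, boundaries):
    """Map a value to a partition index using the computed split points."""
    return bisect_right(boundaries, value)


# Skewed data: 90% of the rows lie in [0.0, 1.0).
skewed = [(0.0, 1.0, 90), (1.0, 10.0, 10)]
splits = range_boundaries(skewed, 4)
print(splits)                     # all three split points fall below 1.0
print(partition_of(0.3, splits))  # partition index for the value 0.3
```

With the skewed input above, every split point is a fractional value below
1.0. A scheme that assumes a uniform distribution over [0.0, 10.0) or drops
the fractional part could not separate those rows.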