[ https://issues.apache.org/jira/browse/DRILL-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Phillips updated DRILL-230:
----------------------------------
Attachment: DRILL-230_2013-10-25_05:15:16.patch
> Build a sampling range partitioner
> ----------------------------------
>
> Key: DRILL-230
> URL: https://issues.apache.org/jira/browse/DRILL-230
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Jacques Nadeau
> Assignee: Steven Phillips
> Attachments: DRILL-230_2013-10-23_13:10:50.patch,
> DRILL-230_2013-10-25_05:15:16.patch, DRILL-230_2013-10-25_05:16:58.patch,
> DRILL-230.patch
>
>
> Create a new operator that caches a number of record batches and then
> coordinates across the cluster on the distribution of partitioning keys to
> try to determine a reasonable set of range partitions. The outgoing stream
> should include a partition key column whose range of values matches the
> width of the receiving fragment.
> - The histogram (or similar structure) should be held in the distributed cache.
> - Need to figure out how long to wait before the partitioning estimate is
> good enough.
> - Need to update the partitioning sender so that it can drop the partitioning
> column rather than sending it onward.
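
The range-partitioning idea described above can be sketched roughly as follows. This is only an illustration and not the code in the attached patches: it is written in Python for brevity (Drill itself is Java), the names compute_split_points and partition_id are hypothetical, and it assumes the sampled keys have already been gathered in one place, whereas the issue calls for coordinating them through the distributed cache.

```python
import bisect


def compute_split_points(sampled_keys, width):
    """Pick width - 1 split points at evenly spaced quantiles of the sample.

    `width` stands for the number of receiving fragments (an assumption
    about what "width" means in the issue description).
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    ordered = sorted(sampled_keys)
    if not ordered:
        # No sample yet: every key falls into partition 0.
        return []
    n = len(ordered)
    return [ordered[(i * n) // width] for i in range(1, width)]


def partition_id(key, split_points):
    """Map a key to a partition id in the range [0, len(split_points)]."""
    # bisect_right sends keys equal to a split point to the higher partition.
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    # Hypothetical sample of partitioning keys from the cached record batches.
    sample = [17, 3, 42, 8, 99, 23, 56, 71, 64, 35, 12, 88]
    splits = compute_split_points(sample, width=4)
    for key in (1, 20, 60, 100):
        print(key, "->", partition_id(key, splits))
```

The partition id computed here plays the role of the extra partition key column: a sender could route each record on it and then drop the column, as the last bullet suggests. How large the sample must be before the split points are trustworthy is left open, as it is in the issue.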
--
This message was sent by Atlassian JIRA
(v6.1#6144)