[ https://issues.apache.org/jira/browse/DRILL-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803892#comment-13803892 ]
Ashish Paliwal commented on DRILL-230:
--------------------------------------
[~sphillips] [~jnadeau] Can the review be kept open for 2-3 more days? I have
worked extensively on caching and would like to review this in detail. I have a
few more suggestions, but would like to read the code before adding them to
Review Board.
[~sphillips] We can get on a Hangout to discuss the details. Let me know if
that works.
> Build a sampling range partitioner
> ----------------------------------
>
> Key: DRILL-230
> URL: https://issues.apache.org/jira/browse/DRILL-230
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Jacques Nadeau
> Assignee: Steven Phillips
> Attachments: DRILL-230_2013-10-23_13:10:50.patch, DRILL-230.patch
>
>
> Create a new operator that caches a number of record batches and then
> coordinates across the cluster on the distribution of partitioning keys to
> try to determine a reasonable set of range partitions. The outgoing stream
> should include a partition key that is equal to the width of the receiving
> fragment.
> - histogram or similar should be held in the distributed cache
> - need to figure out the logic for how long to wait before the partitioning
> estimate is good enough.
> - need to update the partitioning sender so that we can drop the partitioning
> column rather than sending it onward.
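The description leaves the partitioning logic open. The following is a minimal Python sketch of one way a sampling range partitioner could work, under stated assumptions. Records are plain dicts. Each fragment reservoir-samples keys from its cached batches. The distributed cache and cross-cluster coordination are replaced by passing every fragment's sample into one function. The "good enough" rule is simply a fixed sample size. The description's "partition key equal to the width of the receiving fragment" is read here as a receiver index in 0..width-1. All names (`sample_keys`, `split_points`, `partition_id`, `add_partition_column`, `send`, `_partition`) are hypothetical and are not Drill APIs.

```python
import bisect
import random
from typing import Dict, List, Sequence

Record = Dict[str, object]

# Hypothetical name for the column the partitioner adds and the sender drops.
PARTITION_COL = "_partition"


def sample_keys(batches: Sequence[List[Record]], key: str,
                sample_size: int, seed: int = 0) -> List[object]:
    """Reservoir-sample (Algorithm R) up to `sample_size` key values from
    the batches one fragment has cached."""
    rng = random.Random(seed)
    reservoir: List[object] = []
    seen = 0
    for batch in batches:
        for record in batch:
            seen += 1
            if len(reservoir) < sample_size:
                reservoir.append(record[key])
            else:
                # Keep the new key with probability sample_size / seen.
                j = rng.randrange(seen)
                if j < sample_size:
                    reservoir[j] = record[key]
    return reservoir


def split_points(samples: Sequence[Sequence[object]], width: int) -> List[object]:
    """Merge per-fragment samples and pick width - 1 boundaries at evenly
    spaced quantiles. Plain concatenation stands in for the distributed
    cache (assumption) and ignores differences in fragment size."""
    if width < 1:
        raise ValueError("width must be at least 1")
    merged = sorted(k for sample in samples for k in sample)
    if not merged:
        return []  # no data sampled: every record goes to partition 0
    return [merged[(i * len(merged)) // width] for i in range(1, width)]


def partition_id(value: object, bounds: Sequence[object]) -> int:
    """Receiver index in [0, len(bounds)] for one key value."""
    return bisect.bisect_right(bounds, value)


def add_partition_column(batch: List[Record], key: str,
                         bounds: Sequence[object]) -> List[Record]:
    """Partitioner side: attach the receiver index to every record."""
    return [{**r, PARTITION_COL: partition_id(r[key], bounds)} for r in batch]


def send(batch: List[Record], width: int) -> List[List[Record]]:
    """Sender side: route each record by its partition column, then drop
    that column instead of sending it onward."""
    outboxes: List[List[Record]] = [[] for _ in range(width)]
    for r in batch:
        pid = int(r[PARTITION_COL])
        outboxes[pid].append({c: v for c, v in r.items() if c != PARTITION_COL})
    return outboxes
```

In this sketch each fragment calls `sample_keys` on its cached batches, a coordinator calls `split_points` on the collected samples with the receiving width, and each fragment then runs `add_partition_column` followed by `send`. Sampling instead of building a full histogram bounds the data exchanged to `sample_size` keys per fragment. A real operator would still need a rule for when to stop caching, which the description leaves open.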
--
This message was sent by Atlassian JIRA
(v6.1#6144)