[
https://issues.apache.org/jira/browse/DRILL-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers reassigned DRILL-5209:
----------------------------------
Assignee: Paul Rogers
> Standardize Drill's batch size
> ------------------------------
>
> Key: DRILL-5209
> URL: https://issues.apache.org/jira/browse/DRILL-5209
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.9.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
>
> Drill is columnar, implemented as a set of value vectors. Value vectors
> consume memory, which is a fixed resource on each Drillbit. Effective
> resource management requires the ability to control (or at least predict)
> resource usage.
> Most data consists of more than one column. A collection of columns (or rows,
> depending on your perspective) is a record batch.
> Many parts of Drill use 64K rows as the target size of a record batch. The
> Flatten operator targets batch sizes of 512 MB. The text scan operator
> appears to target batch sizes of 128 MB. Other operators may use other sizes.
> Operators that target 64K rows therefore consume an unknown and potentially
> unbounded amount of memory. A batch of 64K rows of a single INT column is
> fine (about 256 KB of data), but 64K rows of Varchar columns of 50K bytes
> each produce a batch of roughly 3.2 GB, which is far too large.
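A quick back-of-the-envelope check of the figure above (illustrative only; the 50K-byte Varchar is the hypothetical worst case from the description, not a measured Drill workload):

```java
// Worst-case batch size: 64K rows, each holding a 50K-byte Varchar value.
public class BatchSizeEstimate {
    public static void main(String[] args) {
        long rows = 64 * 1024;            // 64K-row batch target
        long bytesPerValue = 50 * 1000;   // 50K-byte Varchar per row
        long batchBytes = rows * bytesPerValue;
        // Prints the batch size in decimal gigabytes.
        System.out.printf("%.2f GB%n", batchBytes / 1e9);
    }
}
```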
> This ticket requests three improvements.
> 1. Define a preferred batch size that balances competing needs: memory use,
> network efficiency, the benefits of vector operations, etc.
> 2. Provide a reliable way to learn the size of each row as it is added to a
> batch.
> 3. Use the above to limit batches to the preferred batch size.
> The above will go a long way toward easing the task of managing memory,
> because the planner will then have some hope of predicting how much memory
> to allocate to each operation.
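Improvements 2 and 3 might be sketched as below. This is a minimal illustration, not Drill code: the class name, the 16 MB budget, and the String[] row representation are all hypothetical, standing in for whatever per-vector accounting the real implementation would use.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: track the size of each row as it is added (#2) and cap the batch
// at a preferred byte budget in addition to the legacy row cap (#3).
public class SizedBatchBuilder {
    private static final long PREFERRED_BATCH_BYTES = 16 * 1024 * 1024; // hypothetical budget
    private static final int MAX_ROWS = 64 * 1024;                      // legacy row limit

    private final List<String[]> rows = new ArrayList<>();
    private long batchBytes;

    /** Returns false when the batch is full; the caller should flush and retry. */
    public boolean addRow(String[] row) {
        long rowBytes = 0;
        for (String col : row) {
            rowBytes += col.getBytes(StandardCharsets.UTF_8).length;
        }
        // Never reject the first row, even if it alone exceeds the budget.
        if (!rows.isEmpty()
                && (batchBytes + rowBytes > PREFERRED_BATCH_BYTES
                    || rows.size() >= MAX_ROWS)) {
            return false;
        }
        rows.add(row);
        batchBytes += rowBytes;
        return true;
    }

    public int rowCount()     { return rows.size(); }
    public long sizeInBytes() { return batchBytes; }
}
```

With per-row accounting like this, a downstream operator (or the planner) can treat a batch as "at most PREFERRED_BATCH_BYTES" rather than "64K rows of unknown width".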
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)