[ 
https://issues.apache.org/jira/browse/DRILL-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945830#comment-15945830
 ] 

Paul Rogers commented on DRILL-5285:
------------------------------------

This is primarily an implementation issue. The QA-visible result is that the 
external sort does not run out of memory, regardless of input record sizes or 
variations (as long as we stay away from memory-fragmentation issues).

> Provide detailed, accurate estimate of size consumed by a record batch
> ----------------------------------------------------------------------
>
>                 Key: DRILL-5285
>                 URL: https://issues.apache.org/jira/browse/DRILL-5285
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> DRILL-5080 introduced a {{RecordBatchSizer}} that estimates the space taken 
> by a record batch and determines batch "density."
> Drill provides a large variety of vectors, each with its own internal 
> structure and its own collection of component vectors. For example, 
> fixed-width vectors use just a data vector. Nullable vectors add an "is set" 
> vector. Variable-length vectors add an offset vector. Repeated vectors add a 
> second offset vector.
> The original {{RecordBatchSizer}} attempted to compute sizes for all these 
> vector types, but the complexity got out of hand. This ticket proposes to 
> bite the bullet and move the calculations into each vector type so that the 
> {{RecordBatchSizer}} can simply use the results of the calculations.
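
The idea in the description above, with each vector type reporting its own size and the sizer just adding up the results, can be sketched roughly as follows. This is an illustrative sketch only, not Drill's actual code: the {{SizedVector}} interface, the class names, and {{batchBytes}} are all hypothetical, and the layout figures (one byte per value for the "is set" vector, 4-byte offsets with one extra entry) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

public final class VectorSizeSketch {

    /** Hypothetical interface: each vector type computes its own payload size. */
    interface SizedVector {
        long payloadBytes();
    }

    /** Fixed-width vector: just a data vector. */
    static final class FixedVector implements SizedVector {
        private final int valueCount;
        private final int widthBytes;
        FixedVector(int valueCount, int widthBytes) {
            this.valueCount = valueCount;
            this.widthBytes = widthBytes;
        }
        public long payloadBytes() { return (long) valueCount * widthBytes; }
    }

    /** Nullable vector: inner values plus an "is set" vector (assumed 1 byte per value). */
    static final class NullableVector implements SizedVector {
        private final int valueCount;
        private final SizedVector values;
        NullableVector(int valueCount, SizedVector values) {
            this.valueCount = valueCount;
            this.values = values;
        }
        public long payloadBytes() { return valueCount + values.payloadBytes(); }
    }

    /** Variable-length vector: data bytes plus an offset vector (assumed 4-byte offsets, count + 1 entries). */
    static final class VariableVector implements SizedVector {
        private final int valueCount;
        private final long dataBytes;
        VariableVector(int valueCount, long dataBytes) {
            this.valueCount = valueCount;
            this.dataBytes = dataBytes;
        }
        public long payloadBytes() { return dataBytes + 4L * (valueCount + 1); }
    }

    /** Repeated vector: element vector plus one more offset vector over the rows. */
    static final class RepeatedVector implements SizedVector {
        private final int rowCount;
        private final SizedVector elements;
        RepeatedVector(int rowCount, SizedVector elements) {
            this.rowCount = rowCount;
            this.elements = elements;
        }
        public long payloadBytes() { return 4L * (rowCount + 1) + elements.payloadBytes(); }
    }

    /** The sizer no longer knows any vector layout; it only sums what each vector reports. */
    static long batchBytes(List<SizedVector> columns) {
        long total = 0;
        for (SizedVector column : columns) {
            total += column.payloadBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        int rows = 1000;
        List<SizedVector> batch = Arrays.<SizedVector>asList(
            new FixedVector(rows, 4),                          // required 4-byte int column
            new NullableVector(rows, new FixedVector(rows, 4)), // nullable int column
            new VariableVector(rows, 10_000),                  // variable-length column, 10,000 data bytes
            new RepeatedVector(rows, new FixedVector(3000, 4))  // repeated int, 3000 elements in total
        );
        System.out.println("Estimated batch payload bytes: " + batchBytes(batch));
    }
}
```

With this shape, adding a new vector type means implementing one method on that type rather than extending a central calculation, which is the simplification the ticket asks for.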



