[ https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171154#comment-16171154 ]

ASF GitHub Bot commented on DRILL-5657:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/914
  
    This commit introduces a feature to limit the memory consumed by a batch.
    
    ### Batch Size Limits
    
    With this change, the code now has three overlapping limits:
    
    * The traditional row-count limit.
    * A maximum of 16 MB per value vector.
    * The new memory-per-batch limit.
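
    A minimal, self-contained sketch of how these limits interact; the names 
and numbers below are illustrative only, not Drill's actual API:

    ```java
    // Illustrative model only: a batch is complete as soon as any limit is hit.
    class BatchLimits {
      static final int ROW_COUNT_LIMIT = 64 * 1024;            // row-count limit
      static final long VECTOR_SIZE_LIMIT = 16 * 1024 * 1024;  // 16 MB per vector
      static final long BATCH_SIZE_LIMIT = 32 * 1024 * 1024;   // assumed batch limit

      static boolean batchFull(int rowCount, long largestVectorBytes, long batchBytes) {
        return rowCount >= ROW_COUNT_LIMIT
            || largestVectorBytes >= VECTOR_SIZE_LIMIT
            || batchBytes >= BATCH_SIZE_LIMIT;
      }
    }
    ```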
    
    ### Overall Flow for Limiting Batch Memory Usage
    
    The batch size limit builds on the work already done for overflow.
    
    * The column metadata allows the client to specify allocation hints such as 
expected Varchar width and array cardinality.
    * The result set loader allocates a batch using the hints and target row 
count.
    * The result set loader measures the memory allocated above. This is the 
initial batch size.
    * When a writer needs to extend a vector, it calls a listener to ask 
whether the extension is allowed, passing in the expected amount of growth 
(see the sketch below).
    * The result set loader adds the delta to the accumulated total, compares 
this against the size limit, and returns whether the resize is allowed.
    * If the resize is not allowed, an overflow is triggered.
    
    Note that the above reuses the overflow mechanism, allowing the size limit 
to be handled even if reached in the middle of a row.
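
    To make the flow concrete, here is a rough, self-contained sketch of the 
grow-request check described above; the class and method names are 
hypothetical, not the ones used in this PR:

    ```java
    // Tracks memory consumed by the in-flight batch and answers writers'
    // requests to grow a vector. A "false" answer triggers overflow handling.
    class BatchSizeTracker {
      private final long batchSizeLimit;
      private long accumulatedBytes;  // initial batch size plus growth so far

      BatchSizeTracker(long batchSizeLimit, long initialBatchBytes) {
        this.batchSizeLimit = batchSizeLimit;
        this.accumulatedBytes = initialBatchBytes;
      }

      // Called by a vector writer before it grows a buffer by deltaBytes.
      boolean canExpand(long deltaBytes) {
        accumulatedBytes += deltaBytes;
        return accumulatedBytes <= batchSizeLimit;  // false => trigger overflow
      }
    }
    ```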
    
    ### Implementation Details
    
    To make the above work:
    
    * A new batch size limit is added to the result set loader options.
    * Batch size tracking is added. This required a new method on the value 
vectors to report the actual allocated memory (sketched after this list).
    * The scalar accessors are refactored to add the batch size limitation 
without introducing duplicated code. Code was moved from the template to base 
classes to factor out redundancy.
    * General clean-up of the vector size limit code, found while doing the 
above work.
    * Unit tests for the new mechanism.
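
    For illustration, the kind of "actual allocated memory" accounting the new 
vector method provides might look like this sketch, which uses made-up names 
and plain NIO buffers as a stand-in for Drill's own buffers:

    ```java
    import java.nio.ByteBuffer;

    class AllocatedSizeSketch {
      // Report the memory actually allocated for a vector: the sum of the
      // capacities of its underlying buffers, not just the bytes written so far.
      static long allocatedSize(ByteBuffer... buffers) {
        long total = 0;
        for (ByteBuffer buf : buffers) {
          total += buf.capacity();
        }
        return total;
      }
    }
    ```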


> Implement size-aware result set loader
> --------------------------------------
>
>                 Key: DRILL-5657
>                 URL: https://issues.apache.org/jira/browse/DRILL-5657
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: Future
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators). The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.
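>
> As a rough illustration of the mid-row case (hypothetical and simplified, not 
> the planned API): when a vector fills partway through a row, the values 
> already written for that row are carried into a fresh batch before the full 
> batch is sent downstream.
> {code:java}
> // Simplified model: copy the partially written row into a new batch so the
> // full batch can be sent downstream and writing resumes without losing data.
> class OverflowSketch {
>   static int[] carryOverPartialRow(int[] fullBatch, int rowStart, int valuesWritten) {
>     int[] newBatch = new int[fullBatch.length];
>     System.arraycopy(fullBatch, rowStart, newBatch, 0, valuesWritten);
>     return newBatch;  // in-progress row now sits at the start of the new batch
>   }
> }
> {code}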



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
