GitHub user ilooner opened a pull request:

    https://github.com/apache/drill/pull/1101

    DRILL-6032: Made the batch sizing for HashAgg more accurate.

    ## RecordBatchSizer changes
    
     - The RecordBatchSizer previously computed fixed-width column sizes by
       measuring the total size of a vector and dividing by the number of
       elements. As a result, the RecordBatchSizer returned a size of zero for
       FixedWidth vectors that had no data. I added a method to FixedWidth
       vectors that returns the size of a record, and the RecordBatchSizer now
       uses that method to compute the column width.
     - In some cases the RecordBatchSizer could return a column width of 0,
       even though a vector cannot have a width of 0 in practice. The minimum
       column width returned by the RecordBatchSizer is now 1.
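
As a rough illustration of the two RecordBatchSizer changes above, the sketch below contrasts a width derived from stored data with a width derived from the type, and applies the minimum of 1. The class and method names are hypothetical and are not the code in this pull request; rounding the average up is also an assumption.

```java
// Hypothetical sketch; these names are illustrative and are not Drill's API.
public final class ColumnWidthSketch {
    private ColumnWidthSketch() {}

    /** Data-based estimate: total bytes divided by value count, rounded up. */
    public static int widthFromData(long totalBytes, int valueCount) {
        // An empty vector has nothing to measure, so the estimate collapses to 0.
        if (valueCount == 0) {
            return 0;
        }
        return (int) Math.ceil((double) totalBytes / valueCount);
    }

    /** Type-based estimate for fixed-width columns: the width is a property of the type. */
    public static int widthFromType(int bytesPerValue) {
        return clampWidth(bytesPerValue);
    }

    /** Never report a column width below 1 byte. */
    public static int clampWidth(int width) {
        return Math.max(1, width);
    }

    public static void main(String[] args) {
        // An empty 4-byte integer column under each approach.
        System.out.println("data-based:  " + widthFromData(0, 0));
        System.out.println("type-based:  " + widthFromType(Integer.BYTES));
        System.out.println("with clamp:  " + clampWidth(widthFromData(0, 0)));
    }
}
```

The type-based path does not depend on how many values the vector currently holds, which is why it stays meaningful for empty vectors.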
    

    ## HashAgg changes
    
     - Removed commented out code and unused variables.
     - Removed if statements for printing debug statements and instead used
       logger.debug.
     - Removed the extraNonNullColumns and extraRowBytes tweak parameters for
       computing the sizes of batches.
     - The RecordBatchSizer is now used to compute the width of each column
       instead of ad hoc custom logic.
     - Used the real width of each column to estimate column sizes instead of
       taking the max width of all columns and assuming each column has the
       max width.
     - Removed the assumption that varchars will not exceed 50 characters in
       length.
     - Removed unnecessary condition checks in delayedSetup and
       updateEstMaxBatchSize.
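
To make the difference between the two column-width strategies above concrete, here is a minimal sketch comparing a row width built from the maximum column width with one built from each column's own width. The class, method names, column widths, and row count are all hypothetical and chosen only for illustration; this is not the HashAgg code itself.

```java
// Hypothetical sketch; names and numbers are illustrative, not taken from HashAgg.
public final class RowWidthSketch {
    private RowWidthSketch() {}

    /** Earlier strategy: assume every column is as wide as the widest one. */
    public static long rowWidthFromMax(int[] columnWidths) {
        int max = 0;
        for (int width : columnWidths) {
            max = Math.max(max, width);
        }
        return (long) max * columnWidths.length;
    }

    /** Revised strategy: sum the width reported for each column. */
    public static long rowWidthFromActual(int[] columnWidths) {
        long sum = 0;
        for (int width : columnWidths) {
            sum += width;
        }
        return sum;
    }

    /** Batch size estimate in bytes for a given row width and row count. */
    public static long estimateBatchBytes(long rowWidth, int rowCount) {
        return rowWidth * rowCount;
    }

    public static void main(String[] args) {
        // Assumed per-column widths in bytes: a 4-byte, an 8-byte, and a wide 120-byte column.
        int[] widths = {4, 8, 120};
        int rowCount = 1000; // assumed row count

        System.out.println("max-based:    " + estimateBatchBytes(rowWidthFromMax(widths), rowCount));
        System.out.println("actual-based: " + estimateBatchBytes(rowWidthFromActual(widths), rowCount));
    }
}
```

With one wide column among several narrow ones, the max-based estimate is much larger than the per-column sum, so it overstates the memory a batch needs.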

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilooner/drill DRILL-6032

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1101
    
----
commit b8682f6e09889e5ba334f36006fb9ed754f571f6
Author: Timothy Farkas <timothyfarkas@...>
Date:   2017-12-13T23:44:28Z

    DRILL-6032: Made the batch sizing for HashAgg more accurate.

----

