Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r165516121
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
    @@ -215,6 +206,7 @@ public BatchHolder() {
               MaterializedField outputField = materializedValueFields[i];
               // Create a type-specific ValueVector for this value
               vector = TypeHelper.getNewVector(outputField, allocator);
    +          int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
    --- End diff ---
    
    Thanks for the info @paul-rogers . Since we want the RecordBatchSizer to 
be a one-stop shop for size-related information, I propose the following to 
address @ppadma's concerns:
    
     - Remove the changes to estSize in the RecordBatchSizer that I introduced.
     - Add a field called knownSize to the RecordBatchSizer. This field will 
contain the known column width for fixed-width vectors and will be -1 for 
variable-width types like varchars and for repeated columns. Then we can add a 
getKnownSize() method for a column size. This method will return the known size 
if available and will throw an exception if the column is not fixed width, so 
it should only be used when the caller knows they are dealing with fixed-width 
columns, as is the case with HashAgg for now (things will probably change when 
strings are stored in varchars instead of objects). The rest of 
RecordBatchSizer will remain as is.
     - Add a constant called FRAGMENTATION_FACTOR (1.33) that callers can use 
to multiply their custom-crafted batch sizes by when estimating batch sizes 
from scratch. In HashAgg I would multiply the custom batch size I compute by 
this factor.
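    
    To make the proposal concrete, here is a minimal sketch of what the 
knownSize field, the getKnownSize() accessor, and the FRAGMENTATION_FACTOR 
constant could look like. The class name and wiring are hypothetical, not the 
actual RecordBatchSizer code:
    
    ```java
    // Hypothetical sketch of the proposed additions; names other than
    // knownSize, getKnownSize(), and FRAGMENTATION_FACTOR are assumptions.
    public class ColumnSizeSketch {
    
      // Safety multiplier for internal fragmentation when estimating
      // batch sizes from scratch.
      public static final double FRAGMENTATION_FACTOR = 1.33;
    
      // Known column width in bytes for fixed-width vectors, -1 otherwise.
      private final int knownSize;
    
      public ColumnSizeSketch(int knownSize) {
        this.knownSize = knownSize;
      }
    
      // Returns the known width for fixed-width columns; throws for
      // variable-width or repeated columns, so callers should only use
      // this when they know the column is fixed width (e.g. HashAgg).
      public int getKnownSize() {
        if (knownSize < 0) {
          throw new IllegalStateException(
              "Column is not fixed width; no known size available");
        }
        return knownSize;
      }
    
      public static void main(String[] args) {
        ColumnSizeSketch intCol = new ColumnSizeSketch(4); // e.g. an INT column
        System.out.println(intCol.getKnownSize());
    
        // Estimating a batch size from scratch: multiply the computed size
        // by the fragmentation factor to leave headroom.
        int rows = 1024;
        long estimate =
            (long) Math.ceil(rows * intCol.getKnownSize() * FRAGMENTATION_FACTOR);
        System.out.println(estimate);
      }
    }
    ```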
    
    Thoughts?
    