[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886812#comment-15886812
 ] 

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103333813
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
         // spill batches of either 64K records, or as many records as fit into the
         // amount of memory dedicated to each spill batch, whichever is less.
     
    -    spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
    +    spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2);
    --- End diff --
    
    Yes. Another wonderful Drill artifact. Suppose we have 1023 bytes of data. 
We will allocate a vector of 1024 bytes in size. Suppose we have 1025 bytes of 
data. (Just 0.2% more.) We allocate a vector of 2048 bytes.
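
    To make the arithmetic above concrete, here is a minimal sketch of rounding an allocation up to the next power of two. The class and method names are hypothetical and used only for illustration; this is not Drill's allocator code, just the rounding behavior as described in this comment.

```java
// Illustrative sketch only: the names here are hypothetical, not Drill's API.
final class VectorAllocationSketch {

    // Rounds a requested byte count up to the next power of two,
    // which is the allocation behavior described in the comment above.
    static long roundUpToPowerOfTwo(long bytes) {
        if (bytes <= 1) {
            return 1;
        }
        // highestOneBit(bytes - 1) is the largest power of two strictly
        // below 'bytes' (or equal to bytes / 2 when 'bytes' is itself a
        // power of two); doubling it gives the rounded-up size.
        return Long.highestOneBit(bytes - 1) << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(1023)); // 1024
        System.out.println(roundUpToPowerOfTwo(1025)); // 2048
    }
}
```

    With 1025 bytes of data, nearly half of the 2048-byte vector is unused, which is the worst case the divide-by-two in the diff guards against.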
    
    Now, we could be more conservative and assume that, on average, each vector 
will be 3/4 full, so we should use a factor of 1.5 for the calcs. We can file a 
JIRA and experiment with this change as a future enhancement.
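
    As a rough sketch of how the two factors compare, the calculation from the diff can be parameterized by the factor. The class name, method name, and sample numbers (a 1 MiB spill batch and 100-byte rows) are assumptions chosen for illustration, not values from the code under review.

```java
// Illustrative sketch only: names and sample values are hypothetical.
final class SpillBatchSizingSketch {

    // Mirrors the shape of the calculation in the diff, with the divisor
    // (2 in the patch, 1.5 in the proposed experiment) made a parameter.
    static int spillBatchRowCount(long preferredSpillBatchSize,
                                  int estimatedRowWidth,
                                  double factor) {
        long rows = (long) (preferredSpillBatchSize / (estimatedRowWidth * factor));
        // Always spill at least one row per batch.
        return (int) Math.max(1, rows);
    }

    public static void main(String[] args) {
        long batchSize = 1024 * 1024; // assumed 1 MiB spill batch
        int rowWidth = 100;           // assumed 100-byte rows
        System.out.println(spillBatchRowCount(batchSize, rowWidth, 2.0)); // 5242
        System.out.println(spillBatchRowCount(batchSize, rowWidth, 1.5)); // 6990
    }
}
```

    Under these assumed numbers, the 1.5 factor packs about a third more rows into each spill batch, at the cost of a smaller safety margin.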
    
    It would also help if the allocator didn't kill the query if we allocate 
even one extra byte. But, since math errors are fatal, we are 
super-conservative for now.


> Roll-up of final fixes for managed sort
> ---------------------------------------
>
>                 Key: DRILL-5284
>                 URL: https://issues.apache.org/jira/browse/DRILL-5284
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
