[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886813#comment-15886813
 ] 

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103333406
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -934,6 +1005,14 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
         long origInputBatchSize = estimatedInputBatchSize;
         estimatedInputBatchSize = Math.max(estimatedInputBatchSize, actualBatchSize);
     
    +    // The row width may end up as zero if all fields are nulls or some
    +    // other unusual situation. In this case, assume a width of 10 just
    +    // to avoid lots of special case code.
    +
    +    if (estimatedRowWidth == 0) {
    +      estimatedRowWidth = 10;
    --- End diff --
    
    This is a very peculiar case that came up in testing. It seems that we can 
have a row with a single column, and that column is always null. Imagine a 
Parquet file that has 1 million Varchars, all of which are null. In every 
batch, the row width will be 0. Since we often divide by the row width, a 
width of 0 breaks those calculations. So, here, we arbitrarily say that if the 
row width is 0, just assume 10 bytes to avoid the need for a bunch of 
special-case calcs. (The calcs are already too complex.)
    
    If there are 1000 columns, all of which are null, we would write 1000 "bit" 
(really byte) vectors, so each row would be 1000 bytes wide. But, in such a 
case, the batch analyzer should have come up with a number other than 0 for the 
row width.
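
    As a rough illustration of the reasoning above, here is a minimal sketch 
of the guard and of the kind of division it protects. Only estimatedRowWidth 
and the fallback of 10 bytes come from the diff; the class RowWidthGuard, the 
methods effectiveRowWidth and rowsThatFit, and the 1,000,000-byte budget are 
hypothetical names and values chosen for the example, not Drill's actual code.

```java
// Hypothetical sketch, not the ExternalSortBatch implementation.
final class RowWidthGuard {
  // Fallback width in bytes, matching the value used in the diff.
  static final int FALLBACK_ROW_WIDTH = 10;

  // Substitute the fallback when the sizer reports a zero-width row,
  // for example a single Varchar column whose values are all null.
  static int effectiveRowWidth(int estimatedRowWidth) {
    return estimatedRowWidth == 0 ? FALLBACK_ROW_WIDTH : estimatedRowWidth;
  }

  // Assumed example of a downstream calculation that divides by the
  // row width. Without the guard, a width of 0 would throw
  // ArithmeticException on this integer division.
  static long rowsThatFit(long memoryBytes, int estimatedRowWidth) {
    return memoryBytes / effectiveRowWidth(estimatedRowWidth);
  }

  public static void main(String[] args) {
    // One all-null column: reported width 0, so the fallback applies.
    System.out.println(rowsThatFit(1_000_000L, 0));
    // 1000 all-null columns at one byte each: width 1000, no fallback.
    System.out.println(rowsThatFit(1_000_000L, 1000));
  }
}
```

    Keeping the check in one place means the later calculations can assume a 
non-zero width, which is the point of avoiding special-case code.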


> Roll-up of final fixes for managed sort
> ---------------------------------------
>
>                 Key: DRILL-5284
>                 URL: https://issues.apache.org/jira/browse/DRILL-5284
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
