[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536

ASF GitHub Bot (JIRA) Thu, 25 Jan 2018 18:25:24 -0800

    [ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340473#comment-16340473 ]


ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r164022284
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
    @@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
     
       public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
         timer.reset();
    -    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
    -    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
    +    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
    --- End diff --
    
    That is true. But what advantage is there in passing an arbitrarily large 
number that doesn't represent a valid Drill batch size? A reader would take it 
for an error until they went and looked at the code to see that the apparent 
error is not really an error.
    
    The question is: is this a bug that needs to be fixed, or just a coding 
preference that we can let go for now?


> Sort incorrectly limits batch size to 65535 records rather than 65536
> ---------------------------------------------------------------------
>
>                 Key: DRILL-6080
>                 URL: https://issues.apache.org/jira/browse/DRILL-6080
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Drill places an upper limit of 64K rows on a batch. That is 65,536 decimal. 
> When we index records, the indexes run from 0 to 64K-1, that is, from 0 to 
> 65,535.
> The sort code incorrectly uses {{Character.MAX_VALUE}} as the maximum row 
> count. That constant is 65,535: the largest valid index, not the largest 
> valid row count. So, if an incoming batch uses the full 64K size, sort ends 
> up splitting batches unnecessarily.
> The fix is to use the correct constant, {{ValueVector.MAX_ROW_COUNT}}, 
> instead.
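
The off-by-one in the description above can be illustrated with a small standalone sketch. It does not use Drill's classes: BatchLimitDemo, batchesNeeded, and the local MAX_ROW_COUNT constant are hypothetical stand-ins, and the local constant is assumed to equal the 64K limit that the issue attributes to ValueVector.MAX_ROW_COUNT.

```java
public class BatchLimitDemo {
    // Assumption: local stand-in for the 64K limit described in the issue,
    // not Drill's actual ValueVector.MAX_ROW_COUNT constant.
    static final int MAX_ROW_COUNT = 1 << 16; // 65,536

    // Number of batches needed to hold rowCount rows when each batch
    // may hold at most `limit` rows (ceiling division).
    static int batchesNeeded(int rowCount, int limit) {
        return (rowCount + limit - 1) / limit;
    }

    public static void main(String[] args) {
        int fullBatch = MAX_ROW_COUNT; // an incoming batch at the full 64K size

        // Character.MAX_VALUE is '\uffff', the largest index, not the largest count.
        System.out.println("Character.MAX_VALUE = " + (int) Character.MAX_VALUE);
        System.out.println("batches with limit Character.MAX_VALUE: "
                + batchesNeeded(fullBatch, Character.MAX_VALUE));
        System.out.println("batches with limit MAX_ROW_COUNT: "
                + batchesNeeded(fullBatch, MAX_ROW_COUNT));
    }
}
```

With the smaller limit, a full 64K batch no longer fits in one batch and is split in two; with the 65,536 limit it fits in one.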



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
