Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163456045
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
    @@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
     
       public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
         timer.reset();
    -    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
    -    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
    +    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
    --- End diff --
    
    Well... As it turns out, `ValueVector.MAX_ROW_COUNT` is 64K, which is the 
maximum size an SV2 can address. (An SV2 is 16 bits wide.) `Integer.MAX_VALUE` 
is 2^31 - 1, which would require a 32-bit SV2, which we don't have. So, using 
`Integer.MAX_VALUE` would cause the test to fail, as the sorter cannot sort 
batches larger than 64K...
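
    To make the 16-bit limit concrete, here is a minimal sketch in plain Java. It does not use Drill's SV2 class; a Java `char` stands in for one 16-bit SV2 entry, and the class and method names are hypothetical.

```java
public class Sv2AddressRange {

    // Number of distinct row indexes a 16-bit unsigned entry can hold.
    // Character.SIZE is 16, so this is 2^16.
    static int addressableRows() {
        return 1 << Character.SIZE;
    }

    // Store a row index in a 16-bit char (a stand-in for one SV2 entry,
    // not Drill's actual class) and read it back as an int.
    static int roundTrip(int rowIndex) {
        char entry = (char) rowIndex;
        return entry;
    }

    public static void main(String[] args) {
        System.out.println(addressableRows());   // 65536
        System.out.println(roundTrip(65535));    // 65535: last addressable index
        System.out.println(roundTrip(65536));    // 0: the index no longer fits
        System.out.println(Integer.MAX_VALUE == (1L << 31) - 1); // true
    }
}
```

    Indexes 0 through 65535 cover exactly 65536 rows, so a batch of that size is the largest one a 16-bit selection vector can describe.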
    
    Previously we used `Character.MAX_VALUE`, but it is not intuitively 
obvious that our batch size should be correlated to the size of Java's UTF-16 
character encoding... And, in fact, the original bug is that they are not 
correlated: `Character.MAX_VALUE` is 65535, while `ValueVector.MAX_ROW_COUNT` is 
65536. As a result, we were not testing the full-batch corner case.
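
    The off-by-one can be checked with a small standalone sketch. The local constant `MAX_ROW_COUNT` below is a stand-in for `ValueVector.MAX_ROW_COUNT`, hard-coded to the 65536 value quoted above rather than read from Drill, and the class and helper names are hypothetical.

```java
public class BatchSizeCheck {

    // Stand-in for ValueVector.MAX_ROW_COUNT; hard-coded to the value
    // quoted in the comment, not taken from the Drill source.
    static final int MAX_ROW_COUNT = 65536;

    // How many rows a batch of the given size falls short of a full batch.
    static int rowsShortOfFullBatch(int batchSize) {
        return MAX_ROW_COUNT - batchSize;
    }

    // True only when the batch exercises the full-batch corner case.
    static boolean isFullBatch(int batchSize) {
        return batchSize == MAX_ROW_COUNT;
    }

    public static void main(String[] args) {
        int oldSize = Character.MAX_VALUE; // 65535, the size the test used before
        System.out.println(rowsShortOfFullBatch(oldSize)); // 1
        System.out.println(isFullBatch(oldSize));          // false
        System.out.println(isFullBatch(MAX_ROW_COUNT));    // true
    }
}
```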

