Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1164#discussion_r174327511
  
    --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
    @@ -514,6 +516,22 @@ public boolean isNull(int index){
        *   The equivalent Java primitive is '${minor.javaType!type.javaType}'
        *
        * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker.
    +   * </p>
    +   * <h2>Contract</h2>
    +   * <p>
    +   *   Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes.
    --- End diff --
    
    This is very important to know. It is why spill-to-disk for hash agg will 
eventually cause a serious customer failure. Aggregate UDFs write to vectors to 
store intermediate group values. A "max" string can't do that, because the 
running max for a group would have to be overwritten in place. Instead, it 
writes to a Java object. That object will be lost on spill and reread, so the 
prior max values are lost and the aggregate starts over.
    
    So this little note is not just a nuisance; it is the fatal flaw in how we 
handle the (albeit obscure) string aggregate values.
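
    To make the contract concrete, here is a minimal sketch in Java. 
ToyVarCharVector and its fields are hypothetical stand-ins, not the class 
generated from VariableLengthVectors.java; the sketch only assumes the usual 
layout of one data buffer plus an offsets array. Values are packed end to end, 
so rewriting an earlier slot with a longer string would have to shift every 
later value, which is why set() accepts only the next consecutive index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy stand-in for a variable-length vector (hypothetical, not Drill's class). */
final class ToyVarCharVector {
    private byte[] data = new byte[64];  // all values packed back to back
    private final int[] offsets;         // value i occupies data[offsets[i] .. offsets[i + 1])
    private int lastSet = -1;            // highest index written so far

    ToyVarCharVector(int capacity) {
        offsets = new int[capacity + 1];
    }

    /** Appends a value; rejects any index other than the next consecutive one. */
    void set(int index, String value) {
        if (index != lastSet + 1) {
            throw new IllegalStateException(
                "expected index " + (lastSet + 1) + " but got " + index);
        }
        if (index >= offsets.length - 1) {
            throw new IndexOutOfBoundsException("vector is full at index " + index);
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int start = offsets[index];
        int end = start + bytes.length;
        if (end > data.length) {
            // Grow the data buffer; existing bytes keep their positions.
            data = Arrays.copyOf(data, Math.max(end, data.length * 2));
        }
        System.arraycopy(bytes, 0, data, start, bytes.length);
        offsets[index + 1] = end;
        lastSet = index;
    }

    /** Reads back a previously written value. */
    String get(int index) {
        if (index < 0 || index > lastSet) {
            throw new IndexOutOfBoundsException("no value at index " + index);
        }
        int start = offsets[index];
        int length = offsets[index + 1] - start;
        return new String(data, start, length, StandardCharsets.UTF_8);
    }
}
```

    A running "max" per group needs exactly the call this sketch rejects: 
set(g, newMax) for a group index g that was already written earlier.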


---
