Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1164#discussion_r174327511
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
* The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd
using FreeMarker.
+ * </p>
+ * <h2>Contract</h2>
+ * <p>
+ * Variable length vectors do not support random writes. All set
methods must be called for with a monotonically increasing consecutive sequence
of indexes.
--- End diff --
This is very important to know. It is why spill-to-disk for hash agg will
eventually cause a serious customer failure. Aggregate UDFs write to vectors to
store intermediate group values. A "max" string aggregate can't, because
updating a group's value in place would be a random write. Instead, it writes
to a Java object. That object is lost on spill and reread, which results in
losing the prior max values and the aggregate starting over.
So, this little note is not just a nuisance; it is the fatal flaw in how we
handle the (albeit obscure) string aggregate values.
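
To make the contract concrete, here is a minimal sketch of why a variable-length vector is append-only. It is a toy model, not Drill's generated code: the class `ToyVarLenVector` and its methods are hypothetical, and it assumes the usual layout of one offsets array plus one contiguous data buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy model for illustration only; not Drill's vector classes.
final class ToyVarLenVector {
  // Value i occupies data[offsets[i] .. offsets[i + 1]).
  private int[] offsets = new int[] {0};
  private byte[] data = new byte[0];
  private int count = 0;

  /** Writes a value; the index must be the next unused position. */
  void set(int index, String value) {
    if (index != count) {
      // Rewriting an earlier slot with a value of a different length would
      // require shifting every later value and patching every later offset.
      throw new IllegalStateException(
          "Random write at " + index + "; next writable index is " + count);
    }
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    int start = offsets[count];
    data = Arrays.copyOf(data, start + bytes.length);
    System.arraycopy(bytes, 0, data, start, bytes.length);
    offsets = Arrays.copyOf(offsets, count + 2);
    offsets[count + 1] = start + bytes.length;
    count++;
  }

  /** Reads the value stored at the given index. */
  String get(int index) {
    int start = offsets[index];
    int length = offsets[index + 1] - start;
    return new String(data, start, length, StandardCharsets.UTF_8);
  }

  int valueCount() {
    return count;
  }
}
```

A running "max" needs exactly the write this sketch rejects: after storing "apple" for group 0 and "fig" for group 1, seeing "banana" for group 0 means calling `set(0, "banana")`, which replaces a 5-byte value with a 6-byte one in the middle of the buffer. That is why the value ends up in a Java object outside the vector, where spill cannot see it.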
---