paul-rogers commented on a change in pull request #1429: DRILL-6676: Add Union, 
List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r209803508
 
 

 ##########
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/ListVector.java
 ##########
 @@ -250,22 +270,140 @@ public void load(UserBitShared.SerializedField 
metadata, DrillBuf buffer) {
     bits.load(bitMetadata, buffer.slice(offsetLength, bitLength));
 
     final UserBitShared.SerializedField vectorMetadata = metadata.getChild(2);
-    if (getDataVector() == DEFAULT_DATA_VECTOR) {
+    if (isEmptyType()) {
       addOrGetVector(VectorDescriptor.create(vectorMetadata.getMajorType()));
     }
 
     final int vectorLength = vectorMetadata.getBufferLength();
     vector.load(vectorMetadata, buffer.slice(offsetLength + bitLength, 
vectorLength));
   }
 
+  public boolean isEmptyType() {
+    return getDataVector() == DEFAULT_DATA_VECTOR;
+  }
+
+  @Override
+  public void setChildVector(ValueVector childVector) {
+
+    // Unlike the repeated list vector, the (plain) list vector
+    // adds the dummy vector as a child type.
+
+    assert field.getChildren().size() == 1;
+    assert field.getChildren().iterator().next().getType().getMinorType() == 
MinorType.LATE;
+    field.removeChild(vector.getField());
+
+    super.setChildVector(childVector);
+
+    // Initial LATE type vector not added as a subtype initially.
+    // So, no need to remove it, just add the new subtype. Since the
+    // MajorType is immutable, must build a new one and replace the type
+    // in the materialized field. (We replace the type, rather than creating
+    // a new materialized field, to preserve the link to this field from
+    // a parent map, list or union.)
+
+    assert field.getType().getSubTypeCount() == 0;
+    field.replaceType(
+        field.getType().toBuilder()
+          .addSubType(childVector.getField().getType().getMinorType())
+          .build());
+  }
+
+  /**
+   * Promote the list to a union. Called from old-style writers. This 
implementation
+   * relies on the caller to set the types vector for any existing values.
+   * This method simply clears the existing vector.
+   *
+   * @return the new union vector
+   */
+
   public UnionVector promoteToUnion() {
-    MaterializedField newField = 
MaterializedField.create(getField().getName(), Types.optional(MinorType.UNION));
-    UnionVector vector = new UnionVector(newField, allocator, null);
+    UnionVector vector = createUnion();
+
+    // Replace the current vector, clearing its data. (This is the
+    // old behavior.
+
     replaceDataVector(vector);
     reader = new UnionListReader(this);
     return vector;
   }
 
+  /**
+   * Revised form of promote to union that correctly fixes up the list
+   * field metadata to match the new union type.
+   *
+   * @return the new union vector
+   */
+
+  public UnionVector promoteToUnion2() {
+    UnionVector unionVector = createUnion();
+
+    // Replace the current vector, clearing its data. (This is the
+    // old behavior.
+
+    setChildVector(unionVector);
+    return unionVector;
+  }
+
+  /**
+   * Promote to a union, preserving the existing data vector as a member of
+   * the new union. Back-fill the types vector with the proper type value
+   * for existing rows.
+   *
+   * @return the new union vector
+   */
+
+  public UnionVector convertToUnion(int allocValueCount, int valueCount) {
+    assert allocValueCount >= valueCount;
+    UnionVector unionVector = createUnion();
+    unionVector.allocateNew(allocValueCount);
+
+    // Preserve the current vector (and its data) if it is other than
+    // the default. (New behavior used by column writers.)
+
+    if (! isEmptyType()) {
+      unionVector.addType(vector);
+      int prevType = vector.getField().getType().getMinorType().getNumber();
+      UInt1Vector.Mutator typeMutator = 
unionVector.getTypeVector().getMutator();
+
+      // If the previous vector was nullable, then promote the nullable state
+      // to the type vector by setting either the null marker or the type
+      // marker depending on the original nullable values.
+
+      if (vector instanceof NullableVector) {
+        UInt1Vector.Accessor bitsAccessor =
+            ((UInt1Vector) ((NullableVector) 
vector).getBitsVector()).getAccessor();
+        for (int i = 0; i < valueCount; i++) {
+          typeMutator.setSafe(i, (bitsAccessor.get(i) == 0)
+              ? UnionVector.NULL_MARKER
+              : prevType);
+        }
+      } else {
+
+        // The value is not nullable. (Perhaps it is a map.)
+        // Note that the original design of lists have a flaw: if the sole 
member
+        // is a map, then map entries can't be nullable when the only type, but
+        // become nullable when in a union. What a mess...
 
 Review comment:
   The first thing to understand is that union vectors are broken in most 
operators, and so major bugs remain in union vectors that have not had to be 
fixed. This code fixes or works around some of the bugs.
   
   Unions have a null bit for the union itself. That is, a union can be an int, 
say, or. a Varchar, or null. Oddly, the Int and Varchar can also be null (we 
use nullable vectors so we can mark the unused values as null.) So, we have a 
two-level null bit. The most logical way to interpret it is that a union value 
can be:
   
   * Untyped null (if the type is not set and the null bit (really, the isSet 
bit) is unset.)
   * Typed null if the type is set and EITHER the union's isSet bit is unset OR 
the union's isSet bit is set, but the data vector's isSet bit is not set.
   
   It is not clear in the code which convention is assumed, or if different 
code does it differently.
   
   Now, add all that to a list. A list can be a list of something (ints, say, 
or maps.) When the list is a list of maps, the entire value for a row can be 
null. But individual maps can't be null. In a list, however, individual ints 
can be null (because we use a nullable int vector.)
   
   So, when a list of (non-nullable maps) converts to a list of unions (one of 
which is a map), we suddenly now have the list null bit and the union null bit 
to worry about. We have to go and back-patch the isSet vector for all the 
existing map entries in the new union so that we don't end up with all previous 
entries becoming null by default.
   
   See why the code notes that this is a mess?
   
   And, it is hard to file a bug because this is a design problem. If the list 
and union vectors don't need to work (they barely work today), then any design 
is fine.
   
   See DRILL-5955, DRILL-6037, DRILL-5958, DRILL-6046, DRILL-6062, DRILL-3806 
to get a flavor for the issues.
   
   Fundamental issue: should Drill support unions and lists? Is the current 
approach compatible with SQL? Is there a better approach?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to