paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for 
union and repeated list accessors
URL: https://github.com/apache/drill/pull/2018#discussion_r396947139
 
 

 ##########
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java
 ##########
 @@ -96,11 +134,13 @@ public StructureType structureType() {
   public boolean isVariant() { return true; }
 
   @Override
-  public boolean isArray() { return type() == MinorType.LIST; }
+  public boolean isArray() {
+    return super.isArray() || type() == MinorType.LIST;
+  }
 
   @Override
   public ColumnMetadata cloneEmpty() {
-    return new VariantColumnMetadata(name, type, variantSchema.cloneEmpty());
+    return new VariantColumnMetadata(name, type, mode, new VariantSchema());
 
 Review comment:
   @arina-ielchiieva, thanks for the reminder; I did miss your response.
   
   I think I see where we had a misunderstanding. I thought the `typeString()` 
code already worked because I saw this in `AbstractColumnMetadata`:
   
   ```
     public String typeString() {
       return majorType().toString();
     }
   ```
   
    `PrimitiveColumnMetadata` uses a different set of type names.
   
   It seems this area has gotten a bit muddled: the  `AbstractColumnMetadata` 
version seems to never be called, except for `VariantColumnMetadata`. (This 
confused the heck out of me on an earlier attempt to clean up this area, but I 
didn't clue into the problem then.)
   
   So, now that I understand what you are asking for, I did this:
   
   * Removed the `AbstractColumnMetadata` version.
   * Added a  `VariantColumnMetadata` that produces either `UNION` or 
`ARRAY<UNION>`.
   
   Until I see a good reason otherwise, I think we should treat `UNION` (what I 
call a "variant") as an opaque type. I guess I don't see the contents of a 
variant as something we want to specify, either in the metastore or in a 
provided schema. For example, can the metastore compute an NDV across an `INT`, 
`VARCHAR` and `MAP`? Probably not. Nor do the min/max values or other stats 
make sense for a variant. As a result, a variant is an opaque type for the 
metastore.
   
   Similarly, in a provided schema, I can't convince myself that the user wants 
not only to say, "the type of this field can vary", but also that "the type can 
be an `INT`, `DOUBLE` or `VARCHAR`, but not a `BIGINT`.) Just does not seem 
super-useful.
   
   An additional concern, which I think I mentioned somewhere else, is that as 
soon as we start serializing complex types, the SQL-like text format becomes 
far too cumbersome. We'd be much better of with a JSON format that can capture 
the complex tree structure. Compare:
   
   ```
   (A UNION<INT, MAP(B UNION<VARCHAR, MAP(C BIGINT)>)>)
   ```
   With something like:
   ```
   { schema: [
     {name: "a",
     type: {
        name: "UNION",
        nullable: true,
        subtypes: [
         {name: INT, nullable: false},
         {name: MAP, nullable: false},
         members: [ ...
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to