[ 
https://issues.apache.org/jira/browse/DRILL-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065389#comment-17065389
 ] 

ASF GitHub Bot commented on DRILL-7633:
---------------------------------------

paul-rogers commented on pull request #2018: DRILL-7633: Fixes for union and 
repeated list accessors
URL: https://github.com/apache/drill/pull/2018#discussion_r396947139
 
 

 ##########
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java
 ##########
 @@ -96,11 +134,13 @@ public StructureType structureType() {
   public boolean isVariant() { return true; }
 
   @Override
-  public boolean isArray() { return type() == MinorType.LIST; }
+  public boolean isArray() {
+    return super.isArray() || type() == MinorType.LIST;
+  }
 
   @Override
   public ColumnMetadata cloneEmpty() {
-    return new VariantColumnMetadata(name, type, variantSchema.cloneEmpty());
+    return new VariantColumnMetadata(name, type, mode, new VariantSchema());
 
 Review comment:
   @arina-ielchiieva, thanks for the reminder; I did miss your response.
   
   I think I see where we had a misunderstanding. I thought the `typeString()` 
code already worked because I saw this in `AbstractColumnMetadata`:
   
   ```
     public String typeString() {
       return majorType().toString();
     }
   ```
   
    `PrimitiveColumnMetadata` uses a different set of type names.
   
   It seems this area has gotten a bit muddled: the  `AbstractColumnMetadata` 
version seems to never be called, except for `VariantColumnMetadata`. (This 
confused the heck out of me on an earlier attempt to clean up this area, but I 
didn't clue into the problem then.)
   
   So, now that I understand what you are asking for, I did this:
   
   * Removed the `AbstractColumnMetadata` version.
   * Added a  `VariantColumnMetadata` that produces either `UNION` or 
`ARRAY<UNION>`.
   
   Until I see a good reason otherwise, I think we should treat `UNION` (what I 
call a "variant") as an opaque type. I guess I don't see the contents of a 
variant as something we want to specify, either in the metastore or in a 
provided schema. For example, can the metastore compute an NDV across an `INT`, 
`VARCHAR` and `MAP`? Probably not. Nor do the min/max values or other stats 
make sense for a variant. As a result, a variant is an opaque type for the 
metastore.
   
   Similarly, in a provided schema, I can't convince myself that the user wants 
not only to say, "the type of this field can vary", but also that "the type can 
be an `INT`, `DOUBLE` or `VARCHAR`, but not a `BIGINT`.) Just does not seem 
super-useful.
   
   An additional concern, which I think I mentioned somewhere else, is that as 
soon as we start serializing complex types, the SQL-like text format becomes 
far too cumbersome. We'd be much better of with a JSON format that can capture 
the complex tree structure. Compare:
   
   ```
   (A UNION<INT, MAP(B UNION<VARCHAR, MAP(C BIGINT)>)>)
   ```
   With something like:
   ```
   { schema: [
     {name: "a",
     type: {
        name: "UNION",
        nullable: true,
        subtypes: [
         {name: INT, nullable: false},
         {name: MAP, nullable: false},
         members: [ ...
   ```
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Fixes for union and repeated list accessors
> -------------------------------------------
>
>                 Key: DRILL-7633
>                 URL: https://issues.apache.org/jira/browse/DRILL-7633
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.17.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.18.0
>
>
> Minor fixes for repeated list and Union type support in column accessors



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to