[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907299#comment-14907299 ]
Parth Chandra commented on DRILL-3229:
--------------------------------------

The union type looks good (I haven't delved into the UnionListVector, though it doesn't look too far removed from the UnionVector). I'm missing some details:

i) When do we create a Union type?
ii) The Union Vector will have a map vector, which will have a field for each minor type. The fields will be nullable vectors of the corresponding minor type. For a given value, only one of the value vectors will have the bits field set. Is my understanding correct? A picture would be a big help.

More importantly, can we write up a couple of notes on the big picture so I can see where this fits in? For instance, it is not clear in what cases we plan to use this. There are different use cases where a changing schema is encountered. One frequently encountered case is a large number of nulls followed by a schema that materializes. The other common case is a primitive type that appears within quotes in a particular record and gets interpreted as a varchar. More complex cases can occur where the same information is represented differently, e.g., a timestamp that is written either as a string or as a long. (I'm not yet considering the rather extreme example in the Yelp data set where a null field shows up as an empty map.) Which of these types of cases are we addressing with UnionVectors?

Also, one question I've never resolved in my own mind is that of FieldMetadata. Does a ValueVector require FieldMetadata to describe its structure? Or is it the other way around: can FieldMetadata be derived from the ValueVector? Either way, how do we define FieldMetadata for Union types? What is the impact on ODBC/JDBC, if any?

Would a shared doc be a better way to discuss this? Then we can consolidate and add the result to https://drill.apache.org/docs/value-vectors/.
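
To make the understanding in (ii) concrete, here is a toy Java model of that layout. It is only a sketch of the structure as described in the comment, not Drill's implementation: `ToyUnionVector`, `NullableColumn`, and the two-value `MinorType` enum are hypothetical names invented for illustration, and a Java `null` stands in for an unset bits entry.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these names are Drill's real classes. */
public class UnionLayoutSketch {

    /** Hypothetical stand-in for the set of minor types. */
    public enum MinorType { BIGINT, VARCHAR }

    /** Stand-in for a nullable vector: a null entry models an unset bit. */
    public static final class NullableColumn {
        private final List<Object> values = new ArrayList<>();

        public void append(Object value) { values.add(value); }
        public boolean isSet(int row) { return values.get(row) != null; }
        public Object get(int row) { return values.get(row); }
    }

    /** One nullable child column per minor type, as in the "map vector" reading. */
    public static final class ToyUnionVector {
        private final Map<MinorType, NullableColumn> children =
                new EnumMap<>(MinorType.class);

        public ToyUnionVector() {
            for (MinorType t : MinorType.values()) {
                children.put(t, new NullableColumn());
            }
        }

        /** Writes the value into the matching child and a null into every other child. */
        public void add(MinorType type, Object value) {
            for (Map.Entry<MinorType, NullableColumn> e : children.entrySet()) {
                e.getValue().append(e.getKey() == type ? value : null);
            }
        }

        /** Number of child columns whose bit is set for this row (expected 0 or 1). */
        public int setCount(int row) {
            int count = 0;
            for (NullableColumn c : children.values()) {
                if (c.isSet(row)) {
                    count++;
                }
            }
            return count;
        }

        /** Type of the row, or null if no child column has a value there. */
        public MinorType typeAt(int row) {
            for (Map.Entry<MinorType, NullableColumn> e : children.entrySet()) {
                if (e.getValue().isSet(row)) {
                    return e.getKey();
                }
            }
            return null;
        }

        public Object valueAt(int row) {
            MinorType t = typeAt(row);
            return t == null ? null : children.get(t).get(row);
        }
    }

    public static void main(String[] args) {
        ToyUnionVector v = new ToyUnionVector();
        v.add(MinorType.BIGINT, 42L);   // the field arrives as a number
        v.add(MinorType.VARCHAR, "42"); // the same field arrives quoted
        v.add(MinorType.BIGINT, null);  // a null row sets no bit at all
        for (int row = 0; row < 3; row++) {
            System.out.println(row + ": " + v.typeAt(row) + " " + v.valueAt(row));
        }
    }
}
```

The first two rows mirror the quoted-primitive case: the same field lands in the BIGINT child for one record and in the VARCHAR child for the next, with exactly one child set per row. The third row models the nulls case, where no child has its bit set.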

> Create a new EmbeddedVector
> ---------------------------
>
>                 Key: DRILL-3229
>                 URL: https://issues.apache.org/jira/browse/DRILL-3229
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Execution - Codegen, Execution - Data Types, Execution - Relational Operators, Functions - Drill
>            Reporter: Jacques Nadeau
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> Embedded Vector will leverage a binary encoding for holding information about type for each individual field.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)