[
https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804637#comment-14804637
]
Steven Phillips commented on DRILL-3229:
----------------------------------------
Basic design outline:
A Union type represents a field where the type can vary between records. The
data for a field of type Union will be stored in a UnionVector.
h4. UnionVector
Internally uses a MapVector to hold the vectors for the various types.
The types include all of the MinorTypes, including List and Map.
For example, the internal MapVector will have a subfield named
"bigInt", which will refer to a NullableBigIntVector.
In addition to the vectors corresponding to the minor types, there will
be two additional fields, both represented by UInt1Vectors. These are
"bits" and "types", which represent the nullability and the type of
the underlying data. The "bits" vector works the same way as it does in other
nullable vectors. The "types" vector stores, for each record, the number
corresponding to the value of the MinorType as defined in the protobuf
definition. There will be mutator methods for setting null and type.
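To make the layout concrete, here is a minimal toy model of it in plain Java. It is a sketch only, not Drill's UnionVector: the class name, the type codes, and the "varChar" subfield are assumptions for illustration, and plain arrays stand in for the internal MapVector and the UInt1Vectors.

```java
// Toy model of the layout described above; NOT Drill's actual UnionVector.
class ToyUnionVector {
    // Hypothetical type codes; the real numbers come from MinorType in the protobuf definition.
    static final byte TYPE_BIGINT = 1;
    static final byte TYPE_VARCHAR = 2;

    final byte[] bits;       // stands in for "bits": 1 = value present, 0 = null
    final byte[] types;      // stands in for "types": type code of the value at each index
    final long[] bigInts;    // stands in for the "bigInt" subfield
    final String[] varChars; // stands in for an assumed "varChar" subfield

    ToyUnionVector(int capacity) {
        bits = new byte[capacity];
        types = new byte[capacity];
        bigInts = new long[capacity];
        varChars = new String[capacity];
    }

    // Every typed setter records the type and the not-null bit along with the value.
    void setBigInt(int index, long value) {
        types[index] = TYPE_BIGINT;
        bits[index] = 1;
        bigInts[index] = value;
    }

    void setVarChar(int index, String value) {
        types[index] = TYPE_VARCHAR;
        bits[index] = 1;
        varChars[index] = value;
    }

    void setNull(int index) {
        bits[index] = 0;
    }

    // Reads consult "bits" first, then "types", to pick the subfield to read from.
    Object getObject(int index) {
        if (bits[index] == 0) {
            return null;
        }
        switch (types[index]) {
            case TYPE_BIGINT:
                return bigInts[index];
            case TYPE_VARCHAR:
                return varChars[index];
            default:
                throw new IllegalStateException("unknown type code " + types[index]);
        }
    }
}
```

In this model a single column can hold a long at index 0 and a string at index 1, while an untouched index reads back as null because its "bits" entry is still 0.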
h4. UnionWriter
The UnionWriter implements FieldWriter and overrides all of its methods.
It holds field writers corresponding to each of the types included
in the underlying
UnionVector, and delegates the method calls for each type to the
corresponding writer. For example, the BigIntWriter interface:
{code}
public interface BigIntWriter extends BaseWriter {
  public void write(BigIntHolder h);
  public void writeBigInt(long value);
}
{code}
UnionWriter overrides these methods:
{code}
@Override
public void writeBigInt(long value) {
  // Record the type and the not-null bit before delegating the write.
  data.getMutator().setType(idx(), MinorType.BIGINT);
  data.getMutator().setNotNull(idx());
  getBigIntWriter().setPosition(idx());
  getBigIntWriter().writeBigInt(value);
}

@Override
public void write(BigIntHolder h) {
  data.getMutator().setType(idx(), MinorType.BIGINT);
  data.getMutator().setNotNull(idx());
  getBigIntWriter().setPosition(idx());
  getBigIntWriter().writeBigInt(h.value);
}
{code}
This requires users of the interface to go through the UnionWriter,
rather than using the underlying BigIntWriter directly. Otherwise, the "types"
and "bits" vectors would not get set correctly.
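The following toy sketch shows why bypassing the union writer is a problem. All class and field names here are hypothetical, and plain arrays stand in for the real vectors; it only mirrors the delegation pattern described above.

```java
// Toy per-type writer: it only knows about the value storage.
class SketchBigIntWriter {
    private final long[] values;
    private int position;

    SketchBigIntWriter(long[] values) {
        this.values = values;
    }

    void setPosition(int position) {
        this.position = position;
    }

    void writeBigInt(long value) {
        values[position] = value;
    }
}

// Toy union writer: sets type and not-null bit, then delegates to the typed writer.
class SketchUnionWriter {
    static final byte TYPE_BIGINT = 1; // hypothetical type code

    final byte[] types;
    final byte[] bits;
    final long[] bigInts;
    final SketchBigIntWriter bigIntWriter;
    private int position;

    SketchUnionWriter(int capacity) {
        types = new byte[capacity];
        bits = new byte[capacity];
        bigInts = new long[capacity];
        bigIntWriter = new SketchBigIntWriter(bigInts);
    }

    void setPosition(int position) {
        this.position = position;
    }

    void writeBigInt(long value) {
        types[position] = TYPE_BIGINT;
        bits[position] = 1;
        bigIntWriter.setPosition(position);
        bigIntWriter.writeBigInt(value);
    }

    boolean isSet(int index) {
        return bits[index] == 1;
    }
}
```

Writing through `SketchUnionWriter.writeBigInt` marks the slot as set and typed. Writing through `bigIntWriter` directly stores the value but leaves `bits` and `types` at zero, so a reader would still treat that slot as null.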
h4. UnionReader
Much like the UnionWriter, the UnionReader overrides the
methods of FieldReader and delegates to the specific FieldReader
implementation that corresponds to the type of the current value.
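A minimal sketch of that dispatch, using hypothetical names and plain arrays in place of Drill's vectors and FieldReader implementations:

```java
// Hypothetical stand-in for a per-type reader.
interface SketchReader {
    Object readObject(int index);
}

// Toy union reader: picks a delegate per value, based on the stored type code.
class SketchUnionReader implements SketchReader {
    static final byte TYPE_BIGINT = 1;  // hypothetical type codes
    static final byte TYPE_VARCHAR = 2;

    private final byte[] types;
    private final byte[] bits;
    private final SketchReader bigIntReader;
    private final SketchReader varCharReader;

    SketchUnionReader(byte[] types, byte[] bits, long[] bigInts, String[] varChars) {
        this.types = types;
        this.bits = bits;
        this.bigIntReader = index -> bigInts[index];
        this.varCharReader = index -> varChars[index];
    }

    @Override
    public Object readObject(int index) {
        if (bits[index] == 0) {
            return null; // null values never reach a typed reader
        }
        switch (types[index]) {
            case TYPE_BIGINT:
                return bigIntReader.readObject(index);
            case TYPE_VARCHAR:
                return varCharReader.readObject(index);
            default:
                throw new IllegalStateException("unknown type code " + types[index]);
        }
    }
}
```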
h4. UnionListVector
UnionListVector extends BaseRepeatedVector. It works much the same as
other Repeated vectors; there is a data vector and an offset vector. The data
vector in this case is a UnionVector.
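As a rough illustration of how a data vector and an offset vector fit together, here is a toy model. The names are hypothetical, a `List<Object>` stands in for the UnionVector, and the sketch assumes lists are written in index order.

```java
import java.util.ArrayList;
import java.util.List;

// Toy repeated vector: offsets[i]..offsets[i + 1] delimits the elements of list i.
class SketchUnionListVector {
    private final int[] offsets;
    private final List<Object> data = new ArrayList<>(); // stands in for the UnionVector

    SketchUnionListVector(int listCount) {
        offsets = new int[listCount + 1];
    }

    // Begin list `index` as an empty range starting where the previous list ended.
    void startNewValue(int index) {
        offsets[index + 1] = offsets[index];
    }

    // Append a value of any type to list `index` and extend its range by one.
    void add(int index, Object value) {
        data.add(value);
        offsets[index + 1]++;
    }

    List<Object> getList(int index) {
        return new ArrayList<>(data.subList(offsets[index], offsets[index + 1]));
    }
}
```

Because the data side is a union, one list can mix element types, for example a long followed by a string.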
h4. UnionListWriter
The UnionListWriter overrides all FieldWriter methods. When starting a
new list, the startList() method is called. This calls the
startNewValue(int index) method
of the underlying UnionListVector.Mutator. Subsequent calls to the
ListWriter methods (such as bigInt()) return the UnionListWriter itself, and
calls to write are handled by calling
the appropriate method on the underlying UnionListVector.Mutator, which
handles updating the offset vector.
In the case that the map() method is called (i.e. repeated map), the
UnionListWriter itself is returned, but a state variable is updated to indicate
that it should operate as a MapWriter.
While in MapWriter mode, calls to the MapWriter methods will also
return the UnionListWriter itself, and will update the field that records
the name of the current field.
Subsequent calls to the ScalarWriter methods will write to the
underlying UnionVector using the UnionWriter interface.
For example,
{code}
UnionListWriter list;
...
list.startList();
list.map().bigInt("a").writeBigInt(1);
{code}
This code first indicates that a new list is starting. By doing this,
the offset vector is correctly set. Calling map() sets the internal state of
the writer to "MAP". bigInt("a") sets the current
field of the writer to "a", and writeBigInt(1) writes the value 1 to
the underlying UnionVector.
Another example:
{code}
MapWriter mapWriter = list.map().map("a");
{code}
In this case, the final call to map("a") delegates to the underlying
UnionWriter, and returns a new MapWriter, with the position set according to
the current offset.
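The "return itself and track state" pattern can be sketched as follows. This is a simplified model with hypothetical names: instead of touching vectors it logs where each write would land, and resetting the mode in startList() is an assumption, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy fluent writer: one object plays both the list-writer and map-writer roles.
class SketchFluentListWriter {
    enum Mode { LIST, MAP }

    private Mode mode = Mode.LIST;
    private String currentField; // only meaningful in MAP mode
    private int listIndex = -1;
    final List<String> log = new ArrayList<>(); // records writes instead of updating vectors

    void startList() {
        listIndex++;
        mode = Mode.LIST;    // assumption: each new list starts in LIST mode
        currentField = null;
    }

    // Switch to MAP mode and return this same writer.
    SketchFluentListWriter map() {
        mode = Mode.MAP;
        return this;
    }

    // List-style call: no field name, the writer is returned unchanged.
    SketchFluentListWriter bigInt() {
        return this;
    }

    // Map-style call: remember which field the next write targets.
    SketchFluentListWriter bigInt(String name) {
        currentField = name;
        return this;
    }

    void writeBigInt(long value) {
        String target = (mode == Mode.MAP)
            ? "list " + listIndex + ", field " + currentField
            : "list " + listIndex;
        log.add(target + " <- " + value);
    }
}
```

With this model, `list.startList(); list.map().bigInt("a").writeBigInt(1);` records a write to field "a" of list 0, which matches the sequence of state changes walked through in the first example.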
> Create a new EmbeddedVector
> ---------------------------
>
> Key: DRILL-3229
> URL: https://issues.apache.org/jira/browse/DRILL-3229
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Execution - Codegen, Execution - Data Types, Execution -
> Relational Operators, Functions - Drill
> Reporter: Jacques Nadeau
> Assignee: Steven Phillips
> Fix For: Future
>
>
> Embedded Vector will leverage a binary encoding for holding information about
> type for each individual field.