It is meant to represent that a data type has been materialized for a given column. Because a column can have the ANY type at planning time, the discovery of a concrete type is the distinction we make when we create a MaterializedField.
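As a rough illustration of that distinction, here is a minimal, self-contained sketch. It is not Drill's code: PlannedColumn, SketchField, SketchType and materialize are hypothetical names made up for this example. The only point is that the "materialized" object carries a name plus a concrete type (and no data values), while the planning-time column may still be ANY.

```java
public class MaterializationSketch {
    // Hypothetical stand-in for a data type; ANY means "type not yet known".
    enum SketchType { ANY, BIGINT, VARCHAR }

    // Hypothetical planning-time column: a name whose type is still ANY.
    static final class PlannedColumn {
        final String name;
        final SketchType type = SketchType.ANY;

        PlannedColumn(String name) {
            this.name = name;
        }
    }

    // Hypothetical analogue of a materialized field: metadata only
    // (name and concrete type), no data values.
    static final class SketchField {
        final String name;
        final SketchType type;

        SketchField(String name, SketchType type) {
            if (type == SketchType.ANY) {
                throw new IllegalArgumentException("type must be concrete, not ANY");
            }
            this.name = name;
            this.type = type;
        }
    }

    // "Materializing" here just means pairing the column name with the
    // type discovered at execution time.
    static SketchField materialize(PlannedColumn column, SketchType discovered) {
        return new SketchField(column.name, discovered);
    }

    public static void main(String[] args) {
        PlannedColumn planned = new PlannedColumn("a");
        SketchField field = materialize(planned, SketchType.BIGINT);
        System.out.println(planned.name + ": " + planned.type + " -> " + field.type);
    }
}
```

In this sketch, trying to build a SketchField with ANY is rejected, which mirrors the idea that a field only counts as materialized once its type is known.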
Unfortunately there is no explicit concept of a non-materialized field. We currently represent it implicitly: a column lacking both data and type information is considered equivalent to a column that does not exist. The main area where this shows up is reading JSON files. If a file contains only nulls for a column, we don't know its type. The current behavior is to defer adding the field to the schema at all until we can create a MaterializedField with the type information. Later, when the column is requested as part of an expression, or when we need to send the final list of requested columns to the client, we materialize the type as nullable bigint. Unfortunately this only "solves" a very limited case, and it can cause odd behavior in a number of other cases, pretty much anything where the user expects to actually operate on a file with typeless nulls.

On Thu, Jul 16, 2015 at 11:04 PM, Daniel Barclay wrote:
> What exactly is materialized about class
> org.apache.drill.exec.record.MaterializedField?
>
> The name gave me the impression that it would be a field/column with
> its data materialized (as a materialized view has copies of data).
>
> However, MaterializedField doesn't seem to contain data values (just
> field metadata like the name/pathname and data type).
>
> So what exactly does the class represent? (What's materialized,
> and relative to what?)
>
> Daniel
> --
> Daniel Barclay
> MapR Technologies
