[GitHub] [drill] paul-rogers opened a new pull request #1690: DRILL-7086: Output schema for row set mechanism

GitBox Mon, 11 Mar 2019 22:22:51 -0700

URL: https://github.com/apache/drill/pull/1690
 
 
   Enhances the row set mechanism to take an "output schema" that describes the 
vectors to create. The "input schema" describes the type that the reader would 
like to write. A conversion mechanism inserts a conversion shim to convert from 
the input type to the output type.
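A minimal sketch of the shim idea, assuming a simplified writer API: `ColumnWriter`, `IntVectorWriter` and `StringToIntShim` are hypothetical names for illustration, not Drill's actual accessor classes. The reader keeps writing its input type (strings) while the vector underneath holds the output type (ints).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column-writer API; these names are illustrative and are
// not Drill's actual accessor classes.
interface ColumnWriter {
  void setInt(int value);
  void setString(String value);
}

// Stand-in for the writer of an INT vector, i.e. the "output" type.
class IntVectorWriter implements ColumnWriter {
  final List<Integer> values = new ArrayList<>();

  @Override
  public void setInt(int value) {
    values.add(value);
  }

  @Override
  public void setString(String value) {
    throw new UnsupportedOperationException("INT vector cannot store strings");
  }
}

// Conversion shim: the reader keeps calling setString() (its "input" type);
// the shim parses each value and forwards it to the INT writer beneath it.
class StringToIntShim implements ColumnWriter {
  private final ColumnWriter base;

  StringToIntShim(ColumnWriter base) {
    this.base = base;
  }

  @Override
  public void setString(String value) {
    base.setInt(Integer.parseInt(value.trim()));
  }

  @Override
  public void setInt(int value) {
    base.setInt(value); // already the output type: pass straight through
  }
}
```

With this arrangement, calling `new StringToIntShim(vector).setString("42")` stores the int 42 in the vector, and the reader code does not change when the output schema asks for INT instead of VARCHAR.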
   
   The "output schema" will be the one provided by the new schema mechanism: a 
later PR will connect the mechanism here with the "output schema" provided in 
the physical plan.
   
   The output schema is optional. Within an output schema, the columns are 
optional. If no output schema is provided, or no column matches a given input 
column, then the input schema determines the vector type, as was the case 
before this enhancement.
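The fallback rule above can be sketched as a small lookup, assuming the output schema is modeled as a map from column name to type. `ColumnType` and `VectorTypeResolver` are hypothetical names, and the exact-match name lookup is a simplification.

```java
import java.util.Map;

// Hypothetical type enum; Drill's real type system is much richer.
enum ColumnType { VARCHAR, INT, DATE }

final class VectorTypeResolver {
  private VectorTypeResolver() {}

  // Picks the vector type for one column. outputSchema may be null (no
  // output schema given) and may omit any column (columns are optional).
  static ColumnType resolve(Map<String, ColumnType> outputSchema,
                            String columnName, ColumnType inputType) {
    if (outputSchema == null) {
      return inputType; // no output schema: behave as before
    }
    // Column not in the output schema: fall back to the input type.
    return outputSchema.getOrDefault(columnName, inputType);
  }
}
```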
   
   This PR includes several "starter" conversion classes. These are mostly 
prototypes; only the string-to-int version has been fully tested. The date, 
time and date-time versions show the use of the new format property added to 
the column metadata class.
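A sketch of how a format property might drive a string-to-date conversion, assuming the column metadata exposes its properties as a string map. The `"format"` key and the `StringToDateConverter` class are hypothetical; the parsing itself uses the standard `java.time` API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical string-to-date converter configured from column metadata.
final class StringToDateConverter {
  static final String FORMAT_PROP = "format"; // hypothetical property key
  private final DateTimeFormatter formatter;

  StringToDateConverter(Map<String, String> columnProperties) {
    // Use the column's pattern if present, else ISO-8601 (yyyy-MM-dd).
    String pattern = columnProperties.get(FORMAT_PROP);
    this.formatter = pattern == null
        ? DateTimeFormatter.ISO_LOCAL_DATE
        : DateTimeFormatter.ofPattern(pattern);
  }

  // Parses one input string into the output DATE value.
  LocalDate convert(String text) {
    return LocalDate.parse(text, formatter);
  }
}
```

Building the formatter once per column, rather than once per value, keeps the per-row cost down to the parse itself.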
   
   A format or storage plugin can specify its own conversion rules. For 
example, the plugin for HBase might provide converters from byte arrays to 
other types.
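One way a plugin could contribute its own rules is through a registry keyed by input and output type. This is a sketch only: `ConversionRules`, the `"INPUT->OUTPUT"` key scheme and the type names are hypothetical, and the byte-array rules assume values were written as big-endian integers (the default byte order of `java.nio.ByteBuffer`).

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry of conversion rules; not Drill's real API.
final class ConversionRules {
  private final Map<String, Function<Object, Object>> rules = new HashMap<>();

  void register(String from, String to, Function<Object, Object> fn) {
    rules.put(from + "->" + to, fn);
  }

  // Returns null when no rule exists for this pair of types.
  Function<Object, Object> find(String from, String to) {
    return rules.get(from + "->" + to);
  }

  // Example plugin-supplied rules, assuming big-endian byte encodings.
  static ConversionRules byteArrayRules() {
    ConversionRules r = new ConversionRules();
    r.register("VARBINARY", "INT", b -> ByteBuffer.wrap((byte[]) b).getInt());
    r.register("VARBINARY", "BIGINT", b -> ByteBuffer.wrap((byte[]) b).getLong());
    return r;
  }
}
```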
   
   An odd aspect of the current implementation is that the type conversion is 
set up when the reader opens. If the reader detects a column type that no 
known rule can convert to the desired output type, the query fails. This is 
odd because one might expect this error to be caught at plan time. But Drill 
is, of course, schema-on-read, so read time is when we would detect the 
conflict. This may not be a problem in practice with real tables.
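The open-time check can be sketched as a per-column lookup that either returns a shim or throws before any rows are read. This assumes input values arrive as strings; `OpenTimeResolver`, `shimFor` and the `"INPUT->OUTPUT"` key scheme are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

final class OpenTimeResolver {
  private OpenTimeResolver() {}

  // Called once per column when the reader opens, before any rows are read.
  // knownRules maps "INPUT->OUTPUT" to a converter (hypothetical encoding).
  static Function<String, Object> shimFor(String column, String inputType,
      String outputType, Map<String, Function<String, Object>> knownRules) {
    if (inputType.equals(outputType)) {
      return value -> value; // types already agree: no conversion needed
    }
    Function<String, Object> rule = knownRules.get(inputType + "->" + outputType);
    if (rule == null) {
      // No known rule: the query fails here, at read time, not at plan time.
      throw new IllegalStateException("Column '" + column + "': no conversion from "
          + inputType + " to " + outputType);
    }
    return rule;
  }
}
```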


----------------------------------------------------------------
This is an automated message from the Apache Git Service.