paul-rogers opened a new pull request #1690: DRILL-7086: Output schema for row set mechanism
URL: https://github.com/apache/drill/pull/1690

Enhances the row set mechanism to take an "output schema" that describes the vectors to create. The "input schema" describes the type that the reader would like to write. When the two differ, a conversion shim is inserted to convert from the input type to the output type.

The "output schema" will be the one provided by the new schema mechanism: a later PR will connect the mechanism here with the "output schema" provided in the physical plan.

The output schema is optional. Within an output schema, the columns are optional. If no output schema is provided, or no column matches a given input column, then the input schema determines the vector type, as was the case before this enhancement.

This PR includes several "starter" conversion classes. These are mostly prototypes; only the string-to-int version has been fully tested. The date, time and date-time versions show the use of the new format property added to the column metadata class.

A format or storage plugin can specify its own conversion rules. For example, the plugin for HBase might provide byte-array-to-whatever converters.

An odd aspect of the current implementation is that the type conversion is set up at reader open time. If the reader detects a column type which cannot be converted, using known rules, to the desired output type, the query will fail. This is odd because one might expect this error to be caught at plan time. But Drill is, of course, schema-on-read, so read time is when we'd detect the conflict. This may not be a problem in real tables.
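To make the description above concrete, here is a minimal sketch of the idea in plain Java. It is not the code in this PR: every name in it (`ColType`, `ColumnWriter`, `VectorWriter`, `StringToIntShim`, `WriterFactory`) is hypothetical, and the "vector" is just a list. It only illustrates the three behaviors described: fall back to the input type when no output column matches, insert a shim when a known conversion exists, and fail at writer-build (reader open) time when none does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; none of these names are Drill's API.
enum ColType { VARCHAR, INT }

/** What the reader sees: it writes values in its own (input) type. */
interface ColumnWriter {
  void setString(String value);
  void setInt(int value);
}

/** Stand-in for a writer bound to a vector of one fixed type. */
final class VectorWriter implements ColumnWriter {
  final ColType type;
  final List<Object> values = new ArrayList<>();

  VectorWriter(ColType type) { this.type = type; }

  @Override public void setString(String value) {
    if (type != ColType.VARCHAR) {
      throw new UnsupportedOperationException(type + " vector cannot store a string");
    }
    values.add(value);
  }

  @Override public void setInt(int value) {
    if (type != ColType.INT) {
      throw new UnsupportedOperationException(type + " vector cannot store an int");
    }
    values.add(value);
  }
}

/** Conversion shim: accepts strings from the reader, stores ints in the target. */
final class StringToIntShim implements ColumnWriter {
  final ColumnWriter target;

  StringToIntShim(ColumnWriter target) { this.target = target; }

  @Override public void setString(String value) {
    // Integer.parseInt throws NumberFormatException on malformed input.
    target.setInt(Integer.parseInt(value.trim()));
  }

  @Override public void setInt(int value) { target.setInt(value); }
}

final class WriterFactory {
  private WriterFactory() {}

  /**
   * Builds the writer for one column when the reader opens.
   * outputSchema may be null, and may omit any given column.
   */
  static ColumnWriter build(String column, ColType inputType,
                            Map<String, ColType> outputSchema) {
    ColType outputType = (outputSchema == null) ? null : outputSchema.get(column);
    if (outputType == null || outputType == inputType) {
      // No matching output column: the input type decides the vector type.
      return new VectorWriter(inputType);
    }
    if (inputType == ColType.VARCHAR && outputType == ColType.INT) {
      return new StringToIntShim(new VectorWriter(ColType.INT));
    }
    // No known rule: the conflict surfaces here, at open time, not plan time.
    throw new IllegalStateException(
        "No conversion from " + inputType + " to " + outputType
        + " for column " + column);
  }
}
```

With an output schema that maps column `a` to `INT`, calling `WriterFactory.build("a", ColType.VARCHAR, schema)` returns the shim, so the reader keeps calling `setString` while the underlying vector holds ints. A plugin-specific rule, such as the HBase byte-array case mentioned above, would be one more branch (or a pluggable lookup) in the same place.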
