Github user sudheeshkatkam commented on the pull request:
https://github.com/apache/drill/pull/405#issuecomment-200026538
Thank you for the reviews.
All regression tests passed; I am running unit tests right now.
Note that, the `planner.enable_limit0_optimization` option is disabled by
default. To summarize (and document) the limitations:
If, during validation, the planner is able to resolve that the types of the
columns (i.e. types are non late binding), the shorter execution path is taken.
Some types are excluded:
+ DECIMAL type is not fully supported in general.
+ VARBINARY is not fully tested.
+ MAP, ARRAY are currently not exposed to the planner.
+ TINYINT, SMALLINT are defined in the Drill type system but have been
turned off for now.
+ SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST
are Calcite types currently not supported by Drill, nor defined in the Drill
type list.
Three scenarios when the planner can do type resolution during validation:
+ Queries on Hive tables
+ Queries with explicit casts on table columns, example: `SELECT CAST(col1
AS BIGINT), ABS(CAST(col2 AS INTEGER)) FROM table;`
+ Queries on views with casts on table columns
In the latter two cases, the schema of the query with LIMIT 0 clause has
relaxed nullability compared to the query without the LIMIT 0 clause. Example:
Say the schema definition of the Parquet file (`numbers.parquet`) is:
```
message Numbers {
required int col1;
optional int col2;
}
```
Since the view definition does not specify nullability of columns, and
schema of a parquet file is not yet leveraged by Drill's planner:
```
CREATE VIEW dfs.tmp.mynumbers AS SELECT CAST(col1 AS INTEGER) as col1,
CAST(col2 AS INTEGER) AS col2 FROM dfs.tmp.`numbers.parquet`;
```
(1) For query with LIMIT 0 clause, since the file/ metadata is not read,
Drill assumes the nullability of both columns is
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
```
(2) For query without LIMIT 0 clause, since the file is read, Drill knows
the nullability of `col1` is
[`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls),
and `col2` is
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---