[
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207311#comment-15207311
]
ASF GitHub Bot commented on DRILL-3623:
---------------------------------------
Github user sudheeshkatkam commented on the pull request:
https://github.com/apache/drill/pull/405#issuecomment-200026538
Thank you for the reviews.
All regression tests passed; I am running unit tests right now.
Note that the `planner.enable_limit0_optimization` option is disabled by
default. To summarize (and document) the limitations:
If, during validation, the planner is able to resolve the types of the
columns (i.e. the types are not late binding), the shorter execution path is taken.
Some types are excluded:
+ DECIMAL type is not fully supported in general.
+ VARBINARY is not fully tested.
+ MAP, ARRAY are currently not exposed to the planner.
+ TINYINT, SMALLINT are defined in the Drill type system but have been
turned off for now.
+ SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST
are Calcite types currently not supported by Drill, nor defined in the Drill
type list.
There are three scenarios in which the planner can do type resolution during validation:
+ Queries on Hive tables
+ Queries with explicit casts on table columns, example: `SELECT CAST(col1
AS BIGINT), ABS(CAST(col2 AS INTEGER)) FROM table;`
+ Queries on views with casts on table columns
In the latter two cases, the schema of the query with a LIMIT 0 clause has
relaxed nullability compared to the same query without the LIMIT 0 clause. For example,
say the schema definition of the Parquet file (`numbers.parquet`) is:
```
message Numbers {
required int32 col1;
optional int32 col2;
}
```
Consider the following view. Its definition does not specify the nullability of
the columns, and the schema of a Parquet file is not yet leveraged by Drill's planner:
```
CREATE VIEW dfs.tmp.mynumbers AS SELECT CAST(col1 AS INTEGER) AS col1,
CAST(col2 AS INTEGER) AS col2 FROM dfs.tmp.`numbers.parquet`;
```
(1) For a query with a LIMIT 0 clause, since the file/metadata is not read,
Drill assumes the nullability of both columns is
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
```
(2) For a query without a LIMIT 0 clause, since the file is read, Drill knows
the nullability of `col1` is
[`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls),
and `col2` is
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
```
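To try the two cases above, the option has to be turned on first, since it is off by default. The following is only a sketch: it assumes the view `dfs.tmp.mynumbers` from this comment already exists, and that the option can be set per session with `ALTER SESSION` and inspected through the `sys.options` table, as with other Drill planner options.
```
-- Enable the optimization for the current session only (it is off by default).
ALTER SESSION SET `planner.enable_limit0_optimization` = true;

-- Optional: check the current value of the option (assumes sys.options is queryable).
SELECT * FROM sys.options WHERE name = 'planner.enable_limit0_optimization';

-- Case (1): types are known from the casts in the view, so this query is a
-- candidate for the shorter execution path.
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;

-- Case (2): the file is read, so the actual nullability of the columns is known.
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
```
Comparing the column nullability reported by the client for the two `SELECT` statements should show the difference described in (1) and (2).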
> Limit 0 should avoid execution when querying a known schema
> -----------------------------------------------------------
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Storage - Hive
> Affects Versions: 1.1.0
> Environment: MapR cluster
> Reporter: Andries Engelbrecht
> Assignee: Sudheesh Katkam
> Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;