[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

ASF GitHub Bot (JIRA) Tue, 22 Mar 2016 14:06:52 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207311#comment-15207311
 ]


ASF GitHub Bot commented on DRILL-3623:
---------------------------------------

Github user sudheeshkatkam commented on the pull request:

    https://github.com/apache/drill/pull/405#issuecomment-200026538
  
    Thank you for the reviews.
    
    All regression tests passed; I am running unit tests right now.
    
    Note that the `planner.enable_limit0_optimization` option is disabled by default. To summarize (and document) the limitations:
    
    If, during validation, the planner is able to resolve the types of the columns (i.e., the types are not late-binding), the shorter execution path is taken. Some types are excluded:
    + DECIMAL type is not fully supported in general.
    + VARBINARY is not fully tested.
    + MAP, ARRAY are currently not exposed to the planner.
    + TINYINT, SMALLINT are defined in the Drill type system but have been turned off for now.
    + SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST are Calcite types that Drill currently does not support and that are not defined in the Drill type list.
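    
    The eligibility rule above can be sketched in Java as follows. This is a hypothetical illustration, not the planner code in this pull request: the class `Limit0Eligibility`, the method `canSkipExecution`, and the convention that a `null` entry stands for a late-binding column type are assumptions made for the example. Only the excluded type names come from the list above.
    
    ```java
    import java.util.Arrays;
    import java.util.List;
    import java.util.Set;
    
    /**
     * Hypothetical sketch: may a LIMIT 0 query skip execution, given the
     * column types known after validation? Not Drill's actual planner code.
     */
    public class Limit0Eligibility {
    
        // Types excluded from the shortcut, mirroring the list in the comment.
        static final Set<String> EXCLUDED = Set.of(
            "DECIMAL", "VARBINARY", "MAP", "ARRAY", "TINYINT", "SMALLINT",
            "SYMBOL", "MULTISET", "DISTINCT", "STRUCTURED", "ROW", "OTHER",
            "CURSOR", "COLUMN_LIST");
    
        /**
         * @param columnTypes one entry per output column; null (an assumption
         *                    of this sketch) marks a late-binding type
         */
        static boolean canSkipExecution(List<String> columnTypes) {
            for (String type : columnTypes) {
                if (type == null || EXCLUDED.contains(type)) {
                    return false; // fall back to the normal execution path
                }
            }
            return true; // every type is resolved and supported
        }
    
        public static void main(String[] args) {
            System.out.println(canSkipExecution(List.of("BIGINT", "INTEGER")));   // true
            System.out.println(canSkipExecution(List.of("BIGINT", "DECIMAL")));   // false
            System.out.println(canSkipExecution(Arrays.asList("INTEGER", null))); // false
        }
    }
    ```
    
    A single unresolved or excluded column is enough to disable the shortcut, which keeps the check conservative.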
    
    The planner can resolve types during validation in three scenarios:
    + Queries on Hive tables
    + Queries with explicit casts on table columns, for example: `SELECT CAST(col1 AS BIGINT), ABS(CAST(col2 AS INTEGER)) FROM table;`
    + Queries on views with casts on table columns
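    
    A rough Java model of these three scenarios follows. It is only a sketch: the `SourceKind` enum and the `resolveAtValidation` method are hypothetical names, and in Drill the actual resolution happens inside the Calcite-based validator rather than in a helper like this. The method returns the type when it is known during validation and an empty `Optional` when the column is late-binding.
    
    ```java
    import java.util.Optional;
    
    /**
     * Hypothetical model of when a column's type is known during validation.
     * Not Drill code; all names are illustrative.
     */
    public class ValidationTypeResolution {
    
        enum SourceKind { HIVE_TABLE, FILE, VIEW }
    
        /**
         * @param source        where the column ultimately comes from
         * @param metastoreType type recorded in the Hive metastore, or null
         * @param castType      target type of an explicit CAST (in the query or
         *                      in the view definition), or null if uncast
         */
        static Optional<String> resolveAtValidation(SourceKind source,
                                                    String metastoreType,
                                                    String castType) {
            if (castType != null) {
                // Scenarios 2 and 3: an explicit cast fixes the result type.
                return Optional.of(castType);
            }
            if (source == SourceKind.HIVE_TABLE && metastoreType != null) {
                // Scenario 1: Hive tables expose their schema via the metastore.
                return Optional.of(metastoreType);
            }
            // Uncast file or view columns are late-binding.
            return Optional.empty();
        }
    
        public static void main(String[] args) {
            System.out.println(resolveAtValidation(SourceKind.HIVE_TABLE, "BIGINT", null)); // Optional[BIGINT]
            System.out.println(resolveAtValidation(SourceKind.FILE, null, "INTEGER"));      // Optional[INTEGER]
            System.out.println(resolveAtValidation(SourceKind.FILE, null, null));           // Optional.empty
        }
    }
    ```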
    
    In the latter two cases, the schema of the query with the LIMIT 0 clause has relaxed nullability compared to the same query without the LIMIT 0 clause. For example, say the schema definition of the Parquet file (`numbers.parquet`) is:
    ```
    message Numbers {
      required int col1;
      optional int col2;
    }
    ```
    
    Consider a view over that file. The view definition does not specify the nullability of its columns, and Drill's planner does not yet leverage the schema of a Parquet file:
    ```
    CREATE VIEW dfs.tmp.mynumbers AS SELECT CAST(col1 AS INTEGER) AS col1, CAST(col2 AS INTEGER) AS col2 FROM dfs.tmp.`numbers.parquet`;
    ```
    (1) For the query with the LIMIT 0 clause, neither the file nor its metadata is read, so Drill assumes that the nullability of both columns is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
    ```
    SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
    ```
    
    (2) For the query without the LIMIT 0 clause (here, `LIMIT 1`), the file is read, so Drill knows that the nullability of `col1` is [`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls) and the nullability of `col2` is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
    ```
    SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
    ```
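    
    The nullability difference can be modelled with the standard JDBC constants from `java.sql.ResultSetMetaData`. This is an illustrative sketch only: `Limit0Nullability`, `reportedNullability` and `isRelaxationOf` are hypothetical names, and the boolean `parquetRequired` is an assumed stand-in for the Parquet `required`/`optional` repetition of the column.
    
    ```java
    import java.sql.ResultSetMetaData;
    
    /**
     * Hypothetical sketch of the nullability reported with and without the
     * LIMIT 0 shortcut. Not Drill code; names are illustrative.
     */
    public class Limit0Nullability {
    
        static int reportedNullability(boolean limit0Shortcut, boolean parquetRequired) {
            if (limit0Shortcut) {
                // The file is never opened, so nullability cannot be narrowed.
                return ResultSetMetaData.columnNullable;
            }
            // The file is read, so "required" columns are known to be non-null.
            return parquetRequired ? ResultSetMetaData.columnNoNulls
                                   : ResultSetMetaData.columnNullable;
        }
    
        /** True if the LIMIT 0 value equals the full-query value or is the looser columnNullable. */
        static boolean isRelaxationOf(int limit0Value, int fullValue) {
            return limit0Value == fullValue
                || limit0Value == ResultSetMetaData.columnNullable;
        }
    
        public static void main(String[] args) {
            // col1 is "required int" in numbers.parquet; col2 is "optional int".
            System.out.println(reportedNullability(true, true));   // col1, LIMIT 0: 1 (columnNullable)
            System.out.println(reportedNullability(false, true));  // col1, LIMIT 1: 0 (columnNoNulls)
            System.out.println(reportedNullability(false, false)); // col2, LIMIT 1: 1 (columnNullable)
        }
    }
    ```
    
    Under the behavior described above, a client reading metadata from a LIMIT 0 query should not rely on seeing `columnNoNulls`: the full query may report a stricter value than the LIMIT 0 query, never a looser one.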


> Limit 0 should avoid execution when querying a known schema
> -----------------------------------------------------------
>
>                 Key: DRILL-3623
>                 URL: https://issues.apache.org/jira/browse/DRILL-3623
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Storage - Hive
>    Affects Versions: 1.1.0
>         Environment: MapR cluster
>            Reporter: Andries Engelbrecht
>            Assignee: Sudheesh Katkam
>              Labels: doc-impacting
>             Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
