[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

sudheeshkatkam Tue, 22 Mar 2016 14:06:10 -0700

Github user sudheeshkatkam commented on the pull request:

    https://github.com/apache/drill/pull/405#issuecomment-200026538
  
    Thank you for the reviews.
    
    All regression tests passed; I am running unit tests right now.
    
    Note that the `planner.enable_limit0_optimization` option is disabled by default. To summarize (and document) the limitations:
    
    If, during validation, the planner is able to resolve the types of the columns (i.e., the types are not late-binding), the shorter execution path is taken. Some types are excluded from this path:
    + DECIMAL type is not fully supported in general.
    + VARBINARY is not fully tested.
    + MAP and ARRAY are not currently exposed to the planner.
    + TINYINT and SMALLINT are defined in the Drill type system but have been turned off for now.
    + SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, and COLUMN_LIST are Calcite types that Drill does not currently support and that are not defined in the Drill type list.
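    The eligibility rule above can be sketched as a small check. This is a hypothetical illustration, not Drill's planner code: the class and method names are invented, type names follow Calcite's spelling, and "ANY" is assumed to be the placeholder for a late-binding (unresolved) column type.

    ```java
    import java.util.List;
    import java.util.Set;

    // Hypothetical sketch of the LIMIT 0 eligibility rule; not Drill's actual implementation.
    class Limit0Eligibility {

        // Types excluded from the shorter execution path, as listed in the comment above.
        static final Set<String> EXCLUDED = Set.of(
            "DECIMAL", "VARBINARY", "MAP", "ARRAY", "TINYINT", "SMALLINT",
            "SYMBOL", "MULTISET", "DISTINCT", "STRUCTURED", "ROW", "OTHER",
            "CURSOR", "COLUMN_LIST");

        // Assumption: a late-binding column is reported with the type name "ANY".
        static final String LATE_BINDING = "ANY";

        static boolean useShortPath(boolean optionEnabled, List<String> columnTypes) {
            if (!optionEnabled) {
                return false; // planner.enable_limit0_optimization is off by default
            }
            for (String type : columnTypes) {
                if (type.equals(LATE_BINDING) || EXCLUDED.contains(type)) {
                    return false; // unresolved or excluded type: execute the query normally
                }
            }
            return true; // every column type was resolved during validation
        }

        public static void main(String[] args) {
            System.out.println(useShortPath(true, List.of("BIGINT", "INTEGER")));  // true
            System.out.println(useShortPath(true, List.of("BIGINT", "ANY")));      // false
            System.out.println(useShortPath(true, List.of("DECIMAL")));            // false
            System.out.println(useShortPath(false, List.of("BIGINT", "INTEGER"))); // false
        }
    }
    ```

    Any single late-binding or excluded column disqualifies the whole query, since the planner would otherwise have to report a schema it cannot vouch for.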
    
    There are three scenarios in which the planner can resolve types during validation:
    + Queries on Hive tables
    + Queries with explicit casts on table columns, for example: `SELECT CAST(col1 AS BIGINT), ABS(CAST(col2 AS INTEGER)) FROM table;`
    + Queries on views with casts on table columns
    
    In the latter two cases, the schema of a query with a LIMIT 0 clause has relaxed nullability compared to the same query without the LIMIT 0 clause. For example, say the schema definition of the Parquet file (`numbers.parquet`) is:
    ```
    message Numbers {
      required int col1;
      optional int col2;
    }
    ```
    
    Now define a view over this file. The view definition does not specify the nullability of its columns, and Drill's planner does not yet leverage the schema of a Parquet file:
    ```
    CREATE VIEW dfs.tmp.mynumbers AS SELECT CAST(col1 AS INTEGER) as col1, 
CAST(col2 AS INTEGER) AS col2 FROM dfs.tmp.`numbers.parquet`;
    ```
    (1) For the query with the LIMIT 0 clause, neither the file nor its metadata is read, so Drill assumes that the nullability of both columns is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
    ```
    SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
    ```
    
    (2) For the query without the LIMIT 0 clause, the file is read, so Drill knows that the nullability of `col1` is [`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls) and that of `col2` is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
    ```
    SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
    ```
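    The difference between the two cases can be modeled with the real `java.sql.ResultSetMetaData` constants. This is a hypothetical sketch, not Drill code: the class and method names are invented, and it assumes the only inputs are which execution path is taken and the Parquet repetition (`required` or `optional`) of the underlying column.

    ```java
    import java.sql.ResultSetMetaData;

    // Hypothetical model of the nullability reported for the view's columns; not Drill code.
    class NullabilityModel {

        static int reportedNullability(boolean limit0Path, String parquetRepetition) {
            if (limit0Path) {
                // The file and its metadata are not read, and the CAST in the view
                // says nothing about nulls, so every column is reported as nullable.
                return ResultSetMetaData.columnNullable;
            }
            // The file is read, so the declared repetition of the column is known.
            return parquetRepetition.equals("required")
                ? ResultSetMetaData.columnNoNulls
                : ResultSetMetaData.columnNullable;
        }

        static String describe(int nullability) {
            switch (nullability) {
                case ResultSetMetaData.columnNoNulls: return "columnNoNulls";
                case ResultSetMetaData.columnNullable: return "columnNullable";
                default: return "columnNullableUnknown";
            }
        }

        public static void main(String[] args) {
            // In numbers.parquet, col1 is "required" and col2 is "optional".
            System.out.println("LIMIT 0: col1=" + describe(reportedNullability(true, "required"))
                + ", col2=" + describe(reportedNullability(true, "optional")));
            System.out.println("LIMIT 1: col1=" + describe(reportedNullability(false, "required"))
                + ", col2=" + describe(reportedNullability(false, "optional")));
        }
    }
    ```

    Running `main` prints `LIMIT 0: col1=columnNullable, col2=columnNullable` followed by `LIMIT 1: col1=columnNoNulls, col2=columnNullable`, matching cases (1) and (2): the LIMIT 0 path only ever widens nullability, so a client can safely treat its schema as a relaxed version of the full query's schema.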


