[
https://issues.apache.org/jira/browse/DRILL-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615469#comment-14615469
]
Sean Hsuan-Yi Chu commented on DRILL-2332:
------------------------------------------
Although both parquet and view are created with "type information", when you
are trying to query from
1. view: data type is exposed to our planner/parser
2. parquet: data type is NOT exposed to our planner/parser
In general, no matter how you created the parquet files, the data types for all
the columns are treated as ANY type {color:red} at planning {color}. And
implicit casting will kick-in at the later execution.
However, in the case of querying views, the data types have been exposed to
planner/parser. And if there are non-compatible types, Drill-Calcite will block
it at planning phase.
By and large, when schema is available, drill tries to converge to schema-based
DB.
> Drill should be consistent with Implicit casting rules across data formats
> --------------------------------------------------------------------------
>
> Key: DRILL-2332
> URL: https://issues.apache.org/jira/browse/DRILL-2332
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization
> Reporter: Abhishek Girish
> Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.2.0
>
>
> Currently, the outcome of a query with a filter on a column comparing it with
> a literal, depends on the underlying data format.
> *Parquet*
> {code:sql}
> select * from date_dim where d_month_seq ='1193' limit 1;
> [Succeeds]
> select * from date_dim where d_date in ('1999-06-30') limit 1;
> [Succeeds]
> {code}
> *View on top of text:*
> {code:sql}
> select * from date_dim where d_date in ('1999-06-30') limit 1;
> Query failed: SqlValidatorException: Values passed to IN operator must have
> compatible types
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> select * from date_dim where d_month_seq ='1193' limit 1;
> Query failed: SqlValidatorException: Cannot apply '=' to arguments of type
> '<INTEGER> = <CHAR(4)>'. Supported form(s): '<COMPARABLE_TYPE> =
> <COMPARABLE_TYPE>'
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> {code}
> I understand that in the case of View on Text, SQL validation fails at the
> Optiq layer.
> But from the perspective of an end-user, Drill's behavior must be consistent
> across data formats. Also having a view by definition should abstract out
> this information.
> Here, both the view and parquet were created with type information.
> *Parquet-meta*
> {code}
> parquet-schema /mapr/abhi311/data/parquet/tpcds/scale1/date_dim/0_0_0.parquet
> message root {
> optional int32 d_date_sk;
> optional binary d_date_id (UTF8);
> optional binary d_date (UTF8);
> optional int32 d_month_seq;
> optional int32 d_week_seq;
> optional int32 d_quarter_seq;
> optional int32 d_year;
> optional int32 d_dow;
> optional int32 d_moy;
> optional int32 d_dom;
> optional int32 d_qoy;
> optional int32 d_fy_year;
> optional int32 d_fy_quarter_seq;
> optional int32 s_fy_week_seq;
> optional binary d_day_name (UTF8);
> optional binary d_quarter_name (UTF8);
> optional binary d_holiday (UTF8);
> optional binary d_weekend (UTF8);
> optional binary d_following_holiday (UTF8);
> optional int32 d_first_dom;
> optional int32 d_last_dom;
> optional int32 d_same_day_ly;
> optional int32 d_same_day_lq;
> optional binary d_current_day (UTF8);
> optional binary d_current_week (UTF8);
> optional binary d_current_month (UTF8);
> optional binary d_current_quarter (UTF8);
> optional binary d_current_year (UTF8);
> }
> {code}
> *Describe View*
> {code:sql}
> > describe date_dim;
> +-------------+------------+-------------+
> | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
> +-------------+------------+-------------+
> | d_date_sk | INTEGER | NO |
> | d_date_id | VARCHAR | NO |
> | d_date | DATE | NO |
> | d_month_seq | INTEGER | NO |
> | d_week_seq | INTEGER | NO |
> | d_quarter_seq | INTEGER | NO |
> | d_year | INTEGER | NO |
> | d_dow | INTEGER | NO |
> | d_moy | INTEGER | NO |
> | d_dom | INTEGER | NO |
> | d_qoy | INTEGER | NO |
> | d_fy_year | INTEGER | NO |
> | d_fy_quarter_seq | INTEGER | NO |
> | s_fy_week_seq | INTEGER | NO |
> | d_day_name | VARCHAR | NO |
> | d_quarter_name | VARCHAR | NO |
> | d_holiday | VARCHAR | NO |
> | d_weekend | VARCHAR | NO |
> | d_following_holiday | VARCHAR | NO |
> | d_first_dom | INTEGER | NO |
> | d_last_dom | INTEGER | NO |
> | d_same_day_ly | INTEGER | NO |
> | d_same_day_lq | INTEGER | NO |
> | d_current_day | VARCHAR | NO |
> | d_current_week | VARCHAR | NO |
> | d_current_month | VARCHAR | NO |
> | d_current_quarter | VARCHAR | NO |
> | d_current_year | VARCHAR | NO |
> +-------------+------------+-------------+
> 28 rows selected (0.137 seconds)
> {code}
> For an end
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)