[
https://issues.apache.org/jira/browse/DRILL-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Victoria Markman updated DRILL-3961:
------------------------------------
Priority: Critical (was: Major)
> CONVERT_FROM should return INTEGER when converting from TINYINT/SMALLINT types
> ------------------------------------------------------------------------------
>
> Key: DRILL-3961
> URL: https://issues.apache.org/jira/browse/DRILL-3961
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Victoria Markman
> Priority: Critical
>
> We've disabled the SMALLINT and TINYINT SQL types because we had bugs that we
> could not fix before 1.0.0. Some things were left in a half-implemented state,
> and we should either fix them up or re-implement the TINYINT/SMALLINT
> types.
> * We can't cast to TINYINT/SMALLINT:
> {code}
> 0: jdbc:drill:schema=dfs> select cast(1 as tinyint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: TINYINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: c86bb4f5-4ea9-47cc-856e-ce1e96463477 on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> 0: jdbc:drill:schema=dfs> select cast(1 as smallint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: SMALLINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: 14cc008a-532b-4f02-a9ad-6a54cf5dc6bc on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
> * We hacked our implementation to read Hive's TINYINT/SMALLINT as INTEGER:
> https://issues.apache.org/jira/browse/DRILL-3263
> This works: if you query a Hive table with a TINYINT/SMALLINT
> column, you get INTEGER in Drill.
> Hive table; notice that columns c9/c10 are TINYINT/SMALLINT respectively:
> {code}
> hive> describe alltypes;
> OK
> c1 int
> c2 boolean
> c3 double
> c4 string
> c5 array<int>
> c6 map<int,string>
> c7 map<string,string>
> c8 struct<r:string,s:int,t:double>
> c9 tinyint
> c10 smallint
> c11 float
> c12 bigint
> c13 array<array<string>>
> c14 map<int,map<int,int>>
> c15 struct<r:int,s:struct<a:int,b:string>>
> c16 array<struct<m:map<string,string>,n:int>>
> c17 timestamp
> c18 decimal(10,0)
> c19 binary
> c20 date
> Time taken: 0.094 seconds, Fetched: 20 row(s)
> {code}
> Same table accessed from Drill:
> {code}
> 0: jdbc:drill:schema=dfs> use hive;
> +-------+-----------------------------------+
> | ok | summary |
> +-------+-----------------------------------+
> | true | Default schema changed to [hive] |
> +-------+-----------------------------------+
> 1 row selected (0.337 seconds)
> 0: jdbc:drill:schema=dfs> describe alltypes;
> +--------------+--------------------+--------------+
> | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
> +--------------+--------------------+--------------+
> | c1 | INTEGER | YES |
> | c2 | BOOLEAN | YES |
> | c3 | DOUBLE | YES |
> | c4 | CHARACTER VARYING | YES |
> | c5 | ARRAY | NO |
> | c6 | MAP | NO |
> | c7 | MAP | NO |
> | c8 | ROW | NO |
> | c9 | INTEGER | YES |
> | c10 | INTEGER | YES |
> | c11 | FLOAT | YES |
> | c12 | BIGINT | YES |
> | c13 | ARRAY | NO |
> | c14 | MAP | NO |
> | c15 | ROW | NO |
> | c16 | ARRAY | NO |
> | c17 | TIMESTAMP | YES |
> | c18 | DECIMAL | YES |
> | c19 | BINARY VARYING | YES |
> | c20 | DATE | YES |
> +--------------+--------------------+--------------+
> 20 rows selected (1.379 seconds)
> {code}
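> The scalar portion of the mapping can be summarized from the two DESCRIBE
> outputs above (this is just a reading of the outputs in this report, not an
> official reference):

```python
# Hive -> Drill scalar type mapping, as observed in the two DESCRIBE
# outputs above; the widening of tinyint/smallint is the DRILL-3263 hack.
hive_to_drill = {
    'tinyint':  'INTEGER',            # c9  -- widened by DRILL-3263
    'smallint': 'INTEGER',            # c10 -- widened by DRILL-3263
    'int':      'INTEGER',            # c1
    'bigint':   'BIGINT',             # c12
    'float':    'FLOAT',              # c11
    'double':   'DOUBLE',             # c3
    'boolean':  'BOOLEAN',            # c2
    'string':   'CHARACTER VARYING',  # c4
    'binary':   'BINARY VARYING',     # c19
}

# The same widening is what this issue asks CONVERT_FROM to perform:
print(hive_to_drill['tinyint'])   # INTEGER
print(hive_to_drill['smallint'])  # INTEGER
```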
> * However, if you read a Parquet file containing TINYINT/SMALLINT types
> (generated by Impala, for instance), the CONVERT_FROM function still returns
> the SMALLINT/TINYINT types.
> I can successfully read a SMALLINT field from an Impala-generated Parquet file:
> {code}
> 0: jdbc:drill:schema=dfs> select asset_id from `impala.parquet` limit 1;
> +--------------+
> | asset_id |
> +--------------+
> | [B@22a3aec1 |
> +--------------+
> 1 row selected (0.298 seconds)
> {code}
> Using the CONVERT_FROM function:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') from
> `impala.parquet` limit 1;
> +---------+
> | EXPR$0 |
> +---------+
> | 19535 |
> +---------+
> 1 row selected (0.283 seconds)
> {code}
> Use of this function in an arithmetic expression results in an error:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') + 1 from
> `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to
> materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation:
> [castINT(SMALLINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: b5e063ee-85e0-48f6-9ca2-09dd9d6a84e6 on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
> An attempt to cast to INTEGER fails for the same reason:
> {code}
> 0: jdbc:drill:schema=dfs> select CAST(CONVERT_FROM(asset_id, 'SMALLINT') as
> integer) from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to
> materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation:
> [castBIGINT(SMALLINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 70820c35-f5fe-4229-8897-cafe0db5bdef on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
> Trying to 'cheat' and read the value as INT:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'INT') from
> `impala.parquet` limit 1;
> Error: SYSTEM ERROR: IllegalArgumentException: Wrong length 2(2-0) in the
> buffer 'OL', expected 4.
> Fragment 0:0
> [Error Id: 47e0a0d7-fc40-4f3f-8d68-4bed63ac0a60 on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
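> The failure above is consistent with the buffer simply being 2 bytes wide. A
> minimal sketch of the decoding (assuming little-endian byte order, which is
> what matches the values in this report):

```python
import struct

# The raw 2-byte buffer from the error message above: 'OL' = 0x4F 0x4C.
buf = b'OL'

# CONVERT_FROM(..., 'SMALLINT') decodes it as a 2-byte integer;
# little-endian decoding reproduces the 19535 seen earlier.
print(struct.unpack('<h', buf)[0])  # 19535

# CONVERT_FROM(..., 'INT') requires 4 bytes, so the 2-byte buffer fails,
# analogous to Drill's "Wrong length 2 ... expected 4" error above.
try:
    struct.unpack('<i', buf)
except struct.error as e:
    print('length mismatch:', e)
```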
> The only way I can make use of this column in a query is to
> write it out to a table:
> {code}
> 0: jdbc:drill:schema=dfs> create table x(c1) as select CONVERT_FROM(asset_id,
> 'SMALLINT') from `impala.parquet` limit 1;
> +-----------+----------------------------+
> | Fragment | Number of records written |
> +-----------+----------------------------+
> | 0_0 | 1 |
> +-----------+----------------------------+
> 1 row selected (0.568 seconds)
> [Tue Oct 20 15:57:14] # ~/parquet-tools/parquet-schema 0_0_0.parquet
> message root {
> optional int32 c1;
> }
> {code}
> * The same problem probably applies to Hive UDFs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)