benj created DRILL-7104:
---------------------------

             Summary: Change of data type when parquet with multiple fragment
                 Key: DRILL-7104
                 URL: https://issues.apache.org/jira/browse/DRILL-7104
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.15.0
            Reporter: benj

When creating a Parquet table with a column filled only with "CAST(NULL AS VARCHAR)", if the Parquet output is written by several fragments, the type of that column is read back as INT instead of VARCHAR.

First, create +Parquet with only one fragment+ - all is fine (the type of "demo" is correct).
{code:java}
CREATE TABLE ....`bug` AS
  (SELECT CAST(NULL AS VARCHAR) AS demo
  , md5(CAST(rand() AS VARCHAR)) AS jam
  FROM ....`onebigfile` LIMIT 1000000);
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 10000000                   |

SELECT drilltypeof(demo) AS goodtype FROM ....`bug` LIMIT 1;
+--------------------+
| goodtype           |
+--------------------+
| VARCHAR            |
{code}

Second, create +Parquet with at least 2 fragments+ - the type of "demo" changes to INT.
{code:java}
CREATE TABLE ....`bug` AS
  ((SELECT CAST(NULL AS VARCHAR) AS demo
  , md5(CAST(rand() AS VARCHAR)) AS jam
  FROM ....`onebigfile` LIMIT 1000000)
  UNION
  (SELECT CAST(NULL AS VARCHAR) AS demo
  , md5(CAST(rand() AS VARCHAR)) AS jam
  FROM ....`onebigfile` LIMIT 1000000));
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_1       | 1000276                    |
| 1_0       | 999724                     |

SELECT drilltypeof(demo) AS badtype FROM ....`bug` LIMIT 1;
+--------------------+
| badtype            |
+--------------------+
| INT                |
{code}

The change of type is really problematic.