[ https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka closed DRILL-4614.
----------------------------------
    Resolution: Fixed

The problem is already mentioned here: https://issues.apache.org/jira/browse/DRILL-3806

> Drill must appoint one data type per one column for self-describing data while querying directories
> ----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4614
>                 URL: https://issues.apache.org/jira/browse/DRILL-4614
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.6.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>             Fix For: 1.7.0
>
>         Attachments: data.json
>
>
> When Drill selects data from a directory and detects data types on the fly,
> a single field can end up with several data types.
> For example:
> 1. Create an input file as follows:
> 20K rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
> 2. CTAS as follows:
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the Parquet table is created as a folder with two files.
> 3. Select the data:
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select is a mix of EXPR$0<INT(OPTIONAL)> and
> EXPR$0<VARCHAR(OPTIONAL)>.*
> This happens because Drill determines the column data type per file.
> The same result occurs with JSON files.
> Since the streaming aggregate does not support schema changes, this issue makes
> it impossible to use aggregate functions on the query results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
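
For anyone reproducing the report, step 1 of the description can be scripted. The following is a minimal Python sketch that writes a newline-delimited JSON file using the two records quoted in the issue. It assumes "20K rows" means exactly 20,000; the function name `write_repro_file` and the output path are illustrative and are not taken from the attached data.json.

```python
import json

# The two record shapes quoted in the issue description.
BASE_ROW = {
    "some": "yes",
    "others": {"other": "true", "all": "false", "sometimes": "yes"},
}
EXTENDED_ROW = {
    "some": "yes",
    "others": {
        "other": "true",
        "all": "false",
        "sometimes": "yes",
        "additional": "last entries only",
    },
}


def write_repro_file(path, base_rows=20_000, extended_rows=200):
    """Write base_rows plain records followed by extended_rows records
    that carry the extra `others.additional` field (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(base_rows):
            out.write(json.dumps(BASE_ROW, separators=(",", ":")) + "\n")
        for _ in range(extended_rows):
            out.write(json.dumps(EXTENDED_ROW, separators=(",", ":")) + "\n")


if __name__ == "__main__":
    # Assumed output location; point the dfs workspace at wherever this lands.
    write_repro_file("data.json")
```

The extended records are placed last on purpose, matching the description: the `additional` field only appears at the end of the input, so one of the two files produced by the CTAS contains no values for it.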