[jira] [Commented] (DRILL-7090) Improve management of Optional(Nullable) / Required(Not nullable) type at least for parquet storage

benj (JIRA) Mon, 11 Mar 2019 07:25:18 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789628#comment-16789628
 ]


benj commented on DRILL-7090:
-----------------------------

The same kind of problem can be reproduced without any Parquet involved:
{code:java}
SELECT t1, count(*) FROM (SELECT modeof(t1), t1 FROM (
(SELECT '' AS t1 FROM ....`directories` LIMIT 1)
UNION ALL
(SELECT dir0 t1  FROM ....`directories` LIMIT 1)
)) GROUP BY t1;
=>
Error: SYSTEM ERROR: IllegalArgumentException: Input batch and output batch 
have different field counthas!{code}
And, surprisingly again, it works here with COALESCE:
{code:java}
SELECT t1, count(*) FROM (SELECT modeof(t1), t1 FROM (
(SELECT '' AS t1             FROM ....`directories` LIMIT 1)
UNION ALL
(SELECT COALESCE(dir0,'') t1 FROM ....`directories` LIMIT 1)
)) GROUP BY t1;
=>
     t1      | EXPR$1  |
+------------+---------+
|            | 1       |
| Indian     | 1       |
{code}
It works even though the modes of the two rows are still different:
{code:java}
SELECT t1, modeof(t1) FROM (
(SELECT '' t1                FROM ....`directories` LIMIT 1)
UNION ALL
(SELECT COALESCE(dir0,'') t1 FROM ....`directories` LIMIT 1));
=>
|     t1     |  EXPR$1   |
+------------+-----------+
|            | NOT NULL  |
| Indian     | NULLABLE  |
{code}

> Improve management of Optional(Nullable) / Required(Not nullable) type at 
> least for parquet storage
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7090
>                 URL: https://issues.apache.org/jira/browse/DRILL-7090
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.15.0
>            Reporter: benj
>            Priority: Major
>
> It would be useful to be able to specify/define/cast the "mode" of 
> columns for Parquet storage.
> Example of a problem without this possibility: several files are created by 
> different methods/processes. All the files have the same columns. When 
> querying all the files and grouping on a column:
> {code:java}
> SELECT source, count(*) FROM ....`ALL` GROUP BY source;
> =>
> java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not 
> support schema change 
> Prior schema : BatchSchema [fields=[[`source` (VARCHAR:REQUIRED)]], 
> selectionVector=NONE] 
> New schema : BatchSchema [fields=[[`source` (VARCHAR:OPTIONAL)]], 
> selectionVector=NONE]
> {code}
> This happens because source is generated in different ways (for example: use 
> of a constant, use of dir0*).
> It would be nice to be able to define the nullable attribute oneself 
> (required/optional), or at least to cast the mode/type of the field on 
> read. This would allow better homogeneity of the files and avoid crashes 
> on simple operations like aggregation.
>  
> (*) In a surprising way,
>  * dir0 => varchar<NULLABLE>
>  * '' => varchar<NOT NULL>
>  * coalesce(dir0, '') => varchar<NULLABLE>  *???*
> Users should be able to overrule the system's choice and define whether the 
> column mode is required or optional.
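
The schema-change failure quoted in the issue description can be pictured with a toy model. This is only a sketch in Python, not Drill code: `Field`, `SchemaChangeError`, `group_count` and `as_optional` are hypothetical names invented for illustration. The only assumption taken from the report is that a VARCHAR:REQUIRED batch followed by a VARCHAR:OPTIONAL batch counts as a schema change, and that presenting every batch with the same mode (the "cast on read" asked for above) would avoid it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    mode: str  # "REQUIRED" or "OPTIONAL"


class SchemaChangeError(Exception):
    """Raised when a batch arrives with a schema different from the first one."""


def group_count(batches):
    """Count values per key; reject any batch whose schema differs from the first."""
    prior = None
    counts = Counter()
    for field, values in batches:
        if prior is None:
            prior = field
        elif field != prior:
            raise SchemaChangeError(f"Prior schema : {prior} / New schema : {field}")
        counts.update(values)
    return dict(counts)


def as_optional(batches):
    """Hypothetical 'cast on read': present every batch as OPTIONAL."""
    for field, values in batches:
        yield Field(field.name, field.type, "OPTIONAL"), values


# Two batches with the same column and type but different modes,
# mirroring the constant ('') and dir0 cases from the report.
batches = [
    (Field("source", "VARCHAR", "REQUIRED"), [""]),
    (Field("source", "VARCHAR", "OPTIONAL"), ["Indian"]),
]
```

In this model `group_count(batches)` raises `SchemaChangeError`, while `group_count(as_optional(batches))` returns one count per value. It says nothing about why COALESCE happens to help in Drill itself.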



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
