[ 
https://issues.apache.org/jira/browse/DRILL-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benj updated DRILL-7379:
------------------------
    Description: 
sample file: [^example.parquet]

With data as:
{code:sql}
SELECT id, tags FROM ....`example_parquet`;
+--------+------------------------------------+
|   id   |                tags                |
+--------+------------------------------------+
| 7b8808 | ["peexe","signed","overlay"]       |
| 55a4ae | ["peexe","signed","upx","overlay"] |
+--------+------------------------------------+
{code}
The next request is OK
{code:sql}
SELECT id, flatten(tags) tag 
FROM ( 
  SELECT id, any_value(tags) tags 
  FROM ....`example_parquet` 
  GROUP BY id 
) LIMIT 2;
+--------+--------+
|   id   |  tag   |
+--------+--------+
| 55a4ae | peexe  |
| 55a4ae | signed |
+--------+--------+
{code}
But unexpectedly, the next query failed:
{code:sql}
SELECT tag, count(*) 
FROM (
  SELECT flatten(tags) tag 
  FROM (
    SELECT id, any_value(tags) tags 
    FROM ....`example_parquet`
    GROUP BY id 
  )
) GROUP BY tag;
Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or 
repeated scalar type should not be used in group by, order by or in a 
comparison operator. Drill does not support compare between MAP:REPEATED and 
MAP:REPEATED.

/* Or other error with another set of data :
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index 0.  Error: Missing function implementation: 
[hash32asdouble(MAP-REPEATED, INT-REQUIRED)].  Full expression: null..
*/
{code}
These errors are incomprehensible because, the aggregate is on VARCHAR.

More, the request works if decomposed in 2 request with with the creation of an 
intermediate table like below:
{code:sql}
CREATE TABLE ....`tmp.parquet` AS (
  SELECT id, flatten(tags) tag 
  FROM ( 
    SELECT id, any_value(tags) tags 
    FROM ....`example_parquet` 
    GROUP BY id 
));

SELECT tag, count(*) c FROM ....`tmp_parquet` GROUP BY tag;
+---------+---+
|   tag   | c |
+---------+---+
| overlay | 2 |
| peexe   | 2 |
| signed  | 2 |
| upx     | 1 |
+---------+---+
{code}

  was:
With data as:
{code:sql}
SELECT id, tags FROM ....`example_parquet`;
+--------+------------------------------------+
|   id   |                tags                |
+--------+------------------------------------+
| 7b8808 | ["peexe","signed","overlay"]       |
| 55a4ae | ["peexe","signed","upx","overlay"] |
+--------+------------------------------------+
{code}
The next request is OK
{code:sql}
SELECT id, flatten(tags) tag 
FROM ( 
  SELECT id, any_value(tags) tags 
  FROM ....`example_parquet` 
  GROUP BY id 
) LIMIT 2;
+--------+--------+
|   id   |  tag   |
+--------+--------+
| 55a4ae | peexe  |
| 55a4ae | signed |
+--------+--------+
{code}
But unexpectedly, the next query failed:
{code:sql}
SELECT tag, count(*) 
FROM (
  SELECT flatten(tags) tag 
  FROM (
    SELECT id, any_value(tags) tags 
    FROM ....`example_parquet`
    GROUP BY id 
  )
) GROUP BY tag;
Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or 
repeated scalar type should not be used in group by, order by or in a 
comparison operator. Drill does not support compare between MAP:REPEATED and 
MAP:REPEATED.

/* Or other error with another set of data :
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index 0.  Error: Missing function implementation: 
[hash32asdouble(MAP-REPEATED, INT-REQUIRED)].  Full expression: null..
*/
{code}
These errors are incomprehensible because, the aggregate is on VARCHAR.

More, the request works if decomposed in 2 request with with the creation of an 
intermediate table like below:
{code:sql}
CREATE TABLE ....`tmp.parquet` AS (
  SELECT id, flatten(tags) tag 
  FROM ( 
    SELECT id, any_value(tags) tags 
    FROM ....`example_parquet` 
    GROUP BY id 
));

SELECT tag, count(*) c FROM ....`tmp_parquet` GROUP BY tag;
+---------+---+
|   tag   | c |
+---------+---+
| overlay | 2 |
| peexe   | 2 |
| signed  | 2 |
| upx     | 1 |
+---------+---+
{code}


> Planning error
> --------------
>
>                 Key: DRILL-7379
>                 URL: https://issues.apache.org/jira/browse/DRILL-7379
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.16.0
>            Reporter: benj
>            Priority: Major
>         Attachments: example.parquet
>
>
> sample file: [^example.parquet]
> With data as:
> {code:sql}
> SELECT id, tags FROM ....`example_parquet`;
> +--------+------------------------------------+
> |   id   |                tags                |
> +--------+------------------------------------+
> | 7b8808 | ["peexe","signed","overlay"]       |
> | 55a4ae | ["peexe","signed","upx","overlay"] |
> +--------+------------------------------------+
> {code}
> The next request is OK
> {code:sql}
> SELECT id, flatten(tags) tag 
> FROM ( 
>   SELECT id, any_value(tags) tags 
>   FROM ....`example_parquet` 
>   GROUP BY id 
> ) LIMIT 2;
> +--------+--------+
> |   id   |  tag   |
> +--------+--------+
> | 55a4ae | peexe  |
> | 55a4ae | signed |
> +--------+--------+
> {code}
> But unexpectedly, the next query failed:
> {code:sql}
> SELECT tag, count(*) 
> FROM (
>   SELECT flatten(tags) tag 
>   FROM (
>     SELECT id, any_value(tags) tags 
>     FROM ....`example_parquet`
>     GROUP BY id 
>   )
> ) GROUP BY tag;
> Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or 
> repeated scalar type should not be used in group by, order by or in a 
> comparison operator. Drill does not support compare between MAP:REPEATED and 
> MAP:REPEATED.
> /* Or other error with another set of data :
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index 0.  Error: Missing function implementation: 
> [hash32asdouble(MAP-REPEATED, INT-REQUIRED)].  Full expression: null..
> */
> {code}
> These errors are incomprehensible because, the aggregate is on VARCHAR.
> More, the request works if decomposed in 2 request with with the creation of 
> an intermediate table like below:
> {code:sql}
> CREATE TABLE ....`tmp.parquet` AS (
>   SELECT id, flatten(tags) tag 
>   FROM ( 
>     SELECT id, any_value(tags) tags 
>     FROM ....`example_parquet` 
>     GROUP BY id 
> ));
> SELECT tag, count(*) c FROM ....`tmp_parquet` GROUP BY tag;
> +---------+---+
> |   tag   | c |
> +---------+---+
> | overlay | 2 |
> | peexe   | 2 |
> | signed  | 2 |
> | upx     | 1 |
> +---------+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to