[jira] [Updated] (ARROW-7480) [Rust] [DataFusion] Query fails/incorrect when aggregated + grouped columns don't match the selected columns

Kyle McCarthy (Jira) Mon, 30 Dec 2019 14:43:24 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kyle McCarthy updated ARROW-7480:
---------------------------------
    Description: 
There are two scenarios that cause problems but are related to the queries with 
aggregate expressions and the SQL planner.

 

*Scenario 1*

Columns are grouped by but not selected.
{noformat}
ArrowError(InvalidArgumentError("number of columns must match number of fields 
in schema")){noformat}
You can reproduce with the aggregate_test_100 dataset with the query:
{code:java}
SELECT c1, MIN(c12) FROM aggregate_test_100 GROUP BY c1, c13{code}
 

*Senario 2*

Columns are selected, but not grouped or part of an aggregate function. This 
query actually will run, but the wrong schema is produced.

Query: 
{code:java}
SELECT c1, c13, MIN(c12) FROM aggregate_test_100 GROUP BY c1{code}
Schema generated:
{code:java}
Schema {
    fields: [
        Field {
            name: "c0",
            data_type: Utf8,
            nullable: true,
        },
        Field {
            name: "c1",
            data_type: Float64,
            nullable: true,
        },
        Field {
            name: "c1",
            data_type: Float64,
            nullable: true,
        },
    ],
    metadata: {},
} {code}
This should actually be Field(c0, Utf8), Field(c13, Utf8), Field(MAX, Float64).

 
----
Schema 2 is questionable since some DBMS will run the query (ex MySQL) but 
others will require that all the columns must be in the GROUP BY to be used in 
an aggregate function.

  was:
**There are two scenarios that cause problems but are related to the queries 
with aggregate expressions and the SQL planner.

*Scenario 1*

Columns are grouped by but not selected.

 
{noformat}
ArrowError(InvalidArgumentError("number of columns must match number of fields 
in schema")){noformat}
 

You can reproduce with the aggregate_test_100 dataset with the query:

 
{code:java}
SELECT c1, MIN(c12) FROM aggregate_test_100 GROUP BY c1, c13{code}
 

*Senario 2*

 

Columns are selected, but not grouped or part of an aggregate function. This 
query actually will run, but the wrong schema is produced.

Query 

 
{code:java}
SELECT c1, c13, MIN(c12) FROM aggregate_test_100 GROUP BY c1{code}
Schema generated:

 
{code:java}
Schema {
    fields: [
        Field {
            name: "c0",
            data_type: Utf8,
            nullable: true,
        },
        Field {
            name: "c1",
            data_type: Float64,
            nullable: true,
        },
        Field {
            name: "c1",
            data_type: Float64,
            nullable: true,
        },
    ],
    metadata: {},
} {code}
----
Schema 2 is questionable since some DBMS will run the query (ex MySQL) but 
others will require that all the columns must be in the GROUP BY to be used in 
an aggregate function.


> [Rust] [DataFusion] Query fails/incorrect when aggregated + grouped columns 
> don't match the selected columns
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-7480
>                 URL: https://issues.apache.org/jira/browse/ARROW-7480
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>            Reporter: Kyle McCarthy
>            Priority: Major
>
> There are two scenarios that cause problems but are related to the queries 
> with aggregate expressions and the SQL planner.
>  
> *Scenario 1*
> Columns are grouped by but not selected.
> {noformat}
> ArrowError(InvalidArgumentError("number of columns must match number of 
> fields in schema")){noformat}
> You can reproduce with the aggregate_test_100 dataset with the query:
> {code:java}
> SELECT c1, MIN(c12) FROM aggregate_test_100 GROUP BY c1, c13{code}
>  
> *Senario 2*
> Columns are selected, but not grouped or part of an aggregate function. This 
> query actually will run, but the wrong schema is produced.
> Query: 
> {code:java}
> SELECT c1, c13, MIN(c12) FROM aggregate_test_100 GROUP BY c1{code}
> Schema generated:
> {code:java}
> Schema {
>     fields: [
>         Field {
>             name: "c0",
>             data_type: Utf8,
>             nullable: true,
>         },
>         Field {
>             name: "c1",
>             data_type: Float64,
>             nullable: true,
>         },
>         Field {
>             name: "c1",
>             data_type: Float64,
>             nullable: true,
>         },
>     ],
>     metadata: {},
> } {code}
> This should actually be Field(c0, Utf8), Field(c13, Utf8), Field(MAX, 
> Float64).
>  
> ----
> Schema 2 is questionable since some DBMS will run the query (ex MySQL) but 
> others will require that all the columns must be in the GROUP BY to be used 
> in an aggregate function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (ARROW-7480) [Rust] [DataFusion] Query fails/incorrect when aggregated + grouped columns don't match the selected columns

Reply via email to