hengfeiyang opened a new issue, #6173:
URL: https://github.com/apache/arrow-datafusion/issues/6173
### Describe the bug
With SQL:
```sql
SELECT id, count(distinct name) as cnt FROM t GROUP BY id ORDER BY cnt DESC
```
will got result:
```
+------+-----+
| t.id | cnt |
+------+-----+
| 8 | 1 |
| 4 | 1 |
| 2 | 1 |
+------+-----+
```
I expect the fields name to be `id` and `cnt`, but it give me `t.id`.
With this SQL no problem:
```sql
SELECT id as id, count(distinct name) as cnt FROM t GROUP BY id ORDER BY cnt
DESC
```
will got result:
```
+----+-----+
| id | cnt |
+----+-----+
| 8 | 1 |
| 2 | 1 |
| 4 | 1 |
+----+-----+
```
So, I am not sure if this is a bug, I want to got the field name `id`, I
must select it like this `id as id`.
### To Reproduce
the test code is:
```Rust
use datafusion::arrow::{
array::Int64Array, array::StringArray, datatypes::DataType,
record_batch::RecordBatch,
};
use datafusion::error::Result;
use datafusion::from_slice::FromSlice;
use datafusion::prelude::SessionContext;
use std::sync::Arc;
// create local session context with an in-memory table
fn create_context() -> Result<SessionContext> {
use datafusion::arrow::datatypes::{Field, Schema};
use datafusion::datasource::MemTable;
// define a schema.
let schema = Arc::new(Schema::new(vec![
Field::new("id", DataType::Int64, false),
Field::new("name", DataType::Utf8, false),
]));
// define data in two partitions
let batch1 = RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(Int64Array::from_slice([2, 4, 8])),
Arc::new(StringArray::from_slice(["alex", "smish", "alex"])),
],
)?;
// declare a new context. In spark API, this corresponds to a new spark
SQLsession
let ctx = SessionContext::new();
// declare a table in memory. In spark API, this corresponds to
createDataFrame(...).
let provider = MemTable::try_new(schema, vec![vec![batch1]])?;
ctx.register_table("t", Arc::new(provider))?;
Ok(ctx)
}
#[tokio::main]
async fn main() -> Result<()> {
let ctx = create_context()?;
let sql = "SELECT id, count(distinct name) as cnt FROM t GROUP BY id
ORDER BY cnt DESC";
let df = ctx.sql(&sql).await?;
df.show().await?;
let sql = "SELECT id as id, count(distinct name) as cnt FROM t GROUP BY
id ORDER BY cnt DESC";
let df = ctx.sql(&sql).await?;
df.show().await?;
Ok(())
}
```
### Expected behavior
With SQL:
```sql
SELECT id, count(distinct name) as cnt FROM t GROUP BY id ORDER BY cnt DESC
```
Should get result:
```
+------+-----+
| id | cnt |
+------+-----+
| 8 | 1 |
| 4 | 1 |
| 2 | 1 |
+------+-----+
```
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]