progval opened a new issue, #609:
URL: https://github.com/apache/arrow-datafusion-python/issues/609
**Describe the bug**
`ORDER BY` is ignored when `COPY`ing a query on a pyarrow table to a CSV file.
This happens both for tables created with `pyarrow.Table.from_pydict` and
for tables loaded from ORC files.
**To Reproduce**
```py
from pathlib import Path
import datafusion
import pyarrow.csv
import pyarrow.dataset
config = datafusion.SessionConfig()
config.set("datafusion.execution.minimum_parallel_output_files", "16")
ctx = datafusion.SessionContext(config=config)
ctx.from_arrow_table(
    pyarrow.Table.from_pydict({'value': [2, 1, 3]}), "content"
)
output_path = Path("/tmp/output.csv")
query = f"""
COPY (SELECT value FROM content ORDER BY value DESC)
TO '{output_path}' (
FORMAT CSV,
)
"""
df = ctx.sql(query)
columns = df.schema().names
assert columns == ["value"], columns
df.count() # force the query to run
print(output_path.read_text())
```
```
$ python3 /tmp/order_arrow_table.py
value
2
1
3
```
**Expected behavior**
Since the query uses `ORDER BY value DESC`, it should print
```
value
3
2
1
```
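A minimal sketch of a check that tells the two cases apart, assuming the CSV has a single `value` header column as in the repro. `is_sorted_desc` is a hypothetical helper written for this report, not part of datafusion or pyarrow:

```python
import csv
import io


def is_sorted_desc(csv_text: str, column: str = "value") -> bool:
    """Return True if `column` of the CSV text is in non-increasing order."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [int(row[column]) for row in rows]
    # Compare each value with its successor; an empty or single-row file passes.
    return all(a >= b for a, b in zip(values, values[1:]))


# Output observed from the repro script: not in descending order.
print(is_sorted_desc("value\n2\n1\n3\n"))
# The order that `ORDER BY value DESC` asks for.
print(is_sorted_desc("value\n3\n2\n1\n"))
```

In the repro script it could be called as `is_sorted_desc(output_path.read_text())` in place of the final `print`.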
**Additional context**
I tried to reproduce it directly in Rust, but this code does produce a
sorted output as expected:
```rust
use std::sync::Arc;
use datafusion::arrow::array::PrimitiveArray;
use datafusion::arrow::datatypes::{DataType, Field, Schema, Int64Type};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::prelude::*;
use datafusion::datasource::MemTable;
#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    let schema = Arc::new(Schema::new(vec![Field::new("value", DataType::Int64, false)]));
    let column: PrimitiveArray<Int64Type> = vec![2, 1, 3].into();
    let partition = RecordBatch::try_new(schema.clone(), vec![Arc::new(column)]).unwrap();
    let table = MemTable::try_new(schema, vec![vec![partition]]).unwrap();
    ctx.register_table("content", Arc::new(table)).unwrap();
    let df = ctx
        .sql("COPY (SELECT value FROM content ORDER BY value) TO '/tmp/output.csv'")
        .await
        .unwrap();
    df.collect().await.unwrap();
}
```