progval opened a new issue, #609:
URL: https://github.com/apache/arrow-datafusion-python/issues/609

   **Describe the bug**
   `ORDER BY` is ignored when `COPY`ing from a pyarrow table to a CSV file.
   
   This happens both for tables created with `pyarrow.Table.from_pydict` and for tables read from ORC files.
   
   **To Reproduce**
   
   ```py
   from pathlib import Path
   
   import datafusion
   import pyarrow.csv
   import pyarrow.dataset
   
   config = datafusion.SessionConfig()
   config.set("datafusion.execution.minimum_parallel_output_files", "16")
   ctx = datafusion.SessionContext(config=config)
   ctx.from_arrow_table(pyarrow.Table.from_pydict({'value': [2, 1, 3]}), "content")
   
   output_path = Path("/tmp/output.csv")
   
   query = f"""
       COPY (SELECT value FROM content ORDER BY value DESC)
       TO '{output_path}' (
           FORMAT CSV,
       )
   """
   df = ctx.sql(query)
   
   columns = df.schema().names
   assert columns == ["value"], columns
   
   df.count()  # force the query to run
   
   print(output_path.read_text())
   ```
   
   ```
   $ python3 /tmp/order_arrow_table.py
   value
   2
   1
   3
   ```
   
   **Expected behavior**
   Should print
   
   ```
   value
   3
   2
   1
   ```
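
A minimal sketch of how the ordering could be checked programmatically instead of by eye, assuming the single-column CSV layout shown above. `read_values` and `is_descending` are hypothetical helper names, not part of datafusion or pyarrow, and the CSV text is inlined here rather than read from `/tmp/output.csv`.

```python
import csv
import io


def read_values(csv_text: str) -> list[int]:
    """Parse a single-column CSV with a `value` header into a list of ints."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row["value"]) for row in reader]


def is_descending(values: list[int]) -> bool:
    """Return True if every element is >= the element that follows it."""
    return all(a >= b for a, b in zip(values, values[1:]))


# Output currently written by the COPY statement (input order preserved).
observed = read_values("value\n2\n1\n3\n")
# Output that ORDER BY value DESC should produce.
expected = read_values("value\n3\n2\n1\n")

print(is_descending(observed))
print(is_descending(expected))
```

In the reproduction script, the same check could be applied to `output_path.read_text()` in place of the inlined strings.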
   
   **Additional context**
   
   I tried to reproduce it directly in Rust, but this code does produce a sorted output as expected:
   
   ```rust
   use std::sync::Arc;
   use datafusion::arrow::array::PrimitiveArray;
   use datafusion::arrow::datatypes::{DataType, Field, Schema, Int64Type};
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::prelude::*;
   use datafusion::datasource::MemTable;
   
   #[tokio::main]
   async fn main() {
       let ctx = SessionContext::new();
   
       let schema = Arc::new(Schema::new(vec![Field::new("value", DataType::Int64, false)]));
       let column: PrimitiveArray<Int64Type> = vec![2, 1, 3].into();
       let partition = RecordBatch::try_new(schema.clone(), vec![Arc::new(column)]).unwrap();
       let table = MemTable::try_new(schema, vec![vec![partition]]).unwrap();
       ctx.register_table("content", Arc::new(table)).unwrap();
   
       let df = ctx.sql("COPY (SELECT value FROM content ORDER BY value) TO '/tmp/output.csv'").await.unwrap();
       df.collect().await.unwrap();
   }
   ```
   

