sergiimk opened a new issue #959:
URL: https://github.com/apache/arrow-datafusion/issues/959


   **Describe the bug**
   When trying to query a Parquet file produced by Apache Flink, I get an error:
   
   ```bash
   ArrowError(InvalidArgumentError("column types must match schema types, expected Timestamp(Millisecond, Some(\"UTC\")) but found Timestamp(Millisecond, None) at column index 0"))
   ```
   
   Output of Java `parquet-schema`:
   ```sql
   message Row {
     optional int64 system_time (TIMESTAMP(MILLIS,true));
     optional int64 reported_date (TIMESTAMP(MILLIS,true));
     optional binary province (STRING);
     optional int64 total_daily;
   }
   ```
   
   **To Reproduce**
   Download and extract the sample data: [data.tar.gz](https://github.com/apache/arrow-datafusion/files/7079931/data.tar.gz).
   
   Run:
   ```rust
   use datafusion::arrow::util::pretty::print_batches;
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
       let mut ctx = ExecutionContext::new();
       ctx.register_parquet("test", "flink.parquet")?;
       let df = ctx.table("test")?;
   
       // This plain select works:
       //let df = ctx.sql("select * from test")?;
       // Adding ORDER BY triggers the error:
       let df = ctx.sql("select * from test order by reported_date desc")?;
   
       let records = df.collect().await?;
       print_batches(&records)?;
       Ok(())
   }
   ```
   
   Note that a simple `SELECT *` works fine, but adding `ORDER BY` fails.
   
   **Expected behavior**
   Query executes without errors.
   
   
   



