ahmedriza commented on issue #3617:
URL: https://github.com/apache/arrow-datafusion/issues/3617#issuecomment-1432204767

   @alamb Apologies, I should have been clearer.  The Parquet file I mentioned 
is the one from https://github.com/apache/arrow-datafusion/issues/2439. I am 
attaching it here as well. 
   
   The error mentioned above occurred when I ran the SQL through `ballista`, 
and I've checked that `ballista` on the master branch currently uses 
`datafusion` version `18.0.0`.  
   
   Hence, I just wrote two little tests, using the `datafusion` and the 
`ballista` context respectively. 
   
   SQL from the `datafusion` context works, whilst the one that uses the 
`ballista` context fails.   Test code:
   ```rust
   use datafusion::prelude::{ParquetReadOptions, SessionContext};

   #[tokio::test]
   async fn test_datafusion_sql() {
       // Plain in-process DataFusion context.
       let ctx = SessionContext::new();
       let filename = "part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet";
       ctx.register_parquet("t", filename, ParquetReadOptions::default())
           .await
           .unwrap();
       // Index into column `text` with the key 'status'.
       let df = ctx.sql("select t.text['status'] from t").await.unwrap();
       df.show().await.unwrap();
   }
   ```
   Output:
   ```
   +----------------+
   | t.text[status] |
   +----------------+
   |                |
   | generated      |
   | generated      |
   | generated      |
   | generated      |
   | generated      |
   | generated      |
   | generated      |
   | generated      |
   | generated      |
   +----------------+
   ```
   ```rust
   use ballista::prelude::{BallistaConfig, BallistaContext};
   use datafusion::prelude::ParquetReadOptions;

   #[tokio::test]
   async fn test_ballista_sql() {
       // Ballista context in standalone mode, default configuration.
       let config = BallistaConfig::builder().build().unwrap();
       let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
       let filename = "part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet";
       ctx.register_parquet("t", filename, ParquetReadOptions::default())
           .await
           .unwrap();
       // Same query as in the DataFusion test.
       let df = ctx.sql("select t.text['status'] from t").await.unwrap();
       df.show().await.unwrap();
   }
   ```
   Output:
   ```
   thread 'query::test::test_ballista_sql' panicked at 'called 
`Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Execution("Job 
QeRwZCh failed: Error planning job QeRwZCh: 
DataFusionError(Internal(\"physical_plan::to_proto() unsupported expression 
GetIndexedFieldExpr { arg: Column { name: \\\"text\\\", index: 0 }, key: 
Utf8(\\\"status\\\") }\"))")))', src/query.rs:44:25
   ```
   I am a bit surprised by the failure from the `ballista` version. 
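   
   My guess, going only by the error text, is that the query plans fine and 
the failure happens when the physical plan is converted to its serialized 
form (`physical_plan::to_proto()`), a step the plain `datafusion` context 
never needs. A toy sketch of that situation follows; the `PhysicalExpr` enum 
and both functions are hypothetical stand-ins written for illustration, not 
the real `datafusion` or `ballista` types.
   ```rust
   // Hypothetical, simplified model of the failure mode. None of these
   // names are the real DataFusion or Ballista APIs.
   #[derive(Debug, Clone, PartialEq)]
   pub enum PhysicalExpr {
       Column { name: String, index: usize },
       GetIndexedField { arg: Box<PhysicalExpr>, key: String },
   }

   /// Stand-in for in-process use: every variant is handled.
   pub fn describe(expr: &PhysicalExpr) -> String {
       match expr {
           PhysicalExpr::Column { name, .. } => name.clone(),
           PhysicalExpr::GetIndexedField { arg, key } => {
               format!("{}[{}]", describe(arg), key)
           }
       }
   }

   /// Stand-in for serialization: only variants that have a wire
   /// encoding are handled, everything else is reported as an error.
   pub fn to_proto(expr: &PhysicalExpr) -> Result<Vec<u8>, String> {
       match expr {
           PhysicalExpr::Column { name, index } => {
               // Toy encoding: 8-byte little-endian index, then the name bytes.
               let mut buf = (*index as u64).to_le_bytes().to_vec();
               buf.extend_from_slice(name.as_bytes());
               Ok(buf)
           }
           other => Err(format!("to_proto() unsupported expression {:?}", other)),
       }
   }
   ```
   In this model, `describe` succeeds on a `GetIndexedField` expression while 
`to_proto` returns an error for it, which would match one context working and 
the other failing on the same `datafusion` version.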
   
   I've also checked my `Cargo.toml` and the `cargo tree` output to confirm 
that `datafusion` `18.0.0` is the only version in use.
   
   ```
   ballista = { git = "https://github.com/apache/arrow-ballista", features = ["s3"] }
   ballista-cli = { git = "https://github.com/apache/arrow-ballista", features = ["s3"] }
   ballista-core = { git = "https://github.com/apache/arrow-ballista", features = ["s3"] }
   datafusion = "18.0.0"
   
   futures = "0.3"
   object_store = "0.5"
   tokio = { version = "1", features = ["full"] }
   ```

