sergiimk opened a new issue, #8942:
URL: https://github.com/apache/arrow-datafusion/issues/8942

   ### Describe the bug
   
   Starting with `v34`, DataFusion produces incorrect results when logical optimization is enabled.
   
   ### To Reproduce
   
   Here's the minimal repro case I found so far:
   
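   ```rust
   use std::sync::Arc;

   use datafusion::arrow::array;
   use datafusion::arrow::datatypes::{DataType, Field, Schema};
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::execution::context::SessionState;
   use datafusion::execution::runtime_env::RuntimeEnv;
   use datafusion::prelude::*;
   use datafusion::scalar::ScalarValue;

   // Assumes the `tokio` crate (with the `macros` and `rt-multi-thread` features) as the async runtime.
   #[tokio::main]
   async fn main() {
       let config = SessionConfig::new();
       let runtime = Arc::new(RuntimeEnv::default());
       // An empty rule list disables logical optimization.
       let state = SessionState::new_with_config_rt(config, runtime).with_optimizer_rules(vec![]);
       let ctx = SessionContext::new_with_state(state);

       let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));

       let batch =
           RecordBatch::try_new(schema, vec![Arc::new(array::Int32Array::from(vec![0, 1]))]).unwrap();

       let df = ctx.read_batch(batch).unwrap();
       df.clone().show().await.unwrap();

       // Add `t` column full of nulls
       let df = df
           .with_column("t", cast(Expr::Literal(ScalarValue::Null), DataType::Int32))
           .unwrap();
       df.clone().show().await.unwrap();

       let df = df
           // (case when id = 1 then 10 else t) as t: replaces `t` using its previous value
           .with_column(
               "t",
               when(col("id").eq(lit(1)), lit(10))
                   .otherwise(col("t"))
                   .unwrap(),
           )
           .unwrap()
           // (case when id = 1 then 10 else t) as t2
           .with_column(
               "t2",
               when(col("id").eq(lit(1)), lit(10))
                   .otherwise(col("t"))
                   .unwrap(),
           )
           .unwrap();

       df.clone().show().await.unwrap();
   }
   ```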
   
   The final `show()` call prints:
   ```
   +----+----+----+
   | id | t  | t2 |
   +----+----+----+
   | 0  |    |    |
   | 1  | 10 | 10 |
   +----+----+----+
   ```
   which is correct.
   
   Now comment out the `with_optimizer_rules(vec![])` call, so that the default optimizer rules run, and you get a very different result:
   ```
   +----+---+----+
   | id | t | t2 |
   +----+---+----+
   | 0  |   |    |
   | 1  |   | 10 |
   +----+---+----+
   ```
   Note that although `t` and `t2` are defined by identical expressions, column `t` now differs: for the row with `id = 1` it is null instead of `10`.
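   One way to turn this observation into a check is to compare, row by row, two columns that are defined by the same expression and report the rows where they disagree. The sketch below is plain Rust rather than DataFusion; the `mismatched_ids` helper is a hypothetical name, and the input values are transcribed from the two outputs in this report, with an empty cell treated as NULL (`None`).

   ```rust
   /// Returns the `id` of every row where two columns that are defined by the
   /// same expression disagree. `None` stands in for SQL NULL.
   fn mismatched_ids(id: &[i32], a: &[Option<i32>], b: &[Option<i32>]) -> Vec<i32> {
       id.iter()
           .zip(a.iter().zip(b))
           .filter(|(_, (x, y))| x != y)
           .map(|(&row_id, _)| row_id)
           .collect()
   }

   fn main() {
       let id = [0, 1];

       // Output with the optimizer rules disabled.
       let t_unopt: [Option<i32>; 2] = [None, Some(10)];
       let t2_unopt: [Option<i32>; 2] = [None, Some(10)];

       // Output with the default optimizer rules.
       let t_opt: [Option<i32>; 2] = [None, None];
       let t2_opt: [Option<i32>; 2] = [None, Some(10)];

       // Identical expressions should give identical columns...
       assert!(mismatched_ids(&id, &t_unopt, &t2_unopt).is_empty());
       // ...but the optimized plan disagrees on the row with id = 1.
       assert_eq!(mismatched_ids(&id, &t_opt, &t2_opt), vec![1]);
       println!("optimized mismatches: {:?}", mismatched_ids(&id, &t_opt, &t2_opt));
   }
   ```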
   
   Perhaps what triggers the issue is that column `t` is replaced with an expression that depends on the previous value of `t`.
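   To make the expected semantics concrete, here is a minimal plain-Rust model of what the two `with_column` calls should compute. It does not use DataFusion; the `Column` alias and the `case_when_id_is_one` helper are hypothetical names for illustration. The replacement `t` is evaluated against the previous `t` (all nulls) before it replaces that column, and `t2` then reads the new `t`.

   ```rust
   // A column is modeled as a vector of nullable values; `None` stands in for SQL NULL.
   type Column = Vec<Option<i32>>;

   /// Evaluates `CASE WHEN id = 1 THEN 10 ELSE t END` row by row against the
   /// current contents of `t`.
   fn case_when_id_is_one(id: &[i32], t: &[Option<i32>]) -> Column {
       id.iter()
           .zip(t)
           .map(|(&row_id, &old_t)| if row_id == 1 { Some(10) } else { old_t })
           .collect()
   }

   fn main() {
       let id = vec![0, 1];

       // `with_column("t", cast(NULL, Int32))`: a column full of nulls.
       let t: Column = vec![None; id.len()];

       // `with_column("t", ...)`: the new `t` is computed from the old `t`
       // and only then replaces it.
       let t = case_when_id_is_one(&id, &t);

       // `with_column("t2", ...)`: reads the already-replaced `t`.
       let t2 = case_when_id_is_one(&id, &t);

       assert_eq!(t, vec![None, Some(10)]);
       assert_eq!(t2, t);
       println!("t = {:?}, t2 = {:?}", t, t2);
   }
   ```

   Under this model `t` and `t2` agree on every row, which matches the output with the optimizer rules disabled and not the output with them enabled.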
   
   ### Expected behavior
   
   Logical optimization should not change query results.
   
   ### Additional context
   
   This broke in `datafusion 34`; version `33` worked fine.

