notfilippo opened a new pull request, #11978:
URL: https://github.com/apache/datafusion/pull/11978

   This PR removes LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from 
ScalarValue, following the discussion on #11513 
   
   ---
   
   ## Open questions
   
   ### Top level ScalarValue cast
   
   This change currently fails this test (and potentially others) and before 
proceeding I would like to get some feedback on how to solve this issue. The 
test fails because of an interaction of `expression_simplifier` + 
`optimize_projections`. 
   
   The query is: `SELECT arrow_cast(1234, 'Float64') as f64, arrow_cast('foo', 
'LargeUtf8') as large`
   
   1. expression_simplifier correctly tries to evaluate_to_scalar the 
`arrow_cast('foo', 'LargeUtf8')` cast 
([here](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L626-L626))
   2. this leads to the cast physical expression 
([here](https://github.com/apache/datafusion/blob/main/datafusion/expr-common/src/columnar_value.rs#L187-L187)),
 but this is a no-op since it transforms `Utf8('foo')` (scalar) → 
`LargeUtf8('foo')` (array) → `Utf8('foo')` (scalar)
        - TODO: if this is what we want this should probably not execute at all
   3. This edit doesn't change the underlying schema of the LogicalPlan
   4. `optimize_projections` rewrites the Projection and updates the schema, 
seeing the `Utf8('foo')` it correctly assumes that the LogicalPlan's schema 
field for this value should have DataType == Utf8
   
   [This 
check](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/optimizer.rs#L471-L471)
 is the one raising this error but I guess it should instead check if schema 
fields are _logically_ equivalent to eachother. I'm not totally convinced this 
is the correct solution because it removes some guarantees that might be 
expected by users downstream. Happy to hear everyone's opinion on this.
   
   ```
   ---- optimizer::select_arrow_cast stdout ----
   thread 'optimizer::select_arrow_cast' panicked at 
datafusion/core/tests/optimizer/mod.rs:118:30:
   called `Result::unwrap()` on an `Err` value: Context("Optimizer rule 
'optimize_projections' failed", Context("optimize_projections", 
Internal("Failed due to a difference in schemas, original schema: DFSchema { 
inner: Schema { fields: [Field { name: \"f64\", data_type: Float64, nullable: 
true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: 
\"large\", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered: 
false, metadata: {} }], metadata: {} }, field_qualifiers: [None, None], 
functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: 
DFSchema { inner: Schema { fields: [Field { name: \"f64\", data_type: Float64, 
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { 
name: \"large\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: 
false, metadata: {} }], metadata: {} }, field_qualifiers: [None, None], 
functional_dependencies: FunctionalDependencies { deps: [] } }")))
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to