alamb opened a new issue, #5309: URL: https://github.com/apache/arrow-datafusion/issues/5309
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Related to https://github.com/apache/arrow-datafusion/issues/5157 There are many places in the code that use fallible functions on `DFSchema` to check if a column exists: https://docs.rs/datafusion-common/18.0.0/datafusion_common/struct.DFSchema.html#method.index_of https://docs.rs/datafusion-common/18.0.0/datafusion_common/struct.DFSchema.html#method.index_of_column_by_name https://docs.rs/datafusion-common/18.0.0/datafusion_common/struct.DFSchema.html#method.field_from_column For example, there is code that looks like this (call `is_ok()` or `is_err()`and totally discards the error with the string) ```rust input_schema.field_from_column(col).is_ok() ``` This is problematic because they return a DataFusionError that not only has an allocated `String` but also often has a nice error message. You can see them appearing in the trace on https://github.com/apache/arrow-datafusion/issues/5157 As part of making the optimizer faster Related to https://github.com/apache/arrow-datafusion/issues/5157 we need to avoid these string allocations, Thus I propose: 1. Add new functions for checking that return a bool rather than an error 2. Replace the use of `is_err()` with Find the field with the given qualified column For example, ```rust impl DFSchema { // existing function that returns Result pub fn field_from_column(&self, column: &Column) -> Result<&DFField> {...} // new function that returns bool <---- Add this new function pub fn has_column(&self, column: &Column) -> bool {...} } ``` And then replace in the code that have the pattern ```rust input_schema.field_from_column(col).is_ok() ``` With ```rust input_schema.has_column(col) ``` **Describe the solution you'd like** Ideally someone would do this transition one function on DFSchema at a time (not one giant PR please 🙏 ) **Describe alternatives you've considered** There are more involved proposals for larger changes to DFSchema but simply avoiding this check might help a lot **Additional context** I think this is a good first exercise as the desire is well spelled out and it is a software engineering exercise rather than requires deep datafusion expertise -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
