theirix commented on PR #23211: URL: https://github.com/apache/datafusion/pull/23211#issuecomment-4860433670
> [d1c4fab](https://github.com/apache/datafusion/pull/23211/commits/d1c4fabfb561352105b68189245a7ac0bb38554f) shows what I had in mind. Since `Signature::uniform` is just a convenience factory method for `Signature`, I assumed it would not be a problem if we return a non-uniform `Signature`; the data type already provides this flexibility. > > I pushed the `map(Signature::uniform)` call down into `string_concat_coercion` which now returns `Option<Signature>` instead of `Option<DataType>`. Since this is not a public function I think it's safe to make that change. > > This code compiles, but will not yet work in practice. It requires the changes from [apache/arrow-rs#10222](https://github.com/apache/arrow-rs/pull/10222). This is a nice change! > > > so the use of concat_elements_dyn with a heavier downcast_ref, which uses runtime lookups, works fine - please double-check just in case. > > We're replacing one bit of code with another essentially identical bit of code. If you compare `concat_elements(left: &ArrayRef, right: &ArrayRef)` in `binary.rs` with `concat_elements_dyn` there's almost no difference. I did not profile, but my hunch is that the difference in performance we saw in the benchmarks came from the fact that the actual 'concatenate utf8 elements' was also duplicated between datafusion and arrow-rs. When replacing the datafusions's dynamic dispatching function `concat_elements` with arrow-rs' `concat_elements_dyn` we also end up calling the arrow-rs implementation of 'concatenate utf8 elements' which happens to be a bit slower than the datafusion one. With the optimisation work in arrow-rs integrated this flips around and the arrow-rs implementation is then the faster one. You're right, `concat_elements_dyn` calls `downcast_ref`, while old code calls `as_fixed_size_binary_array`, which essentially calls the `downcast_value!` macro - the same effect. It should have the same performance -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
