crepererum commented on issue #14256:
URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2624035825
I have another hunch, but still need to proof it: DataFusion uses a lot of
```rust
fn f(param: impl Into<Type>, ...) -> ... {
let param: Type = param.into();
...
}
// call-site
let x: OtherType = ...;
f(x, ...);
```
This is mostly so that the caller can stick a lot of parameters into methods
w/o needing to convert it to the right type. Sometimes it's also used for
backwards compatible-ish changes[^1].
In Rust that `impl` constructs expands roughly to
```rust
fn f<T>(param: T) -> ...
where
T: Into<Type>
{
let param: Type = param.into();
...
}
```
Since Rust generics are implemented using
[monomorphization](https://rustc-dev-guide.rust-lang.org/backend/monomorph.html),
this leads to the following issues:
1. **type replication:** The function including its body are compiled for
every parameter `T`. And this also includes multiple copies of the LLVM IR and
resulting machine code. Note that this includes the WHOLE body, not just the
`let param: Type = param.into();` part[^2].
2. **delayed compilation:** Since a crate that contains the aforementioned
method cannot know all types `T` that you eventually gonna use, the compilation
is done on the call site (instead the definition site). This however also means
that you get even more copies of the function, because if you have a crate that
defines the method and two crates that use it, the two user crates are compiled
independently.
My guess is that while this leads to "nice APIs", it leads to a lot of
compilation overhead both within DataFusion and for its users. I would
therefore suggest that we change methods to look like this:
```rust
fn f(param: Type, ...) -> ... {
...
}
// call-site
let x: OtherType = ...;
f(x.into(), ...);
```
An intermediate solution that requires more code, has a slightly higher
compilation overhead (due to more generics) but isn't as bad as the original
version would be this:
```rust
fn f(param: impl Into<Type>, ...) -> ... {
f_impl(param.into(), ...)
}
// private helper method
// may want to use `#[inline(never)]` here
fn f_impl(param: Type, ...) -> ... {
...
}
// call-site
let x: OtherType = ...;
f(x, ...);
```
Technically this also applies to `impl IntoIterator<Item = Type>` and `impl
Iterator<Item = Type>`, but there it's usually harder to fix. There we would
have the following options, all of which are slower than the original one due
to dynamic dispatch:
```rust
// original version
fn g(it: impl IntoIterator<Item = Type>) {}
// unsized locals
// requires
https://doc.rust-lang.org/beta/unstable-book/language-features/unsized-locals.html
fn g(it: dyn Iterator<Item = Type>) {}
// box version
fn g(it: Box<dyn Iterator<Item = Type>>) {}
// pass mut ref
fn g(it: &mut dyn Iterator<Item = Type>) {}
```
So I would leave the iterator cases out for now.
[^1]: That's strictly speaking NOT true, changing `param: String` to `param:
impl Into<String>` is NOT backwards compatible since it affects type inference
on the call site.
[^2]: This partial replication of code might one day be possible via
[polymorphization](https://github.com/rust-lang/rust/issues/124962), but I
don't expect this to happen within the next 3 years.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]