klemniops commented on PR #15861: URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833568011
From the lint description:

> Enum size is bounded by the largest variant. Having one large variant can penalize the memory layout of that enum.

That is to say, the presence of the large variant `AvroError` affects the whole layout of `DataFusionError`. Transitively, it also affects the whole layout of `Result<T, DataFusionError>`, which appears in nearly every function in the DataFusion API.

This [related lint pull request](https://github.com/rust-lang/rust-clippy/pull/9373) elaborates more specifically:

> - A large Err-variant may force an equally large Result if Err is actually bigger than Ok.
> - There is a cost involved in large Result, as LLVM may choose to memcpy them around above a certain size.
> - We usually expect the Err variant to be seldomly used, but pay the cost every time.
> - Result returned from library code has a high chance of bubbling up the call stack, getting stuffed into MyLibError { IoError(std::io::Error), ParseError(parselib::Error), ...}, exacerbating the problem.

As applied here:

1. Every API that returns `Result<T, DataFusionError>` might pay a large `memcpy` cost.
2. A return of `Err(DataFusionError::AvroError(...))` will bubble up the call stack in nearly all cases, such that:
   - (2a) downstream libraries wrapping `DataFusionError` in their own error types will also suffer this problem, and
   - (2b) the end-user request in application code will terminate.

> I think this error is not rarely used

Indeed, `DataFusionError` is used nearly everywhere, which is precisely the point. `DataFusionError::AvroError`, by contrast, is only produced by the Avro reader, yet it affects every place where `DataFusionError` can appear.
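Here is a minimal sketch of the layout effect. It uses hypothetical types (`BigPayload`, `InlineError`, `BoxedError`) rather than DataFusion's real ones. The 256-byte payload is an arbitrary stand-in, not the actual size of the Avro error. The program compares an enum that stores the large payload inline against one that boxes it, which is the mitigation clippy's `large_enum_variant` lint suggests. Exact sizes depend on the compiler and target, so the program prints them instead of hard-coding them.

```rust
use std::mem::size_of;

// Hypothetical large error payload; 256 bytes is an assumption for illustration.
#[allow(dead_code)]
struct BigPayload([u8; 256]);

// Large payload stored inline: every value of this enum reserves room for it.
#[allow(dead_code)]
enum InlineError {
    Plan(String),
    Avro(BigPayload),
}

// Large payload behind a pointer: the variant only holds a `Box` (one word).
#[allow(dead_code)]
enum BoxedError {
    Plan(String),
    Avro(Box<BigPayload>),
}

fn main() {
    println!("InlineError:             {} bytes", size_of::<InlineError>());
    println!("BoxedError:              {} bytes", size_of::<BoxedError>());
    // The Ok type is tiny, so the Err type dictates the size of the whole Result.
    println!("Result<u8, InlineError>: {} bytes", size_of::<Result<u8, InlineError>>());
    println!("Result<u8, BoxedError>:  {} bytes", size_of::<Result<u8, BoxedError>>());
}
```

On a typical 64-bit target, the inline enum and its `Result` are at least 256 bytes even when the value is `Ok(u8)`. The boxed version stays close to the size of its `String` variant. The trade-off is a heap allocation, but only on the rarely taken error path.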