rroelke commented on PR #15861: URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833570923
From the lint description:

> Enum size is bounded by the largest variant. Having one large variant can penalize the memory layout of that enum.

In other words, the large variant `AvroError` determines the layout of every `DataFusionError` value. Transitively, it also determines the layout of `Result<T, DataFusionError>`, which appears in nearly every function in the DataFusion API.

This [related lint pull request](https://github.com/rust-lang/rust-clippy/pull/9373) goes into more detail:

> - A large Err-variant may force an equally large Result if Err is actually bigger than Ok.
> - There is a cost involved in large Result, as LLVM may choose to memcpy them around above a certain size.
> - We usually expect the Err variant to be seldomly used, but pay the cost every time.
> - Result returned from library code has a high chance of bubbling up the call stack, getting stuffed into `MyLibError { IoError(std::io::Error), ParseError(parselib::Error), ...}`, exacerbating the problem.

As applied here:

1. Every API that returns `Result<T, DataFusionError>` might pay a large memcpy cost.
2. A return of `Err(DataFusionError::AvroError(...))` will bubble up the call stack in nearly all cases, such that:
   - (2a) downstream libraries that wrap `DataFusionError` in their own error types will also suffer this problem, and
   - (2b) the end-user request in application code will terminate.

> I think this error is not rarely used

Indeed, `DataFusionError` is used nearly everywhere, which is precisely the point. `DataFusionError::AvroError` is only produced by the Avro reader, yet its size affects every place where `DataFusionError` can appear.
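To make the layout argument concrete, here is a minimal sketch. The hypothetical types `LargePayload`, `InlineError`, `BoxedError` and `MyLibError` stand in for the Avro error, `DataFusionError`, a boxed alternative, and a downstream wrapper. The 256-byte payload is an arbitrary assumption, not the real size of the Avro error. The sketch compares an error enum that stores the large payload inline against one that boxes it, and shows that the inline size carries over into `Result` and into a wrapper enum.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large error payload (size chosen arbitrarily).
#[allow(dead_code)]
struct LargePayload([u8; 256]);

// Error enum where one rarely used variant stores the payload inline,
// so every value of this enum is at least 256 bytes.
#[allow(dead_code)]
enum InlineError {
    Io(std::io::Error),
    Plan(String),
    Large(LargePayload),
}

// Same variants, but the large payload lives behind a pointer.
#[allow(dead_code)]
enum BoxedError {
    Io(std::io::Error),
    Plan(String),
    Large(Box<LargePayload>),
}

// Hypothetical downstream error type that wraps the engine's error,
// inheriting its size.
#[allow(dead_code)]
enum MyLibError {
    Engine(InlineError),
    Parse(String),
}

fn main() {
    println!("InlineError:             {} bytes", size_of::<InlineError>());
    println!("BoxedError:              {} bytes", size_of::<BoxedError>());
    println!("Result<u8, InlineError>: {} bytes", size_of::<Result<u8, InlineError>>());
    println!("Result<u8, BoxedError>:  {} bytes", size_of::<Result<u8, BoxedError>>());
    println!("Result<u8, MyLibError>:  {} bytes", size_of::<Result<u8, MyLibError>>());
}
```

Exact byte counts depend on the target and compiler version. Still, the inline variant makes even a `Result<u8, InlineError>` at least as large as the payload. Boxing, the remedy Clippy's large-error lints typically suggest, shrinks that variant to a pointer at the cost of one heap allocation on the rare error path.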