Jefffrey opened a new issue, #20150: URL: https://github.com/apache/datafusion/issues/20150
### Is your feature request related to a problem or challenge?

Many of our hash util functions take a `rehash` argument:

https://github.com/apache/datafusion/blob/b80bf2ca8ef74900fee96a1cc169bdedf53b36fc/datafusion/common/src/hash_utils.rs#L185-L190

https://github.com/apache/datafusion/blob/b80bf2ca8ef74900fee96a1cc169bdedf53b36fc/datafusion/common/src/hash_utils.rs#L230-L235

https://github.com/apache/datafusion/blob/b80bf2ca8ef74900fee96a1cc169bdedf53b36fc/datafusion/common/src/hash_utils.rs#L282-L287

It's not obvious from the code alone why we do this. It seems the argument used to be named `multi_col` and was true whenever we needed to hash multiple columns, but it was changed in #6816 to also skip the rehash for the first column, for performance reasons.

- It seems the dictionary function still calls it `multi_col`

I also found it confusing that certain hash functions don't have a `rehash` parameter, specifically those for nested types such as list, struct, etc.

https://github.com/apache/datafusion/blob/b80bf2ca8ef74900fee96a1cc169bdedf53b36fc/datafusion/common/src/hash_utils.rs#L447-L451

https://github.com/apache/datafusion/blob/b80bf2ca8ef74900fee96a1cc169bdedf53b36fc/datafusion/common/src/hash_utils.rs#L475-L479

https://github.com/apache/datafusion/blob/b80bf2ca8ef74900fee96a1cc169bdedf53b36fc/datafusion/common/src/hash_utils.rs#L510-L515

### Describe the solution you'd like

Add documentation explaining why we have a `rehash` parameter across these functions.

Also look into adding a `rehash` parameter to the hash functions that are missing it. If the parameter was omitted on purpose for those functions, leave an explanation of why.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
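For illustration, here is a minimal sketch of what the `rehash` flag appears to control, based only on the description in this issue. It is not the code in `hash_utils.rs`: `hash_one`, `hash_column`, and `hash_columns` are hypothetical names, the standard library's `DefaultHasher` stands in for whatever hasher the real functions use, and the mixing formula in `combine_hashes` is an assumption made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the per-value hasher (an assumption for this
/// sketch; the real functions do not necessarily use `DefaultHasher`).
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Illustrative mixing step that folds an existing row hash `r` into a
/// freshly computed value hash `l`. The constants are assumptions.
fn combine_hashes(l: u64, r: u64) -> u64 {
    let hash = (17u64 * 37).wrapping_add(l);
    hash.wrapping_mul(37).wrapping_add(r)
}

/// Hash one column into the per-row `hashes` buffer.
///
/// - `rehash == false`: overwrite the buffer (first column, nothing to mix in).
/// - `rehash == true`: combine with the hash already stored for that row.
fn hash_column<T: Hash>(column: &[T], hashes: &mut [u64], rehash: bool) {
    assert_eq!(column.len(), hashes.len());
    for (value, hash) in column.iter().zip(hashes.iter_mut()) {
        let h = hash_one(value);
        *hash = if rehash { combine_hashes(h, *hash) } else { h };
    }
}

/// Hash two columns row-wise: the first column skips the combine step,
/// the second column mixes into the hashes left by the first.
fn hash_columns(a: &[i64], b: &[&str]) -> Vec<u64> {
    let mut hashes = vec![0u64; a.len()];
    hash_column(a, &mut hashes, false);
    hash_column(b, &mut hashes, true);
    hashes
}
```

Under these assumptions, calling `hash_columns(&[1, 2, 1], &["x", "y", "x"])` yields equal hashes for rows 0 and 2. Passing `rehash = false` for the first column saves one combine per row, which matches the performance motivation mentioned for #6816.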
