judahrand commented on PR #39836: URL: https://github.com/apache/arrow/pull/39836#issuecomment-2208352180
> There is ambiguity in how a "deep" nested type should be hashed (maybe the nested structure should produce a different hash for a "row" of values), but what makes the most sense to me is for it to be value-based (the above example).

I'm not entirely sure I've understood your distinction between the two cases. Am I correct in understanding that, in the first of the two options you describe, a row containing `[[1], [1,2,3,4]]` would have a different hash value from `[[1,1], [2,3,4]]`, but in the second, 'value-based' option these two lists would have the same hash?

If that is the case, then the 'value-based' behaviour would be surprising to me. I'd have thought that two rows with what is definitely different data should have different hashes (not considering the small chance of a hash collision).

For the case of nested lists, I wonder if this issue could be avoided by including the offsets of the parent list in the hash?
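
To make the question concrete, here is a minimal sketch in plain Python. It is not Arrow's hashing code: `flatten`, `hash_values_only`, and `hash_with_offsets` are hypothetical helpers, SHA-256 stands in for whatever hash function is actually used, and the offsets are computed relative to the start of the row. It only illustrates that hashing the flattened child values alone cannot tell the two rows apart, while mixing in the parent list's offsets can.

```python
import hashlib
import struct


def flatten(row):
    """Split a list-of-lists row into flat child values plus parent offsets."""
    values, offsets = [], [0]
    for child in row:
        values.extend(child)
        offsets.append(len(values))
    return values, offsets


def _digest(ints):
    # Pack each integer as a little-endian int64 and hash the resulting bytes.
    return hashlib.sha256(struct.pack(f"<{len(ints)}q", *ints)).hexdigest()


def hash_values_only(row):
    # "Value-based" option: only the flattened child values contribute.
    values, _ = flatten(row)
    return _digest(values)


def hash_with_offsets(row):
    # Alternative: the parent list's offsets contribute as well.
    values, offsets = flatten(row)
    # Prefix with the offset count so the offsets/values boundary is unambiguous.
    return _digest([len(offsets)] + offsets + values)


a = [[1], [1, 2, 3, 4]]
b = [[1, 1], [2, 3, 4]]

print(hash_values_only(a) == hash_values_only(b))    # True: same flattened values
print(hash_with_offsets(a) == hash_with_offsets(b))  # False: offsets differ
```

Both rows flatten to the values `[1, 1, 2, 3, 4]`; only the offsets (`[0, 1, 5]` versus `[0, 2, 5]`) record where the inner lists begin and end.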
