judahrand commented on PR #39836:
URL: https://github.com/apache/arrow/pull/39836#issuecomment-2208352180

> There is ambiguity in how a "deep" nested type should be hashed (maybe the nested structure should produce a different hash for a "row" of values), but what makes the most sense to me is for it to be value-based (the above example).
   
I'm not entirely sure I've understood the distinction you're drawing between the two cases. Am I right that, under the first of the two options you describe, a row containing `[[1], [1, 2, 3, 4]]` would hash differently from `[[1, 1], [2, 3, 4]]`, whereas under the second, 'value-based' option these two rows would have the same hash?
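To make sure we're talking about the same thing, here is a minimal Python sketch of how I'm reading the 'value-based' option. It assumes that only the leaf values are fed to the hasher and the list boundaries are discarded. The helper names are hypothetical and SHA-256 is used only for illustration; this is not Arrow's hashing kernel.

```python
import hashlib
from typing import Any, List


def flatten_leaves(value: Any) -> List[int]:
    """Collect leaf integers depth-first, discarding all list boundaries."""
    if isinstance(value, list):
        out: List[int] = []
        for child in value:
            out.extend(flatten_leaves(child))
        return out
    return [value]


def value_based_hash(row: list) -> str:
    """Hypothetical 'value-based' hash: only the leaf values reach the hasher."""
    h = hashlib.sha256()
    for leaf in flatten_leaves(row):
        h.update(leaf.to_bytes(8, "little", signed=True))
    return h.hexdigest()


a = [[1], [1, 2, 3, 4]]
b = [[1, 1], [2, 3, 4]]
# Both rows flatten to [1, 1, 2, 3, 4], so under this reading they collide.
print(value_based_hash(a) == value_based_hash(b))  # True
```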
   
If that is the case, then the 'value-based' behaviour would surprise me. I'd expect two rows containing clearly different data to have different hashes (barring the small chance of a hash collision).
   
For nested lists, I wonder whether this issue could be avoided by including the offsets of the parent list in the hash?
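As a rough illustration of the offsets idea, here is a minimal Python sketch. It assumes a `list<list<int64>>` row is encoded Arrow-style as an offsets array into a flat values buffer, with offsets normalized so each row starts at 0 (Arrow's real offsets are absolute into the child array, so a kernel would need to rebase them). The function names are hypothetical and SHA-256 is only for illustration. Deeper nesting would presumably need the offsets of every level.

```python
import hashlib
from typing import List, Tuple


def to_offsets_and_values(row: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Encode one row as (offsets, values), with offsets rebased to start at 0."""
    offsets = [0]
    values: List[int] = []
    for sub in row:
        values.extend(sub)
        offsets.append(len(values))
    return offsets, values


def structure_aware_hash(row: List[List[int]]) -> str:
    """Hypothetical hash that feeds the parent offsets as well as the leaf values."""
    offsets, values = to_offsets_and_values(row)
    h = hashlib.sha256()
    # Prefix with the offsets count so the offsets and values sections stay distinct.
    h.update(len(offsets).to_bytes(8, "little"))
    for off in offsets:
        h.update(off.to_bytes(8, "little"))
    for v in values:
        h.update(v.to_bytes(8, "little", signed=True))
    return h.hexdigest()


a = [[1], [1, 2, 3, 4]]  # offsets [0, 1, 5], values [1, 1, 2, 3, 4]
b = [[1, 1], [2, 3, 4]]  # offsets [0, 2, 5], values [1, 1, 2, 3, 4]
# Same leaf values, different offsets, so the hashes differ.
print(structure_aware_hash(a) == structure_aware_hash(b))  # False
```

The leaf values alone are identical, so it's the offsets that keep the two rows apart; equal rows still produce equal offsets and values, and therefore equal hashes.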
