KalleOlaviNiemitalo commented on PR #1742:
URL: https://github.com/apache/avro/pull/1742#issuecomment-1312634759

   The difficulty is in the recursive calls:
   
   
<https://github.com/apache/avro/blob/3f55816d2baec1ec6bd1ed4be69fd6977a8316d5/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1116>
   
<https://github.com/apache/avro/blob/3f55816d2baec1ec6bd1ed4be69fd6977a8316d5/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1127>
   
   The limit is set to 10 hashes.  Suppose you have a record type R1 with two 
fields, each of record type R2, which in turn has seven string fields.  When 
you call hashCode on an R1 instance, it should hash all 7 strings of the first 
R2 instance but only the first 3 strings of the second R2 instance.  So the 
recursive hashCode call on the first R2 instance must somehow let the caller 
know how many values it hashed.  Passing an AtomicInteger as a parameter 
achieves that, because hashCode can decrement the AtomicInteger and the change 
is visible to the caller.  If you passed an int counter instead, you would have 
to return the updated counter to the caller in some other way, perhaps by 
returning an instance of a class rather than just the hash code as an int.
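   
   To make the counter-sharing concrete, here is a minimal sketch of the R1/R2 scenario.  It is not the GenericData code: the Rec class, the r2 helper, and the hashed list are hypothetical stand-ins invented for the illustration, and only the idea of a shared AtomicInteger budget comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SharedCounterDemo {
    // Hypothetical stand-in for a record: an ordered list of field values.
    static final class Rec {
        final List<Object> fields;
        Rec(Object... fields) { this.fields = Arrays.asList(fields); }
    }

    // Leaf values that were actually hashed; kept only for the demonstration.
    static final List<Object> hashed = new ArrayList<>();

    // Hashes o, spending one unit of the shared budget per leaf value.
    static int hash(Object o, AtomicInteger remaining) {
        if (o instanceof Rec) {
            int h = 1;
            for (Object f : ((Rec) o).fields) {
                // The budget may have been used up by a nested call.
                if (remaining.get() <= 0) break;
                h = 31 * h + hash(f, remaining);
            }
            return h;
        }
        // This decrement is visible to every caller up the stack,
        // because they all hold the same AtomicInteger instance.
        remaining.decrementAndGet();
        hashed.add(o);
        return o.hashCode();
    }

    // Builds a record with seven string fields: prefix1 .. prefix7.
    static Rec r2(String prefix) {
        Object[] strings = new Object[7];
        for (int i = 0; i < 7; i++) strings[i] = prefix + (i + 1);
        return new Rec(strings);
    }

    public static void main(String[] args) {
        Rec r1 = new Rec(r2("a"), r2("b"));
        AtomicInteger remaining = new AtomicInteger(10);
        System.out.println("hash = " + hash(r1, remaining));
        System.out.println("hashed values = " + hashed);
        System.out.println("budget left = " + remaining.get());
    }
}
```

   With a budget of 10, the sketch hashes all seven strings of the first nested record and only the first three of the second.  A plain int parameter would be copied on each call, so the outer loop would never see what the nested call had spent.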
   
   That said, it might be advantageous to replace the AtomicInteger with a 
class that contains both the counter and the hash code being computed.  The 
hashCodeAdd method effectively multiplies hash codes by powers of 31:
   
   
<https://github.com/apache/avro/blob/3f55816d2baec1ec6bd1ed4be69fd6977a8316d5/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1143-L1146>
   
   In the scenario with record types R1 and R2, the hash codes of the seven 
strings in the first R2 instance get multiplied by 31 raised to the powers of 
7, 6, 5, 4, 3, 2, and 1.  The hash codes of the first three strings in the 
second R2 instance get multiplied by 31 raised to the powers of 2, 1, and 0.  
So two of these multipliers get reused.  If there were instead a static class 
HashCodeCollector with one int field for the hash code and another int field 
for the counter, and all the recursive calls fed their values to it, then it 
could give a separate multiplier to the hash code of each value.
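   
   A rough sketch of what such a collector could look like follows.  Only the name HashCodeCollector and its two int fields come from the suggestion above; the method names, the Rec stand-in, and the r2 helper are assumptions made for the example, not anything that exists in Avro.

```java
import java.util.Arrays;
import java.util.List;

class HashCodeCollectorDemo {
    // Hypothetical stand-in for a record: an ordered list of field values.
    static final class Rec {
        final List<Object> fields;
        Rec(Object... fields) { this.fields = Arrays.asList(fields); }
    }

    // One running hash code plus one counter, shared by all recursive calls.
    static final class HashCodeCollector {
        private int hashCode = 1;
        private int remaining;

        HashCodeCollector(int limit) { this.remaining = limit; }

        boolean isFull() { return remaining <= 0; }

        // Every accepted value shifts all earlier values up by one power of 31.
        void add(int valueHash) {
            hashCode = 31 * hashCode + valueHash;
            remaining--;
        }

        int result() { return hashCode; }
    }

    // Feeds leaf values into the collector, depth first, until it is full.
    static void collect(Object o, HashCodeCollector c) {
        if (o instanceof Rec) {
            for (Object f : ((Rec) o).fields) {
                if (c.isFull()) return;
                collect(f, c);
            }
        } else {
            c.add(o.hashCode());
        }
    }

    static int hash(Object o, int limit) {
        HashCodeCollector c = new HashCodeCollector(limit);
        collect(o, c);
        return c.result();
    }

    // Builds a record with seven string fields: prefix1 .. prefix7.
    static Rec r2(String prefix) {
        Object[] strings = new Object[7];
        for (int i = 0; i < 7; i++) strings[i] = prefix + (i + 1);
        return new Rec(strings);
    }

    public static void main(String[] args) {
        Rec r1 = new Rec(r2("a"), r2("b"));
        System.out.println("hash = " + hash(r1, 10));
    }
}
```

   In this sketch the ten hashed values receive the powers 9 down to 0, one each, so no multiplier is shared.  One trade-off of this particular design is that it flattens the structure: two differently nested records with the same leaf values in the same order would hash alike, which may or may not matter for the use case.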


