claudeyj opened a new pull request, #10445:
URL: https://github.com/apache/pinot/pull/10445

   ## What is the purpose of this PR
   * This PR replaces two `HashMap` with `LinkedHashMap` to avoid 
non-determinism of `HashMap`. 
   * Due to the non-determinism of the order of `HashMap`, the content of 
produced file in the segment named `metadata.properties` will be 
non-deterministic. This file will be encoded to the checksum of CRC 
verification. So the non-determinism of this file could lead to failure of 
verification (different checksum). Securing determinism of two `HashMap` by 
replacing them with `LinkedHashMap` can remove the flakiness.
   
   ## Reproduce the test failure
   * Run [NonDex](https://github.com/TestingResearchIllinois/NonDex) (a flaky 
test detection tool that can shuffle the order of data structures like HashMap, 
HashSet) on CrcUtilsTest:
   `mvn clean install -DskipTests -pl pinot-core -am`
   `mvn -pl pinot-core edu.illinois:nondex-maven-plugin:2.1.1:nondex 
-Dtest=org.apache.pinot.core.util.CrcUtilsTest#test1`
   
   ## Expected result
   * The CrcUtilsTest successsfully pass without shuffling and can still pass 
with NonDex shuffling
   
   ## Actual result
   * The CrcUtilsTest successsfully pass without shuffling but cannot pass with 
NonDex shuffling
   
   ## Why the test fails
   * The CrcUtilsTest creates two segments in the same way and wrapped them 
into two CrcUtil objects, which then compute the checksum of CRC and MD5 values 
based on the files in the segments.
   * The current implementation tried to make the CRC process deterministic, 
including the order of files. But the content of one file named 
`metadata.properties` is not deterministic. It may vary in different runs 
because of two non-deterministic data structures, HashMap. When NonDex shuffles 
any one of the HashMaps, the order of properties in the `metadata.properties` 
will be different for different runs, leading to different checksums and the 
test assertion failure.
   
   ## Fix
   * Locate the non-deterministic data structures that affect the order of 
properties (private field `_fieldSpecMap` in 
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java and private field 
`_indexCreationInfoMap` in 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java)
 and replace them with their deterministic alternative (LinkedHashMap or 
TreeMap).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to