claudeyj opened a new pull request, #10445: URL: https://github.com/apache/pinot/pull/10445
## What is the purpose of this PR * This PR replaces two `HashMap` with `LinkedHashMap` to avoid non-determinism of `HashMap`. * Due to the non-determinism of the order of `HashMap`, the content of produced file in the segment named `metadata.properties` will be non-deterministic. This file will be encoded to the checksum of CRC verification. So the non-determinism of this file could lead to failure of verification (different checksum). Securing determinism of two `HashMap` by replacing them with `LinkedHashMap` can remove the flakiness. ## Reproduce the test failure * Run [NonDex](https://github.com/TestingResearchIllinois/NonDex) (a flaky test detection tool that can shuffle the order of data structures like HashMap, HashSet) on CrcUtilsTest: `mvn clean install -DskipTests -pl pinot-core -am` `mvn -pl pinot-core edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=org.apache.pinot.core.util.CrcUtilsTest#test1` ## Expected result * The CrcUtilsTest successsfully pass without shuffling and can still pass with NonDex shuffling ## Actual result * The CrcUtilsTest successsfully pass without shuffling but cannot pass with NonDex shuffling ## Why the test fails * The CrcUtilsTest creates two segments in the same way and wrapped them into two CrcUtil objects, which then compute the checksum of CRC and MD5 values based on the files in the segments. * The current implementation tried to make the CRC process deterministic, including the order of files. But the content of one file named `metadata.properties` is not deterministic. It may vary in different runs because of two non-deterministic data structures, HashMap. When NonDex shuffles any one of the HashMaps, the order of properties in the `metadata.properties` will be different for different runs, leading to different checksums and the test assertion failure. ## Fix * Locate the non-deterministic data structures that affect the order of properties (private field `_fieldSpecMap` in pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java and private field `_indexCreationInfoMap` in pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java) and replace them with their deterministic alternative (LinkedHashMap or TreeMap). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
