yesh385 opened a new pull request, #4900: URL: https://github.com/apache/hive/pull/4900
This PR fixes 8 flaky tests in [`TestHBaseSerDe`](https://github.com/apache/hive/blob/master/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java).

1. How were these tests identified as flaky?

These tests were identified as flaky using [NonDex](https://github.com/TestingResearchIllinois/NonDex), an open-source research tool that detects tests which wrongly rely on behavior that Java API specifications leave undetermined (for example, `HashMap` iteration order) by randomizing that behavior across runs.

2. What do the tests do?

- `testHBaseSerDeWithTimestamp`: Tests serialization and deserialization of data with timestamps. It sets up a scenario with specific column families, qualifiers, and data types, then sorts and compares the results, checking that the serialized and deserialized data match the expected field data.
- `testHBaseSerDeWithColumnPrefixes`: Tests serialization and deserialization with column prefixes. It sets up a scenario with specific column families, qualifiers, and data, then checks that the serialized and deserialized data match the expected field data. It also verifies that unwanted columns are handled correctly and that column prefixes are taken into account.
- `testHBaseSerDeCompositeKeyWithoutSeparator`: Tests serialization and deserialization of data with a composite key that has no separators. It sets up a scenario with a composite key, a specific column family and qualifier, and test data, then checks that the serialized and deserialized data match the expected fields, given that the composite key has no separators.
- `testHBaseSerDeCustomStructValue`: Tests serialization and deserialization of data with a custom struct value. It sets up a scenario with a specific column family and qualifier, and test data represented by the custom struct `TestStruct`.
The test checks that the serialized and deserialized data match the expected fields, accounting for the separators that are automatically inserted between struct fields during serialization.
- `testHBaseSerDeII`: Tests serialization and deserialization of data with various data types and values. It sets up a scenario with specific column families, qualifiers, and test data, then checks that the serialized and deserialized data match the expected field data. It covers byte, short, int, long, float, double, string, and boolean values.
- `testHBaseSerDeCompositeKeyWithSeparator`: Tests serialization and deserialization of data with a composite key that includes separators. It sets up a scenario with a specific column family and qualifier, and test data represented by the custom struct `TestStruct`, then checks that the serialized and deserialized data match the expected fields, accounting for the separators that are automatically inserted between struct fields during serialization.
- `testHBaseSerDeI`: Tests serialization and deserialization of data with various data types and values, covering byte, short, int, long, float, double, string, and boolean. It sets up a scenario with specific column families, qualifiers, and test data, then checks that the serialized and deserialized data match the expected field data under several different property settings.
- `testHBaseSerDeWithHiveMapToHBaseColumnFamilyII`: Tests mapping Hive columns to HBase column families. It sets up a scenario with specific HBase column families, qualifiers, and test data, then checks that the serialized and deserialized data match the expected field data and that the Hive columns are mapped to the specified HBase column families.

3. Why do the tests fail?

All of the above tests can fail because they compare the string forms of two `Put` objects, `p.toString()` and `put.toString()`. The `toString()` method of `Put` builds a `Map<String, Object>` of the object's fields and converts it to a JSON string. The iteration order of that map is not guaranteed, so the fields can appear in a different order in the two strings even when both `Put` objects hold the same data, and the assertion fails.

For example, the assertion that causes `testHBaseSerDeCompositeKeyWithoutSeparator` to fail is here:
https://github.com/apache/hive/blob/9546c10a748630ac5cf39d935a90a97446b93be8/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java#L1052

4. How were the tests fixed?

This PR compares the individual fields of the `Put` objects instead of their string representations, so the assertions no longer depend on the order in which the fields are printed.
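The failure mode and the fix described above can be modeled in plain Java. This is a minimal sketch, not the code in this PR: `FakePut`, `sameByString`, and `sameByFields` are hypothetical names, and `FakePut` is not HBase's `Put`; it only models a `Put`-like object (a `byte[]` row key plus a map of fields) whose string form follows the map's iteration order. The sketch fills two `LinkedHashMap`s in different orders so the order difference is deterministic, whereas NonDex provokes it by randomizing `HashMap` iteration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class OrderSensitiveComparisonDemo {

    // Hypothetical stand-in for a Put-like object (NOT HBase's Put):
    // a row key plus a map of fields whose iteration order is not part of its contract.
    static final class FakePut {
        final byte[] row;
        final Map<String, Object> fields;

        FakePut(byte[] row, Map<String, Object> fields) {
            this.row = row;
            this.fields = fields;
        }

        @Override
        public String toString() {
            // Like dumping a map to JSON: the output order follows the map's iteration order.
            return "row=" + Arrays.toString(row) + " " + fields;
        }
    }

    // Fragile: passes only if both maps happen to iterate in the same order.
    static boolean sameByString(FakePut a, FakePut b) {
        return a.toString().equals(b.toString());
    }

    // Robust: compares the row bytes and the field maps directly.
    // Map.equals compares entries, not iteration order.
    static boolean sameByFields(FakePut a, FakePut b) {
        return Arrays.equals(a.row, b.row) && Objects.equals(a.fields, b.fields);
    }

    public static void main(String[] args) {
        byte[] row = {1, 2, 3};

        // Same entries inserted in different orders, so the maps iterate differently.
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("family", "cola");
        first.put("qualifier", "a");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("qualifier", "a");
        second.put("family", "cola");

        FakePut p = new FakePut(row, first);
        FakePut put = new FakePut(row.clone(), second);

        System.out.println("by toString(): " + sameByString(p, put)); // false
        System.out.println("by fields:     " + sameByFields(p, put)); // true
    }
}
```

Running `main` prints `false` for the string comparison and `true` for the field comparison. Because `Map.equals` is defined over entries rather than iteration order, comparing the fields themselves stays stable where comparing `toString()` output does not.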

To run the tests with NonDex:

```
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI
mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
```

(Optional) To run the tests without NonDex:

```
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI
mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
```

Test environment:

```
java version "1.8.0_202"
Apache Maven 3.6.3
```

Kindly let me know if this fix is acceptable. Thank you!