yesh385 opened a new pull request, #4900:
URL: https://github.com/apache/hive/pull/4900

   This PR fixes 8 flaky tests in `TestHBaseSerDe`, which can be found 
[here](https://github.com/apache/hive/blob/master/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java).
   
   
   1. How was this test identified as flaky?
   These tests were identified as flaky using an open-source research tool 
named [NonDex](https://github.com/TestingResearchIllinois/NonDex) which is 
responsible for finding and diagnosing non-deterministic runtime exceptions in 
Java programs.
   
   2. What do the tests do?
   - `testHBaseSerDeWithTimestamp`
   Tests the serialization and deserialization of data with timestamps. It 
involves creating a test scenario with specific column families, qualifiers, 
and data types, then sorting and comparing the results. The test checks if the 
serialized and deserialized data matches the expected fields data.
   - `testHBaseSerDeWithColumnPrefixes`
   Focuses on serialization and deserialization with column prefixes. It sets 
up a test scenario with specific column families, qualifiers, and data, then 
checks if the serialized and deserialized data matches the expected fields 
data. The test also verifies the handling of unwanted columns and ensures that 
the column prefixes are appropriately considered in the process.
   - `testHBaseSerDeCompositeKeyWithoutSeparator`
   Focuses on serialization and deserialization of data with a composite key 
that lacks separators. It sets up a scenario with a composite key, a specific 
column family, qualifier, and test data. The test checks if the serialized and 
deserialized data match the expected fields, taking into account the absence of 
separators in the composite key. 
   - `testHBaseSerDeCustomStructValue`
   Focuses on the serialization and deserialization of data with a custom 
struct value. It sets up a scenario with a specific column family, qualifier, 
and test data represented by a custom struct `TestStruct`. The test checks if 
the serialized and deserialized data match the expected fields, taking into 
account automatic insertion of separators between different fields in the 
struct during serialization.
   - `testHBaseSerDeII`
   Focuses on the serialization and deserialization of data with various data 
types and values. It sets up a test scenario with specific column families, 
qualifiers, and test data, then checks if the serialized and deserialized data 
match the expected fields data. The test covers a range of data types including 
byte, short, int, long, float, double, string, and boolean.
   - `testHBaseSerDeCompositeKeyWithSeparator`
   Focuses on the serialization and deserialization of data with a composite 
key that includes separators. It sets up a scenario with a specific column 
family, qualifier, and test data represented by a custom struct `TestStruct`. 
The test checks if the serialized and deserialized data match the expected 
fields, considering the automatic insertion of separators between different 
fields in the struct during serialization.
   - `testHBaseSerDeI`
   Focuses on the serialization and deserialization of data with various data 
types and values. It sets up a test scenario with specific column families, 
qualifiers, and test data, then checks if the serialized and deserialized data 
match the expected fields data. The test covers a range of data types, 
including byte, short, int, long, float, double, string, and boolean. The 
scenario includes different configurations, verifying the SerDe functionality 
under various property settings.
   - `testHBaseSerDeWithHiveMapToHBaseColumnFamilyII`
   Focuses on mapping Hive columns to HBase column families. It sets up a test 
scenario with specific HBase column families, qualifiers, and test data. The 
test checks if the serialized and deserialized data match the expected fields 
data and if the Hive columns are correctly mapped to the specified HBase column 
families. 
   
   3. Why do the tests fail?
   All of the above tests fail because they compare the string forms of two 
`Put` objects, i.e. `p.toString()` and `put.toString()`. The fields of the two 
`Put` objects can appear in a different order in the strings returned by 
`toString()`, so the assertions fail even when both objects hold the same data.
   
   The order of the fields differs because the `toString()` method of `Put` 
builds a `Map<String, Object>`, which is then converted to a string using a 
JSONMapper. This `Map<String, Object>` does not guarantee the same order of 
the fields on every run, which causes the assertions to fail.
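
The effect can be reproduced without HBase. The sketch below is only an illustration, not code from this PR: the class `MapOrderDemo`, the helper `fingerprint`, and the key names are hypothetical, and `LinkedHashMap` is used purely to force two different iteration orders, in the way NonDex varies the order of collections whose iteration order is unspecified.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: two maps with identical entries but a different
// iteration order produce different strings, yet are equal as maps.
class MapOrderDemo {
    // Builds a map whose iteration order follows the order of the given keys.
    static Map<String, Object> fingerprint(String... keysInOrder) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (String k : keysInOrder) {
            m.put(k, "value-of-" + k);
        }
        return m;
    }

    public static void main(String[] args) {
        // Same entries, different insertion (and therefore iteration) order.
        Map<String, Object> a = fingerprint("row", "families", "totalColumns");
        Map<String, Object> b = fingerprint("totalColumns", "row", "families");

        // String comparison depends on iteration order.
        System.out.println(a.toString().equals(b.toString())); // prints false
        // Map equality compares entries and ignores iteration order.
        System.out.println(a.equals(b)); // prints true
    }
}
```

An assertion on the string forms would fail here even though both maps describe the same data, which is the kind of failure NonDex surfaces.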
   
   For example, in the test `testHBaseSerDeCompositeKeyWithoutSeparator`, the 
assertion that causes the test to fail is shown below:
   
https://github.com/apache/hive/blob/9546c10a748630ac5cf39d935a90a97446b93be8/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java#L1052
   
   4. How did I fix these tests?
   
   This PR fixes the above tests by comparing the individual fields of the 
`Put` object instead of the strings of the `Put` objects.
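
A field-wise comparison might look roughly like the sketch below. It is a simplified stand-in and not the code in this PR: `SimplePut`, `add`, and `sameFields` are hypothetical names, and a real `Put` holds cells per column family rather than a flat column-to-value map.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of comparing fields instead of string representations.
class FieldWiseCompareDemo {
    // Simplified stand-in for an HBase Put: a row key plus column -> value cells.
    static final class SimplePut {
        final byte[] row;
        final Map<String, byte[]> cells = new HashMap<>();

        SimplePut(String row) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
        }

        SimplePut add(String column, String value) {
            cells.put(column, value.getBytes(StandardCharsets.UTF_8));
            return this;
        }
    }

    // Compares row keys and cell contents; never relies on iteration order.
    static boolean sameFields(SimplePut expected, SimplePut actual) {
        if (!Arrays.equals(expected.row, actual.row)) {
            return false;
        }
        if (!expected.cells.keySet().equals(actual.cells.keySet())) {
            return false;
        }
        for (Map.Entry<String, byte[]> e : expected.cells.entrySet()) {
            // byte[] has identity-based equals, so compare contents explicitly.
            if (!Arrays.equals(e.getValue(), actual.cells.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimplePut p = new SimplePut("row1").add("cfa:a", "1").add("cfb:b", "2");
        SimplePut put = new SimplePut("row1").add("cfb:b", "2").add("cfa:a", "1");
        System.out.println(sameFields(p, put)); // prints true
    }
}
```

Because each field is looked up by key, the result does not change when the underlying map iterates in a different order.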
   
   You can run the following commands to run the tests using the NonDex tool:
   
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp
   ```
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes
   ```
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator
   ```
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue
   ```
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII
   ```
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator
   ```
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI
   ```
   ```
   mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
   ```
   
   (Optional) You can also run the tests without NonDex using the following commands:
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithTimestamp
   ```
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithColumnPrefixes
   ```
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithoutSeparator
   ```
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCustomStructValue
   ```
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeII
   ```
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeCompositeKeyWithSeparator
   ```
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeI
   ```
   ```
   mvn test -pl hbase-handler -Dtest=org.apache.hadoop.hive.hbase.TestHBaseSerDe#testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
   ```

   Test Environment:
   ```
   java version "1.8.0_202"
   Apache Maven 3.6.3
   ```
   
   Kindly let me know if this fix is acceptable.
   
   Thank you



