[jira] [Created] (PHOENIX-5265) Phoenix Test should use gold files for result comparison instead of using hard-corded comparison.

Bin Shi (JIRA) Tue, 30 Apr 2019 12:19:05 -0700

Bin Shi created PHOENIX-5265:
--------------------------------

             Summary: Phoenix Test should use gold files for result comparison 
instead of using hard-corded comparison.
                 Key: PHOENIX-5265
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5265
             Project: Phoenix
          Issue Type: Improvement
         Environment: {code:java}
// code placeholder
@Test
public void testWithMultiCF() throws Exception {
    int nRows = 20;
    Connection conn = getConnection(0);
    PreparedStatement stmt;
    conn.createStatement().execute(
            "CREATE TABLE " + fullTableName
                    + "(k VARCHAR PRIMARY KEY, a.v INTEGER, b.v INTEGER, c.v 
INTEGER NULL, d.v INTEGER NULL) "
                    + tableDDLOptions );
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + " VALUES(?,?, 
?, ?, ?)");
    byte[] val = new byte[250];
    for (int i = 0; i < nRows; i++) {
        stmt.setString(1, Character.toString((char)('a' + i)) + 
Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.setInt(4, i);
        stmt.setInt(5, i);
        stmt.executeUpdate();
    }
    conn.commit();
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + "(k, c.v, 
d.v) VALUES(?,?,?)");
    for (int i = 0; i < 5; i++) {
        stmt.setString(1, Character.toString((char)('a' + 'z' + i)) + 
Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.executeUpdate();
    }
    conn.commit();


    ResultSet rs;
    String actualExplainPlan;
    collectStatistics(conn, fullTableName);
    List<KeyRange> keyRanges = getAllSplits(conn, fullTableName);
    assertEquals(26, keyRanges.size());
    rs = conn.createStatement().executeQuery("EXPLAIN SELECT * FROM " + 
fullTableName);
    actualExplainPlan = QueryUtil.getExplainPlan(rs);
    assertEquals(
            "CLIENT 26-CHUNK 25 ROWS " + (columnEncoded ? ( mutable ? "12530" : 
"14190" ) : 
(TransactionFactory.Provider.OMID.name().equals(transactionProvider)) ? "25320" 
: "12420") +
                    " BYTES PARALLEL 1-WAY FULL SCAN OVER " + physicalTableName,
            actualExplainPlan);

    ConnectionQueryServices services = 
conn.unwrap(PhoenixConnection.class).getQueryServices();
    List<HRegionLocation> regions = 
services.getAllTableRegions(Bytes.toBytes(physicalTableName));
    assertEquals(1, regions.size());

    collectStatistics(conn, fullTableName, Long.toString(1000));
    keyRanges = getAllSplits(conn, fullTableName);
    boolean oneCellPerColFamilyStorageScheme = !mutable && columnEncoded;
    boolean hasShadowCells = 
TransactionFactory.Provider.OMID.name().equals(transactionProvider);
    assertEquals(oneCellPerColFamilyStorageScheme ? 14 : hasShadowCells ? 24 : 
13, keyRanges.size());

    rs = conn
            .createStatement()
            .executeQuery(
                    "SELECT 
COLUMN_FAMILY,SUM(GUIDE_POSTS_ROW_COUNT),SUM(GUIDE_POSTS_WIDTH),COUNT(*) from 
\"SYSTEM\".STATS where PHYSICAL_NAME = '"
                            + physicalTableName + "' GROUP BY COLUMN_FAMILY 
ORDER BY COLUMN_FAMILY");

    assertTrue(rs.next());
    assertEquals("A", rs.getString(1));
    assertEquals(25, rs.getInt(2));
    assertEquals(columnEncoded ? ( mutable ? 12530 : 14190 ) : hasShadowCells ? 
25320 : 12420, rs.getInt(3));
    assertEquals(oneCellPerColFamilyStorageScheme ? 13 : hasShadowCells ? 23 : 
12, rs.getInt(4));

    assertTrue(rs.next());
    assertEquals("B", rs.getString(1));
    assertEquals(oneCellPerColFamilyStorageScheme ? 25 : 20, rs.getInt(2));
    assertEquals(columnEncoded ? ( mutable ? 5600 : 7260 ) : hasShadowCells ? 
11260 : 5540, rs.getInt(3));
    assertEquals(oneCellPerColFamilyStorageScheme ? 7 : hasShadowCells ? 10 : 
5, rs.getInt(4));

    assertTrue(rs.next());
    assertEquals("C", rs.getString(1));
    assertEquals(25, rs.getInt(2));
    assertEquals(columnEncoded ? ( mutable ? 7005 : 7280 ) : hasShadowCells ? 
14085 : 6930, rs.getInt(3));
    assertEquals(hasShadowCells ? 13 : 7, rs.getInt(4));

    assertTrue(rs.next());
    assertEquals("D", rs.getString(1));
    assertEquals(25, rs.getInt(2));
    assertEquals(columnEncoded ? ( mutable ? 7005 : 7280 ) : hasShadowCells ? 
14085 : 6930, rs.getInt(3));
    assertEquals(hasShadowCells ? 13 : 7, rs.getInt(4));

    assertFalse(rs.next());

    // Disable stats
    conn.createStatement().execute("ALTER TABLE " + fullTableName + 
            " SET " + PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH + "=0");
    collectStatistics(conn, fullTableName);
    // Assert that there are no more guideposts
    rs = conn.createStatement().executeQuery("SELECT count(1) FROM " + 
PhoenixDatabaseMetaData.SYSTEM_STATS_NAME + 
            " WHERE " + PhoenixDatabaseMetaData.PHYSICAL_NAME + "='" + 
physicalTableName + "' AND " + PhoenixDatabaseMetaData.COLUMN_FAMILY + " IS NOT 
NULL");
    assertTrue(rs.next());
    assertEquals(0, rs.getLong(1));
    assertFalse(rs.next());
    rs = conn.createStatement().executeQuery("EXPLAIN SELECT * FROM " + 
fullTableName);
    actualExplainPlan = QueryUtil.getExplainPlan(rs);
    assertEquals("CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER " + 
physicalTableName, actualExplainPlan);
}

{code}
            Reporter: Bin Shi


Currently, in Phoenix Test, the comparison of the returned result and the 
expected result is always hard-coded, which is especially widely used by E2E 
and Integration test for comparing the query result, including the result of 
EXPLAIN query plan. This has significantly impaired the productivity and 
efficiency of workin on Phoenix.

The right approach is, each test case should write the generated result to a 
file under a new directory, then compares this file with a gold file under 
"gold" directory. After we make some code changes and make sure the change is 
correct by manually verifying several specific test cases, we can safely 
replace the whole "gold" directory with the directory which contains all new 
generated test files during test running. In this way, we can easily rebase all 
the tests.

For example, in BaseStatsCollectorIT.java, we verify the estimated row size in 
the returned result of EXPLAIN query plan, the row size is decided by many 
factors, like the column is encoded or not, it is mutable or not, it uses 
transaction provider or not, it uses TEPHRA or OMID as transaction provider, 
etc. The code snippet "testWithMultiCF" shows a typical test case. The 
comparisons of the result result and the expected result are hard-coded in 
those asserts. Now imagine, if we change the way collecting stats or we change 
column encoding scheme or we change the cell storage format for TEPHRA or OMID, 
which is very likely to happen, then we need to manually change all those 
hard-coded comparison everywhere, and it isn't trivial to re-calculate all 
expected row sizes with the different conditions . Today you might need one or 
two weeks to rebase all the tests, which is a huge waste. We should use "gold" 
files here, so that we can rebase the test very easily.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (PHOENIX-5265) Phoenix Test should use gold files for result comparison instead of using hard-corded comparison.

Reply via email to