[
https://issues.apache.org/jira/browse/PHOENIX-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677919#comment-17677919
]
ASF GitHub Bot commented on PHOENIX-6832:
-----------------------------------------
kadirozde commented on code in PR #1531:
URL: https://github.com/apache/phoenix/pull/1531#discussion_r1072709179
##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/UncoveredGlobalIndexRegionScannerIT.java:
##########
@@ -369,18 +523,97 @@ public void testCount() throws Exception {
ResultSet rs = conn.createStatement().executeQuery(selectSql);
assertTrue(rs.next());
long count = rs.getLong(1);
- selectSql = "SELECT /*+ INDEX(" + dataTableName + " " +
indexTableName
- + ")*/ Count(v3) from " + dataTableName + " where v1 = 5";
- //Verify that we will read from the index table
+ selectSql = "SELECT"+ (uncovered ? " " : "/*+ INDEX(" +
dataTableName + " " + indexTableName + ")*/ ")
+ + "Count(v3) from " + dataTableName + " where v1 = 5";
+ // Verify that we will read from the index table
assertExplainPlan(conn, selectSql, dataTableName, indexTableName);
rs = conn.createStatement().executeQuery(selectSql);
assertTrue(rs.next());
assertEquals(count, rs.getInt(1));
}
}
+ @Test
+ public void testFailDataTableRowUpdate() throws Exception {
+ String dataTableName = generateUniqueName();
+ populateTable(dataTableName); // with two rows ('a', 'ab', 'abc',
'abcd') and ('b', 'bc', 'bcd', 'bcde')
+ try (Connection conn = DriverManager.getConnection(getUrl())) {
+ String indexTableName = generateUniqueName();
+ conn.createStatement().execute("CREATE " + (uncovered ? "UNCOVERED
" : " ") + "INDEX "
+ + indexTableName + " on " + dataTableName + " (val1)");
+ // Configure IndexRegionObserver to fail the data table update
+ // and check that this does not impact the correctness
+ IndexRegionObserver.setFailDataTableUpdatesForTesting(true);
+ conn.createStatement().execute("upsert into " + dataTableName + "
(id, val2) values ('a', 'abcc')");
+ try {
+ conn.commit();
+ fail();
+ } catch (Exception e) {
+ // this is expected
+ }
+ IndexRegionObserver.setFailDataTableUpdatesForTesting(false);
+
+ conn.createStatement().execute("upsert into " + dataTableName + "
(id, val3) values ('a', 'abcdd')");
+ conn.commit();
+ String selectSql = "SELECT"+ (uncovered ? " " : "/*+ INDEX(" +
dataTableName + " "
+ + indexTableName + ")*/ ") + "val2, val3 from " +
dataTableName
+ + " WHERE val1 = 'ab'";
+ // Verify that we will read from the first index table
+ assertExplainPlan(conn, selectSql, dataTableName, indexTableName);
+ ResultSet rs = conn.createStatement().executeQuery(selectSql);
+ assertTrue(rs.next());
+ assertEquals("abc", rs.getString(1));
+ assertEquals("abcdd", rs.getString(2));
+ assertFalse(rs.next());
+ }
+ }
+ @Test
+ public void testFailPostIndexDeleteUpdate() throws Exception {
+ String dataTableName = generateUniqueName();
+ populateTable(dataTableName); // with two rows ('a', 'ab', 'abc',
'abcd') and ('b', 'bc', 'bcd', 'bcde')
+ try (Connection conn = DriverManager.getConnection(getUrl())) {
+ String indexTableName = generateUniqueName();
+ conn.createStatement().execute("CREATE " + (uncovered ? "UNCOVERED
" : " ") + "INDEX "
+ + indexTableName + " on " + dataTableName + " (val1)");
+ String selectSql = "SELECT id from " + dataTableName + " WHERE
val1 = 'ab'";
+
+ // Verify that we will read from the index table
+ assertExplainPlan(conn, selectSql, dataTableName, indexTableName);
+ ResultSet rs = conn.createStatement().executeQuery(selectSql);
+ assertTrue(rs.next());
+ assertEquals("a", rs.getString(1));
+ assertFalse(rs.next());
+ TestUtil.dumpTable(conn, TableName.valueOf(indexTableName));
Review Comment:
Good catch. Those were added for debugging. I will remove them.
> Uncovered Global Secondary Indexes
> ----------------------------------
>
> Key: PHOENIX-6832
> URL: https://issues.apache.org/jira/browse/PHOENIX-6832
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Kadir Ozdemir
> Priority: Major
>
> An index can be called an uncovered index if the index cannot serve a query
> alone. The sole purpose of an uncovered index would be identifying the data
> table rows to be scanned for the query. This implies that the DDL for an
> uncovered index does not have the INCLUDE clause.
> Then an index is called a covered index if the index can serve a query alone.
> Please note that a covered index does not mean that it can cover all queries.
> It just means that it can cover a query. A covered index can still cover some
> queries even if the index DDL does not have the INCLUDE clause. This is
> because a given query may reference only PK and/or indexed columns, and thus
> a covered index without any included columns can serve this query by itself
> (i.e., without joining index rows with data table rows). Another use case
> for covered indexes without included columns is the count(*) queries.
> Currently Phoenix uses indexes for count(*) queries by default.
> Since uncovered indexes will be used to identify data table rows affected by
> a given query and the column values will be picked up from the data table, we
> can provide a solution that is much simpler than the solution for covered
> indexes by taking the advantage of the fact that the data table is the source
> of truth, and an index table is used to only map secondary keys to the
> primary keys to eliminate full table scans. The correctness of such a
> solution is ensured if for every data table row, there exists an index row.
> Then our solution to update the data tables and their indexes in a consistent
> fashion for global secondary indexes would be a two-phase update approach,
> where we first insert the index table rows, and only if they are successful,
> then we update the data table rows.
> This approach does not require reading the existing data table rows which is
> currently required for covered indexes. Also, it does not require two-phase
> commit writes for updating and maintaining global secondary index table rows.
> Eliminating a data table read operation and an RPC call to update the index
> row verification status on the corresponding index row would cut down index
> write latency overhead by at least 50% for global uncovered indexes when
> compared to global covered indexes. This is because global covered indexes
> require one data table read and two index write operations for every data
> table update whereas global uncovered indexes would require only one index
> write. For batch writes, the expected performance and latency improvement
> would be much higher than 50% since a batch of random row updates would not
> anymore require random seeks on the data table for reading existing data
> table rows.
> PHOENIX-6458, PHOENIX-6501 and PHOENIX-6663 improve the performance and
> efficiency of joining index rows with their data table rows when a covered
> index cannot cover a given query. We can further leverage it to support
> uncovered indexes.
> The uncovered indexes would be a significant performance improvement for
> write intensive workloads. Also a common use case where uncovered indexes
> will be desired is the upsert select use case on the data table, where a
> subset of rows are updated in a batch. In this use case, the select query
> performance is greatly improved via a covered index but the upsert part
> suffers due to the covered index write overhead especially when the selected
> data table rows are not consecutively stored on disk which is the most common
> case.
> As mentioned before, the DDL for index creation does not include the INCLUDE
> clause. We can add the UNCOVERED keyword to indicate the index to be created
> is an uncovered index, for example, CREATE UNCOVERED INDEX.
> As in the case of covered indexes, we can do read repair for uncovered
> indexes too. The difference is that instead of using the verify status for
> index rows, we would check if the corresponding data table row exists for a
> given index row. Since we would always retrieve the data table rows to join
> back with index rows for uncovered indexes, the read repair cost would occur
> only for deleting invalid index rows. Also, the existing index reverse
> verification and repair feature supported by IndexTool can be used to do bulk
> repair operations from time to time.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)