[
https://issues.apache.org/jira/browse/PHOENIX-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656777#comment-17656777
]
ASF GitHub Bot commented on PHOENIX-6832:
-----------------------------------------
gjacoby126 commented on code in PR #1531:
URL: https://github.com/apache/phoenix/pull/1531#discussion_r1065081919
##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/UncoveredGlobalIndexRegionScannerIT.java:
##########
@@ -80,7 +93,39 @@ private void populateTable(String tableName) throws Exception {
     }
     @Test
-    public void testUncoveredIndexWithPhoenixRowTimestamp() throws Exception {
+    public void testDDL() throws Exception {
+        try (Connection conn = DriverManager.getConnection(getUrl())) {
+            String dataTableName = generateUniqueName();
+            String indexTableName = generateUniqueName();
+            conn.createStatement().execute("create table " + dataTableName +
+                    " (id varchar(10) not null primary key, val1 varchar(10), val2 varchar(10), val3 varchar(10))");
+            if (uncovered) {
+                // The INCLUDE clause should not be allowed
+                try {
+                    conn.createStatement().execute("CREATE UNCOVERED INDEX " + indexTableName
+                            + " on " + dataTableName + " (val1) INCLUDE (val2)");
+                    Assert.fail();
+                } catch (Exception e) {
+                    // Expected
+                }
+                // The LOCAL keyword should not be allowed with UNCOVERED
+                try {
+                    conn.createStatement().execute("CREATE UNCOVERED LOCAL INDEX " + indexTableName
Review Comment:
What's wrong with an UNCOVERED local? They were implicitly allowed even
before this JIRA, weren't they?
##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/UncoveredGlobalIndexRegionScannerIT.java:
##########
@@ -80,7 +93,39 @@ private void populateTable(String tableName) throws Exception {
     }
     @Test
-    public void testUncoveredIndexWithPhoenixRowTimestamp() throws Exception {
+    public void testDDL() throws Exception {
+        try (Connection conn = DriverManager.getConnection(getUrl())) {
+            String dataTableName = generateUniqueName();
+            String indexTableName = generateUniqueName();
+            conn.createStatement().execute("create table " + dataTableName +
+                    " (id varchar(10) not null primary key, val1 varchar(10), val2 varchar(10), val3 varchar(10))");
+            if (uncovered) {
+                // The INCLUDE clause should not be allowed
+                try {
+                    conn.createStatement().execute("CREATE UNCOVERED INDEX " + indexTableName
+                            + " on " + dataTableName + " (val1) INCLUDE (val2)");
+                    Assert.fail();
+                } catch (Exception e) {
Review Comment:
Shouldn't this be an SQLException in particular?
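For what it's worth, the narrower assertion could look like the sketch below. This is a standalone illustration, not Phoenix test code: `executeDdl`, the SQLState `"42000"`, and the error code `1221` are all made-up placeholders standing in for the real statement execution and whatever `SQLExceptionCode` the DDL check actually throws.

```java
import java.sql.SQLException;

public class ExpectSqlException {
    // Stand-in for conn.createStatement().execute(ddl); the SQLState and
    // vendor code here are invented placeholders, not real Phoenix values.
    static void executeDdl(String ddl) throws SQLException {
        if (ddl.contains("INCLUDE")) {
            throw new SQLException("INCLUDE is not allowed for UNCOVERED indexes", "42000", 1221);
        }
    }

    // Returns the error code of the expected rejection, or -1 if the DDL
    // unexpectedly succeeded.
    static int rejectedErrorCode(String ddl) {
        try {
            executeDdl(ddl);
            return -1;
        } catch (SQLException e) {
            return e.getErrorCode();
        }
    }

    public static void main(String[] args) {
        // Catching SQLException and checking its code is stricter than
        // catch (Exception e), which would also mask NPEs and similar bugs.
        int code = rejectedErrorCode("CREATE UNCOVERED INDEX IDX ON T (VAL1) INCLUDE (VAL2)");
        if (code != 1221) throw new AssertionError("unexpected code: " + code);
        System.out.println("rejected with SQLException code " + code);
    }
}
```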
##########
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java:
##########
@@ -807,14 +810,39 @@ private void handleLocalIndexUpdates(TableName table,
             miniBatchOp.addOperationsFromCP(0, localUpdates.toArray(new Mutation[localUpdates.size()]));
         }
     }
+
+    private boolean isPartialUncoveredIndexUpdate(PhoenixIndexMetaData indexMetaData,
+            MiniBatchOperationInProgress<Mutation> miniBatchOp) throws IOException {
+        int indexedColumnCount = 0;
+        for (IndexMaintainer indexMaintainer : indexMetaData.getIndexMaintainers()) {
+            indexedColumnCount += indexMaintainer.getIndexedColumns().size();
+        }
+        Set<ColumnReference> indexedColumns = new HashSet<ColumnReference>(indexedColumnCount);
+        for (IndexMaintainer indexMaintainer : indexMetaData.getIndexMaintainers()) {
+            indexedColumns.addAll(indexMaintainer.getIndexedColumns());
+        }
+        for (int i = 0; i < miniBatchOp.size(); i++) {
+            if (miniBatchOp.getOperationStatus(i) == IGNORE) {
+                continue;
+            }
+            Mutation m = miniBatchOp.getOperation(i);
+            if (!this.builder.isEnabled(m)) {
+                continue;
+            }
+            for (ColumnReference indexedColumn : indexedColumns) {
+                if (m.get(indexedColumn.getFamily(), indexedColumn.getQualifier()).isEmpty()) {
+                    return true;
Review Comment:
nit: a comment here explaining how an empty column qualifier means the index
update is partial
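The check is easier to see with the HBase types stripped away. Below is a self-contained sketch of the same idea, with plain Java collections standing in for `Mutation` and `ColumnReference` (all names invented for illustration): if any indexed column is missing from the mutation, the client sent a partial row, and the index row cannot be rebuilt from the mutation alone.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PartialUpdateCheck {
    // Toy stand-ins: a "mutation" is a map from "family:qualifier" to the
    // new cell value; indexedColumns is the set of columns the index needs.
    static boolean isPartialUpdate(Set<String> indexedColumns, Map<String, byte[]> mutation) {
        for (String col : indexedColumns) {
            // Mirrors m.get(family, qualifier).isEmpty() in the patch: an
            // indexed column absent from the mutation means the update is
            // partial with respect to the index.
            if (!mutation.containsKey(col)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(Arrays.asList("0:VAL1"));

        Map<String, byte[]> fullRow = new HashMap<>();
        fullRow.put("0:VAL1", "ab".getBytes());

        Map<String, byte[]> partialRow = new HashMap<>();
        partialRow.put("0:VAL2", "abc".getBytes()); // indexed column 0:VAL1 missing

        System.out.println(isPartialUpdate(indexed, fullRow));    // false
        System.out.println(isPartialUpdate(indexed, partialRow)); // true
    }
}
```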
##########
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UncoveredIndexRegionScanner.java:
##########
@@ -162,7 +174,7 @@ protected Scan prepareDataTableScan(Collection<byte[]> dataRowKeys) throws IOExc
         scanRanges.initializeScan(dataScan);
         SkipScanFilter skipScanFilter = scanRanges.getSkipScanFilter();
         dataScan.setFilter(new SkipScanFilter(skipScanFilter, false));
-        scan.setAttribute(BaseScannerRegionObserver.SERVER_PAGE_SIZE_MS,
+        dataScan.setAttribute(BaseScannerRegionObserver.SERVER_PAGE_SIZE_MS,
Review Comment:
Was this a pre-existing bug that we weren't paging properly in this case?
##########
phoenix-core/src/main/java/org/apache/phoenix/schema/PTable.java:
##########
@@ -104,8 +104,9 @@ public ViewType combine(ViewType otherType) {
}
     public enum IndexType {
-        GLOBAL((byte)1),
-        LOCAL((byte)2);
+        GLOBAL((byte)1),    // Covered Global
+        LOCAL((byte)2),     // Covered Local
+        UNCOVERED((byte)3); // Uncovered Global
Review Comment:
Since we can do uncovered queries on local indexes, would it be more clear
if the enum were UNCOVERED_GLOBAL (or GLOBAL_UNCOVERED)?
##########
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java:
##########
@@ -52,9 +52,7 @@
import org.apache.phoenix.schema.PIndexState;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.schema.PTableType;
-import org.apache.phoenix.util.EnvironmentEdgeManager;
-import org.apache.phoenix.util.MetaDataUtil;
-import org.apache.phoenix.util.PhoenixRuntime;
+import org.apache.phoenix.util.*;
Review Comment:
nit: avoid star imports
##########
phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java:
##########
@@ -1693,11 +1693,14 @@ private void filterIndexCheckerMutations(Map<TableInfo, List<Mutation>> mutation
                     continue;
                 }
                 if (m instanceof Delete) {
-                    Put put = new Put(m.getRow());
-                    put.addColumn(emptyCF, emptyCQ, IndexRegionObserver.getMaxTimestamp(m),
-                            QueryConstants.UNVERIFIED_BYTES);
-                    // The Delete gets marked as unverified in Phase 1 and gets deleted on Phase 3.
-                    addToMap(unverifiedIndexMutations, tableInfo, put);
+                    if (tableInfo.getOrigTableRef().getTable().getIndexType()
Review Comment:
nit: a quick comment on why the uncovered case doesn't need the Phase 1
Unverified flag nor the Phase 3 verified flag would be helpful (since otherwise
it's just mentioned in the JIRA)
##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/UncoveredGlobalIndexRegionScannerIT.java:
##########
@@ -80,7 +93,39 @@ private void populateTable(String tableName) throws Exception {
     }
     @Test
-    public void testUncoveredIndexWithPhoenixRowTimestamp() throws Exception {
+    public void testDDL() throws Exception {
+        try (Connection conn = DriverManager.getConnection(getUrl())) {
+            String dataTableName = generateUniqueName();
+            String indexTableName = generateUniqueName();
+            conn.createStatement().execute("create table " + dataTableName +
+                    " (id varchar(10) not null primary key, val1 varchar(10), val2 varchar(10), val3 varchar(10))");
+            if (uncovered) {
+                // The INCLUDE clause should not be allowed
+                try {
+                    conn.createStatement().execute("CREATE UNCOVERED INDEX " + indexTableName
+                            + " on " + dataTableName + " (val1) INCLUDE (val2)");
+                    Assert.fail();
+                } catch (Exception e) {
+                    // Expected
+                }
+                // The LOCAL keyword should not be allowed with UNCOVERED
+                try {
+                    conn.createStatement().execute("CREATE UNCOVERED LOCAL INDEX " + indexTableName
Review Comment:
This makes more sense when you know from the PTable.IndexType enum comments
that UNCOVERED really means "UNCOVERED GLOBAL", but that's not obvious from the
syntax.
##########
phoenix-core/src/main/java/org/apache/phoenix/index/IndexMaintainer.java:
##########
@@ -105,16 +107,7 @@
import org.apache.phoenix.schema.tuple.ValueGetterTuple;
import org.apache.phoenix.schema.types.PDataType;
import org.apache.phoenix.transaction.PhoenixTransactionProvider.Feature;
-import org.apache.phoenix.util.BitSet;
-import org.apache.phoenix.util.ByteUtil;
-import org.apache.phoenix.util.EncodedColumnsUtil;
-import org.apache.phoenix.util.ExpressionUtil;
-import org.apache.phoenix.util.IndexUtil;
-import org.apache.phoenix.util.MetaDataUtil;
-import org.apache.phoenix.util.PhoenixRuntime;
-import org.apache.phoenix.util.SchemaUtil;
-import org.apache.phoenix.util.TransactionUtil;
-import org.apache.phoenix.util.TrustedByteArrayOutputStream;
+import org.apache.phoenix.util.*;
Review Comment:
nit: avoid star imports
##########
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UncoveredIndexRegionScanner.java:
##########
@@ -244,20 +256,62 @@ protected boolean scanIndexTableRows(List<Cell> result,
         return scanIndexTableRows(result, startTime, null, 0);
     }
-    private boolean getNextCoveredIndexRow(List<Cell> result) {
+    private boolean verifyIndexRowAndRepairIfNecessary(Result dataRow, byte[] indexRowKey, long ts) throws IOException {
+        Put put = new Put(dataRow.getRow());
+        for (Cell cell : dataRow.rawCells()) {
+            put.add(cell);
+        }
+        if (indexMaintainer.checkIndexRow(indexRowKey, put)) {
+            if (IndexUtil.getMaxTimestamp(put) != ts) {
+                Mutation[] mutations;
+                Put indexPut = new Put(indexRowKey);
+                indexPut.addColumn(emptyCF, emptyCQ, ts, QueryConstants.VERIFIED_BYTES);
+                if ((EnvironmentEdgeManager.currentTimeMillis() - ts) > ageThreshold) {
+                    Delete indexDelete = indexMaintainer.buildRowDeleteMutation(indexRowKey,
Review Comment:
Not sure I follow the Delete logic here. The index row matches the data row,
but the index timestamp is older than the age threshold, so since we _should_
have deleted it earlier, we delete it anyway even though it's correct? Is this
because we assume it's left over from a previous, failed data table update that
was retried successfully later?
Or am I completely misunderstanding this?
##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/UncoveredGlobalIndexRegionScannerIT.java:
##########
@@ -369,18 +523,97 @@ public void testCount() throws Exception {
             ResultSet rs = conn.createStatement().executeQuery(selectSql);
             assertTrue(rs.next());
             long count = rs.getLong(1);
-            selectSql = "SELECT /*+ INDEX(" + dataTableName + " " + indexTableName
-                    + ")*/ Count(v3) from " + dataTableName + " where v1 = 5";
-            //Verify that we will read from the index table
+            selectSql = "SELECT"+ (uncovered ? " " : "/*+ INDEX(" + dataTableName + " " + indexTableName + ")*/ ")
+                    + "Count(v3) from " + dataTableName + " where v1 = 5";
+            // Verify that we will read from the index table
             assertExplainPlan(conn, selectSql, dataTableName, indexTableName);
             rs = conn.createStatement().executeQuery(selectSql);
             assertTrue(rs.next());
             assertEquals(count, rs.getInt(1));
         }
     }
+    @Test
+    public void testFailDataTableRowUpdate() throws Exception {
+        String dataTableName = generateUniqueName();
+        populateTable(dataTableName); // with two rows ('a', 'ab', 'abc', 'abcd') and ('b', 'bc', 'bcd', 'bcde')
+        try (Connection conn = DriverManager.getConnection(getUrl())) {
+            String indexTableName = generateUniqueName();
+            conn.createStatement().execute("CREATE " + (uncovered ? "UNCOVERED " : " ") + "INDEX "
+                    + indexTableName + " on " + dataTableName + " (val1)");
+            // Configure IndexRegionObserver to fail the data table update
+            // and check that this does not impact the correctness
+            IndexRegionObserver.setFailDataTableUpdatesForTesting(true);
+            conn.createStatement().execute("upsert into " + dataTableName + " (id, val2) values ('a', 'abcc')");
+            try {
+                conn.commit();
+                fail();
+            } catch (Exception e) {
+                // this is expected
+            }
+            IndexRegionObserver.setFailDataTableUpdatesForTesting(false);
+
+            conn.createStatement().execute("upsert into " + dataTableName + " (id, val3) values ('a', 'abcdd')");
+            conn.commit();
+            String selectSql = "SELECT"+ (uncovered ? " " : "/*+ INDEX(" + dataTableName + " "
+                    + indexTableName + ")*/ ") + "val2, val3 from " + dataTableName
+                    + " WHERE val1 = 'ab'";
+            // Verify that we will read from the first index table
+            assertExplainPlan(conn, selectSql, dataTableName, indexTableName);
+            ResultSet rs = conn.createStatement().executeQuery(selectSql);
+            assertTrue(rs.next());
+            assertEquals("abc", rs.getString(1));
+            assertEquals("abcdd", rs.getString(2));
+            assertFalse(rs.next());
+        }
+    }
+    @Test
+    public void testFailPostIndexDeleteUpdate() throws Exception {
+        String dataTableName = generateUniqueName();
+        populateTable(dataTableName); // with two rows ('a', 'ab', 'abc', 'abcd') and ('b', 'bc', 'bcd', 'bcde')
+        try (Connection conn = DriverManager.getConnection(getUrl())) {
+            String indexTableName = generateUniqueName();
+            conn.createStatement().execute("CREATE " + (uncovered ? "UNCOVERED " : " ") + "INDEX "
+                    + indexTableName + " on " + dataTableName + " (val1)");
+            String selectSql = "SELECT id from " + dataTableName + " WHERE val1 = 'ab'";
+
+            // Verify that we will read from the index table
+            assertExplainPlan(conn, selectSql, dataTableName, indexTableName);
+            ResultSet rs = conn.createStatement().executeQuery(selectSql);
+            assertTrue(rs.next());
+            assertEquals("a", rs.getString(1));
+            assertFalse(rs.next());
+            TestUtil.dumpTable(conn, TableName.valueOf(indexTableName));
Review Comment:
nit: are these TestUtil.dumpTable calls still necessary?
> Uncovered Global Secondary Indexes
> ----------------------------------
>
> Key: PHOENIX-6832
> URL: https://issues.apache.org/jira/browse/PHOENIX-6832
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Kadir Ozdemir
> Priority: Major
>
> An index can be called an uncovered index if the index cannot serve a query
> alone. The sole purpose of an uncovered index would be identifying the data
> table rows to be scanned for the query. This implies that the DDL for an
> uncovered index does not have the INCLUDE clause.
> Then an index is called a covered index if the index can serve a query alone.
> Please note that a covered index does not mean that it can cover all queries.
> It just means that it can cover a query. A covered index can still cover some
> queries even if the index DDL does not have the INCLUDE clause. This is
> because a given query may reference only PK and/or indexed columns, and thus
> a covered index without any included columns can serve this query by itself
> (i.e., without joining index rows with data table rows). Another use case
> for covered indexes without included columns is count(*) queries.
> Currently, Phoenix uses indexes for count(*) queries by default.
> Since uncovered indexes will be used to identify data table rows affected by
> a given query and the column values will be picked up from the data table, we
> can provide a solution that is much simpler than the solution for covered
> indexes by taking advantage of the fact that the data table is the source
> of truth, and an index table is used to only map secondary keys to the
> primary keys to eliminate full table scans. The correctness of such a
> solution is ensured if for every data table row, there exists an index row.
> Then our solution to update the data tables and their indexes in a consistent
> fashion for global secondary indexes would be a two-phase update approach,
> where we first insert the index table rows, and only if they are successful,
> then we update the data table rows.
> This approach does not require reading the existing data table rows, which is
> currently required for covered indexes. Also, it does not require two-phase
> commit writes for updating and maintaining global secondary index table rows.
> Eliminating a data table read operation and an RPC call to update the index
> row verification status on the corresponding index row would cut down index
> write latency overhead by at least 50% for global uncovered indexes when
> compared to global covered indexes. This is because global covered indexes
> require one data table read and two index write operations for every data
> table update whereas global uncovered indexes would require only one index
> write. For batch writes, the expected performance and latency improvement
> would be much higher than 50%, since a batch of random row updates would no
> longer require random seeks on the data table for reading existing data
> table rows.
> PHOENIX-6458, PHOENIX-6501 and PHOENIX-6663 improve the performance and
> efficiency of joining index rows with their data table rows when a covered
> index cannot cover a given query. We can further leverage it to support
> uncovered indexes.
> Uncovered indexes would be a significant performance improvement for
> write-intensive workloads. Also, a common use case where uncovered indexes
> will be desired is the upsert select use case on the data table, where a
> subset of rows are updated in a batch. In this use case, the select query
> performance is greatly improved via a covered index, but the upsert part
> suffers due to the covered index write overhead, especially when the selected
> data table rows are not stored consecutively on disk, which is the most common
> case.
> As mentioned before, the DDL for index creation does not include the INCLUDE
> clause. We can add the UNCOVERED keyword to indicate the index to be created
> is an uncovered index, for example, CREATE UNCOVERED INDEX.
> As in the case of covered indexes, we can do read repair for uncovered
> indexes too. The difference is that instead of using the verify status for
> index rows, we would check if the corresponding data table row exists for a
> given index row. Since we would always retrieve the data table rows to join
> back with index rows for uncovered indexes, the read repair cost would occur
> only for deleting invalid index rows. Also, the existing index reverse
> verification and repair feature supported by IndexTool can be used to do bulk
> repair operations from time to time.
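The write order and read repair described above can be sketched with a toy in-memory model. All class, field, and method names below are invented for illustration (real Phoenix writes go through HBase mutations and coprocessors); the point is only the ordering: index row first, data row second, and an index row without a data row is an orphan that the read path deletes.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoPhaseUncoveredWrite {
    // Toy key-value "tables" standing in for HBase tables.
    final Map<String, String> dataTable = new HashMap<>();  // pk -> row
    final Map<String, String> indexTable = new HashMap<>(); // secondary key -> pk
    boolean failDataWrite = false; // simulates a failed data table update

    // Phase 1: write the index row. Phase 2: write the data row only if
    // phase 1 succeeded. The invariant "every data row has an index row"
    // holds even when the data write fails; the orphan index row is later
    // removed by read repair.
    boolean upsert(String pk, String secondaryKey, String row) {
        indexTable.put(secondaryKey, pk); // phase 1
        if (failDataWrite) {
            return false;                 // data write failed: index row is now an orphan
        }
        dataTable.put(pk, row);           // phase 2
        return true;
    }

    // Read repair on the uncovered-index read path: an index row whose
    // data table row is missing is invalid and gets deleted.
    String readViaIndex(String secondaryKey) {
        String pk = indexTable.get(secondaryKey);
        if (pk == null) return null;
        String row = dataTable.get(pk);
        if (row == null) {
            indexTable.remove(secondaryKey); // repair: drop the orphan index row
            return null;
        }
        return row;
    }
}
```

In this model a failed data table write leaves an orphan index row behind, and the next read through the index both returns the correct (absent) result and cleans the orphan up, matching the repair behavior the description attributes to uncovered indexes.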
--
This message was sent by Atlassian Jira
(v8.20.10#820010)