[jira] [Commented] (PHOENIX-1812) Only sync table metadata when necessary

James Taylor (JIRA) Tue, 18 Aug 2015 17:07:17 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702197#comment-14702197
 ]


James Taylor commented on PHOENIX-1812:
---------------------------------------

Thanks for the revisions, [~tdsilva]. I think a few things can be simplified a 
bit more now that we're consistently tracking when we resolved the table. 
Here's some feedback:
- Combine the logic of the getCurrentScn() and getCurrentTxnTimestamp() to get 
the correct client timestamp wrt metadata. If transactional, we should be using 
the read pointer scaled down by 1M through our TransactionUtil function. If non 
transactional, then the logic will look like the getCurrentScn() function.
{code}
+    private long getCurrentScn() {
         Long scn = connection.getSCN();
-        long clientTimeStamp = scn == null ? HConstants.LATEST_TIMESTAMP : scn;
-        return clientTimeStamp;
+        long currentScn = scn == null ? HConstants.LATEST_TIMESTAMP : scn;
+        return currentScn;
     }
 
+    private long getCurrentTxnTimestamp() {
+        Transaction transaction = 
connection.getMutationState().getTransaction();
+        long currentTxnTimestamp = transaction == null ? 
HConstants.LATEST_TIMESTAMP : transaction.getReadPointer();
+        return currentTxnTimestamp;
+    }
+    
{code}
- Simplify the logic in MetaDataClient.updateCache() by not having a special 
case for scn versus txn in this code:
{code}
+        long currentScn = getCurrentScn();
+        long currentTxnTimestamp = getCurrentTxnTimestamp();
+        // Do not make rpc to getTable if 
+        // 1. table is a system table
+        // 2. currentScn is set and equals tableTablestamp-1 (in which case 
there can't possibly be a newer schema)
+        // 3. currentScn is set and equals tableResolvedTimestamp
+        // 4. table is transactional and the current txn timestamp equals 
tableResolvedTimestamp
+        if (table != null && !alwaysHitServer 
+                       && ( systemTable || 
+                                (currentScn!=HConstants.LATEST_TIMESTAMP && 
(tableTimestamp == currentScn-1 || currentScn == tableResolvedTimestamp)) || 
+                                (table.isTransactional() && 
currentTxnTimestamp == tableResolvedTimestamp))) {
+            return new 
MetaDataMutationResult(MutationCode.TABLE_ALREADY_EXISTS, 
QueryConstants.UNSET_TIMESTAMP, table);
         }
 {code}
-  We don't need that special case of currentScn!=HConstants.LATEST_TIMESTAMP 
nor do we need the special case of tableTimestamp == currentScn-1. If the table 
was resolved already at our scn, then it shouldn't be resolved again. Something 
like this instead:
{code}
+        long resolvedTimestamp = getResolvedTimestamp();
+        // Do not make rpc to getTable if 
+        // 1. table is a system table
+        // 2. table was already resolved as-of that timestamp
+        if (table != null && !alwaysHitServer && ( systemTable || 
resolvedTimestamp == tableResolvedTimestamp) {
+            return new 
MetaDataMutationResult(MutationCode.TABLE_ALREADY_EXISTS, 
QueryConstants.UNSET_TIMESTAMP, table);
         }
{code}
- For the rest of the MetaDataClient.updateCache(), make sure to pass through 
resolvedTimestamp. I think you'll need to pass this through the 
addIndexesFromPhysicalTable call too. That way, if we're transactional, we're 
consistently resolving everything as of the scaled down read pointer.
{code}
         int maxTryCount = tenantId == null ? 1 : 2;
@@ -347,7 +377,8 @@ public class MetaDataClient {
         do {
             final byte[] schemaBytes = PVarchar.INSTANCE.toBytes(schemaName);
             final byte[] tableBytes = PVarchar.INSTANCE.toBytes(tableName);
-            result = connection.getQueryServices().getTable(tenantId, 
schemaBytes, tableBytes, tableTimestamp, clientTimeStamp);
+            ConnectionQueryServices queryServices = 
connection.getQueryServices();
+                       result = queryServices.getTable(tenantId, schemaBytes, 
tableBytes, tableTimestamp, resolvedTimestamp); // HERE
 
             if (SYSTEM_CATALOG_SCHEMA.equals(schemaName)) {
                 return result;
@@ -372,14 +403,18 @@ public class MetaDataClient {
                 // server again.
                 if (table != null) {
                     // Ensures that table in result is set to table found in 
our cache.
-                    result.setTable(table);
                     if (code == MutationCode.TABLE_ALREADY_EXISTS) {
-                        // Although this table is up-to-date, the parent table 
may not be.
-                        // In this case, we update the parent table which may 
in turn pull
-                        // in indexes to add to this table.
-                        if (addIndexesFromPhysicalTable(result)) {
-                            connection.addTable(result.getTable());
-                        }
+                       result.setTable(table);
+                       // Although this table is up-to-date, the parent table 
may not be.
+                       // In this case, we update the parent table which may 
in turn pull
+                       // in indexes to add to this table.
+                       if (addIndexesFromPhysicalTable(result, 
resolvedTimestamp)) {    // HERE (I think)
+                           connection.addTable(result.getTable(), 
resolvedTimestamp);     // HERE
+                       }
+                       else {
+                               // if we aren't adding the table, we still need 
to update the resolved time of the table 
+                               connection.updateResolvedTimestamp(table, 
resolvedTimestamp);  // HERE
+                       }
                         return result;
                     }
{code}
- Add one more overloaded updateCache call that allows us to pass in the 
resolveTimestamp when we already know it. Maybe a Long parameter with null 
being the default? We'll use this from addIndexesFromPhysicalTable where we 
already have the resolvedTimestamp, as otherwise when we call it with the 
parent table, we'll end up making another RPC to get the latest parent table 
metadata. I think it's better to consistently use the same resolveTimestamp for 
these cases.
- Seems like there are a couple of place where you're calling getCurrentSCN(), 
but I'm not sure if this takes into account if a table or index is 
transactional. Maybe just make a pass through and ensure it's consistent. For 
example:
{code}
@@ -2700,13 +2747,13 @@ public class MetaDataClient {
          */
         boolean isSharedIndex = table.getViewIndexId() != null;
         if (isSharedIndex) {
-            return 
connection.getQueryServices().getTableStats(table.getPhysicalName().getBytes(), 
getClientTimeStamp());
+            return 
connection.getQueryServices().getTableStats(table.getPhysicalName().getBytes(), 
getCurrentScn());
         }
{code}

> Only sync table metadata when necessary
> ---------------------------------------
>
>                 Key: PHOENIX-1812
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1812
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-1812-v2.patch, PHOENIX-1812-v3.patch, 
> PHOENIX-1812-v4-WIP.patch, PHOENIX-1812-v5.patch, PHOENIX-1812.patch, 
> PHOENIX-1812.patch, PHOENIX-1812.patch
>
>
> With transactions, we hold the timestamp at the point when the transaction 
> was opened. We can prevent the MetaDataEndpoint getTable RPC in 
> MetaDataClient.updateCache() to check that the client has the latest table if 
> we've already checked at the current transaction ID timestamp. We can keep 
> track of which tables we've already updated in PhoenixConnection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-1812) Only sync table metadata when necessary

Reply via email to