[jira] [Work logged] (HIVE-24515) Analyze table job can be skipped when stats populated are already accurate

ASF GitHub Bot (Jira) Wed, 03 May 2023 03:51:04 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-24515?focusedWorklogId=860286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860286
 ]


ASF GitHub Bot logged work on HIVE-24515:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/May/23 10:50
            Start Date: 03/May/23 10:50
    Worklog Time Spent: 10m 
      Work Description: deniskuzZ commented on code in PR #1834:
URL: https://github.com/apache/hive/pull/1834#discussion_r570565029


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##########
@@ -4797,6 +4800,84 @@ private void heartbeatTxn(Connection dbConn, long txnid)
     }
   }
 
+  private boolean foundCommittedTransaction(Connection dbConn, long txnId, FindStatStatusByWriteIdRequest rqst,
+                                           String condition) throws SQLException, MetaException {
+    String s = sqlGenerator.addLimitClause(1,
+            "1 FROM \"COMPLETED_TXN_COMPONENTS\" WHERE \"CTC_TXNID\" " + condition + " " + txnId +
+                    " AND \"CTC_DATABASE\" = ? AND \"CTC_TABLE\" = ?");
+    if (rqst.getPartName() != null) {
+      s += " AND \"CTC_PARTITION\" = ?";
+    }
+
+    try (PreparedStatement pStmt =
+           sqlGenerator.prepareStmtWithParameters(dbConn, s, Arrays.asList(rqst.getDbName(), rqst.getTblName()))) {
+      if (rqst.getPartName() != null) {
+        pStmt.setString(3, rqst.getPartName());
+      }
+      LOG.debug("Going to execute query <" + s + ">");
+      try (ResultSet rs2 = pStmt.executeQuery()) {
+        if (rs2.next()) {
+          return true;
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  @RetrySemantics.Idempotent
+  public FindStatStatusByWriteIdResponse findStatStatusByWriteId(FindStatStatusByWriteIdRequest rqst)
+          throws SQLException, MetaException {
+    try {
+      Connection dbConn = null;
+      Statement stmt = null;
+      try {
+        lockInternal();
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        stmt = dbConn.createStatement();
+        TxnState state;
+        long txnId = getTxnIdForWriteId(rqst.getDbName(), rqst.getTblName(), rqst.getWriteId());
+        TxnStatus txnStatus = findTxnState(txnId, stmt);
+        if (txnStatus == TxnStatus.ABORTED) {
+          state = TxnState.ABORTED;
+        } else if (txnStatus == TxnStatus.OPEN) {
+          state = TxnState.OPEN;
+        } else if (foundCommittedTransaction(dbConn, txnId, rqst, ">")) {

Review Comment:
   That's not entirely correct. A txn with a higher txnId might be committed
before a txn with a lower txnId. See how the WRITE_SET table works: it has two
columns, WS_TXNID and WS_COMMIT_ID. That table only tracks conflicting
update/delete operations; inserts don't belong to this category, so you cannot
rely on it either.
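
To illustrate the ordering problem raised in the comment above, here is a minimal sketch. It is a simplified, hypothetical in-memory model, not Hive's `TxnHandler` or its metastore schema: the class `CommitOrderSketch` and its methods are invented for illustration, and the map only loosely mirrors the WS_TXNID / WS_COMMIT_ID pairing. It shows that "some committed txn has a higher txnId" does not imply "something committed after this txn".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model for illustration only; not Hive code.
class CommitOrderSketch {

  // txnId -> commit sequence number (loosely analogous to WS_TXNID -> WS_COMMIT_ID).
  private final Map<Long, Long> commitIdByTxn = new LinkedHashMap<>();
  private long nextCommitId = 1;

  // Records a commit; commit sequence numbers follow commit order, not txnId order.
  void commit(long txnId) {
    commitIdByTxn.put(txnId, nextCommitId++);
  }

  // The kind of check under review: is any committed txn's id greater than txnId?
  boolean higherTxnIdCommitted(long txnId) {
    return commitIdByTxn.keySet().stream().anyMatch(id -> id > txnId);
  }

  // What the caller actually cares about: did any txn commit after txnId committed?
  boolean committedAfter(long txnId) {
    Long commitId = commitIdByTxn.get(txnId);
    if (commitId == null) {
      return false; // txnId has not committed in this model
    }
    return commitIdByTxn.values().stream().anyMatch(c -> c > commitId);
  }

  public static void main(String[] args) {
    CommitOrderSketch log = new CommitOrderSketch();
    log.commit(12L); // opened later, but committed first
    log.commit(10L); // opened earlier, committed last
    System.out.println(log.higherTxnIdCommitted(10L)); // true
    System.out.println(log.committedAfter(10L));       // false
  }
}
```

In this model txn 12 commits before txn 10, so a comparison on txnId reports a "later" write even though nothing committed after txn 10. The sketch only demonstrates the pitfall; it does not propose a replacement check, and as the comment notes, WRITE_SET itself cannot be used for that purpose because it does not record inserts.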





Issue Time Tracking
-------------------

    Worklog Id:     (was: 860286)
    Time Spent: 3h 20m  (was: 3h 10m)

> Analyze table job can be skipped when stats populated are already accurate
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24515
>                 URL: https://issues.apache.org/jira/browse/HIVE-24515
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> For non-partitioned tables, stats details should be present at the table level,
> e.g.
> {noformat}
> COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"d_current_day":"true"...
>  }}
>   {noformat}
> For partitioned tables, stats details should be present at the partition level, e.g.
> {noformat}
> store_sales(ss_sold_date_sk=2451819)
> {totalSize=0, numRows=0, rawDataSize=0, 
> COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"ss_addr_sk":"true"....}}
>  
>  {noformat}
> When stats populated are already accurate, {{analyze table tn compute 
> statistics for columns}} should skip launching the job.
>  
> For ACID tables, stats are computed automatically, so the job can skip
> recomputing them when they are already accurate.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
