klcopp commented on a change in pull request #3034:
URL: https://github.com/apache/hive/pull/3034#discussion_r816026374
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionInfo.java
##########
@@ -87,6 +89,21 @@ public CompactionInfo(String dbname, String tableName, String partName, Compacti
}
CompactionInfo() {}
+ public String getProperty(String key) {
Review comment:
Why use a map instead of just an integer?
##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3322,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
new TimeValidator(TimeUnit.SECONDS), "Time to wait before cleanup of obsolete files/dirs after compaction. \n"
+ "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+ HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,
Review comment:
It makes more sense to put this in MetastoreConf.java, since the Cleaner always runs in HMS.
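Just to illustrate what that could look like (this is only my sketch; the constant name and metastore key are assumptions, not code from this PR), MetastoreConf.ConfVars entries take the metastore key, the legacy hive key, a default and a description:

    // Hypothetical ConfVars entry in MetastoreConf.java:
    COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("metastore.compactor.cleaner.retry.maxattempts",
        "hive.compactor.cleaner.retry.maxattempts", 5,
        "Maximum number of attempts to clean a table again after a failed cycle."),

The Cleaner could then read it with MetastoreConf.getIntVar(conf, ConfVars.COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS).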
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
updateStatus(info);
}
+
+ @Override
+ public void retryCleanerAttemptWithBackoff(CompactionInfo info, long
retentionTime) throws MetaException {
+ try {
+ try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED)) {
+ try (PreparedStatement stmt = dbConn.prepareStatement("UPDATE \"COMPACTION_QUEUE\" " +
+ "SET \"CQ_TBLPROPERTIES\" = ?, CQ_COMMIT_TIME = ?, CQ_ERROR_MESSAGE= ? "
Review comment:
CQ_TBLPROPERTIES is for setting TBLPROPERTIES (currently only for MR compaction) like this:
ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] COMPACT 'compaction_type' WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...]);
I definitely wouldn't overwrite them, for observability reasons. You could append to them, but probably the nicest solution would be to add a new column to COMPACTION_QUEUE (unfortunately :))
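To make the idea concrete (sketch only: CQ_RETRY_RETENTION is a made-up column name, it doesn't exist today and would need a schema upgrade script), the update could target a dedicated column and leave CQ_TBLPROPERTIES alone:

    // Hypothetical sketch, reusing the dbConn from the diff above:
    try (PreparedStatement stmt = dbConn.prepareStatement(
        "UPDATE \"COMPACTION_QUEUE\" SET \"CQ_RETRY_RETENTION\" = ? WHERE \"CQ_ID\" = ?")) {
      stmt.setLong(1, retentionTime); // backoff value for the next cleaner attempt
      stmt.setLong(2, info.id);       // CompactionInfo.id maps to CQ_ID
      stmt.executeUpdate();
    }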
##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3318,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
new TimeValidator(TimeUnit.SECONDS), "Time to wait before cleanup of obsolete files/dirs after compaction. \n"
+ "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+ HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,
+ new RangeValidator(0, 10), "Maximum number of attempts to clean a table again after a " +
+ "failed cycle. The delay has a backoff, and calculated the following way: " +
+ "pow(2, number_of_failed_attempts) * HIVE_COMPACTOR_CLEANER_RETENTION_TIME. Must be between 0 and 10"),
Review comment:
I don't know, this seems pretty complicated for users to understand. They also might set hive.compactor.cleaner.retention.time.seconds to an excessively high number for some reason, without realizing that the retry backoff controlled by hive.compactor.cleaner.retry.maxattempts will grow with it. I vote for simplifying this, like maybe retry every 5 minutes (configurable)... what do you think?
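To make the coupling concrete, here is roughly what the documented formula does if taken literally (illustration only, not code from the PR):

    // Delay before retry N = pow(2, N) * hive.compactor.cleaner.retention.time.seconds.
    // With the defaults (300s retention, 5 attempts): 600s, 1200s, 2400s, 4800s, 9600s (~2.7h).
    // If retention is raised to 1 hour, the 5th retry already waits 32 hours.
    long retentionSeconds = 300;
    for (int failedAttempts = 1; failedAttempts <= 5; failedAttempts++) {
      long delaySeconds = (long) Math.pow(2, failedAttempts) * retentionSeconds;
      System.out.println("attempt " + failedAttempts + " waits " + delaySeconds + "s");
    }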
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
updateStatus(info);
}
+
+ @Override
+ public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {
+ try {
+ try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED)) {
+ try (PreparedStatement stmt = dbConn.prepareStatement("UPDATE \"COMPACTION_QUEUE\" " +
+ "SET \"CQ_TBLPROPERTIES\" = ?, CQ_COMMIT_TIME = ?, CQ_ERROR_MESSAGE= ? "
+ + " WHERE \"CQ_ID\" = ?")) {
+ stmt.setString(1, info.properties);
+ stmt.setLong(2, retentionTime);
Review comment:
Also, sadly, updating CQ_COMMIT_TIME messes with observability (oldest_ready_for_cleaning_age_in_sec).
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
updateStatus(info);
}
+
+ @Override
Review comment:
There's some RetrySemantics annotation needed here that I don't fully understand :D
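For reference (this is just my sketch, and which semantics actually applies here is an open question): the other TxnStore methods carry one of the nested RetrySemantics annotations (ReadOnly, Idempotent, SafeToRetry, CannotRetryAgain). Since this is an UPDATE keyed on CQ_ID that overwrites the same values on a retry, something along these lines on the interface declaration is presumably what's missing:

    // Assumed annotation choice, to be confirmed:
    @RetrySemantics.Idempotent
    void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException;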
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
updateStatus(info);
}
+
+ @Override
+ public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {
Review comment:
The name sounds like this method is supposed to retry cleaning. Maybe setNextCleanerAttemptTime or something like that would be better?
##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3322,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
new TimeValidator(TimeUnit.SECONDS), "Time to wait before cleanup of obsolete files/dirs after compaction. \n"
+ "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+ HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,
Review comment:
Yeah, hive.compactor.cleaner.retention.time.seconds is here in HiveConf.java, but it probably shouldn't be :)
##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -2956,6 +2956,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req)
void mark_compacted(1: CompactionInfoStruct cr) throws(1:MetaException o1)
void mark_failed(1: CompactionInfoStruct cr) throws(1:MetaException o1)
void mark_refused(1: CompactionInfoStruct cr) throws(1:MetaException o1)
+ void retry_cleaner_attempt_with_backoff(1: CompactionInfoStruct cr, 2:i64 retentionTime) throws(1:MetaException o1)
Review comment:
I don't think this is necessary, since the Cleaner runs only in HMS and can communicate directly with the CompactionTxnHandler.
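For illustration (a sketch, assuming the Cleaner keeps using the TxnStore handle it already has, the same way it calls markCleaned() today): instead of a new Thrift method, the retry bookkeeping could be a plain in-process call:

    // Inside the Cleaner, which runs in the HMS process:
    txnHandler.retryCleanerAttemptWithBackoff(ci, retentionTime);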
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
##########
@@ -522,6 +522,16 @@ void onRename(String oldCatName, String oldDbName, String oldTabName, String old
*/
void markRefused(CompactionInfo info) throws MetaException;
+ /**
+ * Updates the cleaner retry time related information (compaction properties and commit time) of the CompactionInfo
+ * in the HMS database.
+ * @param info The {@link CompactionInfo} object to be updated.
+ * @param retentionTime The time until the clean won't be attempted again.
Review comment:
The time when cleaning will be reattempted?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.