ashish-kumar-sharma commented on a change in pull request #2651:
URL: https://github.com/apache/hive/pull/2651#discussion_r712701687



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -179,6 +180,13 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
         txnHandler.markCleaned(ci);
         return;
       }
+      if (MetaStoreUtils.isNoCleanUpSet(t.getParameters())) {
+        // The table was marked no clean up true.
+        LOG.info("Skipping " + ci.getFullTableName() + " clean up, as NO_CLEANUP set to true");
+        txnHandler.markCleaned(ci);

Review comment:
   @deniskuzZ I agree that some obsolete files could stay forever, but only if
   the user is relying on auto compaction. This config is for users who trigger
   compaction manually in their data pipelines, e.g.
   
   ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES ("no_cleanup"="true");
   
   {some ETL operation}
   
   ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES ("no_cleanup"="false");
   
   Any Hive query or ETL pipeline refers to only a handful of tables, so
   disabling the Cleaner completely would increase the blast radius and the
   overhead for the Cleaner. If a user forgets to revert no_cleanup back to
   false, there will be a performance impact, which we should document in the
   Hive wiki.
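   
   To make the per-table behaviour concrete, here is a minimal sketch of the
   kind of check the diff relies on. It is an illustration only: the class
   name NoCleanupCheck is hypothetical, the "no_cleanup" key is taken from the
   TBLPROPERTIES example in this comment, and the real
   MetaStoreUtils.isNoCleanUpSet may well be implemented differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for MetaStoreUtils.isNoCleanUpSet; NOT the actual Hive code.
final class NoCleanupCheck {
  // Assumed property key, copied from the TBLPROPERTIES example in this comment.
  static final String NO_CLEANUP = "no_cleanup";

  private NoCleanupCheck() {}

  // Returns true only when the table parameters carry no_cleanup=true.
  // A missing map, a missing key, or any value other than "true"
  // (case-insensitive) leaves cleanup enabled.
  static boolean isNoCleanUpSet(Map<String, String> tableParams) {
    if (tableParams == null) {
      return false;
    }
    return Boolean.parseBoolean(tableParams.get(NO_CLEANUP));
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    System.out.println(isNoCleanUpSet(params)); // property not set

    params.put(NO_CLEANUP, "true");
    System.out.println(isNoCleanUpSet(params)); // cleanup skipped for this table

    params.put(NO_CLEANUP, "false");
    System.out.println(isNoCleanUpSet(params)); // cleanup resumes
  }
}
```

   Because the flag lives in the table parameters, only the table that sets it
   is skipped; the Cleaner keeps working for every other table.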
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


