Mahesh Raju Somalaraju created HIVE-29467:
---------------------------------------------

             Summary: Separate Config to Limit Aborted Compaction
                 Key: HIVE-29467
                 URL: https://issues.apache.org/jira/browse/HIVE-29467
             Project: Hive
          Issue Type: Improvement
            Reporter: Mahesh Raju Somalaraju
            Assignee: Mahesh Raju Somalaraju


Currently, both regular and aborted compaction candidates are governed by the 
same configuration parameter, metastore.compactor.fetch.size, which controls 
how many potential compactions the HMS Initiator can pick in a single cycle. In 
environments with a large backlog of aborted compactions, this can lead to 
excessive initiator workload and performance pressure on the Hive Metastore.

Introduce a separate configuration parameter to control the rate at which 
aborted compactions are picked by the HMS cleaner, independent of regular 
compactions (for example, metastore.aborted.compactor.fetch.size).

The relevant code is in ReadyToCleanAbortHandler.java:
{code:java}
public ReadyToCleanAbortHandler(SQLGenerator sqlGenerator, Configuration conf,
    long abortedTimeThreshold, int abortedThreshold) {
  this.sqlGenerator = sqlGenerator;
  this.abortedTimeThreshold = abortedTimeThreshold;
  this.abortedThreshold = abortedThreshold;
  // Suggestion: read a new config here instead of ConfVars.COMPACTOR_FETCH_SIZE.
  this.fetchSize = MetastoreConf.getIntVar(conf, ConfVars.COMPACTOR_FETCH_SIZE);
}
{code}
As a result, the Initiator batch size for regular compactions and the Cleaner 
batch size for aborted transaction cleanup are both governed by 
metastore.compactor.fetch.size.

If there is a very large historical backlog of aborted transactions, an 
operator may want to keep metastore.compactor.fetch.size high to maintain good 
throughput for normal compactions while preventing the Cleaner from picking 
too many aborted transactions in a single cycle. With a single shared setting 
this is not possible: a high value lets the Cleaner pick a large batch, which 
can trigger scanning of a large number of directories, cause long Cleaner 
runtimes and performance pressure, and impact overall HMS/Cleaner stability.
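
To make the trade-off concrete, here is a minimal sketch of the batching 
arithmetic. The class and method names are hypothetical, the backlog figure is 
made up for illustration, and it assumes no new aborted entries arrive while 
the backlog drains. It is not Hive code.

```java
// Sketch only: models how a per-cycle fetch size bounds Cleaner work.
final class CleanerBatchSketch {

    // Entries picked in one cycle: never more than the fetch size.
    static int pickedPerCycle(int backlog, int fetchSize) {
        return Math.min(backlog, fetchSize);
    }

    // Cycles needed to drain the backlog (fetchSize must be positive).
    static int cyclesToDrain(int backlog, int fetchSize) {
        return (backlog + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        int backlog = 100_000; // hypothetical backlog of aborted entries
        for (int fetchSize : new int[] {5_000, 200}) {
            System.out.printf("fetchSize=%d pickedPerCycle=%d cycles=%d%n",
                fetchSize,
                pickedPerCycle(backlog, fetchSize),
                cyclesToDrain(backlog, fetchSize));
        }
    }
}
```

A smaller fetch size spreads the same backlog over more, cheaper cycles. With 
one shared setting, that choice would also shrink the batch for regular 
compactions.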

Introduce a Cleaner-specific configuration to independently limit aborted 
transaction cleanup batch size, for example: 
metastore.aborted.compactor.fetch.size.
This would allow independent throttling of aborted transaction cleanup, safer 
recovery from large aborted transaction backlogs, better operational tuning 
without impacting regular compaction throughput, and improved HMS/Cleaner 
stability in high-churn, real-world environments.
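
One possible way to resolve the new setting is sketched below. It assumes the 
Cleaner-specific key falls back to metastore.compactor.fetch.size when unset, 
so existing deployments keep their current behaviour. java.util.Properties 
stands in for the real Configuration/MetastoreConf, and the class name, helper 
names, fallback rule, and default value of 1000 are all assumptions made for 
illustration.

```java
import java.util.Properties;

// Sketch only: Properties stands in for Hive's Configuration/MetastoreConf.
final class AbortedFetchSizeSketch {
    // Existing key named in this issue.
    static final String COMPACTOR_FETCH_SIZE = "metastore.compactor.fetch.size";
    // Proposed key (example name from this issue; it does not exist yet).
    static final String ABORTED_FETCH_SIZE = "metastore.aborted.compactor.fetch.size";
    // Illustrative default, assumed for this sketch.
    static final int ASSUMED_DEFAULT = 1000;

    // Batch size for regular compaction candidates.
    static int compactorFetchSize(Properties conf) {
        return parsePositive(conf.getProperty(COMPACTOR_FETCH_SIZE), ASSUMED_DEFAULT);
    }

    // Batch size for aborted cleanup; falls back to the shared setting if unset.
    static int abortedFetchSize(Properties conf) {
        return parsePositive(conf.getProperty(ABORTED_FETCH_SIZE), compactorFetchSize(conf));
    }

    // Returns the parsed value if it is a positive integer, else the fallback.
    static int parsePositive(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(COMPACTOR_FETCH_SIZE, "5000");
        System.out.println(abortedFetchSize(conf));   // new key unset: shared value

        conf.setProperty(ABORTED_FETCH_SIZE, "200");
        System.out.println(abortedFetchSize(conf));   // Cleaner-specific value
        System.out.println(compactorFetchSize(conf)); // regular batch size unchanged
    }
}
```

With this rule, setting only the new key throttles aborted cleanup while the 
regular compaction batch size stays untouched.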

Even though aborted cleanup is handled exclusively by the Cleaner, the 
batch-size control remains coupled to metastore.compactor.fetch.size.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
